Panic Mode On (114) Server Problems?

Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1966390 - Posted: 21 Nov 2018, 7:44:56 UTC

If the servers follow the usual trend, the RTS (ready-to-send) buffer has to fall to almost zero before it triggers the splitters into action. I have been getting a few tasks but the majority are in download stall backoffs.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1966390 · Report as offensive
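
Note on the post above: splitters that only start once the ready-to-send buffer is nearly empty behave like a low/high-water-mark trigger. The Python sketch below illustrates that idea only. The class name `SplitterGate` and both thresholds are hypothetical, and nothing here is taken from the actual SETI@home server code.

```python
from dataclasses import dataclass


@dataclass
class SplitterGate:
    """Hypothetical on/off gate with hysteresis for a work generator."""
    low_water: int    # assumed: start splitting at or below this many ready-to-send tasks
    high_water: int   # assumed: stop splitting at or above this many
    running: bool = False

    def update(self, ready_to_send: int) -> bool:
        """Feed in the current buffer size; return whether splitting should run."""
        if ready_to_send <= self.low_water:
            self.running = True
        elif ready_to_send >= self.high_water:
            self.running = False
        # Between the two marks the previous state is kept (the hysteresis).
        return self.running


if __name__ == "__main__":
    gate = SplitterGate(low_water=1_000, high_water=600_000)
    for size in (700_000, 300_000, 5_000, 800, 50_000, 600_000):
        print(size, gate.update(size))
```

With a very low `low_water`, the buffer has to drain almost completely before any new work is generated, which would match what volunteers were seeing.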
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13835
Credit: 208,696,464
RAC: 304
Australia
Message 1966392 - Posted: 21 Nov 2018, 7:54:29 UTC - in response to Message 1966390.  
Last modified: 21 Nov 2018, 8:02:25 UTC

I have been getting a few tasks but the majority are in download stall backoffs.

Majority?
I think I've been able to get 2 WUs to download on one system, not a single WU on the other. Tried both download server IP addresses, but it's just not happening.
Things are seriously borked.


Edit- looks like some signs of life from the splitters. Now if only the download servers would come to life as well.
Grant
Darwin NT
ID: 1966392 · Report as offensive
Profile Brent Norman Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester

Joined: 1 Dec 99
Posts: 2786
Credit: 685,657,289
RAC: 835
Canada
Message 1966396 - Posted: 21 Nov 2018, 8:25:37 UTC - in response to Message 1966392.  

Tasks seem to be rolling in at somewhat of a reasonable rate now.
ID: 1966396 · Report as offensive
Profile zoom3+1=4
Volunteer tester
Joined: 30 Nov 03
Posts: 66210
Credit: 55,293,173
RAC: 49
United States
Message 1966397 - Posted: 21 Nov 2018, 8:29:16 UTC

I'm up to date; everything I needed has downloaded. I reported 112, and now they're all on my PC.
Savoir-Faire is everywhere!
The T1 Trust, T1 Class 4-4-4-4 #5550, America's First HST

ID: 1966397 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13835
Credit: 208,696,464
RAC: 304
Australia
Message 1966398 - Posted: 21 Nov 2018, 8:31:19 UTC - in response to Message 1966396.  

Tasks seem to be rolling in at somewhat of a reasonable rate now.

Thanks for the heads up.
I hit "Retry pending transfers" and they started moving at 300kB/s+.

Of course now the Ready-to-send buffer is empty, and the life the splitters were showing earlier appears to have petered out.
By tomorrow it'll hopefully have sorted itself out.
Grant
Darwin NT
ID: 1966398 · Report as offensive
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1966399 - Posted: 21 Nov 2018, 8:32:12 UTC - in response to Message 1966392.  

I just managed to clear all my stalled downloads so now they will be able to ask for work. Oh, well, nothing from the servers is all they get now.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1966399 · Report as offensive
Ghia
Joined: 7 Feb 17
Posts: 238
Credit: 28,911,438
RAC: 50
Norway
Message 1966402 - Posted: 21 Nov 2018, 9:01:53 UTC
Last modified: 21 Nov 2018, 9:05:58 UTC

Eeeew....my living room was cold this morning ;-)
Was able to clear stalled downloads and got three batches of tasks around 7 UTC, to almost full cache. Now back to No Tasks Available.
That means baby-sitting for a while, while BOINC is in back-off mode.
Edit : Splitters must be moving again, just got some new tasks, not resends.
Humans may rule the world...but bacteria run it...
ID: 1966402 · Report as offensive
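
Note on the "back-off mode" mentioned above: a client that keeps failing to get work or finish a download typically waits longer after each failure before trying again. The Python sketch below shows generic exponential back-off with jitter. It is not BOINC's actual schedule, which this thread does not give; the function name `backoff_delay`, the 60-second base and the 4-hour cap are all assumptions chosen for illustration.

```python
import random


def backoff_delay(failures, base=60.0, cap=4 * 3600.0, rng=random.random):
    """Seconds to wait before the next retry after `failures` consecutive failures.

    base and cap are illustrative values, not BOINC settings.
    """
    if failures < 1:
        return 0.0
    # The window doubles with every failure until it reaches the cap.
    ceiling = min(cap, base * 2 ** (failures - 1))
    # Jitter over the upper half of the window keeps clients that failed
    # at the same moment from all retrying at the same moment.
    return ceiling * (0.5 + 0.5 * rng())


if __name__ == "__main__":
    for n in range(1, 10):
        print(n, round(backoff_delay(n)))
```

A manual "Retry pending transfers" or project update effectively skips the remaining wait, which is why people were baby-sitting their machines after the outage.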
Stephen "Heretic" Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1966412 - Posted: 21 Nov 2018, 10:51:35 UTC - in response to Message 1966387.  

I just got a couple of good hits.

It's only taken 4 hours since the end of the outage.
And the splitters haven't fired up to replace the work that has been sent out.

Still getting "Project has no tasks available" for me.

Edit- and then 2 minutes later- work gets allocated.
Now if only they would download- says Download active, but nothing is actually happening.

. . Yep, that was what I was getting then as well, so I stopped trying and went out :)

. . Now I'm back and some work is flowing out. Time to restart the empty machines and hope.

Stephen

:)
ID: 1966412 · Report as offensive
Profile KWSN THE Holy Hand Grenade!
Volunteer tester
Joined: 20 Dec 05
Posts: 3187
Credit: 57,163,290
RAC: 0
United States
Message 1966444 - Posted: 21 Nov 2018, 16:05:03 UTC

Err, would someone PLEASE "hang a tape" (or whatever the modern equivalent is...) on Beta? There has been no work available there since Sunday, Nov 18 - and it is now Wednesday, Nov 21!...

Hello, from Albany, CA!...
ID: 1966444 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13835
Credit: 208,696,464
RAC: 304
Australia
Message 1966479 - Posted: 21 Nov 2018, 20:10:05 UTC

Looks like the servers have started clearing the Validator backlog, although the splitters continue to struggle.
Good thing there's a fair amount of AP & Arecibo VLAR work about to help keep the return rate down.
Grant
Darwin NT
ID: 1966479 · Report as offensive
Profile Unixchick Project Donor
Joined: 5 Mar 12
Posts: 815
Credit: 2,361,516
RAC: 22
United States
Message 1966484 - Posted: 21 Nov 2018, 20:35:04 UTC - in response to Message 1966479.  

Looks like the servers have started clearing the Validator backlog, although the splitters continue to struggle.
Good thing there's a fair amount of AP & Arecibo VLAR work about to help keep the return rate down.


I agree. While I'd like to see the RTS queue full, it is probably more important to keep making progress on the validation, assimilation and purging. Things are looking good, or at least much better than Tuesday right before the outage. The number of results out in the field is pretty close to what it was before, so hopefully everyone has full caches of WUs to work on.
ID: 1966484 · Report as offensive
Speedy
Volunteer tester
Joined: 26 Jun 04
Posts: 1643
Credit: 12,921,799
RAC: 89
New Zealand
Message 1966807 - Posted: 23 Nov 2018, 21:08:38 UTC

As I write, I am guessing there is a noise bomb storm? Reason: the results returned per hour are over 139,000.
ID: 1966807 · Report as offensive
Stephen "Heretic" Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1966815 - Posted: 23 Nov 2018, 22:07:36 UTC - in response to Message 1966807.  
Last modified: 23 Nov 2018, 22:09:20 UTC

As I write, I am guessing there is a noise bomb storm? Reason: the results returned per hour are over 139,000.


. . When we had that serious noise bomb storm a little while ago the 'returned last hour' figure leapt up to around the 200,000 mark. 139,000 simply indicates that there are currently no more Arecibo VLAR tasks (very slow) being distributed. When the next Arecibo tape gets mounted or the current one gets 'unstuck' that will drop again.

Stephen

:)
ID: 1966815 · Report as offensive
Speedy
Volunteer tester
Joined: 26 Jun 04
Posts: 1643
Credit: 12,921,799
RAC: 89
New Zealand
Message 1966816 - Posted: 23 Nov 2018, 22:40:22 UTC - in response to Message 1966815.  

As I write, I am guessing there is a noise bomb storm? Reason: the results returned per hour are over 139,000.


. . When we had that serious noise bomb storm a little while ago the 'returned last hour' figure leapt up to around the 200,000 mark. 139,000 simply indicates that there are currently no more Arecibo VLAR tasks (very slow) being distributed. When the next Arecibo tape gets mounted or the current one gets 'unstuck' that will drop again.

Stephen

:)

Fair point. When more work is being returned per hour, it means we are processing faster, which is good considering we have so much information to get through.

It is nice to see the assimilation queue is under half a million.
ID: 1966816 · Report as offensive
Stephen "Heretic" Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1966818 - Posted: 24 Nov 2018, 0:00:56 UTC - in response to Message 1966816.  

. . When we had that serious noise bomb storm a little while ago the 'returned last hour' figure leapt up to around the 200,000 mark. 139,000 simply indicates that there are currently no more Arecibo VLAR tasks (very slow) being distributed. When the next Arecibo tape gets mounted or the current one gets 'unstuck' that will drop again.
Stephen

Fair point. When more work is being returned per hour, it means we are processing faster, which is good considering we have so much information to get through.

It is nice to see the assimilation queue is under half a million.


. . The assimilation queue is still a little over the half mil mark but getting close, much better than when it was nudging 1.5 mil. The validation queue is still over 5 mil but dropping as fast as the assimilation queue, down from about 6.3 mil. The two purge queues are both up by almost the same amounts, so they are not quite coping with the extra load they are getting from the faster performance of the preceding queues.

Stephen

<shrug>
ID: 1966818 · Report as offensive
Profile Unixchick Project Donor
Joined: 5 Mar 12
Posts: 815
Credit: 2,361,516
RAC: 22
United States
Message 1967018 - Posted: 25 Nov 2018, 17:32:44 UTC

The system is looking good. It worked through all the validation and assimilation backlog. We have enough files to get us through to Monday as long as there aren't any noise bombs. No panic at the moment, just wishing for some Arecibo data.
ID: 1967018 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14674
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1967019 - Posted: 25 Nov 2018, 17:45:14 UTC - in response to Message 1967018.  

The system is looking good. It worked through all the validation and assimilation backlog. We have enough files to get us through to Monday as long as there aren't any noise bombs. No panic at the moment, just wishing for some Arecibo data.
Maybe they'll be relaxed enough on Tuesday to un-stick that 14no18aa tape, which has been stuck since around, well, 14 November.
ID: 1967019 · Report as offensive
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1967071 - Posted: 25 Nov 2018, 20:48:17 UTC

The website is either running a cron job currently or it is buckling under the load of so many tasks in the database. Unable to view any of my hosts to see status of any task category.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1967071 · Report as offensive
rob smith Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer moderator
Volunteer tester

Joined: 7 Mar 03
Posts: 22447
Credit: 416,307,556
RAC: 380
United Kingdom
Message 1967078 - Posted: 25 Nov 2018, 21:00:57 UTC

Keith - I just randomly clicked on one of your computers (ID: 6279633) and noticed that it has dropped 31 "error while computing" in the last few hours :-(

First error task on the list gave this output:

Task 7177287944
Name 	blc06_2bit_guppi_58406_28274_HIP20533_0107.10298.818.22.45.67.vlar_0
Workunit 	3233271393
Created 	25 Nov 2018, 13:24:37 UTC
Sent 	25 Nov 2018, 17:29:12 UTC
Report deadline 	17 Jan 2019, 22:28:54 UTC
Received 	25 Nov 2018, 19:33:13 UTC
Server state 	Over
Outcome 	Computation error
Client state 	Compute error
Exit status 	193 (0x000000C1) EXIT_SIGNAL
Computer ID 	6279633
Run time 	1 sec
CPU time 	
Validate state 	Invalid
Credit 	0.00
Device peak FLOPS 	6,852.48 GFLOPS
Application version 	SETI@home v8
Anonymous platform (NVIDIA GPU)
Peak disk usage 	0.02 MB
Stderr output

<core_client_version>7.4.44</core_client_version>
<![CDATA[
<message>
process exited with code 193 (0xc1, -63)
</message>
<stderr_txt>
setiathome_CUDA: Found 3 CUDA device(s):
  Device 1: GeForce GTX 1070 Ti, 8116 MiB, regsPerBlock 65536
     computeCap 6.1, multiProcs 19 
     pciBusID = 10, pciSlotID = 0
  Device 2: GeForce GTX 1070, 8119 MiB, regsPerBlock 65536
     computeCap 6.1, multiProcs 15 
     pciBusID = 8, pciSlotID = 0
  Device 3: GeForce GTX 1070, 8119 MiB, regsPerBlock 65536
     computeCap 6.1, multiProcs 15 
     pciBusID = 11, pciSlotID = 0
In cudaAcc_initializeDevice(): Boinc passed DevPref 1
setiathome_CUDA: CUDA Device 1 specified, checking...
   Device 1: GeForce GTX 1070 Ti is okay
SETI@home using CUDA accelerated device GeForce GTX 1070 Ti
Using pfb = 32 from command line args
Unroll autotune 19. Overriding Pulse find periods per launch. Parameter -pfp set to 19
corrupted size vs. prev_size
SIGABRT: abort called
Stack trace (31 frames):
../../projects/setiathome.berkeley.edu/setiathome_x41p_V0.97b2_Linux-Pascal+_cuda92[0x85b6c0]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x12890)[0x7f81fbe16890]
/lib/x86_64-linux-gnu/libc.so.6(gsignal+0xc7)[0x7f81fad06e97]
/lib/x86_64-linux-gnu/libc.so.6(abort+0x141)[0x7f81fad08801]
/lib/x86_64-linux-gnu/libc.so.6(+0x89897)[0x7f81fad51897]
/lib/x86_64-linux-gnu/libc.so.6(+0x9090a)[0x7f81fad5890a]
/lib/x86_64-linux-gnu/libc.so.6(+0x90b0c)[0x7f81fad58b0c]
/lib/x86_64-linux-gnu/libc.so.6(+0x947d8)[0x7f81fad5c7d8]
/lib/x86_64-linux-gnu/libc.so.6(__libc_malloc+0x27d)[0x7f81fad5f2ed]
/usr/lib/x86_64-linux-gnu/libnvidia-ptxjitcompiler.so.1(+0x83073c)[0x7f81f0c0c73c]
/usr/lib/x86_64-linux-gnu/libnvidia-ptxjitcompiler.so.1(+0x827b9a)[0x7f81f0c03b9a]
/usr/lib/x86_64-linux-gnu/libnvidia-ptxjitcompiler.so.1(+0x13c49f)[0x7f81f051849f]
/usr/lib/x86_64-linux-gnu/libnvidia-ptxjitcompiler.so.1(__cuda_CallJitEntryPoint+0x10ac)[0x7f81f04b2d1c]
/usr/lib/x86_64-linux-gnu/libnvidia-fatbinaryloader.so.410.73(elfLink_Finish+0x25)[0x7f81f8986f95]
/usr/lib/x86_64-linux-gnu/libcuda.so.1(+0x1e18a4)[0x7f81f8da78a4]
/usr/lib/x86_64-linux-gnu/libcuda.so.1(+0x1308be)[0x7f81f8cf68be]
/usr/lib/x86_64-linux-gnu/libcuda.so.1(+0x131055)[0x7f81f8cf7055]
../../projects/setiathome.berkeley.edu/setiathome_x41p_V0.97b2_Linux-Pascal+_cuda92[0x82b73a]
../../projects/setiathome.berkeley.edu/setiathome_x41p_V0.97b2_Linux-Pascal+_cuda92[0x81eef0]
../../projects/setiathome.berkeley.edu/setiathome_x41p_V0.97b2_Linux-Pascal+_cuda92[0x82a56b]
../../projects/setiathome.berkeley.edu/setiathome_x41p_V0.97b2_Linux-Pascal+_cuda92[0x82ed9f]
../../projects/setiathome.berkeley.edu/setiathome_x41p_V0.97b2_Linux-Pascal+_cuda92[0x82f50a]
../../projects/setiathome.berkeley.edu/setiathome_x41p_V0.97b2_Linux-Pascal+_cuda92[0x822d9c]
../../projects/setiathome.berkeley.edu/setiathome_x41p_V0.97b2_Linux-Pascal+_cuda92[0x80721e]
../../projects/setiathome.berkeley.edu/setiathome_x41p_V0.97b2_Linux-Pascal+_cuda92[0x841d5c]
../../projects/setiathome.berkeley.edu/setiathome_x41p_V0.97b2_Linux-Pascal+_cuda92[0x40e49b]
../../projects/setiathome.berkeley.edu/setiathome_x41p_V0.97b2_Linux-Pascal+_cuda92[0x41c7c5]
../../projects/setiathome.berkeley.edu/setiathome_x41p_V0.97b2_Linux-Pascal+_cuda92[0x426525]
../../projects/setiathome.berkeley.edu/setiathome_x41p_V0.97b2_Linux-Pascal+_cuda92[0x4080d8]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xe7)[0x7f81face9b97]
../../projects/setiathome.berkeley.edu/setiathome_x41p_V0.97b2_Linux-Pascal+_cuda92[0x408f89]

Exiting...

</stderr_txt>
]]>

Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
ID: 1967078 · Report as offensive
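
Note on the error report above: the three numbers in "process exited with code 193 (0xc1, -63)" are the same value written as decimal, hexadecimal and a signed 8-bit integer. The line "corrupted size vs. prev_size" is a glibc heap-consistency message, and the frames show the abort being raised from `__libc_malloc` called inside `libnvidia-ptxjitcompiler.so.1`. The Python sketch below checks the arithmetic and pulls the module name out of frames in this format. The helper names `as_signed_byte` and `module_of` are made up for illustration, and the regular expression is only assumed to cover the two frame shapes that appear in this post.

```python
import re
from collections import Counter

# One frame: path, optional "(symbol+offset)", then "[address]".
FRAME_RE = re.compile(
    r"^(?P<path>[^(\[]+)(?:\((?P<symbol>[^)]*)\))?\[(?P<addr>0x[0-9a-f]+)\]$"
)


def as_signed_byte(status):
    """Reinterpret an exit status in 0..255 as a signed 8-bit value."""
    return status - 256 if status > 127 else status


def module_of(frame):
    """Return the file name of the binary or library named in one frame line."""
    match = FRAME_RE.match(frame.strip())
    if match is None:
        raise ValueError(f"unrecognised frame: {frame!r}")
    return match.group("path").rsplit("/", 1)[-1]


# Four frames copied from the stack trace in the post above.
SAMPLE_FRAMES = [
    "../../projects/setiathome.berkeley.edu/setiathome_x41p_V0.97b2_Linux-Pascal+_cuda92[0x85b6c0]",
    "/lib/x86_64-linux-gnu/libc.so.6(gsignal+0xc7)[0x7f81fad06e97]",
    "/lib/x86_64-linux-gnu/libc.so.6(__libc_malloc+0x27d)[0x7f81fad5f2ed]",
    "/usr/lib/x86_64-linux-gnu/libnvidia-ptxjitcompiler.so.1(+0x83073c)[0x7f81f0c0c73c]",
]

if __name__ == "__main__":
    print(193, hex(193), as_signed_byte(193))
    print(Counter(module_of(frame) for frame in SAMPLE_FRAMES))
```

Running the same tally over all 31 frames gives a quick view of which libraries sit between the application and the failing allocation.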
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1967088 - Posted: 25 Nov 2018, 21:22:54 UTC

Thanks Rob, that is the one I have been trying to monitor mainly. It has been doing that a lot in the last week. Still can't get any of the tasks to show though. Not sure how you were able to pull them up. Thanks for the heads up.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1967088 · Report as offensive


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.