The Server Issues / Outages Thread - Panic Mode On! (118)

Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13985
Credit: 208,696,464
RAC: 304
Australia
Message 2030382 - Posted: 1 Feb 2020, 21:58:12 UTC
Last modified: 1 Feb 2020, 22:14:36 UTC

10 hours since either of my systems received any work.
While there were probably many resends available during that time, the backoffs grow after the first couple of unsuccessful requests for work, so the systems just stop asking until those backoffs time out or someone's there to hit Update.
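To illustrate why a host can sit idle for hours even when resends are available, here is a minimal sketch of a doubling backoff. The base delay, the cap, and the function name are assumptions for illustration only; this is not BOINC's actual scheduler backoff code.

```python
def backoff_delays(failures, base_s=60, cap_s=4 * 3600):
    """Wait in seconds after each consecutive failed work request.

    Hypothetical policy: the delay doubles after every failure and is
    capped at cap_s. Both defaults are illustrative assumptions.
    """
    return [min(base_s * 2 ** n, cap_s) for n in range(failures)]


if __name__ == "__main__":
    delays = backoff_delays(8)
    print(delays)
    print(f"total idle time: {sum(delays) / 3600:.2f} h")
```

Under this assumed policy, eight failed requests in a row already add up to a little over four hours of not asking, which is why hitting Update by hand gets a request in sooner.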


In that 10 hours the "Result files waiting for deletion" backlogs cleared out, and it looks like there were 2 bursts of splitter activity- a couple of spikes in the In progress numbers, which caused spikes in the backlog levels.
"Workunits waiting for assimilation" dropped 250k (another 4 million to go).
"Results returned and awaiting validation" dropped by 1.4 million, before increasing slightly again (by a few 100k) (Another 6-8 million to go).
Grant
Darwin NT
ID: 2030382 · Report as offensive
juan BFP Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 2030386 - Posted: 1 Feb 2020, 22:22:25 UTC

Running empty again. Shutting down the host to save some electric power.
ID: 2030386 · Report as offensive
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 2030394 - Posted: 1 Feb 2020, 22:59:41 UTC

Turned off the dedicated Seti host yesterday. Ran out of GPUGrid work and now nothing but Milkyway and Einstein tasks to heat up the house. I would hope that they can get the backlogs cleared this time before another Tuesday rolls around. So far I have not seen sufficient progress to meet that deadline.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 2030394 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13985
Credit: 208,696,464
RAC: 304
Australia
Message 2030398 - Posted: 1 Feb 2020, 23:10:16 UTC - in response to Message 2030386.  

Running empty again. Shutting down the host to save some electric power.
When not crunching my systems don't use much power.
I'll keep them up on the extremely slight possibility of getting some resends. The sooner the resends are cleared out, the sooner the backlogs will be gone & we can start processing new work again.
Until the backlogs are cleared then every time new work is released, things go backwards.
Grant
Darwin NT
ID: 2030398 · Report as offensive
Profile Mr. Kevvy Crowdfunding Project Donor*Special Project $250 donor
Volunteer moderator
Volunteer tester
Joined: 15 May 99
Posts: 3866
Credit: 1,114,826,392
RAC: 3,319
Canada
Message 2030399 - Posted: 1 Feb 2020, 23:11:15 UTC - in response to Message 2030394.  
Last modified: 1 Feb 2020, 23:14:24 UTC

I would hope that they can get the backlogs cleared this time before another Tuesday rolls around. So far I have not seen sufficient progress to meet that deadline.


I don't know why I forgot about this for so long as a strong contender for the root cause, as it certainly was two weeks ago at the start of this debacle. So I checked my hosts, and on the first host I found, this is the first validated work unit I looked at:

Task	Computer	Sent	Reported	Status	Run time (sec)	CPU time (sec)	Credit	Application
8496478459	8781094	31 Jan 2020, 11:03:11 UTC	1 Feb 2020, 1:31:12 UTC	Completed and validated	15.40	10.00	2.43	SETI@home v8 v8.22 (opencl_nvidia_SoG) windows_intelx86
8498701021	8723394	1 Feb 2020, 9:13:15 UTC	1 Feb 2020, 9:18:21 UTC	Completed and validated	24.59	22.55	2.43	SETI@home v8 v8.05 windows_x86_64
8499068304	8508092	1 Feb 2020, 12:08:09 UTC	1 Feb 2020, 15:57:46 UTC	Completed and validated	9.05	7.51	2.43	SETI@home v8 Anonymous platform (NVIDIA GPU)


It looks like 3-quorum validation for -9 overflows is still on to prevent AMD RX cross-validation. So when we have a "shorty storm" of all overflow work, the resends, work units in the field and awaiting validation are going to go nuts.
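A rough sketch of why a higher quorum multiplies resends during a shorty storm. It assumes each workunit is initially sent to 2 hosts (an assumption, not stated in the post) and that overflow workunits need 3 agreeing results; the function name is hypothetical.

```python
def extra_resends(overflow_workunits, initial_replication=2, quorum=3):
    """Extra tasks the server must issue beyond the initial copies.

    Assumes every result validates, so each workunit needs exactly
    (quorum - initial_replication) additional tasks, never fewer than 0.
    """
    return overflow_workunits * max(quorum - initial_replication, 0)


if __name__ == "__main__":
    for wus in (1_000, 100_000, 1_000_000):
        print(wus, "overflow workunits ->", extra_resends(wus), "resends")
```

With these assumed numbers every overflow workunit needs one more task sent, returned, and validated before it can be assimilated, so the backlog grows in step with the number of overflows.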
ID: 2030399 · Report as offensive
Cosmic_Ocean
Joined: 23 Dec 00
Posts: 3027
Credit: 13,516,867
RAC: 13
United States
Message 2030404 - Posted: 1 Feb 2020, 23:25:39 UTC - in response to Message 2030213.  

Yeah Matt is really missed in times like these and he would've had those MBv7's put to bed long ago.

Cheers.

Wait, I've been out of the loop for a while. Matt left?
Linux laptop:
record uptime: 1511d 20h 19m (ended due to the power brick giving up)
ID: 2030404 · Report as offensive
Profile Wiggo
Joined: 24 Jan 00
Posts: 38450
Credit: 261,360,520
RAC: 489
Australia
Message 2030405 - Posted: 1 Feb 2020, 23:36:19 UTC - in response to Message 2030404.  

Yeah Matt is really missed in times like these and he would've had those MBv7's put to bed long ago.
Wait, I've been out of the loop for a while. Matt left?
Yeah Matt has been over at the Breakthrough Listen project for a couple of years now.

Cheers.
ID: 2030405 · Report as offensive
Boiler Paul

Joined: 4 May 00
Posts: 232
Credit: 4,965,771
RAC: 64
United States
Message 2030406 - Posted: 1 Feb 2020, 23:37:21 UTC

they really should pull all the BLC 35 for the time being. they are very noisy and that is not good given the current situation with all the server problems.
ID: 2030406 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14690
Credit: 200,643,578
RAC: 874
United Kingdom
Message 2030407 - Posted: 1 Feb 2020, 23:40:04 UTC - in response to Message 2030405.  

Yeah Matt is really missed in times like these and he would've had those MBv7's put to bed long ago.
Wait, I've been out of the loop for a while. Matt left?
Yeah Matt has been over at the Breakthrough Listen project for a couple of years now.
Or maybe just at the Breakthrough Listen office down on Campus - that's where I found him back in July.

ID: 2030407 · Report as offensive
Profile Wiggo
Joined: 24 Jan 00
Posts: 38450
Credit: 261,360,520
RAC: 489
Australia
Message 2030408 - Posted: 1 Feb 2020, 23:41:01 UTC - in response to Message 2030406.  

they really should pull all the BLC 35 for the time being. they are very noisy and that is not good given the current situation with all the server problems.
We're on the last 7 of those files now at least.

Cheers.
ID: 2030408 · Report as offensive
Profile Wiggo
Joined: 24 Jan 00
Posts: 38450
Credit: 261,360,520
RAC: 489
Australia
Message 2030409 - Posted: 1 Feb 2020, 23:43:55 UTC - in response to Message 2030407.  

Yeah Matt is really missed in times like these and he would've had those MBv7's put to bed long ago.
Wait, I've been out of the loop for a while. Matt left?
Yeah Matt has been over at the Breakthrough Listen project for a couple of years now.
Or maybe just at the Breakthrough Listen office down on Campus - that's where I found him back in July.

https://i.imgur.com/Lninw9X.jpg
And now and again out at Parkes.

Cheers.
ID: 2030409 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13985
Credit: 208,696,464
RAC: 304
Australia
Message 2030410 - Posted: 1 Feb 2020, 23:52:04 UTC

A word of warning: if you do manage to score some work, be prepared to hit Retry pending transfers a few hundred (it feels like a thousand) times.
The download servers are borked as well as everything else.
Grant
Darwin NT
ID: 2030410 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13985
Credit: 208,696,464
RAC: 304
Australia
Message 2030412 - Posted: 1 Feb 2020, 23:58:35 UTC - in response to Message 2030411.  

Splitters are splitting, and work is available.

Results ready to send: 327,418
Great. So the backlog that has been slowly clearing will now build up again.
1 step forward, 3 steps backwards.
Grant
Darwin NT
ID: 2030412 · Report as offensive
Profile Wiggo
Joined: 24 Jan 00
Posts: 38450
Credit: 261,360,520
RAC: 489
Australia
Message 2030413 - Posted: 1 Feb 2020, 23:59:28 UTC

Well my 3570K rig is now idling while my 2500K loads up.

Cheers.
ID: 2030413 · Report as offensive
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 2030417 - Posted: 2 Feb 2020, 0:21:07 UTC

Wow, there actually has been progress in reducing Results returned and awaiting validation since I last looked. Down about 600K from the previous snapshot several hours ago.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 2030417 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13985
Credit: 208,696,464
RAC: 304
Australia
Message 2030418 - Posted: 2 Feb 2020, 0:24:04 UTC - in response to Message 2030417.  

Wow, there actually has been progress in reducing Results returned and awaiting validation since I last looked. Down about 600K from the previous snapshot several hours ago.
The result of no new work for an extended period.
Now that new work is going out, say goodbye to that progress.
Grant
Darwin NT
ID: 2030418 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13985
Credit: 208,696,464
RAC: 304
Australia
Message 2030419 - Posted: 2 Feb 2020, 0:33:55 UTC
Last modified: 2 Feb 2020, 0:47:45 UTC

Well, it's gone from downloads timing out almost instantly, to sticky downloads.
5min+ of Elapsed time counting away, and not so much as a bit transferred on this latest batch. And the usual fix for this problem (Suspend & then re-enable network activity) isn't having any effect.

Even the uploads are having more issues than usual with the instant timeouts.


For Eric's to-do list:
Once they get the database issues sorted out, then it's time to work on the download server & upload server issues.

Edit: 10+ minutes, and nothing transferred.

12 min for things to finally start, 5 more minutes of suspending & enabling network access to get them all downloaded.
Grant
Darwin NT
ID: 2030419 · Report as offensive
Stephen "Heretic" Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 2030421 - Posted: 2 Feb 2020, 0:47:48 UTC - in response to Message 2030349.  

Ok, I've had it. Enough is enough.....



. . Yep it does get that way ...

. . I'm taking the time to try and rebuild/update my Core2 Duo rig.

. . Later versions of Linux and BOINC and special sauce. It's going to be a headache because the existing version is a repository BOINC in the system part of the disk. So moving it to a nice quick SSD in the user part of the new drive will take some work, but if I follow the Juan method I should be OK.

. . So in due course; the fullness of time; further down the track; sometime in the distant future ... I should be able to return this unit to SETI when there is more work being sent out and with a neater, slightly quicker system. This optimism is unlike me but what the heck ...

Stephen

<shrug>
ID: 2030421 · Report as offensive
Stephen "Heretic" Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 2030423 - Posted: 2 Feb 2020, 0:50:46 UTC - in response to Message 2030386.  

Running empty again. Shutting down the host to save some electric power.


. . Following Grumpy's example then ? :)

Stephen
ID: 2030423 · Report as offensive
Profile Freewill Project Donor
Joined: 19 May 99
Posts: 766
Credit: 354,398,348
RAC: 11,693
United States
Message 2030424 - Posted: 2 Feb 2020, 0:53:01 UTC - in response to Message 2030423.  

Running empty again. Shutting down the host to save some electric power.


. . Following Grumpy's example then ? :)

Stephen

It seems a bit extreme unless you wanted to build a new system anyway. ;)
ID: 2030424 · Report as offensive


 
©2025 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.