The Server Issues / Outages Thread - Panic Mode On! (117)

Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14690
Credit: 200,643,578
RAC: 874
United Kingdom
Message 2016633 - Posted: 25 Oct 2019, 21:38:02 UTC - in response to Message 2016629.  
Last modified: 25 Oct 2019, 21:39:49 UTC

Could it be the recent deadline expiry of the unfinished WOW! tasks? More workunits than usual in the database with three, four, ... task replications?

Edit - the validation backlog is currently 69. It's the "waiting for wingmate" backlog which is 4,453,648.
ID: 2016633
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13918
Credit: 208,696,464
RAC: 304
Australia
Message 2016639 - Posted: 25 Oct 2019, 22:02:51 UTC - in response to Message 2016633.  
Last modified: 25 Oct 2019, 22:41:37 UTC

It's the "waiting for wingmate" backlog which is 4,453,648.
But are we actually waiting on the wingman, or waiting for the Validators to do their work?
I noticed last night, when looking at another user's system after they made some changes, that all of their returned work over a period of an hour was going into Pending. All of it. Not one became Valid or Inconclusive (they had roughly 3/4 of a day's cache).

So I vote for the "Waiting on the Validators to do their work" option. As mentioned before, as much as we are concerned about WUs that take forever to Validate, the huge majority do so within a few days. This present issue began with a sudden jump about 3 days ago, and the backlog has continued to grow since then.
The usual value is around 3.7 million. You get the occasional spike, but this isn't a spike; it's a protracted period well above the usual number of Results waiting to be validated.
The Haveland graphs show this very clearly, along with the corresponding increase in WUs awaiting assimilation (you can't assimilate what has been returned but hasn't yet been validated).

I would have thought any after-effects of the WoW event would be a bit-by-bit increase in Awaiting validation numbers as people loaded up on work that wasn't returned, with a noticeable drop each day as WUs then time out and are re-released, Validated & cleared. That certainly isn't happening here.



Edit-
Actually the WU awaiting assimilation numbers are probably the best indicator there is an issue with the Validators.
Normally the Awaiting Validation numbers are the "Waiting on wingman" numbers- the matching result hasn't been returned yet, so the WU can't be validated, and the WU waiting assimilation number is 0.

But now we have work that has been returned by all parties involved, but it hasn't been validated yet; hence the higher than usual Awaiting Validation & WU awaiting assimilation numbers.
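
The distinction drawn here, between work that is waiting on a wingman and work that is waiting on the validators, can be sketched in a few lines of Python. This is only an illustration: the `Workunit` fields and the quorum of 2 used in the example are assumptions, not the actual BOINC server schema or validator logic.

```python
from dataclasses import dataclass


@dataclass
class Workunit:
    # Hypothetical fields for illustration only; not the real BOINC schema.
    quorum: int       # results needed before the validator can compare them
    returned: int     # results returned by hosts so far
    validated: bool   # True once the validator has processed the workunit


def backlog_reason(wu: Workunit) -> str:
    """Say why a workunit is still counted as awaiting validation."""
    if wu.validated:
        return "validated"
    if wu.returned < wu.quorum:
        # Normal case: the matching result has not come back yet.
        return "waiting on wingman"
    # All results are in, so the delay is on the server side.
    return "waiting on validator"
```

Under these assumptions, a healthy system would show almost everything in the "waiting on wingman" group, while the situation described above would show a growing "waiting on validator" group.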
Grant
Darwin NT
ID: 2016639
Unixchick Project Donor
Joined: 5 Mar 12
Posts: 815
Credit: 2,361,516
RAC: 22
United States
Message 2016649 - Posted: 25 Oct 2019, 22:24:06 UTC

The system does seem "sluggish". It seems like it is always in that "recovery after an outage" state. I think there is an issue. It is like a hard drive going bad, slowing down the system with retries and errors, but hasn't quite crashed yet.
ID: 2016649
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13918
Credit: 208,696,464
RAC: 304
Australia
Message 2016771 - Posted: 26 Oct 2019, 22:15:38 UTC

The Awaiting Validation backlog has started to clear (and along with it the WU awaiting assimilation backlog); it dropped by almost half overnight.
Grant
Darwin NT
ID: 2016771
Speedy
Volunteer tester
Joined: 26 Jun 04
Posts: 1646
Credit: 12,921,799
RAC: 89
New Zealand
Message 2016788 - Posted: 27 Oct 2019, 0:20:28 UTC

I am aware this requires manual intervention from users. When I receive work, I generally look through my task list to see if I have any tasks ending in _2 or above. If I do, I suspend all work apart from the task currently running and push the _2 and above tasks to the front, returning them faster in the hope that they will validate and reduce the validation/assimilation backlog.
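
As a rough sketch of that manual routine, the ordering step could look like the following Python. The task names are made up for the example, and the code only reorders a list of names; it does not talk to the BOINC client, which is still controlled by hand as described above.

```python
def replication_index(task_name: str) -> int:
    """Return the trailing _N index of a task name (assumed naming pattern)."""
    _, _, suffix = task_name.rpartition("_")
    return int(suffix)


def resends_first(task_names: list[str]) -> list[str]:
    """Put tasks ending in _2 or above ahead of the rest."""
    # sorted() is stable, so tasks keep their relative order within each group.
    return sorted(task_names, key=lambda name: replication_index(name) < 2)


# Hypothetical task names, for illustration only.
queue = ["wu_a_0", "wu_b_2", "wu_c_1", "wu_d_3"]
print(resends_first(queue))
```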
ID: 2016788
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13918
Credit: 208,696,464
RAC: 304
Australia
Message 2016791 - Posted: 27 Oct 2019, 0:27:20 UTC - in response to Message 2016788.  
Last modified: 27 Oct 2019, 0:29:33 UTC

I am aware this requires manual intervention from users. When I receive work, I generally look through my task list to see if I have any tasks ending in _2 or above. If I do, I suspend all work apart from the task currently running and push the _2 and above tasks to the front, returning them faster in the hope that they will validate and reduce the validation/assimilation backlog.
This has nothing to do with that.
This is work that has been returned, but due to server issues hasn't yet been Validated.


Interestingly, as the MB Awaiting validation backlog continues to clear, the AP Awaiting validation backlog has started to back up.
Grant
Darwin NT
ID: 2016791
Speedy
Volunteer tester
Joined: 26 Jun 04
Posts: 1646
Credit: 12,921,799
RAC: 89
New Zealand
Message 2016799 - Posted: 27 Oct 2019, 1:02:24 UTC - in response to Message 2016791.  
Last modified: 27 Oct 2019, 1:03:29 UTC

I will have to take your word on that one, Grant. I have returned 2 results with _2, and as soon as I returned them they turned valid.
ID: 2016799
Ian&Steve C.
Joined: 28 Sep 99
Posts: 4267
Credit: 1,282,604,591
RAC: 6,640
United States
Message 2017151 - Posted: 29 Oct 2019, 18:17:48 UTC

No outage today? Or did I miss it?
Seti@Home classic workunits: 29,492 CPU time: 134,419 hours

ID: 2017151
Holdolin

Joined: 10 Apr 19
Posts: 68
Credit: 88,777,750
RAC: 30
United States
Message 2017153 - Posted: 29 Oct 2019, 18:26:45 UTC

Didn't notice one here either. I only notice cause I tend to use the downtime to update, clean and inspect my systems. Ah well, 1 week without won't hurt.
ID: 2017153
Unixchick Project Donor
Joined: 5 Mar 12
Posts: 815
Credit: 2,361,516
RAC: 22
United States
Message 2017156 - Posted: 29 Oct 2019, 18:40:51 UTC

There is a lot going on in the Berkeley area right now. Between fires, smoke, and power outages, I'm just happy the system is up and working so well. I hope everyone on the SETI team and their loved ones are safe.
ID: 2017156
Stephen "Heretic" (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 2017168 - Posted: 29 Oct 2019, 21:54:00 UTC - in response to Message 2017156.  

There is a lot going on in the Berkeley area right now. Between fires, smoke, and power outages, I'm just happy the system is up and working so well. I hope everyone on the SETI team and their loved ones are safe.


. . Certainly that ... but can the system keep working properly without an outage?

. . Where's the KaBoom?

Stephen

???
ID: 2017168
Tom M
Volunteer tester

Joined: 28 Nov 02
Posts: 5126
Credit: 276,046,078
RAC: 462
Message 2017170 - Posted: 29 Oct 2019, 22:41:32 UTC - in response to Message 2017168.  

There is a lot going on in the Berkeley area right now. Between fires, smoke, and power outages, I'm just happy the system is up and working so well. I hope everyone on the SETI team and their loved ones are safe.


. . Certainly that ... but can the system keep working properly without an outage?

. . Where's the KaBoom?

Stephen

???


It's coming.... on Sunday morning, of course!




ROFLing.... (and under the table).

Tom
A proud member of the OFA (Old Farts Association).
ID: 2017170
juan BFP (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 2017172 - Posted: 29 Oct 2019, 23:03:39 UTC - in response to Message 2017168.  

ID: 2017172
Stephen "Heretic" (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 2017174 - Posted: 29 Oct 2019, 23:48:43 UTC - in response to Message 2017170.  

. . Certainly that ... but can the system keep working properly without an outage?
. . Where's the KaBoom?
Stephen

It's coming.... on Sunday morning, of course!
ROFLing.... (and under the table).
Tom

. . Nah! Eric fixed that one ...

{arms behind back with fingers crossed on both hands}

Stephen

:)
ID: 2017174
Jimbocous Project Donor
Volunteer tester
Joined: 1 Apr 13
Posts: 1859
Credit: 268,616,081
RAC: 1,349
United States
Message 2017187 - Posted: 30 Oct 2019, 7:14:14 UTC

Nice short outage, good recovery. Just happened 11 hrs or so later than normal ...
ID: 2017187
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13918
Credit: 208,696,464
RAC: 304
Australia
Message 2017190 - Posted: 30 Oct 2019, 7:59:17 UTC - in response to Message 2017187.  

Nice short outage, good recovery.
Well, the splitters are still having problems getting up to speed again...
Grant
Darwin NT
ID: 2017190
Tom M
Volunteer tester

Joined: 28 Nov 02
Posts: 5126
Credit: 276,046,078
RAC: 462
Message 2017204 - Posted: 30 Oct 2019, 11:31:59 UTC

Twice in the last couple of days I have noticed that one of my Seti@home machines had a pretty good backlog of "ready to report" tasks, although they appear to have already uploaded.

So I manually kicked off an update. And down comes a bunch of tasks.
Since I am set up to poll once every 5 minutes or so, this is worrying....

Is this my problem or the server's problem?

Tom
A proud member of the OFA (Old Farts Association).
ID: 2017204
Richard Haselgrove Project Donor
Volunteer tester

Send message
Joined: 4 Jul 99
Posts: 14690
Credit: 200,643,578
RAC: 874
United Kingdom
Message 2017205 - Posted: 30 Oct 2019, 11:40:13 UTC - in response to Message 2017204.  

It was probably SETI's maintenance outage (during which, you can't report tasks), coupled with BOINC's automatic extending backoffs.

You can't set a 5-minute repeat update: the server tells you to delay that long, as a minimum. You did the right thing, with your single mouse click.
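
To illustrate how a server-imposed minimum delay and a growing client-side backoff might combine, here is a minimal Python sketch. Only the 5-minute minimum comes from this thread; the doubling rule, the 60-second base, and the one-day cap are assumptions made for the example and are not BOINC's actual backoff policy.

```python
def next_request_delay(failures: int,
                       server_min_delay: float = 300.0,
                       base: float = 60.0,
                       cap: float = 86400.0) -> float:
    """Seconds to wait before the next scheduler request (illustrative only)."""
    # Assumed policy: double the wait after each failed contact, up to a cap.
    backoff = min(cap, base * (2 ** failures))
    # The client never asks sooner than the server's requested minimum.
    return max(server_min_delay, backoff)
```

With a scheme like this, several failed contacts during an outage would leave the client waiting much longer than 5 minutes, which is when a single manual update helps.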
ID: 2017205
Tom M
Volunteer tester

Joined: 28 Nov 02
Posts: 5126
Credit: 276,046,078
RAC: 462
Message 2017208 - Posted: 30 Oct 2019, 12:20:24 UTC - in response to Message 2017205.  

It was probably SETI's maintenance outage (during which, you can't report tasks), coupled with BOINC's automatic extending backoffs.

You can't set a 5-minute repeat update: the server tells you to delay that long, as a minimum. You did the right thing, with your single mouse click.


Sorry, I misspoke. I have the "store up additional work" preference set to 0.1, which usually results in a 5-minute update cycle.

Tom
A proud member of the OFA (Old Farts Association).
ID: 2017208
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13918
Credit: 208,696,464
RAC: 304
Australia
Message 2017321 - Posted: 31 Oct 2019, 5:43:59 UTC

Looks like the splitters have finally recovered from the weekly outage.
Grant
Darwin NT
ID: 2017321
 
©2025 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.