The Server Issues / Outages Thread - Panic Mode On! (119)

AllgoodGuy

Joined: 29 May 01
Posts: 293
Credit: 16,348,499
RAC: 266
United States
Message 2036340 - Posted: 6 Mar 2020, 21:07:28 UTC - in response to Message 2036337.  

And I did send word to Eric about the servers being tied in a knot.
Whether there is much he can do about it is an open question at this point.
Make use of the Resend Deadline feature: set the deadline for resends to 3 days. Set the deadline for any new work (AP included) to 2 weeks.
The short deadline on resends will clear out the ever-increasing massive backlog (although I'm guessing it will take a week or so to have a significant impact). The 2-week deadline on all initial-release work will stop the backlog from recurring in the short time the project is still going to be issuing new work.


At this point, reduce it to 10 days. We are at the finish line.


We could reduce it to 5 in another week.
ID: 2036340
AllgoodGuy

Joined: 29 May 01
Posts: 293
Credit: 16,348,499
RAC: 266
United States
Message 2036342 - Posted: 6 Mar 2020, 21:15:58 UTC - in response to Message 2036340.  

It's time to crunch or shut down.
ID: 2036342
Ville Saari
Joined: 30 Nov 00
Posts: 1158
Credit: 49,177,052
RAC: 82,530
Finland
Message 2036344 - Posted: 6 Mar 2020, 21:36:50 UTC

Changing the deadlines has no effect whatsoever on the assimilation blockage.
ID: 2036344
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 14013
Credit: 208,696,464
RAC: 304
Australia
Message 2036359 - Posted: 6 Mar 2020, 22:19:32 UTC

Managed to pick up a couple of WUs, but downloads are presently not happening without a lot of "Retry pending transfers" action.
Grant
Darwin NT
ID: 2036359
kittyman (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 9 Jul 00
Posts: 51583
Credit: 1,018,363,574
RAC: 1,004
United States
Message 2036361 - Posted: 6 Mar 2020, 22:24:56 UTC
Last modified: 6 Mar 2020, 22:26:18 UTC

Current result creation rate * 403.7581/sec
"Time is simply the mechanism that keeps everything from happening all at once."

ID: 2036361
Keith Myers (Special Project $250 donor)
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 2036363 - Posted: 6 Mar 2020, 22:34:11 UTC

I was able to persuade a task to download one at a time. I think the download server is unable to support all the HTTP requests. It might be smart to reduce the number of connections back to the default of 2.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 2036363
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 14013
Credit: 208,696,464
RAC: 304
Australia
Message 2036365 - Posted: 6 Mar 2020, 22:43:41 UTC - in response to Message 2036361.  

Current result creation rate * 403.7581/sec
Now the Scheduler needs to start dishing them out.
Most requests result in "Project has no tasks available". And as we've found over the last couple of months, after-outage recoveries are rather protracted these days.
Grant
Darwin NT
ID: 2036365
TimeLord04
Volunteer tester
Joined: 9 Mar 06
Posts: 21140
Credit: 33,933,039
RAC: 23
United States
Message 2036375 - Posted: 7 Mar 2020, 0:24:06 UTC

Don't be angry! Somehow, I just got a FULL queue of 150 GPU Tasks PER Card, x2.

Looking forward to Crunching tonight! 😀

Haven't had work since Tuesday's Outrage! 😱😱😱
Was beginning to think I wouldn't have work
all the rest of the month!

Got lucky on two Request passes on the Server
and filled the queue to capacity just a few minutes
ago.

Crunching will begin, again, on Hackintosh-Andromeda
in just about an hour and a half, at 6 PM - Pacific.
With the CUDA90 App, (v0.97), from TBar, I think this
queue will last a few hours. Then, it's anybody's guess
IF the Servers will keep me working throughout the rest
of the night.

Good Luck to everyone.


TL
TimeLord04
Have TARDIS, will travel...
Come along K-9!
Join Calm Chaos
ID: 2036375
Keith Myers (Special Project $250 donor)
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 2036384 - Posted: 7 Mar 2020, 0:49:48 UTC

The splitters were briefly turned on and the RTS buffer filled to normal levels. But that was sucked dry in about 30 minutes from all the empty hosts. Now back to no work but hard to verify since the replica is so far behind now.

I think that Eric's script to clear out the validator queue a bit helped for a while.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 2036384
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 14013
Credit: 208,696,464
RAC: 304
Australia
Message 2036390 - Posted: 7 Mar 2020, 1:11:01 UTC - in response to Message 2036384.  

I think that Eric's script to clear out the validator queue a bit helped for a while.
Looking at the graphs, it sorted out a few hundred thousand results.
The huge backlog waiting on results to be returned remains; it just had a bit shaved off the top.
Grant
Darwin NT
ID: 2036390
AllgoodGuy

Joined: 29 May 01
Posts: 293
Credit: 16,348,499
RAC: 266
United States
Message 2036394 - Posted: 7 Mar 2020, 1:59:37 UTC - in response to Message 2036384.  
Last modified: 7 Mar 2020, 2:10:22 UTC

The splitters were briefly turned on and the RTS buffer filled to normal levels. But that was sucked dry in about 30 minutes from all the empty hosts. Now back to no work but hard to verify since the replica is so far behind now.

I think that Eric's script to clear out the validator queue a bit helped for a while.



06-Mar-2020 14:19:39 [SETI@home] Scheduler request completed: got 0 new tasks
06-Mar-2020 14:24:47 [SETI@home] Scheduler request completed: got 65 new tasks
06-Mar-2020 14:29:52 [SETI@home] Scheduler request completed: got 0 new tasks
06-Mar-2020 14:34:59 [SETI@home] Scheduler request completed: got 0 new tasks
06-Mar-2020 14:40:07 [SETI@home] Scheduler request completed: got 51 new tasks
06-Mar-2020 14:45:16 [SETI@home] Scheduler request completed: got 7 new tasks
06-Mar-2020 14:50:23 [SETI@home] Scheduler request completed: got 26 new tasks
06-Mar-2020 15:07:56 [SETI@home] Scheduler request completed: got 0 new tasks
06-Mar-2020 15:33:30 [SETI@home] Scheduler request completed: got 15 new tasks
06-Mar-2020 15:38:37 [SETI@home] Scheduler request completed: got 1 new tasks
06-Mar-2020 15:43:46 [SETI@home] Scheduler request completed: got 5 new tasks
06-Mar-2020 15:48:53 [SETI@home] Scheduler request completed: got 1 new tasks
06-Mar-2020 15:54:05 [SETI@home] Scheduler request completed: got 1 new tasks
06-Mar-2020 15:59:12 [SETI@home] Scheduler request completed: got 1 new tasks
06-Mar-2020 16:04:19 [SETI@home] Scheduler request completed: got 2 new tasks
06-Mar-2020 16:09:26 [SETI@home] Scheduler request completed: got 2 new tasks
06-Mar-2020 16:14:33 [SETI@home] Scheduler request completed: got 2 new tasks
06-Mar-2020 16:19:40 [SETI@home] Scheduler request completed: got 0 new tasks
06-Mar-2020 16:24:49 [SETI@home] Scheduler request completed: got 5 new tasks
06-Mar-2020 16:29:58 [SETI@home] Scheduler request completed: got 0 new tasks
06-Mar-2020 16:35:05 [SETI@home] Scheduler request completed: got 3 new tasks
06-Mar-2020 16:40:13 [SETI@home] Scheduler request completed: got 0 new tasks

It lasted a little while. Nothing since 00:40 UTC. Edit* and the Replica is only 104 minutes behind now. What's an hour and a half between friends?
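
For anyone who would rather tally a log like the one above than eyeball it, here is a minimal Python sketch. It assumes the event-log lines keep the exact "Scheduler request completed: got N new tasks" wording shown above. The `summarize` function and the `SAMPLE` string (the first five lines of that log) are illustrative names only, not part of BOINC.

```python
import re

# Matches event-log lines worded like the ones quoted in the post above.
LINE_RE = re.compile(r"Scheduler request completed: got (\d+) new tasks")


def summarize(log_text):
    """Return (requests, total_tasks, empty_requests) for a block of log text."""
    counts = [int(m.group(1)) for m in LINE_RE.finditer(log_text)]
    empty = sum(1 for c in counts if c == 0)
    return len(counts), sum(counts), empty


# First five lines of the log above, used as sample input.
SAMPLE = """\
06-Mar-2020 14:19:39 [SETI@home] Scheduler request completed: got 0 new tasks
06-Mar-2020 14:24:47 [SETI@home] Scheduler request completed: got 65 new tasks
06-Mar-2020 14:29:52 [SETI@home] Scheduler request completed: got 0 new tasks
06-Mar-2020 14:34:59 [SETI@home] Scheduler request completed: got 0 new tasks
06-Mar-2020 14:40:07 [SETI@home] Scheduler request completed: got 51 new tasks
"""

if __name__ == "__main__":
    requests, total, empty = summarize(SAMPLE)
    print(f"{requests} requests, {total} tasks, {empty} came back empty")
```

Fed the full 22-line log above instead of the sample, the same tally comes to 187 tasks in total, with 7 of the 22 requests returning nothing.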
ID: 2036394
Speedy
Volunteer tester
Joined: 26 Jun 04
Posts: 1649
Credit: 12,921,799
RAC: 89
New Zealand
Message 2036395 - Posted: 7 Mar 2020, 2:05:12 UTC - in response to Message 2036390.  

I think that Eric's script to clear out the validator queue a bit helped for a while.
Looking at the graphs, it sorted out a few hundred thousand results.
The huge backlog waiting on results to be returned remains; it just had a bit shaved off the top.

Hi Grant, are you referring to the results in progress and validation graph?
ID: 2036395
Keith Myers (Special Project $250 donor)
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 2036399 - Posted: 7 Mar 2020, 2:17:09 UTC - in response to Message 2036395.  

He was referring to this one:

Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 2036399
Speedy
Volunteer tester
Joined: 26 Jun 04
Posts: 1649
Credit: 12,921,799
RAC: 89
New Zealand
Message 2036404 - Posted: 7 Mar 2020, 2:21:56 UTC - in response to Message 2036399.  

Thanks, Keith. I thought that was the one he was referring to.
ID: 2036404
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 14013
Credit: 208,696,464
RAC: 304
Australia
Message 2036405 - Posted: 7 Mar 2020, 2:24:10 UTC - in response to Message 2036399.  

He was referring to this one:
Yep.
You can see a sharp drop in the Results returned and awaiting validation numbers. Unfortunately, it was only a drop in the ocean.
Grant
Darwin NT
ID: 2036405
TBar
Volunteer tester

Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 2036409 - Posted: 7 Mar 2020, 2:36:15 UTC

Has anyone considered there may be something about those 14 million results that the system just doesn't care for? If I were looking over some of those results I believe I would balk at those tasks which validated with a Quorum of 1. It would be nice if those tasks could be extracted and examined, even better if they could be temporarily removed and the system run without them to see how the system worked. They certainly don't seem to be in a hurry to leave by themselves.
ID: 2036409
Ville Saari
Joined: 30 Nov 00
Posts: 1158
Credit: 49,177,052
RAC: 82,530
Finland
Message 2036411 - Posted: 7 Mar 2020, 2:44:11 UTC

The assimilation logjam holds tasks in that 15 million result SSP slot for only about three days. So the quorum 1 tasks have been assimilated and deleted long ago.
ID: 2036411
TBar
Volunteer tester

Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 2036413 - Posted: 7 Mar 2020, 2:50:52 UTC - in response to Message 2036411.  

Are you sure they are less than 3 days old? How could you test how old they are? I was just giving an example of a task that could cause problems. Of course, it could be anything causing the problem.
ID: 2036413
Ville Saari
Joined: 30 Nov 00
Posts: 1158
Credit: 49,177,052
RAC: 82,530
Finland
Message 2036416 - Posted: 7 Mar 2020, 3:11:37 UTC
Last modified: 7 Mar 2020, 3:12:37 UTC

The assimilation queue size is about three days' worth of production, and my tasks also disappear from the web site about three days after they have been validated. It seems to work on a FIFO principle.
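
As a rough sanity check on that reasoning, here is a small Python sketch using Little's law (items in a queue = throughput × time spent in the queue). It assumes a steady-state FIFO queue, and both inputs are only the approximate figures mentioned in this thread (about 15 million results, about three days), not measured values. The `implied_rate` helper is a made-up name for illustration.

```python
SECONDS_PER_DAY = 86_400


def implied_rate(queue_size, residence_days):
    """Little's law rearranged: throughput = items in queue / time in queue.

    Returns the throughput in items per second.
    """
    return queue_size / (residence_days * SECONDS_PER_DAY)


if __name__ == "__main__":
    # Rough thread figures (assumptions): ~15 million results, ~3 days in the queue.
    rate = implied_rate(15_000_000, 3)
    print(f"implied throughput: about {rate:.1f} results/sec")
```

With those rough inputs the implied throughput works out to roughly 58 results per second. If the real assimilation rate were very different from that, either the queue would not be FIFO or one of the two estimates would be off.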
ID: 2036416
W-K 666 (Project Donor)
Volunteer tester

Joined: 18 May 99
Posts: 19958
Credit: 40,757,560
RAC: 67
United Kingdom
Message 2036421 - Posted: 7 Mar 2020, 3:26:10 UTC - in response to Message 2036411.  
Last modified: 7 Mar 2020, 3:30:46 UTC

The assimilation logjam holds tasks in that 15 million result SSP slot for only about three days. So the quorum 1 tasks have been assimilated and deleted long ago.

No, they haven't. Go to the very end of the listing for my MB Valid tasks and you will find 10 tasks issued 30 Jan 2020, 1:55:47 UTC through 30 Jan 2020, 11:30:16 UTC, all validated by one result (mine), with the wingman still not reported.

Deadlines are the 22nd or 23rd March.
ID: 2036421

 
©2026 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.