The Server Issues / Outages Thread - Panic Mode On! (119)

Ian&Steve C.
Joined: 28 Sep 99
Posts: 4267
Credit: 1,282,604,591
RAC: 6,640
United States
Message 2037304 - Posted: 11 Mar 2020, 15:39:13 UTC - in response to Message 2037302.  
Last modified: 11 Mar 2020, 15:39:56 UTC

The replica has blasted through the 3 hour mark.

I am going to PANIC shortly, with or without permission.


3 hrs is 10,800 seconds. The replica is only 8,125 seconds behind, or ~2.25 hrs.
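
As a quick check of that arithmetic, here is a minimal Python sketch. The function name `seconds_to_hours` is made up for illustration; the only figures taken from the thread are the 8,125-second lag and the 3-hour mark.

```python
def seconds_to_hours(seconds: float) -> float:
    """Convert a replica lag in seconds to hours."""
    return seconds / 3600

THREE_HOUR_MARK = 3 * 3600   # 10,800 seconds
lag_seconds = 8125           # lag figure quoted in the post

print(THREE_HOUR_MARK)                           # 10800
print(round(seconds_to_hours(lag_seconds), 2))   # 2.26
print(lag_seconds > THREE_HOUR_MARK)             # False: not past 3 hours yet
```

Rounded to two places the lag comes out at 2.26 hours, which agrees with the "~2.25 hrs" above.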
Seti@Home classic workunits: 29,492 CPU time: 134,419 hours

ID: 2037304
W-K 666 Project Donor
Volunteer tester

Joined: 18 May 99
Posts: 19072
Credit: 40,757,560
RAC: 67
United Kingdom
Message 2037311 - Posted: 11 Mar 2020, 16:46:57 UTC - in response to Message 2037304.  

The replica has blasted through the 3 hour mark.

I am going to PANIC shortly, with or without permission.


3 hrs is 10,800 seconds. The replica is only 8,125 seconds behind, or ~2.25 hrs.

Typo, and didn't check it after.
Blame it on all the hand washing. I can't do a thing with my fingers afterwards.
ID: 2037311
Profile Jord
Volunteer tester
Joined: 9 Jun 99
Posts: 15184
Credit: 4,362,181
RAC: 3
Netherlands
Message 2037313 - Posted: 11 Mar 2020, 17:06:23 UTC - in response to Message 2037311.  
Last modified: 11 Mar 2020, 17:40:22 UTC

Blame it on all the hand washing. I can't do a thing with my fingers afterwards.
Don't make jokes about that. When I posted personal opinions and jokes about it on the BOINC forums, someone had to complain about me and wanted me gone as a moderator, because of the irreparable damage I personally did to the reputation of BOINC and its projects. So this is no joking matter. You could be expelled. I as well. But I don't care. :)

Besides, it's now a pandemic, so you best pick up on the hand washing. Perhaps wash the keyboard as well. Or just only type when in the shower!
ID: 2037313
Ville Saari
Joined: 30 Nov 00
Posts: 1158
Credit: 49,177,052
RAC: 82,530
Finland
Message 2037325 - Posted: 11 Mar 2020, 18:55:04 UTC - in response to Message 2037294.  

And now the replica is 7,051 seconds behind, and validation and assimilation are not catching up either.
Validation has nothing to catch up. My tasks validate within about a minute from my client reporting them if my result fills the quorum. The problem is entirely in the assimilation.
ID: 2037325
TBar
Volunteer tester

Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 2037332 - Posted: 11 Mar 2020, 19:33:11 UTC

I suggested a Search & Destroy mission days ago on all WUs with a "minimum quorum 1". Since then I have been unable to see the end of my Valid lists because the Server refuses to load the pages. I have no idea what the Valid lists look like today; previously they had hundreds of WUs marked as Validated with a minimum quorum 1 and an outstanding Wingman denying them Assimilation. Even if there aren't that many in the system, they could be causing 'other' problems holding up Assimilation. It was the only obvious problem with Assimilation I could find. IMO, definitely worth the trouble of Vaporizing them. Otherwise, they will meet their deadline on Mar 22-23, some time away...
ID: 2037332
AllgoodGuy

Joined: 29 May 01
Posts: 293
Credit: 16,348,499
RAC: 266
United States
Message 2037336 - Posted: 11 Mar 2020, 20:01:31 UTC - in response to Message 2037332.  
Last modified: 11 Mar 2020, 20:03:13 UTC

I suggested a Search & Destroy mission days ago on all WUs with a "minimum quorum 1". Since then I have been unable to see the end of my Valid lists because the Server refuses to load the pages. I have no idea what the Valid lists look like today; previously they had hundreds of WUs marked as Validated with a minimum quorum 1 and an outstanding Wingman denying them Assimilation. Even if there aren't that many in the system, they could be causing 'other' problems holding up Assimilation. It was the only obvious problem with Assimilation I could find. IMO, definitely worth the trouble of Vaporizing them. Otherwise, they will meet their deadline on Mar 22-23, some time away...


You did. I read it clearly.
Edit: I think we will be waiting 11 or 12 days for the resolution.
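
The 11-or-12-day figure follows from the dates in the thread: this post is dated 11 Mar 2020 and the deadlines mentioned are Mar 22-23. A minimal sketch of that date arithmetic, with illustrative variable names:

```python
from datetime import date

posted = date(2020, 3, 11)                           # date of this post
deadlines = [date(2020, 3, 22), date(2020, 3, 23)]   # deadlines quoted above

# Whole days between the post and each deadline.
days_left = [(deadline - posted).days for deadline in deadlines]
print(days_left)  # [11, 12]
```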
ID: 2037336
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 2037339 - Posted: 11 Mar 2020, 20:10:15 UTC - in response to Message 2037336.  

I actually can pull up tasks right now. First time in a couple of days. This task looks interesting. Don't know what happened or why a 4th replication was generated when quorum was reached.
https://setiathome.berkeley.edu/workunit.php?wuid=3916150112
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 2037339
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14653
Credit: 200,643,578
RAC: 874
United Kingdom
Message 2037342 - Posted: 11 Mar 2020, 20:25:40 UTC - in response to Message 2037339.  

I actually can pull up tasks right now. First time in a couple of days. This task looks interesting. Don't know what happened or why a 4th replication was generated when quorum was reached.
https://setiathome.berkeley.edu/workunit.php?wuid=3916150112
Eric wrote this on 8 January, and evidently it's still active:

Hopefully final validation mod to reduce bad results from failing GPUs
If 1 of 2 is overflow, quorum is increased to 3
If 1 of 3 is overflow, results are validated.
If 2 of 2 are overflow, quorum is increased to 3.
If 2 of 3 are overflow, quorum is increased to 4
If 3 of 3 are overflow, results are validated.
4 results are always validated.
Task 1 was overflow, quorum increased to 3, task 3 sent out.
Task 3 came back next - was overflow. 2 of 3 overflow, quorum increased to 4, task 4 sent out.
Task 2 came back, 3 of 3 are overflow, validated.
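
The rule list quoted above can be read as a small lookup table. The sketch below is only an illustration of that list, not the project's actual validator code: the names `Decision`, `RULES` and `decide` are invented here, and the post does not say what happens in cases it does not list (for example, no overflows at all), so those raise an error rather than guess.

```python
from typing import NamedTuple, Optional

class Decision(NamedTuple):
    validate: bool             # True if the results are validated now
    new_quorum: Optional[int]  # quorum to raise to, or None if unchanged

# Keys are (overflow count, M) exactly as written in "N of M" in the quoted list.
RULES = {
    (1, 2): Decision(False, 3),
    (1, 3): Decision(True, None),
    (2, 2): Decision(False, 3),
    (2, 3): Decision(False, 4),
    (3, 3): Decision(True, None),
}

def decide(overflows: int, total: int) -> Decision:
    """Return the action the quoted rules give for 'overflows of total'."""
    if total >= 4:
        # "4 results are always validated."
        return Decision(True, None)
    if (overflows, total) in RULES:
        return RULES[(overflows, total)]
    # Not covered by the quoted list; deliberately not guessed here.
    raise ValueError(f"no quoted rule for {overflows} of {total}")

# The three steps of the walkthrough above: 1 of 2, then 2 of 3, then 3 of 3.
for overflows, total in [(1, 2), (2, 3), (3, 3)]:
    print(overflows, "of", total, "->", decide(overflows, total))
```

Run against the walkthrough, the first two steps raise the quorum to 3 and then 4, and the third validates, which matches the sequence described for this workunit.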
ID: 2037342
TBar
Volunteer tester

Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 2037343 - Posted: 11 Mar 2020, 20:26:20 UTC - in response to Message 2037339.  

See if you can pull up the page containing the Jan 30-31 Valid results on a high end machine, see what WUs are there.

If Murphy's Law has anything to say about it, come Mar 22-23 many Resends will be issued on WUs that have been marked as Valid for 7 weeks. In true Murphy form, there will be just enough Resends to push the Database out of Server Ram causing Workflow to fall to a crawl. The final Week of SETI@Home will be spent with very few people receiving Work, and a board full of complaints. Oh well...
ID: 2037343
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 2037345 - Posted: 11 Mar 2020, 20:36:16 UTC - in response to Message 2037343.  
Last modified: 11 Mar 2020, 20:38:56 UTC

See if you can pull up the page containing the Jan 30-31 Valid results on a high end machine, see what WUs are there.

If Murphy's Law has anything to say about it, come Mar 22-23 many Resends will be issued on WUs that have been marked as Valid for 7 weeks. In true Murphy form, there will be just enough Resends to push the Database out of Server Ram causing Workflow to fall to a crawl. The final Week of SETI@Home will be spent with very few people receiving Work, and a board full of complaints. Oh well...

Yep, I can. All your minimum quorum=1 tasks are visible for the already validated task list. Still have to wait till March 23 for the wingman to report.

None of the tasks are overflows. Most are cpu tasks.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 2037345
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 2037346 - Posted: 11 Mar 2020, 20:37:39 UTC - in response to Message 2037342.  

Thanks for the explanation Richard. Looks like it is working as Eric configured.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 2037346
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14653
Credit: 200,643,578
RAC: 874
United Kingdom
Message 2037349 - Posted: 11 Mar 2020, 20:43:25 UTC - in response to Message 2037346.  
Last modified: 11 Mar 2020, 20:46:17 UTC

Thanks for the explanation Richard. Looks like it is working as Eric configured.
Looks like the culprit was Kittyman, returning Task 3 too quickly. Eric didn't think of that one!

Or Ville Saari for returning Task 2 too late. With W3Perl on drums, looks like you had quite a party there.
ID: 2037349
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 2037351 - Posted: 11 Mar 2020, 20:56:17 UTC - in response to Message 2037349.  

Thanks for the explanation Richard. Looks like it is working as Eric configured.
Looks like the culprit was Kittyman, returning Task 3 too quickly. Eric didn't think of that one!

Or Ville Saari for returning Task 2 too late. With W3Perl on drums, looks like you had quite a party there.

Ha ha ha LOL. Yes, the GPUUG Pirate party is still in full swing getting the Seti Toaster for kittyman.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 2037351
juan BFP Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 2037353 - Posted: 11 Mar 2020, 21:06:00 UTC - in response to Message 2037351.  
Last modified: 11 Mar 2020, 21:08:15 UTC

Thanks for the explanation Richard. Looks like it is working as Eric configured.
Looks like the culprit was Kittyman, returning Task 3 too quickly. Eric didn't think of that one!

Or Ville Saari for returning Task 2 too late. With W3Perl on drums, looks like you had quite a party there.

Ha ha ha LOL. Yes, the GPUUG Pirate party is still in full swing getting the Seti Toaster for kittyman.

The Kitty Toaster party is scheduled for tomorrow, all friends are invited to participate!
ID: 2037353
Stephen "Heretic" Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 2037360 - Posted: 11 Mar 2020, 21:39:45 UTC - in response to Message 2037302.  

The replica has blasted through the 3 hour mark.

I am going to PANIC shortly, with or without permission.


. . Stay calm and carry on!

Stephen

:)
ID: 2037360
W-K 666 Project Donor
Volunteer tester

Joined: 18 May 99
Posts: 19072
Credit: 40,757,560
RAC: 67
United Kingdom
Message 2037376 - Posted: 11 Mar 2020, 22:33:04 UTC - in response to Message 2037360.  

The replica has blasted through the 3 hour mark.

I am going to PANIC shortly, with or without permission.


. . Stay calm and carry on!

Stephen

:)

I went out instead, but now I am back and the replica is 4¼ hours behind.
I would PANIC but it's late-ish and the neighbours might complain.
So I'll postpone the PANIC and make a decision after breakfast.
ID: 2037376
Profile Tom M
Volunteer tester

Joined: 28 Nov 02
Posts: 5124
Credit: 276,046,078
RAC: 462
Message 2037384 - Posted: 11 Mar 2020, 22:59:23 UTC - in response to Message 2037376.  

The replica has blasted through the 3 hour mark.

I am going to PANIC shortly, with or without permission.


. . Stay calm and carry on!

Stephen

:)

I went out instead, but now I am back and the replica is 4¼ hours behind.
I would PANIC but it's late-ish and the neighbours might complain.
So I'll postpone the PANIC and make a decision after breakfast.


Panic should be well considered and pondered (just before you freak out :)

Tom
A proud member of the OFA (Old Farts Association).
ID: 2037384
Speedy
Volunteer tester
Joined: 26 Jun 04
Posts: 1643
Credit: 12,921,799
RAC: 89
New Zealand
Message 2037399 - Posted: 12 Mar 2020, 0:00:49 UTC

I know we are in a vicious circle. To give the assimilation queue a chance to catch up, I wonder whether it would be worth turning the replica database off, then letting it catch up once assimilation has caught up. There were a reasonable number of resends around last night because I had over 32
ID: 2037399
Speedy
Volunteer tester
Joined: 26 Jun 04
Posts: 1643
Credit: 12,921,799
RAC: 89
New Zealand
Message 2037400 - Posted: 12 Mar 2020, 0:02:45 UTC - in response to Message 2037397.  

Replica now 4.44 hours behind.
The whole DB part of the system is pretty much F'ed up, and will never recover before March 31st

I think the project will be around for at least another 2 to 4 months after the 31st to clean up resends et cetera.
ID: 2037400
Profile Wiggo
Joined: 24 Jan 00
Posts: 34841
Credit: 261,360,520
RAC: 489
Australia
Message 2037458 - Posted: 12 Mar 2020, 3:45:24 UTC

20 days to go until Zulu. ;-)

Cheers.
ID: 2037458


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.