The Server Issues / Outages Thread - Panic Mode On! (119)

Dave Stegner
Volunteer tester
Joined: 20 Oct 04
Posts: 540
Credit: 65,583,328
RAC: 27
United States
Message 2041808 - Posted: 31 Mar 2020, 0:01:32 UTC

I've set NNT

I want to go out gracefully.
Dave

ID: 2041808
Stephen "Heretic" Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 2041817 - Posted: 31 Mar 2020, 0:27:13 UTC - in response to Message 2041626.  

Assimilators have made some progress. Not enough to make any meaningful difference, but the queue is now below 7 million WUs.

A few hosts will continue to work after March 31, but in the end even they will deplete their last WUs.
And then S@h will end in a cold, dark eternity, as expected.


. . Entropy Rules!

Stephen

:(
ID: 2041817
Profile Freewill Project Donor
Joined: 19 May 99
Posts: 766
Credit: 354,398,348
RAC: 11,693
United States
Message 2041819 - Posted: 31 Mar 2020, 0:28:24 UTC - in response to Message 2041817.  

The literal heat death of S@H!
ID: 2041819
Stephen "Heretic" Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 2041820 - Posted: 31 Mar 2020, 0:28:49 UTC - in response to Message 2041634.  

My slower computer is getting a nice amount of stuff every now and then; the faster cruncher gets almost nothing. I'm starting to wonder if I should use the slower computer to 'farm' tasks and then transfer them over to the faster computer...


. . Good luck with that.

Stephen

:)
ID: 2041820
Stephen "Heretic" Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 2041824 - Posted: 31 Mar 2020, 0:33:26 UTC - in response to Message 2041720.  

Seems like something else broke. Look at this WU I just received: https://setiathome.berkeley.edu/workunit.php?wuid=3829209762

Initial replication of 8?

There are a lot of them. What should we do? Suggestions?


. . I'm thinking we are now seeing the true underlying cause of the problems. When they did the OS upgrade/rollback something broke. As I understand it there should be no more than 4 replications (after the 5th failed validation the WU should be dumped, right?), yet I am seeing dozens of resends with _6/7/8/9 tails. Could that be the reason for the bloat choking the servers???

Stephen

? ?
ID: 2041824
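
For anyone who wants to check their own task list for the _6/7/8/9 tails mentioned above, here is a minimal sketch. It assumes the replica number is whatever follows the last underscore in the task name. The example names are made up by appending tails to the file name from a later log in this thread, and the cutoff passed in is only a guess based on the post above, not a confirmed server limit.

```python
def replica_index(task_name):
    """Return the integer after the last underscore, or None if there is none."""
    _base, sep, tail = task_name.rpartition("_")
    if not sep or not tail.isdigit():
        return None
    return int(tail)


def suspicious_resends(task_names, max_expected_index):
    """List the task names whose replica index exceeds the caller's cutoff."""
    flagged = []
    for name in task_names:
        index = replica_index(name)
        if index is not None and index > max_expected_index:
            flagged.append(name)
    return flagged


# Hypothetical task names, constructed for illustration only.
example_names = [
    "28ja20ae.23328.8247.6.33.80_1",
    "28ja20ae.23328.8247.6.33.80_6",
    "28ja20ae.23328.8247.6.33.80_9",
]

# The cutoff of 5 is an assumption; pick whatever limit you believe applies.
for name in suspicious_resends(example_names, max_expected_index=5):
    print("possible excess resend:", name)
```

The cutoff is left as a parameter on purpose, since the thread itself is unsure what the real replication limit is.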
Stephen "Heretic" Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 2041830 - Posted: 31 Mar 2020, 0:42:37 UTC - in response to Message 2041731.  

Pity they didn't think of shortening the deadline at the same time.


. . That would have been a more productive move.

Stephen

:(
ID: 2041830
Stephen "Heretic" Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 2041831 - Posted: 31 Mar 2020, 0:45:13 UTC - in response to Message 2041735.  

Highly unlikely; a lot of them were sent on 30 Mar 2020, 18:15:17 UTC (today) with a deadline of 20 Jun 2020, 0:09:02 UTC


. . I did not notice that, I presumed the over the top replication numbers were from earlier. Yes 5 replications on the 30th, maybe I am wasting my time prioritising these tasks ...

Stephen
ID: 2041831
Stephen "Heretic" Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 2041833 - Posted: 31 Mar 2020, 0:47:37 UTC - in response to Message 2041740.  

I think they're trying to make the database go nova - go out with a bang.

:)

. . Do you think I will see the sparks from over here in Sydney???

. . I'll be watching :)

Stephen

:)
ID: 2041833
Sirius B Project Donor
Volunteer tester
Joined: 26 Dec 00
Posts: 24877
Credit: 3,081,182
RAC: 7
Ireland
Message 2041836 - Posted: 31 Mar 2020, 0:50:22 UTC

Winding down starting?
31/03/2020 01:39:32 | SETI@home | Started download of 28ja20ae.23328.8247.6.33.80
31/03/2020 01:39:43 | SETI@home | Temporarily failed download of 28ja20ae.23328.8247.6.33.80: transient HTTP error
31/03/2020 01:39:43 | SETI@home | Backing off 00:15:24 on download of 28ja20ae.23328.8247.6.33.80
31/03/2020 01:39:44 | | Project communication failed: attempting access to reference site
31/03/2020 01:39:46 | | Internet access OK - project servers may be temporarily down.
ID: 2041836
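
If you want to pull the back-off delay out of event log lines like the ones pasted above, something along these lines might do. This is only a sketch: the pattern is inferred from the lines in this post alone, and the function name `parse_backoff` is made up for the example, so other client versions or locales may need a different pattern.

```python
import re
from datetime import timedelta

# Pattern inferred only from the log lines quoted in this thread;
# it is an assumption, not a documented log format.
BACKOFF_RE = re.compile(
    r"Backing off (\d+):(\d{2}):(\d{2}) on download of (\S+)"
)


def parse_backoff(line):
    """Return (file_name, delay) for a back-off line, or None for any other line."""
    match = BACKOFF_RE.search(line)
    if match is None:
        return None
    hours, minutes, seconds, file_name = match.groups()
    delay = timedelta(
        hours=int(hours), minutes=int(minutes), seconds=int(seconds)
    )
    return file_name, delay


log_line = (
    "31/03/2020 01:39:43 | SETI@home | Backing off 00:15:24 "
    "on download of 28ja20ae.23328.8247.6.33.80"
)

result = parse_backoff(log_line)
if result is not None:
    name, delay = result
    print(f"{name}: next retry in {int(delay.total_seconds())} seconds")
```

For the line above, 00:15:24 works out to 924 seconds before the client tries the download again.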
Eric Korpela Project Donor
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 3 Apr 99
Posts: 1382
Credit: 54,506,847
RAC: 60
United States
Message 2041837 - Posted: 31 Mar 2020, 0:50:38 UTC

I think I found the issue in a script that was supposed to trigger a resend on results unlikely to be returned. I turned the script off, so it should stop happening.

You didn't think this would go smoothly, did you?
@SETIEric@qoto.org (Mastodon)

ID: 2041837
Stephen "Heretic" Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 2041839 - Posted: 31 Mar 2020, 0:51:41 UTC - in response to Message 2041759.  

Still not understanding why I am only getting 2 AMD tasks, never any more than that. Was there some limitation put on the delivery for high end GPUs?

Edit: to clarify. I get 2 AMD tasks, only two. I don't get any more until I deliver one or both, and it never exceeds two.
Your AMD app has produced a lot of errors lately and is being throttled until it has returned enough valid tasks for the server to trust it again.

I doubt that. The errors are all abortions, and are all along the SoG plan_class. This is happening across all plan_classes.


. . D'OH!

. . Excessive numbers of abortions will cause the schedulers to blacklist that host and throttle it as you are being throttled.

. . Self induced problem dude!

Stephen

:(
ID: 2041839
kittyman Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 9 Jul 00
Posts: 51468
Credit: 1,018,363,574
RAC: 1,004
United States
Message 2041840 - Posted: 31 Mar 2020, 0:53:52 UTC - in response to Message 2041837.  

Thank you for looking into it and responding so quickly, Eric.

Meow!
"Freedom is just Chaos, with better lighting." Alan Dean Foster

ID: 2041840
Stephen "Heretic" Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 2041842 - Posted: 31 Mar 2020, 0:57:14 UTC - in response to Message 2041773.  

There doesn't seem to be any useful purpose to run this task. My host is the last one, and it hasn't run the task yet....Why should it?
https://setiathome.berkeley.edu/workunit.php?wuid=3873976498
????
Also..... I'm afraid Any Host on that list is going to get hit with an Error, whether you run the task or Not.
It seems my main host has been awarded a 'few' errors and has had its 'Consecutive valid tasks' number lowered.


. . The interesting thing is that the "Error can't validate" is from the initial issue not these resends so that problem has been around for quite a while, in Bernie's case back to the 8th January. Does that date ring any bells ???

Stephen

??
ID: 2041842
Dave Stegner
Volunteer tester
Joined: 20 Oct 04
Posts: 540
Credit: 65,583,328
RAC: 27
United States
Message 2041844 - Posted: 31 Mar 2020, 0:58:28 UTC - in response to Message 2041840.  

Thank you for looking into it and responding so quickly, Eric.

Meow!



+1
Dave

ID: 2041844
juan BFP Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 2041848 - Posted: 31 Mar 2020, 1:04:24 UTC - in response to Message 2041837.  
Last modified: 31 Mar 2020, 1:04:51 UTC

I think I found the issue in a script that was supposed to trigger a resend on results unlikely to be returned. I turned the script off, so it should stop happening.

You didn't think this would go smoothly, did you?

Thanks for your fast reply.

I have one question. Since our controversial clients are programmed to crunch the resends first (as fast as possible), what could happen if some of our fastest hosts start doing that and return the _6, _7, etc. well before the normal hosts send their _0 or _1 tasks?

Do you recommend we abort these WUs or just let them crunch this way?
ID: 2041848
Stephen "Heretic" Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 2041852 - Posted: 31 Mar 2020, 1:06:34 UTC - in response to Message 2041837.  

I think I found the issue in a script that was supposed to trigger a resend on results unlikely to be returned. I turned the script off, so it should stop happening.

You didn't think this would go smoothly, did you?


. . LOL! Well some of us were optimistic enough to hope ...

. . That script sounds like a really good idea struck down by Murphy's Law (or human error).

. . Thanks for the fix. And prompt too :)

Stephen

:)
ID: 2041852
AllgoodGuy

Joined: 29 May 01
Posts: 293
Credit: 16,348,499
RAC: 266
United States
Message 2041857 - Posted: 31 Mar 2020, 1:35:19 UTC - in response to Message 2041839.  

Still not understanding why I am only getting 2 AMD tasks, never any more than that. Was there some limitation put on the delivery for high end GPUs?

Edit: to clarify. I get 2 AMD tasks, only two. I don't get any more until I deliver one or both, and it never exceeds two.
Your AMD app has produced a lot of errors lately and is being throttled until it has returned enough valid tasks for the server to trust it again.

I doubt that. The errors are all abortions, and are all along the SoG plan_class. This is happening across all plan_classes.


. . D'OH!

. . Excessive numbers of abortions will cause the schedulers to blacklist that host and throttle it as you are being throttled.

. . Self induced problem dude!

Stephen

:(

I wish it was self induced. They turned on a plan_class which was shut down because of these errors, but were actively developing the replacement. Funny part is it has now been running for 3 months, and used to cause many, many more problems before. Throttling now makes zero sense, and keeping a faulty plan_class active makes even less.
ID: 2041857
Ian&Steve C.
Joined: 28 Sep 99
Posts: 4267
Credit: 1,282,604,591
RAC: 6,640
United States
Message 2041866 - Posted: 31 Mar 2020, 2:12:03 UTC

holy smokes, I don't know what happened. I left earlier with 0 SETI tasks (not over 100k as some would think, replica is delayed yo). but when I came back one of my systems has 5100 and counting! yay! guess I'll get a couple more days out of it after all! so happy to help the project with this last hurrah :D
Seti@Home classic workunits: 29,492 CPU time: 134,419 hours

ID: 2041866
Profile Gary Charpentier Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 25 Dec 00
Posts: 30608
Credit: 53,134,872
RAC: 32
United States
Message 2041877 - Posted: 31 Mar 2020, 3:45:55 UTC

The end was announced before the lockdown. So will that make a change of plans?

After all we should have a rip roaring party.
ID: 2041877
juan BFP Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 2041878 - Posted: 31 Mar 2020, 4:06:08 UTC - in response to Message 2041866.  
Last modified: 31 Mar 2020, 4:07:33 UTC

holy smokes, I don't know what happened. I left earlier with 0 SETI tasks (not over 100k as some would think, replica is delayed yo). but when I came back one of my systems has 5100 and counting! yay! guess I'll get a couple more days out of it after all! so happy to help the project with this last hurrah :D

Somebody opened the gates of new work generation, and we have a massive amount of new work available to DL. Your cache is refilling at about 2400 WU per hour. If the feeding frenzy continues, it could reach your 20K level in about 8 hrs. Hope they don't shut down new work production before that. It was unclear at exactly what hour the shutdown will happen. AFAIK they only gave the date, March 31. Let's see what the day brings us. Happy hunting for new WUs.
ID: 2041878
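
The refill estimate above is easy to redo with your own numbers. A rough sketch, assuming the download rate stays constant (it almost certainly won't) and using only the figures quoted in the two posts above; the function name `hours_to_fill` is made up for the example:

```python
def hours_to_fill(current_tasks, target_tasks, tasks_per_hour):
    """Hours needed to reach target_tasks at a constant download rate."""
    if tasks_per_hour <= 0:
        raise ValueError("tasks_per_hour must be positive")
    remaining = max(target_tasks - current_tasks, 0)
    return remaining / tasks_per_hour


# Figures taken from the posts above: 20K target, about 2400 WU per hour.
print(round(hours_to_fill(0, 20_000, 2_400), 1))      # starting from an empty cache
print(round(hours_to_fill(5_100, 20_000, 2_400), 1))  # starting from the 5100 already reported
```

Starting from empty this gives roughly 8.3 hours, which matches the "about 8 hrs" figure; counting the 5100 tasks already on the host, it is closer to 6.2 hours.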


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.