The Server Issues / Outages Thread - Panic Mode On! (119)
Author | Message |
---|---|
Dave Stegner Joined: 20 Oct 04 Posts: 540 Credit: 65,583,328 RAC: 27 |
I've set NNT. I want to go out gracefully. Dave |
Stephen "Heretic" Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
Assimilators have made some progress. Not enough to make any meaningful difference but the queue is now below 7 million wus. . . Entropy Rules! Stephen :( |
Freewill Joined: 19 May 99 Posts: 766 Credit: 354,398,348 RAC: 11,693 |
|
Stephen "Heretic" Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
My slower computer is getting a nice amount of stuff every now and then. The faster cruncher gets almost nothing. I'm starting to wonder if I should use the slower computer to 'farm' tasks and then transfer them over to the faster computer... . . Good luck with that. Stephen :) |
Stephen "Heretic" Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
Seems like something else broke. Look at this WU I just received: https://setiathome.berkeley.edu/workunit.php?wuid=3829209762 . . I'm thinking we are now seeing the true underlying cause of the problems. When they did the OS upgrade/rollback something broke. As I understand it there should be no more than 4 replications (after the 5th failed validation the WU should be dumped, right?), yet I am seeing dozens of resends with _6/7/8/9 tails. Could that be the reason for the bloat choking the servers ? ? ? Stephen ? ? |
Stephen "Heretic" Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
Pity they didn't think of shortening the deadline at the same time. . . That would have been a more productive move. Stephen :( |
Stephen "Heretic" Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
Highly unlikely. A lot of them were sent on 30 Mar 2020, 18:15:17 UTC (today) with a deadline of 20 Jun 2020, 0:09:02 UTC . . I did not notice that, I presumed the over the top replication numbers were from earlier. Yes 5 replications on the 30th, maybe I am wasting my time prioritising these tasks ... Stephen |
Stephen "Heretic" Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
I think they're trying to make the database go nova - go out with a bang. :) . . Do you think I will see the sparks from over here in Sydney??? . . I'll be watching :) Stephen :) |
Sirius B Joined: 26 Dec 00 Posts: 24920 Credit: 3,081,182 RAC: 7 |
Winding down starting? 31/03/2020 01:39:32 | SETI@home | Started download of 28ja20ae.23328.8247.6.33.80 31/03/2020 01:39:43 | SETI@home | Temporarily failed download of 28ja20ae.23328.8247.6.33.80: transient HTTP error 31/03/2020 01:39:43 | SETI@home | Backing off 00:15:24 on download of 28ja20ae.23328.8247.6.33.80 31/03/2020 01:39:44 | | Project communication failed: attempting access to reference site 31/03/2020 01:39:46 | | Internet access OK - project servers may be temporarily down. |
Eric Korpela Joined: 3 Apr 99 Posts: 1382 Credit: 54,506,847 RAC: 60 |
I think I found the issue in a script that was supposed to trigger a resend on results unlikely to be returned. I turned the script off, so it should stop happening. You didn't think this would go smoothly, did you? @SETIEric@qoto.org (Mastodon) |
Stephen "Heretic" Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
Still not understanding why I am only getting 2 AMD tasks, never any more than that. Was there some limitation put on the delivery for high end GPUs? Your AMD app has produced a lot of errors lately and is being throttled until it has returned enough valid tasks for the server to trust it again. . . D'OH! . . Excessive numbers of abortions will cause the schedulers to blacklist that host and throttle it as you are being throttled. . . Self induced problem dude! Stephen :( |
kittyman Joined: 9 Jul 00 Posts: 51484 Credit: 1,018,363,574 RAC: 1,004 |
Thank you for looking into it and responding so quickly, Eric. Meow! "Time is simply the mechanism that keeps everything from happening all at once." |
Stephen "Heretic" Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
There doesn't seem to be any useful purpose to run this task. My host is the last one, and it hasn't run the task yet.... Why should it? . . The interesting thing is that the "Error can't validate" is from the initial issue, not these resends, so that problem has been around for quite a while, in Bernie's case back to the 8th January. Does that date ring any bells ??? Stephen ?? |
Dave Stegner Joined: 20 Oct 04 Posts: 540 Credit: 65,583,328 RAC: 27 |
Thank you for looking into it and responding so quickly, Eric. +1 Dave |
juan BFP Joined: 16 Mar 07 Posts: 9786 Credit: 572,710,851 RAC: 3,799 |
I think I found the issue in a script that was supposed to trigger a resend on results unlikely to be returned. I turned the script off, so it should stop happening. Thanks for your fast reply. I have one question: since our controversial clients are programmed to crunch the resends first (as fast as possible), what could happen if some of our fastest hosts start to do that and return the _6, _7, etc. well before the normal hosts send their _0 or _1 tasks? Do you recommend we abort these WUs or just let them crunch this way? |
Stephen "Heretic" Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
I think I found the issue in a script that was supposed to trigger a resend on results unlikely to be returned. I turned the script off, so it should stop happening. . . LOL! Well some of us were optimistic enough to hope ... . . That script sounds like a really good idea struck down by Murphy's Law (or human error). . . Thanks for the fix. And prompt too :) Stephen :) |
AllgoodGuy Joined: 29 May 01 Posts: 293 Credit: 16,348,499 RAC: 266 |
Still not understanding why I am only getting 2 AMD tasks, never any more than that. Was there some limitation put on the delivery for high end GPUs? Your AMD app has produced a lot of errors lately and is being throttled until it has returned enough valid tasks for the server to trust it again. I wish it was self induced. They turned on a plan_class which was shut down because of these errors, but were actively developing the replacement. Funny part is it has now been running for 3 months, and used to cause many, many more problems before. Throttling now makes zero sense, and keeping a faulty plan_class active makes even less. |
Ian&Steve C. Joined: 28 Sep 99 Posts: 4267 Credit: 1,282,604,591 RAC: 6,640 |
holy smokes, I don't know what happened. I left earlier with 0 SETI tasks (not over 100k as some would think, replica is delayed yo). but when I came back one of my systems has 5100 and counting! yay! guess I'll get a couple more days out of it after all! so happy to help the project with this last hurrah :D Seti@Home classic workunits: 29,492 CPU time: 134,419 hours |
Gary Charpentier Joined: 25 Dec 00 Posts: 31029 Credit: 53,134,872 RAC: 32 |
The end was announced before the lockdown. So will that make a change of plans? After all we should have a rip roaring party. |
juan BFP Joined: 16 Mar 07 Posts: 9786 Credit: 572,710,851 RAC: 3,799 |
holy smokes, I don't know what happened. I left earlier with 0 SETI tasks (not over 100k as some would think, replica is delayed yo). but when I came back one of my systems has 5100 and counting! yay! guess I'll get a couple more days out of it after all! so happy to help the project with this last hurrah :D Somebody opened the gates of new work generation and we have a massive amount of new work available to DL. Your cache is refilling at about 2400 WU per hour. If the feeding frenzy continues it could reach your 20K level in about 8 hrs. Hope they don't shut down new work production before that. It was unclear at exactly what hour the shutdown will happen. AFAIK they just gave the date, March 31. Let's see what the day brings us. Happy hunting for new WUs. |