Panic Mode On (114) Server Problems?

Author	Message
juan BFP Volunteer tester Send message Joined: 16 Mar 07 Posts: 9786 Credit: 572,710,851 RAC: 3,799	Message 1979491 - Posted: 9 Feb 2019, 1:03:03 UTC - in response to Message 1979485. Last modified: 9 Feb 2019, 1:05:22 UTC
There aren't any failed download tasks to abort. All you see is the message in the Event Log of giving up on the task.
Fri 08 Feb 2019 04:42:46 PM PST | SETI@home | Giving up on download of 12dc06af.32725.1295.12.39.192.vlar: permanent HTTP error
Fri 08 Feb 2019 04:42:46 PM PST | SETI@home | Giving up on download of 12dc06af.6741.172935.15.42.62.vlar: permanent HTTP error
Fri 08 Feb 2019 04:42:46 PM PST | SETI@home | Giving up on download of 12dc06af.11999.1704.14.41.109: permanent HTTP error
Fri 08 Feb 2019 04:42:46 PM PST | SETI@home | Giving up on download of 02no11ab.8915.11110.7.34.154: permanent HTTP error
But you then get forced into 24 hour backoff. You have to wait out another five minutes to again contact the schedulers and hope you don't get more lost tasks.
But they never clear; they remain in the Tasks list. That's why I suggested aborting them. When you update in this case, the backoff goes back to the normal 5 min.
<edit> Yes, the DL is back to normal now.
ID: 1979491 ·

Keith Myers Volunteer tester Send message Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873	Message 1979493 - Posted: 9 Feb 2019, 1:08:59 UTC - in response to Message 1979491. Again, they are NOT in the list of Tasks. Only the server has a record of them on your machine. You never got them. The only way to get rid of them is to stop BOINC, wait five minutes, contact the scheduler, get the master list successfully downloaded, get benchmarked and finally get rid of them from the database for your host. Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) ID: 1979493 ·

Keith Myers Volunteer tester Send message Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873	Message 1979494 - Posted: 9 Feb 2019, 1:11:05 UTC I have one host that constantly has been getting the short straw and receiving no new work. Just about out. The other hosts are getting a reasonable 1 for 1 return. Still not back to full though. Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) ID: 1979494 ·

Grant (SSSF) Volunteer tester Send message Joined: 19 Aug 99 Posts: 13736 Credit: 208,696,464 RAC: 304	Message 1979495 - Posted: 9 Feb 2019, 1:12:18 UTC - in response to Message 1979491. Last modified: 9 Feb 2019, 1:14:27 UTC
There aren't any failed download tasks to abort. All you see is the message in the Event Log of giving up on the task.
Fri 08 Feb 2019 04:42:46 PM PST | SETI@home | Giving up on download of 12dc06af.32725.1295.12.39.192.vlar: permanent HTTP error
Fri 08 Feb 2019 04:42:46 PM PST | SETI@home | Giving up on download of 12dc06af.6741.172935.15.42.62.vlar: permanent HTTP error
Fri 08 Feb 2019 04:42:46 PM PST | SETI@home | Giving up on download of 12dc06af.11999.1704.14.41.109: permanent HTTP error
Fri 08 Feb 2019 04:42:46 PM PST | SETI@home | Giving up on download of 02no11ab.8915.11110.7.34.154: permanent HTTP error
But you then get forced into 24 hour backoff. You have to wait out another five minutes to again contact the schedulers and hope you don't get more lost tasks.
But they never clear; they remain in the Tasks list. That's why I suggested aborting them.
What's happening now is you request new work, you get new WUs, and many of them are resends. Some are normal resends, others because of the download problems (all those _3, _4, _5s are a good hint). But as soon as they are allocated, they start to download. And in a matter of seconds (now that downloads are no longer moving at a crawl) they error out as a permanent HTTP error. No time to abort them. And there are still plenty of _2, _3 resend WUs that aren't due to download errors, so aborting those would be aborting good work. And aborting work goes against your Error count anyway (although at least you don't get ridiculous backoff times).
Edit- the long backoff only occurs after the Download errors out. If you wait 5min, Update, then they get cleared from your system, now work gets allocated (or not), and you're back to 5min between Scheduler requests.
Till the next batch of un-downloadable WUs, of course. Grant Darwin NT ID: 1979495 ·
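
For anyone who wants to see how many downloads a host has given up on, here is a minimal Python sketch that scans Event Log lines in the format quoted above. The function name `failed_downloads` and the `sample` list are hypothetical; the only thing taken from the thread is the log line layout (timestamp, project, message separated by `|`). It is an illustration, not part of BOINC.

```python
import re

# Matches Event Log lines of the form quoted in this thread:
#   <timestamp> | <project> | Giving up on download of <file>: <reason>
GIVE_UP = re.compile(
    r"\|\s*(?P<project>[^|]+?)\s*\|\s*"
    r"Giving up on download of (?P<name>\S+): (?P<reason>.+)$"
)

def failed_downloads(log_lines):
    """Return (project, file name, reason) for every abandoned download."""
    hits = []
    for line in log_lines:
        match = GIVE_UP.search(line.strip())
        if match:
            hits.append(
                (match.group("project"), match.group("name"), match.group("reason"))
            )
    return hits

# Sample lines copied from the posts above (hypothetical variable name).
sample = [
    "Fri 08 Feb 2019 04:42:46 PM PST | SETI@home | Giving up on download of 12dc06af.32725.1295.12.39.192.vlar: permanent HTTP error",
    "Fri 08 Feb 2019 04:42:46 PM PST | SETI@home | Giving up on download of 12dc06af.6741.172935.15.42.62.vlar: permanent HTTP error",
    "Fri 08 Feb 2019 04:42:46 PM PST | SETI@home | Giving up on download of 12dc06af.11999.1704.14.41.109: permanent HTTP error",
    "Fri 08 Feb 2019 04:42:46 PM PST | SETI@home | Giving up on download of 02no11ab.8915.11110.7.34.154: permanent HTTP error",
]

for project, name, reason in failed_downloads(sample):
    print(f"{project}: {name} ({reason})")
```

Lines that are not "Giving up on download" messages are simply skipped, so the same function could be pointed at a whole saved log.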

Grant (SSSF) Volunteer tester Send message Joined: 19 Aug 99 Posts: 13736 Credit: 208,696,464 RAC: 304	Message 1979496 - Posted: 9 Feb 2019, 1:17:16 UTC - in response to Message 1979494. Last modified: 9 Feb 2019, 1:18:39 UTC I have one host that constantly has been getting the short straw and receiving no new work. Just about out. The other hosts are getting a reasonable 1 for 1 return. Still not back to full though. Splitters have only just started putting out a more reasonable amount of work. Even so, it's been easier to get work after this unscheduled outage than it was trying to get work after the weekly outage. Generally it's been every few requests gets some work. After the weekly outage it took hours, and even when work finally started to be allocated it was something like 7-20 requests to get more. Grant Darwin NT ID: 1979496 ·

Keith Myers Volunteer tester Send message Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873	Message 1979497 - Posted: 9 Feb 2019, 1:21:01 UTC - in response to Message 1979495. Edit- the long backoff only occurs after the Download errors out. If you wait 5min, Update, then they get cleared from your system, now work gets allocated (or not), and you're back to 5min between Scheduler requests. Till the next batch of un-downloadable WUs, of course. Exactallly . . . . LOL Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) ID: 1979497 ·

Grant (SSSF) Volunteer tester Send message Joined: 19 Aug 99 Posts: 13736 Credit: 208,696,464 RAC: 304	Message 1979502 - Posted: 9 Feb 2019, 1:41:26 UTC Wild speculation time. Eric mentioned 15% of the storage space was affected? So with 600,000 WUs Ready-to-send, that's around 90,000 WUs MIA. Received in the last hour is around 110,000- but on a system that gets one of these missing WUs allocated, they can end up with some very long timeouts; and a great deal of people don't keep a close eye on their systems. So it could easily take 48+hrs for them to finally be cleared from the system. Grant Darwin NT ID: 1979502 ·
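
The back-of-envelope figure in the post above can be reproduced with a short Python sketch. It assumes, as the post does, that the 15% storage loss translates evenly into 15% of the Ready-to-send queue; the function name `estimate_missing` is made up for illustration.

```python
def estimate_missing(ready_to_send: int, affected_percent: int) -> int:
    """Work units expected to be missing if losses are spread evenly.

    Integer arithmetic keeps the result exact for whole-number percentages.
    """
    return ready_to_send * affected_percent // 100

# Figures quoted in the post: 600,000 WUs Ready-to-send, 15% of storage affected.
missing = estimate_missing(600_000, 15)
print(f"Estimated missing WUs: {missing:,}")
```

This only covers the 90,000 estimate; the 48+ hour clearing time in the post depends on host backoff behaviour and is not modelled here.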

Grant (SSSF) Volunteer tester Send message Joined: 19 Aug 99 Posts: 13736 Credit: 208,696,464 RAC: 304	Message 1979504 - Posted: 9 Feb 2019, 1:47:20 UTC - in response to Message 1979494. I have one host that constantly has been getting the short straw and receiving no new work. Just about out. The other hosts are getting a reasonable 1 for 1 return. Still not back to full though. It really is rather strange to watch. I have 1 system that struggles to get work, but when it does it's all good. The other system gets work pretty much each time it requests it, but without fail it gets many of those destined to download failure WUs. Grant Darwin NT ID: 1979504 ·

Ian&Steve C. Send message Joined: 28 Sep 99 Posts: 4267 Credit: 1,282,604,591 RAC: 6,640	Message 1979505 - Posted: 9 Feb 2019, 1:50:35 UTC ive been getting a lot of "Project has no tasks available" :/ Seti@Home classic workunits: 29,492 CPU time: 134,419 hours ID: 1979505 ·

Stephen "Heretic" Volunteer tester Send message Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628	Message 1979506 - Posted: 9 Feb 2019, 1:58:28 UTC . . @ Unixchick . . There has been a new series of tapes mounted. Blc31 tapes ... but they are from the same day :) LOL. Stephen :) ID: 1979506 ·

Keith Myers Volunteer tester Send message Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873	Message 1979509 - Posted: 9 Feb 2019, 2:01:10 UTC - in response to Message 1979506. I saw the new tape loaded for the weekend. And the splitters are finally ramping up. Maybe we will start to make a dent in the unfilled caches. Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) ID: 1979509 ·

Wiggo Send message Joined: 24 Jan 00 Posts: 34773 Credit: 261,360,520 RAC: 489	Message 1979517 - Posted: 9 Feb 2019, 3:15:54 UTC After I had just got my ready-to-download caches filled back up by updating, just before daybreak this morning, the system went down. :-( Then I decided to go into town and do a few things, then go to the pub to have a few beers and a feed. On arriving home all was back to normal. So it proves that taking a break and a few beers will fix almost anything. :-D Cheers. ID: 1979517 ·

Keith Myers Volunteer tester Send message Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873	Message 1979520 - Posted: 9 Feb 2019, 3:42:01 UTC Not quite to that point yet. Still getting the occasional 24 hour backoff that needs tending. Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) ID: 1979520 ·

Grant (SSSF) Volunteer tester Send message Joined: 19 Aug 99 Posts: 13736 Credit: 208,696,464 RAC: 304	Message 1979527 - Posted: 9 Feb 2019, 4:41:52 UTC Last modified: 9 Feb 2019, 4:42:20 UTC Just picked up my first Invalid from this issue. It was an Inconclusive, but after that everything else ended up as a download error and it ended up as "Completed, can't validate" Grant Darwin NT ID: 1979527 ·

Keith Myers Volunteer tester Send message Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873	Message 1979529 - Posted: 9 Feb 2019, 4:56:59 UTC - in response to Message 1979527. Last modified: 9 Feb 2019, 5:03:59 UTC Just picked up my first Invalid from this issue. It was an Inconclusive, but after that everything else ended up as a download error and it ended up as "Completed, can't validate" Bummer [Edit] 11 completed, can't validate so far on my farm Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) ID: 1979529 ·

Stephen "Heretic" Volunteer tester Send message Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628	Message 1979534 - Posted: 9 Feb 2019, 5:50:17 UTC - in response to Message 1979517. So it proves that taking a break and a few beers will fix almost anything. :-D Cheers. . . Well of course ... :) Stephen :) ID: 1979534 ·

Tom M Volunteer tester Send message Joined: 28 Nov 02 Posts: 5124 Credit: 276,046,078 RAC: 462	Message 1979535 - Posted: 9 Feb 2019, 5:53:14 UTC - in response to Message 1979419. I shall fade off into the background again. I did my bit, and was successful in contacting an admin to kick the servers. Be thankful, and that is all I shall ask. Till we meet again. Meow. You still are a +2 to all of us. Tom A proud member of the OFA (Old Farts Association). ID: 1979535 ·

Tom M Volunteer tester Send message Joined: 28 Nov 02 Posts: 5124 Credit: 276,046,078 RAC: 462	Message 1979536 - Posted: 9 Feb 2019, 5:54:29 UTC - in response to Message 1979534. So it proves that taking a break and a few beers will fix almost anything. :-D Cheers. . . Well of course ... :) Stephen :) Just got off work, haven't had the beer yet. But when I hit a manual update it started inhaling tons of tasks. It must be the weekend (someplace). Tom A proud member of the OFA (Old Farts Association). ID: 1979536 ·

12kpp Volunteer tester Send message Joined: 20 Mar 08 Posts: 1 Credit: 4,903,051 RAC: 0	Message 1979539 - Posted: 9 Feb 2019, 6:33:31 UTC http://setiathome.berkeley.edu/workunit.php?wuid=3328702611 old WU could not complete due to server crash =( ID: 1979539 ·

Grant (SSSF) Volunteer tester Send message Joined: 19 Aug 99 Posts: 13736 Credit: 208,696,464 RAC: 304	Message 1979542 - Posted: 9 Feb 2019, 7:10:16 UTC Minor panic when I got home just now- several WUs in project backoff having not been able to download. Re-tried pending transfers and luckily they then cleared without further hiccups. Grant Darwin NT ID: 1979542 ·

©2024 University of California

SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.