Panic Mode On (114) Server Problems?
Author | Message |
---|---|
Joined: 1 Dec 99 Posts: 2786 Credit: 685,657,289 RAC: 835 |
Unfortunately it seems we are going to have to go through all 10 retries to download each of those bad WUs. All my errors are resends ... |
Stephen "Heretic" Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
Well, if anybody would like to contribute a data center budget to the Seti project, I for sure would like to hear about it. . . I'll let you know when I win the lottery :) Stephen . . PS some would consider your farm virtually a data centre of itself ... :) :) |
Stephen "Heretic" Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
I guess SETI has become an addiction for some.. . . Firstly ... I smiled, it is indeed an addiction but only to certain personalities of which I am one. :) . . However my SETI budget could not buy even a 10 yo Porsche and run it too :) . . Now to Kittyman ... nice restraint, proud of you dude :) Stephen :) |
Sirius B Joined: 26 Dec 00 Posts: 24926 Credit: 3,081,182 RAC: 7 |
I often do, but a Data Center win? I wish. Well, if anybody would like to contribute a data center budget to the Seti project, I for sure would like to hear about it. |
Stephen "Heretic" Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
Hey Mark - only 180 days to go ;-) . . Maybe we should all fly over to the US and throw a party :) . . On second thoughts the cost of the ticket would buy a 2080ti and I could dedicate it to his efforts :) Stephen :) |
Stephen "Heretic" Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
I'm a 2013 Chevy Volt kind of guy. ok since we are comparing sizes ... 2008 Toyota Aurion here :) Nice car actually :) Stephen :) |
Stephen "Heretic" Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
Lots of "Permanent HTTP errors" and "unrecoverable errors" for queued download tasks. Those are the lost 15% of tasks. . . Did we lose the actual Tasks or just the WU downloads? I believe the latter is the case so they only need to be re-copied and resent. . . I am hoping. . . Thanks Grant for the clarification ..... :( :( :( Stephen ?? |
woohoo Joined: 30 Oct 13 Posts: 973 Credit: 165,671,404 RAC: 5 |
I have a Porsche but not a Porsche RAC |
Stephen "Heretic" Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
. . FWIW ... I, not having used your workaround, have also received a large number of download errors since they resumed this am. I suspect that is a timeout issue for many of them since they have been in abeyance for a long time. . . To quote Hardy Har Har, . . Oh dear, oh dear, oh dear! :( Stephen :) |
Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
. . Did we lose the actual Tasks or just the WU downloads? I believe the latter is the case so they only need to be re-copied and resent. No just the WU downloads. All my download error tasks have already gone out to two new wingmen. Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
Grant (SSSF) Joined: 19 Aug 99 Posts: 13904 Credit: 208,696,464 RAC: 304 |
All my download error tasks have already gone out to two new wingmen. Many of the re-sends I've picked up have also failed as the WUs still aren't there to be downloaded. I think it's going to take them a while to sort this mess out. Edit- Here's an example. blc21_2bit_guppi_58406_31708_HIP20440_0117.31291.0.21.44.25.vlar Grant Darwin NT |
Grant (SSSF) Joined: 19 Aug 99 Posts: 13904 Credit: 208,696,464 RAC: 304 |
I'm thinking this will affect the faster systems more than the slower ones. Slow systems will have a full cache, so it could be days before they get to any WUs that have been resent. For the faster systems, they'll be getting all the resends- and since so little new work is being generated at the moment most of the work going out will be these resends, until they reach their error limit & are declared duds & are no longer resent. As fast as Validated work reduces the Error count, the resends with no WU to actually download are just going to keep cranking it back up. My error count continues to increase each time I check my tasks. Given that these WUs are now erroring out as soon as the Manager attempts to download them, it should be mostly over in just a few hours. Grant Darwin NT |
Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
Yes, a good dozen or so tasks in each new download are unrecoverable. So that forces the host into backoff timeouts. Until the lost tasks are cleared from the database, you will have to stay on top of your hosts to keep them reporting normally with manual updates. Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
Stephen "Heretic" Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
All my download error tasks have already gone out to two new wingmen. . . OK, despite the errors there are 2 systems reporting a successful download and the task is "in progress". The second of which was the final download attempt which says to me that the task is still in the system on that one ... . . But from the posted news we have indeed lost some tasks completely. Stephen ?? |
juan BFP Joined: 16 Mar 07 Posts: 9786 Credit: 572,710,851 RAC: 3,799 |
Yes, a good dozen or so tasks in each new download are unrecoverable. So that forces the host into backoff timeouts. Until the lost tasks are cleared from the database, you will have to stay on top of your hosts to keep them reporting normally with manual updates. If you manually abort the failed downloads and update, the host goes back to asking for new work; some of the new tasks are good and some end up as failed downloads again. |
Grant (SSSF) Joined: 19 Aug 99 Posts: 13904 Credit: 208,696,464 RAC: 304 |
. . OK, despite the errors there are 2 systems reporting a successful download and the task is "in progress". In progress doesn't mean there's been a successful download, it just means it's been allocated to the host and there hasn't been a result or an error returned yet. It could be the download errored out, but if there were previous download errors, this one hasn't been reported yet due to the backoffs (one of the systems hasn't contacted the Scheduler in over 6 hours). Or they could be ghosts. Or it could be a successful download (but unlikely given all the other download failures). Time will tell. . . But from the posted news we have indeed lost some tasks completely. Or they could run a query on the Database & any results that resulted in a WU erroring out completely on the 8th, 9th or 10th of Feb UTC could be re-processed. The WUs can still be processed, but as I said in a previous post, it's going to take a while for them to clean up the mess. Grant Darwin NT |
Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
There aren't any failed download tasks to abort. All you see is the message in the Event Log of giving up on the task. Fri 08 Feb 2019 04:42:46 PM PST | SETI@home | Giving up on download of 12dc06af.32725.1295.12.39.192.vlar: permanent HTTP error Fri 08 Feb 2019 04:42:46 PM PST | SETI@home | Giving up on download of 12dc06af.6741.172935.15.42.62.vlar: permanent HTTP error Fri 08 Feb 2019 04:42:46 PM PST | SETI@home | Giving up on download of 12dc06af.11999.1704.14.41.109: permanent HTTP error Fri 08 Feb 2019 04:42:46 PM PST | SETI@home | Giving up on download of 02no11ab.8915.11110.7.34.154: permanent HTTP error But you then get forced into 24 hour backoff. You have to wait out another five minutes to again contact the schedulers and hope you don't get more lost tasks. Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
Grant (SSSF) Joined: 19 Aug 99 Posts: 13904 Credit: 208,696,464 RAC: 304 |
Looks like there's still a download issue- takes a minute or so for downloads to actually start downloading, occasionally pausing while downloading, and some barely cracking the 1kB/s mark by the time they finish. Whatever the download issue was, it seems to have sorted itself out now. When I can get work, downloads are back around 200+kB/s again. Grant Darwin NT |
juan BFP Joined: 16 Mar 07 Posts: 9786 Credit: 572,710,851 RAC: 3,799 |
There aren't any failed download tasks to abort. All you see is the message in the Event Log of giving up on the task. But they never clear; they remain in the list of Tasks. That's why I suggested aborting them. When you update in that case, the backoff goes back to the normal 5 min. Edit: Yes, the DL is back to normal now. |
Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
Again, they are NOT in the list of Tasks. Only the server has a record of them on your machine. You never got them. The only way to get rid of them is stop BOINC, wait five minutes, contact the scheduler, get the master list successfully downloaded, get benchmarked and finally get rid of them from the database for your host. Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
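
For anyone wanting to tally how many downloads a host has given up on, here is a minimal Python sketch that picks the "Giving up on download" lines out of Event Log text. The line layout (timestamp | project | message) is inferred only from the four log lines quoted in this thread; other BOINC versions or locales may format them differently. The names `GIVE_UP` and `failed_downloads` are made up for this example and are not part of BOINC.

```python
import re
from collections import Counter

# Pattern inferred from the Event Log lines quoted in the thread.
# Assumption: "<timestamp> | <project> | Giving up on download of <file>: <reason>"
GIVE_UP = re.compile(
    r"^(?P<when>[^|]+?)\s*\|\s*(?P<project>[^|]+?)\s*\|\s*"
    r"Giving up on download of (?P<file>\S+): (?P<reason>.+)$"
)


def failed_downloads(lines):
    """Return (project, file, reason) for each 'Giving up on download' line."""
    hits = []
    for line in lines:
        match = GIVE_UP.match(line.strip())
        if match:
            hits.append(
                (match["project"], match["file"], match["reason"].strip())
            )
    return hits


# The four lines below are copied from the post above.
sample = [
    "Fri 08 Feb 2019 04:42:46 PM PST | SETI@home | Giving up on download of 12dc06af.32725.1295.12.39.192.vlar: permanent HTTP error",
    "Fri 08 Feb 2019 04:42:46 PM PST | SETI@home | Giving up on download of 12dc06af.6741.172935.15.42.62.vlar: permanent HTTP error",
    "Fri 08 Feb 2019 04:42:46 PM PST | SETI@home | Giving up on download of 12dc06af.11999.1704.14.41.109: permanent HTTP error",
    "Fri 08 Feb 2019 04:42:46 PM PST | SETI@home | Giving up on download of 02no11ab.8915.11110.7.34.154: permanent HTTP error",
]

hits = failed_downloads(sample)
for project, name, reason in hits:
    print(f"{project}: {name} ({reason})")

# Count failures per reason, e.g. to watch the total grow between updates.
print(Counter(reason for _, _, reason in hits))
```

Pasting a saved copy of the Event Log into `failed_downloads` in place of `sample` would give the same kind of list for a real host, provided the log lines follow the layout assumed above.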