Panic Mode On (114) Server Problems?
Author | Message |
---|---|
Unixchick Joined: 5 Mar 12 Posts: 815 Credit: 2,361,516 RAC: 22 |
Wow. I can see there has been a ton of "fun" happening here while I was on vacation. I managed to download some WUs so I hope things will run well for a while. |
Grant (SSSF) Joined: 19 Aug 99 Posts: 13841 Credit: 208,696,464 RAC: 304 |
[Edit] I must have a magnet. On the host that was in backoff for 35 minutes, I reported 100 tasks and got 100 in return; 30 of them were unrecoverable download errors. This is getting tiresome. I'm still getting a few, but it is just a few. And now it's more spread between my systems, where initially one system was getting failures on half of its downloads and the other would only get the occasional one. Grant Darwin NT |
Kevin Olley Joined: 3 Aug 99 Posts: 906 Credit: 261,085,289 RAC: 572 |
Gave up on mine :-) Downloaded a lot of Einstein WUs, set NNT on Einstein to stop it flooding me, and adjusted the resource share so that I am running 2 GPUs on Einstein for the next couple of days. I am still getting some of the missing tasks, but not enough to cause the larger back-offs that are causing problems. Bed time now. Kevin |
EdwardPF Joined: 26 Jul 99 Posts: 389 Credit: 236,772,605 RAC: 374 |
Not getting any GPU MBs, only CPU MBs. Ed F |
Grant (SSSF) Joined: 19 Aug 99 Posts: 13841 Credit: 208,696,464 RAC: 304 |
not getting any GPU MB's only CPU MB's It looks like one of your systems has 200 GPU Ghost WUs then. Actually, it looks like several of your systems have lots of Ghost WUs... Grant Darwin NT |
EdwardPF Joined: 26 Jul 99 Posts: 389 Credit: 236,772,605 RAC: 374 |
how do I see ghosts?? what do they look like?? |
Keith Myers Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
If you have more tasks in your "work in progress" than the project allotment of 100 CPU tasks per CPU and 100 tasks per GPU, then you have 'ghosts'. So, for example, a host with a CPU and one GPU should only have 200 tasks. You can use the 'ghost recovery protocol' to get rid of them by reclaiming 'lost tasks'. Ghost Recovery Protocol, as follows: (1) Set the project to No New Tasks (NNT). (2) Disable network access and wait for a group of completed tasks to accumulate (enough to give you time to run through this procedure; the faster your upload speed, the more you will need). (3) Open the file transfer, event log and preferences windows. (4) Re-enable network access and monitor the uploads in the file transfer and event log windows. When the last file has uploaded and the acknowledgement has appeared in the event log, but BEFORE the work has been reported, disable network access again. This timing is critical, as is the first step of setting NNT. I have the option to disable network access set in advance so I only need to click OK. (5) Shut down BOINC and wait a short period. (6) Restart BOINC and set the manager to allow new tasks. All the completed tasks should show under the Tasks tab as ready to report. Re-enable the network and watch. You should get 20 resent tasks (they will show in the event log as a list of resends). For large numbers of ghosts this will have to be repeated until all are recovered. If you have no tasks to upload then I don't know how you can trigger the resends. Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
Brent Norman Joined: 1 Dec 99 Posts: 2786 Credit: 685,657,289 RAC: 835 |
I have noticed a few tasks running with a high resend count and initially thought people were aborting any resend they received. So I tracked a task, and it appears they recovered the drive or came up with a workaround. |
EdwardPF Joined: 26 Jul 99 Posts: 389 Credit: 236,772,605 RAC: 374 |
Ghosts are not the problem here ... I've been getting this: Sun 10 Feb 2019 01:00:54 AM EST | SETI@home | Sending scheduler request: To fetch work. |
Grant (SSSF) Joined: 19 Aug 99 Posts: 13841 Credit: 208,696,464 RAC: 304 |
I have noticed a few tasks running with a high resend count and initially thought people were aborting any resend they received. That's good news. It'll bring an early end to the Invalids & Errors. Hopefully they can re-issue those that have already errored out. Grant Darwin NT |
Grant (SSSF) Joined: 19 Aug 99 Posts: 13841 Credit: 208,696,464 RAC: 304 |
Ghosts are not the problem here ... I've been getting this: Ghosts will reduce your Daily quota as they time out, but the biggest issue at the moment is the loss of a whole bunch of WUs that are resulting in download errors. However as you return work and it is validated, your Daily quota will increase significantly. Grant Darwin NT |
Grant (SSSF) Joined: 19 Aug 99 Posts: 13841 Credit: 208,696,464 RAC: 304 |
Here we go- Web site & forums slow, Tasks lists not displaying, "Project has no tasks available". That time of day again (hopefully). Grant Darwin NT |
EdwardPF Joined: 26 Jul 99 Posts: 389 Credit: 236,772,605 RAC: 374 |
Ghosts are not the problem here ... I've been getting this: Sun 10 Feb 2019 01:00:54 AM EST | SETI@home | Sending scheduler request: To fetch work. on this computer and the same thing on another. When asking for CPU and NVIDIA, it will download the same number of WUs as CPU WUs uploaded but not replace the NVIDIA WUs. (This computer is epfubuntu, the other computer is epfubuntu-r.) Ed F P.S. Now the project has no tasks available ... looks like a long night ... |
Ghia Joined: 7 Feb 17 Posts: 238 Credit: 28,911,438 RAC: 50 |
Looks like your WAG is correct, Got 4 more yesterday, but none during the last 11 hours. Also a couple of Invalids so far...again, a correct prediction :) Humans may rule the world...but bacteria run it... |
Grant (SSSF) Joined: 19 Aug 99 Posts: 13841 Credit: 208,696,464 RAC: 304 |
Looks like your WAG is correct, Got 4 more yesterday, but none during the last 11 hours. At least back then. Eric has posted in the News thread that they got the problem system back up & online, but it will take a while for everything to re-sync. So apart from a few more errors while things re-sync, the worst of the messiness should be done with, and other than the odd one here & there things should be pretty much over (for that particular issue). Grant Darwin NT |
Stephen "Heretic" Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
I have noticed a few tasks running with a high resend count and initially thought people were aborting any resend they received. . . Isn't that one feature of RAID? Being able to rebuild the lost data when one drive fails? Stephen ? ? |
Stephen "Heretic" Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
Ghosts are not the problem here ... I've been getting this: . . You get that message when a large number of tasks have failed, such as errored out, been aborted etc. . . If you still have tasks to process I would recommend you try the process Keith outlined to you. If it shows no ghosts then just wait for the quota to reach its target and you will get new work as normal; it should take less than a day if there are no further problems. If you do get resends, repeat the process at intervals until there are no more ghosts. Stephen . . |
Grant (SSSF) Joined: 19 Aug 99 Posts: 13841 Credit: 208,696,464 RAC: 304 |
I have noticed a few tasks running with a high resend count and initially thought people were aborting any resend they received. One or two (RAID 5 or 6). But when the entire storage server dies you're up the creek without a paddle if it stays dead. Grant Darwin NT |
TBar Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
Back to Project has NO Tasks again... Hosts down by hundreds of tasks... Another day, another Panic. |
Keith Myers Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
They started to reduce the backlog of WU and result deletions again when they fixed Georgem. Anytime the deleters crank up you can't get any work. Down to less than a dozen GPU tasks now on one host. Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
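
Keith's rule of thumb for spotting ghosts earlier in the thread (more tasks "in progress" than the allotment of 100 for the CPU plus 100 per GPU) can be written as a small check. This is only a sketch: the function names are made up for illustration, the limits are the ones quoted in the thread and may not match current project settings, and the in-progress count has to be read by hand from the host's task list on the website.

```python
# Hypothetical helpers; limits (100 per CPU, 100 per GPU) are taken from the thread.
CPU_LIMIT = 100
GPU_LIMIT = 100


def task_limit(num_gpus: int) -> int:
    """Allotment for one host: one CPU allotment plus one allotment per GPU."""
    return CPU_LIMIT + GPU_LIMIT * num_gpus


def excess_in_progress(server_in_progress: int, num_gpus: int) -> int:
    """Tasks listed as in progress beyond the allotment (0 if within it).

    By the rule of thumb in the thread, any excess points to ghosts.
    """
    return max(0, server_in_progress - task_limit(num_gpus))


if __name__ == "__main__":
    # A host with a CPU and one GPU should show at most 200 tasks in progress.
    print(task_limit(1))
    print(excess_in_progress(230, 1))
```

A result of zero does not prove there are no ghosts; it only means the in-progress count is within the allotment. Comparing the website count against what the client actually holds is the more direct check.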
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.