Panic Mode On (111) Server Problems?
Author | Message |
---|---|
Wiggo Joined: 24 Jan 00 Posts: 36613 Credit: 261,360,520 RAC: 489 |
Talking about weird hosts, how can a status get to look like this? Usually by using an overzealous antivirus. Cheers. |
Ghia Joined: 7 Feb 17 Posts: 238 Credit: 28,911,438 RAC: 50 |
With not much work returned, how has he gathered well over 9000 tasks in progress on one PC? Humans may rule the world...but bacteria run it... |
juan BFP Joined: 16 Mar 07 Posts: 9786 Credit: 572,710,851 RAC: 3,799 |
If you can, please post the HostID so we can follow along. |
Keith Myers Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
Unfortunately, the auto-throttling mechanism of the schedulers doesn't work well or consistently. Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
Ghia Joined: 7 Feb 17 Posts: 238 Credit: 28,911,438 RAC: 50 |
https://setiathome.berkeley.edu/show_host_detail.php?hostid=8158553 Humans may rule the world...but bacteria run it... |
Sirius B Joined: 26 Dec 00 Posts: 24909 Credit: 3,081,182 RAC: 7 |
290 errors - all timed out. Expect the rest to time out :-( |
Stephen "Heretic" Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
290 errors - all timed out. Expect the rest to time out :-( . . But the host is still contacting the servers, is it getting more work when it does ???? Stephen ?? |
Sirius B Joined: 26 Dec 00 Posts: 24909 Credit: 3,081,182 RAC: 7 |
It appears that way. Last time I checked he had 27 AP's, now got 29 :-( |
Stephen "Heretic" Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
It appears that way. Last time I checked he had 27 AP's, now got 29 :-( . . That needs to be fixed ... :( . . But how? Stephen ?? |
Keith Myers Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
It can't evidently. When MilkyWay tried, it messed up perfectly good hosts when they tried to get rid of the bad with the BOINC mechanism. The BOINC code would have to be rewritten. And we know they don't have the guts to try to do so . . . . or the actual resources to do so anyway. Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
Stephen "Heretic" Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
It can't evidently. When MilkyWay tried, it messed up perfectly good hosts when they tried to get rid of the bad with the BOINC mechanism. . . Resources would be the big issue I guess ... Stephen ? |
Grant (SSSF) Joined: 19 Aug 99 Posts: 13847 Credit: 208,696,464 RAC: 304 |
290 errors - all timed out. Expect the rest to time out :-( It's a ghost machine- it makes ghosts. Grant Darwin NT |
rob smith Joined: 7 Mar 03 Posts: 22504 Credit: 416,307,556 RAC: 380 |
While an automated system may not work, this computer would be a prime target for a manual intervention - I'm pretty certain Eric has done a "forced stop" on errant computers in the past. And one with >9000 tasks in progress and not returning any results must be a prime candidate. I'm with Glenn on this, I'd guess that the computer doesn't actually have >9000 tasks lying around, but has had >9000 lost in transit to become ghosts. Bob Smith Member of Seti PIPPS (Pluto is a Planet Protest Society) Somewhere in the (un)known Universe? |
Keith Myers Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
It can't evidently. When MilkyWay tried, it messed up perfectly good hosts when they tried to get rid of the bad with the BOINC mechanism. Actually, Richard Haselgrove, who has regular contact with Eric as part of the PMC, has stated numerous times that Eric is loath to touch any part of the scheduler. He has also said that the one person who might actually understand at least some of the code (never named, but I assume he was referring to D.A.) really doesn't at all, since the code is so ancient and the original programmers are long gone. Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
Grant (SSSF) Joined: 19 Aug 99 Posts: 13847 Credit: 208,696,464 RAC: 304 |
Actually, Richard Haselgrove, who has regular contact with Eric as part of the PMC, has stated numerous times that Eric is loath to touch any part of the scheduler. He has also said that the one person who might actually understand at least some of the code (never named, but I assume he was referring to D.A.) really doesn't at all, since the code is so ancient and the original programmers are long gone. So it'd pretty much have to be rewritten, almost from scratch. Grant Darwin NT |
Ghia Joined: 7 Feb 17 Posts: 238 Credit: 28,911,438 RAC: 50 |
290 errors - all timed out. Expect the rest to time out :-( That is what I suspected too. Still, he gets chunks of work every day. So, when the host contacts the servers, they see the machine as being empty, even when the host page reports over 9000 tasks in progress ? There should be a way to report such hosts, and to fix the problem...even if it has to be done manually. Humans may rule the world...but bacteria run it... |
Keith Myers Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
Actually, Richard Haselgrove, who has regular contact with Eric as part of the PMC, has stated numerous times that Eric is loath to touch any part of the scheduler. He has also said that the one person who might actually understand at least some of the code (never named, but I assume he was referring to D.A.) really doesn't at all, since the code is so ancient and the original programmers are long gone. That's the gist I get out of it. Which leads back to Stephen's comment that the lack of resources is the biggest obstacle. We need a full-time system administrator (the person tasked with contacting the bad seeds and, if there is no reply, deleting the bad host) to keep the servers running; a full-time developer who would rewrite the entire code base from scratch; and finally a full-time fundraiser to coordinate paying for these personnel and the operating costs. Then the project scientists could get on with real science (like Nebula) and not have to deal with the day-to-day tasks of just keeping this shoestring-budget project running. Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
Keith Myers Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
There already is, in the de facto "Invalid Host messaging" thread. But that would entail actually having staff read the thread about bad hosts and then doing something about it. Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
Keith Myers Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
Maybe running back into our old problem of host has reached the limit of tasks in progress.
Keith-Windows7
SETI@home 4/16/2018 13:27:20 Sending scheduler request: To fetch work.
SETI@home 4/16/2018 13:27:20 Reporting 4 completed tasks
SETI@home 4/16/2018 13:27:20 Requesting new tasks for CPU
SETI@home 4/16/2018 13:27:20 [sched_op] CPU work request: 221374.89 seconds; 0.00 devices
SETI@home 4/16/2018 13:27:20 [sched_op] NVIDIA GPU work request: 0.00 seconds; 0.00 devices
SETI@home 4/16/2018 13:27:23 Scheduler request completed: got 1 new tasks
SETI@home 4/16/2018 13:27:23 [sched_op] Server version 709
SETI@home 4/16/2018 13:27:23 Project requested delay of 303 seconds
SETI@home 4/16/2018 13:27:23 [sched_op] estimated total CPU task duration: 5063 seconds
SETI@home 4/16/2018 13:27:23 [sched_op] estimated total NVIDIA GPU task duration: 0 seconds
SETI@home 4/16/2018 13:27:23 [sched_op] handle_scheduler_reply(): got ack for task 01ja17aa.17267.16435.14.41.253.vlar_1
SETI@home 4/16/2018 13:27:23 [sched_op] handle_scheduler_reply(): got ack for task 01ja17aa.16724.18889.13.40.22.vlar_1
SETI@home 4/16/2018 13:27:23 [sched_op] handle_scheduler_reply(): got ack for task 07ja17aa.2453.407059.9.36.7_1
SETI@home 4/16/2018 13:27:23 [sched_op] handle_scheduler_reply(): got ack for task blc14_2bit_guppi_58137_27894_HIP45839_0015.9893.818.22.45.20.vlar_2
SETI@home 4/16/2018 13:27:23 [sched_op] Deferring communication for 00:05:03
SETI@home 4/16/2018 13:27:23 [sched_op] Reason: requested by project
SETI@home 4/16/2018 13:27:25 Started download of 09ja17aa.20243.2938.9.36.95
SETI@home 4/16/2018 13:27:28 Finished download of 09ja17aa.20243.2938.9.36.95
SETI@home 4/16/2018 13:32:29 [sched_op] Starting scheduler request
SETI@home 4/16/2018 13:32:29 Sending scheduler request: To fetch work.
SETI@home 4/16/2018 13:32:29 Requesting new tasks for CPU
SETI@home 4/16/2018 13:32:29 [sched_op] CPU work request: 217778.85 seconds; 0.00 devices
SETI@home 4/16/2018 13:32:29 [sched_op] NVIDIA GPU work request: 0.00 seconds; 0.00 devices
SETI@home 4/16/2018 13:32:32 Scheduler request completed: got 0 new tasks
SETI@home 4/16/2018 13:32:32 [sched_op] Server version 709
SETI@home 4/16/2018 13:32:32 No tasks sent
SETI@home 4/16/2018 13:32:32 This computer has reached a limit on tasks in progress
Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
Keith Myers Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
Maybe running back into our old problem of host has reached the limit of tasks in progress. Apparently not. Just a short hiccup, it appears. Anybody want to guess the length of the outage tomorrow? Short or normal? Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
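
For anyone wanting to spot the "limit on tasks in progress" reply in their own event log without reading it line by line, here is a minimal Python sketch. It assumes the log lines look like the ones Keith pasted in this thread (an optional message number, then project name, date, time, and message text). The function name `summarize_requests`, the regular expressions, and the sample data are illustrative assumptions, not part of BOINC.

```python
import re

# Assumed line layout, taken from the log pasted in this thread:
#   [optional message number] project date time message
LINE_RE = re.compile(
    r"^(?:\d+\s+)?(?P<project>\S+)\s+(?P<date>\d+/\d+/\d+)\s+"
    r"(?P<time>\d+:\d+:\d+)\s+(?P<msg>.*)$"
)
GOT_RE = re.compile(r"Scheduler request completed: got (\d+) new tasks")
LIMIT_MSG = "This computer has reached a limit on tasks in progress"


def summarize_requests(lines):
    """Return one dict per completed scheduler request found in the log."""
    requests = []
    for raw in lines:
        m = LINE_RE.match(raw.strip())
        if not m:
            continue  # skip anything that does not look like a log line
        msg = m.group("msg")
        got = GOT_RE.search(msg)
        if got:
            requests.append({
                "time": m.group("time"),
                "new_tasks": int(got.group(1)),
                "limit_reached": False,
            })
        elif LIMIT_MSG in msg and requests:
            # The limit notice follows the "got 0 new tasks" line it belongs to.
            requests[-1]["limit_reached"] = True
    return requests


# Four lines copied from the log in this thread.
SAMPLE_LOG = [
    "SETI@home 4/16/2018 13:27:23 Scheduler request completed: got 1 new tasks",
    "SETI@home 4/16/2018 13:32:32 Scheduler request completed: got 0 new tasks",
    "SETI@home 4/16/2018 13:32:32 No tasks sent",
    "SETI@home 4/16/2018 13:32:32 This computer has reached a limit on tasks in progress",
]

if __name__ == "__main__":
    for req in summarize_requests(SAMPLE_LOG):
        print(req)
```

Project names containing spaces would not match this pattern, so treat it as a starting point rather than a general parser.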
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.