Panic Mode On (111) Server Problems?

Profile Wiggo
Joined: 24 Jan 00
Posts: 36613
Credit: 261,360,520
RAC: 489
Australia
Message 1930083 - Posted: 15 Apr 2018, 9:50:36 UTC - in response to Message 1930081.  

Talking about weird hosts, how can a status get to look like this ?
State: All (9551) · In progress (9259) · Validation pending (0) · Validation inconclusive (0) · Valid (0) · Invalid (0) · Error (292)
Application: All (9578) · AstroPulse v7 (27) · SETI@home v8 (9551)

Usually by using an overzealous antivirus.

Cheers.
ID: 1930083 · Report as offensive
Ghia
Joined: 7 Feb 17
Posts: 238
Credit: 28,911,438
RAC: 50
Norway
Message 1930150 - Posted: 15 Apr 2018, 18:14:00 UTC - in response to Message 1930083.  


Usually by using an overzealous antivirus.
Cheers.

With not much work returned, how has he gathered well over 9000 in progress on one PC ?
Humans may rule the world...but bacteria run it...
ID: 1930150 · Report as offensive
juan BFP Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 1930154 - Posted: 15 Apr 2018, 19:04:53 UTC - in response to Message 1930150.  


Usually by using an overzealous antivirus.
Cheers.

With not much work returned, how has he gathered well over 9000 in progress on one PC ?

If you can, please post the HostID so we can follow along.
ID: 1930154 · Report as offensive
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1930159 - Posted: 15 Apr 2018, 19:40:43 UTC - in response to Message 1930150.  


Usually by using an overzealous antivirus.
Cheers.

With not much work returned, how has he gathered well over 9000 in progress on one PC ?

Unfortunately, the auto-throttling mechanism of the schedulers doesn't work well or consistently.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1930159 · Report as offensive
Ghia
Joined: 7 Feb 17
Posts: 238
Credit: 28,911,438
RAC: 50
Norway
Message 1930174 - Posted: 15 Apr 2018, 21:43:08 UTC - in response to Message 1930154.  


Usually by using an overzealous antivirus.
Cheers.

With not much work returned, how has he gathered well over 9000 in progress on one PC ?

If you can, please post the HostID so we can follow along.

https://setiathome.berkeley.edu/show_host_detail.php?hostid=8158553
Humans may rule the world...but bacteria run it...
ID: 1930174 · Report as offensive
Sirius B Project Donor
Volunteer tester
Joined: 26 Dec 00
Posts: 24909
Credit: 3,081,182
RAC: 7
Ireland
Message 1930175 - Posted: 15 Apr 2018, 21:49:20 UTC - in response to Message 1930174.  

290 errors - all timed out. Expect the rest to time out :-(
ID: 1930175 · Report as offensive
Stephen "Heretic" Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1930179 - Posted: 15 Apr 2018, 23:54:16 UTC - in response to Message 1930175.  

290 errors - all timed out. Expect the rest to time out :-(


. . But the host is still contacting the servers, is it getting more work when it does ????

Stephen

ID: 1930179 · Report as offensive
Sirius B Project Donor
Volunteer tester
Joined: 26 Dec 00
Posts: 24909
Credit: 3,081,182
RAC: 7
Ireland
Message 1930181 - Posted: 15 Apr 2018, 23:58:30 UTC - in response to Message 1930179.  

It appears that way. Last time I checked he had 27 AP's, now got 29 :-(
ID: 1930181 · Report as offensive
Stephen "Heretic" Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1930193 - Posted: 16 Apr 2018, 0:23:27 UTC - in response to Message 1930181.  

It appears that way. Last time I checked he had 27 AP's, now got 29 :-(


. . That needs to be fixed ... :(

. . But how?

Stephen

ID: 1930193 · Report as offensive
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1930203 - Posted: 16 Apr 2018, 1:28:51 UTC - in response to Message 1930193.  

It can't evidently. When MilkyWay tried, it messed up perfectly good hosts when they tried to get rid of the bad with the BOINC mechanism.

The BOINC code would have to be rewritten. And we know they don't have the guts to try to do so . . . . or the actual resources to do anyway.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1930203 · Report as offensive
Stephen "Heretic" Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1930218 - Posted: 16 Apr 2018, 3:25:50 UTC - in response to Message 1930203.  

It can't evidently. When MilkyWay tried, it messed up perfectly good hosts when they tried to get rid of the bad with the BOINC mechanism.

The BOINC code would have to be rewritten. And we know they don't have the guts to try to do so . . . . or the actual resources to do anyway.


. . Resources would be the big issue I guess ...

Stephen

ID: 1930218 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13847
Credit: 208,696,464
RAC: 304
Australia
Message 1930228 - Posted: 16 Apr 2018, 4:45:23 UTC - in response to Message 1930179.  

290 errors - all timed out. Expect the rest to time out :-(


. . But the host is still contacting the servers, is it getting more work when it does ????

It's a ghost machine- it makes ghosts.
Grant
Darwin NT
ID: 1930228 · Report as offensive
rob smith Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer moderator
Volunteer tester

Joined: 7 Mar 03
Posts: 22504
Credit: 416,307,556
RAC: 380
United Kingdom
Message 1930230 - Posted: 16 Apr 2018, 5:08:47 UTC

While an automated system may not work, this computer would be a prime target for a manual intervention - I'm pretty certain Eric has done a "forced stop" on errant computers in the past. And one with >9000 tasks in progress and not returning any results must be a prime candidate.

I'm with Glenn on this, I'd guess that the computer doesn't actually have >9000 tasks lying around, but has had >9000 lost in transit to become ghosts.
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
ID: 1930230 · Report as offensive
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1930250 - Posted: 16 Apr 2018, 6:54:54 UTC - in response to Message 1930218.  

It can't evidently. When MilkyWay tried, it messed up perfectly good hosts when they tried to get rid of the bad with the BOINC mechanism.

The BOINC code would have to be rewritten. And we know they don't have the guts to try to do so . . . . or the actual resources to do anyway.


. . Resources would be the big issue I guess ...

Stephen


Actually, Richard Haselgrove, who has regular contact with Eric as part of the PMC, has stated numerous times that Eric is loath to touch any part of the scheduler. He has also said that the one person who might actually understand at least some of the code (never named, but I assume he was referring to D.A.) really doesn't at all, since the code is so ancient and the original programmers are long gone.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1930250 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13847
Credit: 208,696,464
RAC: 304
Australia
Message 1930251 - Posted: 16 Apr 2018, 7:10:09 UTC - in response to Message 1930250.  

Actually, Richard Haselgrove, who has regular contact with Eric as part of the PMC, has stated numerous times that Eric is loath to touch any part of the scheduler. He has also said that the one person who might actually understand at least some of the code (never named, but I assume he was referring to D.A.) really doesn't at all, since the code is so ancient and the original programmers are long gone.

So it'd pretty much have to be re-written, almost from scratch.
Grant
Darwin NT
ID: 1930251 · Report as offensive
Ghia
Joined: 7 Feb 17
Posts: 238
Credit: 28,911,438
RAC: 50
Norway
Message 1930252 - Posted: 16 Apr 2018, 7:20:20 UTC - in response to Message 1930228.  

290 errors - all timed out. Expect the rest to time out :-(


. . But the host is still contacting the servers, is it getting more work when it does ????

It's a ghost machine- it makes ghosts.

That is what I suspected too.
Still, he gets chunks of work every day. So when the host contacts the servers, do they see the machine as being empty, even though the host page reports over 9000 tasks in progress?
There should be a way to report such hosts, and to fix the problem...even if it has to be done manually.
Humans may rule the world...but bacteria run it...
ID: 1930252 · Report as offensive
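The "ghost" idea described in the post above can be sketched as a simple set difference: tasks the server still counts as in progress that the client does not actually hold. This is only an illustration under stated assumptions. The function name `find_ghosts` and the task names are hypothetical, and this is not how the BOINC scheduler is implemented.

```python
def find_ghosts(server_in_progress, client_on_disk):
    """Return tasks the server lists as in progress but the client does not hold.

    Both arguments are iterables of task names. The names used below are
    made up for illustration only.
    """
    return sorted(set(server_in_progress) - set(client_on_disk))


# Hypothetical data: the server believes four tasks are out on this host,
# but the host only ever received one of them.
server_in_progress = ["task_a_0", "task_b_1", "task_c_0", "task_d_1"]
client_on_disk = ["task_b_1"]

ghosts = find_ghosts(server_in_progress, client_on_disk)
print(len(ghosts), "ghost tasks:", ghosts)
```

On a host like the one discussed here, nearly everything the server lists as in progress would fall into the ghost set, and those tasks would simply sit until they time out.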
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1930253 - Posted: 16 Apr 2018, 7:24:22 UTC - in response to Message 1930251.  

Actually, Richard Haselgrove, who has regular contact with Eric as part of the PMC, has stated numerous times that Eric is loath to touch any part of the scheduler. He has also said that the one person who might actually understand at least some of the code (never named, but I assume he was referring to D.A.) really doesn't at all, since the code is so ancient and the original programmers are long gone.

So it'd pretty much have to be re-written, almost from scratch.

That's the gist I get out of it. Which leads back to Stephen's comment that the lack of resources is the biggest obstacle. We need a full-time system administrator (the person tasked with contacting the bad seeds and, if there is no reply, deleting the bad host) to keep the servers running; a full-time developer who would rewrite the entire code base from scratch; and finally a full-time fundraiser to coordinate paying for these personnel and the operating costs.

Then the project scientists could get on with real science (like Nebula) and not have to deal with the day-to-day tasks of just keeping this shoestring-budget project running.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1930253 · Report as offensive
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1930254 - Posted: 16 Apr 2018, 7:31:11 UTC - in response to Message 1930252.  


There should be a way to report such hosts, and to fix the problem...even if it has to be done manually.

There already is, in the de facto "Invalid Host messaging" thread. But that would entail staff actually reading the thread about bad hosts and then doing something about it.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1930254 · Report as offensive
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1930366 - Posted: 16 Apr 2018, 20:37:19 UTC

Maybe we're running back into our old problem of the host having reached the limit of tasks in progress.

Keith-Windows7

SETI@home 4/16/2018 13:27:20 Sending scheduler request: To fetch work.
SETI@home 4/16/2018 13:27:20 Reporting 4 completed tasks
SETI@home 4/16/2018 13:27:20 Requesting new tasks for CPU
SETI@home 4/16/2018 13:27:20 [sched_op] CPU work request: 221374.89 seconds; 0.00 devices
SETI@home 4/16/2018 13:27:20 [sched_op] NVIDIA GPU work request: 0.00 seconds; 0.00 devices
SETI@home 4/16/2018 13:27:23 Scheduler request completed: got 1 new tasks
SETI@home 4/16/2018 13:27:23 [sched_op] Server version 709
SETI@home 4/16/2018 13:27:23 Project requested delay of 303 seconds
SETI@home 4/16/2018 13:27:23 [sched_op] estimated total CPU task duration: 5063 seconds
SETI@home 4/16/2018 13:27:23 [sched_op] estimated total NVIDIA GPU task duration: 0 seconds
SETI@home 4/16/2018 13:27:23 [sched_op] handle_scheduler_reply(): got ack for task 01ja17aa.17267.16435.14.41.253.vlar_1
SETI@home 4/16/2018 13:27:23 [sched_op] handle_scheduler_reply(): got ack for task 01ja17aa.16724.18889.13.40.22.vlar_1
SETI@home 4/16/2018 13:27:23 [sched_op] handle_scheduler_reply(): got ack for task 07ja17aa.2453.407059.9.36.7_1
SETI@home 4/16/2018 13:27:23 [sched_op] handle_scheduler_reply(): got ack for task blc14_2bit_guppi_58137_27894_HIP45839_0015.9893.818.22.45.20.vlar_2
SETI@home 4/16/2018 13:27:23 [sched_op] Deferring communication for 00:05:03
SETI@home 4/16/2018 13:27:23 [sched_op] Reason: requested by project
SETI@home 4/16/2018 13:27:25 Started download of 09ja17aa.20243.2938.9.36.95
SETI@home 4/16/2018 13:27:28 Finished download of 09ja17aa.20243.2938.9.36.95
SETI@home 4/16/2018 13:32:29 [sched_op] Starting scheduler request
SETI@home 4/16/2018 13:32:29 Sending scheduler request: To fetch work.
SETI@home 4/16/2018 13:32:29 Requesting new tasks for CPU
SETI@home 4/16/2018 13:32:29 [sched_op] CPU work request: 217778.85 seconds; 0.00 devices
SETI@home 4/16/2018 13:32:29 [sched_op] NVIDIA GPU work request: 0.00 seconds; 0.00 devices
SETI@home 4/16/2018 13:32:32 Scheduler request completed: got 0 new tasks
SETI@home 4/16/2018 13:32:32 [sched_op] Server version 709
SETI@home 4/16/2018 13:32:32 No tasks sent
SETI@home 4/16/2018 13:32:32 This computer has reached a limit on tasks in progress
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1930366 · Report as offensive
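For anyone who wants to spot this pattern in a longer event log, here is a minimal sketch that pulls out each scheduler reply and flags the ones followed by the "limit on tasks in progress" message. It assumes log lines shaped like the ones pasted in the post above (an optional row number, the project name, a month/day/year date, a time, then the message). The function name `summarize` and the regular expressions are my own and are not part of BOINC.

```python
import re
from datetime import datetime

# Assumed line layout: optional row number, project name, date, time, message.
LINE_RE = re.compile(
    r"^(?:\d+\s+)?(?P<project>\S+)\s+"
    r"(?P<stamp>\d{1,2}/\d{1,2}/\d{4} \d{1,2}:\d{2}:\d{2})\s+"
    r"(?P<message>.*)$"
)
GOT_RE = re.compile(r"Scheduler request completed: got (\d+) new tasks")
LIMIT_TEXT = "reached a limit on tasks in progress"


def summarize(lines):
    """Return (timestamp, tasks_received, hit_limit) for each scheduler reply."""
    replies = []
    for raw in lines:
        match = LINE_RE.match(raw.strip())
        if not match:
            continue  # skip anything that does not look like a log line
        message = match.group("message")
        got = GOT_RE.search(message)
        if got:
            stamp = datetime.strptime(match.group("stamp"), "%m/%d/%Y %H:%M:%S")
            replies.append([stamp, int(got.group(1)), False])
        elif LIMIT_TEXT in message and replies:
            # The limit message belongs to the most recent scheduler reply.
            replies[-1][2] = True
    return [tuple(reply) for reply in replies]


# A few lines copied from the log above.
sample = [
    "44 SETI@home 4/16/2018 13:27:23 Scheduler request completed: got 1 new tasks",
    "62 SETI@home 4/16/2018 13:32:32 Scheduler request completed: got 0 new tasks",
    "64 SETI@home 4/16/2018 13:32:32 No tasks sent",
    "65 SETI@home 4/16/2018 13:32:32 This computer has reached a limit on tasks in progress",
]

for stamp, got, hit_limit in summarize(sample):
    print(stamp.isoformat(), got, "limit reached" if hit_limit else "ok")
```

The row number is optional in the pattern, so the same sketch should work whether or not the log was copied with its leading index column.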
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1930409 - Posted: 16 Apr 2018, 23:57:22 UTC

Maybe we're running back into our old problem of the host having reached the limit of tasks in progress.

Apparently not. Just a short hiccup, it appears. Anybody want to guess the length of the outage tomorrow? Short or normal?
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1930409 · Report as offensive


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.