Panic Mode On (115) Server Problems?

Message boards : Number crunching : Panic Mode On (115) Server Problems?
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13161
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1981795 - Posted: 23 Feb 2019, 7:37:21 UTC - in response to Message 1981786.  

Somewhere along the way, someone posted a solution that involved editing client_state and adding something or other to the end of the file. I would have to search posts to figure out when and where the post is. For some reason I think it might have been from TBar or Jord?

Or brute force and just edit out the recalcitrant task entry in client_state?
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1981795
Profile Brent Norman Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester

Joined: 1 Dec 99
Posts: 2786
Credit: 685,657,289
RAC: 835
Canada
Message 1981796 - Posted: 23 Feb 2019, 7:37:40 UTC - in response to Message 1981786.  
Last modified: 23 Feb 2019, 7:39:48 UTC

. . So does anyone have any ideas on how to get Boinc Manager to 'realise' that this task is no longer here? I am sick of seeing it sitting there taunting me ... :(
Give this a read http://setiathome.berkeley.edu/forum_thread.php?id=82054&postid=1895513#1895513 Jord's answer has worked twice for me.

EDIT: It is simply adding <ready_to_report/> to the client_state which has the report info in it.
ID: 1981796
Stephen "Heretic" Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1981815 - Posted: 23 Feb 2019, 11:56:27 UTC - in response to Message 1981795.  

Somewhere along the way, someone posted a solution that involved editing client_state and adding something or other to the end of the file. I would have to search posts to figure out when and where the post is. For some reason I think it might have been from TBar or Jord?

Or brute force and just edit out the recalcitrant task entry in client_state?


. . Hi Keith,

. . I considered that but not knowing if there is some kind of checksum on the file I was afraid of corrupting it and trashing the whole cache, or worse.
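Whatever the answer on checksums, a copy taken before a hand edit makes the edit reversible. A minimal sketch in Python, assuming the state file is an ordinary file you can read; `backup_file` is a hypothetical helper name, not part of BOINC, and the path is whatever your own installation uses:

```python
import shutil
import time
from pathlib import Path


def backup_file(path):
    """Copy `path` to a timestamped sibling file and return the backup's path."""
    src = Path(path)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    # e.g. client_state.xml -> client_state.xml.20190223-123054.bak
    dst = src.with_name(f"{src.name}.{stamp}.bak")
    shutil.copy2(src, dst)  # copy2 also preserves the modification time
    return dst
```

It is probably best run only while BOINC is stopped, since the client rewrites client_state while it is running.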

Stephen

:(
ID: 1981815
Stephen "Heretic" Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1981819 - Posted: 23 Feb 2019, 12:30:54 UTC - in response to Message 1981796.  

. . So does anyone have any ideas on how to get Boinc Manager to 'realise' that this task is no longer here? I am sick of seeing it sitting there taunting me ... :(
Give this a read http://setiathome.berkeley.edu/forum_thread.php?id=82054&postid=1895513#1895513 Jord's answer has worked twice for me.

EDIT: It is simply adding <ready_to_report/> to the client_state which has the report info in it.


. . Ruh roh! Oops!

. . I followed the instructions, found the stderr section for the stuck task, added the <ready_to_report/> in the appropriate place and restarted Boinc. But the listing did not change and when the client reported the other completed tasks this one remains as before. So I stopped Boinc again and re-opened client_state to check my typing, and this is where the oops comes into it. There is now no entry for this task in client_state at all, yet there it is, still taunting me in the Manager listing :( aarrrggghhh! :(

. . How is that ???

Stephen

:(
ID: 1981819
Profile Brent Norman Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester

Joined: 1 Dec 99
Posts: 2786
Credit: 685,657,289
RAC: 835
Canada
Message 1981822 - Posted: 23 Feb 2019, 13:44:54 UTC - in response to Message 1981819.  

It has to be in the client_state file, since that is where the Manager gets its file list from.
ID: 1981822
Profile Bill Special Project $75 donor
Volunteer tester
Joined: 30 Nov 05
Posts: 282
Credit: 6,916,194
RAC: 60
United States
Message 1981899 - Posted: 23 Feb 2019, 20:31:17 UTC - in response to Message 1981778.  

Yup, still downloading one GPU task at a time.

2/23/2019 2:22:01 PM | SETI@home | [sched_op] Starting scheduler request
2/23/2019 2:22:01 PM | SETI@home | Sending scheduler request: To fetch work.
2/23/2019 2:22:01 PM | SETI@home | Requesting new tasks for CPU
2/23/2019 2:22:01 PM | SETI@home | [sched_op] CPU work request: 1612515.13 seconds; 0.00 devices
2/23/2019 2:22:01 PM | SETI@home | [sched_op] AMD/ATI GPU work request: 0.00 seconds; 0.00 devices
2/23/2019 2:22:02 PM | SETI@home | Scheduler request completed: got 0 new tasks
2/23/2019 2:22:02 PM | SETI@home | [sched_op] Server version 709
2/23/2019 2:22:02 PM | SETI@home | Not sending work - last request too recent: 211 sec
2/23/2019 2:22:02 PM | SETI@home | Project requested delay of 303 seconds
2/23/2019 2:22:02 PM | SETI@home | [sched_op] Deferring communication for 00:05:03
2/23/2019 2:22:02 PM | SETI@home | [sched_op] Reason: requested by project
2/23/2019 2:28:13 PM | SETI@home | Computation for task 21fe19aa.3967.370403.8.35.191_0 finished
2/23/2019 2:28:13 PM | SETI@home | Starting task blc31_2bit_guppi_58406_14875_HIP2978_0070.7976.818.23.46.149.vlar_0
2/23/2019 2:28:13 PM | SETI@home | [sched_op] Starting scheduler request
2/23/2019 2:28:13 PM | SETI@home | Sending scheduler request: To fetch work.
2/23/2019 2:28:13 PM | SETI@home | Requesting new tasks for CPU and AMD/ATI GPU
2/23/2019 2:28:13 PM | SETI@home | [sched_op] CPU work request: 1613575.28 seconds; 0.00 devices
2/23/2019 2:28:13 PM | SETI@home | [sched_op] AMD/ATI GPU work request: 1.00 seconds; 1.00 devices
2/23/2019 2:28:15 PM | SETI@home | Started upload of 21fe19aa.3967.370403.8.35.191_0_r354457254_0
2/23/2019 2:28:15 PM | SETI@home | Scheduler request completed: got 1 new tasks
2/23/2019 2:28:15 PM | SETI@home | [sched_op] Server version 709
2/23/2019 2:28:15 PM | SETI@home | Project requested delay of 303 seconds
2/23/2019 2:28:15 PM | SETI@home | [sched_op] estimated total CPU task duration: 0 seconds
2/23/2019 2:28:15 PM | SETI@home | [sched_op] estimated total AMD/ATI GPU task duration: 3054 seconds
2/23/2019 2:28:15 PM | SETI@home | [sched_op] Deferring communication for 00:05:03
2/23/2019 2:28:15 PM | SETI@home | [sched_op] Reason: requested by project

My understanding is that my computer is not requesting any AMD/ATI work, and therefore no tasks are being downloaded. I assume that once the GPU task is completed then it requests one task. Why it does this, I am not sure, unless it is somehow tied to the peak_gflops task.
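To check that reading against the log, the [sched_op] work request lines can be pulled out mechanically. A small sketch in Python; the regular expression is inferred only from the lines quoted above, so treat the line format as an assumption rather than a documented one, and `work_requests` as a hypothetical helper name:

```python
import re

# Matches lines such as:
#   "... | [sched_op] AMD/ATI GPU work request: 1.00 seconds; 1.00 devices"
WORK_REQUEST = re.compile(
    r"\[sched_op\] (?P<resource>.+?) work request: "
    r"(?P<seconds>[\d.]+) seconds; (?P<devices>[\d.]+) devices"
)


def work_requests(log_text):
    """Return a list of (resource, seconds, devices) tuples found in log_text."""
    found = []
    for line in log_text.splitlines():
        match = WORK_REQUEST.search(line)
        if match:
            found.append((
                match.group("resource"),
                float(match.group("seconds")),
                float(match.group("devices")),
            ))
    return found
```

Run over the log above, this shows the 2:22:01 request asking for 0.00 seconds of AMD/ATI GPU work and the 2:28:13 request asking for 1.00 seconds on 1.00 devices.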

If this is becoming a bigger problem, I can move this conversation to another thread.
Seti@home classic: 1,456 results, 1.613 years CPU time
ID: 1981899
rob smith Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer moderator
Volunteer tester

Joined: 7 Mar 03
Posts: 22160
Credit: 416,307,556
RAC: 380
United Kingdom
Message 1981903 - Posted: 23 Feb 2019, 20:39:51 UTC
Last modified: 23 Feb 2019, 20:41:33 UTC

A question (Richard?)
Now you've got a "sensible" peak_flops on that computer would it work to flip back to the current release version of BOINC, as it is believed that the version Bill is running has a fetch bug in it?

Just checked - peak_flops is still crazy high, so cancel that thought....
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
ID: 1981903
Speedy
Volunteer tester
Joined: 26 Jun 04
Posts: 1639
Credit: 12,921,799
RAC: 89
New Zealand
Message 1981906 - Posted: 23 Feb 2019, 20:49:52 UTC
Last modified: 23 Feb 2019, 20:52:29 UTC

As I write we are returning over 133,000 results an hour. I'm guessing there are some noise bombs out there. I returned a -9 result_overflow with a run time of 3 minutes 47 seconds.
ID: 1981906
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14649
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1981911 - Posted: 23 Feb 2019, 21:18:19 UTC - in response to Message 1981903.  

A question (Richard?)
Now you've got a "sensible" peak_flops on that computer would it work to flip back to the current release version of BOINC, as it is believed that the version Bill is running has a fetch bug in it?

Just checked - peak_flops is still crazy high, so cancel that thought....
The idea of the first test BOINC client was that the crazy high peak flops would be caught and brought back down to earth.

That worked, but one of David's anti-cheating measures from nine years ago ignored the value calculated by the client, and worked peak flops out again on the server - and got the same crazy high value.

So the next step is to patch the server. That's been done and tested at Beta (which is what Beta is for, right?), but I haven't yet heard if the server patch has been transferred to Main.

And then there's a third (by my count) client patch, which has again been tested already, which means that even if the client uses its crazy peak flops value, or the server uses its crazy peak flops value, the task's data still doesn't get thrown away with error 197.

But it means that if you get caught by a real looper, such as happened to me recently at Einstein, it will loop for a minimum of 12 hours before being killed.
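For anyone wondering why a crazy high peak flops ends in error 197: the thread does not spell it out, so take the following as an assumption. As far as is generally understood, the client's run-time limit for a task is roughly the workunit's rsc_fpops_bound divided by the estimated flops, and error 197 is the exit for exceeding that limit. A sketch in Python with made-up numbers; the function name is hypothetical, and the floor parameter is only one reading of the "minimum of 12 hours" remark:

```python
def elapsed_time_limit(fpops_bound, flops_estimate, floor_seconds=0.0):
    """Seconds a task may run before being aborted, under the assumptions above."""
    return max(fpops_bound / flops_estimate, floor_seconds)


# Illustrative values only, not taken from any real task or host.
FPOPS_BOUND = 1e15
SANE_FLOPS = 1e11
CRAZY_FLOPS = 1e16
TWELVE_HOURS = 12 * 3600

sane_limit = elapsed_time_limit(FPOPS_BOUND, SANE_FLOPS)
crazy_limit = elapsed_time_limit(FPOPS_BOUND, CRAZY_FLOPS)
floored_limit = elapsed_time_limit(FPOPS_BOUND, CRAZY_FLOPS, TWELVE_HOURS)
```

With a sane estimate the limit comes out in hours; with the inflated one it collapses to a fraction of a second, so a perfectly good task would be aborted almost at once. With a floor, the good task survives, at the price of a genuinely stuck task also being allowed to run for the full floor.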

Next question?
ID: 1981911
Speedy
Volunteer tester
Joined: 26 Jun 04
Posts: 1639
Credit: 12,921,799
RAC: 89
New Zealand
Message 1981920 - Posted: 23 Feb 2019, 22:25:17 UTC - in response to Message 1981911.  


But it means that if you get caught by a real looper, such as happened to me recently at Einstein, it will loop for a minimum of 12 hours before being killed.

Next question?

Richard, are you able to show us/me an example of a looper and explain what you mean? I have never heard a task described as a looper before. Interested to learn.
ID: 1981920
Stephen "Heretic" Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1981922 - Posted: 23 Feb 2019, 22:51:33 UTC - in response to Message 1981822.  

It has to be in the client_state file, since that is where the Manager gets its file list from.


. . That is what I thought. I will check it again but I have searched forwards and backwards and nothing. Still, it is a long filename, maybe I made a typo in the search string ... I will double check ...

Stephen

?
ID: 1981922
Stephen "Heretic" Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1981923 - Posted: 23 Feb 2019, 22:53:28 UTC - in response to Message 1981906.  

As I write we are returning over 133,000 results an hour. I'm guessing there are some noise bombs out there. I returned a -9 result_overflow with a run time of 3 minutes 47 seconds.


. . There is a high rate of noise bombs in the current workload. But the return rate has been much higher than 133,000.

Stephen

:)
ID: 1981923
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14649
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1981930 - Posted: 23 Feb 2019, 23:17:23 UTC - in response to Message 1981920.  

But it means that if you get caught by a real looper, such as happened to me recently at Einstein, it will loop for a minimum of 12 hours before being killed.

Next question?
Richard, are you able to show us/me an example of a looper and explain what you mean? I have never heard a task described as a looper before. Interested to learn.
The particular case is described in Another Intel GPU OpenCL thread.

I was testing the existing einsteinbinary_BRP4 version 134 (opencl-intel_gpu-Beta) application on new hardware, specifically an ultra-portable Intel(R) Core(TM) i5-8250U CPU with UHD Graphics 620. It worked through the preliminary setup procedure for the task (in about 3 seconds), but then failed to make any real impact on the GPU part: no progress was recorded, and no checkpoints were written. But it occupied a whole CPU core (25% of the 4-core CPU) - that's unusual for this particular application, which uses about 2% - 3% of a core when running on older hardware.

My interpretation is that the code is following an endless loop - when it should check its internal status and conclude that it's ready to move into a different part of the program, instead it goes back to an earlier point and tries again. In programming, that's called an endless loop (not desirable): I decided to summarise it as a 'looper'. My coinage.
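For readers who have not met the term, here is a toy illustration in Python. It is not the Einstein application's code, only a sketch of the pattern described: a step whose status check never succeeds, wrapped in a hypothetical watchdog (`run_with_time_limit`) that gives up after a time limit. Both function names are invented for the example.

```python
import time


def run_with_time_limit(step, limit_seconds):
    """Call step() repeatedly until it returns True or the time limit passes.

    Returns "finished" if step() ever reports success, otherwise "killed".
    """
    started = time.monotonic()
    while True:
        if step():
            return "finished"
        if time.monotonic() - started > limit_seconds:
            return "killed"


def stuck_step():
    """Illustrative 'looper': its readiness check never succeeds."""
    ready = False  # stands in for an internal status check that never flips
    return ready
```

Calling `run_with_time_limit(stuck_step, 0.05)` spins flat out, much like the task that occupied a whole CPU core, and stops only because the watchdog cuts it off. Without the time check it would never return.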
ID: 1981930
Stephen "Heretic" Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1981934 - Posted: 23 Feb 2019, 23:35:26 UTC - in response to Message 1981796.  

. . So does anyone have any ideas on how to get Boinc Manager to 'realise' that this task is no longer here? I am sick of seeing it sitting there taunting me ... :(
Give this a read http://setiathome.berkeley.edu/forum_thread.php?id=82054&postid=1895513#1895513 Jord's answer has worked twice for me.

EDIT: It is simply adding <ready_to_report/> to the client_state which has the report info in it.


. . OK, it happened. I have now trashed client_state.xml and have killed off boinc on this machine.

. . The extremely long file names do not fit in Notepad's search window, so I must have missed a typo in the early part of the filename. On rechecking I found the listing was indeed still there. I also realised that part of the text shown in the message from Jord was missing: there was <stderr_txt> but no </stderr_txt>, and other similar discrepancies. Following long-standing advice, I copied the text from Jord's message to add it to the file rather than typing it, but I have just realised that in Jord's message the text was 'enclosed', so I probably copied extraneous characters as well. When I restarted Boinc it came up empty. Client_state was 730K (approx); it is now 59K ...
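The search problem (a task name too long for the editor's search box) can be sidestepped with a few lines of script that only read the file. A rough sketch in Python, assuming client_state.xml can be read as plain text; `find_lines` is a hypothetical helper, and the task name you pass in is whatever the Manager shows:

```python
def find_lines(path, needle):
    """Return (line_number, text) for every line of `path` containing `needle`."""
    hits = []
    # errors="replace" keeps the scan going even if some bytes are not valid UTF-8
    with open(path, encoding="utf-8", errors="replace") as fh:
        for number, line in enumerate(fh, start=1):
            if needle in line:
                hits.append((number, line.rstrip("\n")))
    return hits
```

Pasting the full task name as `needle` avoids retyping it, and the returned line numbers show where in the file each mention of the task sits.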

. . I repeat ... aaarrrgghhh!

. . I don't suppose Boinc keeps a security copy of client_state somewhere? Other than client_state_prev that is, because it is also 59K :(

. . I think I'm screwed on this one.

Stephen

:(
ID: 1981934
Speedy
Volunteer tester
Joined: 26 Jun 04
Posts: 1639
Credit: 12,921,799
RAC: 89
New Zealand
Message 1981935 - Posted: 23 Feb 2019, 23:46:13 UTC - in response to Message 1981923.  

As I write we are returning over 133,000 results an hour. I'm guessing there are some noise bombs out there. I returned a -9 result_overflow with a run time of 3 minutes 47 seconds.


. . There is a high rate of noise bombs in the current workload. But the return rate has been much higher than 133,000.

Stephen

:)

Thanks for your feedback, Stephen. Yes, I agree the return rate has been much higher, up into the 140,000s. The current return rate is 134,740.
ID: 1981935
Speedy
Volunteer tester
Joined: 26 Jun 04
Posts: 1639
Credit: 12,921,799
RAC: 89
New Zealand
Message 1981936 - Posted: 23 Feb 2019, 23:51:45 UTC - in response to Message 1981930.  

Thank you, Richard, for your explanation. I am sure you will get to the bottom of the looper issue.
ID: 1981936
Profile Brent Norman Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester

Joined: 1 Dec 99
Posts: 2786
Credit: 685,657,289
RAC: 835
Canada
Message 1981941 - Posted: 24 Feb 2019, 0:29:29 UTC - in response to Message 1981934.  

For one, copying Jord's message would have got you reporting the task was run as sse3, which you don't have.

It seems BOINC has now flagged it as a corrupt Client_state file and recreated it. If the _prev file is the same, the tasks are either aborted or ghosted.
It is possible you will have to reattach to the project again as well.
ID: 1981941
Stephen "Heretic" Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1981951 - Posted: 24 Feb 2019, 1:25:01 UTC - in response to Message 1981941.  
Last modified: 24 Feb 2019, 1:26:19 UTC

For one, copying Jord's message would have got you reporting the task was run as sse3, which you don't have.

It seems BOINC has now flagged it as a corrupt Client_state file and recreated it. If the _prev file is the same, the tasks are either aborted or ghosted.
It is possible you will have to reattach to the project again as well.


. . So there is no coming back from it ... :( oh well I guess I'll spend the next week doing ghost recoveries.

. . I guess I should just try to get some new work then ??

Stephen

:(
ID: 1981951
Stephen "Heretic" Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1981969 - Posted: 24 Feb 2019, 5:43:19 UTC - in response to Message 1981941.  

For one, copying Jord's message would have got you reporting the task was run as sse3, which you don't have.

It seems BOINC has now flagged it as a corrupt Client_state file and recreated it. If the _prev file is the same, the tasks are either aborted or ghosted.
It is possible you will have to reattach to the project again as well.


. . Being thankful for small mercies, I did not have to re-attach to the project, but I have almost 300 ghosts and I am about to begin the 1st ghost recovery as soon as there are 30 completed tasks to report. Being an optimist, at least that damned phantom task is gone .... :(

Stephen

<shrug>
ID: 1981969
Profile Brent Norman Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester

Joined: 1 Dec 99
Posts: 2786
Credit: 685,657,289
RAC: 835
Canada
Message 1981993 - Posted: 24 Feb 2019, 12:17:38 UTC - in response to Message 1981969.  

Haha Stephen, it may be gone for now but not forgotten.
It will be among the 300 ghosts and will be coming back for round 2 :D
ID: 1981993
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.