Panic Mode On (115) Server Problems?
Author | Message |
---|---|
Keith Myers Send message Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
Somewhere along the way, someone posted a solution that involved editing client_state and adding something or other to the end of the file. I would have to search posts to figure out when and where the post is. For some reason I think it might have been from TBar or Jord? Or brute force and just edit out the recalcitrant task entry in client_state? Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
Brent Norman Send message Joined: 1 Dec 99 Posts: 2786 Credit: 685,657,289 RAC: 835 |
. . So does anyone have any ideas on how to get Boinc Manager to 'realise' that this task is no longer here? I am sick of seeing it sitting there taunting me ... :( Give this a read http://setiathome.berkeley.edu/forum_thread.php?id=82054&postid=1895513#1895513 Jord's answer has worked twice for me. EDIT: It is simply adding <ready_to_report/> to the client_state which has the report info in it. |
Stephen "Heretic" Send message Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
Somewhere along the way, someone posted a solution that involved editing client_state and adding something or other to the end of the file. I would have to search posts to figure out when and where the post is. For some reason I think it might have been from TBar or Jord? . . Hi Keith, . . I considered that but not knowing if there is some kind of checksum on the file I was afraid of corrupting it and trashing the whole cache, or worse. Stephen :( |
Stephen "Heretic" Send message Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
. . So does anyone have any ideas on how to get Boinc Manager to 'realise' that this task is no longer here? I am sick of seeing it sitting there taunting me ... :( Give this a read http://setiathome.berkeley.edu/forum_thread.php?id=82054&postid=1895513#1895513 Jord's answer has worked twice for me. . . Ruh roh! Oops! . . I followed the instructions, found the stderr section for the stuck task, added the <ready_to_report/> in the appropriate place and restarted Boinc. But the listing did not change, and when the client reported the other completed tasks this one remained as before. So I stopped Boinc again and re-opened client_state to check my typing, and this is where the oops comes into it. There is now no entry for this task in client_state at all, yet there it is, still taunting me in the Manager listing :( aarrrggghhh! :( . . How is that ??? Stephen :( |
Brent Norman Send message Joined: 1 Dec 99 Posts: 2786 Credit: 685,657,289 RAC: 835 |
It has to be in the client_state file, since that is where the Manager gets its file list from. |
Bill Send message Joined: 30 Nov 05 Posts: 282 Credit: 6,916,194 RAC: 60 |
Yup, still downloading one GPU task at a time. 2/23/2019 2:22:01 PM | SETI@home | [sched_op] Starting scheduler request 2/23/2019 2:22:01 PM | SETI@home | Sending scheduler request: To fetch work. 2/23/2019 2:22:01 PM | SETI@home | Requesting new tasks for CPU 2/23/2019 2:22:01 PM | SETI@home | [sched_op] CPU work request: 1612515.13 seconds; 0.00 devices 2/23/2019 2:22:01 PM | SETI@home | [sched_op] AMD/ATI GPU work request: 0.00 seconds; 0.00 devices 2/23/2019 2:22:02 PM | SETI@home | Scheduler request completed: got 0 new tasks 2/23/2019 2:22:02 PM | SETI@home | [sched_op] Server version 709 2/23/2019 2:22:02 PM | SETI@home | Not sending work - last request too recent: 211 sec 2/23/2019 2:22:02 PM | SETI@home | Project requested delay of 303 seconds 2/23/2019 2:22:02 PM | SETI@home | [sched_op] Deferring communication for 00:05:03 2/23/2019 2:22:02 PM | SETI@home | [sched_op] Reason: requested by project 2/23/2019 2:28:13 PM | SETI@home | Computation for task 21fe19aa.3967.370403.8.35.191_0 finished 2/23/2019 2:28:13 PM | SETI@home | Starting task blc31_2bit_guppi_58406_14875_HIP2978_0070.7976.818.23.46.149.vlar_0 2/23/2019 2:28:13 PM | SETI@home | [sched_op] Starting scheduler request 2/23/2019 2:28:13 PM | SETI@home | Sending scheduler request: To fetch work. 
2/23/2019 2:28:13 PM | SETI@home | Requesting new tasks for CPU and AMD/ATI GPU 2/23/2019 2:28:13 PM | SETI@home | [sched_op] CPU work request: 1613575.28 seconds; 0.00 devices 2/23/2019 2:28:13 PM | SETI@home | [sched_op] AMD/ATI GPU work request: 1.00 seconds; 1.00 devices 2/23/2019 2:28:15 PM | SETI@home | Started upload of 21fe19aa.3967.370403.8.35.191_0_r354457254_0 2/23/2019 2:28:15 PM | SETI@home | Scheduler request completed: got 1 new tasks 2/23/2019 2:28:15 PM | SETI@home | [sched_op] Server version 709 2/23/2019 2:28:15 PM | SETI@home | Project requested delay of 303 seconds 2/23/2019 2:28:15 PM | SETI@home | [sched_op] estimated total CPU task duration: 0 seconds 2/23/2019 2:28:15 PM | SETI@home | [sched_op] estimated total AMD/ATI GPU task duration: 3054 seconds 2/23/2019 2:28:15 PM | SETI@home | [sched_op] Deferring communication for 00:05:03 2/23/2019 2:28:15 PM | SETI@home | [sched_op] Reason: requested by project My understanding is that my computer is not requesting any AMD/ATI work, and therefore no tasks are being downloaded. I assume that once the GPU task is completed then it requests one task. Why it does this, I am not sure, unless it is somehow tied to the peak_gflops task. If this is being a bigger problem, I can move this conversation to another thread. Seti@home classic: 1,456 results, 1.613 years CPU time |
rob smith Send message Joined: 7 Mar 03 Posts: 22391 Credit: 416,307,556 RAC: 380 |
A question (Richard?) Now you've got a "sensible" peak_flops on that computer would it work to flip back to the current release version of BOINC, as it is believed that the version Bill is running has a fetch bug in it? Just checked - peak_flops is still crazy high, so cancel that thought.... Bob Smith Member of Seti PIPPS (Pluto is a Planet Protest Society) Somewhere in the (un)known Universe? |
Speedy Send message Joined: 26 Jun 04 Posts: 1643 Credit: 12,921,799 RAC: 89 |
As I write we are returning over 133,000 results an hour. I'm guessing there are some noise bombs out there. I returned a -9 result_overflow with a run time of 3 minutes 47 seconds. |
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14666 Credit: 200,643,578 RAC: 874 |
A question (Richard?) The idea of the first test BOINC client was that the crazy high peak flops would be caught and brought back down to earth. That worked, but one of David's anti-cheating measures from nine years ago ignored the value calculated by the client, and worked peak flops out again on the server - and got the same crazy high value. So the next step is to patch the server. That's been done and tested at Beta (which is what Beta is for, right?), but I haven't yet heard if the server patch has been transferred to Main. And then there's a third (by my count) client patch, which has again been tested already, which means that even if the client uses its crazy peak flops value, or the server uses its crazy peak flops value, the client still doesn't throw away the data with error 197. But it also means that if you get caught by a real looper, such as happened to me recently at Einstein, it will loop for a minimum of 12 hours before being killed. Next question? |
Speedy Send message Joined: 26 Jun 04 Posts: 1643 Credit: 12,921,799 RAC: 89 |
Richard, are you able to show us/me an example of a looper and explain what you mean? I have never heard of a task being called a looper before. Interested to learn. |
Stephen "Heretic" Send message Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
It has to be in the client_state file, since that is where the Manager gets its file list from. . . That is what I thought. I will check it again but I have searched forwards and backwards and nothing. Still, it is a long filename, maybe I made a typo in the search string ... I will double check ... Stephen ? |
Stephen "Heretic" Send message Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
As I write we are returning over 133,000 results an hour. I'm guessing there are some noise bombs out there. I returned a -9 result_overflow with a run time of 3 minutes 47 seconds. . . There is a high rate of noise bombs in the current workload. But the return rate has been much higher than 133,000. Stephen :) |
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14666 Credit: 200,643,578 RAC: 874 |
The particular case is described in Another Intel GPU OpenCL thread. But it also means that if you get caught by a real looper, such as happened to me recently at Einstein, it will loop for a minimum of 12 hours before being killed. Richard, are you able to show us/me an example of a looper and explain what you mean? I have never heard of a task being called a looper before. Interested to learn. I was testing the existing einsteinbinary_BRP4 version 134 (opencl-intel_gpu-Beta) application on new hardware, specifically an ultra-portable Intel(R) Core(TM) i5-8250U CPU with UHD Graphics 620. It worked through the preliminary setup procedure for the task (in about 3 seconds), but then failed to make any real impact on the GPU part: no progress was recorded, and no checkpoints were written. But it occupied a whole CPU core (25% of the 4-core CPU) - that's unusual for this particular application, which uses about 2% - 3% of a core when running on older hardware. My interpretation is that the code is following an endless loop - when it should check its internal status and conclude that it's ready to move into a different part of the program, instead it goes back to an earlier point and tries again. In programming, that's called an endless loop (not desirable): I decided to summarise it as a 'looper'. My coinage. |
Stephen "Heretic" Send message Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
. . So does anyone have any ideas on how to get Boinc Manager to 'realise' that this task is no longer here? I am sick of seeing it sitting there taunting me ... :( Give this a read http://setiathome.berkeley.edu/forum_thread.php?id=82054&postid=1895513#1895513 Jord's answer has worked twice for me. . . OK, it happened. I have now trashed client_state.xml and have killed off boinc on this machine. . . The extremely long file names do not fit in Notepad's search window so I must have missed a typo in the early part of the filename. On rechecking I found the listing was indeed still there. I also realised that part of the text shown in the message from Jord was missing: there was <stderr_txt> but no </stderr_txt>, and other similar discrepancies. Following long standing advice I copied the text from Jord's message to add them to the file rather than type, but I have just realised that in Jord's message the text was 'enclosed' so I probably copied extraneous characters as well. When I restarted Boinc it came up empty. Client_state was 730K (approx), it is now 59K ... . . I repeat ... aaarrrgghhh! . . I don't suppose Boinc keeps a security copy of client_state somewhere? Other than client_state_prev that is, because it is also 59K :( . . I think I'm screwed on this one. Stephen :( |
Speedy Send message Joined: 26 Jun 04 Posts: 1643 Credit: 12,921,799 RAC: 89 |
As I write we are returning over 133,000 results an hour. I'm guessing there are some noise bombs out there. I returned a -9 result_overflow with a run time of 3 minutes 47 seconds. Thanks for your feedback Stephen. Yes I agree the return rate has been much higher, up into the 140's. The current return rate is 134,740 |
Speedy Send message Joined: 26 Jun 04 Posts: 1643 Credit: 12,921,799 RAC: 89 |
Thank you Richard for your explanation. I am sure you will get to the bottom of the looper issue |
Brent Norman Send message Joined: 1 Dec 99 Posts: 2786 Credit: 685,657,289 RAC: 835 |
For one, copying Jord's message would have got you reporting the task was run as sse3, which you don't have. It seems BOINC has now flagged it as a corrupt client_state file and recreated it. If the _prev file is the same, the tasks are either aborted or ghosted. It is possible you have to reattach to the project again as well. |
Stephen "Heretic" Send message Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
For one, copying Jord's message would have got you reporting the task was run as sse3, which you don't have. . . So there is no coming back from it ... :( oh well I guess I'll spend the next week doing ghost recoveries. . . I guess I should just try to get some new work then ?? Stephen :( |
Stephen "Heretic" Send message Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
For one, copying Jord's message would have got you reporting the task was run as sse3, which you don't have. . . Being thankful for small mercies, I did not have to re-attach to the project, but I have almost 300 ghosts and I am about to begin the 1st ghost recovery as soon as there are 30 completed tasks to report. Being an optimist, at least that damned phantom task is gone .... :( Stephen <shrug> |
Brent Norman Send message Joined: 1 Dec 99 Posts: 2786 Credit: 685,657,289 RAC: 835 |
Haha Stephen, it may be gone for now but not forgotten. It will be among the 300 ghosts and will be coming back for round 2 :D |
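
For anyone who wants to check their own event log the way Bill did above, here is a rough Python sketch that pulls the `[sched_op] ... work request` lines out of a pasted log and lists how many seconds of work were requested per resource. The line format is taken from Bill's post; the names `WORK_REQUEST`, `work_requests` and `SAMPLE` are made up for this example and are not part of BOINC.

```python
import re

# Matches event-log entries such as:
#   "... | [sched_op] AMD/ATI GPU work request: 1.00 seconds; 1.00 devices"
# The resource name may not contain "|" or brackets, so the pattern also
# works when the log has been pasted as one long line.
WORK_REQUEST = re.compile(
    r"\[sched_op\] (?P<resource>[^|\[\]]+?) work request: "
    r"(?P<seconds>[\d.]+) seconds; (?P<devices>[\d.]+) devices"
)


def work_requests(log_text):
    """Return a list of (resource, seconds, devices) tuples, in log order."""
    found = []
    for match in WORK_REQUEST.finditer(log_text):
        found.append(
            (match["resource"], float(match["seconds"]), float(match["devices"]))
        )
    return found


# Four lines copied from the log quoted earlier in this thread.
SAMPLE = """\
2/23/2019 2:22:01 PM | SETI@home | [sched_op] CPU work request: 1612515.13 seconds; 0.00 devices
2/23/2019 2:22:01 PM | SETI@home | [sched_op] AMD/ATI GPU work request: 0.00 seconds; 0.00 devices
2/23/2019 2:28:13 PM | SETI@home | [sched_op] CPU work request: 1613575.28 seconds; 0.00 devices
2/23/2019 2:28:13 PM | SETI@home | [sched_op] AMD/ATI GPU work request: 1.00 seconds; 1.00 devices
"""

if __name__ == "__main__":
    for resource, seconds, devices in work_requests(SAMPLE):
        print(f"{resource:12} {seconds:>12.2f} s  {devices:.2f} devices")
```

Run against the sample, the GPU request is 0.00 seconds in the first scheduler contact and 1.00 seconds in the second, which matches Bill's reading that GPU work is only requested once the running GPU task has finished.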
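
Given how Stephen's edit of client_state.xml ended, a cautious routine before hand-editing might look like the Python sketch below: make a copy first, find the task by name without relying on Notepad's search box, and count opening and closing tags before restarting BOINC. This is only a text-level sanity check under the assumption that BOINC is stopped while you work; it says nothing about where `<ready_to_report/>` belongs. The helper names (`backup`, `lines_mentioning`, `unbalanced_tags`), the `.bak` suffix and the file fragment in the demo are all hypothetical.

```python
import shutil
import tempfile
from pathlib import Path


def backup(path):
    """Copy the file next to itself with '.bak' appended; return the copy's path."""
    source = Path(path)
    target = source.with_name(source.name + ".bak")
    shutil.copy2(source, target)
    return target


def lines_mentioning(text, needle):
    """Return the 1-based numbers of all lines that contain needle."""
    return [
        number
        for number, line in enumerate(text.splitlines(), start=1)
        if needle in line
    ]


def unbalanced_tags(text, tags=("stderr_txt",)):
    """Return {tag: (opens, closes)} for every tag whose counts differ."""
    problems = {}
    for tag in tags:
        opens = text.count(f"<{tag}>")
        closes = text.count(f"</{tag}>")
        if opens != closes:
            problems[tag] = (opens, closes)
    return problems


if __name__ == "__main__":
    # Hypothetical fragment, NOT the real client_state.xml layout.
    broken = "<stderr_txt>\nsome task output for example_task_0\n"
    with tempfile.TemporaryDirectory() as folder:
        state = Path(folder) / "client_state.xml"
        state.write_text(broken)
        print("backup written to", backup(state).name)
        print("task found on lines", lines_mentioning(broken, "example_task_0"))
        print("unbalanced tags:", unbalanced_tags(broken))
```

If `unbalanced_tags` reports anything after an edit, restoring the `.bak` copy before starting BOINC again is the safer move, since the client may overwrite client_state_prev as well.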
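
Richard's 'looper' can be illustrated with a toy Python example: a program keeps checking whether it is ready to move on, and if that check never succeeds it goes round forever unless something outside puts a limit on it. This is not the Einstein application's code or the BOINC client's logic; `run_stages`, `ready_after` and the check limit are invented for the illustration, with the limit standing in for the time after which the client kills a stuck task.

```python
def run_stages(is_ready, max_checks):
    """Poll is_ready() until it returns True or max_checks is reached.

    Returns the number of checks made on success. Raises RuntimeError when
    the limit is hit, standing in for a stuck task being killed.
    """
    checks = 0
    while True:
        checks += 1
        if is_ready():
            return checks
        if checks >= max_checks:
            raise RuntimeError(f"no progress after {checks} checks")


def ready_after(n):
    """Build a status check that starts returning True on the n-th call."""
    calls = {"count": 0}

    def is_ready():
        calls["count"] += 1
        return calls["count"] >= n

    return is_ready


if __name__ == "__main__":
    # A healthy task: the status check succeeds on the third attempt.
    print("healthy task finished after", run_stages(ready_after(3), 10), "checks")

    # A looper: the status check never succeeds, so only the limit stops it.
    try:
        run_stages(lambda: False, 10)
    except RuntimeError as error:
        print("looper stopped:", error)
```

The higher the limit, the longer a genuine looper ties up a core before it is stopped, which is the trade-off Richard describes with the 12-hour minimum.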