Astropulse 6.01 WU stuck at 66.172% for a day
Author | Message |
---|---|
David Honey Joined: 13 Sep 11 Posts: 1 Credit: 3,157,402 RAC: 0 |
I noticed I have an Astropulse WU, ap_24my11ag_B1_PO_00033_20120319_05865.wu, that has been showing the same 66.172% complete for the last day. It's still running and has clocked up over 130 hours of CPU time. This is on a Core i7 machine. The estimated time remaining says around 64 hours and is increasing. The properties window says the estimated task size is 1821052 GFLOPS. Is this WU stuck? Should I kill it? Or is this just an unusually demanding WU? Any ideas when it might finish? Best regards, David. |
Cosmic_Ocean Joined: 23 Dec 00 Posts: 3027 Credit: 13,516,867 RAC: 13 |
Exit BOINC completely (use task manager to make sure there are no more boinc-related processes running) and start it back up. If progress continues, it is fixed. If not, reboot the machine. If that doesn't fix it, then abort. Linux laptop: record uptime: 1511d 20h 19m (ended due to the power brick giving-up) |
arkayn Joined: 14 May 99 Posts: 4438 Credit: 55,006,323 RAC: 0 |
David Honey wrote: "I noticed I have an Astropulse WU, ap_24my11ag_B1_PO_00033_20120319_05865.wu, that has been showing the same 66.172% complete for the last day. It's still running and has clocked up over 130 hours of CPU time. This is on a Core i7 machine. The estimated time remaining says around 64 hours and is increasing. The properties window says the estimated task size is 1821052 GFLOPS." Best bet is to pause computing, shut down the BOINC client, and then restart the client. That will usually get a stuck WU running again. |
tbret Joined: 28 May 99 Posts: 3380 Credit: 296,162,071 RAC: 40 |
David Honey wrote: "I noticed I have an Astropulse WU, ap_24my11ag_B1_PO_00033_20120319_05865.wu, that has been showing the same 66.172% complete for the last day. It's still running and has clocked up over 130 hours of CPU time. This is on a Core i7 machine. The estimated time remaining says around 64 hours and is increasing. The properties window says the estimated task size is 1821052 GFLOPS." I'm not saying that this is the only reason this can happen, but the only time I've had it happen to me (where the time to completion rises like that) was when another user logged onto the computer and threw the whole thing off. Once that happened, I was never able to get a valid result from the work unit no matter what I did. I could get it to complete, but the result was always invalid when it finished. I'm very interested in what happens in this case. Please let us know. |
shizaru Joined: 14 Jun 04 Posts: 1130 Credit: 1,967,904 RAC: 0 |
Got one on my original account. (Edit: wrong link.) Here it is: http://setiathome.berkeley.edu/workunit.php?wuid=953762652 |
TPCBF Joined: 18 May 99 Posts: 54 Credit: 4,594,980 RAC: 0 |
I got the same problem on two AP 6.01 WUs on the same host. After a while, they got stuck at 72% and 48% respectively for more than a day. As this is a working host, not just some "crunching farm" machine, I wasn't able to do much until a short while ago, when I simply rebooted the machine remotely, as it is now well past working hours. For now, it seems the jobs are actually continuing. Is there no safeguard against something like this? They might have been sitting idle for up to 48h... Ralf |
Josef W. Segur Joined: 30 Oct 99 Posts: 4504 Credit: 1,414,761 RAC: 0 |
TPCBF wrote: "I got the same problem on two AP 6.01 WUs on the same host. After a while, they both got stuck at 72% and 48% respectively for more than a day. As this is a working host, not just some "crunching farm" machine, I wasn't able to do much until a short while ago, when I simply rebooted that machine remotely as it is now well past working hours and for now, it seems the jobs actually continue." The only safeguard BOINC provides is to kill the task if it runs far too long. For AP the bound is set to 10 times the estimate; in other words, if the original raw estimated run time were 50 hours, BOINC would kill the task at 500 hours. In April 2011 I tried to get something akin to a watchdog function added to the BOINC API functions that are built into science applications; see http://lists.ssl.berkeley.edu/pipermail/boinc_dev/2011-April/017699.html and the subsequent posts in the thread. In brief, Dr. Anderson didn't see the need. Joe |
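
Editor's sketch: the arithmetic in Joe's post, plus a rough idea of what a "no progress" check could look like, can be written out in a few lines of Python. Everything here is hypothetical and for illustration only. The function names, the sample format, and the 24-hour window are assumptions and are not part of BOINC or its API. The factor of 10 is the AP bound as described in the post above.

```python
from typing import Sequence, Tuple

# Taken from the post above: for AP the bound is 10 times the estimate.
AP_BOUND_FACTOR = 10.0


def kill_threshold_hours(estimated_hours: float,
                         factor: float = AP_BOUND_FACTOR) -> float:
    """Run time (hours) at which the task would be killed, given the raw estimate."""
    if estimated_hours <= 0:
        raise ValueError("estimated_hours must be positive")
    return estimated_hours * factor


def is_stalled(samples: Sequence[Tuple[float, float]],
               window_hours: float = 24.0) -> bool:
    """Hypothetical stall check (not a BOINC feature).

    samples: (elapsed_hours, percent_done) pairs in chronological order.
    Returns True if percent_done has not changed for at least window_hours.
    """
    if not samples:
        return False
    last_time, last_pct = samples[-1]
    stall_start = last_time
    # Walk backwards through the trailing run of samples with identical progress.
    for elapsed, pct in reversed(samples):
        if pct != last_pct:
            break
        stall_start = elapsed
    return last_time - stall_start >= window_hours


if __name__ == "__main__":
    # Joe's example: a 50-hour raw estimate gives a 500-hour limit.
    print(kill_threshold_hours(50))
    # Made-up samples resembling the first post: stuck at 66.172% for 30 hours.
    print(is_stalled([(0.0, 60.0), (100.0, 66.172), (130.0, 66.172)]))
```

With a 50-hour estimate, a task that stalls well before 500 hours would sit idle until someone notices, which is the gap the posts above describe. A check along the lines of `is_stalled` would only flag the task; what to do next (restart the client, reboot, or abort) is still the manual procedure suggested earlier in the thread.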
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.