Astropulse 6.01 WU stuck at 66.172% for a day

Message boards : Number crunching : Astropulse 6.01 WU stuck at 66.172% for a day
David Honey

Joined: 13 Sep 11
Posts: 1
Credit: 3,157,402
RAC: 0
United Kingdom
Message 1210788 - Posted: 28 Mar 2012, 0:01:19 UTC

I noticed I have an Astropulse WU, ap_24my11ag_B1_PO_00033_20120319_05865.wu, that has been showing the same 66.172% complete for the last day. It's still running and has clocked up over 130 hours of CPU time. This is on a Core i7 machine. The estimated time remaining says around 64 hours and is increasing. The properties window says the estimated task size is 1821052 GFLOPS.

Is this WU stuck? Should I kill it?
Or is this just an unusually demanding WU?
Any ideas when it might finish?

Best regards,
David.
ID: 1210788
Cosmic_Ocean
Joined: 23 Dec 00
Posts: 3027
Credit: 13,516,867
RAC: 13
United States
Message 1210803 - Posted: 28 Mar 2012, 0:26:13 UTC

Exit BOINC completely (use task manager to make sure there are no more boinc-related processes running) and start it back up. If progress continues, it is fixed. If not, reboot the machine. If that doesn't fix it, then abort.
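
The "no more boinc-related processes" check can be sketched as a simple filter over process names. This is only an illustration, not part of BOINC: the helper name `leftover_boinc_processes`, the name fragments it matches, and the sample process list are all assumptions. A real list would come from Task Manager, `tasklist`, or `ps`.

```python
# Hypothetical helper for illustration; the fragments below are assumptions,
# not an official list of BOINC or science-application process names.
BOINC_NAME_FRAGMENTS = ("boinc", "astropulse", "setiathome")


def leftover_boinc_processes(process_names):
    """Return the names that look BOINC-related (case-insensitive match)."""
    return [
        name
        for name in process_names
        if any(fragment in name.lower() for fragment in BOINC_NAME_FRAGMENTS)
    ]


if __name__ == "__main__":
    # Sample data for illustration only.
    running = ["explorer.exe", "boinc.exe", "BOINCmgr.exe", "notepad.exe"]
    print(leftover_boinc_processes(running))
```

An empty result suggests the client and its science applications have really exited, so it should be safe to start BOINC again.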
Linux laptop:
record uptime: 1511d 20h 19m (ended due to the power brick giving-up)
ID: 1210803
arkayn
Volunteer tester
Joined: 14 May 99
Posts: 4438
Credit: 55,006,323
RAC: 0
United States
Message 1210805 - Posted: 28 Mar 2012, 0:27:20 UTC - in response to Message 1210788.  

Best bet is to pause computing, shut down the BOINC client and then restart the client. That will usually get a stuck WU running again.

ID: 1210805
tbret
Volunteer tester
Joined: 28 May 99
Posts: 3380
Credit: 296,162,071
RAC: 40
United States
Message 1210870 - Posted: 28 Mar 2012, 4:23:38 UTC - in response to Message 1210788.  

I'm not saying this is the only way it can happen, but the only time it has happened to me (with the time to completion rising like that) was when another user logged on to the computer and threw the whole thing off. Once that happened, I was never able to get a work unit to complete without errors, no matter what I did. I could get it to finish, but the result was always invalid.

I'm very interested in what happens in this case. Please let us know.
ID: 1210870
shizaru
Volunteer tester
Joined: 14 Jun 04
Posts: 1130
Credit: 1,967,904
RAC: 0
Greece
Message 1210934 - Posted: 28 Mar 2012, 8:15:38 UTC
Last modified: 28 Mar 2012, 8:18:00 UTC

Got one on my original account:

Edit: Wrong link

Here it is:

http://setiathome.berkeley.edu/workunit.php?wuid=953762652
ID: 1210934
TPCBF

Joined: 18 May 99
Posts: 54
Credit: 4,594,980
RAC: 0
United States
Message 1211423 - Posted: 29 Mar 2012, 5:06:40 UTC

I have the same problem with two AP 6.01 WUs on the same host. After a while, they got stuck at 72% and 48% respectively and stayed there for more than a day. As this is a working host, not just some "crunching farm" machine, I couldn't do much about it until a short while ago. It is now well past working hours, so I simply rebooted the machine remotely, and for now the jobs seem to be continuing.

Is there no safeguard against something like this? They might have been sitting idle for up to 48h...

Ralf
ID: 1211423
Josef W. Segur
Volunteer developer
Volunteer tester

Joined: 30 Oct 99
Posts: 4504
Credit: 1,414,761
RAC: 0
United States
Message 1211590 - Posted: 29 Mar 2012, 16:57:05 UTC - in response to Message 1211423.  

The only safeguard BOINC provides is to kill the task if it runs far too long. For AP the bound is set to 10 times the estimate. In other words, if the original raw estimated run time were 50 hours, BOINC would kill the task at 500 hours.
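
As a rough sketch, the arithmetic behind that bound looks like this. The function names are made up for illustration and are not BOINC API calls. The factor of 10 and the 50-hour figure come from the paragraph above. The 130 hours is the CPU time reported in the first post, used here as if it counted against the same limit, which is an assumption.

```python
def abort_threshold_hours(estimated_hours, bound_factor=10.0):
    """Run time at which the task would be killed under the 10x rule."""
    if estimated_hours <= 0:
        raise ValueError("estimated_hours must be positive")
    return estimated_hours * bound_factor


def hours_until_abort(estimated_hours, elapsed_hours, bound_factor=10.0):
    """Hours left before the limit is reached (never negative)."""
    limit = abort_threshold_hours(estimated_hours, bound_factor)
    return max(0.0, limit - elapsed_hours)


if __name__ == "__main__":
    print(abort_threshold_hours(50))    # 500.0
    print(hours_until_abort(50, 130))   # 370.0
```

With a hypothetical 50-hour estimate, a task that has already run 130 hours could sit stuck for another 370 hours before this safeguard ends it.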

In April 2011 I tried to get something akin to a watchdog function added to the BOINC API functions which are built into science applications, see http://lists.ssl.berkeley.edu/pipermail/boinc_dev/2011-April/017699.html and the subsequent posts in the thread. In brief, Dr. Anderson didn't see the need.
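
To give an idea of what "something akin to a watchdog" could mean, here is a minimal sketch that flags a task whose reported progress has not advanced for a set time. It is not the proposal from the mailing list thread and not BOINC code. The class name, the method name, and the six-hour limit are all assumptions chosen for illustration.

```python
class ProgressWatchdog:
    """Flags a task whose reported progress has not advanced for too long.

    Hypothetical sketch: not part of the BOINC API.
    """

    def __init__(self, stall_limit_seconds):
        self.stall_limit_seconds = stall_limit_seconds
        self.last_progress = None
        self.last_change_time = None

    def report(self, fraction_done, now):
        """Record a progress sample; return True if the task looks stalled."""
        if self.last_progress is None or fraction_done > self.last_progress:
            # Progress advanced (or this is the first sample): reset the clock.
            self.last_progress = fraction_done
            self.last_change_time = now
            return False
        return (now - self.last_change_time) >= self.stall_limit_seconds


if __name__ == "__main__":
    watchdog = ProgressWatchdog(stall_limit_seconds=6 * 3600)  # assumed limit
    print(watchdog.report(0.66172, now=0))          # False
    print(watchdog.report(0.66172, now=3600))       # False
    print(watchdog.report(0.66172, now=24 * 3600))  # True
```

Timestamps are passed in rather than read from the clock, so the logic can be checked without waiting. What to do once a stall is detected (restart the application, abort the task, or just log it) is left open here.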
Joe
ID: 1211590
