Lost/Missing Workunits

Message boards : Number crunching : Lost/Missing Workunits

Cruncher-American (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)

Joined: 25 Mar 02
Posts: 1513
Credit: 370,893,186
RAC: 340
United States
Message 904708 - Posted: 7 Jun 2009, 9:41:02 UTC
Last modified: 7 Jun 2009, 9:41:37 UTC

When I open SETI@home/Your Results from the Projects tab in BOINC Mgr (6.6.20), I see over 100 WUs listed as "In Progress". My machines are both 8-core machines, and have about 30 or so WUs in queue, and I process more than that every day. So there are about 30 or so "In Progress" WUs that I know nothing about (the ones more than a day old), and I don't understand why they are listed that way.
How can I find out why they are there, and what can I do about them? They are obviously holding up credit for the other users that I am paired with on them.
Some of them may be due to a power outage I had a few days ago...
What info do you need from me to track them down?
Thanks,
Jon Ravin
ID: 904708
[B^S] madmac
Volunteer tester

Joined: 9 Feb 04
Posts: 1175
Credit: 4,754,897
RAC: 0
United Kingdom
Message 904742 - Posted: 7 Jun 2009, 11:04:47 UTC - in response to Message 904708.  

These may be ghost WUs: tasks that were sent but never received by your machine. There is little you can do except wait until their deadlines pass, when they will be sent out again.
ID: 904742
Cruncher-American (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)

Joined: 25 Mar 02
Posts: 1513
Credit: 370,893,186
RAC: 340
United States
Message 904765 - Posted: 7 Jun 2009, 12:44:17 UTC - in response to Message 904742.  

Actually, it turned out to be my bad (with one exception) - or, as Roseanne Roseannadanna used to say, "Never mind".

On May 31 and June 1 I switched my machines to the SSE3 64-bit optimized MB app (which increased my throughput by about 75%, by the way). When I did, I lost all of the WUs I had in my queues. They just disappeared (I hadn't realized that would happen). The WUs from those two dates in my "In Progress" list are, I assume, the missing ones. There is also one dated May 21; maybe that's a "ghost".

Is there any way of returning the lost WUs to my queues? Or at least of marking them as available in the database? If left untouched, they are not going to get finished until they are distributed to other users late this month.
ID: 904765
Richard Haselgrove (Project Donor)
Volunteer tester

Joined: 4 Jul 99
Posts: 14653
Credit: 200,643,578
RAC: 874
United Kingdom
Message 904767 - Posted: 7 Jun 2009, 12:54:00 UTC

The only way that I know of is to detach the host from SETI, then re-attach. You'll probably want to run your current cache down to nothing first - no point in causing a resend for work which you do have available to crunch.

I'm planning to do that this afternoon to release the ~150 tasks which the faulty BOINC v6.6.34 gobbled up - I'll let you know how I get on.
ID: 904767
Cruncher-American (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)

Joined: 25 Mar 02
Posts: 1513
Credit: 370,893,186
RAC: 340
United States
Message 904802 - Posted: 7 Jun 2009, 14:40:45 UTC - in response to Message 904767.  

Thanks for the suggestion. Please let me know if it works!
ID: 904802
Ray
Volunteer tester

Joined: 16 Jun 99
Posts: 30
Credit: 1,323,477
RAC: 0
Canada
Message 904832 - Posted: 7 Jun 2009, 15:46:10 UTC - in response to Message 904708.  

I have just checked one of your PCs (4912909). As of 11:21 a.m. Eastern time you have a total of 531 results (Details): 37 (In progress), 313 (Pending), 153 (Valid) and 7 (Error).

The only work units you see in BOINC are the ones that are either waiting to be processed or have been processed but not yet reported.

The ones "In Progress" are the ones that your PC has yet to do; "Pending" are the ones you have completed but your wingman has yet to do. "Valid" are the ones that have been verified by the SETI server.

The "Your Results" button in the BOINC Mgr (6.6.20) shows the results from all 3 of your PCs that are attached to SETI.

Hope this clarifies things a bit.


ID: 904832
Richard Haselgrove (Project Donor)
Volunteer tester

Joined: 4 Jul 99
Posts: 14653
Credit: 200,643,578
RAC: 874
United Kingdom
Message 904874 - Posted: 7 Jun 2009, 17:25:28 UTC - in response to Message 904802.  

Thanks for the suggestion. Please let me know if it works!

Yes, it worked - you can see the results on host 3755243.

Notes:

1) If you use any optimised applications, BACK UP YOUR PROJECT FOLDER. It gets deleted when you detach. Best order is complete last task --> report completed tasks (using project update button) --> back up project directory --> detach --> restore project directory --> re-attach.

2) It's b****y difficult to get three CPU cores and a CUDA card to run out of work at the same time!

3) And it's best not to try to do it on a weekend when the servers are playing up particularly badly....

4) As you can see, I got my old Host ID back (no special action on my part). But you'll have to look for the 'client detached' results yourself (tasks issued 4 June and before), because none of the filters will find them.

5) Work estimates, cache size etc. will be screwy for a little while. The first CUDA task after the re-attach was a shorty: estimated over an hour, completed in 7m:30s as usual.
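
The back-up / detach / restore / re-attach part of note 1 could be scripted roughly as below. This is only a sketch under stated assumptions: `cycle_attachment`, `detach` and `attach` are hypothetical names, and the two callables stand in for whatever you actually use to detach and re-attach (the BOINC Manager buttons, for instance). Nothing here talks to a real BOINC client, and it assumes the last task has already been completed and reported.

```python
import shutil
from pathlib import Path
from typing import Callable


def cycle_attachment(project_dir: Path, backup_dir: Path,
                     detach: Callable[[], None],
                     attach: Callable[[], None]) -> None:
    """Back up project_dir, detach, restore the backup, then re-attach.

    detach and attach are caller-supplied placeholders (hypothetical);
    this function only handles the file copying around them.
    """
    if backup_dir.exists():
        raise FileExistsError(f"backup target already exists: {backup_dir}")
    # Back up the whole project folder, optimised apps included.
    shutil.copytree(project_dir, backup_dir)
    # Detach; the client is expected to delete project_dir at this point.
    detach()
    # Clear anything left behind, then put the backup back in place.
    if project_dir.exists():
        shutil.rmtree(project_dir)
    shutil.copytree(backup_dir, project_dir)
    # Re-attach with the restored files already present.
    attach()
```

Refusing to overwrite an existing backup is deliberate: a stale copy from an earlier attempt is easier to notice than to recover from.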
ID: 904874
Cruncher-American (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)

Joined: 25 Mar 02
Posts: 1513
Credit: 370,893,186
RAC: 340
United States
Message 904934 - Posted: 7 Jun 2009, 19:45:31 UTC - in response to Message 904832.  

I have just checked one of your PCs (4912909). As of 11:21 a.m. Eastern time you have a total of 531 results (Details): 37 (In progress), 313 (Pending), 153 (Valid) and 7 (Error).

The only work units you see in BOINC are the ones that are either waiting to be processed or have been processed but not yet reported.

The ones "In Progress" are the ones that your PC has yet to do; "Pending" are the ones you have completed but your wingman has yet to do. "Valid" are the ones that have been verified by the SETI server.

The "Your Results" button in the BOINC Mgr (6.6.20) shows the results from all 3 of your PCs that are attached to SETI.

Hope this clarifies things a bit.



Ray - thanks; I had already figured most of this out. The "In Progress" ones I am having a problem with are from 31 May and 1 June. They are the ones I originally asked about, and they were blown away when I changed to the optimized app: apparently BOINC or SETI discards the queued or uncompleted WUs when changing to the opt. apps.
Jon
ID: 904934
Josef W. Segur
Volunteer developer
Volunteer tester

Joined: 30 Oct 99
Posts: 4504
Credit: 1,414,761
RAC: 0
United States
Message 905031 - Posted: 7 Jun 2009, 23:27:00 UTC - in response to Message 904934.  

...
The "In Progress" ones I am having a problem with are from 31 May and 1 June. They are the ones I originally asked about, and they were blown away when I changed to the optimized app: apparently BOINC or SETI discards the queued or uncompleted WUs when changing to the opt. apps.
Jon

It's a matter of matching the app version in the upgrade to the version chosen to do the work when it was downloaded. And an "app version" for BOINC 6.6.x actually includes 4 different criteria: the name, number, platform, and plan_class. Your transition was from stock 6.03 (which is a 32-bit application) to the 64-bit optimized version, and there has been some uncertainty about how to handle the effective platform change.
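
To make the four-part match concrete, here is a minimal sketch. It is not BOINC's actual code: `AppVersion` and `can_run` are hypothetical names, and the app name and platform strings are assumed values chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Collection


@dataclass(frozen=True)
class AppVersion:
    name: str
    number: int       # e.g. 603 for version 6.03
    platform: str
    plan_class: str   # empty string when there is no plan class


def can_run(task_version: AppVersion,
            installed: Collection[AppVersion]) -> bool:
    """A queued task is runnable only if all four fields match an installed app."""
    return task_version in installed


# Illustrative values only (assumptions, not taken from any real host).
stock = AppVersion("setiathome_enhanced", 603, "windows_intelx86", "")
optimized = AppVersion("setiathome_enhanced", 603, "windows_x86_64", "")

print(can_run(stock, {optimized}))      # platform differs, so no match
print(can_run(optimized, {optimized}))  # all four fields agree
```

In this sketch, work downloaded for the 32-bit version finds no match once only the 64-bit version is installed, even though the name and number are the same.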

For most users upgrading from stock to an optimized app there shouldn't be a loss of work, but not all transitions have been exhaustively tested. Each test would involve transitioning back to stock first and that's at least a nuisance.
Joe
ID: 905031
Gads

Joined: 3 Dec 99
Posts: 11
Credit: 1,356,605
RAC: 0
United States
Message 909026 - Posted: 19 Jun 2009, 2:28:03 UTC

I was working through my first AstroPulse assignment and about 4 hours from completion when I upgraded my hard drives and was unable to restore the image I'd created to the new drive. As a result I lost the AstroPulse work and several Multi-Beam tasks waiting to run and will not be able to return them. Will a failure to return the AstroPulse work unit by the deadline disqualify me from running them in the future because of a perceived inability to process the data in a timely manner? It appeared that one core in my X4 processor could crunch a work unit in about 48 CPU hours so once the pipeline was filled I could probably return about one or so complete units each day.
ID: 909026
1mp0£173
Volunteer tester

Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 909038 - Posted: 19 Jun 2009, 3:23:53 UTC - in response to Message 909026.  

Will a failure to return the AstroPulse work unit by the deadline disqualify me from running them in the future because of a perceived inability to process the data in a timely manner?

No.

ID: 909038
skildude

Joined: 4 Oct 00
Posts: 9541
Credit: 50,759,529
RAC: 60
Yemen
Message 909041 - Posted: 19 Jun 2009, 3:31:10 UTC - in response to Message 909026.  

I was working through my first AstroPulse assignment and about 4 hours from completion when I upgraded my hard drives and was unable to restore the image I'd created to the new drive. As a result I lost the AstroPulse work and several Multi-Beam tasks waiting to run and will not be able to return them. Will a failure to return the AstroPulse work unit by the deadline disqualify me from running them in the future because of a perceived inability to process the data in a timely manner? It appeared that one core in my X4 processor could crunch a work unit in about 48 CPU hours so once the pipeline was filled I could probably return about one or so complete units each day.
You have the Phenom 9500. If you use the SSE3 optimized app for AstroPulse, you'll be finishing AP WUs in about 24-27 hours.
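
For a rough idea of what those times mean per day, here is a back-of-the-envelope sketch. It assumes all four cores run AstroPulse continuously and that CPU hours are close to wall-clock hours; `units_per_day` is a hypothetical helper, not anything from BOINC.

```python
def units_per_day(cores: int, hours_per_unit: float) -> float:
    """Steady-state work units finished per day with every core kept busy."""
    if cores <= 0 or hours_per_unit <= 0:
        raise ValueError("cores and hours_per_unit must be positive")
    return cores * 24.0 / hours_per_unit


# 48 h is the stock estimate quoted above; 27 h and 24 h bracket the optimized one.
for hours in (48, 27, 24):
    print(f"{hours} h per unit on 4 cores: {units_per_day(4, hours):.1f} per day")
```

Under those assumptions, 48 hours per unit works out to about 2 units a day on four cores, and the optimized times to somewhere between 3.5 and 4. If some cores are running Multi-Beam instead, the figure drops accordingly.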


In a rich man's house there is no place to spit but his face.
Diogenes Of Sinope
ID: 909041



 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.