Work Unit problem

Message boards : Number crunching : Work Unit problem

Claggy
Volunteer tester

Joined: 5 Jul 99
Posts: 4654
Credit: 47,537,079
RAC: 4
United Kingdom
Message 621908 - Posted: 18 Aug 2007, 13:05:54 UTC - in response to Message 621880.  

Well, you could edit it in the project section of the "client_state.xml" file instead. You'll get better results.


Did that: set it to 0.9 and it showed 9½ hours, then tried a few others
until settling on 0.3, which shows 3 hours. MB WUs are taking less than that,
but it'll do for now, and I might get some work now. Thanks.

Claggy.
kittyman · Crowdfunding Project Donor · Special Project $75 donor · Special Project $250 donor
Volunteer tester
Joined: 9 Jul 00
Posts: 51478
Credit: 1,018,363,574
RAC: 1,004
United States
Message 621930 - Posted: 18 Aug 2007, 14:04:38 UTC - in response to Message 621908.  

Well, you could edit it in the project section of the "client_state.xml" file instead. You'll get better results.


Did that: set it to 0.9 and it showed 9½ hours, then tried a few others
until settling on 0.3, which shows 3 hours. MB WUs are taking less than that,
but it'll do for now, and I might get some work now. Thanks.

Claggy.


I have reset the DCF in client state a few times. Whenever I process one of these bum WUs, it kicks the DCF way up and BOINC goes into EDF panic mode.

"Time is simply the mechanism that keeps everything from happening all at once."

jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 621933 - Posted: 18 Aug 2007, 14:15:20 UTC - in response to Message 621930.  
Last modified: 18 Aug 2007, 14:16:05 UTC

Just be aware there are some truly meaty workunits amongst the noisy ones.
This one I got: http://setiathome.berkeley.edu/workunit.php?wuid=148360386 took quite a while, but my claim is ~112 credits. I checked, and thankfully the partner I am with is running 5.27, so I'll probably get them. Not a big return for 28000 secs [old machine, used to get 62.4 cr / 14000 secs], but far better than I've been getting for the noisy ones.
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14679
Credit: 200,643,578
RAC: 874
United Kingdom
Message 621940 - Posted: 18 Aug 2007, 14:27:09 UTC - in response to Message 621933.  

...thankfully the partner I am with is running 5.27 ...

I think you'll be all right with Byron - and more to the point, he's running BOINC v5.10.13.

December deadline, eh? That's meaty all right.
W-K 666 Project Donor
Volunteer tester

Joined: 18 May 99
Posts: 19407
Credit: 40,757,560
RAC: 67
United Kingdom
Message 622002 - Posted: 18 Aug 2007, 15:23:51 UTC - in response to Message 621933.  

Just be aware there are some truly meaty workunits amongst the noisy ones.
This one I got: http://setiathome.berkeley.edu/workunit.php?wuid=148360386 took quite a while, but my claim is ~112 credits. I checked, and thankfully the partner I am with is running 5.27, so I'll probably get them. Not a big return for 28000 secs [old machine, used to get 62.4 cr / 14000 secs], but far better than I've been getting for the noisy ones.

I've had a similarly high claim: resultid=592412486, 106 cr in 4 hrs. Most units on this computer are about 22 cr/hr, so this is a slight gain.

Andy
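For comparison, the credit rates in Jason's and Andy's posts can be put on a per-CPU-hour basis. This is a small Python sketch using only the figures quoted (reading Andy's "22/hr" as credits per hour is an assumption); `credits_per_hour` is a hypothetical helper name.

```python
def credits_per_hour(credits: float, cpu_seconds: float) -> float:
    """Credit earned per hour of CPU time (hypothetical helper)."""
    return credits * 3600.0 / cpu_seconds


# Figures quoted in the posts above.
long_wu = credits_per_hour(112, 28_000)       # Jason: meaty WU, ~112 cr claim
old_typical = credits_per_hour(62.4, 14_000)  # Jason: old machine's usual WU
andy_high = credits_per_hour(106, 4 * 3600)   # Andy: 106 cr in 4 hrs

print(f"long WU {long_wu:.1f} cr/hr vs old typical {old_typical:.1f} cr/hr")
print(f"Andy {andy_high:.1f} cr/hr vs usual ~22 cr/hr")
```

On these numbers Jason's long WU pays slightly less per hour than his old 62.4 cr / 14000 s units (about 14.4 vs 16.0 cr/hr), which fits "not a big return", while Andy's works out to 26.5 cr/hr against his usual ~22.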
Tklop
Joined: 11 May 03
Posts: 175
Credit: 613,952
RAC: 0
United States
Message 622285 - Posted: 18 Aug 2007, 23:26:07 UTC
Last modified: 18 Aug 2007, 23:28:03 UTC

Hello again, crunchers:

msattler: Here's the final outcome of that WU you and I discussed further down this thread...

Work Unit ID: 147604236 (09mr07ae.8908.15614.7.4.87)
Server State: Over
Outcome: Client Error
Client State: Compute Error
CPU Time in seconds: 54,599.63
Claimed credit: .06
Granted credit: -- (pending)

I let it run--but as you can see, it took a while to finally error out... Sadly, it looks like it sent it out again anyway though--see here:

http://setiathome.berkeley.edu/workunit.php?wuid=147604236

There used to be only three result IDs (including mine) and now there are four...

So, whoever the poor sod is who gets lumped with this nasty bugger, please forgive me... Both I and my machine worked long and hard to remove it from the queue, but to no avail!

Anyway, thought a few of you might be interested in the length of processing time involved to finally get this one to error out... (For those of you who don't feel like browsing this thread for my specs, the computer crunching that one was a Pentium M 1.86GHz, with 1GB RAM, running Windoze XP Pro).

P.S. I liked the suggested fix, but I didn't see it here, until after the beastly WU had errored out...

Even so, I offer up a big Thank You to Joe for figuring that out!

[edit]

Also, yes... I too have run through some pretty monstrous but completable work units from that bunch--and the credit payoff looks promising (once someone else actually manages to finish one! LOL)
Keep on crunching, all...
SETI@Home Forever!


___Tklop (Step-Founder, U.S. Air Force team)
bounty.hunter
Volunteer tester
Joined: 22 Mar 04
Posts: 442
Credit: 459,063
RAC: 0
India
Message 622581 - Posted: 19 Aug 2007, 9:47:32 UTC

I've just had one of the problem WUs complete with a -9...

593535643
cRunchy
Volunteer moderator
Joined: 3 Apr 99
Posts: 3555
Credit: 1,920,030
RAC: 3
United Kingdom
Message 622712 - Posted: 19 Aug 2007, 17:23:39 UTC


I just managed to get rid of a never ending WU (with the minus <triplet_thresh> issue) but now I'm getting units that state they will take 180+ hours yet finish in 8 hours.

Is this a result of this same range of problems we have been experiencing or is it a problem somewhere else?


jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 622717 - Posted: 19 Aug 2007, 17:28:09 UTC - in response to Message 622712.  
Last modified: 19 Aug 2007, 17:50:31 UTC

Not necessarily faulty workunits in my opinion. I have had some of those long ones finish in 5 to 6 hours on my old machines [with chicken optimised app] for about 112 credits. The prediction mechanism is up to kaka. I think it may be safe to ignore those predictions. [This isn't to say they couldn't also be noisy or faulty in some way too]

[More: My understanding, could be wrong, is that some better prediction formulae were made by Joe Segur, but somehow got left out of the current splitter code (not the app's problem, as I understand it). I also understand this may be examined when Eric gets back from holiday]

[cRunchy: I just looked at your results and I see that long one you let go. My, you have much more patience than I do! :D If you want to crunch those results a bit faster, you might consider going for the chicken/lunatics app; as you are on XP, it should be okay if you are game.]

"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14679
Credit: 200,643,578
RAC: 874
United Kingdom
Message 622724 - Posted: 19 Aug 2007, 17:51:50 UTC - in response to Message 622712.  
Last modified: 19 Aug 2007, 17:52:26 UTC

I just managed to get rid of a never ending WU (with the minus <triplet_thresh> issue) but now I'm getting units that state they will take 180+ hours yet finish in 8 hours.

Is this a result of this same range of problems we have been experiencing or is it a problem somewhere else?

To add to Jason's comment:

It isn't just that the estimates are long, but also because your computer thinks - for the time being - that all WUs are going to be as slow as the one you've just got rid of.

It's a safety mechanism called the 'Result Duration Correction Factor' (RDCF), and it's self-correcting: you don't have to do anything about it. As more 'normal' WUs pass through the system, your machine will gradually relax after its panic, and the estimates will get closer and closer to reality. Expect it to take about 20 or 30 WUs, though.
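A toy model may help show why the estimates shoot up after one bad WU but take many WUs to come back down. This Python sketch assumes a simple asymmetric rule: if a finished WU ran longer than the current factor predicts, the factor jumps straight to the observed ratio; otherwise it moves 10% of the way toward it. Those weights are assumptions for illustration, not necessarily the BOINC client's actual code, and `update_dcf` / `wus_to_recover` are hypothetical names.

```python
def update_dcf(dcf: float, observed_ratio: float, weight: float = 0.1) -> float:
    """Toy duration-correction update after one finished WU.

    observed_ratio: actual run time divided by the uncorrected estimate.
    ASSUMED rule (illustrative only): jump straight up when a WU overran,
    otherwise move `weight` of the way toward the observed ratio.
    """
    if observed_ratio > dcf:
        return observed_ratio
    return (1.0 - weight) * dcf + weight * observed_ratio


def wus_to_recover(start_dcf: float, normal_ratio: float, within: float = 2.0) -> int:
    """Count normal WUs until the factor is within `within` (> 1) times normal."""
    dcf, count = start_dcf, 0
    while dcf > normal_ratio * within:
        dcf = update_dcf(dcf, normal_ratio)
        count += 1
    return count


# A bad WU that ran 22.5x its estimate, enough to make an 8 h WU show 180 h.
panicked = update_dcf(1.0, 180 / 8)
print(panicked)                            # 22.5
print(wus_to_recover(panicked, 1.0))       # 30
print(wus_to_recover(panicked, 1.0, 3.0))  # 23
```

Under these assumed weights, getting back within 3x of normal takes 23 normal WUs and within 2x takes 30, roughly in line with the "20 or 30 WUs" estimate; the real client's weights may differ.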
makosky
Volunteer tester

Joined: 7 Jul 00
Posts: 56
Credit: 3,908,782
RAC: 0
United States
Message 622748 - Posted: 19 Aug 2007, 18:54:08 UTC

I just got a new PC, and my results from the old PC are still present on the status page; it's not showing the new PC's work. Both PCs' IDs are on. It tells me I have to communicate with the host, and I have no idea what that means. Both results are being sent home. Can someone help?
Bob Foertsch

Joined: 19 Jun 99
Posts: 25
Credit: 11,969,667
RAC: 25
United States
Message 622760 - Posted: 19 Aug 2007, 19:26:59 UTC

I've had this work unit grinding away for nearly 4 days. Should I abort it? What can I do to prevent someone else from having the same problem?

WU ID is 147512879

Wed Aug 15 20:42:12 2007|SETI@home|Starting 04mr07aa.8827.23385.12.4.227_4
Wed Aug 15 20:42:13 2007|SETI@home|Starting task 04mr07aa.8827.23385.12.4.227_4 using setiathome_enhanced version 523
Thu Aug 16 12:18:05 2007|SETI@home|Task 04mr07aa.8827.23385.12.4.227_4 exited with zero status but no 'finished' file
Thu Aug 16 12:18:05 2007|SETI@home|If this happens repeatedly you may need to reset the project.
Thu Aug 16 12:18:05 2007|SETI@home|Restarting task 04mr07aa.8827.23385.12.4.227_4 using setiathome_enhanced version 523
Thu Aug 16 12:21:11 2007|SETI@home|Task 04mr07aa.8827.23385.12.4.227_4 exited with zero status but no 'finished' file
Thu Aug 16 12:21:11 2007|SETI@home|If this happens repeatedly you may need to reset the project.
Thu Aug 16 12:21:11 2007|SETI@home|Restarting task 04mr07aa.8827.23385.12.4.227_4 using setiathome_enhanced version 523
Thu Aug 16 22:57:22 2007|SETI@home|Restarting task 04mr07aa.8827.23385.12.4.227_4 using setiathome_enhanced version 523
Sun Aug 19 09:56:27 2007|SETI@home|Sending scheduler request: To fetch work
Sun Aug 19 09:56:27 2007|SETI@home|Requesting 73 seconds of new work
Sun Aug 19 09:56:32 2007|SETI@home|Scheduler RPC succeeded [server version 511]
Sun Aug 19 09:56:32 2007|SETI@home|Deferring communication for 11 sec
Sun Aug 19 09:56:32 2007|SETI@home|Reason: requested by project
Sun Aug 19 09:56:34 2007|SETI@home|[file_xfer] Started download of file 16fe07ab.13278.19704.10.5.177

It currently shows 38:28:15 CPU time, 0.048% complete, 44:09:25 to completion.
While all my other work units have the completion time go DOWN, this one goes UP!
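As a rough sanity check on that 0.048% figure, a naive linear extrapolation shows how hopeless this WU is. The Python sketch below assumes progress continues at the average rate seen so far, which is not how BOINC computes its own "to completion" figure; `hms_to_seconds` and `naive_remaining_s` are hypothetical helper names.

```python
def hms_to_seconds(hms: str) -> int:
    """Convert an 'HH:MM:SS' string, as shown in the post, to seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def naive_remaining_s(elapsed_s: float, fraction_done: float) -> float:
    """Remaining time if progress kept its average rate so far (naive)."""
    return elapsed_s * (1.0 - fraction_done) / fraction_done


elapsed = hms_to_seconds("38:28:15")              # 138495 s of CPU time
remaining = naive_remaining_s(elapsed, 0.048 / 100)
years = remaining / (365 * 24 * 3600)
print(f"about {years:.1f} years of CPU time to go")
```

At that rate the WU would need roughly 9 more years of CPU time.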
Dirk Sadowski
Volunteer tester

Joined: 6 Apr 07
Posts: 7105
Credit: 147,663,825
RAC: 5
Germany
Message 622972 - Posted: 20 Aug 2007, 0:52:44 UTC
Last modified: 20 Aug 2007, 0:53:46 UTC



This thread is very, very long...

Is there a recommendation yet on what to do with these WUs?

So far I've had one WU: after ~2 hours, its 'completion time' kept going higher and higher...


http://setiathome.berkeley.edu/workunit.php?wuid=147939514

I aborted it.


Josef W. Segur
Volunteer developer
Volunteer tester

Joined: 30 Oct 99
Posts: 4504
Credit: 1,414,761
RAC: 0
United States
Message 623062 - Posted: 20 Aug 2007, 3:25:24 UTC - in response to Message 622972.  



This thread is very, very long...

Is there a recommendation yet on what to do with these WUs?

So far I've had one WU: after ~2 hours, its 'completion time' kept going higher and higher...


http://setiathome.berkeley.edu/workunit.php?wuid=147939514

I aborted it.

That's what most people will probably do, many others will let them run until they either overflow on Pulses or BOINC kills them for Maximum CPU time exceeded. The small proportion of users who read these forums thoroughly can't have much effect on the problem.

When the same thing happened not long after _enhanced was first released, Eric Korpela produced a script which helped clean up the situation. If he weren't on vacation he'd probably do so again, or maybe he has already told Matt or Jeff where to find the script so they can tailor it to the present situation. More likely it wasn't saved.

The "Hanging workunit and odd credit claims problem solved." thread is where that cleanup was discussed, and it gives figures on how many bad WUs were involved that time. The few we've identified are probably only a fraction of what's out there, but each group has 256 WUs, so it's a considerable number. For the record, here's the list of groups mentioned in this thread:

04mr07ab.10282.4980
04mr07ab.14840.4980
04mr07ab.14852.4980
04mr07ab.32128.5798
04mr07ab.7106.5389
04mr07ab.7106.5798
04mr07ab.7106.6207
04mr07ab.7106.6616
05mr07aa.12210.24612
05mr07aa.12591.24612
05mr07aa.15859.24612
05mr07aa.32396.24612
05mr07aa.3769.20522
05mr07ab.6072.369046
05mr07ab.7301.368637
09mr07ae.8908.15614
                                                              Joe
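To check whether a WU in your own queue belongs to one of the groups listed above, you could compare the start of its name against the list. This Python sketch assumes a WU's group is the first three dot-separated fields of its name (as with 09mr07ae.8908.15614.7.4.87 falling in group 09mr07ae.8908.15614); `group_of` and `in_listed_group` are hypothetical names, and a False result only means the WU is not in this incomplete list.

```python
# Groups listed in Joe's post, copied verbatim.
LISTED_GROUPS = {
    "04mr07ab.10282.4980", "04mr07ab.14840.4980", "04mr07ab.14852.4980",
    "04mr07ab.32128.5798", "04mr07ab.7106.5389", "04mr07ab.7106.5798",
    "04mr07ab.7106.6207", "04mr07ab.7106.6616", "05mr07aa.12210.24612",
    "05mr07aa.12591.24612", "05mr07aa.15859.24612", "05mr07aa.32396.24612",
    "05mr07aa.3769.20522", "05mr07ab.6072.369046", "05mr07ab.7301.368637",
    "09mr07ae.8908.15614",
}
WUS_PER_GROUP = 256  # figure given in the post


def group_of(wu_name: str) -> str:
    """ASSUMED naming rule: the group is the first three dot-separated fields."""
    return ".".join(wu_name.split(".")[:3])


def in_listed_group(wu_name: str) -> bool:
    """True if the WU's group is listed; False does NOT mean the WU is safe."""
    return group_of(wu_name) in LISTED_GROUPS


print(len(LISTED_GROUPS) * WUS_PER_GROUP)                 # 4096
print(in_listed_group("09mr07ae.8908.15614.7.4.87"))      # True
print(in_listed_group("04mr07aa.8827.23385.12.4.227_4"))  # False
```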
Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 623093 - Posted: 20 Aug 2007, 6:14:39 UTC
Last modified: 20 Aug 2007, 6:17:02 UTC

It seems that the WU header edit should be done on both computers participating in the WU calculation. Otherwise the WU may end with a CPU time overflow and simply be sent to another computer to slow it down one more time. Because the error limit is set to 5, there will be 5 slowed hosts before one WU disappears.

For example:
http://setiathome.berkeley.edu/workunit.php?wuid=148035389
CPU time 82697.222107
stderr out:
<core_client_version>5.8.16</core_client_version>
<![CDATA[
<message>
Maximum CPU time exceeded
</message>
]]>
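To put a number on the cost Raistmer describes, this Python sketch assumes every one of the 5 hosts runs the WU until BOINC kills it, each burning roughly the CPU time of the example result above. That is an assumption: the actual limit likely differs from host to host with machine speed. `wasted_cpu_hours` is a hypothetical helper.

```python
ERROR_LIMIT = 5                 # error limit quoted in the post
EXAMPLE_CPU_S = 82697.222107    # CPU time of the example result above


def wasted_cpu_hours(hosts: int, cpu_seconds_each: float) -> float:
    """CPU hours burned if each host runs the WU until it is killed.

    ASSUMES every host burns about the same CPU time as the example.
    """
    return hosts * cpu_seconds_each / 3600.0


print(f"{wasted_cpu_hours(ERROR_LIMIT, EXAMPLE_CPU_S):.0f} CPU hours")
```

Under that assumption, a single bad WU costs about 115 CPU hours before it is retired.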


CougarKy

Joined: 20 Aug 01
Posts: 5
Credit: 4,076,741
RAC: 1
United States
Message 623743 - Posted: 21 Aug 2007, 1:33:47 UTC

I have been receiving WUs with 'to completion' times of 119+ hours. However, as the WU is crunched, the time to completion drops dramatically. It ends up taking only 5 or 6 hours to process.
laconic
Volunteer tester

Joined: 19 Jul 99
Posts: 5
Credit: 1,280,678
RAC: 0
United States
Message 623744 - Posted: 21 Aug 2007, 1:33:49 UTC - in response to Message 623062.  

... For the record here's the list of groups mentioned in this thread:...

One more for you.

04mr07aa.16181.20522
laconic .
Josef W. Segur
Volunteer developer
Volunteer tester

Joined: 30 Oct 99
Posts: 4504
Credit: 1,414,761
RAC: 0
United States
Message 623772 - Posted: 21 Aug 2007, 2:39:22 UTC - in response to Message 623743.  

I have been receiving WUs with 'to completion' times of 119+ hours. However, as the WU is crunched, the time to completion drops dramatically. It ends up taking only 5 or 6 hours to process.

If you've run one of the nasty WUs to completion, BOINC will have adjusted your DCF (Duration Correction Factor) to match its extended time. It will gradually come down as you do normal work, taking about 20 to 30 WUs to get back to normal.

If you want to fix it quickly, you can close down BOINC, find the <duration_correction_factor> entry for <project_name>SETI@home in the client_state.xml file, and edit it. Given a shown estimate of 119 hours for an unstarted WU which will actually take 5 hours, multiply the value by 5/119, or 0.042. That should be close enough: if it's slightly too low, it will fully correct when one WU completes; if it's slightly high, it will creep down as usual.
                                                                Joe
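Joe's fix amounts to scaling the factor by actual time / shown time. The Python sketch below shows that arithmetic and one way to apply it to a copy of client_state.xml. It assumes, as the post describes, that each <project> element contains <project_name> and <duration_correction_factor> children; the function names and the sample value 23.8 are made up for illustration, ElementTree does not preserve the file's original formatting, and BOINC should be shut down (and the file backed up) before any edit.

```python
import xml.etree.ElementTree as ET


def corrected_dcf(old_dcf: float, shown_hours: float, actual_hours: float) -> float:
    """Scale the factor so the shown estimate matches the real run time."""
    return old_dcf * actual_hours / shown_hours


def rescale_project_dcf(xml_text: str, project_name: str,
                        shown_hours: float, actual_hours: float) -> str:
    """Rescale <duration_correction_factor> for one project.

    ASSUMES each <project> element holds <project_name> and
    <duration_correction_factor> children. Edit a backup copy only,
    with BOINC shut down.
    """
    root = ET.fromstring(xml_text)
    for project in root.iter("project"):
        if project.findtext("project_name") != project_name:
            continue
        dcf = project.find("duration_correction_factor")
        if dcf is not None and dcf.text:
            new_value = corrected_dcf(float(dcf.text), shown_hours, actual_hours)
            dcf.text = f"{new_value:.6f}"
    return ET.tostring(root, encoding="unicode")


# Minimal stand-in for the relevant part of client_state.xml (made-up value).
sample = """<client_state>
  <project>
    <project_name>SETI@home</project_name>
    <duration_correction_factor>23.800000</duration_correction_factor>
  </project>
</client_state>"""

print(round(5 / 119, 3))                              # 0.042
print(rescale_project_dcf(sample, "SETI@home", 119, 5))
```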
Tklop
Joined: 11 May 03
Posts: 175
Credit: 613,952
RAC: 0
United States
Message 623810 - Posted: 21 Aug 2007, 4:45:43 UTC - in response to Message 623744.  

... For the record here's the list of groups mentioned in this thread:...


And yet another... (Sigh)

04mr07ab.10898.4571

Thanks again, Joe, for the fix you posted... Has it been 'officially sanctioned' at this point? I only ask because, as best I can tell, whether we let these error out on their own or force the error sooner, they just keep getting passed around...

Anyways, I'm hanging in here!
Keep on crunching, all...
SETI@Home Forever!


___Tklop (Step-Founder, U.S. Air Force team)
Gary Charpentier · Crowdfunding Project Donor · Special Project $75 donor · Special Project $250 donor
Volunteer tester
Joined: 25 Dec 00
Posts: 31015
Credit: 53,134,872
RAC: 32
United States
Message 623840 - Posted: 21 Aug 2007, 6:42:50 UTC - in response to Message 623062.  

http://setiathome.berkeley.edu/workunit.php?wuid=147633867




 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.