Message boards : Number crunching : Work Unit problem
Volunteer tester

Joined: 5 Jul 99
Message 621908 - Posted: 18 Aug 2007, 13:05:54 UTC - in response to Message 621880.  

Well, you could edit it in the project section of the "client_state.xml" file instead. You'll get better results.

Did that, set it to 0.9, it showed 9½ hours, then tried a few others
until settling on 0.3, which shows 3 hours. MB WUs are taking less than that,
but it'll do for now, and I might get some work now, thanks.

kittyman
Volunteer tester

Joined: 9 Jul 00
Message 621930 - Posted: 18 Aug 2007, 14:04:38 UTC - in response to Message 621908.  

Well, you could edit it in the project section of the "client_state.xml" file instead. You'll get better results.

Did that, set it to 0.9, it showed 9½ hours, then tried a few others
until settling on 0.3, which shows 3 hours. MB WUs are taking less than that,
but it'll do for now, and I might get some work now, thanks.


I have reset the DCF in client state a few times. Whenever I process one of these bum WUs, it kicks the DCF way up and BOINC goes into EDF panic mode.

"Time is simply the mechanism that keeps everything from happening all at once."

Profile jason_gee
Volunteer developer
Volunteer tester

Joined: 24 Nov 06
Message 621933 - Posted: 18 Aug 2007, 14:15:20 UTC - in response to Message 621930.  
Last modified: 18 Aug 2007, 14:16:05 UTC

Just be aware there are some truly meaty workunits amongst the noisy ones.
This one I got took quite a while, but my claim is ~112 credits. I checked, and thankfully the partner I am with is running 5.27, so I'll probably get them. Not a big return for 28000 secs [old machine, used to get 62.4 cr / 14000 secs], but far better than I've been getting for the noisy ones.
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Message 621940 - Posted: 18 Aug 2007, 14:27:09 UTC - in response to Message 621933.  

...thankfully the partner I am with is running 5.27 ...

I think you'll be all right with Byron - and more to the point, he's running BOINC v5.10.13.

December deadline, eh? That's meaty all right.
W-K 666 Project Donor
Volunteer tester

Joined: 18 May 99
Message 622002 - Posted: 18 Aug 2007, 15:23:51 UTC - in response to Message 621933.  

Just be aware there are some truly meaty workunits amongst the noisy ones.
This one I got took quite a while, but my claim is ~112 credits. I checked, and thankfully the partner I am with is running 5.27, so I'll probably get them. Not a big return for 28000 secs [old machine, used to get 62.4 cr / 14000 secs], but far better than I've been getting for the noisy ones.

I've had a similarly high claim: resultid=592412486, 106 cr in 4 hrs. Most units on this computer earn about 22 cr/hr, so this is a slight gain.
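For anyone who wants to compare these claims on the same footing, the figures quoted above can be converted to credits per CPU hour. A minimal Python sketch; the helper name is made up for illustration, and the numbers are simply the ones posted in this thread:

```python
def credits_per_hour(credits: float, cpu_seconds: float) -> float:
    """Return the credit rate in credits per CPU hour."""
    return credits / (cpu_seconds / 3600.0)


# Figures quoted in the posts above.
meaty_wu = credits_per_hour(112, 28000)      # ~112 cr claimed for 28000 s
usual_rate = credits_per_hour(62.4, 14000)   # 62.4 cr per 14000 s on the same old machine
high_claim = credits_per_hour(106, 4 * 3600) # 106 cr in 4 hours

print(f"{meaty_wu:.1f} {usual_rate:.1f} {high_claim:.1f}")
```

On these numbers the 4-hour result works out to 26.5 cr/hr against the usual 22 cr/hr or so, while the 28000-second one comes in slightly below that machine's old rate.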

Profile Tklop

Joined: 11 May 03
Message 622285 - Posted: 18 Aug 2007, 23:26:07 UTC
Last modified: 18 Aug 2007, 23:28:03 UTC

Hello, again crunchers:

msattler: Here's the final outcome of that WU you and I discussed further down this thread...

Work Unit ID: 147604236 (09mr07ae.8908.15614.7.4.87)
Server State: Over
Outcome: Client Error
Client State: Compute Error
CPU Time in seconds: 54,599.63
Claimed credit: .06
Granted credit: -- (pending)

I let it run--but as you can see, it took a while to finally error out... Sadly, it looks like it sent it out again anyway though--see here:

There used to be only three result ID's (including mine) and now there are four...

So, whoever the poor sod is who gets lumped with this nasty bugger, please forgive me... Both my machine and I worked long and hard to remove it from the queue, but to no avail!

Anyway, thought a few of you might be interested in the length of processing time involved to finally get this one to error out... (For those of you who don't feel like browsing this thread for my specs, the computer crunching that one was a Pentium M 1.86GHz, with 1GB RAM, running Windoze XP Pro).

P.S. I liked the suggested fix, but I didn't see it here, until after the beastly WU had errored out...

Even so, I offer up a big Thank You to Joe for figuring that out!


Also, yes... I too have run through some pretty monstrous but completable work units from that bunch--and the credit payoff looks promising (once someone else actually manages to finish one! LOL)
Keep on crunching, all...
SETI@Home Forever!

___Tklop (Step-Founder, U.S. Air Force team)
Profile bounty.hunter
Volunteer tester

Joined: 22 Mar 04
Message 622581 - Posted: 19 Aug 2007, 9:47:32 UTC

I've just had one of the problem wu's complete with a -9...

Profile cRunchy
Volunteer moderator

Joined: 3 Apr 99
Message 622712 - Posted: 19 Aug 2007, 17:23:39 UTC

I just managed to get rid of a never ending WU (with the minus <triplet_thresh> issue) but now I'm getting units that state they will take 180+ hours yet finish in 8 hours.

Is this a result of this same range of problems we have been experiencing or is it a problem somewhere else?

Profile jason_gee
Volunteer developer
Volunteer tester

Joined: 24 Nov 06
Message 622717 - Posted: 19 Aug 2007, 17:28:09 UTC - in response to Message 622712.  
Last modified: 19 Aug 2007, 17:50:31 UTC

Not necessarily faulty workunits in my opinion. I have had some of those long ones finish in 5 to 6 hours on my old machines [with chicken optimised app] for about 112 credits. The prediction mechanism is up to kaka. I think it may be safe to ignore those predictions. [This isn't to say they couldn't also be noisy or faulty in some way too]

[More: My understanding, could be wrong, is that some better prediction formulae were made by Joe Segur, but somehow got left out of the current splitter code (not the app's problem, as I understand it). I also understand this may be examined when Eric gets back from holiday]

[cRunchy: I just looked at your results and I see that long one you let go. My you have much more patience than I do! :D If you want to crunch those results a bit faster, you might consider going for the chicken/lunatics app, as you are on XP it should be okay if you are game.]

"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Message 622724 - Posted: 19 Aug 2007, 17:51:50 UTC - in response to Message 622712.  
Last modified: 19 Aug 2007, 17:52:26 UTC

I just managed to get rid of a never ending WU (with the minus <triplet_thresh> issue) but now I'm getting units that state they will take 180+ hours yet finish in 8 hours.

Is this a result of this same range of problems we have been experiencing or is it a problem somewhere else?

To add to Jason's comment:

It isn't just that the estimates are long, but also because your computer thinks - for the time being - that all WUs are going to be as slow as the one you've just got rid of.

It's a safety mechanism called the 'Result Duration Correction Factor' (RDCF), and it's self-correcting: you don't have to do anything about it. As more 'normal' WUs pass through the system, your machine will gradually relax after its panic, and the estimates will get closer and closer to reality. Expect it to take about 20 or 30 WUs, though.
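To get a rough feel for why it takes a couple of dozen WUs, here is a toy model in Python. It is not BOINC's actual code: it assumes, purely for illustration, that each normally completed WU moves the correction factor 10% of the way back towards 1.0. The 10% step, the starting value of 20 and the function name are all assumptions.

```python
def simulate_recovery(start_dcf: float, step: float = 0.1, results: int = 30) -> list[float]:
    """Toy model: each normal result pulls the factor `step` of the way towards 1.0."""
    dcf = start_dcf
    history = []
    for _ in range(results):
        dcf += step * (1.0 - dcf)  # assumed update rule, for illustration only
        history.append(dcf)
    return history


# Hypothetical starting point: estimates inflated about 20x by one bad WU.
history = simulate_recovery(20.0)
for n in (10, 20, 30):
    print(n, round(history[n - 1], 2))
```

Under those assumptions the factor drops from 20 to a little over 3 after 20 results and to below 2 after 30, which is at least consistent with the "20 or 30 WUs" rule of thumb.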
Profile makosky
Volunteer tester

Message 622748 - Posted: 19 Aug 2007, 18:54:08 UTC

I just got a new PC, and my results from the old PC are still present on the status page, which is not showing the new PC's work. Both PCs' IDs are on. It tells me I have to communicate with the host, and I have no idea what that means. Both results are being sent home. Can someone help?
Bob Foertsch

Joined: 19 Jun 99
Message 622760 - Posted: 19 Aug 2007, 19:26:59 UTC

I've had this work unit grinding away for nearly 4 days. Should I abort it? What can I do to prevent someone else from having the same problem?

WU ID is 147512879

Wed Aug 15 20:42:12 2007|SETI@home|Starting 04mr07aa.8827.23385.12.4.227_4
Wed Aug 15 20:42:13 2007|SETI@home|Starting task 04mr07aa.8827.23385.12.4.227_4 using setiathome_enhanced version 523
Thu Aug 16 12:18:05 2007|SETI@home|Task 04mr07aa.8827.23385.12.4.227_4 exited with zero status but no 'finished' file
Thu Aug 16 12:18:05 2007|SETI@home|If this happens repeatedly you may need to reset the project.
Thu Aug 16 12:18:05 2007|SETI@home|Restarting task 04mr07aa.8827.23385.12.4.227_4 using setiathome_enhanced version 523
Thu Aug 16 12:21:11 2007|SETI@home|Task 04mr07aa.8827.23385.12.4.227_4 exited with zero status but no 'finished' file
Thu Aug 16 12:21:11 2007|SETI@home|If this happens repeatedly you may need to reset the project.
Thu Aug 16 12:21:11 2007|SETI@home|Restarting task 04mr07aa.8827.23385.12.4.227_4 using setiathome_enhanced version 523
Thu Aug 16 22:57:22 2007|SETI@home|Restarting task 04mr07aa.8827.23385.12.4.227_4 using setiathome_enhanced version 523
Sun Aug 19 09:56:27 2007|SETI@home|Sending scheduler request: To fetch work
Sun Aug 19 09:56:27 2007|SETI@home|Requesting 73 seconds of new work
Sun Aug 19 09:56:32 2007|SETI@home|Scheduler RPC succeeded [server version 511]
Sun Aug 19 09:56:32 2007|SETI@home|Deferring communication for 11 sec
Sun Aug 19 09:56:32 2007|SETI@home|Reason: requested by project
Sun Aug 19 09:56:34 2007|SETI@home|[file_xfer] Started download of file 16fe07ab.13278.19704.10.5.177

It currently shows 38:28:15 CPU time, 0.048% complete, 44:09:25 to completion.
While all my other work units have the completion time go DOWN, this one goes UP!
Profile Dirk Sadowski
Volunteer tester

Joined: 6 Apr 07
Message 622972 - Posted: 20 Aug 2007, 0:52:44 UTC
Last modified: 20 Aug 2007, 0:53:46 UTC

This thread is very very long..

Is there now a recommendation for what to do with these WUs?

Up to now I've had one such WU.. ~2 hours in and the 'completion time' kept going higher and higher..

I aborted..

Josef W. Segur
Volunteer developer
Volunteer tester

Joined: 30 Oct 99
Message 623062 - Posted: 20 Aug 2007, 3:25:24 UTC - in response to Message 622972.  

This thread is very very long..

Is there now a recommendation for what to do with these WUs?

Up to now I've had one such WU.. ~2 hours in and the 'completion time' kept going higher and higher..

I aborted..

That's what most people will probably do; many others will let them run until they either overflow on Pulses or BOINC kills them for Maximum CPU time exceeded. The small proportion of users who read these forums thoroughly can't have much effect on the problem.

When the same thing happened not long after _enhanced was first released, Eric Korpela produced a script which helped clean up the situation. If he weren't on vacation he'd probably do so again, or maybe he has already told Matt or Jeff where to find the script so they can tailor it to the present situation. More likely it wasn't saved.

The "Hanging workunit and odd credit claims problem solved." thread is where that cleanup was discussed, and it gives figures on how many bad WUs were involved that time. The few we've identified are probably only a fraction of what's out there, but each group has 256 WUs, so it's a considerable number. For the record, here's the list of groups mentioned in this thread:

Profile Raistmer
Volunteer developer
Volunteer tester

Joined: 16 Jun 01
Message 623093 - Posted: 20 Aug 2007, 6:14:39 UTC
Last modified: 20 Aug 2007, 6:17:02 UTC

It seems that the WU header has to be edited on both computers taking part in the WU's calculation. Otherwise the WU may end with a CPU time overflow and simply be sent to another computer, slowing that one down in turn. Because the error limit is set to 5, there will be 5 slowed hosts before one WU disappears.

For example:
CPU time 82697.222107
stderr out <core_client_version>5.8.16</core_client_version>
Maximum CPU time exceeded

Joined: 20 Aug 01
Message 623743 - Posted: 21 Aug 2007, 1:33:47 UTC

I have been receiving WUs with "to completion" times of 119+ hours. However, as the WU is crunched, the "to completion" time drops dramatically. It ends up taking only 5 or 6 hours to process.
Volunteer tester

Joined: 19 Jul 99
Message 623744 - Posted: 21 Aug 2007, 1:33:49 UTC - in response to Message 623062.  

... For the record here's the list of groups mentioned in this thread:...

One more for you.

laconic .
Josef W. Segur
Volunteer developer
Volunteer tester

Joined: 30 Oct 99
Message 623772 - Posted: 21 Aug 2007, 2:39:22 UTC - in response to Message 623743.  

I have been receiving WUs with "to completion" times of 119+ hours. However, as the WU is crunched, the "to completion" time drops dramatically. It ends up taking only 5 or 6 hours to process.

If you've run one of the nasty WUs to completion, BOINC will have adjusted your DCF (Duration Correction Factor) to match its extended time. It will gradually come down as you do normal work, taking about 20 to 30 WUs to get back to normal.

If you want to fix it quickly, you can close down BOINC, find the <duration_correction_factor> entry for <project_name>SETI@home in the client_state.xml file, and edit it. Given a shown estimate of 119 hours for an unstarted WU which will actually take 5 hours, multiply the value by 5/119, or 0.042. That should be close enough: if it's slightly too low it will fully correct when one WU completes, and if it's slightly high it will creep down as usual.
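The arithmetic can be sketched in a few lines of Python. This only computes the new value; it does not touch client_state.xml, and the function name and the example current value of 10.0 are made up for illustration:

```python
def corrected_dcf(current_dcf: float, shown_hours: float, actual_hours: float) -> float:
    """Scale the current factor by actual/shown so the estimate matches reality."""
    return current_dcf * (actual_hours / shown_hours)


# The multiplier from the example above: 5 real hours against 119 estimated.
print(round(5 / 119, 3))  # 0.042

# Hypothetical: if the file currently held 10.0, this is the value to write back.
print(round(corrected_dcf(10.0, shown_hours=119, actual_hours=5), 3))
```

Whatever value the file currently holds goes in as `current_dcf`; the result is what you would type in by hand with BOINC shut down.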
Profile Tklop

Joined: 11 May 03
Message 623810 - Posted: 21 Aug 2007, 4:45:43 UTC - in response to Message 623744.  

... For the record here's the list of groups mentioned in this thread:...

And yet another... (Sigh)


Thanks again, Joe, for the fix you posted... Has it been 'officially sanctioned' at this point? I only ask--because as best as I can tell, whether we let these error out on their own or force the error sooner, they just keep getting passed around...

Anyways, I'm hanging in here!
Keep on crunching, all...
SETI@Home Forever!

___Tklop (Step-Founder, U.S. Air Force team)
Profile Gary Charpentier
Volunteer tester

Joined: 25 Dec 00
Message 623840 - Posted: 21 Aug 2007, 6:42:50 UTC - in response to Message 623062.

This thread is very very long..

Is there now a recommendation for what to do with these WUs?

Up to now I've had one such WU.. ~2 hours in and the 'completion time' kept going higher and higher..

I aborted..

That's what most people will probably do; many others will let them run until they either overflow on Pulses or BOINC kills them for Maximum CPU time exceeded. The small proportion of users who read these forums thoroughly can't have much effect on the problem.

When the same thing happened not long after _enhanced was first released, Eric Korpela produced a script which helped clean up the situation. If he weren't on vacation he'd probably do so again, or maybe he has already told Matt or Jeff where to find the script so they can tailor it to the present situation. More likely it wasn't saved.

The "Hanging workunit and odd credit claims problem solved." thread is where that cleanup was discussed, and it gives figures on how many bad WUs were involved that time. The few we've identified are probably only a fraction of what's out there, but each group has 256 WUs, so it's a considerable number. For the record, here's the list of groups mentioned in this thread:


