Excessive "Time to completion" estimate!

Message boards : Number crunching : Excessive "Time to completion" estimate!

AuthorMessage
Profile Dorsai
Joined: 7 Sep 04
Posts: 474
Credit: 4,504,838
RAC: 0
United Kingdom
Message 893058 - Posted: 9 May 2009, 14:24:11 UTC
Last modified: 9 May 2009, 14:24:54 UTC

Can anyone offer a suggestion as to why this has occurred?
I realise it won't actually take 1200+ to finish, but it's as if the "duration correction factor" has been applied as a "multiplier" when it should have been a "divider"?


Foamy is "Lord and Master".
(Oh, + some Classic WUs too.)
ID: 893058
Profile Dr. C.E.T.I.
Joined: 29 Feb 00
Posts: 16019
Credit: 794,685
RAC: 0
United States
Message 893060 - Posted: 9 May 2009, 14:33:24 UTC - in response to Message 893058.  

Can anyone offer a suggestion as to why this has occurred?
I realise it won't actually take 1200+ to finish, but it's as if the "duration correction factor" has been applied as a "multiplier" when it should have been a "divider"?


eh Dorsai - You might 'un-hide' your computers so as to allow one to look @ the Issue [just a suggestion]

and, Welcome back to the Boards btw . . .


BOINC Wiki . . .

Science Status Page . . .
ID: 893060
Josef W. Segur
Volunteer developer
Volunteer tester

Joined: 30 Oct 99
Posts: 4504
Credit: 1,414,761
RAC: 0
United States
Message 893109 - Posted: 9 May 2009, 19:01:18 UTC - in response to Message 893058.  

Can anyone offer a suggestion as to why this has occurred?
I realise it won't actually take 1200+ to finish, but it's as if the "duration correction factor" has been applied as a "multiplier" when it should have been a "divider"?

Recently, the most common cause of the "duration correction factor" (DCF) being unreasonably high has been running VLAR WUs with the CUDA version of S@H Enhanced. But it can happen any time a WU runs much longer than it should; BOINC then thinks all subsequent work will also be very slow. Astropulse work occasionally has trouble restarting because the checkpoint file isn't totally reliable, so the work is restarted from the beginning, which increases total runtime; if that happened a few times on one WU it could also drive DCF to the levels you're seeing.

DCF was designed to make sure a host doesn't get more work than it can do within deadline, and was made very pessimistic. Its use in the estimate shown by BOINC Manager is basically just because they wanted to transition smoothly from that initial estimate to showing estimates based on progress. The transition is an even blend, so at 26.411% the estimate is still almost 75% based on DCF.
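That blend can be sketched as follows (a minimal sketch; the linear weight on fraction done is an assumption inferred from the 26.411% / "almost 75%" figures above, not taken from BOINC's source):

```python
def blended_estimate(fraction_done, dcf_estimate, progress_estimate):
    """Blend the initial DCF-based estimate with the progress-based one.

    Assumes an even (linear) blend: the DCF-based estimate's weight
    falls from 1.0 at the start of the task to 0.0 at completion.
    """
    return (1.0 - fraction_done) * dcf_estimate + fraction_done * progress_estimate

# At 26.411% done, the DCF-based estimate still carries ~73.6% of the weight.
```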
                                                                 Joe
ID: 893109
Aurora Borealis
Volunteer tester
Joined: 14 Jan 01
Posts: 3075
Credit: 5,631,463
RAC: 0
Canada
Message 893116 - Posted: 9 May 2009, 19:07:42 UTC - in response to Message 893058.  

Can anyone offer a suggestion as to why this has occurred?
I realise it won't actually take 1200+ to finish, but it's as if the "duration correction factor" has been applied as a "multiplier" when it should have been a "divider"?

Normal DCF for SETI is under 1 for most modern systems. You could go into the client_state.xml file and edit the DCF to a more reasonable value. Just make sure BOINC and the applications have all stopped first.
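For illustration, a minimal sketch of such an edit (assuming the value lives in a `<duration_correction_factor>` element inside the project's `<project>` block, as in typical client_state.xml files; stop BOINC and back up the file first):

```python
import re

def reset_dcf(xml_text: str, new_value: float = 1.0) -> str:
    """Replace every <duration_correction_factor> value in client_state.xml text.

    A sketch only: run it on a copy of the file, with BOINC stopped.
    """
    return re.sub(
        r"<duration_correction_factor>[^<]*</duration_correction_factor>",
        f"<duration_correction_factor>{new_value:.6f}</duration_correction_factor>",
        xml_text,
    )
```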
ID: 893116
1mp0£173
Volunteer tester

Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 893184 - Posted: 9 May 2009, 21:59:49 UTC - in response to Message 893058.  

Can anyone offer a suggestion as to why this has occurred?
I realise it won't actually take 1200+ to finish, but it's as if the "duration correction factor" has been applied as a "multiplier" when it should have been a "divider"?

Duration Correction Factor is, and should be, a multiplier.

I'm just going to add one observation: it is far better for BOINC to overestimate run time (and not fetch new work) than it is to underestimate and miss deadlines.

For that reason, DCF tends to increase very quickly and decrease slowly.
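In pseudocode, that asymmetry might look like this (the constants and function name are illustrative, not the BOINC client's actual rules):

```python
def update_dcf(dcf, actual_time, estimated_time):
    """Asymmetric DCF update: jump up quickly on overruns, drift down slowly.

    Illustrative constants only; the real client differs in detail.
    """
    ratio = actual_time / estimated_time
    if ratio > dcf:
        # Task ran longer than the estimate: move most of the way up at once.
        return dcf + 0.9 * (ratio - dcf)
    # Task ran faster than the estimate: ease down by a small fraction.
    return dcf + 0.1 * (ratio - dcf)
```

One badly overrunning task therefore inflates estimates immediately, while many fast tasks are needed to bring them back down.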

You can track it down in your client_state.xml file and fix it, or you can just let it correct itself.

ID: 893184
MarkJ (Crowdfunding Project Donor; Special Project $75 donor; Special Project $250 donor)
Volunteer tester
Joined: 17 Feb 08
Posts: 1139
Credit: 80,854,192
RAC: 5
Australia
Message 893275 - Posted: 10 May 2009, 5:10:21 UTC

Part of the problem is BOINC only has one DCF per project. It doesn't maintain them by application. There is an enhancement request in to address this, but if/when it will happen is anybody's guess.

In the meantime the DCF will jump around depending on the last task finished. If you are running all the SETI work types it can vary between AP, AP_v5, MB (CPU) and MB (CUDA), which is why the estimates are all over the place.
BOINC blog
ID: 893275
Profile Dorsai
Joined: 7 Sep 04
Posts: 474
Credit: 4,504,838
RAC: 0
United Kingdom
Message 893300 - Posted: 10 May 2009, 8:25:53 UTC - in response to Message 893184.  
Last modified: 10 May 2009, 8:43:09 UTC

Duration Correction Factor is, and should be, a multiplier.

I'm just going to add one observation: It is far better for BOINC to over estimate run time (and not fetch new work) than it is to under-estimate and miss deadlines.

For that reason, DCF tends to increase very quickly, and decrease slowly.

You can track it down in your client_state.xml file and fix it, or you can just let it correct itself.


That is of itself not a problem.
What I meant was that if the DCF was 0.5 (i.e. it will take half as long as estimated) then 20 x 0.5 is 10, but it looks like it is doing 20 / 0.5, which is 40.

What seems to be happening is that every time I finish an Astropulse WU faster than expected, the next one I get is expected to take even longer.
I.e.,
do a 100 hour one in 50 hours (half the time), and the next one is expected to take 200 hours (twice as long, rather than half as long!).
So when I then do this 200 hour one in 50, the next one is expected to take 4 times as long, and so on.
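In miniature, the arithmetic being described (a toy sketch using the numbers from the post; variable names are illustrative):

```python
# A 100-hour estimate, with the task actually finishing in 50 hours.
estimate, actual = 100.0, 50.0
dcf = actual / estimate            # 0.5 -- work takes half as long as estimated
as_multiplier = estimate * dcf     # 50.0  -> the sensible next estimate
as_divider = estimate / dcf        # 200.0 -> doubles instead; repeat and it runs away
```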

Host in question is this:
http://setiathome.berkeley.edu/show_host_detail.php?hostid=4026476

It's not causing any problems, as they take so long that I would not want a queue of work, and as soon as one is almost finished I get another; but it seems that the time estimates are going up geometrically, rather than slowly coming down.

Foamy is "Lord and Master".
(Oh, + some Classic WUs too.)
ID: 893300
Josef W. Segur
Volunteer developer
Volunteer tester

Joined: 30 Oct 99
Posts: 4504
Credit: 1,414,761
RAC: 0
United States
Message 893441 - Posted: 10 May 2009, 17:09:28 UTC - in response to Message 893300.  

...
What seems to be happening is that every time I finish an astropulse WU faster than expected the next one I get is expected to take even longer.
IE,
Do a 100 hour one in 50 hours, (Half the time) and the next one is expected to take 200 hours (twice as long, rather than half as long!)
So when I then do this 200 hour one in 50, the next one is expected to take 4 times as long, and so on.

Host in question is this:
http://setiathome.berkeley.edu/show_host_detail.php?hostid=4026476

It's not causing any problems, as they take that long that I would not want a queue of work, and as soon as one is almost finished I get another, but it seems that the estimates for time are going geometrically up, rather than slowly going down.

BOINC 6.4.7 does estimates based on elapsed (wall) time, but there were further changes related to that in later versions. The elapsed time accounting is probably less stable than CPU time since it's more affected by other host activity. In any case 6.4.7 shows CPU time used and estimated time in wall time, which may be contributing some part of the apparent mismatch. Reported time for a task is still CPU time, too.

I note the host's results for both AP and MB work show fairly frequent restarts, and nearly half are at the same progress point indicating a new checkpoint hadn't been reached. If you don't actually need all 2 GB of memory for other activities, setting the "Leave applications in memory while suspended?" preference to Yes would improve efficiency and maybe help the estimates problem.
                                                             Joe
ID: 893441
Profile Dorsai
Joined: 7 Sep 04
Posts: 474
Credit: 4,504,838
RAC: 0
United Kingdom
Message 894252 - Posted: 13 May 2009, 18:03:14 UTC - in response to Message 893441.  

...
What seems to be happening is that every time I finish an astropulse WU faster than expected the next one I get is expected to take even longer.
IE,
Do a 100 hour one in 50 hours, (Half the time) and the next one is expected to take 200 hours (twice as long, rather than half as long!)
So when I then do this 200 hour one in 50, the next one is expected to take 4 times as long, and so on.

Host in question is this:
http://setiathome.berkeley.edu/show_host_detail.php?hostid=4026476

It's not causing any problems, as they take that long that I would not want a queue of work, and as soon as one is almost finished I get another, but it seems that the estimates for time are going geometrically up, rather than slowly going down.

BOINC 6.4.7 does estimates based on elapsed (wall) time, but there were further changes related to that in later versions. The elapsed time accounting is probably less stable than CPU time since it's more affected by other host activity. In any case 6.4.7 shows CPU time used and estimated time in wall time, which may be contributing some part of the apparent mismatch. Reported time for a task is still CPU time, too.

I note the host's results for both AP and MB work show fairly frequent restarts, and nearly half are at the same progress point indicating a new checkpoint hadn't been reached. If you don't actually need all 2 GB of memory for other activities, setting the "Leave applications in memory while suspended?" preference to Yes would improve efficiency and maybe help the estimates problem.
                                                             Joe


Will try that. Ty. :)

(You never know that what you're doing is screwing it up until you're told "Stop screwing it before I screw you!")


Foamy is "Lord and Master".
(Oh, + some Classic WUs too.)
ID: 894252



 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.