Excessive "Time to completion" estimate!

Message boards : Number crunching : Excessive "Time to completion" estimate!

AuthorMessage
Profile Dorsai
Joined: 7 Sep 04
Posts: 474
Credit: 4,504,838
RAC: 0
United Kingdom
Message 893058 - Posted: 9 May 2009, 14:24:11 UTC
Last modified: 9 May 2009, 14:24:54 UTC

Can anyone offer a suggestion as to why this has occurred?
I realise it won't actually take 1200+ to finish, but it's as if the "duration correction factor" has been applied as a "multiplier" when it should have been a "divider"?


Foamy is "Lord and Master".
(Oh, + some Classic WUs too.)
ID: 893058
Profile Dr. C.E.T.I.
Joined: 29 Feb 00
Posts: 16019
Credit: 794,685
RAC: 0
United States
Message 893060 - Posted: 9 May 2009, 14:33:24 UTC - in response to Message 893058.  

Can anyone offer a suggestion as to why this has occurred?
I realise it won't actually take 1200+ to finish, but it's as if the "duration correction factor" has been applied as a "multiplier" when it should have been a "divider"?


eh Dorsai - You might 'un-hide' your computers so as to allow one to look @ the Issue [just a suggestion]

and, Welcome back to the Boards btw . . .


BOINC Wiki . . .

Science Status Page . . .
ID: 893060
Josef W. Segur
Volunteer developer
Volunteer tester

Joined: 30 Oct 99
Posts: 4504
Credit: 1,414,761
RAC: 0
United States
Message 893109 - Posted: 9 May 2009, 19:01:18 UTC - in response to Message 893058.  

Can anyone offer a suggestion as to why this has occurred?
I realise it won't actually take 1200+ to finish, but it's as if the "duration correction factor" has been applied as a "multiplier" when it should have been a "divider"?

Recently, the most common cause of the "duration correction factor" (DCF) being unreasonably high has been running VLAR WUs with the CUDA version of S@H Enhanced. But it can happen any time a WU runs much longer than it should; BOINC then thinks all subsequent work will also be very slow. Astropulse work occasionally has trouble restarting because the checkpoint file isn't totally reliable, so the work is restarted from the beginning, which increases total runtime; if that happened a few times on one WU it could also drive DCF to the levels you're seeing.

DCF was designed to make sure a host doesn't get more work than it can do within deadline, and was made very pessimistic. Its use in the estimate shown by BOINC Manager is basically just because they wanted to transition smoothly from that initial estimate to showing estimates based on progress. The transition is an even blend, so at 26.411% the estimate is still almost 75% based on DCF.
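That blend can be sketched as follows (a minimal sketch; the linear weight on fraction done is an assumption inferred from the 26.411% / "almost 75%" figures above, not taken from BOINC's source):

```python
def blended_estimate(fraction_done, dcf_estimate, progress_estimate):
    """Blend the initial DCF-based estimate with the progress-based one.

    Assumes an even (linear) blend: the DCF-based estimate's weight
    falls from 1.0 at the start of the task to 0.0 at completion.
    """
    return (1.0 - fraction_done) * dcf_estimate + fraction_done * progress_estimate

# At 26.411% done, the DCF-based estimate still carries ~73.6% of the weight.
```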
                                                                 Joe
ID: 893109
Aurora Borealis
Volunteer tester
Joined: 14 Jan 01
Posts: 3075
Credit: 5,631,463
RAC: 0
Canada
Message 893116 - Posted: 9 May 2009, 19:07:42 UTC - in response to Message 893058.  

Can anyone offer a suggestion as to why this has occurred?
I realise it won't actually take 1200+ to finish, but it's as if the "duration correction factor" has been applied as a "multiplier" when it should have been a "divider"?

Normal DCF for SETI is under 1 for most modern systems. You could go into the client_state.xml file and edit the DCF to a more reasonable value. Just make sure BOINC and the applications have all stopped first.
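For illustration, a minimal sketch of such an edit (assuming the value lives in a `<duration_correction_factor>` element inside the project's `<project>` block, as in typical client_state.xml files; stop BOINC and back up the file first):

```python
import re

def reset_dcf(xml_text: str, new_value: float = 1.0) -> str:
    """Replace every <duration_correction_factor> value in client_state.xml text.

    A sketch only: run it on a copy of the file, with BOINC stopped.
    """
    return re.sub(
        r"<duration_correction_factor>[^<]*</duration_correction_factor>",
        f"<duration_correction_factor>{new_value:.6f}</duration_correction_factor>",
        xml_text,
    )
```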
ID: 893116
1mp0£173
Volunteer tester

Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 893184 - Posted: 9 May 2009, 21:59:49 UTC - in response to Message 893058.  

Can anyone offer a suggestion as to why this has occurred?
I realise it won't actually take 1200+ to finish, but it's as if the "duration correction factor" has been applied as a "multiplier" when it should have been a "divider"?

Duration Correction Factor is, and should be, a multiplier.

I'm just going to add one observation: it is far better for BOINC to overestimate run time (and not fetch new work) than it is to underestimate and miss deadlines.

For that reason, DCF tends to increase very quickly and decrease slowly.
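In pseudocode, that asymmetry might look like this (the constants and function name are illustrative, not the BOINC client's actual rules):

```python
def update_dcf(dcf, actual_time, estimated_time):
    """Asymmetric DCF update: jump up quickly on overruns, drift down slowly.

    Illustrative constants only; the real client differs in detail.
    """
    ratio = actual_time / estimated_time
    if ratio > dcf:
        # Task ran longer than the estimate: move most of the way up at once.
        return dcf + 0.9 * (ratio - dcf)
    # Task ran faster than the estimate: ease down by a small fraction.
    return dcf + 0.1 * (ratio - dcf)
```

One badly overrunning task therefore inflates estimates immediately, while many fast tasks are needed to bring them back down.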

You can track it down in your client_state.xml file and fix it, or you can just let it correct itself.

ID: 893184
MarkJ (Crowdfunding Project Donor; Special Project $75 donor; Special Project $250 donor)
Volunteer tester
Joined: 17 Feb 08
Posts: 1139
Credit: 80,854,192
RAC: 5
Australia
Message 893275 - Posted: 10 May 2009, 5:10:21 UTC

Part of the problem is BOINC only has one DCF per project. It doesn't maintain them by application. There is an enhancement request in to address this, but if/when it will happen is anybody's guess.

In the meantime the DCF will jump around depending on the last task finished. If you are running all the SETI work types it can vary between AP, AP_v5, MB (CPU) and MB (CUDA), which is why the estimates are all over the place.
BOINC blog
ID: 893275
Profile Dorsai
Joined: 7 Sep 04
Posts: 474
Credit: 4,504,838
RAC: 0
United Kingdom
Message 893300 - Posted: 10 May 2009, 8:25:53 UTC - in response to Message 893184.  
Last modified: 10 May 2009, 8:43:09 UTC

Duration Correction Factor is, and should be, a multiplier.

I'm just going to add one observation: It is far better for BOINC to over estimate run time (and not fetch new work) than it is to under-estimate and miss deadlines.

For that reason, DCF tends to increase very quickly, and decrease slowly.

You can track it down in your client_state.xml file and fix it, or you can just let it correct itself.


That is of itself not a problem.
What I meant was that if the DCF was 0.5 (i.e. it will take half as long as estimated) then 20 x 0.5 is 10, but it looks like it is doing 20 / 0.5, which is 40.

What seems to be happening is that every time I finish an Astropulse WU faster than expected, the next one I get is expected to take even longer.
I.e.,
do a 100 hour one in 50 hours (half the time), and the next one is expected to take 200 hours (twice as long, rather than half as long!).
So when I then do this 200 hour one in 50, the next one is expected to take 4 times as long, and so on.
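In miniature, the arithmetic being described (a toy sketch using the numbers from the post; variable names are illustrative):

```python
# A 100-hour estimate, with the task actually finishing in 50 hours.
estimate, actual = 100.0, 50.0
dcf = actual / estimate            # 0.5 -- work takes half as long as estimated
as_multiplier = estimate * dcf     # 50.0  -> the sensible next estimate
as_divider = estimate / dcf        # 200.0 -> doubles instead; repeat and it runs away
```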

Host in question is this:
http://setiathome.berkeley.edu/show_host_detail.php?hostid=4026476

It's not causing any problems, as they take so long that I would not want a queue of work, and as soon as one is almost finished I get another; but it seems that the time estimates are going up geometrically, rather than slowly coming down.

Foamy is "Lord and Master".
(Oh, + some Classic WUs too.)
ID: 893300
Josef W. Segur
Volunteer developer
Volunteer tester

Joined: 30 Oct 99
Posts: 4504
Credit: 1,414,761
RAC: 0
United States
Message 893441 - Posted: 10 May 2009, 17:09:28 UTC - in response to Message 893300.  

...
What seems to be happening is that every time I finish an astropulse WU faster than expected the next one I get is expected to take even longer.
IE,
Do a 100 hour one in 50 hours, (Half the time) and the next one is expected to take 200 hours (twice as long, rather than half as long!)
So when I then do this 200 hour one in 50, the next one is expected to take 4 times as long, and so on.

Host in question is this:
http://setiathome.berkeley.edu/show_host_detail.php?hostid=4026476

It's not causing any problems, as they take that long that I would not want a queue of work, and as soon as one is almost finished I get another, but it seems that the estimates for time are going geometrically up, rather than slowly going down.

BOINC 6.4.7 does estimates based on elapsed (wall) time, but there were further changes related to that in later versions. The elapsed time accounting is probably less stable than CPU time since it's more affected by other host activity. In any case 6.4.7 shows CPU time used and estimated time in wall time, which may be contributing some part of the apparent mismatch. Reported time for a task is still CPU time, too.

I note the host's results for both AP and MB work show fairly frequent restarts, and nearly half are at the same progress point indicating a new checkpoint hadn't been reached. If you don't actually need all 2 GB of memory for other activities, setting the "Leave applications in memory while suspended?" preference to Yes would improve efficiency and maybe help the estimates problem.
                                                             Joe
ID: 893441
Profile Dorsai
Joined: 7 Sep 04
Posts: 474
Credit: 4,504,838
RAC: 0
United Kingdom
Message 894252 - Posted: 13 May 2009, 18:03:14 UTC - in response to Message 893441.  

...
What seems to be happening is that every time I finish an astropulse WU faster than expected the next one I get is expected to take even longer.
IE,
Do a 100 hour one in 50 hours, (Half the time) and the next one is expected to take 200 hours (twice as long, rather than half as long!)
So when I then do this 200 hour one in 50, the next one is expected to take 4 times as long, and so on.

Host in question is this:
http://setiathome.berkeley.edu/show_host_detail.php?hostid=4026476

It's not causing any problems, as they take that long that I would not want a queue of work, and as soon as one is almost finished I get another, but it seems that the estimates for time are going geometrically up, rather than slowly going down.

BOINC 6.4.7 does estimates based on elapsed (wall) time, but there were further changes related to that in later versions. The elapsed time accounting is probably less stable than CPU time since it's more affected by other host activity. In any case 6.4.7 shows CPU time used and estimated time in wall time, which may be contributing some part of the apparent mismatch. Reported time for a task is still CPU time, too.

I note the host's results for both AP and MB work show fairly frequent restarts, and nearly half are at the same progress point indicating a new checkpoint hadn't been reached. If you don't actually need all 2 GB of memory for other activities, setting the "Leave applications in memory while suspended?" preference to Yes would improve efficiency and maybe help the estimates problem.
                                                             Joe


Will try that. Ty. :)

(You never know that what you're doing is screwing it up until you're told "Stop screwing it before I screw you!")


Foamy is "Lord and Master".
(Oh, + some Classic WUs too.)
ID: 894252



 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.