Shorties estimate up from three minutes to six hours after today's outage!

Oddbjornik
Volunteer tester
Joined: 15 May 99
Posts: 220
Credit: 349,610,548
RAC: 1,728
Norway
Message 1151791 - Posted: 13 Sep 2011, 19:22:28 UTC

I just got a batch of shorties, i.e. work units with a two-week time limit. Such units normally finish in three minutes on CUDA.

Only the ones I got just after today's outage are estimated to take 05:59:09.

Naturally, BOINC has abruptly stopped requesting new work. Perhaps this is a trick to temporarily lower the load on the download pipe? I fear the units will run in three minutes as before, and that only the estimate is wrong.
Josef W. Segur
Volunteer developer
Volunteer tester
Joined: 30 Oct 99
Posts: 4504
Credit: 1,414,761
RAC: 0
United States
Message 1151797 - Posted: 13 Sep 2011, 19:34:37 UTC - in response to Message 1151791.  

Oddbjornik wrote:
I just got a batch of shorties, i.e. work units with a two-week time limit. Such units normally finish in three minutes on CUDA.

Only the ones I got just after today's outage are estimated to take 05:59:09.

Naturally, BOINC has abruptly stopped requesting new work. Perhaps this is a trick to temporarily lower the load on the download pipe? I fear the units will run in three minutes as before, and that only the estimate is wrong.

That probably indicates that the server software has been rebuilt with changeset [trac]changeset:24128[/trac]. If so, each 3-minute run will reduce the estimates by about 1% as the Duration Correction Factor (DCF) gradually falls; once the estimates drop below 30 minutes, it adapts somewhat faster. CPU work will also be overestimated until DCF gets into a lower range; after that, CPU work will tend to fight against the adaptation for GPU work.
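
As a rough illustration, here is a toy simulation of that adaptation (my own sketch, not BOINC source; the ~1% step and the 30-minute threshold are from the description above, and the faster rate below 30 minutes is a guess):

# Toy simulation (not BOINC source) of DCF adaptation on a host whose
# shorties are estimated at ~6 hours but really finish in ~3 minutes.
base_est = 6 * 3600.0        # server-side estimate, seconds
actual = 3 * 60.0            # real CUDA runtime, seconds
dcf = 1.0                    # Duration Correction Factor starts at 1.0
completed = 0

while base_est * dcf > 2 * actual:        # until estimates look sane
    target = actual / base_est            # the DCF this task implies
    # ~1% of the gap per finished task; assumed faster rate below 30 min
    rate = 0.01 if base_est * dcf > 30 * 60 else 0.05
    dcf += rate * (target - dcf)
    completed += 1

print(completed, "three-minute shorties before the estimate is near reality")

With those assumptions it takes roughly 300 completed shorties before the displayed estimate approaches three minutes, which gives a feel for why the adaptation is unrealistic.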

The purpose of the change is admirable; its implementation is unrealistic.
                                                                  Joe
Khangollo
Joined: 1 Aug 00
Posts: 245
Credit: 36,410,524
RAC: 0
Slovenia
Message 1151819 - Posted: 13 Sep 2011, 20:21:22 UTC - in response to Message 1151797.  

I just started receiving WUs with massively overestimated times, too.
Won't that wreak havoc with the BOINC client's DCF? What will happen when these units cause the local DCF to drop too much? As far as I understand, the client will keep requesting more and more work, much more than the cache preferences allow, until DCF stabilizes around 1 again. For people with large caches, this might be a nightmare. Or am I wrong?
Josef W. Segur
Volunteer developer
Volunteer tester
Joined: 30 Oct 99
Posts: 4504
Credit: 1,414,761
RAC: 0
United States
Message 1151847 - Posted: 13 Sep 2011, 20:58:48 UTC - in response to Message 1151819.  

Khangollo wrote:
I just started receiving WUs with massively overestimated times, too.
Won't that wreak havoc with the BOINC client's DCF? What will happen when these units cause the local DCF to drop too much? As far as I understand, the client will keep requesting more and more work, much more than the cache preferences allow, until DCF stabilizes around 1 again. For people with large caches, this might be a nightmare. Or am I wrong?

Luckily, you're wrong. The gross overestimation means BOINC thinks you already have a lot of work. DCF will eventually come down as work completes and the cache gets near empty. That will allow at least minimal work-fetch requests, but it shouldn't get so low that large overfetches are likely.

If DCF were able to stabilize at a "correct" level, work fetch and estimates would be normal. Unfortunately, most users of optimized apps are using two or more app versions, and the local DCF cannot track more than one correctly. So there is the possibility of overfetching CPU work because DCF has been pulled lower by GPU work. Then, when a CPU task finishes, DCF jumps up because it took considerably longer than BOINC was expecting. From there it gradually drops again as GPU tasks finish.
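
A sketch of that fight, with assumed numbers and a deliberately simplified update rule (slow downward drift, instant upward jump, as described above):

# Hypothetical host: GPU tasks imply DCF ~0.01, CPU tasks imply ~1.0,
# but the client keeps only one DCF for the whole project.
gpu_target, cpu_target = 0.01, 1.0
dcf = 1.0
low = high = dcf

for i in range(200):
    target = cpu_target if i % 20 == 19 else gpu_target  # 1 CPU per 19 GPU
    if target < dcf:
        dcf += 0.01 * (target - dcf)   # gradual drop after fast GPU tasks
    else:
        dcf = target                   # immediate jump after a slow CPU task
    low, high = min(low, dcf), max(high, dcf)

print(f"DCF saws between about {low:.2f} and {high:.2f}, settling nowhere")

Neither the GPU estimates (which would need ~0.01) nor the CPU estimates (which would need ~1.0) are ever right for long.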

We're basically back in the situation we were in before the server-side scaling based on what's displayed as "Average processing rate" was deployed. Those running anonymous platform have to cope as well as possible until something better comes along.
                                                                  Joe
perryjay
Volunteer tester
Joined: 20 Aug 02
Posts: 3377
Credit: 20,676,751
RAC: 0
United States
Message 1151849 - Posted: 13 Sep 2011, 20:59:04 UTC - in response to Message 1151819.  

I just got four 6.10 CUDA tasks. Three are estimated at 6 hours and the fourth at almost 17 hours. I'm running a couple of APs on my GPU right now, so I won't worry about them for a while.


PROUD MEMBER OF Team Starfire World BOINC
arkayn
Volunteer tester
Joined: 14 May 99
Posts: 4438
Credit: 55,006,323
RAC: 0
United States
Message 1151853 - Posted: 13 Sep 2011, 21:08:07 UTC - in response to Message 1151849.  

perryjay wrote:
I just got four 6.10 CUDA tasks. Three are estimated at 6 hours and the fourth at almost 17 hours. I'm running a couple of APs on my GPU right now, so I won't worry about them for a while.


They are showing 23 hours for the GTX560.

Sutaru Tsureku
Volunteer tester
Joined: 6 Apr 07
Posts: 7105
Credit: 147,663,825
RAC: 5
Germany
Message 1151873 - Posted: 13 Sep 2011, 22:03:10 UTC

I don't understand all this new stuff...

So is it time for anonymous-platform members to use <flops> entries in the app_info.xml file again?


- Best regards! - Sutaru Tsureku, team seti.international founder. - Optimize your PC for higher RAC. - SETI@home needs your help. -
Josef W. Segur
Volunteer developer
Volunteer tester
Joined: 30 Oct 99
Posts: 4504
Credit: 1,414,761
RAC: 0
United States
Message 1151890 - Posted: 13 Sep 2011, 22:41:03 UTC - in response to Message 1151873.  

Sutaru Tsureku wrote:
I don't understand all this new stuff...

So is it time for anonymous-platform members to use <flops> entries in the app_info.xml file again?

Best regards! Sutaru Tsureku

Those who have a mix of work, some of it sent before the change and still carrying reasonable estimates, should not take any such panic action. Setting <flops> affects all project work on the host. Considering how many people got into trouble trying to use <flops> before, only those who understand how BOINC uses the values ought to consider making hasty changes.

However, it is true that the change assumes that those running Anonymous platform will have reasonably accurate <flops> in their app_info.xml. If that were true, nobody would be seeing any problem because of the change.

Setting <flops>, and adjusting the values when trying different operating conditions (such as changing how many tasks the GPUs run at the same time), doesn't require higher math skills. Simple arithmetic is adequate, but it needs care.
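
For example (my sketch, with hypothetical numbers; the relationship used is that BOINC's runtime estimate is roughly rsc_fpops_est divided by the <flops> value):

# Hypothetical numbers: average a few real runtimes for one task type and
# divide into that task type's rsc_fpops_est (visible in client_state.xml).
rsc_fpops_est = 2.8e13                  # assumed estimate for a shorty
runtimes = [178.0, 181.0, 183.0]        # observed seconds on the GPU

flops = rsc_fpops_est * len(runtimes) / sum(runtimes)
print(f"<flops>{flops:.4e}</flops>   # paste into the app_version block")

The care comes in afterwards: if you then start running, say, two tasks per GPU at once, each task's effective rate roughly halves, and the <flops> value has to be adjusted to match.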
                                                                  Joe
arkayn
Volunteer tester
Joined: 14 May 99
Posts: 4438
Credit: 55,006,323
RAC: 0
United States
Message 1151895 - Posted: 13 Sep 2011, 23:00:46 UTC - in response to Message 1151853.  

perryjay wrote:
I just got four 6.10 CUDA tasks. Three are estimated at 6 hours and the fourth at almost 17 hours. I'm running a couple of APs on my GPU right now, so I won't worry about them for a while.

I wrote:
They are showing 23 hours for the GTX560.


And took 3 minutes.

Dave Stegner
Volunteer tester
Joined: 20 Oct 04
Posts: 540
Credit: 65,583,328
RAC: 27
United States
Message 1151897 - Posted: 13 Sep 2011, 23:06:19 UTC

How does one determine what the correct <flops> entry is?
Dave

Geek@Play
Volunteer tester
Joined: 31 Jul 01
Posts: 2467
Credit: 86,146,931
RAC: 0
United States
Message 1151902 - Posted: 13 Sep 2011, 23:23:53 UTC

I removed the <flops> entry in my app_info file when I found out that the SETI servers calculate a flops value for each science app and use that value when sending work. I removed the <flops> value 3 or 4 months ago.

In my view, a lot of problems are caused by the various rescheduler programs moving work from CPU to GPU or from GPU to CPU. The Berkeley servers expect the work to be done by the app they sent it to, not a different app. So I stopped rescheduling work as well.

Right now my system is getting work with proper time estimates. I have not, however, received any work since today's outage.

My opinion is still to leave <flops> out of the app_info file for now and wait and see what happens.
Boinc....Boinc....Boinc....Boinc....
HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 1151927 - Posted: 14 Sep 2011, 1:02:00 UTC - in response to Message 1151797.  

Oddbjornik wrote:
I just got a batch of shorties, i.e. work units with a two-week time limit. Such units normally finish in three minutes on CUDA.

Only the ones I got just after today's outage are estimated to take 05:59:09.

Naturally, BOINC has abruptly stopped requesting new work. Perhaps this is a trick to temporarily lower the load on the download pipe? I fear the units will run in three minutes as before, and that only the estimate is wrong.

Josef W. Segur wrote:
That probably indicates that the server software has been rebuilt with changeset [trac]changeset:24128[/trac]. If so, each 3-minute run will reduce the estimates by about 1% as the Duration Correction Factor (DCF) gradually falls; once the estimates drop below 30 minutes, it adapts somewhat faster. CPU work will also be overestimated until DCF gets into a lower range; after that, CPU work will tend to fight against the adaptation for GPU work.

The purpose of the change is admirable; its implementation is unrealistic.

I guess a side effect of this will help stabilize the huge payouts in the new credit system from machines that completed a lot of shorties and then did some large complex jobs right after.
SETI@home classic workunits: 93,865 CPU time: 863,447 hours
Join the [url=http://tinyurl.com/8y46zvu]BP6/VP6 User Group[/url]
Gatekeeper
Joined: 14 Jul 04
Posts: 887
Credit: 176,479,616
RAC: 0
United States
Message 1151935 - Posted: 14 Sep 2011, 1:29:38 UTC
Last modified: 14 Sep 2011, 1:30:47 UTC

My VHARs are at 6:30 and change. My 603s are from 14 to 22 hours. Haven't got any APs since the update, but I can only imagine what they'll look like. Sheesh.
Josef W. Segur
Volunteer developer
Volunteer tester
Joined: 30 Oct 99
Posts: 4504
Credit: 1,414,761
RAC: 0
United States
Message 1151946 - Posted: 14 Sep 2011, 2:14:16 UTC - in response to Message 1151897.  

Dave Stegner wrote:
How does one determin what the correct <flops> entry is ??

For your domain controllers there isn't any, unfortunately. BOINC 5.10.45 doesn't report app_version flops, so the servers will just use the host Whetstone benchmark, and there's no practical way to make that reflect how fast a CPU does S@h work. Because those hosts don't have usable GPUs either, the local DCF will prove adequate to eventually get estimates reasonably close.

Once most of the work on such a host is the overestimated kind, you could edit the <duration_correction_factor> field for this project in its client_state.xml to whatever small fraction gives reasonable estimates for the new work. That speeds up the adaptation, but if you do it too soon, the older work, which will then have tiny estimates, could cause BOINC to fetch more work than you really want; in the worst case, more work than could be done by the deadline.
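
The arithmetic for that edit might look like this (hypothetical numbers; DCF simply scales the displayed estimate, so the fraction you want is the observed runtime over the raw server estimate):

shown_estimate = 6 * 3600.0   # what BOINC currently displays, seconds
current_dcf = 1.0             # value now in client_state.xml
observed = 3 * 60.0           # what the new tasks really take, seconds

raw_estimate = shown_estimate / current_dcf
new_dcf = observed / raw_estimate
print(f"<duration_correction_factor>{new_dcf:.6f}</duration_correction_factor>")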

Anything you do to try to correct the situation could turn out to be unwise if Dr. Anderson recognizes the problem and makes some corrective modification. Sitting tight and hoping for the best might indeed be the best policy, as Geek@Play advises.
                                                                  Joe
Josef W. Segur
Volunteer developer
Volunteer tester
Joined: 30 Oct 99
Posts: 4504
Credit: 1,414,761
RAC: 0
United States
Message 1151950 - Posted: 14 Sep 2011, 2:18:29 UTC - in response to Message 1151927.  

HAL9000 wrote:
I guess a side effect of this will help stabilize the huge payouts in the new credit system from machines that completed a lot of shorties and then did some large, complex jobs right after.

No, the only change is that estimated runtimes are too big as seen on the client. It affects neither the server-side averages nor the actual runtimes, so it has no effect on the credit calculations.
                                                                   Joe
W-K 666
Volunteer tester
Joined: 18 May 99
Posts: 19045
Credit: 40,757,560
RAC: 67
United Kingdom
Message 1151989 - Posted: 14 Sep 2011, 3:57:45 UTC - in response to Message 1151797.  

I do wish the BOINC devs would do a proper fix rather than baling wire and chewing gum. I think I might go and look at the southern sky.
W-K 666
Volunteer tester
Joined: 18 May 99
Posts: 19045
Credit: 40,757,560
RAC: 67
United Kingdom
Message 1152006 - Posted: 14 Sep 2011, 4:59:52 UTC
Last modified: 14 Sep 2011, 5:03:12 UTC

If you have selected to do AP tasks as well as MB, I would suggest you de-select AP 505 until the computer has sorted out the estimates and DCF. If not, you will face months of DCF variations, because each time an AP task completes it will punch the DCF up into the stratosphere and the computer will stop requesting work.

[edit] For the devs, remember KISS: use server-side estimates OR DCF, not server-side estimates AND DCF.
Dave Stegner
Volunteer tester
Joined: 20 Oct 04
Posts: 540
Credit: 65,583,328
RAC: 27
United States
Message 1152033 - Posted: 14 Sep 2011, 7:16:59 UTC
Last modified: 14 Sep 2011, 7:39:33 UTC

I have 19 machines crunching SETI. Some AP only, some MB only, some mixed, and for good measure I threw 3 low-level GPU cards into the mix a month ago. I have been running opti apps for over a year and have never had an issue with incorrect time estimates. I did have a small issue when I added the GPU cards, but it fixed itself in about 2 weeks. After 2 weeks they were correctly estimating times even though the machines were doing MB on GPU, and MB and AP on CPU.

ALL TIME ESTIMATES FOR ALL TYPES OF WORK (DEDICATED OR MIXED-MODE MACHINES) WERE ACCURATE.

If I understand Joe's analysis of what to expect in the future (ping-pong, herky-jerky time estimates, over- and under-subscribing unless you run AP only or MB only, and CPU only or GPU only), then I do NOT understand why this change was made.

For my machines, it creates a problem where none existed before.

Does anyone know how big the problem described in the changeset actually was?
Dave

W-K 666
Volunteer tester
Joined: 18 May 99
Posts: 19045
Credit: 40,757,560
RAC: 67
United Kingdom
Message 1152057 - Posted: 14 Sep 2011, 9:46:49 UTC - in response to Message 1152033.  

Dave Stegner wrote:
I have 19 machines crunching SETI. Some AP only, some MB only, some mixed, and for good measure I threw 3 low-level GPU cards into the mix a month ago. I have been running opti apps for over a year and have never had an issue with incorrect time estimates. I did have a small issue when I added the GPU cards, but it fixed itself in about 2 weeks. After 2 weeks they were correctly estimating times even though the machines were doing MB on GPU, and MB and AP on CPU.

ALL TIME ESTIMATES FOR ALL TYPES OF WORK (DEDICATED OR MIXED-MODE MACHINES) WERE ACCURATE.

If I understand Joe's analysis of what to expect in the future (ping-pong, herky-jerky time estimates, over- and under-subscribing unless you run AP only or MB only, and CPU only or GPU only), then I do NOT understand why this change was made.

For my machines, it creates a problem where none existed before.

Does anyone know how big the problem described in the changeset actually was?

I still have a problem with AP task estimates on the quad I rebuilt (mobo failure), which I updated to Win 7 with some new components and first connected on 1 Aug.
In the first ten AP tasks there were at least three that had "too much blanking" and completed after 30 seconds. This made the initial APR about twice what it should be. It is still about 50% out, as the present estimates for the tasks in progress are 8h:45m (duration correction factor 1.055596) when the true estimate should be at least 12h:30m.

This present patch is another band-aid-and-chewing-gum attempt, when the real fix should be detecting early completions (-9 overflows, too much blanking, etc.) and removing those times from the APR calculation.
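
In other words (a sketch of the suggested filtering, with made-up numbers, an assumed cutoff, and a naive mean standing in for the real APR average):

fpops_est = 1.0e15                      # assumed per-task fpops estimate
runtimes = [45000.0] * 7 + [30.0] * 3   # seven real results, three early exits
MIN_PLAUSIBLE = 600.0                   # assumed "too fast to be real" cutoff

def apr(samples):
    # naive average processing rate: estimated fpops over elapsed seconds
    return sum(fpops_est / t for t in samples) / len(samples)

print(f"unfiltered APR: {apr(runtimes):.3e}")
print(f"filtered APR:   {apr([t for t in runtimes if t > MIN_PLAUSIBLE]):.3e}")

Dropping the 30-second results before averaging keeps a handful of early exits from wrecking the estimates for every later task.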
skildude
Joined: 4 Oct 00
Posts: 9541
Credit: 50,759,529
RAC: 60
Yemen
Message 1152088 - Posted: 14 Sep 2011, 12:47:19 UTC - in response to Message 1152057.  

I got a few WUs from 25jn11ab that were estimated at 133 hours on my CPU. Final times were around 3 hours. That is an unusually long time for even a VLAR on my PC.
These WUs weren't marked as VLARs either.

I have to wonder if the estimates are specific to the AR. So if BOINC sees an AR that it hasn't processed yet, it gives wildly exaggerated estimates.


In a rich man's house there is no place to spit but his face.
Diogenes Of Sinope