Number crunching : Estimates and Deadlines revisited
Author | Message |
---|---|
Richard Haselgrove Joined: 4 Jul 99 Posts: 14649 Credit: 200,643,578 RAC: 874 |
Here's an updated chart. [chart] I've added the Opteron 275 from Joe's list, and taken out the Q6600_XP, which changed speed partway through. The next move, I think, is to locate and investigate the task records for those outliers above 15,000 'normalised' seconds in the 0.2 - 0.4 AR region. If I can find a legitimate excuse for excluding them (application error, restarted from 0.00%, that sort of thing), we might have a basis for doing some curve-fitting. Edit: success! Here are the RIDs (result IDs) for those outliers: 677159581 (purged); 646706541 (restarted from the beginning); 640651198 (restarted from the beginning); 659605612 (restarted from the beginning); 648817421 (restarted from the beginning); 681717929 (purged); 653894489 (restarted from the beginning); 686061619 (restarted from the beginning). On that basis, I feel comfortable about excluding them from this analysis. But as an aside, all of those 'restarted from the beginning' were from X5160_Darwin_8.11.1. I wonder why that should be so bad at checkpointing? So now we get a clearer view of the action: [chart] |
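The exclusion rule described above (flag points above 15,000 normalised seconds in the 0.2 - 0.4 AR region, then drop only those with a documented reason) can be sketched in Python. This is a minimal illustration, not the procedure actually used for the charts: the `Result` record, `is_outlier()` and `clean()` are hypothetical names, and the AR and time values in the demo are made up. Only the RIDs and reasons come from the post.

```python
from dataclasses import dataclass

# RIDs and reasons as listed in the post; everything else here is illustrative.
EXCLUDED = {
    677159581: "purged",
    646706541: "restarted from the beginning",
    640651198: "restarted from the beginning",
    659605612: "restarted from the beginning",
    648817421: "restarted from the beginning",
    681717929: "purged",
    653894489: "restarted from the beginning",
    686061619: "restarted from the beginning",
}


@dataclass
class Result:
    """Hypothetical record: result ID, angle range, normalised seconds."""
    rid: int
    ar: float
    seconds: float


def is_outlier(r, ar_lo=0.2, ar_hi=0.4, limit=15000.0):
    """Flag points above `limit` normalised seconds inside the AR window."""
    return ar_lo <= r.ar <= ar_hi and r.seconds > limit


def clean(results):
    """Drop flagged outliers only when a documented reason exists for them."""
    kept, dropped = [], []
    for r in results:
        if is_outlier(r) and r.rid in EXCLUDED:
            dropped.append((r.rid, EXCLUDED[r.rid]))
        else:
            kept.append(r)
    return kept, dropped


if __name__ == "__main__":
    demo = [
        Result(646706541, 0.31, 21000.0),  # listed RID, made-up AR/time: excluded
        Result(123456789, 0.31, 9800.0),   # hypothetical normal point: kept
        Result(987654321, 0.35, 16000.0),  # hypothetical outlier, no reason: kept
    ]
    kept, dropped = clean(demo)
    print([r.rid for r in kept])
    print(dropped)
```

Keeping unexplained outliers, rather than dropping everything above the threshold, mirrors the point of the post: a point is only removed once there is a legitimate excuse for it.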
Richard Haselgrove Joined: 4 Jul 99 Posts: 14649 Credit: 200,643,578 RAC: 874 |
Two more charts: expanded views of the curved sections. [chart] [chart] Pretty much the same data as last time, though a few extra results have come in over lunch. I've selected out AR 0.05+ thru 0.22548 and AR 0.22549 thru 1.1274; the AR scale on these plots is linear rather than logarithmic. Data in the AR 0.05+ thru 0.22548 range is pretty sparse, though we're actually getting more than usual from the current 'tapes'. It may fill out over the coming days. The AR 0.22549 thru 1.1274 range is already pretty complete: my feeling is that Joe's curve perhaps needs to be a little steeper, higher in the top left and lower in the bottom right. |
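A minimal Python sketch of the window selection for the two expanded views, assuming each data point is an `(ar, seconds)` pair and reading "0.05+" as strictly greater than 0.05. The boundaries are the ones quoted in the post; `split_windows()` is a hypothetical helper. Points outside both windows (below 0.05 or above 1.1274) are simply left out.

```python
def split_windows(points):
    """Split (ar, seconds) pairs into the two expanded-view windows.

    Window 1: 0.05 < AR <= 0.22548
    Window 2: 0.22548 < AR <= 1.1274 (i.e. "0.22549 thru 1.1274")
    Anything outside both windows is ignored.
    """
    low, mid = [], []
    for ar, seconds in points:
        if 0.05 < ar <= 0.22548:
            low.append((ar, seconds))
        elif 0.22548 < ar <= 1.1274:
            mid.append((ar, seconds))
    return low, mid


low, mid = split_windows([(0.01, 1.0), (0.1, 2.0), (0.3, 4.0), (2.0, 6.0)])
print(low, mid)
```

Using `> 0.22548` rather than `>= 0.22549` for the second window avoids losing any point whose AR, reported to six decimals, falls between the two quoted boundaries.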
Josef W. Segur Joined: 30 Oct 99 Posts: 4504 Credit: 1,414,761 RAC: 0 |
The 0.2255 thru 0.355 range is still sparse, and there don't appear to be any points in the valleys just below the jumps at AR 0.29091 and 0.35556, but I agree the evidence so far indicates that an increase of maybe about 7% at that end may be in order, even though it would increase the deadline at AR 0.226 to 41 days. The bottom right, I think, is probably near correct; data points above the rather large jump at AR 1.06667 are missing so far. There were some at AR 1.076 in 31dc06ad; I hope the monitored systems caught a few. Joe |
Brian Silvers Joined: 11 Jun 99 Posts: 1681 Credit: 492,052 RAC: 0 |
Remember that thing about not getting "perfection"? ;) I'd suggest only a 3-5% bump, not 7%... It is, after all, an estimate... Almost all of the hosts are under the estimate at the upper end of that range (around 0.875 and up)... Any luck with the slower hosts? Brian...who is now resuming my Ubuntu struggle for a while... |
W-K 666 Joined: 18 May 99 Posts: 19012 Credit: 40,757,560 RAC: 67 |
Thanks for all the hard work, Joe, et al. Question: can the same or similar logic be used to equalize the credit/time curves? Andy |
Brian Silvers Joined: 11 Jun 99 Posts: 1681 Credit: 492,052 RAC: 0 |
I had thought about that too... I guess my question to you is: are you thinking about suggesting a return to the benchmarks as the basis for credit? |
W-K 666 Joined: 18 May 99 Posts: 19012 Credit: 40,757,560 RAC: 67 |
Never. That is the worst suggestion you have made this year. Consider washing your mouth out with soap and water ;-) |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14649 Credit: 200,643,578 RAC: 874 |
Not a lot yet. One of the Darwins caught a new 0.310078 and a 1.073367, and the server got a 0.310287 and a 1.068357, but that's not worth re-doing the chart tonight. 31dc06ad is pretty recent; we'll probably have to wait for them to mature in the cache a while. |
Brian Silvers Joined: 11 Jun 99 Posts: 1681 Credit: 492,052 RAC: 0 |
Hey now! I didn't make the suggestion. I was just asking if that was on your mind... |
Brian Silvers Joined: 11 Jun 99 Posts: 1681 Credit: 492,052 RAC: 0 |
So we wait for the cache to hatch? ;) |
purplemkayel Joined: 23 Jul 02 Posts: 1904 Credit: 55,594 RAC: 0 |
How slow a host are you looking for? If you're interested, I can run my AMD (Linux with the stock app) on SETI only for a while... mind you, it takes between 3 and 7 days to finish a work unit running 24 hours a day, lol... and I do run it 24 hours a day. Really slow AMD-K6 |
Josef W. Segur Joined: 30 Oct 99 Posts: 4504 Credit: 1,414,761 RAC: 0 |
Perfection isn't the issue, just convincing Eric that the estimates have been checked out and adapted as necessary. The way I shortened the testing puts more weight on startup tasks and zero-chirp processing than full runs do, and I didn't have enough data to compensate for that well. If we don't get any data points below the estimate curve within the 0.2255 to 0.29091 range or the 0.29091 to 0.3556 range I'll be surprised, but I would have to respond by bending the curve more strongly to match the data. My testing showed a peak at 0.64 and a valley at 1.067. The curve is intended to go through the midpoint of that section, around 0.8, so it's natural that data points in the 0.8 to 1.067 range should tend to be below the curve. The slower hosts are slower :^) RACs range from 63 to 400, so there will only be a few new results for each host per day on average. I did look through the data I gathered yesterday in hopes that pendings would provide enough data to make ratio approximations for a few, but none seemed acceptable to me. Today's gather didn't improve the distribution much; I think at least a week will be needed to begin to have enough data for the faster hosts. Joe |
Josef W. Segur Joined: 30 Oct 99 Posts: 4504 Credit: 1,414,761 RAC: 0 |
Essentially all that's needed is the ratio between flop counts and time. If the estimate curves are a reasonable approximation of times, then it's fairly trivial to derive the credit adjustments. Adjusting time estimates is basically a trivial change, replacing formulas in 5 lines of the splitter code. Adding logic to vary the credit_rate header parameter by angle range would be all new code, though not very complex. Joe |
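To illustrate why the credit adjustment is "fairly trivial" once the time curves are trusted, here is a hedged Python sketch. It assumes credit is proportional to the flop estimate, so credit per second at a given AR is proportional to flops/time; scaling by the ratio of that rate at a reference AR to the rate at the task's AR equalises credit per second. `credit_multiplier()`, the reference AR of 0.41 and the toy curves are all hypothetical: they are not the splitter's formulas and do not describe how the credit_rate header parameter is actually applied.

```python
def credit_multiplier(ar, est_time, est_flops, ref_ar=0.41):
    """Factor that brings credit per second at `ar` in line with `ref_ar`.

    Assumes credit is proportional to est_flops, so credit per second is
    proportional to est_flops / est_time at a given AR.
    """
    rate = est_flops(ar) / est_time(ar)
    ref_rate = est_flops(ref_ar) / est_time(ref_ar)
    return ref_rate / rate


# Toy curves, purely hypothetical (not the splitter's formulas).
def toy_flops(ar):
    return 4.0e13 / (ar + 0.1)


def toy_time(ar):
    return 10000.0 * (0.51 / (ar + 0.1)) ** 0.8


for ar in (0.05, 0.41, 1.2, 2.0):
    print(f"AR {ar:<4}: multiplier {credit_multiplier(ar, toy_time, toy_flops):.3f}")
```

If the time estimate is exactly proportional to the flop estimate, the multiplier is 1 at every AR; any departure from 1 measures how far flop-based credit drifts from equal credit per hour.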
W-K 666 Joined: 18 May 99 Posts: 19012 Credit: 40,757,560 RAC: 67 |
Thanks Joe. I thought that would be the case, but wanted an expert's view, not my electronic hardware view. |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14649 Credit: 200,643,578 RAC: 874 |
It's been a couple of days since I updated the charts. Here's the current set:
Full range, all ARs captured: [chart]
AR 0.05+ thru 0.22548: [chart]
AR 0.22549 thru 1.1274: [chart]
Note that I've had to rescale the two expanded views to display a couple of outliers. We have now got some points above AR 1.06667, proving that Joe was right all along and I jumped to a hasty conclusion, as always! I can keep this going for a couple more days, then I'll have to take an enforced break until the New Year. But judging by today's splitter output, we won't be missing much of interest.
Edit: in case anybody's interested, I'm currently monitoring:

Host | CPU / OS | Data points
---|---|---
3677428 | X5355 Windows Server 2003 | 317
3605815 | Intel Quad Windows XP | 259
3950644 | Q6600 Windows XP | excluded
3229377 | Q6600 Windows Vista | 254
2965688 | X5160 Darwin 8.11.1 | 606
3009870 | X5160 Darwin 8.11.1 | included above
4037247 | E5345 Gentoo Linux 2.6.22 | 295
4011908 | E5335 Xen Linux 2.6.22 | 330
3253554 | X5365 Darwin 8.11.1 | 363
3117194 | Intel Quad FC8 Linux 2.6.23 | 294
2370791 | Opteron 275 Linux 2.6.18 | 183
Total | | 2,901 so far |
Josef W. Segur Joined: 30 Oct 99 Posts: 4504 Credit: 1,414,761 RAC: 0 |
One aspect of the estimates I didn't factor in while developing my curve formulas is the way DCF (Duration Correction Factor) works. Because it adjusts upward immediately and downward slowly, work fetch and the displayed time to completion will effectively have the curves placed above most of the actual run times for a host. If we model the situation as requiring 24 downward adjustments to balance 1 upward adjustment, about 4% of runtimes (1 in 25) would be above the estimate curve and 96% below, on average. Had I been developing the curves with that in mind, I would have shaped them to match the peaks in the ranges with imposed sawteeth. In the .05 to .225 range that would steepen the curve, since the biggest teeth are to the left. In the .226 to 1.1274 range the curve would be shallower, because the big teeth are to the right. I don't plan to make changes based on this thinking, because basing the curves on the middle of the teeth is the right approach for getting deadlines balanced. However, the large variations in runtime on multi-core hosts will make time estimates less accurate than I'd like, though still much better than the old estimates. Joe |
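The asymmetric behaviour can be explored with a small simulation. The update rule below is a simplified model chosen to match the description above (jump straight up when a task runs long, creep down by a fixed fraction otherwise); it is an assumption, not BOINC's actual DCF code, and the +/-20% scatter of runtimes is invented. The fractions it prints depend heavily on `down_weight` and the scatter, so they illustrate the direction of the effect rather than reproduce the 4% figure, which comes from Joe's 24:1 model.

```python
import random


def fraction_over_estimate(task_ratios, down_weight=0.1, dcf=1.0):
    """Return the fraction of tasks that ran longer than their corrected estimate.

    Each ratio is actual runtime / uncorrected estimate, so a task overruns
    when its ratio exceeds the current DCF. Simplified model (an assumption,
    not BOINC's exact code): on an overrun DCF jumps straight to the ratio;
    otherwise it moves only `down_weight` of the way down toward it.
    """
    over = 0
    for r in task_ratios:
        if r > dcf:
            over += 1
            dcf = r  # upward: immediate
        else:
            dcf += down_weight * (r - dcf)  # downward: slow
    return over / len(task_ratios)


rng = random.Random(42)
# Hypothetical runtimes scattered +/-20% around the estimate curve.
samples = [rng.uniform(0.8, 1.2) for _ in range(20000)]

fast = fraction_over_estimate(samples, down_weight=0.1)
slow = fraction_over_estimate(samples, down_weight=0.01)
print(f"down_weight=0.1:  {fast:.1%} of tasks over the estimate")
print(f"down_weight=0.01: {slow:.1%} of tasks over the estimate")

# Joe's round-number model: 1 upward adjustment balanced by 24 downward ones.
print(f"1 / (24 + 1) = {1 / (24 + 1):.0%}")  # prints: 1 / (24 + 1) = 4%
```

The slower the downward drift, the more of the run-time scatter ends up below the effective estimate, which is the effect the post describes.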
Josef W. Segur Joined: 30 Oct 99 Posts: 4504 Credit: 1,414,761 RAC: 0 |
While awaiting an opportunity to resume data gathering, I decided to use what I had to get a rough estimate of the ratios. The table below shows the information for the hosts which had enough data to be at least indicative. The VLAR column is the average time for WUs in the 0 to 0.05 range, MID is the average in the 0.35556 to 0.45714 range, and VHAR is the average for 1.1275 and above. The calculated ratios are between those columns. The C column indicates how many cores the host has. "Intel Core" refers to Core 2 Duo, Quad, and Xeon processors using the same architecture.

host | VLAR | L:M | MID | M:H | VHAR | C | type
---|---|---|---|---|---|---|---
1755696 | 7611 | 0.86:1 | 8880 | 4.70:1 | 1891 | 4 | Intel Core
1789875 | 13882 | 0.99:1 | 14055 | 3.42:1 | 4104 | 1 | AMD
2037132 | 16151 | 1.08:1 | 14898 | 4.40:1 | 3387 | 1 | Intel P4
2122553 | 16433 | 1.08:1 | 15221 | 4.97:1 | 3060 | 1 | Intel P4
2315607 | 9046 | 0.89:1 | 10196 | 4.08:1 | 2497 | 4 | Intel Core
2370791 | 13233 | 0.89:1 | 14863 | 3.49:1 | 4258 | 4 | AMD
2402518 | 14646 | 0.92:1 | 16182 | 3.43:1 | 4719 | 4 | AMD
2842066 | 14847 | 0.91:1 | 16390 | 2.60:1 | 6292 | 8 | AMD
3004905 | 7934 | 0.83:1 | 9535 | 4.54:1 | 2102 | 4 | Intel Core
3332596 | 11861 | 1.04:1 | 11448 | 4.03:1 | 2842 | 4 | Intel Core
3368721 | 8414 | 0.95:1 | 8885 | 3.16:1 | 2811 | 8 | Intel Core
3429866 | 15056 | 1.02:1 | 14763 | 3.47:1 | 4258 | 1 | AMD
3591957 | 12191 | 0.93:1 | 13173 | 3.84:1 | 3429 | 4 | Intel Core
3597407 | 10456 | 1.03:1 | 10158 | 4.93:1 | 2059 | 4 | Intel Core
3669880 | 12547 | 1.02:1 | 12350 | 3.34:1 | 3703 | 4 | AMD
3714185 | 13145 | 1.04:1 | 12608 | 3.25:1 | 3879 | 8 | Intel Core
3811020 | 10113 | 0.92:1 | 10970 | 3.21:1 | 3417 | 4 | AMD
3842351 | 13185 | 0.95:1 | 13895 | 2.68:1 | 5184 | 8 | AMD
3983844 | 11855 | 0.93:1 | 12739 | 3.34:1 | 3814 | 4 | AMD
4018306 | 11620 | 0.93:1 | 12513 | 3.49:1 | 3590 | 1 | AMD

Naively averaging the ratios gives 0.96:1 for L:M and 3.72:1 for M:H, which may indicate I'll need to reduce the VLAR range and increase the VHAR. But I obviously need to get data for some P4 and/or PD hosts running two cores or HT; there are too many of those doing S@H to ignore. Joe |
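The ratio columns and the naive averages can be recomputed from the three time columns with a few lines of Python. This is a sketch for checking the arithmetic; the `ROWS` transcription and the `ratios()` helper are illustrative, and "naive" is taken to mean a plain arithmetic mean of the per-host ratios with every host weighted equally.

```python
# (host, VLAR, MID, VHAR) average times transcribed from the table above.
ROWS = [
    (1755696, 7611, 8880, 1891),
    (1789875, 13882, 14055, 4104),
    (2037132, 16151, 14898, 3387),
    (2122553, 16433, 15221, 3060),
    (2315607, 9046, 10196, 2497),
    (2370791, 13233, 14863, 4258),
    (2402518, 14646, 16182, 4719),
    (2842066, 14847, 16390, 6292),
    (3004905, 7934, 9535, 2102),
    (3332596, 11861, 11448, 2842),
    (3368721, 8414, 8885, 2811),
    (3429866, 15056, 14763, 4258),
    (3591957, 12191, 13173, 3429),
    (3597407, 10456, 10158, 2059),
    (3669880, 12547, 12350, 3703),
    (3714185, 13145, 12608, 3879),
    (3811020, 10113, 10970, 3417),
    (3842351, 13185, 13895, 5184),
    (3983844, 11855, 12739, 3814),
    (4018306, 11620, 12513, 3590),
]


def ratios(vlar, mid, vhar):
    """Return (L:M, M:H) as plain numbers, e.g. 0.86 meaning 0.86:1."""
    return vlar / mid, mid / vhar


lm = [ratios(v, m, h)[0] for _, v, m, h in ROWS]
mh = [ratios(v, m, h)[1] for _, v, m, h in ROWS]

# "Naive" average: plain arithmetic mean, every host weighted equally.
avg_lm = sum(lm) / len(lm)
avg_mh = sum(mh) / len(mh)
print(f"L:M {avg_lm:.2f}:1  M:H {avg_mh:.2f}:1")  # prints: L:M 0.96:1  M:H 3.72:1
```

Recomputed from the time columns, host 2402518 comes out at 0.91:1 rather than the listed 0.92:1, but both averages are unchanged at two decimal places.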
Richard Haselgrove Joined: 4 Jul 99 Posts: 14649 Credit: 200,643,578 RAC: 874 |
Here's the data that's come in over the holidays: [chart] [chart] [chart] There are now almost 10,000 points in the main plot (9,851 to be exact), so I probably won't post any more charts: your eye gets drawn to the rare anomalies rather than concentrating on the underlying form. Also, I think the 'Intel_Quad_Windows_XP' had a few problems over the break. I've thrown out some extreme values, but there are still some questionable high times around AR=0.42 |
Josef W. Segur Joined: 30 Oct 99 Posts: 4504 Credit: 1,414,761 RAC: 0 |
Hmm, the children playing games during the holidays? I agree it's sufficient data for the Intel Core architecture, and I've gathered almost enough Intel NetBurst and additional AMD data to do some graphs myself. I just hope it doesn't expand that 1:2 uncertainty (2000 to 4000 on your scale) for the highest angle ranges. The very low angle ranges are interesting too, but 8000 to 12000 is only 1:1.5, so that area is less critical. My feeling that the different architectures will have significantly different ratios may not be correct, anyhow. Joe |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14649 Credit: 200,643,578 RAC: 874 |
I wouldn't worry too much about the uncertainty at VHAR. Remember that I'm monitoring multicores (the most prominent variation, the Xen Linux host, is an 8-core), and review the discussion we had in the 'SETI and Einstein Cooperation on a Q6600' thread. I think that the increased 'Manhattan Skyline' effect at VHAR since my last plot is due to the volume of VHAR work since 22 Dec, so the machines will have spent much more time running VHAR tasks in parallel on most or all cores. I'm more interested in the clear banding between 'Windows' and 'everything else' at VLAR. I should have said that, now I've got more data, I renormalised these graphs by taking the average of the 11 data points straddling AR=0.41 as my 10,000-second datum: that should line the different machines/OSs up better. It looks to me as if whichever routines are used most specifically at VLAR could do with re-optimising for the Windows stock compiler. |
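The renormalisation described above can be sketched as follows. It assumes each host's data is a list of `(ar, seconds)` pairs and interprets "the 11 data points straddling AR=0.41" as the 11 points with AR closest to 0.41; `renormalise()` is a hypothetical helper, not the spreadsheet actually used. Each host gets its own scale factor, which is what lines the machines and OSs up on a common 10,000-second datum.

```python
def renormalise(points, datum_ar=0.41, n=11, datum_seconds=10000.0):
    """Rescale one host's (ar, seconds) points so that its datum reads datum_seconds.

    The datum is the mean time of the n points whose AR is closest to
    datum_ar, taken here as the meaning of "straddling" (an assumption).
    """
    if len(points) < n:
        raise ValueError("not enough points to form the datum")
    nearest = sorted(points, key=lambda p: abs(p[0] - datum_ar))[:n]
    datum = sum(seconds for _, seconds in nearest) / n
    scale = datum_seconds / datum
    return [(ar, seconds * scale) for ar, seconds in points]


# Usage: apply per host, then plot all hosts on the common scale.
host_a = [(0.405, 8800.0), (0.41, 9000.0), (0.415, 9200.0), (1.2, 2500.0)]
print(renormalise(host_a, n=3))
```

Normalising each host against its own mid-range datum removes raw speed differences, so what remains on the combined plot is the shape of the run-time curve across ARs.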
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.