Message boards :
Number crunching :
Short estimated runtimes - don't panic
Wiggo (Joined: 24 Jan 00, Posts: 36829, Credit: 261,360,520, RAC: 489)
Thanks all for the advice. After a lot of messing around this morning I managed to get the cruncher back to driver 270.61 and now all seems OK again. Just another example of Bill Gates trying to stop us finding ET :)) Now if it will just behave for a few days (I've been fighting weekend power cuts and overnight GPU crashes for the last few weeks) I should finally make it to 7 million credits, which BoincStats was expecting me to reach yesterday! The driver problem is actually nvidia's fault, not Windows. As to the slot speeds: probably nothing really significant, but it would be advisable to have the faster card on the faster bus. Cheers.
cliff (Joined: 16 Dec 07, Posts: 625, Credit: 3,590,440, RAC: 0)
Well, I dunno about underestimating runtimes; I've had 2 days of the damn thing overestimating runtimes and going bleeding ballistic on me. First it's one project, then it's the other one, then it's both projects using the GPU, and one using the CPU... Arrghhhh. But what's particularly irritating to me is that it dumps WUs that are seconds away from completion to load WUs at zero %, even when both WUs belong to the same project and are of the same type. It's bloody daft. There are now a multitude of WUs waiting to run, in varying stages of completion, for both projects. In order to stop this behavior one has to micro-manage WUs: suspend whatever caused the move to high priority (HP), allow partially completed WUs to complete, then unsuspend the WUs again... Whereupon BOINC goes blithely on, dumping more WUs into the waiting queue while it loads yet more WUs at zero % and cruises on its merry way. Perhaps BOINC should do a small check of running tasks before it goes hyper, see if they belong to the project/WU type (or whatever caused it to go HP), and leave them running [at HP if required] until complete before loading new WUs. When you have AP6 GPU tasks that have estimated times from 240 hrs down to 64 hrs, and that inevitably complete on a GPU in under 3 hours, those estimates are nuts. That even after BOINC has completed 6 or more such WUs it still has crazy estimated times is beyond me. Those estimates need to be corrected on the fly and in a timely manner. Regards, Cliff. Been there, Done that, Still no damn T shirt!
W-K 666 (Joined: 18 May 99, Posts: 19403, Credit: 40,757,560, RAC: 67)
This is what happens when you don't use <flops>... As I was the person who reported the initial problem, I felt I could not use <flops>, so that I could keep coming up with reminders that it hadn't been fixed yet. Now that it has been fixed, all the new estimates look as though they are right on the button and my DCF is ~1.0. And now I don't have tasks being suspended a few seconds from completion, or others suspended just because another task has completed, etc. So I'm not going to use <flops>, because it's not needed now.
Richard Haselgrove (Joined: 4 Jul 99, Posts: 14679, Credit: 200,643,578, RAC: 874)
This is what happens when you don't use <flops>... It probably depends on your hardware, both in absolute and relative terms. My Q6600/9800GT rigs - which I regard as being quite 'well balanced' - are happy with the new settings. But this lunchtime I watched my E5320/9800GTX+ transition from 'old' to 'new' tasks. While the 'old' rate (capped APR) tasks were controlling proceedings, I accumulated 400 queued GPU tasks, estimated at exactly one day (well, 1 day and 4 minutes) with DCF=0.1468. The first 'new' task jumped that to DCF=0.7490 and a cache of 5 days 3 hours. So, with the slower CPUs and the faster GPU, that host hasn't yet completed rebalancing. And as for the Q9300/GTX470 - that still has DCF=0.3543, so it'll have to wait for the next raising of the cap. There will be other users out there, with an even more extreme disparity between CPU and GPU speed, who will still need another five-fold increase, and then some.
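For anyone following the arithmetic: the client multiplies each task's raw runtime estimate by the host's duration correction factor (DCF), so a jump in DCF rescales the whole cache at once. A rough sketch using the numbers above (this is just the standard DCF scaling relation, not the client's exact code):

```python
# Sketch: how a DCF jump rescales a BOINC cache estimate.
# Numbers are the ones quoted in the post above.

def scaled_estimate_days(raw_days, dcf):
    """Runtime estimate after applying the duration correction factor."""
    return raw_days * dcf

# 400 tasks showed as 1 day 4 minutes while DCF was 0.1468,
# so back out the raw (DCF = 1.0) estimate first.
shown_days = 1 + 4 / (24 * 60)
raw_days = shown_days / 0.1468

# The first 'new' task pushed DCF to 0.7490...
new_cache = scaled_estimate_days(raw_days, 0.7490)
print(f"{new_cache:.2f} days")  # ~5.12 days, i.e. about 5 days 3 hours
```

The same relation explains why a single task completing can swing every estimate in the queue at once.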
Grant (SSSF) (Joined: 19 Aug 99, Posts: 13855, Credit: 208,696,464, RAC: 304)
My E8400/GTX 560Ti system has pretty much settled down. Finally it is able to hit the GPU server-side limits. The DCF still moves around a bit, but at least it is only a bit. Before, it used to go from 0.2 to 1.5+ with the completion of a CPU VLAR unit. Grant Darwin NT
Horacio (Joined: 14 Jan 00, Posts: 536, Credit: 75,967,266, RAC: 0)
I have another question. My motherboard has a x16 and a x4 slot for the GPUs. Whilst messing around this morning I noticed that my faster GPU (GTX460) is currently in the x4 slot and the slower GPU (GT430) is in the x16 slot. Would I see a (significant) performance enhancement if I swapped the GPUs over? On SETI MB, with the optimized apps, the number of lanes (x16, x4) is not crucial, because the amount of data transferred back and forth between CPU and GPU is not huge. On other apps and/or projects it could be very different. I know that a 430 on x16 works 66% faster than on a x1 for Einstein BRPs (which are hybrid apps that need a lot of CPU work), while it gives me no noticeable difference on SETI Multibeam. So, should you switch them? On one side, having the faster GPU on the slow PCIe slot makes the difference in crunching speeds a bit smaller, leading to a more stable DCF and APR... On the other, the faster GPU will not be as fast as expected, so you might get better performance switching them... As always, your mileage may vary.
shizaru (Joined: 14 Jun 04, Posts: 1130, Credit: 1,967,904, RAC: 0)
Run-times are perfect (on my little laptop). Thanx everybody!
Cosmic_Ocean (Joined: 23 Dec 00, Posts: 3027, Credit: 13,516,867, RAC: 13)
So I finally ran some of the new-estimate tasks ahead of the ~8 days of old-estimate ones that I had. New ones went up to the correct time and the old ones just about doubled their estimates. Based on ETA alone, I've got about 22 days of cache, but it is actually more like 14 or so. BOINC has not gone into HP mode for anything though, which I was half-expecting, but it is working on the oldest tasks first anyway (FIFO), and those are the ones that would end up going into HP. Linux laptop: record uptime: 1511d 20h 19m (ended due to the power brick giving up)
kittyman (Joined: 9 Jul 00, Posts: 51478, Credit: 1,018,363,574, RAC: 1,004)
The kitties are doing just fine... Waiting for the next round of adjustments. "Time is simply the mechanism that keeps everything from happening all at once."
Richard Haselgrove (Joined: 4 Jul 99, Posts: 14679, Credit: 200,643,578, RAC: 874)
So Richard! Answering more generally, for the benefit of other readers who may shoulder-surf this thread: I'm seeing that runtime estimation is pretty good now on my 9800 GT:
20/04/2012 10:30:22 | | NVIDIA GPU 0: GeForce 9800 GT (driver version 29036, CUDA version 4010, compute capability 1.1, 512MB, 336 GFLOPS peak)
That card has 'Average processing rate 138.37077926918' for the Lunatics MB app. But on faster cards, there's still some way to go before things settle down: we'll be writing to the staff on Monday, suggesting that the time is right for the second stage of normalisation (as suggested by Mark). So, I'd say that if your GFLOPS/APR figures are similar to, or below, mine, you should be fine to remove them now. But if you have faster cards than this one, maybe wait for another week or two after the next round of corrections.
red-ray (Joined: 24 Jun 99, Posts: 308, Credit: 9,029,848, RAC: 0)
Hi Richard, What is expected to happen on systems with mixed-speed GPUs, please? I am expecting the GTX 460 times to be a few % too high and the 430/520 ones way too low. I get 'Average processing rate 308.83578908794' for the following, and a DCF that has a lot in common with a yo-yo.
20/04/2012 09:10:00 | | NVIDIA GPU 0: GeForce GTX 460 (driver version 285.62, CUDA version 4.10, compute capability 2.1, 1024MB, 766MB available, 1025 GFLOPS peak)
20/04/2012 09:10:00 | | NVIDIA GPU 1: GeForce GT 430 (driver version 285.62, CUDA version 4.10, compute capability 2.1, 512MB, 361MB available, 269 GFLOPS peak)
20/04/2012 09:10:00 | | NVIDIA GPU 2: GeForce GTX 460 (driver version 285.62, CUDA version 4.10, compute capability 2.1, 1024MB, 814MB available, 1025 GFLOPS peak)
20/04/2012 09:10:00 | | NVIDIA GPU 3: GeForce GT 520 (driver version 285.62, CUDA version 4.10, compute capability 2.1, 512MB, 366MB available, 156 GFLOPS peak)
Richard Haselgrove (Joined: 4 Jul 99, Posts: 14679, Credit: 200,643,578, RAC: 874)
As always, you're limited to just one value per application type. All MB CUDA apps will share a value, all AP OpenCL apps will share a value, and so on. The best value to use - in general - is the one shown as the APR for the application type. This thread arose because - for a while, now hopefully coming to an end - the displayed APR wasn't the same as the effective APR. People used <flops> to get the effective APR back up to the proper value. Since your APR seems to be below the new threshold, it'll make no difference at all whether you supply a value yourself in <flops>, or rely on the one maintained by the project in APR. APR is easier, in my book.
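For anyone who does want to pin the value manually: `<flops>` goes in the `<app_version>` section of app_info.xml (anonymous platform), and is given in FLOPS, so a displayed APR of 138.37 (which is in GFLOPS) would be entered as roughly 1.3837e11. The app name, version number and plan class below are illustrative placeholders; copy the real ones from your own app_info.xml:

```xml
<app_version>
    <app_name>setiathome_enhanced</app_name>   <!-- use your actual app name -->
    <version_num>603</version_num>             <!-- and version number -->
    <plan_class>cuda</plan_class>
    <flops>1.3837e11</flops>                   <!-- APR of 138.37 GFLOPS, in FLOPS -->
    <!-- file_info / file_ref entries omitted -->
</app_version>
```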
red-ray (Joined: 24 Jun 99, Posts: 308, Credit: 9,029,848, RAC: 0)
Thank you Richard. I suspect that when/if dont_use_dcf is sent by the server I will get plausible estimates for all but the GT 430/520, so I am hoping that change happens next Tuesday. What is the URL that lists the new thresholds currently being used, please? With my GTX 680 + GTX 460 I get 'Average processing rate 207.34629880725' and am wondering why this is lower. I suspect it's because the GTX 680 has only been in use since Monday.
Grant (SSSF) (Joined: 19 Aug 99, Posts: 13855, Credit: 208,696,464, RAC: 304)
So Richard! I didn't put the FLOPs back in when the bug fix borked things. On my E8400/GTX 560Ti system the DCF would move from .2 to over 1.5. Since the latest changes the DCF moves between 1.3 & 0.9 (or thereabouts). On my i7/GTX 460 system it moves between 1.1 & 0.8. The estimated completion time for shorties is still out by a factor of 10, but it's a big improvement on what it was, & now both systems actually hit the server-side limits for GPU work. It would be nice if they could double the server-side limits with the next tweaking. The previous increase allowed me to keep just under 2 days' work for the GPUs. Doubling it would get me close to my usual cache of 4, which would be nice. Not knowing just how the completion time estimates work, or the limits for timing tasks out for finishing too early or not early enough, and considering the huge change in values the last change gave, I'd suggest changing that setting by half of what was done last time - gradually moving things closer back to their rightful settings. Grant Darwin NT
red-ray (Joined: 24 Jun 99, Posts: 308, Credit: 9,029,848, RAC: 0)
I suspect these shorter estimates are sometimes too short for my slow GPUs. All of the following got Exit status -177 (0xffffffffffffff4f) ERR_RSC_LIMIT_EXCEEDED:
http://setiathome.berkeley.edu/result.php?resultid=2404443134
http://setiathome.berkeley.edu/result.php?resultid=2404443124
http://setiathome.berkeley.edu/result.php?resultid=2403125100
http://setiathome.berkeley.edu/result.php?resultid=2401215374
http://setiathome.berkeley.edu/result.php?resultid=2401215372
http://setiathome.berkeley.edu/result.php?resultid=2401215324
http://setiathome.berkeley.edu/results.php?hostid=6379672&offset=0&show_names=0&state=5&appid=
I suspect BOINC would need to take the different GPU speeds into account to stop these.
Richard Haselgrove (Joined: 4 Jul 99, Posts: 14679, Credit: 200,643,578, RAC: 874)
I suspect these shorter estimates are sometimes too short for my slow GPUs. All of the following got Exit status -177 (0xffffffffffffff4f) ERR_RSC_LIMIT_EXCEEDED. Most of those were run on your GT 520 - I forget where that comes in the speed range. And all of them have 'difficult' ARs, which extend the runtime a long way beyond expectations. In your rather specialised environment (2 x GTX 460, GT 430, GT 520), you may have to help BOINC out by setting a <flops> value closer to the speed of the slowest card, rather than relying on the APR, which will be heavily weighted by the two fast cards.
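For context on those -177 exits: each workunit carries an rsc_fpops_bound, and the client aborts a task once its elapsed time exceeds that bound divided by the host's estimated speed for the app. Because that speed is a single per-app figure weighted towards the fast cards, a slow card working a 'difficult' AR can blow the limit. A simplified sketch of the check (illustrative numbers and logic, not the actual client code):

```python
ERR_RSC_LIMIT_EXCEEDED = -177  # BOINC exit status seen above

def check_rsc_limit(elapsed_s, rsc_fpops_bound, est_flops):
    """Return -177 if a task has run past its fpops bound, else 0.

    est_flops is the host's estimated speed for the app; a value
    dominated by fast cards shrinks the wall time allowed to a
    slow card running the same workunit.
    """
    allowed_s = rsc_fpops_bound / est_flops
    return ERR_RSC_LIMIT_EXCEEDED if elapsed_s > allowed_s else 0

# Illustrative numbers: a bound sized for a ~300 GFLOPS host APR
bound = 3.0e14              # hypothetical rsc_fpops_bound
apr_weighted = 300e9        # APR dominated by the two GTX 460s
print(check_rsc_limit(900, bound, apr_weighted))    # 0: within the limit
print(check_rsc_limit(1200, bound, apr_weighted))   # -177: aborted
```

Pinning `<flops>` near the slowest card's speed stretches the allowed wall time for every task, at the cost of over-long estimates on the fast cards.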
LadyL (Joined: 14 Sep 11, Posts: 1679, Credit: 5,230,097, RAC: 0)
Just a heads up - we are expecting the next step to go in this maintenance. I'm not the Pope. I don't speak Ex Cathedra!
Grant (SSSF) (Joined: 19 Aug 99, Posts: 13855, Credit: 208,696,464, RAC: 304)
*fingers crossed* Grant Darwin NT
Richard Haselgrove (Joined: 4 Jul 99, Posts: 14679, Credit: 200,643,578, RAC: 874)
It seemed to go pretty smoothly last time, and this time fewer people will be affected - only the faster GPU cards.