Machine not getting enough CUDA work, DCF totally off... No idea what to adjust
Author | Message |
---|---|
Grand Admiral Thrawn Joined: 19 Feb 01 Posts: 54 Credit: 23,149,634 RAC: 38 |
Greetings! I have this CUDA machine running here, see [Details] and [Tasks] (should be public). It has a 1.6GHz dual-core Pentium (Core 2 architecture), one GeForce GTX 285 (est. 708 GFLOPS) and one GeForce 9500GT (est. 88 GFLOPS). The operating system is WinXP Pro x64 Edition with 1GB of RAM. Nothing else runs on the box, only BOINC + SETI@home. The problem: the estimated times to finish WUs, for CPU MB and AP as well as for the GPUs, are totally off, far too high. At first I thought VLAR WUs on the 9500GT might be responsible, but I haven't spotted any. The DCF sat around 4.8, so I reset it to 1.0. At first that made all the estimates pretty sane and correct, but THEN the DCF rose to ~7.5 pretty fast, and I don't know why. It now assumes a normal CUDA WU will take 4-6 hours, when in reality the GTX 285 does one in 10-15 minutes and even the slow 9500GT does one in 1-1.5 hours. The CPU estimates for both MB and AP are way off, too. I heard you shouldn't set any FLOPS values in the configuration anymore, for some reason, and resetting the DCF doesn't help for more than a few hours at best. Any idea what I should reconfigure to get the estimates right and get enough CUDA WUs to make it through the downtime from Tuesday to, say, Friday evening (GMT+1)? By the way, the computing preferences are set to fetch enough work for 10 days and to connect every 0.01 days. I'm sorry if the information I'm looking for is already on the forums; I just couldn't find anything but DCF and FLOPS values... and some people were saying "don't use FLOPS settings anymore". So I wonder what I can do. Thank you for any help you can provide! 3dfx Voodoo5 6000 AGP HiNT Rev.A 3700 prototype, dead HiNT bridge |
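For context, a BOINC client of this era estimates a task's runtime, roughly, as the workunit's estimated operation count (`<rsc_fpops_est>` in client_state.xml) divided by the FLOPS figure it uses for the app version, multiplied by the project's duration correction factor. The sketch below shows only that simplified relationship; the function name and the numbers are hypothetical, and the real client takes more inputs into account. Because `<duration_correction_factor>` is stored per project, a high DCF inflates that project's CPU and GPU estimates alike.

```python
def estimated_runtime_seconds(fpops_est: float, app_flops: float, dcf: float) -> float:
    """Simplified BOINC-style runtime estimate, for illustration only.

    fpops_est: the workunit's estimated floating-point operations (<rsc_fpops_est>)
    app_flops: the FLOPS figure the client uses for the app version
    dcf:       the project's <duration_correction_factor>
    """
    if app_flops <= 0:
        raise ValueError("app_flops must be positive")
    return fpops_est / app_flops * dcf


# Hypothetical task whose uncorrected estimate is 15 minutes (900 s):
uncorrected = estimated_runtime_seconds(9.0e13, 1.0e11, dcf=1.0)
corrected = estimated_runtime_seconds(9.0e13, 1.0e11, dcf=7.5)
print(uncorrected / 60, corrected / 60)  # estimate in minutes before and after the DCF
```

With a DCF of 7.5, that 15-minute estimate becomes 112.5 minutes, and the same multiplier applies to every other task from the project.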
Gundolf Jahn Joined: 19 Sep 00 Posts: 3184 Credit: 446,358 RAC: 0 |
<flops> values can only be changed when running the anonymous platform (optimised apps), which you aren't. One thing less for you to worry about ;-) I think part of your problem stems from the fact that you are running two GPUs with vastly different performance. How could the scheduler (or whoever) figure out the correct adjustments in that case? Regards, Gundolf Computers aren't everything in life. (Just kidding) SETI@home classic workunits 3,758 SETI@home classic CPU time 66,520 hours |
Grand Admiral Thrawn Joined: 19 Feb 01 Posts: 54 Credit: 23,149,634 RAC: 38 |
Hmm, I would assume it does that via the FLOPS estimates (708 GFLOPS for the GTX 285 and 88 GFLOPS for the 9500GT), no? But even if it used only the 9500GT for the estimate, the estimates are far too high even for the slower GPU... and even for the CPU. That's what I fail to understand. |
Gundolf Jahn Joined: 19 Sep 00 Posts: 3184 Credit: 446,358 RAC: 0 |
"Hmm, I would assume it would do that by the FLOPS estimation (708 GFLOPS for the 285 GTX and 88 GFLOPS for the 9500GT), no?" Yes, but it uses only one of them. I don't know which. "But even if it would use only the 9500GT for the estimation, it's far too high even for the slower GPU... and even for the CPU. So that I fail to understand." Perhaps it needs some time to find the right direction of adjustment. But then, that direction might change every time a task from the "wrong" (other) GPU is validated. Did you check the Number crunching subforum? I think there might be some threads with information on this. Regards, Gundolf |
Gundolf Jahn Joined: 19 Sep 00 Posts: 3184 Credit: 446,358 RAC: 0 |
Too late to edit my previous post. :-) Perhaps you should temporarily set <use_all_gpus>0</use_all_gpus> until the estimates have stabilised. Then you can see what happens when you re-enable your 9500GT. Regards, Gundolf |
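The `<use_all_gpus>` flag belongs in the `<options>` section of cc_config.xml in the BOINC data directory. As a hedged sketch (the helper name is hypothetical), the following uses only Python's standard-library ElementTree to set that flag in existing cc_config.xml text, creating `<options>` if it is missing. Note that ElementTree discards XML comments when it re-serializes the file, and it is safest to restart the client afterwards so the change is definitely picked up.

```python
import xml.etree.ElementTree as ET


def set_use_all_gpus(cc_config_xml: str, enabled: bool) -> str:
    """Return cc_config.xml text with <options><use_all_gpus> set to 1 or 0."""
    # Start from an empty <cc_config> if the file has no content yet.
    if cc_config_xml.strip():
        root = ET.fromstring(cc_config_xml)
    else:
        root = ET.Element("cc_config")
    options = root.find("options")
    if options is None:
        options = ET.SubElement(root, "options")
    flag = options.find("use_all_gpus")
    if flag is None:
        flag = ET.SubElement(options, "use_all_gpus")
    flag.text = "1" if enabled else "0"
    return ET.tostring(root, encoding="unicode")


print(set_use_all_gpus("<cc_config></cc_config>", enabled=False))
```

Existing options are left in place; only the `use_all_gpus` element is added or overwritten.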
Grand Admiral Thrawn Joined: 19 Feb 01 Posts: 54 Credit: 23,149,634 RAC: 38 |
By now the machine has run out of CUDA WUs, and the DCF has gone down to 2.7. It seems to fluctuate a lot. But even with the DCF down to 2.7, the estimates are roughly double the actual crunching time. I also suspect that it's the GPUs that mess it all up, for whatever reason. Now, with only the CPU crunching, it seems to be slowly reaching a saner level. Should I repost this in the Number crunching forum? I'm not sure it would be the right place, though. I'd love to fully utilize both GPUs in that machine, since that's their sole purpose now. :) Edit: I searched the Number crunching forums and found a few threads, but no real solution. Most people there seem to have this kind of problem because they're using faster customized/optimized apps instead of the stock ones, without the (correct) FLOPS settings applied. |
Grand Admiral Thrawn Joined: 19 Feb 01 Posts: 54 Credit: 23,149,634 RAC: 38 |
I just wanted to add: I have now tried various things (all with stock apps only), to no avail. So I had to disable the weaker GeForce 9500GT by setting <use_all_gpus>0</use_all_gpus> in cc_config.xml, and after just a few hours (!) everything went back to normal. The estimates for both CPU and GPU WUs are now very precise, and I'm getting a lot more CUDA WUs despite the lower total GPU performance. So I suppose BOINC simply can't handle multiple GPUs with different performance levels? At least not when they're working on the same project (I never tried multiple projects with multiple GPUs). It would sure be nice if that were fixed somehow. Right now I do better with the 9500GT disabled, simply because the queue survives the weekly downtime, even though performance "could" be higher if DCF/FLOPS were estimated correctly for both the GTX 285 and the 9500GT and my queues got filled up properly. This seems to work perfectly on systems with multiple identical GPUs, like my box with the two GTX 480s. I hope some future BOINC version will be able to do the same for different GPUs in one system. :) |
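Why might one slow card drag every estimate up? The following is a simplified model, not BOINC's actual source: it assumes a single shared correction factor per project that jumps straight to the observed ratio when a task runs more than 10% over its corrected estimate, but otherwise drifts down only 10% of the way per task. The 15-minute and 90-minute runtimes come from this thread; the six-to-one task mix, the 15-minute uncorrected CUDA estimate, the function names and the exact constants are assumptions for illustration.

```python
def update_dcf(dcf: float, actual_s: float, uncorrected_est_s: float) -> float:
    """One simplified, asymmetric DCF update (illustrative, not BOINC's source).

    Assumed rule: if the task ran more than 10% longer than its corrected
    estimate, jump straight to the observed ratio; otherwise move only 10%
    of the way toward it. The result is kept within [0.01, 100].
    """
    raw_ratio = actual_s / uncorrected_est_s               # runtime vs. uncorrected estimate
    adjusted_ratio = actual_s / (uncorrected_est_s * dcf)  # runtime vs. corrected estimate
    if adjusted_ratio > 1.1:
        new_dcf = raw_ratio                    # fast upward correction
    else:
        new_dcf = 0.9 * dcf + 0.1 * raw_ratio  # slow downward drift
    return min(max(new_dcf, 0.01), 100.0)


def simulate(task_minutes, uncorrected_est_min=15.0, dcf=1.0):
    """Feed completed task runtimes (minutes) through update_dcf; return the DCF after each."""
    history = []
    for minutes in task_minutes:
        dcf = update_dcf(dcf, minutes * 60.0, uncorrected_est_min * 60.0)
        history.append(round(dcf, 3))
    return history


# Hypothetical mix: six GTX 285 tasks (~15 min) per 9500GT task (~90 min),
# with the single CUDA estimate tuned to the faster card.
pattern = ([15] * 6 + [90]) * 3
history = simulate(pattern)
print(history)
```

Under these assumptions the factor never returns to 1 after the first 9500GT task finishes: it bounces between roughly 3.7 and 6, which would inflate every estimate for the project, CPU tasks included. Removing the slower card removes the upward jumps, which is consistent with what was observed after setting `<use_all_gpus>0</use_all_gpus>`.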
Grand Admiral Thrawn Joined: 19 Feb 01 Posts: 54 Credit: 23,149,634 RAC: 38 |
OK, I simply could not let it rest! I installed optimized apps so that I could set <flops></flops> in app_info.xml. But the damn DCF still kept slipping out of control. Sooo, I downloaded ActiveState Perl (a free Perl distribution for Windows x86_32 and x64) and wrote a little in-place search-and-replace one-liner that automatically keeps the DCF at 1.000000 (I don't like having to edit stuff manually all the time). Like this: "C:\Program Files\System\Perl\bin\perl.exe" -itmp -pe "s/<duration_correction_factor>\d+\.\d+<\/duration_correction_factor>/<duration_correction_factor>1.000000<\/duration_correction_factor>/g" "C:\Program Files\BOINC\data\client_state.xml" I put the one-liner in a batch script that the Windows Task Scheduler runs every 10 minutes. I'm not sure how safe in-place editing of client_state.xml really is; I hope there will be no inconsistencies when BOINC writes to the file at the same time. So far no problems, though I will definitely have to observe it a bit more. BOINC seems to pick up the new DCF whenever a workunit changes status, with no need to manually reload the configuration. This is very dirty, but I don't know how to do it any "cleaner". |
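A possibly safer variant of the same idea, sketched in Python rather than Perl: instead of editing in place, it rewrites every `<duration_correction_factor>` value, writes the result to a temporary file in the same directory, and then swaps it in with `os.replace`, so the file is never left half-written. Since this thread raises the question of whether the client reads client_state.xml only at startup and keeps the DCF in memory, the sketch assumes the BOINC client is stopped while it runs. The function names are hypothetical, and on Windows `os.replace` can fail with `PermissionError` if another process has the file open.

```python
import os
import re
import tempfile

# Matches one DCF element in client_state.xml, e.g.
# <duration_correction_factor>7.512345</duration_correction_factor>
DCF_PATTERN = re.compile(
    r"<duration_correction_factor>\s*[0-9.eE+-]+\s*</duration_correction_factor>"
)


def reset_dcf_text(xml_text: str, value: float = 1.0) -> tuple[str, int]:
    """Replace every DCF value in the given text; return (new_text, replacements)."""
    replacement = f"<duration_correction_factor>{value:.6f}</duration_correction_factor>"
    return DCF_PATTERN.subn(replacement, xml_text)


def reset_dcf_file(path: str, value: float = 1.0) -> int:
    """Rewrite client_state.xml via a temp file plus rename. Run only while BOINC is stopped."""
    with open(path, encoding="utf-8", newline="") as f:  # newline="" keeps line endings as-is
        text = f.read()
    new_text, count = reset_dcf_text(text, value)
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8", newline="") as tmp:
            tmp.write(new_text)
        os.replace(tmp_path, path)  # swap the finished file in with one rename
    except BaseException:
        os.unlink(tmp_path)  # don't leave a stray temp file behind on failure
        raise
    return count
```

For example, `reset_dcf_file(r"C:\Program Files\BOINC\data\client_state.xml")` would be run after shutting the client down and before starting it again.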
Gundolf Jahn Joined: 19 Sep 00 Posts: 3184 Credit: 446,358 RAC: 0 |
"BOINC seems to realize the new DCF whenever there is a workunit status change, no need to manually reload the configuration." Are you sure? I always thought BOINC reads the client state only once, at startup, and after that only writes to the XML file. Regards, Gundolf |
Grand Admiral Thrawn Joined: 19 Feb 01 Posts: 54 Credit: 23,149,634 RAC: 38 |
I set the DCF to 10.000000 while BOINC was running and waited for one CUDA WU to finish, just to test it. After that, all the estimated times skyrocketed. But now that I have observed it a bit more, I'm not fully sure that this was the reason. I found that when I set the DCF to 1.000000 while the client is not running, then start BOINC and let it finish one WU, it writes a new DCF of around 4.5, and the estimates land in the same range I saw in my test with 10.000000. So you might actually be right. BOINC might keep the actual DCF only in memory and write it to client_state.xml only so that it has "correct" estimates after a client restart. Dammit. If that's true, I have no way to influence this behaviour while the client is running... There has to be some way to just FIX the DCF at 1. That would solve all my problems, since I could then fine-tune using the <flops> tags... |
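If the DCF does stay near 1, the `<flops>` value in an anonymous-platform app_info.xml is what scales the estimate. As a hedged sketch under the simplified model (runtime ≈ `rsc_fpops_est` / flops × DCF), a value could be derived from observed runtimes as below; the helper name and the numbers are hypothetical. It uses the median so that a single unusual task does not dominate. When two cards as different as the GTX 285 and the 9500GT run the same CUDA app version, no single value fits both, so this only helps when the devices behind one app version perform alike.

```python
import statistics


def calibrated_flops(fpops_est: float, observed_seconds: list[float]) -> float:
    """Suggest a <flops> value so that fpops_est / flops equals the median observed runtime.

    fpops_est:        the workunit's <rsc_fpops_est> (assumed similar across tasks)
    observed_seconds: wall-clock runtimes of finished tasks on ONE kind of device
    """
    if fpops_est <= 0:
        raise ValueError("fpops_est must be positive")
    if not observed_seconds:
        raise ValueError("need at least one observed runtime")
    typical = statistics.median(observed_seconds)
    if typical <= 0:
        raise ValueError("runtimes must be positive")
    return fpops_est / typical


# Hypothetical example: tasks estimated at 9e13 operations taking about 15 minutes.
suggested = calibrated_flops(9.0e13, [880.0, 900.0, 1000.0])
print(f"{suggested:.3e}")
```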
Jord Joined: 9 Jun 99 Posts: 15184 Credit: 4,362,181 RAC: 3 |
Perhaps you don't know what DCF stands for, but it's (Task) Duration Correction Factor. It can't be stuck at the same value, as no task at this project runs for exactly the same amount of time as the next. Even a difference of a few seconds is already a difference. Not to mention the variation in angle range, which causes variations in task duration, even with the VLARs now banished to the CPU only. And while it has to be around 1 for the time remaining on tasks to be accurate, you also have to give it a chance to reach the correct value for your computer. It's how BOINC learns how long work takes. By constantly changing the value yourself, you're making it impossible for BOINC to learn how long these nasty things take. So here's a challenge for you: reset the DCF to 1, then sit back and just watch what happens for 2 weeks. Then post back here with the DCF value your system has come up with by then. Remember, no tinkering. Just let BOINC figure it out for once. |
Grand Admiral Thrawn Joined: 19 Feb 01 Posts: 54 Credit: 23,149,634 RAC: 38 |
That is exactly what I did in the very beginning, with stock apps, and it went totally wrong. The estimates were 5 to 10 times higher than they should have been, and I don't know why. I only know that it pretty much stops when I deactivate the second GPU, the 9500GT. And even though that second GPU takes somewhat over 1h to complete a task, the DCF makes BOINC estimate completion times of several hours. So I have a dual-core CPU at 1.6GHz doing an MB WU in about 4-5h (with optimized apps), the GTX 285 doing one in about 15 minutes, and the 9500GT doing one in about 1-1.5h. But after a while the GPU estimates are about 4-5h. The DCF reaches values beyond 10. Estimates for MB CPU tasks reach values like 20 hours and, from what I've seen, stay there. This just can't be right. So the machine had already been starved of work for some time, while all my other machines were supplied with plenty, especially my most powerful one (i7 + two GTX 480 cards). All I want is to make sure the machine doesn't run out of work during the regular weekly downtimes. So far I have been unable to do that... With my current <flops> settings, all estimates are pretty much perfect at a DCF of 1.000000, so I would like it to stay there. But it just doesn't... It jumps to 4, then 7, then 10... I either have to make sure the DCF stays at 1, or I need to adjust the <flops> settings dynamically. Whatever it takes to keep the pipeline full... On all my other boxes this works nicely; it's only the machine with two different GPUs that has this problem. I just don't want to leave the 9500GT idle, that would be a waste of resources. I want both GPUs to work. But if that results in the whole machine sitting idle for about 4 days a week, it's not very good... The funny thing: if the DCF were pushed up just far enough that the estimates matched the slowest processor of each type in the system, it would make sense.
Let's say it pushed the CUDA estimates up to match the speed of the 9500GT: 1.5h per MB WU. Yeah, that would make "some" sense. But it pushes the estimates FAR beyond the time the SLOWEST processor in the system takes to complete a task. 4h? 5h? How come? The GTX 285 does it in 15 minutes, the 9500GT in roughly 90 minutes. Where do the 4-5h CUDA estimates come from? (And yes, they stay at about that level if left alone...) |
BilBg Joined: 27 May 07 Posts: 3720 Credit: 9,385,827 RAC: 0 |
You can use the new rescheduler (BoincRescheduler) http://setiathome.berkeley.edu/forum_thread.php?id=60712 http://www.efmer.eu/forum_tt/index.php?topic=428.0 to limit the DCF: " config.xml This file is for changing settings and debugging. This file must be placed together with the exe and has to be called config.xml. If you don't use it, rename it to e.g. xconfig.xml. <config> <seti> <vlar>0.13</vlar> <vhar>1.127</vhar> <dcf_min>0.1</dcf_min> <dcf_max>0.2</dcf_max> <est_ratio_cpu_min>0.5</est_ratio_cpu_min> <est_ratio_cpu_max>1.5</est_ratio_cpu_max> <est_ratio_gpu_min>0.5</est_ratio_gpu_min> <est_ratio_gpu_max>1.5</est_ratio_gpu_max> </seti> </config> <vlar> Override default VLAR value. <vhar> Override default VHAR value. <dcf_min> Minimum duration_correction_factor. If the read value in the state file, is less than this value, this value is used to replace the existing duration_correction_factor. <dcf_max> Maximum duration_correction_factor. If the read value in the state file, is more than this value, this value is used to replace the existing duration_correction_factor You need to set both <dcf_min>and <dcf_max>values! A value of 0 may NOT be used. " - ALF - "Find out what you don't do well ..... then don't do it!" :) |
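The `<dcf_min>`/`<dcf_max>` behaviour described in the quoted documentation amounts to a clamp on the DCF read from the state file. A minimal Python sketch of that rule follows; it is not BoincRescheduler's code, the function name is hypothetical, and the extra checks rejecting negative bounds and a minimum above the maximum are assumptions of this sketch.

```python
def clamp_dcf(current: float, dcf_min: float, dcf_max: float) -> float:
    """Return the DCF that would be written back, per the quoted <dcf_min>/<dcf_max> rules."""
    # The docs require both bounds to be set and forbid 0; rejecting negatives is an assumption.
    if dcf_min <= 0 or dcf_max <= 0:
        raise ValueError("set both dcf_min and dcf_max to positive, non-zero values")
    if dcf_min > dcf_max:  # assumption: treat inverted bounds as a configuration error
        raise ValueError("dcf_min must not exceed dcf_max")
    if current < dcf_min:
        return dcf_min  # below the floor: replaced by dcf_min
    if current > dcf_max:
        return dcf_max  # above the ceiling: replaced by dcf_max
    return current      # within bounds: left unchanged


print(clamp_dcf(7.5, 0.1, 0.2))
```

With the example bounds from the quoted config.xml (0.1 and 0.2), a DCF of 7.5 read from the state file would be replaced by 0.2.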
Grand Admiral Thrawn Joined: 19 Feb 01 Posts: 54 Credit: 23,149,634 RAC: 38 |
Aaaah, thank you. Will try this as soon as everything is up and running again. :) |
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.