Machine not getting enough CUDA work, DCF totally off... No idea what to adjust

Questions and Answers : GPU applications : Machine not getting enough CUDA work, DCF totally off... No idea what to adjust
Grand Admiral Thrawn
Joined: 19 Feb 01
Posts: 54
Credit: 23,149,634
RAC: 38
Austria
Message 1026035 - Posted: 18 Aug 2010, 8:01:04 UTC
Last modified: 18 Aug 2010, 8:04:48 UTC

Greetings!

So, I have this CUDA machine running here, see [Details] and [Tasks] (should be public).

Now, the machine consists of a 1.6GHz dual-core Pentium (Core 2 architecture), one GeForce 285 GTX (est. 708 GFLOPS) and one GeForce 9500GT (est. 88 GFLOPS). The operating system is WinXP Pro x64 Edition with 1GB of RAM. Nothing else is running on the box, only BOINC + SETI@home.

So, the estimated times to finish WUs for both CPU MB and AP as well as the GPUs are totally off, being far too high. At first I thought VLAR WUs on the 9500GT might be responsible, but I haven't spotted any. The DCF sat around 4.8, so I re-adjusted it to 1.0. At first that made all the estimates pretty sane and correct, but THEN the DCF converged to ~7.5 pretty fast. I don't know why.

It assumes that a normal CUDA WU would take like 4-6 hours, when in reality the 285 GTX does it in 10-15 minutes, and even the slow 9500GT does it in 1-1.5 hours. Same for the CPU, for both MB and AP: estimates way off. I heard you shouldn't set any FLOPS values in the configuration anymore for some reason, and re-adjusting the DCF doesn't help for more than a few hours at best.

Any idea what I should reconfigure to get the estimates correct, and to get enough CUDA WUs to make it through the downtime from Tuesday to, like, Friday evening in GMT+1?

btw., the computing prefs are set to fetch enough work for 10 days and connect every 0.01 days.
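For anyone reading along, those two preferences map to these fields in a global_prefs_override.xml (a sketch of just the relevant fragment, assuming the standard override-file field names; the rest of the file is omitted):

```xml
<global_preferences>
   <!-- "connect every 0.01 days" -->
   <work_buf_min_days>0.01</work_buf_min_days>
   <!-- "fetch enough work for an additional 10 days" -->
   <work_buf_additional_days>10</work_buf_additional_days>
</global_preferences>
```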

I'm sorry if the information I seek is already available on the forums; I just couldn't find anything but DCF and FLOPS values... and some people were saying "don't use FLOPS settings anymore". So I wonder what I can do.

Thank you for any help you might be able to provide!
3dfx Voodoo5 6000 AGP HiNT Rev.A 3700 prototype, dead HiNT bridge

ID: 1026035
Gundolf Jahn
Joined: 19 Sep 00
Posts: 3184
Credit: 446,358
RAC: 0
Germany
Message 1026036 - Posted: 18 Aug 2010, 8:09:41 UTC - in response to Message 1026035.  

<flops> values can only be changed when running anonymous platform (optimised), which you aren't. One less thing for you to worry about ;-)

I think part of your problem stems from the fact that you are running two GPUs with vastly different performances. How could the scheduler (or whoever) figure out the correct adjustments in that case?

Regards,
Gundolf
Computers aren't everything in life. (Just a little joke.)

SETI@home classic workunits 3,758
SETI@home classic CPU time 66,520 hours
ID: 1026036
Grand Admiral Thrawn
Joined: 19 Feb 01
Posts: 54
Credit: 23,149,634
RAC: 38
Austria
Message 1026038 - Posted: 18 Aug 2010, 8:29:16 UTC - in response to Message 1026036.  

Hmm, I would assume it does that via the FLOPS estimates (708 GFLOPS for the 285 GTX and 88 GFLOPS for the 9500GT), no?

But even if it used only the 9500GT for the estimate, it's far too high even for the slower GPU... and even for the CPU. That's what I fail to understand.

ID: 1026038
Gundolf Jahn
Joined: 19 Sep 00
Posts: 3184
Credit: 446,358
RAC: 0
Germany
Message 1026052 - Posted: 18 Aug 2010, 9:32:26 UTC - in response to Message 1026038.  

Hmm, I would assume it does that via the FLOPS estimates (708 GFLOPS for the 285 GTX and 88 GFLOPS for the 9500GT), no?

Yes, but it uses only one of them. I don't know which.

But even if it used only the 9500GT for the estimate, it's far too high even for the slower GPU... and even for the CPU. That's what I fail to understand.

Perhaps it needs some time to find the right direction of adjustment. But then, that direction might change every time a task from the "wrong" (other) GPU is validated.

Did you check the Number crunching subforum? I think there might be some threads with information on this.

Regards,
Gundolf
ID: 1026052
Gundolf Jahn
Joined: 19 Sep 00
Posts: 3184
Credit: 446,358
RAC: 0
Germany
Message 1026072 - Posted: 18 Aug 2010, 11:43:39 UTC - in response to Message 1026038.  

Too late to edit my previous post. :-)

Perhaps you should temporarily set <use_all_gpus>0</use_all_gpus> until the estimates have stabilised. Then you can see what happens when you re-enable your 9500GT.
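For reference, a minimal cc_config.xml doing that; the file goes in the BOINC data directory, and the option lives inside <options> (restart the client, or have it re-read the config, for the change to take effect):

```xml
<cc_config>
   <options>
      <!-- 0 = use only the most capable GPU; 1 = use all GPUs -->
      <use_all_gpus>0</use_all_gpus>
   </options>
</cc_config>
```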

Regards,
Gundolf
ID: 1026072
Grand Admiral Thrawn
Joined: 19 Feb 01
Posts: 54
Credit: 23,149,634
RAC: 38
Austria
Message 1026769 - Posted: 20 Aug 2010, 18:31:01 UTC
Last modified: 20 Aug 2010, 18:33:37 UTC

By now the machine has run out of CUDA WUs, and the DCF has gone down to 2.7. It seems to fluctuate a lot. But even with the DCF down at 2.7, estimates are roughly double the actual crunching time. I suspect, too, that it's the GPUs which mess it all up, for whatever reason. Now, with only the CPU crunching, it seems to slowly reach a more sane level.

Should I maybe repost this in the Number crunching forum? I'm not sure it would be the right place, though. I'd love to fully utilize both GPUs in that machine, since that's their sole purpose now. :)

Edit: I searched the Number crunching forums and found a few threads, but no real solution. Most people seem to have this kind of problem because they're using faster customized/optimized apps instead of stock ones, without the (correct) FLOPS settings applied.

ID: 1026769
Grand Admiral Thrawn
Joined: 19 Feb 01
Posts: 54
Credit: 23,149,634
RAC: 38
Austria
Message 1031189 - Posted: 5 Sep 2010, 16:49:11 UTC
Last modified: 5 Sep 2010, 16:51:35 UTC

I just wanted to add: I've now tried various things (everything short of moving away from the stock apps), to no avail.

So I had to disable the weaker GeForce 9500GT by setting <use_all_gpus>0</use_all_gpus> in cc_config.xml, and after just a few hours (!) everything went back to normal. The estimates for both CPU WUs and GPU WUs are now very precise, and I'm getting a lot more CUDA WUs despite the lower total GPU performance.

So I suppose BOINC simply can't handle multiple GPUs at different performance levels? At least when they're working on the same project (I never tried multiple projects with multiple GPUs).

It would sure be nice if that were fixed somehow. Currently I do better with the 9500GT disabled, simply because that way I can survive the weekly downtime, when performance "could" actually be higher if DCF/FLOPS were estimated correctly for both the GeForce 285 GTX and the GeForce 9500GT, so my queues would fill up nicely. This seems to work perfectly on systems with multiple identical GPUs, like my box with the two GTX 480s.

I hope some future BOINC version will be able to handle different GPUs in the same system. :)

ID: 1031189
Grand Admiral Thrawn
Joined: 19 Feb 01
Posts: 54
Credit: 23,149,634
RAC: 38
Austria
Message 1033605 - Posted: 17 Sep 2010, 16:50:12 UTC

OK, I simply could not let it rest! I installed optimized apps to be able to set <flops></flops> in app_info.xml. But the damn DCF still kept slipping out of control.
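For anyone trying the same: <flops> goes inside the <app_version> block of app_info.xml. A trimmed, hypothetical fragment for the 285 GTX is sketched below (708 GFLOPS = 7.08e11 FLOPS; the app name and version number here are examples, not necessarily your exact file):

```xml
<app_version>
   <app_name>setiathome_enhanced</app_name>
   <version_num>608</version_num>
   <plan_class>cuda</plan_class>
   <!-- estimated speed of this app on this device, in FLOPS -->
   <flops>7.08e11</flops>
</app_version>
```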

Sooo, I downloaded ActiveState Perl (a free Perl distribution for Windows x86_32 and x64) and wrote a little in-place search&replace one-liner to keep the DCF pinned at 1.000000 automatically (I don't like having to edit stuff manually all the time). Like this:

"C:\Program Files\System\Perl\bin\perl.exe" -itmp -pe "s/<duration_correction_factor>\d+\.\d+<\/duration_correction_factor>/<duration_correction_factor>1.000000<\/duration_correction_factor>/g" "C:\Program Files\BOINC\data\client_state.xml"


I placed the one-liner in a batch script that is called by the Windows Task Scheduler every 10 minutes.

I'm not sure how safe in-place editing really is on client_state.xml; I hope there will be no inconsistencies when BOINC also writes to the file. So far no problems, though I will definitely have to observe it a bit more. BOINC seems to realize the new DCF whenever there is a workunit status change, no need to manually reload the configuration.

This is very dirty though, but I don't know how to do this any "cleaner"..
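The same idea can be made a little safer. Here is a Python sketch (using the same regex as the Perl one-liner; paths and setup are up to you) that writes the modified state to a temporary file first and then replaces the original atomically, which at least avoids BOINC ever seeing a half-written file:

```python
import os
import re
import tempfile

DCF_RE = re.compile(
    r"<duration_correction_factor>\d+\.\d+</duration_correction_factor>"
)

def pin_dcf(state_file: str, dcf: float = 1.0) -> None:
    """Rewrite every <duration_correction_factor> in a BOINC
    client_state.xml-style file to the given value."""
    with open(state_file, encoding="utf-8") as f:
        text = f.read()
    replacement = (
        f"<duration_correction_factor>{dcf:.6f}"
        "</duration_correction_factor>"
    )
    # Write the full new contents to a temp file in the same directory,
    # then atomically replace the original file.
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(state_file) or ".")
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        f.write(DCF_RE.sub(replacement, text))
    os.replace(tmp, state_file)
```

Whether BOINC actually picks the new value up while running is a separate question, as discussed below.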

ID: 1033605
Gundolf Jahn
Joined: 19 Sep 00
Posts: 3184
Credit: 446,358
RAC: 0
Germany
Message 1033644 - Posted: 17 Sep 2010, 18:53:12 UTC - in response to Message 1033605.  

BOINC seems to realize the new DCF whenever there is a workunit status change, no need to manually reload the configuration.

Are you sure? I always thought BOINC reads the client state only once, at startup, and thereafter only writes to the XML file.

Regards,
Gundolf
ID: 1033644
Grand Admiral Thrawn
Joined: 19 Feb 01
Posts: 54
Credit: 23,149,634
RAC: 38
Austria
Message 1033742 - Posted: 17 Sep 2010, 22:41:20 UTC - in response to Message 1033644.  
Last modified: 17 Sep 2010, 22:49:59 UTC

I set the DCF to 10.000000 while BOINC was running and waited for one CUDA WU to finish, just to test it. After that, all the estimated times skyrocketed. But now that I have observed a bit more, I'm not fully sure that this was the reason. I found that when I set the DCF to 1.000000 while the client is not running, then start BOINC and let it finish one WU, it writes a new DCF of around 4.5, and the estimates are in the same area that I saw in my test with 10.000000.

So you might actually be right. BOINC might keep the actual DCF only in memory, writing it to client_state.xml for the sole purpose of having "correct" estimates after a client restart.

Dammit. If that is true, I have no way to influence this behaviour while the client is running... There has to be some way to just FIX the DCF at 1. That would solve all my problems, since I could then fine-tune using the <flops> tags...

ID: 1033742
Jord
Volunteer tester
Joined: 9 Jun 99
Posts: 15184
Credit: 4,362,181
RAC: 3
Netherlands
Message 1033773 - Posted: 17 Sep 2010, 23:48:51 UTC - in response to Message 1033742.  

Perhaps you don't know what DCF stands for: it's (Task) Duration Correction Factor. It can't be stuck at the same value, as no task here at this project runs for exactly the same amount of time as the next. Even minor differences of seconds are a difference already. Not to mention the variation in angle range, which causes variations in task duration, even with the VLARs now banished to the CPU only.

And while it has to be around 1 to show accurate time remaining on tasks, you also have to give it a chance to reach the correct value for your computer. That's how BOINC learns how long work takes. By constantly changing the value yourself, you're making it impossible for BOINC to learn how long these nasty things take.
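The scaling Jord describes can be put in rough numbers. In the classic scheme, the client's duration estimate is approximately the server's FLOP estimate for the task, divided by the device's assumed speed, multiplied by the single project-wide DCF, so a DCF of 7.5 inflates every estimate 7.5-fold, CPU and GPU alike. A toy illustration (the task size here is made up):

```python
def estimated_runtime(rsc_fpops_est: float, device_flops: float, dcf: float) -> float:
    """Rough old-style BOINC duration estimate, in seconds."""
    return rsc_fpops_est / device_flops * dcf

# Hypothetical MB task, nominally 30 TFLOP, on the 285 GTX (708 GFLOPS peak):
base = estimated_runtime(30e12, 708e9, dcf=1.0)
inflated = estimated_runtime(30e12, 708e9, dcf=7.5)
# One shared DCF scales every device's estimate by the same factor.
assert abs(inflated - 7.5 * base) < 1e-9
```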

So here's a challenge for you: reset the DCF to 1, then sit back and just watch for two weeks. Then post back here with the DCF value your system has come up with by then. Remember: no tinkering. Just let BOINC figure it out for once.
ID: 1033773
Grand Admiral Thrawn
Joined: 19 Feb 01
Posts: 54
Credit: 23,149,634
RAC: 38
Austria
Message 1033795 - Posted: 18 Sep 2010, 0:35:19 UTC - in response to Message 1033773.  
Last modified: 18 Sep 2010, 0:48:55 UTC

That is exactly what I did in the very beginning, with stock apps. And it went totally wrong. The estimates were like 5 to 10 times higher than they should have been, and I don't know why.

I only know that it pretty much stops when I deactivate the second GPU, the 9500GT. But even though that second GPU takes somewhat over 1h to complete a task, the DCF makes it estimate completion times of several hours.

So, I have a Dualcore CPU at 1.6GHz doing an MB WU in like 4-5h (with optimized workers), the 285 GTX doing the work in like 15mins, and the 9500GT doing the work in like 1-1.5h.

But after some time the GPU estimates are like 4-5h. The DCF reaches values beyond 10. Estimates for MB on the CPU reach values like 20 hours and, from what I've seen, stay there. This just can't be right.

So the machine was already starving for some time while all my other machines were supplied with plenty of work, especially my most powerful one (i7 + two GTX 480 cards).

So, what I want is to ensure that the machine doesn't run out of work during the regular weekly downtimes, that's all. So far I have been unable to do that... With my current <flops> settings, all estimates are pretty much perfect at a DCF of 1.000000, so I would like it to stay there. But it just doesn't: it jumps to 4, then 7, then 10... I either have to make sure the DCF stays at 1, or I need to dynamically adjust the <flops> settings; whatever it takes to keep the pipeline full. On all my other boxes this works nicely; it's only the machine with two different GPUs that has this problem. I just don't want to keep the 9500GT idle, that would be a waste of resources. I want both GPUs to work. But if that results in the whole machine sitting idle for like 4 days a week, it's not very good...

The funny thing: if the DCF were pushed up just far enough that the estimates matched the slowest processor of each type in the system, it would make sense. Say it pushed the CUDA estimates up to match the 9500GT: 1.5h per MB WU. Yeah, that would make "some" sense. But it pushes the estimates FAR beyond the time the SLOWEST processor in the system takes to complete a task. 4h? 5h? How come? The GTX 285 does it in 15 min, the 9500GT in roughly 90 min. Where do the 4-5h CUDA estimates come from?

(And yeah, they stay at about that level if left alone...)

ID: 1033795
BilBg
Volunteer tester
Joined: 27 May 07
Posts: 3720
Credit: 9,385,827
RAC: 0
Bulgaria
Message 1034080 - Posted: 18 Sep 2010, 16:25:27 UTC - in response to Message 1033795.  



You can use the new rescheduler (BoincRescheduler)
http://setiathome.berkeley.edu/forum_thread.php?id=60712
http://www.efmer.eu/forum_tt/index.php?topic=428.0

to limit the DCF:

"
config.xml

This file is for changing settings and debugging.
It must be placed in the same folder as the exe and has to be called config.xml.
If you don't want to use it, rename it to e.g. xconfig.xml.
<config>
   <seti>
      <vlar>0.13</vlar>   
      <vhar>1.127</vhar>
      <dcf_min>0.1</dcf_min>
      <dcf_max>0.2</dcf_max>
      <est_ratio_cpu_min>0.5</est_ratio_cpu_min>
      <est_ratio_cpu_max>1.5</est_ratio_cpu_max>
      <est_ratio_gpu_min>0.5</est_ratio_gpu_min>
      <est_ratio_gpu_max>1.5</est_ratio_gpu_max>
   </seti>
</config>

<vlar> Override default VLAR value.
<vhar> Override default VHAR value.
<dcf_min> Minimum duration_correction_factor. If the value read from the state file is less than this, this value is used to replace the existing duration_correction_factor.
<dcf_max> Maximum duration_correction_factor. If the value read from the state file is more than this, this value is used to replace the existing duration_correction_factor.
You need to set both <dcf_min> and <dcf_max> values! A value of 0 may NOT be used.
"



 


- ALF - "Find out what you don't do well ..... then don't do it!" :)
 
ID: 1034080
Grand Admiral Thrawn
Joined: 19 Feb 01
Posts: 54
Credit: 23,149,634
RAC: 38
Austria
Message 1046207 - Posted: 5 Nov 2010, 10:47:24 UTC

Aaaah, thank you. Will try this as soon as everything is up and running again. :)

ID: 1046207

©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.