Why am I getting a mix of Mcuda50 and SOG for my gpu?
Author | Message |
---|---|
Tom M Joined: 28 Nov 02 Posts: 5124 Credit: 276,046,078 RAC: 462 |
I had a GT 1060 (i.e. dinky) GPU on this machine: 8213716 (http://setiathome.berkeley.edu/hosts_user.php?userid=190117) and was getting a 50/50 mix of Mcuda50 and SOG, after starting with a mix of Mcuda 50/42/32(?) and SOG. After a while I switched my GTX 750 Ti mini card from my Xeon over to the Intel box. I am still getting Mcuda50s and SOGs. The Mcuda50s seem to run for maybe an hour, very seldom down in the 20-40 minute range. The SOGs run for about half an hour, give or take (2 tasks at a time on the 750 Ti GPU). It seems like, in general, the SOGs are faster. Is there any explanation of why I am still getting Mcuda50s? I do understand that if I upgrade to Lunatics I could stop getting the Cuda50s. I am still experimenting with that on my Xeon box; I don't want to (yet) on my Intel i5. Is there a way to "encourage" the scheduler to only send SOGs under stock SETI without creating a brand new cpu id, e.g. by deleting a wisdom file or something? Thanks, Tom Miller A proud member of the OFA (Old Farts Association). |
Grant (SSSF) Joined: 19 Aug 99 Posts: 13736 Credit: 208,696,464 RAC: 304 |
"It seems like, in general, the SOGs are faster. Is there any explanation of why I am still getting Mcuda50s?" Because the server still hasn't decided which is best. It is possible to end up with the slower version, depending on what work is available at the time: get a bunch of GBT work processed by CUDA, then a bunch of Arecibo work crunched by SoG, and the end result is that CUDA will be selected, even though it is slower for a given type of WU. Personally I only run 1 WU at a time with SoG; if you run 1 GBT & 1 Arecibo task on the same GPU, the processing time for the Arecibo task can almost triple. Grant Darwin NT |
Harri Liljeroos Joined: 29 May 99 Posts: 4093 Credit: 85,281,665 RAC: 126 |
See the application details for that host and you'll see how the server has evaluated the different applications and their speed. |
Grant (SSSF) Joined: 19 Aug 99 Posts: 13736 Credit: 208,696,464 RAC: 304 |
"See the application details for that host and you'll see how the server has evaluated the different applications and their speed." SETI@home v8 8.00 windows_intelx86 (cuda50): APR 76.85 GFLOPS; SETI@home v8 8.22 windows_intelx86 (opencl_nvidia_SoG): APR 37.26 GFLOPS. It's picked CUDA50. Depending on the work mix, and if you were running 2 WUs at a time on SoG and only 1 at a time on CUDA50, then that would be why it picked CUDA50 as fastest. Grant Darwin NT |
Tom M Joined: 28 Nov 02 Posts: 5124 Credit: 276,046,078 RAC: 462 |
"See the application details for that host and you'll see how the server has evaluated the different applications and their speed." I will admit to having been running 2 tasks on the GPU, but they were a mix of SOG and Cuda50 tasks, since I am not (yet) competent to control the number of tasks per application. So the results I am seeing are mostly from running 2 tasks/WUs on the GPU at a time. Grant, if you get a chance, would you post the file name and parameters for controlling the number of GPU tasks per application? I wouldn't mind running 1 SOG and 2 Cuda50s if I could figure out how... OBTW, this system is running stock SETI. Based on what I have been reading recently, I have started running a single task on my 750 Ti GPU. When I started doing that, the GPU load was about 50%-60% while a Cuda50 task was running. So I jacked both pfblockspersm and pfperiodsperlaunch up from the standard 8/200 mix to see if the load on the GPU would go up; the settings are now processpriority = abovenormal, pfblockspersm = 16, pfperiodsperlaunch = 400. It did go up the second time I tried it. Apparently 16/200 will spend more time at 95% GPU load (it doesn't stay there, though) than 16/400 will. I'm not sure how it affects the processing speed. So far there has been no screen lag. I have been clearing out some of my SETI@home Beta backlog today, so I do not yet have any idea whether the above speeds up or slows down the elapsed processing time of the Cuda50 tasks. My other GTX 750 Ti has not run any Cuda50s in a long time, except during my Lunatics upgrade, where I was fumbling around to get SOG started again. That is why I have been wondering how two different 750s would attract different mixes of workloads. Again, thank you. Tom Miller A proud member of the OFA (Old Farts Association). |
Grant (SSSF) Joined: 19 Aug 99 Posts: 13736 Credit: 208,696,464 RAC: 304 |
"Grant, if you get a chance would you post the file name and parameters for controlling the number of gpu tasks / application? I wouldn't mind running 1 SOG and 2 Cuda50's if I could figure out how... obtw, this system is running a stock Seti." It's probably possible, but I have to say I don't know how. When my system picked CUDA over SoG on Beta, I just increased the number of WUs till it switched back to SoG, then put it back to 1 at a time. "Based on what I have been recently reading I have started running a single task on my 750 Ti gpu. When I started doing that, when the Cuda50 task was running the gpu was loading at about 50%-60%." With CUDA you do need to run 2 tasks at a time to get the best throughput; with SoG (especially with lower-end cards) it needs to be 1 at a time, with some tweaked values, to get the best productivity. The default settings for the application are fairly mild, to cater for the wide range of hardware and systems, so that those running stock don't end up with sluggish or almost unusable systems. Running stock, you need to put the values in the mb_cmdline-8.22_windows_intel__opencl_nvidia_SoG.txt file: -tt 700 -hp -period_iterations_num 1 -high_perf -high_prec_timer -sbs 1024 -spike_fft_thresh 4096 -tune 1 64 1 4 -oclfft_tune_gr 256 -oclfft_tune_lr 16 -oclfft_tune_wg 256 -oclfft_tune_ls 512 -oclfft_tune_bn 64 -oclfft_tune_cw 64 (these are the values I used on my crunching-only system). As Mike posted in the other thread, set the -period_iterations_num value to 30 or so. If that's OK, reduce it to 15 or so. If that's OK, reduce it to 10, then 5, then 1. If a particular value makes the system too sluggish, bump the value up by 3-5 or so & see how it goes. If all is well, then that's the value to use. "My other gtx 750 ti has not run any Cuda50's in a long time except during my Lunatics upgrade where I was fumbling around to get the SOG started again. That was why I have been wondering how two different 750's would attract different mixes of work loads." As I mentioned, it depends very much on the work type being processed by the application (GBT or Arecibo, and with Arecibo there are shorties, mid-range and longer-running WUs) and on whether you're running 1 or 2 WUs at a time. The APR value only gives a good indication of processing ability when running 1 WU at a time. With CUDA 50, 2 at a time gives better output, but the APR value will be lower than when running 1 WU at a time, for a given type of WU. The same goes for more powerful video cards running SoG: for them 2 WUs at a time can be best, but the APR value will be lower than if running only 1 WU at a time. Grant Darwin NT |
Tom M Joined: 28 Nov 02 Posts: 5124 Credit: 276,046,078 RAC: 462 |
"When my system picked CUDA over SoG on Beta, I just increased the number of WUs till it switched back to SoG, then put it back to 1 at a time." If I am understanding you correctly, you bumped the number of GPU tasks in your app_config.xml file up past 2? Tom A proud member of the OFA (Old Farts Association). |
Tom M Joined: 28 Nov 02 Posts: 5124 Credit: 276,046,078 RAC: 462 |
"See the application details for that host and you'll see how the server has evaluated the different applications and their speed." When I looked at my "other" machine, the SOG GFLOPS were massively larger than in this report, something like 120+ GFLOPS. Now, the two GTX 750 Tis are of different makes and manufacturing. In fact, the GPU for this machine is a new mini "750 Ti". And the reports from GPU-Z, when I had them both installed on the Xeon, were a little bit different in some of the details. In the sensor area, they reported the memory used in different formats. So I wonder. Thanks for the guidance. I have installed the latest parameters you gave me in the mb*sog.txt file, but who knows how long it will be before it starts processing SOG WUs. Tom A proud member of the OFA (Old Farts Association). |
Grant (SSSF) Joined: 19 Aug 99 Posts: 13736 Credit: 208,696,464 RAC: 304 |
"When my system picked CUDA over SoG on Beta, I just increased the number of WUs till it switched back to SoG, then put it back to 1 at a time." Yep. <app_config> <app> <name>setiathome_v8</name> <gpu_versions> <gpu_usage>0.50</gpu_usage> <cpu_usage>0.04</cpu_usage> </gpu_versions> </app> </app_config> Here <gpu_usage>0.50</gpu_usage> gives you 2 GPU WUs at a time, and <gpu_usage>0.33</gpu_usage> gives you 3 GPU WUs at a time. In my case, I just bumped it up to 2 at a time, and that slowed the processing down enough for it to give SoG another go, at which time I changed it back to 1. Grant Darwin NT |
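
For anyone following along, here is a rough Python sketch of the two bits of arithmetic discussed in this thread: how a gpu_usage value maps to the number of WUs run at a time, and why running 2 at a time can lower the per-task speed (which is what APR reflects) while still raising overall output. The function names are made up for illustration, the packing rule (tasks fit as long as their gpu_usage values sum to no more than 1) is an assumption about how the client behaves, and the timings are hypothetical, not measurements from either host.

```python
import math


def concurrent_tasks(gpu_usage: float) -> int:
    """Tasks that fit on one GPU when each task claims `gpu_usage` of it.

    Assumption: tasks are packed while their gpu_usage values sum to <= 1.
    """
    if not 0 < gpu_usage <= 1:
        raise ValueError("gpu_usage must be in the range (0, 1]")
    # The small epsilon guards against float results such as 2.9999999.
    return math.floor(1.0 / gpu_usage + 1e-9)


def tasks_per_hour(n_concurrent: int, minutes_per_task: float) -> float:
    """Overall output of one GPU running n tasks side by side."""
    return n_concurrent * 60.0 / minutes_per_task


if __name__ == "__main__":
    for usage in (1.0, 0.50, 0.33):
        print(f"gpu_usage {usage:.2f} -> {concurrent_tasks(usage)} WU(s) at a time")

    # Hypothetical timings: each task gets slower when two share the GPU,
    # so the per-task speed drops, but the GPU finishes more tasks per hour.
    single = tasks_per_hour(1, minutes_per_task=20.0)
    double = tasks_per_hour(2, minutes_per_task=30.0)
    print(f"1 at a time: {single:.1f} tasks/hour")
    print(f"2 at a time: {double:.1f} tasks/hour")
```

With these made-up numbers each task takes longer when two share the card, so a per-task speed figure drops, yet the card as a whole gets through more work. That is one way to read the advice to compare APR values only when both applications were run 1 WU at a time.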