Why am I getting a mix of Mcuda50 and SOG for my gpu?

Tom M
Volunteer tester
Joined: 28 Nov 02
Posts: 5124
Credit: 276,046,078
RAC: 462
Message 1869564 - Posted: 26 May 2017, 15:38:34 UTC

I had a GT 1060 (i.e. a dinky) GPU on this machine: 8213716 (http://setiathome.berkeley.edu/hosts_user.php?userid=190117). It started out getting a mix of Mcuda 50/42/32(?) and SOG tasks, then settled into a 50/50 mix of Mcuda50 and SOG.

After a while I moved my GTX 750 Ti mini card from my Xeon over to the Intel box. I am still getting both Mcuda50 and SOG tasks. The Mcuda50s seem to run for maybe an hour, very seldom down in the 20-40 minute range. The SOGs run about half an hour, give or take. (This is with 2 tasks at a time on the 750 Ti GPU.)

It seems like, in general, the SOGs are faster. Is there any explanation of why I am still getting Mcuda50s?

I do understand that if I upgrade to Lunatics I could stop getting the Cuda50s. I am still experimenting with that on my Xeon box. I don't want to (yet) on my Intel i5.

Is there a way to "encourage" the scheduler to send only SOGs under stock SETI without creating a brand new CPU ID, e.g. by deleting a wisdom file or something?

Thanks,
Tom Miller
A proud member of the OFA (Old Farts Association).
ID: 1869564
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13736
Credit: 208,696,464
RAC: 304
Australia
Message 1869625 - Posted: 26 May 2017, 21:22:22 UTC - in response to Message 1869564.  

> It seems like, in general, the SOGs are faster. Is there any explanation of why I am still getting Mcuda50s?

Because the server still hasn't decided which is best.

It is possible to get the slower version depending on what work is available at the time; get a bunch of GBT work processed by CUDA, then get a bunch of Arecibo work crunched by SoG and the end result is CUDA will be selected, even though it is slower for a given type of WU.
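To make the work-mix effect concrete, here is a minimal sketch. It is not the actual scheduler logic, and every rate in it is a made-up number chosen only for illustration. It assumes the server keeps one average rate per application and prefers the higher one, and that GBT work happens to report a higher apparent rate than Arecibo work.

```python
# Hypothetical apparent processing rates in GFLOPS, per (application, work type).
# In this invented data SoG is faster than cuda50 on BOTH work types.
HYPOTHETICAL_RATE = {
    ("cuda50", "GBT"): 80.0,
    ("cuda50", "Arecibo"): 30.0,
    ("SoG", "GBT"): 120.0,
    ("SoG", "Arecibo"): 40.0,
}


def observed_average(app, work_types):
    """Average the apparent rate over the tasks an application actually ran."""
    rates = [HYPOTHETICAL_RATE[(app, work)] for work in work_types]
    return sum(rates) / len(rates)


def pick_fastest(history):
    """Return the application with the highest observed average rate."""
    return max(history, key=lambda app: observed_average(app, history[app]))


# cuda50 only ever saw GBT work, SoG only ever saw Arecibo work.
skewed = {"cuda50": ["GBT"] * 5, "SoG": ["Arecibo"] * 5}
# Both applications saw the same mix of work.
balanced = {"cuda50": ["GBT", "Arecibo"], "SoG": ["GBT", "Arecibo"]}

print(pick_fastest(skewed))
print(pick_fastest(balanced))
```

With the skewed history the slower application wins the comparison; with the balanced history the comparison comes out the way you would expect.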

Personally I only run 1 WU at a time with SoG. If you run 1 GBT and 1 Arecibo task on the same GPU, the processing time for the Arecibo task can almost triple.
Grant
Darwin NT
ID: 1869625
Harri Liljeroos
Joined: 29 May 99
Posts: 4093
Credit: 85,281,665
RAC: 126
Finland
Message 1869815 - Posted: 27 May 2017, 18:03:40 UTC - in response to Message 1869564.  

See the application details for that host and you'll see how the server has evaluated the different applications and their speed.
ID: 1869815
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13736
Credit: 208,696,464
RAC: 304
Australia
Message 1869842 - Posted: 27 May 2017, 21:48:24 UTC - in response to Message 1869815.  

> See the application details for that host and you'll see how the server has evaluated the different applications and their speed.

SETI@home v8 8.00 windows_intelx86 (cuda50) APR 76.85 GFLOPS
SETI@home v8 8.22 windows_intelx86 (opencl_nvidia_SoG) APR 37.26 GFLOPS

It's picked CUDA50.
Depending on the work mix, and if you were running 2 WUs at a time on SoG and only 1 at a time on CUDA50, then that would be why it picked CUDA50 as fastest.
Grant
Darwin NT
ID: 1869842
Tom M
Volunteer tester
Joined: 28 Nov 02
Posts: 5124
Credit: 276,046,078
RAC: 462
Message 1869861 - Posted: 27 May 2017, 23:54:32 UTC - in response to Message 1869842.  
Last modified: 28 May 2017, 0:24:55 UTC

> See the application details for that host and you'll see how the server has evaluated the different applications and their speed.
>
> SETI@home v8 8.00 windows_intelx86 (cuda50) APR 76.85 GFLOPS
> SETI@home v8 8.22 windows_intelx86 (opencl_nvidia_SoG) APR 37.26 GFLOPS
>
> It's picked CUDA50.
> Depending on the work mix, and if you were running 2 WUs at a time on SoG and only 1 at a time on CUDA50, then that would be why it picked CUDA50 as fastest.


I will admit to having been running 2 tasks on the GPU, but that applied to both the SOG and the Cuda50 tasks, since I am not (yet) competent to control the number of tasks per application. So the results I am seeing are mostly from running 2 tasks/WUs on a GPU at a time.

Grant, if you get a chance, would you post the file name and parameters for controlling the number of GPU tasks per application? I wouldn't mind running 1 SOG and 2 Cuda50s if I could figure out how... BTW, this system is running stock SETI.

Based on what I have been recently reading, I have started running a single task on my 750 Ti GPU. When I started doing that, the GPU was loading at about 50%-60% while a Cuda50 task was running. So I jacked both the pfblock and pfper values up from the standard 8/200 mix to see if the load on the GPU would go up:
processpriority = abovenormal
pfblockspersm = 16
pfperiodsperlaunch = 400

It did the second time I tried it. Apparently 16/200 will spend more time at 95% GPU load (it doesn't stay there, though) than 16/400 will. Not sure how it affects the processing speed. So far there has been no screen lag.

I have been clearing out some of my seti at home beta backlog today so I have not yet gotten any idea if the above speeds up or slows down the elapsed processing time of the Cuda50.

My other gtx 750 ti has not run any Cuda50's in a long time except during my Lunatics upgrade where I was fumbling around to get the SOG started again. That was why I have been wondering how two different 750's would attract different mixes of work loads.

Again, Thank you.

Tom Miller
A proud member of the OFA (Old Farts Association).
ID: 1869861
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13736
Credit: 208,696,464
RAC: 304
Australia
Message 1869866 - Posted: 28 May 2017, 0:14:55 UTC - in response to Message 1869861.  
Last modified: 28 May 2017, 0:16:16 UTC

> Grant, if you get a chance, would you post the file name and parameters for controlling the number of GPU tasks per application? I wouldn't mind running 1 SOG and 2 Cuda50s if I could figure out how... BTW, this system is running stock SETI.

It's probably possible, but I have to say I don't know how.
When my system picked CUDA over SoG on Beta, I just increased the number of WUs till it switched back to SoG, then put it back to 1 at a time.
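One untested possibility, offered as a sketch only: the BOINC client's app_config.xml also accepts <app_version> sections keyed by plan class, which is where a per-application GPU share would go. The snippet below just generates such a file. It assumes the plan class names match the ones shown in the application details (cuda50 and opencl_nvidia_SoG), that the installed client is recent enough to honour <app_version>, and the avg_ncpus figure is a placeholder rather than a recommendation.

```python
import xml.etree.ElementTree as ET


def build_app_config(ngpus_by_plan_class, app_name="setiathome_v8", avg_ncpus=0.04):
    """Build app_config.xml text with one <app_version> entry per plan class.

    ngpus_by_plan_class maps a plan class name to the fraction of a GPU
    each task of that plan class should claim (0.5 means 2 tasks per GPU).
    """
    root = ET.Element("app_config")
    for plan_class, ngpus in ngpus_by_plan_class.items():
        version = ET.SubElement(root, "app_version")
        ET.SubElement(version, "app_name").text = app_name
        ET.SubElement(version, "plan_class").text = plan_class
        ET.SubElement(version, "avg_ncpus").text = str(avg_ncpus)  # placeholder value
        ET.SubElement(version, "ngpus").text = str(ngpus)
    if hasattr(ET, "indent"):  # pretty-printing is only available on Python 3.9+
        ET.indent(root)
    return ET.tostring(root, encoding="unicode")


# 2 cuda50 tasks per GPU, 1 SoG task per GPU (plan class names assumed).
xml_text = build_app_config({"cuda50": 0.5, "opencl_nvidia_SoG": 1.0})
print(xml_text)
```

The printed text would be saved as app_config.xml in the project's directory and picked up after the client re-reads its config files or restarts.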

> Based on what I have been recently reading I have started running a single task on my 750 Ti gpu. When I started doing that, when the Cuda50 task was running the gpu was loading at about 50%-60%.

With CUDA you do need to run 2 tasks at a time to get best throughput; with SoG (especially so with lower end cards) it needs to be 1 at a time with some tweaked values to get best productivity.
The default settings for the application are fairly mild to cater for the wide range of hardware and systems so that those running stock don't end up with sluggish or almost unusable systems.


Running stock, you need to put the values in the file
mb_cmdline-8.22_windows_intel__opencl_nvidia_SoG.txt
like this:
-tt 700 -hp -period_iterations_num 1 -high_perf -high_prec_timer -sbs 1024 -spike_fft_thresh 4096 -tune 1 64 1 4 -oclfft_tune_gr 256 -oclfft_tune_lr 16 -oclfft_tune_wg 256 -oclfft_tune_ls 512 -oclfft_tune_bn 64 -oclfft_tune_cw 64
These are the values I used on my crunching only system.
As Mike posted in the other thread, set the -period_iterations_num value to 30 or so. If that's OK, reduce it to 15 or so. If that's OK, reduce it to 10, then 5, then 1. If a particular value results in the system becoming too sluggish, bump the value up 3-5 or so & see how it goes. If all is well, then that's the value to use.
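The step-down procedure can be sketched like this. It only models the decision steps; is_sluggish is a hypothetical stand-in for you sitting at the machine and judging whether it is still usable with a given -period_iterations_num value, and the bump of 4 is just one pick from the "3-5 or so" range.

```python
def tune_period_iterations(is_sluggish, steps=(30, 15, 10, 5, 1), bump=4):
    """Walk down the candidate values and back off once the system gets sluggish.

    is_sluggish(value) is a hypothetical callback returning True when the
    system is too sluggish with -period_iterations_num set to value.
    Returns the value to keep, or None if nothing tried was acceptable.
    """
    last_good = None
    for value in steps:
        if is_sluggish(value):
            candidate = value + bump  # bump the value up a little and retry
            if last_good is not None and candidate >= last_good:
                return last_good  # the bump lands on or past a value already known to be OK
            return candidate if not is_sluggish(candidate) else last_good
        last_good = value
    return last_good


# Example: pretend the system only stays responsive at 8 or above.
print(tune_period_iterations(lambda value: value < 8))
```

In the example the walk goes 30, 15, 10 (all fine), hits trouble at 5, and then tries 5 + 4.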

> My other gtx 750 ti has not run any Cuda50's in a long time except during my Lunatics upgrade where I was fumbling around to get the SOG started again. That was why I have been wondering how two different 750's would attract different mixes of work loads.

As I mentioned, it depends very much on the work type being processed by the application (GBT or Arecibo; with Arecibo there are shorties, mid-range and longer-running WUs) and on whether you're running 1 or 2 WUs at a time.
The APR value only gives a good indication of processing ability when running 1 WU at a time. With CUDA50, 2 at a time gives better output, but the APR value will be lower than when running 1 WU at a time, for a given type of WU. The same goes for more powerful video cards running SoG: for them 2 WUs at a time can be best, but the APR value will be lower than if running only 1 WU at a time.
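A quick worked example of why the APR drops while output goes up. The task size and run times are invented purely for illustration, and the per-task rate here is only a rough stand-in for how APR behaves (work done divided by elapsed time for each task).

```python
def per_task_rate(gflop_per_task, elapsed_seconds):
    """Apparent rate of a single task in GFLOPS: work divided by elapsed time."""
    return gflop_per_task / elapsed_seconds


def throughput_per_hour(concurrent_tasks, elapsed_seconds):
    """Tasks completed per hour when concurrent_tasks run side by side."""
    return concurrent_tasks * 3600 / elapsed_seconds


GFLOP_PER_TASK = 60000  # hypothetical work estimate for one WU

# Hypothetical timings: 20 minutes alone, 30 minutes each when sharing the GPU.
single = {"concurrent": 1, "elapsed": 1200}
double = {"concurrent": 2, "elapsed": 1800}

for label, run in (("1 at a time", single), ("2 at a time", double)):
    rate = per_task_rate(GFLOP_PER_TASK, run["elapsed"])
    tasks = throughput_per_hour(run["concurrent"], run["elapsed"])
    print(f"{label}: {rate:.1f} GFLOPS per task, {tasks:.1f} tasks/hour")
```

With these numbers each task looks a third slower when two run together, yet the card finishes 4 tasks an hour instead of 3.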
Grant
Darwin NT
ID: 1869866
Tom M
Volunteer tester
Joined: 28 Nov 02
Posts: 5124
Credit: 276,046,078
RAC: 462
Message 1869881 - Posted: 28 May 2017, 1:44:00 UTC - in response to Message 1869866.  

>
>When my system picked CUDA over SoG on Beta, I just increased the number of WUs till it switched back to SoG, then put it back to 1 at a time.
>

If I am understanding you correctly, you bumped the number of GPU tasks in your app_config.xml file up past 2?

Tom
A proud member of the OFA (Old Farts Association).
ID: 1869881
Tom M
Volunteer tester
Joined: 28 Nov 02
Posts: 5124
Credit: 276,046,078
RAC: 462
Message 1869883 - Posted: 28 May 2017, 2:03:20 UTC - in response to Message 1869842.  

> See the application details for that host and you'll see how the server has evaluated the different applications and their speed.
>
> SETI@home v8 8.00 windows_intelx86 (cuda50) APR 76.85 GFLOPS
> SETI@home v8 8.22 windows_intelx86 (opencl_nvidia_SoG) APR 37.26 GFLOPS
>
> It's picked CUDA50.
> Depending on the work mix, and if you were running 2 WUs at a time on SoG and only 1 at a time on CUDA50, then that would be why it picked CUDA50 as fastest.


When I looked at my "other" machine, the SOG gflops were massively larger than this report. Something like 120+ Gflops.

Now, the two GTX 750 Tis are of different makes and manufacture. In fact, the GPU for this machine is a new mini "750 Ti". The reports from GPU-Z when I had them both installed on the Xeon were a little bit different in some of the details. In the sensor area, they reported the memory used in different formats.

So I wonder.

Thanks for the guidance. I have installed the latest parameters you gave me in the mb*sog.txt file but who knows how long before it starts processing the SOG wu/files.

Tom
A proud member of the OFA (Old Farts Association).
ID: 1869883
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13736
Credit: 208,696,464
RAC: 304
Australia
Message 1869894 - Posted: 28 May 2017, 3:50:10 UTC - in response to Message 1869881.  

> When my system picked CUDA over SoG on Beta, I just increased the number of WUs till it switched back to SoG, then put it back to 1 at a time.

> If I am understanding you correctly, you bumped the number of GPU tasks in your app_config.xml file up past 2?


Yep.
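<app_config>
  <app>
    <name>setiathome_v8</name>
    <gpu_versions>
      <!-- fraction of a GPU claimed by each task -->
      <gpu_usage>0.50</gpu_usage>
      <!-- fraction of a CPU core reserved per GPU task -->
      <cpu_usage>0.04</cpu_usage>
    </gpu_versions>
  </app>
</app_config>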


<gpu_usage>0.50</gpu_usage> gives you 2 GPU WUs at a time
<gpu_usage>0.33</gpu_usage> gives you 3 GPU WUs at a time.
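The arithmetic behind those two lines, as a small sketch: each task claims gpu_usage of a GPU, so the number that fit at once is roughly 1 divided by that value, rounded down. This is a simplification of what the client actually does, and the small epsilon is only there to guard against floating-point rounding.

```python
import math


def concurrent_tasks(gpu_usage, gpus=1):
    """Approximate how many tasks run at once when each claims gpu_usage of a GPU."""
    return math.floor(gpus / gpu_usage + 1e-9)


for usage in (1.0, 0.50, 0.33):
    print(f"gpu_usage {usage:.2f}: {concurrent_tasks(usage)} at a time")
```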

In my case, I just bumped it up to 2 at a time, and that slowed the processing down enough for it to give SoG another go, at which time I changed it back to 1.
Grant
Darwin NT
ID: 1869894
