Running 2 WorkUnits on a GPU is using 2 cores to feed GPU. Is there a quick way to figure out if that's an improvement?

Questions and Answers : GPU applications : Running 2 WorkUnits on a GPU is using 2 cores to feed GPU. Is there a quick way to figure out if that's an improvement?
Profile Stubbles
Volunteer tester
Joined: 29 Nov 99
Posts: 358
Credit: 5,909,255
RAC: 0
Canada
Message 1795094 - Posted: 10 Jun 2016, 15:32:47 UTC

Hello all,
On my top BOINC performing PC, I have two identical GPUs running the standard BOINC setup and my RAC is starting to stabilize.
I have searched the forum on how to get 2 Work Units (WU) to run on each GPU
and this seems to be the easiest solution:
- create a file named app_config.xml in C:\ProgramData\BOINC\projects\setiathome.berkeley.edu with the following content:
<app_config>
  <app>
    <name>setiathome_v8</name>
    <gpu_versions>
      <gpu_usage>0.5</gpu_usage>
      <cpu_usage>0.25</cpu_usage>
    </gpu_versions>
  </app>

  <app>
    <name>astropulse_v7</name>
    <gpu_versions>
      <gpu_usage>0.5</gpu_usage>
      <cpu_usage>0.25</cpu_usage>
    </gpu_versions>
  </app>
</app_config>

and then in the BOINC Manager, select: Options | Read config files

It works!!! ...but it is using 1 core to feed each of the GPU WUs.
In my case, that's 4 cores for 4 GPU WUs!
Thankfully I have a Xeon 4 core with HyperThreading enabled so BOINC sees 8 cores and I was letting it use 6 or 7.

From reading other threads, this seems to be specific to the GPU SoG app
and there's nothing much to do about it (for now) without getting into much more complex XML config files or running Lunatics setup program.

Considering that each virtual core was processing at ~13 GFLOPS (using SETI@home v8 8.00 windows_intelx86),
and that before the above changes SETI@home v8 8.12 windows_intelx86 (opencl_nvidia_SoG) was processing at ~150 GFLOPS,
I should still be better off with this latest change if the GPUs can do a decent job of processing 2 WUs in parallel.
Is there a quick way to figure that out?
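One quick check is plain arithmetic on the task run times shown in BOINC Manager: running 2 WUs per GPU only wins if each task takes less than twice as long as it did when running alone. A minimal sketch of that comparison (the timing numbers are illustrative, not measured values from this host):

```python
def throughput(tasks_in_parallel, minutes_per_task):
    """Tasks completed per hour at a given concurrency."""
    return tasks_in_parallel * 60.0 / minutes_per_task

single = throughput(1, 14.0)   # 1 WU at a time, ~14 min each (example)
double = throughput(2, 26.0)   # 2 WUs at a time, ~26 min each (example)

# Doubling up wins only if each task slows down by less than 2x
# (here 26 < 2 * 14, so it would be a small win).
print(double > single)
```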

Cheers,
Rob :-}
ID: 1795094
Profile BilBg
Volunteer tester
Joined: 27 May 07
Posts: 3720
Credit: 9,385,827
RAC: 0
Bulgaria
Message 1795666 - Posted: 12 Jun 2016, 16:51:56 UTC - in response to Message 1795094.  

<cpu_usage>0.25</cpu_usage>
.....
It works!!! ...but it is using 1 core to feed each of the the GPU WUs.

It's not clear what you mean by "using" and what you mean by "it".
The <cpu_usage>0.25 tells BOINC to "free one core" (run one less CPU task) per 4 GPU tasks running.


BOINC sees 8 cores and I was letting it use 6 or 7

So you are telling BOINC to "free cores" in two ways (which add up):
- by Setting: e.g. "Use at most 99% of the CPUs" ("hard" Setting - will always free 1 core)
- by app_config.xml ("soft" Setting - with <cpu_usage>0.25 it will free 1 core only if 4 GPU tasks are running)

<cpu_usage> in app_config.xml doesn't tell the app anything, because the app doesn't know what app_config.xml is (it is read only by BOINC)
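In other words, BOINC adds up the <cpu_usage> reservations of the running GPU tasks and runs that many fewer CPU tasks. A simplified model of BilBg's description (my illustration, not BOINC's actual scheduler code):

```python
import math

def cpu_tasks_allowed(total_cores, gpu_tasks, cpu_usage=0.25):
    # Each GPU task reserves a fraction of a core; only whole
    # cores' worth of reservations displace a CPU task.
    reserved = math.floor(gpu_tasks * cpu_usage)
    return total_cores - reserved

# 4 GPU tasks at 0.25 each -> 1 core freed -> 7 CPU tasks on 8 cores
print(cpu_tasks_allowed(8, 4))
# 2 GPU tasks at 0.25 each -> nothing freed yet -> all 8 cores used
print(cpu_tasks_allowed(8, 2))
```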
 


- ALF - "Find out what you don't do well ..... then don't do it!" :)
 
ID: 1795666
Profile Stubbles
Volunteer tester
Joined: 29 Nov 99
Posts: 358
Credit: 5,909,255
RAC: 0
Canada
Message 1795811 - Posted: 12 Jun 2016, 22:52:16 UTC - in response to Message 1795666.  

Hello BilBg,
Thanks for the reply.

I was using the info in this thread:
https://setiathome.berkeley.edu/forum_thread.php?id=79281

After running my OP config for a few hours, I noticed that 2 WUs per GPU were taking twice as long to run so the output wasn't improving.
I then went back to 1 WU per GPU.

Later on, I noticed with GPU-Z that the WUs with names like blc...guppi...vlar
were already using 100% of the GPU...but not the other ones from Arecibo 2010.

So instead of optimizing the GPUs, I installed the Lunatics v0.44 setup program and I selected the CUDA50 option for the GPUs (GTX 750 Ti).

My CPU WUs are now running much faster at around 3 hrs each instead of almost 4 (HT enabled on a Xeon) but my GPU WUs have not improved.
As compared to the SoG numbers I was getting, I think they are slightly worse.

I'm now wondering if I should try the Lunatics BETA v0.45 to keep the SSE3 for the CPUs and go back to the SoG for the GPUs. see:
http://setiathome.berkeley.edu/forum_thread.php?id=79704&sort_style=6&start=0
but that thread is flagged "Advanced Users only" and I'm not sure I qualify.

Are there other options that I'm not aware of?

FYI, I was hoping to find a fairly easy solution to increase my RAC substantially on my primary cruncher (HP Z400), to match what look like identical PCs with the same GPUs doing 20K of RAC.
see: http://setiathome.berkeley.edu/hosts_user.php?userid=9862389

Any input is greatly appreciated,

Cheers,
Rob :-)
ID: 1795811
chris lee

Joined: 30 May 04
Posts: 9
Credit: 22,759,278
RAC: 0
United Kingdom
Message 1795913 - Posted: 13 Jun 2016, 14:25:04 UTC - in response to Message 1795811.  

Hi

I had a similar problem, and after a fair bit of help and some head scratching it became clear that the vlar/blc work units require a full CPU core the whole time, in addition to the GPU.

If you look at the properties of a completed work unit, you will see that the CPU time for a vlar/blc is almost the same as the total run time.

For the 'ordinary' work units from Arecibo, the CPU time per work unit is just a few seconds out of the total. Hence the stock setting of 0.04 CPUs per work unit.

You can tell BOINC to use 0.04 CPUs or 0.25, whatever you like, but the blc/vlar application will use a whole CPU core regardless. Have a look at your Task Manager when one is running and you will see what I mean.
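Chris's check can be made concrete by comparing a finished task's CPU time to its elapsed run time (both visible in the task properties in BOINC Manager). A sketch with made-up numbers, where the 80% threshold is my own rough cutoff:

```python
def is_cpu_hungry(cpu_seconds, run_seconds, threshold=0.8):
    """A GPU task whose CPU time is close to its wall-clock run
    time (here, >= 80% of it) is effectively pinning a whole core."""
    return cpu_seconds / run_seconds >= threshold

# Illustrative numbers, not real task data:
print(is_cpu_hungry(1500.0, 1560.0))  # vlar/blc-style task
print(is_cpu_hungry(40.0, 840.0))     # ordinary Arecibo-style task
```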

If you have 4 cores and are running 2 work units at the same time on your GPU, then set "Use at most 50% of the CPUs".

This ensures that no more than 2 non-GPU work units run on your CPU, leaving 2 cores fully available for when your GPU is working on the vlar units.
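The "Use at most N% of the CPUs" value that leaves one whole core free per CPU-hungry GPU task is simple division; a sketch of that arithmetic (my own helper, not a BOINC setting beyond the one quoted above):

```python
def cpu_percent_setting(total_cores, gpu_tasks_needing_a_core):
    # Leave one whole core free for each CPU-hungry GPU task,
    # then express the remainder as a whole percentage.
    usable = total_cores - gpu_tasks_needing_a_core
    return 100 * usable // total_cores

print(cpu_percent_setting(4, 2))  # 4 cores, 2 GPU WUs -> 50%
print(cpu_percent_setting(8, 2))  # 8 threads, 2 GPU WUs -> 75%
```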


In my personal case, I also worked out that the optimal number of processes to run at the same time on my GPU (a 980 Ti) is 4, provided I allow no more than 3 threads on my CPU, out of a total of 8, to run other, non-GPU work units, thus leaving 4 available (and 1 spare for me!). When my GPU is processing 4 vlars and my CPU is running 3 non-GPU work units, it is running 7 CPUs out of 8 at 100%. Keeps my office warm too!

I suggest you try experimenting with different numbers of concurrent work units on your GPU to see what works best, but try to run just vlars or just the ordinary work units, so you are comparing like with like.

Regardless of settings, the vlars units seem to take 3 or 4 times as long to complete as the ordinary units.

Hope that helps a bit and good luck,

Chris
ID: 1795913
Profile BilBg
Volunteer tester
Joined: 27 May 07
Posts: 3720
Credit: 9,385,827
RAC: 0
Bulgaria
Message 1796299 - Posted: 15 Jun 2016, 7:37:10 UTC - in response to Message 1795811.  
Last modified: 15 Jun 2016, 7:52:53 UTC

Are there other options that I'm not aware of?

You may ask in "OpenCL NV MultiBeam v8 SoG edition for Windows" for the best Settings (cmdline) for SoG app on your NVIDIA GTX 750 Ti
http://setiathome.berkeley.edu/forum_thread.php?id=79019

Then you may just Copy/Paste to the appropriate .txt file.
(I don't know the exact filename but something like mb_cmdline_win_x86_SSE2_OpenCL_NV_SoG.txt)

As an example (not recommendation) - in this post there is a cmdline ( -sbs 192 ..... -oclfft_tune_cw 16 )
http://setiathome.berkeley.edu/forum_thread.php?id=79019&postid=1773305#1773305


There should be also ~ ReadMe_MultiBeam_OpenCL_NV.txt in setiathome.berkeley.edu\docs\
 
 


- ALF - "Find out what you don't do well ..... then don't do it!" :)
 
ID: 1796299
Profile Stubbles
Volunteer tester
Joined: 29 Nov 99
Posts: 358
Credit: 5,909,255
RAC: 0
Canada
Message 1796679 - Posted: 16 Jun 2016, 20:15:15 UTC

Thanks to both of you for your replies, and sorry for the slow response; I waited a few days after the Tuesday-maintenance dip in my RAC to see what my Sunday change amounted to.

Currently, I am running with the Lunatics v0.44 setup (since Sunday) with:
- the sse3 app running on the CPU cores ( 6 out of 8 cores ), which is much faster especially on blc...guppi...vlar; and
- the cuda 50 app on the GPUs (2x GTX 750 Ti) with 1 WU each.

I tried a few short runs with 2 WUs on each GPU (with the app_config.xml setup from my original post), but since so many blc...guppi...vlar WUs are assigned to the GPUs, there are very few times (<25%) when 2 Arecibo 2010 WUs are running on the same GPU...so it didn't seem worth it.

Is there an easy way to keep the sse3 app running on the CPU cores but get the SoG app running on the GPUs (instead of Cuda 50)?

The only way I have come across that doesn't involve going into complex XML files or the command line is trying the Lunatics v0.45 Beta that I mention in my 2nd post.
I'm guessing it is worth a try; if it doesn't work, I can compare the stock setup (with SoG on the GPUs) against Lunatics v0.44 (with sse3 and cuda50) to figure out which is most productive.

Any feedback is appreciated,

Cheers,
Rob
ID: 1796679
Profile BilBg
Volunteer tester
Joined: 27 May 07
Posts: 3720
Credit: 9,385,827
RAC: 0
Bulgaria
Message 1796830 - Posted: 17 Jun 2016, 12:04:11 UTC - in response to Message 1796679.  
Last modified: 17 Jun 2016, 12:15:30 UTC

Is there an easy way where I can keep the sse3 app running on the CPU cores but get the SoG app running on the GPUs (instead of Cuda 50)

The only way I have come across that doesn't involve going into complex XML files or the command line is by trying Lunatics v0.45 in BETA that I mention in my 2nd post.

Yes, Lunatics v0.45 is the best way.
"command line" can be used for both stock and Lunatics ("command line" will be empty by default in both cases)

! Lunatics v0.45 has both CUDA and SoG apps - if you later want to go back to CUDA, do Not run Lunatics v0.44 ! (because all the "SoG tasks" will be deleted)
Instead, run Lunatics v0.45 again and choose CUDA (it will preserve the "SoG tasks" and "mark them" as CUDA tasks - this may not be visible in BOINC Manager, but they will be done by the selected CUDA app, which can be checked in Windows Task Manager)


then I can compare using the stock setup (with SoG on the GPUs)

With "stock setup" you can't choose SoG - the server will send tasks for some (all?) CUDA versions and for opencl_nvidia_sah & opencl_nvidia_SoG
http://setiathome.berkeley.edu/apps.php
 


- ALF - "Find out what you don't do well ..... then don't do it!" :)
 
ID: 1796830
Profile Stubbles
Volunteer tester
Joined: 29 Nov 99
Posts: 358
Credit: 5,909,255
RAC: 0
Canada
Message 1797177 - Posted: 18 Jun 2016, 16:48:42 UTC - in response to Message 1796830.  

Hello BilBg.
I installed Lunatics v0.45 Beta 3 before your reply, and I seem to be at (or close to) optimal settings without getting into advanced XMLs or command lines (and without overclocking).

On my HP Z400 with two GTX 750 Ti, I have:

5 cores (out of 8) crunching with the app: MB8_win_x64_SSE3...exe
It takes:
- almost 3 hrs/core for Arecibo 2010 WUs; and
- slightly more than 2 hrs/core for blc...guppi...vlar WUs.

1 WU running on each of the GPUs crunching with the app: MB8_win_x86_SSE3...NV_SoG...exe (32 bit)
(this also requires 1 full core to support each of the GPUs)
It takes:
- around 14 mins for Arecibo 2010 WUs; and
- 25-28 mins for blc...guppi...vlar WUs.

According to GPU-Z software, the GPU loads are running close to 100% and are down to 0% for ~20 secs between WUs.
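For anyone weighing these numbers the way this thread does, a rough tasks-per-day estimate from the run times above (illustrative only; real output depends on the mix of Arecibo and guppi work and on the ~20 s idle gap between WUs):

```python
def tasks_per_day(units, hours_per_task):
    """Daily output of `units` identical workers, each finishing
    one task every `hours_per_task` hours."""
    return units * 24.0 / hours_per_task

cpu = tasks_per_day(5, 3.0)          # 5 cores, ~3 h per Arecibo WU
gpu = tasks_per_day(2, 14.0 / 60.0)  # 2 GPUs, ~14 min per Arecibo WU

# The two GPUs dominate the host's output by a wide margin.
print(round(cpu), round(gpu))
```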

In case others use the info in this thread, you mentioned:
Instead run again Lunatics v0.45 and choose CUDA (will preserve "SoG tasks" and "mark them" as CUDA tasks - this may not be visible in BOINC Manager but they will be done by selected CUDA app which can be checked by Windows Task Manager)

I noticed that only after I had aborted the Cuda WUs that were "Ready to start".
I hope they can fix that in Lunatics (if possible) while it is still in Beta testing (or at least inform the user at the end of the Lunatics setup).
Considering I had emptied my queue the day before, I don't think this is a big issue, since those aborted WUs will likely get reassigned quickly to other hosts and therefore won't delay the regular validation process by more than 48 hrs.

Thanks for the help...and keep the comments coming if you have any or can think of something else to optimize.
Rob :-}
ID: 1797177
Profile BilBg
Volunteer tester
Joined: 27 May 07
Posts: 3720
Credit: 9,385,827
RAC: 0
Bulgaria
Message 1797617 - Posted: 20 Jun 2016, 21:30:31 UTC - in response to Message 1797177.  

I hope they can fix that in Lunatics (if possible) ...

Fix what?
The Lunatics installer never changes client_state.xml (now or in the past).
Anything already downloaded (e.g. marked as stock apps) will remain marked the same in client_state.xml (as shown by BOINC Manager).

If you want "at least inform the user at the end of the Lunatics setup" ask the author (Richard Haselgrove) in the place where he will see this:
http://setiathome.berkeley.edu/forum_thread.php?id=79704
 


- ALF - "Find out what you don't do well ..... then don't do it!" :)
 
ID: 1797617



 
©2022 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.