cuda 90 app not using as much GPU as it should?

Message boards : Number crunching : cuda 90 app not using as much GPU as it should?
Profile Eric B

Joined: 9 Mar 00
Posts: 88
Credit: 168,875,085
RAC: 762
United States
Message 2017222 - Posted: 30 Oct 2019, 15:07:18 UTC

OS: Linux.
This started as a discussion between myself and another person about why my i9-9900K was not as productive as my i7-3960X even though the i9 was running at a higher frequency. That turned out to be due to a memory module placed in the wrong slot (2 modules, 4 slots).
Anyway, in the course of things I sent him a message showing my NVIDIA state while crunching, and he mentioned that he thought the GPU wasn't performing as highly as it should. So I am posting it here in the hope that someone can point out how I can get full use of it.
The i9 is at 48% and my i7 is running its NVIDIA at 28% (this varies up and down over time, but it always seems to stay below 50%). Both systems have the same NVIDIA card, a GeForce GTX 1660 Ti, and use the same driver (440.26) and kernel (5.3.7-1-default); the app is setiathome_x41p_V0.98b1_x86_64-pc-linux-gnu_cuda90.
The PCIe slot in both machines is running at PCIe 3.0 x16 as it should, according to lspci: "LnkSta: Speed 8GT/s (ok), Width x16 (ok)", with a total "possible" throughput of 15.75 GB/s when I look up the spec for that PCIe version/slot width combo.
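That 15.75 GB/s figure checks out as simple arithmetic (a quick Python sketch; the constants are just the published PCIe 3.0 parameters):

```python
# PCIe 3.0: 8 GT/s per lane with 128b/130b encoding, 16 lanes in an x16 slot.
transfers_per_lane = 8e9      # transfers per second per lane
encoding = 128 / 130          # payload fraction after 128b/130b encoding
lanes = 16

bytes_per_lane = transfers_per_lane * encoding / 8  # payload bytes/s per lane
total_gb_s = bytes_per_lane * lanes / 1e9
print(f"PCIe 3.0 x{lanes}: {total_gb_s:.2f} GB/s")  # → 15.75 GB/s
```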

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.26       Driver Version: 440.26       CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 166...  Off  | 00000000:01:00.0 Off |                  N/A |
| 40%   60C    P2   107W / 130W |   1464MiB /  5941MiB |     48%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1479      G   /usr/bin/X                                    85MiB |
|    0      1946      G   /usr/bin/kwin_x11                             22MiB |
|    0      1953      G   /usr/bin/plasmashell                          35MiB |
|    0      2885      G   /usr/bin/krunner                               8MiB |
|    0     25576      C   ...x41p_V0.98b1_x86_64-pc-linux-gnu_cuda90  1305MiB |
+-----------------------------------------------------------------------------+
ID: 2017222
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13161
Credit: 1,160,866,277
RAC: 1,873
United States
Message 2017224 - Posted: 30 Oct 2019, 15:18:41 UTC

Well first I would add the -nobs parameter that has been suggested to you multiple times.
<cmdline>-nobs</cmdline>

Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 2017224
Ian&Steve C.
Joined: 28 Sep 99
Posts: 4267
Credit: 1,282,604,591
RAC: 6,640
United States
Message 2017239 - Posted: 30 Oct 2019, 17:00:45 UTC - in response to Message 2017222.  

In addition to the -nobs flag, what % of the CPU are you using? You need to reserve some spare CPU resources to feed the GPU. If you allow BOINC to use 100% of the CPU, you will bottleneck the GPU. Set the CPU resource to 80-85% in the compute settings.
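As a rough illustration of what that setting leaves free on a 16-thread i9-9900K (a Python sketch; whether BOINC floors or rounds the product is an assumption here):

```python
# How many threads an "use at most N% of the CPUs" setting leaves free
# on a 16-thread CPU, assuming BOINC floors the product.
threads = 16  # i9-9900K: 8 cores / 16 threads

for pct in (100, 85, 80):
    usable = (threads * pct) // 100   # assumed rounding behaviour
    free = threads - usable
    print(f"{pct}% -> {usable} threads for CPU tasks, {free} free to feed the GPU")
```

At 100% there is nothing left over for the GPU feeder thread or the desktop, which matches the starvation described above.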
Seti@Home classic workunits: 29,492 CPU time: 134,419 hours

ID: 2017239
Profile Eric B

Joined: 9 Mar 00
Posts: 88
Credit: 168,875,085
RAC: 762
United States
Message 2017312 - Posted: 31 Oct 2019, 3:53:42 UTC
Last modified: 31 Oct 2019, 3:57:09 UTC

Ok, so then what is the purpose of these in app_info.xml?
<avg_ncpus>0.1</avg_ncpus>
<max_ncpus>0.1</max_ncpus>
Another question: if you set 80-85% in computing preferences, doesn't that end up reducing overall performance?
When I try that, the NVIDIA utilization bounces all over the place; it's not steady at some particular load level (within a few points) as I would expect.
ID: 2017312
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13161
Credit: 1,160,866,277
RAC: 1,873
United States
Message 2017323 - Posted: 31 Oct 2019, 5:56:25 UTC - in response to Message 2017312.  

Ok, so then what is the purpose of these in app_info.xml?
<avg_ncpus>0.1</avg_ncpus>
<max_ncpus>0.1</max_ncpus>

They are configurations to tell the scheduler how much work can be processed when your host contacts the scheduler. From the client configuration document:

avg_ncpus
the number of CPU instances (possibly fractional) used by the app version.

<ncpus>N</ncpus>
Act as if there were N CPUs; e.g. to simulate 2 CPUs on a machine that has only 1. Zero means use the actual number of CPUs. Don't use this to limit CPU usage; use computing preferences instead.


https://boinc.berkeley.edu/wiki/Client_configuration#Application_configuration
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 2017323
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13161
Credit: 1,160,866,277
RAC: 1,873
United States
Message 2017324 - Posted: 31 Oct 2019, 6:01:25 UTC - in response to Message 2017312.  
Last modified: 31 Oct 2019, 6:04:20 UTC

Another question: if you set 80-85% in computing preferences doesn't that end up reducing overall performance?
When i try that the nvidia bounces all over the place, its not steady at some particular load level (within a few points) as I would expect


No, you need to reserve some CPU to handle the desktop and any other background processes. The reason the GPU task is under-utilized is that you have overcommitted the CPU to CPU tasks and the desktop, leaving not enough CPU to support the GPU task. The GPU task's thread is being starved for resources and constantly passed over for timeslices because the CPU threads are busy with CPU work and desktop maintenance.

You need to allocate at least one cpu thread for each gpu task. You do that by setting:

<avg_ncpus>1</avg_ncpus>
<ngpus>1</ngpus>


You tell the gpu application to use a full cpu core to support the gpu thread by adding the -nobs parameter to the cmdline entry in the app_info or app_config files.

<cmdline>-nobs</cmdline>

Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 2017324
Profile Eric B

Joined: 9 Mar 00
Posts: 88
Credit: 168,875,085
RAC: 762
United States
Message 2017328 - Posted: 31 Oct 2019, 7:24:25 UTC

Ok, this is what i have now - look good?

<app>
     <name>setiathome_v8</name>
  </app>
    <file_info>
      <name>setiathome_x41p_V0.98b1_x86_64-pc-linux-gnu_cuda90</name>
      <executable/>
    </file_info>
    <app_version>
      <app_name>setiathome_v8</app_name>
      <platform>x86_64-pc-linux-gnu</platform>
      <version_num>801</version_num>
      <plan_class>cuda90</plan_class>
      <cmdline>-nobs</cmdline>
      <coproc>
        <type>NVIDIA</type>
        <count>1</count>
      </coproc>                                                                                                                                                                                    
      <avg_ncpus>1</avg_ncpus>
      <ngpus>1</ngpus>
      <file_ref>
         <file_name>setiathome_x41p_V0.98b1_x86_64-pc-linux-gnu_cuda90</file_name>
          <main_program/>
      </file_ref>
    </app_version>
ID: 2017328
Profile Eric B

Joined: 9 Mar 00
Posts: 88
Credit: 168,875,085
RAC: 762
United States
Message 2017331 - Posted: 31 Oct 2019, 7:36:51 UTC

Watching nvidia-smi (watch -n 1) for a few minutes now shows my GPU usage at roughly 98%, give or take 1-2% - thanks!
I also set computing preferences to 85% in BOINC Manager as you recommended earlier.
I can check 'Host Average' in a day or two and see how things are affected.

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.26       Driver Version: 440.26       CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 166...  Off  | 00000000:01:00.0  On |                  N/A |
| 56%   63C    P2   115W / 130W |   1482MiB /  5941MiB |     99%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1479      G   /usr/bin/X                                    93MiB |
|    0      1946      G   /usr/bin/kwin_x11                             22MiB |
|    0      1953      G   /usr/bin/plasmashell                          35MiB |
|    0      2885      G   /usr/bin/krunner                               8MiB |
|    0     21797      C   ...x41p_V0.98b1_x86_64-pc-linux-gnu_cuda90  1307MiB |
+-----------------------------------------------------------------------------+
asus-isa-0000
Adapter: ISA adapter
cpu_fan:        0 RPM

coretemp-isa-0000
Adapter: ISA adapter
Package id 0:  +64.0°C  (high = +86.0°C, crit = +100.0°C)
Core 0:        +63.0°C  (high = +86.0°C, crit = +100.0°C)
Core 1:        +61.0°C  (high = +86.0°C, crit = +100.0°C)
Core 2:        +64.0°C  (high = +86.0°C, crit = +100.0°C)
Core 3:        +62.0°C  (high = +86.0°C, crit = +100.0°C)
Core 4:        +64.0°C  (high = +86.0°C, crit = +100.0°C)
Core 5:        +62.0°C  (high = +86.0°C, crit = +100.0°C)
Core 6:        +62.0°C  (high = +86.0°C, crit = +100.0°C)
Core 7:        +62.0°C  (high = +86.0°C, crit = +100.0°C)

acpitz-acpi-0
Adapter: ACPI interface
temp1:        +27.8°C  (crit = +119.0°C)

Frequency: 4.182Ghz
ID: 2017331
Profile Eric B

Joined: 9 Mar 00
Posts: 88
Credit: 168,875,085
RAC: 762
United States
Message 2017333 - Posted: 31 Oct 2019, 7:52:53 UTC

What's the difference between
setiathome_x41p_V0.98b1_x86_64-pc-linux-gnu_cuda90
and
setiathome_x41p_V0.98b1_x86_64-pc-linux-gnu_cuda101

Is one faster than the other?
ID: 2017333
rob smith Crowdfunding Project Donor * Special Project $75 donor * Special Project $250 donor
Volunteer moderator
Volunteer tester

Joined: 7 Mar 03
Posts: 22158
Credit: 416,307,556
RAC: 380
United Kingdom
Message 2017339 - Posted: 31 Oct 2019, 11:40:28 UTC

The "...90" app uses CUDA version 9.0, while the "...101" app uses CUDA version 10.1.
Which is faster really depends on how old your hardware is: reports suggest that with the very latest GPUs CUDA 10.1 is faster, while for many of the oldest, CUDA 9.0 is faster. But be aware that not all of the very oldest GPUs are supported by the latest versions of CUDA.
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
ID: 2017339
Profile Retvari Zoltan

Joined: 28 Apr 00
Posts: 35
Credit: 128,746,856
RAC: 230
Hungary
Message 2017342 - Posted: 31 Oct 2019, 12:09:04 UTC

Do not run more CPU tasks simultaneously than your CPU has cores. The i9-9900K has 8 cores.
ID: 2017342
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13161
Credit: 1,160,866,277
RAC: 1,873
United States
Message 2017357 - Posted: 31 Oct 2019, 14:11:18 UTC

The app_info looks fine, as does your reported GPU utilization. Your CPU times look reasonable too: there is not much difference between cpu_time and run_time (only about 5 minutes), so your 85% CPU setting shows you are not overcommitted on CPU resources. Your setup looks good now, and your CPU and GPU times look reasonable for your hardware and applications.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 2017357
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14649
Credit: 200,643,578
RAC: 874
United Kingdom
Message 2017361 - Posted: 31 Oct 2019, 14:30:00 UTC - in response to Message 2017328.  

You actually have a lot of (invisible) trailing spaces on the </coproc> line, which are causing the thread to stretch and become hard to read.

Too late to do anything about this thread, but if you clean up the file now, it'll be ready if you ever feel the need to post it again.
ID: 2017361
Ian&Steve C.
Joined: 28 Sep 99
Posts: 4267
Credit: 1,282,604,591
RAC: 6,640
United States
Message 2017362 - Posted: 31 Oct 2019, 14:40:58 UTC - in response to Message 2017333.  

Whats the difference between
setiathome_x41p_V0.98b1_x86_64-pc-linux-gnu_cuda90
and
setiathome_x41p_V0.98b1_x86_64-pc-linux-gnu_cuda101

is one faster than the other?


the 101 app will be slightly faster on your 1660ti
Seti@Home classic workunits: 29,492 CPU time: 134,419 hours

ID: 2017362
Profile Eric B

Joined: 9 Mar 00
Posts: 88
Credit: 168,875,085
RAC: 762
United States
Message 2017557 - Posted: 2 Nov 2019, 15:15:23 UTC - in response to Message 2017362.  

I just switched one of my systems to cuda101 - if it pans out as a bit faster, I'll switch the other as well.
I didn't know what to put for version_num, so I just used 801 as before, and it seems to work.
Also, I checked for trailing spaces.
In vim, use this:
:highlight ExtraWhitespace ctermbg=red guibg=red
:match ExtraWhitespace /\s\+$/
and any trailing spaces show up in red.
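Outside of vim, the same check can be scripted; here's a small Python sketch (the sample lines are just made-up examples, not my real file):

```python
import re

def trailing_ws_lines(lines):
    """Return 1-based numbers of lines that end in spaces or tabs."""
    return [n for n, line in enumerate(lines, 1)
            if re.search(r'[ \t]+$', line.rstrip('\n'))]

# Example input mimicking an app_info.xml with a padded </coproc> line:
sample = ["      </coproc>        \n",
          "      <avg_ncpus>1</avg_ncpus>\n"]
print(trailing_ws_lines(sample))  # → [1]
```

Pointing it at a real file is just `trailing_ws_lines(open("app_info.xml"))`.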
ID: 2017557
Profile Siran d'Vel'nahr
Volunteer tester
Joined: 23 May 99
Posts: 7379
Credit: 44,181,323
RAC: 238
United States
Message 2017574 - Posted: 2 Nov 2019, 17:37:31 UTC - in response to Message 2017362.  
Last modified: 2 Nov 2019, 18:29:03 UTC

Whats the difference between
setiathome_x41p_V0.98b1_x86_64-pc-linux-gnu_cuda90
and
setiathome_x41p_V0.98b1_x86_64-pc-linux-gnu_cuda101

is one faster than the other?


the 101 app will be slightly faster on your 1660ti

Hi Ian,

[edit]
Duh! Doh!!! Belay that. I just looked at my app_info.xml file and see that 10.1 is assigned to setiathome_v8.
[/edit]

[edit2]
Just shows to go ya how long it's been since I upgraded to 10.1. I forgot how I did it. ;)
[/edit2]

I'm running a GTX 1660 Ti on my i7-8086K PC. I was just looking at my app_config.xml file and I have a question about it:
<app_config>
  <app_version>
    <app_name>setiathome_v8</app_name>
    <plan_class>cuda90</plan_class>
    <avg_ncpus>1</avg_ncpus>
    <ngpus>1</ngpus>
    <cmdline>-nobs -pfb 32</cmdline>
  </app_version>

  <app_version>
    <app_name>astropulse_v7</app_name>
    <plan_class>opencl_nvidia_100</plan_class>
    <avg_ncpus>1</avg_ncpus>
    <ngpus>1</ngpus>
  </app_version>
</app_config>

I have the 10.1 on the astropulse_v7 plan_class. The setiathome_v8 plan_class has cuda90. Should I have the 10.1 on both, or is this the correct way to run?

Have a great day! :)

Siran
CAPT Siran d'Vel'nahr - L L & P _\\//
Winders 11 OS? "What a piece of junk!" - L. Skywalker
"Logic is the cement of our civilization with which we ascend from chaos using reason as our guide." - T'Plana-hath
ID: 2017574
Profile Eric B

Joined: 9 Mar 00
Posts: 88
Credit: 168,875,085
RAC: 762
United States
Message 2018166 - Posted: 8 Nov 2019, 13:49:34 UTC

So after running a while with the new settings, I see the GPUs are working as expected but the CPU performance isn't. The i9 is lagging behind the i7, even though the i9 has more threads. I gathered some data and it seems to confirm my visual observations. The average time to process MB WUs on the i9 is 4184 seconds each, compared to the i7 at 3790. This isn't averaged over a lot of MB WUs, but nevertheless it doesn't seem right to me, especially given that the i9 runs at 4.2 GHz vs the i7 at 3.7 GHz.

Average GPU time on ERB1: 100 GPU WUs - average of 73.3 seconds each
Average GPU time on ERB2: 100 GPU WUs - average of 61.4 seconds each

Average CPU time on ERB1: 19 CPU WUs - average of 3790.4 seconds each
Average CPU time on ERB2: 33 CPU WUs - average of 4184.6 seconds each

both PC's are using MBv8_8.05r3345_avx_linux64
I have two other MB apps I can try (if people think they would be better)
MBv8_8.22r3711_sse41_intel_x86_64-pc-linux-gnu
MBv8_8.22r4008_avx2_intel_x86_64-pc-linux-gnu

AVX2 seems a step up from AVX, but it probably depends heavily on the app and whether it can actually take advantage of it.
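Putting those averages side by side (a quick Python check of the numbers quoted above):

```python
# Per-WU averages and clocks quoted above (ERB1 = the i7, ERB2 = the i9).
i7_avg, i9_avg = 3790.4, 4184.6   # seconds per MB CPU WU
i7_clock, i9_clock = 3.7, 4.2     # GHz

slowdown = (i9_avg / i7_avg - 1) * 100
clock_gain = (i9_clock / i7_clock - 1) * 100
print(f"i9 is {slowdown:.1f}% slower per WU despite a {clock_gain:.1f}% clock advantage")
# → i9 is 10.4% slower per WU despite a 13.5% clock advantage
```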
ID: 2018166
juan BFP Crowdfunding Project Donor * Special Project $75 donor * Special Project $250 donor
Volunteer tester
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 2018167 - Posted: 8 Nov 2019, 13:59:40 UTC - in response to Message 2018166.  
Last modified: 8 Nov 2019, 14:07:16 UTC

The only way to be sure is to run a few rounds with each app and see which one is best on your particular host.

I did that some time ago, and in my case the best performance was with MBv8_8.22r3711_sse41_intel_x86_64-pc-linux-gnu.

So YMMV applies; test all the options to be sure. Just don't forget to compare similar types of WU with similar AR, or your test is not valid.

Also, be advised there is a sweet spot in the number of CPU WUs running at the same time; above or below it, the total number of WUs crunched per hour/day drops.

Again, testing is the only way to be sure on a particular host. On my old 12-thread i7-6850K driving 4 GPUs, the best point is 4 GPU WUs + 6 CPU WUs at a time, leaving 2 threads free.
ID: 2018167
Profile Eric B

Joined: 9 Mar 00
Posts: 88
Credit: 168,875,085
RAC: 762
United States
Message 2018228 - Posted: 9 Nov 2019, 1:26:12 UTC

Well, the same app AND the same WU - which brings me back to my original question:
is there some way to load the same WU on two different machines (multiple instances, one per BOINC/SETI CPU thread) so I can properly compare things? I'd be using it as a benchmark WU. There need to be safeguards so it is never uploaded to a SETI server.
ID: 2018228
Profile Kissagogo27 Special Project $75 donor
Joined: 6 Nov 99
Posts: 715
Credit: 8,032,827
RAC: 62
France
Message 2018255 - Posted: 9 Nov 2019, 9:44:43 UTC

ID: 2018255

©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.