Only 1 out of 3 GPUs being used, cc_config not working.

Message boards : Number crunching : Only 1 out of 3 GPUs being used, cc_config not working.
Ian&Steve C.
Joined: 28 Sep 99
Posts: 4267
Credit: 1,282,604,591
RAC: 6,640
United States
Message 1936225 - Posted: 19 May 2018, 1:29:25 UTC
Last modified: 19 May 2018, 1:30:49 UTC

So i have a system as follows

Supermicro X9DRi-LN4F+ v1.10
2x Xeon E5-2690 (v1)
32GB ram
2x GTX 750ti FTW
1x GTX 1060 SC
Running on Ubuntu 17.10 x64 with the "special sauce"

I have tried reinstalling the latest WHQL drivers (390.48, from the PPA), to no avail.

this system was previously running fine with 2x 750tis and crunching both cards, but of course i can't leave well enough alone :)

so i added the 1060, and now BOINC/SETI is ONLY crunching on the 1060 and the 750tis are sitting idle doing nothing.

i do have my cc_config.xml file as follows: (from my previous working setup)
<cc_config>
  <options>
    <use_all_gpus>1</use_all_gpus>
  </options>
</cc_config>
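To rule out a syntax problem in the file itself, a minimal Python sketch like the one below can parse the cc_config.xml text and check the flag. The helper name `use_all_gpus_enabled` is made up for illustration. It only inspects the XML and cannot tell whether the BOINC client actually found and loaded the file.

```python
import xml.etree.ElementTree as ET

def use_all_gpus_enabled(xml_text):
    """Return True if the text is a cc_config document with use_all_gpus set to 1.

    Hypothetical helper: it validates the XML only, not where the file lives.
    """
    root = ET.fromstring(xml_text)  # raises ParseError on malformed XML
    value = root.findtext("options/use_all_gpus", default="0")
    return root.tag == "cc_config" and value.strip() == "1"

# The configuration from the post, pasted as a string.
CC_CONFIG = """
<cc_config>
  <options>
    <use_all_gpus>1</use_all_gpus>
  </options>
</cc_config>
"""

if __name__ == "__main__":
    print("use_all_gpus enabled:", use_all_gpus_enabled(CC_CONFIG))
```

If this check passes and GPUs are still skipped, the contents are probably not the issue and it is worth looking at where the file is stored.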


app_config.xml
<app_config>
  <app>
    <name>astropulse_v7</name>
    <gpu_versions>
      <gpu_usage>1</gpu_usage>
      <cpu_usage>1.0</cpu_usage>
    </gpu_versions>
  </app>
  <app>
    <name>setiathome_v8</name>
    <gpu_versions>
      <gpu_usage>1</gpu_usage>
      <cpu_usage>1.0</cpu_usage>
    </gpu_versions>
  </app>
</app_config>


app_info.xml
<app_info>
  <app>
     <name>setiathome_v8</name>
  </app>
    <file_info>
      <name>setiathome_x41p_zi3v_x86_64-pc-linux-gnu_cuda90</name>
      <executable/>
    </file_info>
    <file_info>
      <name>libcudart.so.9.0</name>
    </file_info>
    <file_info>
      <name>libcufft.so.9.0</name>
    </file_info>
    <app_version>
      <app_name>setiathome_v8</app_name>
      <platform>x86_64-pc-linux-gnu</platform>
      <version_num>801</version_num>
      <plan_class>cuda90</plan_class>
      <cmdline></cmdline>
      <coproc>
        <type>NVIDIA</type>
        <count>1</count>
      </coproc>
      <avg_ncpus>0.1</avg_ncpus>
      <max_ncpus>0.1</max_ncpus>
      <file_ref>
         <file_name>setiathome_x41p_zi3v_x86_64-pc-linux-gnu_cuda90</file_name>
          <main_program/>
      </file_ref>
      <file_ref>
         <file_name>libcudart.so.9.0</file_name>
      </file_ref>
      <file_ref>
         <file_name>libcufft.so.9.0</file_name>
      </file_ref>
    </app_version>
  <app>
     <name>astropulse_v7</name>
  </app>
     <file_info>
       <name>astropulse_7.08_x86_64-pc-linux-gnu__opencl_nvidia_100</name>
        <executable/>
     </file_info>
     <file_info>
       <name>AstroPulse_Kernels_r2751.cl</name>
     </file_info>
     <file_info>
       <name>ap_cmdline_7.08_x86_64-pc-linux-gnu__opencl_nvidia_100.txt</name>
     </file_info>
    <app_version>
      <app_name>astropulse_v7</app_name>
      <platform>x86_64-pc-linux-gnu</platform>
      <version_num>708</version_num>
      <plan_class>opencl_nvidia_100</plan_class>
      <coproc>
        <type>NVIDIA</type>
        <count>1</count>
      </coproc>
      <avg_ncpus>0.1</avg_ncpus>
      <max_ncpus>0.1</max_ncpus>
      <file_ref>
         <file_name>astropulse_7.08_x86_64-pc-linux-gnu__opencl_nvidia_100</file_name>
          <main_program/>
      </file_ref>
      <file_ref>
         <file_name>AstroPulse_Kernels_r2751.cl</file_name>
      </file_ref>
      <file_ref>
         <file_name>ap_cmdline_7.08_x86_64-pc-linux-gnu__opencl_nvidia_100.txt</file_name>
         <open_name>ap_cmdline.txt</open_name>
      </file_ref>
    </app_version>
   <app>
      <name>setiathome_v8</name>
   </app>
      <file_info>
         <name>MBv8_8.22r3711_sse41_x86_64-pc-linux-gnu</name>
         <executable/>
      </file_info>
     <app_version>
     <app_name>setiathome_v8</app_name>
     <platform>x86_64-pc-linux-gnu</platform>
     <version_num>800</version_num>   
      <file_ref>
        <file_name>MBv8_8.22r3711_sse41_x86_64-pc-linux-gnu</file_name>
        <main_program/>
      </file_ref>
    </app_version>
   <app>
      <name>astropulse_v7</name>
   </app>
     <file_info>
       <name>ap_7.05r2728_sse3_linux64</name>
        <executable/>
     </file_info>
    <app_version>
       <app_name>astropulse_v7</app_name>
       <version_num>704</version_num>
       <platform>x86_64-pc-linux-gnu</platform>
       <plan_class></plan_class>
       <file_ref>
         <file_name>ap_7.05r2728_sse3_linux64</file_name>
          <main_program/>
       </file_ref>
    </app_version>
</app_info>


do i have to add something different when the GPUs are not matching in the same system? all my other machines have matching GPU types and "just work"

help please.

output of nvidia-smi:

@SIERRA-SPARE:~$ nvidia-smi
Fri May 18 21:30:24 2018       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 390.48                 Driver Version: 390.48                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 750 Ti  Off  | 00000000:03:00.0  On |                  N/A |
| 42%   46C    P0     2W /  52W |    317MiB /  1997MiB |      2%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX 106...  Off  | 00000000:04:00.0 Off |                  N/A |
| 46%   64C    P2    78W / 120W |   1809MiB /  3019MiB |     93%      Default |
+-------------------------------+----------------------+----------------------+
|   2  GeForce GTX 750 Ti  Off  | 00000000:82:00.0 Off |                  N/A |
| 42%   37C    P8     1W /  65W |     13MiB /  2002MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1332      G   /usr/lib/xorg/Xorg                            10MiB |
|    0      1592      G   /usr/bin/gnome-shell                          50MiB |
|    0      1838      G   /usr/lib/xorg/Xorg                           114MiB |
|    0      2000      G   /usr/bin/gnome-shell                         125MiB |
|    1      3100      C   ...me_x41p_zi3v_x86_64-pc-linux-gnu_cuda90  1797MiB |
+-----------------------------------------------------------------------------+

Seti@Home classic workunits: 29,492 CPU time: 134,419 hours

Zalster (Special Project $250 donor)
Volunteer tester
Joined: 27 May 99
Posts: 5517
Credit: 528,817,460
RAC: 242
United States
Message 1936226 - Posted: 19 May 2018, 1:36:07 UTC - in response to Message 1936225.  

Which folder is the cc_config in?
Ian&Steve C.
Joined: 28 Sep 99
Posts: 4267
Credit: 1,282,604,591
RAC: 6,640
United States
Message 1936227 - Posted: 19 May 2018, 1:46:10 UTC - in response to Message 1936226.  

/var/lib/boinc-client/projects/setiathome.berkeley.edu/

here is the log showing that BOINC is ignoring it.

18-May-2018 21:10:16 [---] Data directory: /var/lib/boinc-client
18-May-2018 21:10:19 [---] CUDA: NVIDIA GPU 0: GeForce GTX 1060 3GB (driver version 390.48, CUDA version 9.1, compute capability 6.1, 3019MB, 2952MB available, 4228 GFLOPS peak)
18-May-2018 21:10:19 [---] CUDA: NVIDIA GPU 1 (not used): GeForce GTX 750 Ti (driver version 390.48, CUDA version 9.1, compute capability 5.0, 1997MB, 1959MB available, 1622 GFLOPS peak)
18-May-2018 21:10:19 [---] CUDA: NVIDIA GPU 2 (not used): GeForce GTX 750 Ti (driver version 390.48, CUDA version 9.1, compute capability 5.0, 2002MB, 1964MB available, 1622 GFLOPS peak)
18-May-2018 21:10:19 [---] OpenCL: NVIDIA GPU 0: GeForce GTX 1060 3GB (driver version 390.48, device version OpenCL 1.2 CUDA, 3019MB, 2952MB available, 4228 GFLOPS peak)
18-May-2018 21:10:19 [---] OpenCL: NVIDIA GPU 1 (ignored by config): GeForce GTX 750 Ti (driver version 390.48, device version OpenCL 1.2 CUDA, 1997MB, 1959MB available, 1622 GFLOPS peak)
18-May-2018 21:10:19 [---] OpenCL: NVIDIA GPU 2 (ignored by config): GeForce GTX 750 Ti (driver version 390.48, device version OpenCL 1.2 CUDA, 2002MB, 1964MB available, 1622 GFLOPS peak)
18-May-2018 21:10:19 [SETI@home] Found app_info.xml; using anonymous platform
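
The startup lines above can also be scanned mechanically. Here is a rough Python sketch, assuming the log format shown in this post; the regex and the `skipped_gpus` helper are illustrative and not part of BOINC. It lists every GPU the client flagged with a status such as "not used" or "ignored by config".

```python
import re

# Matches the GPU detection lines printed at client startup, as seen in the log above.
GPU_LINE = re.compile(
    r"\[---\] (?P<api>CUDA|OpenCL): NVIDIA GPU (?P<index>\d+)"
    r"(?: \((?P<status>[^)]*)\))?: (?P<name>[^(]+?) \("
)

def skipped_gpus(log_text):
    """Return (api, index, name, status) for each GPU line that carries a status flag."""
    skipped = []
    for line in log_text.splitlines():
        match = GPU_LINE.search(line)
        if match and match.group("status"):
            skipped.append((match.group("api"), int(match.group("index")),
                            match.group("name"), match.group("status")))
    return skipped

# Three lines from the log above, with the trailing details abbreviated to "...".
SAMPLE = """\
18-May-2018 21:10:19 [---] CUDA: NVIDIA GPU 0: GeForce GTX 1060 3GB (driver version 390.48, ...)
18-May-2018 21:10:19 [---] CUDA: NVIDIA GPU 1 (not used): GeForce GTX 750 Ti (driver version 390.48, ...)
18-May-2018 21:10:19 [---] OpenCL: NVIDIA GPU 2 (ignored by config): GeForce GTX 750 Ti (driver version 390.48, ...)
"""

if __name__ == "__main__":
    for api, index, name, status in skipped_gpus(SAMPLE):
        print(f"{api} GPU {index} ({name}): {status}")
```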

Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13736
Credit: 208,696,464
RAC: 304
Australia
Message 1936228 - Posted: 19 May 2018, 1:51:18 UTC - in response to Message 1936227.  
Last modified: 19 May 2018, 1:51:57 UTC

/var/lib/boinc-client/projects/setiathome.berkeley.edu/

cc_config.xml should be in the boinc-client directory.
/var/lib/boinc-client
Grant
Darwin NT
Zalster (Special Project $250 donor)
Volunteer tester
Joined: 27 May 99
Posts: 5517
Credit: 528,817,460
RAC: 242
United States
Message 1936229 - Posted: 19 May 2018, 1:52:26 UTC - in response to Message 1936227.  

Move the cc_config.xml and put it in the boinc folder. It's sitting 2 folders down from where it needs to be placed. Once you do that, close out boinc and relaunch it. It should re-read that xml and begin to use the cards. I also want to see if the event log says it sees the cc_config.xml after you move it.
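
The move can be sketched in Python as follows. This is a minimal sketch, assuming the layout described in this thread (a data directory with a `projects` subfolder); `relocate_cc_config` is a hypothetical helper, and on a packaged Linux install the data directory is usually owned by the boinc user, so it would need suitable permissions. It defaults to a dry run that only reports what it finds.

```python
import shutil
from pathlib import Path

def relocate_cc_config(data_dir, dry_run=True):
    """Report cc_config.xml files sitting in project folders and optionally move one up.

    Hypothetical helper. data_dir is the BOINC data directory
    (/var/lib/boinc-client in this thread). Returns the misplaced paths found.
    """
    data_dir = Path(data_dir)
    target = data_dir / "cc_config.xml"
    misplaced = sorted((data_dir / "projects").glob("*/cc_config.xml"))
    for path in misplaced:
        print(f"misplaced: {path} -> belongs at {target}")
    # Only move when asked to, and never overwrite an existing top-level file.
    if misplaced and not dry_run and not target.exists():
        shutil.move(str(misplaced[0]), str(target))
    return misplaced

if __name__ == "__main__":
    relocate_cc_config("/var/lib/boinc-client", dry_run=True)
```

After the file is in place, restart the client as described above so the GPUs are detected again with the new setting.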
Ian&Steve C.
Joined: 28 Sep 99
Posts: 4267
Credit: 1,282,604,591
RAC: 6,640
United States
Message 1936232 - Posted: 19 May 2018, 2:09:15 UTC - in response to Message 1936229.  

/var/lib/boinc-client/projects/setiathome.berkeley.edu/

cc_config.xml should be in the boinc-client directory.
/var/lib/boinc-client


Move the cc_config.xml and put it in the boinc folder. It's sitting 2 folders down from where it needs to be placed. Once you do that, close out boinc and relaunch it. It should re-read that xml and begin to use the cards. I also want to see if the event log says it sees the cc_config.xml after you move it.


yup. this was it. i always thought the cc_config file went in the same directory as the app_config files. but maybe that's only windows.

here's the log now:

18-May-2018 22:01:17 [---] Data directory: /var/lib/boinc-client
18-May-2018 22:01:22 [---] CUDA: NVIDIA GPU 0: GeForce GTX 1060 3GB (driver version 390.48, CUDA version 9.1, compute capability 6.1, 3019MB, 2952MB available, 4228 GFLOPS peak)
18-May-2018 22:01:22 [---] CUDA: NVIDIA GPU 1: GeForce GTX 750 Ti (driver version 390.48, CUDA version 9.1, compute capability 5.0, 1997MB, 1959MB available, 1622 GFLOPS peak)
18-May-2018 22:01:22 [---] CUDA: NVIDIA GPU 2: GeForce GTX 750 Ti (driver version 390.48, CUDA version 9.1, compute capability 5.0, 2002MB, 1964MB available, 1622 GFLOPS peak)
18-May-2018 22:01:22 [---] OpenCL: NVIDIA GPU 0: GeForce GTX 1060 3GB (driver version 390.48, device version OpenCL 1.2 CUDA, 3019MB, 2952MB available, 4228 GFLOPS peak)
18-May-2018 22:01:22 [---] OpenCL: NVIDIA GPU 1: GeForce GTX 750 Ti (driver version 390.48, device version OpenCL 1.2 CUDA, 1997MB, 1959MB available, 1622 GFLOPS peak)
18-May-2018 22:01:22 [---] OpenCL: NVIDIA GPU 2: GeForce GTX 750 Ti (driver version 390.48, device version OpenCL 1.2 CUDA, 2002MB, 1964MB available, 1622 GFLOPS peak)
18-May-2018 22:01:22 [SETI@home] Found app_info.xml; using anonymous platform


nvidia-smi:
@SIERRA-SPARE:~$ nvidia-smi
Fri May 18 22:08:16 2018       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 390.48                 Driver Version: 390.48                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 750 Ti  Off  | 00000000:03:00.0  On |                  N/A |
| 51%   72C    P0    21W /  52W |   1781MiB /  1997MiB |     93%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX 106...  Off  | 00000000:04:00.0 Off |                  N/A |
| 47%   66C    P2   103W / 120W |   1825MiB /  3019MiB |    100%      Default |
+-------------------------------+----------------------+----------------------+
|   2  GeForce GTX 750 Ti  Off  | 00000000:82:00.0 Off |                  N/A |
| 46%   59C    P0    29W /  65W |   1490MiB /  2002MiB |     88%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1332      G   /usr/lib/xorg/Xorg                            15MiB |
|    0      1523      G   /usr/bin/gnome-shell                          50MiB |
|    0      1756      G   /usr/lib/xorg/Xorg                           111MiB |
|    0      1918      G   /usr/bin/gnome-shell                         109MiB |
|    0      2663      C   ...me_x41p_zi3v_x86_64-pc-linux-gnu_cuda90  1460MiB |
|    1      2723      C   ...me_x41p_zi3v_x86_64-pc-linux-gnu_cuda90  1797MiB |
|    2      2706      C   ...me_x41p_zi3v_x86_64-pc-linux-gnu_cuda90  1460MiB |
+-----------------------------------------------------------------------------+


thanks for the quick help guys!
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13736
Credit: 208,696,464
RAC: 304
Australia
Message 1936233 - Posted: 19 May 2018, 2:15:46 UTC - in response to Message 1936232.  
Last modified: 19 May 2018, 2:16:24 UTC

yup. this was it. i always thought the cc_config file went in the same directory as the app_config files. but maybe that's only windows.

Nope, i'm on Windows and it's in my BOINC directory.
I'd say it's a case of things that affect BOINC as a whole go in its directory. Things that affect only a project go in that project's directory.
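
That rule of thumb can be written down as a small lookup. This is a sketch only, covering the three files mentioned in this thread; `expected_location` and the `example_project` folder name are made up for illustration.

```python
from pathlib import Path

# Files that affect BOINC as a whole live in the data directory itself.
CLIENT_WIDE = {"cc_config.xml"}
# Files that affect a single project live in that project's folder.
PER_PROJECT = {"app_config.xml", "app_info.xml"}

def expected_location(filename, data_dir, project_dir_name):
    """Return the path where the given config file is expected (hypothetical helper)."""
    data_dir = Path(data_dir)
    if filename in CLIENT_WIDE:
        return data_dir / filename
    if filename in PER_PROJECT:
        return data_dir / "projects" / project_dir_name / filename
    raise ValueError(f"no rule for {filename!r}")

if __name__ == "__main__":
    for name in ("cc_config.xml", "app_config.xml", "app_info.xml"):
        print(name, "->", expected_location(name, "/var/lib/boinc-client", "example_project"))
```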

Glad it was easily sorted.
Grant
Darwin NT
TBar
Volunteer tester

Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1936235 - Posted: 19 May 2018, 2:33:41 UTC - in response to Message 1936232.  
Last modified: 19 May 2018, 2:37:32 UTC

| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 750 Ti  Off  | 00000000:03:00.0  On |                  N/A |
| 51%   72C    P0    21W /  52W |   1781MiB /  1997MiB |     93%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX 106...  Off  | 00000000:04:00.0 Off |                  N/A |
| 47%   66C    P2   103W / 120W |   1825MiB /  3019MiB |    100%      Default |
+-------------------------------+----------------------+----------------------+
|   2  GeForce GTX 750 Ti  Off  | 00000000:82:00.0 Off |                  N/A |
| 46%   59C    P0    29W /  65W |   1490MiB /  2002MiB |     88%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1332      G   /usr/lib/xorg/Xorg                            15MiB |
|    0      1523      G   /usr/bin/gnome-shell                          50MiB |
|    0      1756      G   /usr/lib/xorg/Xorg                           111MiB |
|    0      1918      G   /usr/bin/gnome-shell                         109MiB |
|    0      2663      C   ...me_x41p_zi3v_x86_64-pc-linux-gnu_cuda90  1460MiB |
|    1      2723      C   ...me_x41p_zi3v_x86_64-pc-linux-gnu_cuda90  1797MiB |
|    2      2706      C   ...me_x41p_zi3v_x86_64-pc-linux-gnu_cuda90  1460MiB |
+-----------------------------------------------------------------------------+
From the above it would appear you have the monitor attached to one of the 2 GB 750 Ti GPUs. As you can see, that 750 Ti is using most of its video RAM: 1781MiB of 1997MiB.
That means if you open enough browser windows, that card will run out of vRAM and start trashing tasks with errors. Since you have a 1060 with over a GB of vRAM available, it would be best to use that GPU for the monitor. I have run a 3 GB 1060 out of vRAM before, but it takes quite a bit more than it does with a 2 GB GPU. The best is a 4 GB card; so far I haven't managed to run one of those out of vRAM.
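
The headroom on each card can be read straight off the nvidia-smi table. A minimal Python sketch, assuming the "used MiB / total MiB" column layout shown in the output quoted above (`vram_headroom` is a hypothetical helper name):

```python
import re

# Matches the Memory-Usage column, e.g. "1781MiB /  1997MiB".
MEMORY = re.compile(r"(\d+)MiB\s*/\s*(\d+)MiB")

def vram_headroom(smi_text):
    """Return (used, total, free) in MiB for each memory column found, in table order."""
    return [(int(used), int(total), int(total) - int(used))
            for used, total in MEMORY.findall(smi_text)]

# The three status rows from the nvidia-smi output quoted above.
SAMPLE = """\
| 51%   72C    P0    21W /  52W |   1781MiB /  1997MiB |     93%      Default |
| 47%   66C    P2   103W / 120W |   1825MiB /  3019MiB |    100%      Default |
| 46%   59C    P0    29W /  65W |   1490MiB /  2002MiB |     88%      Default |
"""

if __name__ == "__main__":
    for gpu, (used, total, free) in enumerate(vram_headroom(SAMPLE)):
        print(f"GPU {gpu}: {used} of {total} MiB used, {free} MiB free")
```

On these numbers the 750 Ti driving the display has 216 MiB free, while the 1060 has 1194 MiB free.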
Ian&Steve C.
Joined: 28 Sep 99
Posts: 4267
Credit: 1,282,604,591
RAC: 6,640
United States
Message 1936243 - Posted: 19 May 2018, 4:47:43 UTC - in response to Message 1936235.  
Last modified: 19 May 2018, 5:25:03 UTC

From the above it would appear you have the monitor attached to one of the 2 GB 750 Ti GPUs. As you can see, that 750 Ti is using most of its video RAM: 1781MiB of 1997MiB.
That means if you open enough browser windows, that card will run out of vRAM and start trashing tasks with errors. Since you have a 1060 with over a GB of vRAM available, it would be best to use that GPU for the monitor. I have run a 3 GB 1060 out of vRAM before, but it takes quite a bit more than it does with a 2 GB GPU. The best is a 4 GB card; so far I haven't managed to run one of those out of vRAM.


yes, i have the monitor connected to one of the 750tis.

mainly because my KVM switch is VGA only, and the 1060 does not support analog out, while the 750ti does via the DVI-I port.

this system is a cruncher only. not regularly using any browsers, just the BOINC window and a terminal running HTOP
