Two Nvidia cards, one showing neither being used

Profile Radjin Project Donor
Joined: 2 May 00
Posts: 105
Credit: 14,928,529
RAC: 102
United States
Message 2025732 - Posted: 31 Dec 2019, 19:37:05 UTC
Last modified: 31 Dec 2019, 19:40:32 UTC

I have two video cards installed. I installed the driver directly from the Nvidia site; both cards take the same driver.
===
lspci | grep ' VGA ' | cut -d" " -f 1 | xargs -i lspci -v -s {}

01:00.0 VGA compatible controller: NVIDIA Corporation TU107 (rev a1) (prog-if 00 [VGA controller])
Subsystem: ZOTAC International (MCO) Ltd. TU107
Flags: bus master, fast devsel, latency 0, IRQ 16
Memory at f5000000 (32-bit, non-prefetchable) [size=16M]
Memory at d0000000 (64-bit, prefetchable) [size=256M]
Memory at e0000000 (64-bit, prefetchable) [size=32M]
I/O ports at 4000 [size=128]
[virtual] Expansion ROM at 000c0000 [disabled] [size=128K]
Capabilities: <access denied>
Kernel driver in use: nvidia
Kernel modules: nouveau, nvidia_drm, nvidia

04:00.0 VGA compatible controller: NVIDIA Corporation GK208 [GeForce GT 710B] (rev a1) (prog-if 00 [VGA controller])
Subsystem: ZOTAC International (MCO) Ltd. GK208B [GeForce GT 710]
Flags: fast devsel, IRQ 16
[virtual] Memory at f3000000 (32-bit, non-prefetchable) [size=16M]
Memory at e8000000 (64-bit, prefetchable) [size=128M]
Memory at f0000000 (64-bit, prefetchable) [size=32M]
I/O ports at 2000 [size=128]
[virtual] Expansion ROM at f4000000 [disabled] [size=512K]
Capabilities: <access denied>
Kernel driver in use: nvidia
Kernel modules: nouveau, nvidia_drm, nvidia
===
The first card is a GeForce GTX 1650, the second a GeForce GT 710B.
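
To see at a glance which kernel driver each card is bound to, the `lspci -v` output above can be reduced to one line per card. This is only a sketch that assumes the output format shown above; the function name `gpu_drivers` is made up for illustration.

```shell
#!/bin/sh
# gpu_drivers (hypothetical helper): reads `lspci -v` or `lspci -k` output on
# stdin and prints "<slot> <kernel driver>" for every VGA controller.
gpu_drivers() {
    awk '
        # A new device line starts with a PCI slot such as 01:00.0
        /^[0-9a-f][0-9a-f]:[0-9a-f][0-9a-f]\.[0-9a-f] / {
            slot = ($0 ~ /VGA compatible controller/) ? $1 : ""
            next
        }
        # Report the driver only while inside a VGA device section
        /Kernel driver in use:/ && slot != "" {
            print slot, $NF
            slot = ""
        }
    '
}

# Usage (needs pciutils installed):
#   lspci -v | gpu_drivers
```

For the output pasted above this would list both slots with `nvidia`, which says the kernel module is bound but nothing about whether the user-space CUDA/OpenCL libraries are in place.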

The first shows up under my computers on the BOINC site; the second does not show up at all.

BOINC is downloading CUDA WUs, but they are not being processed; BOINC Manager shows the GPU as missing.

This started a couple of days ago when I tried to upgrade the drivers from the Debian repository, since nvidia-detect said the cards work with that standard install. When that failed, I uninstalled them and went back to the driver directly from Nvidia via [sh NVIDIA-Linux-x86_64-440.44.run].

Computer: https://setiathome.berkeley.edu/show_host_detail.php?hostid=8816958

Stderr output from a failed wu below:
===
Stderr output
<core_client_version>7.14.2</core_client_version>
<![CDATA[
<message>
too many boinc_temporary_exit()s</message>
<stderr_txt>
mporary exit (180 secs)
v8 task detected
Cuda error 'Couldn't get cuda device count
' in file 'cuda/cudaAcceleration.cu' in line 138 : invalid device ordinal.
setiathome_CUDA: cudaGetDeviceCount() call failed.
setiathome_CUDA: No CUDA devices found
setiathome_CUDA: Found 0 CUDA device(s):
In cudaAcc_initializeDevice(): Boinc passed DevPref 1
setiathome_CUDA: CUDA Device 1 specified, checking...
Device cannot be used
Cuda device initialisation retry 1 of 6, waiting 5 secs...
Cuda error 'Couldn't get cuda device count
' in file 'cuda/cudaAcceleration.cu' in line 138 : invalid device ordinal.
setiathome_CUDA: cudaGetDeviceCount() call failed.
setiathome_CUDA: No CUDA devices found
setiathome_CUDA: Found 0 CUDA device(s):
In cudaAcc_initializeDevice(): Boinc passed DevPref 1
setiathome_CUDA: CUDA Device 1 specified, checking...
Device cannot be used

Cuda initialisation FAILED, Initiating Boinc temporary exit (180 secs)

</stderr_txt>
]]>

===
The last part repeats 6 times, so I did not paste all of it here.


I must have missed some install step.

Suggestions please.
Radjin~
ID: 2025732 · Report as offensive
Profile Zalster Special Project $250 donor
Volunteer tester
Joined: 27 May 99
Posts: 5517
Credit: 528,817,460
RAC: 242
United States
Message 2025733 - Posted: 31 Dec 2019, 19:42:09 UTC - in response to Message 2025732.  

Please post the first 30 lines of the startup log from BOINC Manager.

Did you install a cc_config.xml to use all GPUs? Otherwise only the most capable GPU will be used.
ID: 2025733 · Report as offensive
Ian&Steve C.
Joined: 28 Sep 99
Posts: 4267
Credit: 1,282,604,591
RAC: 6,640
United States
Message 2025734 - Posted: 31 Dec 2019, 19:46:01 UTC - in response to Message 2025732.  
Last modified: 31 Dec 2019, 19:46:19 UTC

You need to add a flag to your cc_config.xml to allow BOINC to use all GPUs when you have mismatched GPUs.

Create cc_config.xml with the following contents and place it in the BOINC main directory:
<cc_config>
    <options>
        <use_all_gpus>1</use_all_gpus>
    </options>
</cc_config>
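
A minimal shell sketch of that step. It assumes the BOINC data directory is /var/lib/boinc-client, which is what the startup log later in this thread reports; adjust for other installs. The helper name `write_cc_config` is made up for illustration, and it overwrites any existing cc_config.xml, so merge by hand if you already have one.

```shell
#!/bin/sh
# write_cc_config (hypothetical helper): writes a cc_config.xml that enables
# use_all_gpus into the directory given as $1.
# WARNING: this overwrites an existing cc_config.xml in that directory.
write_cc_config() {
    dir=${1:-/var/lib/boinc-client}   # assumed default data directory
    cat > "$dir/cc_config.xml" <<'EOF'
<cc_config>
    <options>
        <use_all_gpus>1</use_all_gpus>
    </options>
</cc_config>
EOF
}

# Usage (as root):
#   write_cc_config /var/lib/boinc-client
# Then restart the BOINC client so the GPUs are detected again, for example
# `systemctl restart boinc-client` on a systemd-based Debian install
# (the service name is an assumption).
```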


Next, about the errors: I believe the cuda60 app doesn't work well. Just wait it out and the servers will send you SoG and sah apps again, which seemed to process properly on your system as of a few days ago.
Seti@Home classic workunits: 29,492 CPU time: 134,419 hours

ID: 2025734 · Report as offensive
Profile Radjin Project Donor
Joined: 2 May 00
Posts: 105
Credit: 14,928,529
RAC: 102
United States
Message 2025736 - Posted: 31 Dec 2019, 20:04:50 UTC - in response to Message 2025734.  
Last modified: 31 Dec 2019, 20:07:16 UTC


I added the lines above, so my cc_config.xml now looks like this:

<cc_config>
  <log_flags>
    <task>1</task>
    <file_xfer>1</file_xfer>
    <sched_ops>1</sched_ops>
  </log_flags>
</cc_config>

<cc_config>
    <options>
        <use_all_gpus>1</use_all_gpus>
    </options>
</cc_config>


Thanks to you both for that tidbit. I rebooted and need to run for a bit; we’ll see how it works while I’m gone.
Radjin~
ID: 2025736 · Report as offensive
Profile Radjin Project Donor
Joined: 2 May 00
Posts: 105
Credit: 14,928,529
RAC: 102
United States
Message 2025737 - Posted: 31 Dec 2019, 20:06:14 UTC - in response to Message 2025733.  

Thanks to you both for that tidbit, I just added the code.

I have to run; rebooted and let’s see how it runs while I am gone a few hours.
Radjin~
ID: 2025737 · Report as offensive
Profile Zalster Special Project $250 donor
Volunteer tester
Joined: 27 May 99
Posts: 5517
Credit: 528,817,460
RAC: 242
United States
Message 2025738 - Posted: 31 Dec 2019, 20:07:47 UTC - in response to Message 2025736.  

<cc_config>
  <log_flags>
    <task>1</task>
    <file_xfer>1</file_xfer>
    <sched_ops>1</sched_ops>
  </log_flags>
  <options>
    <use_all_gpus>1</use_all_gpus>
  </options>
</cc_config>


Might be a better option
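
The reason to merge: a well-formed XML document has exactly one root element, so both sections belong inside a single <cc_config>. A rough check for the doubled-root mistake, as a sketch only; `count_cc_roots` is a made-up name, and it counts opening tags as plain text rather than parsing the XML.

```shell
#!/bin/sh
# count_cc_roots (hypothetical helper): prints how many opening <cc_config>
# tags the file named in $1 contains. Anything other than 1 means the file
# needs merging by hand. This is a plain text count, not an XML parse.
count_cc_roots() {
    grep -o '<cc_config>' "$1" | wc -l | tr -d ' '
}

# Usage:
#   count_cc_roots /var/lib/boinc-client/cc_config.xml
```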
ID: 2025738 · Report as offensive
Ian&Steve C.
Joined: 28 Sep 99
Posts: 4267
Credit: 1,282,604,591
RAC: 6,640
United States
Message 2025740 - Posted: 31 Dec 2019, 20:15:47 UTC - in response to Message 2025738.  

+1
Seti@Home classic workunits: 29,492 CPU time: 134,419 hours

ID: 2025740 · Report as offensive
Profile Radjin Project Donor
Joined: 2 May 00
Posts: 105
Credit: 14,928,529
RAC: 102
United States
Message 2025741 - Posted: 31 Dec 2019, 20:25:00 UTC - in response to Message 2025738.  


Updated and rebooted again. Thanks for that.
Radjin~
ID: 2025741 · Report as offensive
Profile Radjin Project Donor
Joined: 2 May 00
Posts: 105
Credit: 14,928,529
RAC: 102
United States
Message 2025742 - Posted: 31 Dec 2019, 20:26:36 UTC - in response to Message 2025733.  

What is the name of the startup log? Client state? I didn’t see a file named startup in the boinc-client directory.
Radjin~
ID: 2025742 · Report as offensive
Ian&Steve C.
Joined: 28 Sep 99
Posts: 4267
Credit: 1,282,604,591
RAC: 6,640
United States
Message 2025743 - Posted: 31 Dec 2019, 20:29:56 UTC - in response to Message 2025742.  

It's in the GUI. Just hit Ctrl+Shift+E with the boincmgr GUI running. That brings up the log; you can copy and paste the first few lines from there.
Seti@Home classic workunits: 29,492 CPU time: 134,419 hours

ID: 2025743 · Report as offensive
Profile Wiggo
Joined: 24 Jan 00
Posts: 34766
Credit: 261,360,520
RAC: 489
Australia
Message 2025748 - Posted: 31 Dec 2019, 21:11:04 UTC

There are no drivers listed for that rig, so I'd look further into that. ;-)

Cheers.
ID: 2025748 · Report as offensive
Profile Radjin Project Donor
Joined: 2 May 00
Posts: 105
Credit: 14,928,529
RAC: 102
United States
Message 2025756 - Posted: 31 Dec 2019, 21:33:14 UTC - in response to Message 2025748.  

There's no drivers being listed for that rig so I'd look further into that. ;-)

Cheers.


I saw drivers being used in the output from the first post. Doesn't that mean they are installed?

Same question given that the card shows up under computers on the BOINC site.

In the first post I described running the installer from nvidia; I didn’t get any errors. Is there another step I missed?
Radjin~
ID: 2025756 · Report as offensive
Profile Wiggo
Joined: 24 Jan 00
Posts: 34766
Credit: 261,360,520
RAC: 489
Australia
Message 2025765 - Posted: 31 Dec 2019, 21:45:14 UTC

That rig only shows "NVIDIA GeForce GTX 1650 (3911MB)", where it should be showing something more like "NVIDIA GeForce GTX 1650 (3911MB) driver: 440.44 OpenCL: 1.2" for it to work. ;-)

Cheers.
ID: 2025765 · Report as offensive
Profile Radjin Project Donor
Joined: 2 May 00
Posts: 105
Credit: 14,928,529
RAC: 102
United States
Message 2025766 - Posted: 31 Dec 2019, 21:51:53 UTC - in response to Message 2025765.  

Thanks for that info. At least I know what to look for.
Radjin~
ID: 2025766 · Report as offensive
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 2025778 - Posted: 31 Dec 2019, 22:57:09 UTC

next, about the errors, I believe the cuda60 app doesnt work well. just wait it out and the servers will send you SoG and sah apps again which seem to process properly on your system as of a few days ago.


Or speed things up with this in cc_config.xml:
<no_cuda>1</no_cuda>
That will prevent the CUDA60 app from being used and force the SAH and SoG OpenCL apps on the host. You still need to see OpenCL being detected on BOINC startup for even that to work.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 2025778 · Report as offensive
Ian&Steve C.
Joined: 28 Sep 99
Posts: 4267
Credit: 1,282,604,591
RAC: 6,640
United States
Message 2025783 - Posted: 31 Dec 2019, 23:23:54 UTC - in response to Message 2025778.  

I wonder how he got and processed SoG and sah tasks already, though.
Seti@Home classic workunits: 29,492 CPU time: 134,419 hours

ID: 2025783 · Report as offensive
Profile Radjin Project Donor
Joined: 2 May 00
Posts: 105
Credit: 14,928,529
RAC: 102
United States
Message 2025788 - Posted: 31 Dec 2019, 23:38:16 UTC - in response to Message 2025783.  

Here is the Boinc log.

===
Tue 31 Dec 2019 03:18:06 PM PST | | Starting BOINC client version 7.14.2 for x86_64-pc-linux-gnu
Tue 31 Dec 2019 03:18:06 PM PST | | log flags: file_xfer, sched_ops, task
Tue 31 Dec 2019 03:18:06 PM PST | | Libraries: libcurl/7.64.0 OpenSSL/1.1.1d zlib/1.2.11 libidn2/2.0.5 libpsl/0.20.2 (+libidn2/2.0.5) libssh2/1.8.0 nghttp2/1.36.0 librtmp/2.3
Tue 31 Dec 2019 03:18:06 PM PST | | Data directory: /var/lib/boinc-client
Tue 31 Dec 2019 03:18:07 PM PST | | CUDA: NVIDIA GPU 0: GeForce GTX 1650 (driver version unknown, CUDA version 10.2, compute capability 7.5, 3912MB, 3851MB available, 2984 GFLOPS peak)
Tue 31 Dec 2019 03:18:07 PM PST | | App version needs OpenCL but GPU doesn't support it
Tue 31 Dec 2019 03:18:07 PM PST | SETI@home | Application uses missing NVIDIA GPU
Tue 31 Dec 2019 03:18:07 PM PST | | App version needs OpenCL but GPU doesn't support it
Tue 31 Dec 2019 03:18:07 PM PST | SETI@home | Application uses missing NVIDIA GPU
Tue 31 Dec 2019 03:18:07 PM PST | | App version needs OpenCL but GPU doesn't support it
Tue 31 Dec 2019 03:18:07 PM PST | SETI@home | Application uses missing NVIDIA GPU
Tue 31 Dec 2019 03:18:07 PM PST | SETI@home | Missing coprocessor for task 26dc19aa.10543.2112.9.36.254.vlar_0
Tue 31 Dec 2019 03:18:07 PM PST | SETI@home | Missing coprocessor for task blc56_2bit_guppi_58692_58180_HIP21489_0021.22559.409.21.44.208.vlar_0
Tue 31 Dec 2019 03:18:07 PM PST | SETI@home | Missing coprocessor for task 13se08ab.7651.8661.14.41.174_2
Tue 31 Dec 2019 03:18:07 PM PST | SETI@home | Missing coprocessor for task 26dc19aa.5186.12746.6.33.76.vlar_1
Tue 31 Dec 2019 03:18:07 PM PST | SETI@home | Missing coprocessor for task blc56_2bit_guppi_58692_60738_HIP23083_0029.1830.409.21.44.18.vlar_1
Tue 31 Dec 2019 03:18:07 PM PST | SETI@home | Missing coprocessor for task 26dc19aa.5169.17245.5.32.51_1
Tue 31 Dec 2019 03:18:07 PM PST | SETI@home | Missing coprocessor for task 26dc19aa.5186.18063.6.33.3.vlar_1
Tue 31 Dec 2019 03:18:07 PM PST | SETI@home | Missing coprocessor for task 26dc19aa.5169.19699.5.32.49_0
Tue 31 Dec 2019 03:18:07 PM PST | SETI@home | Missing coprocessor for task blc56_2bit_guppi_58692_58497_HIP21594_0022.11036.409.21.44.128.vlar_1
Tue 31 Dec 2019 03:18:07 PM PST | SETI@home | Missing coprocessor for task 26dc19aa.17700.9474.14.41.81_1
Tue 31 Dec 2019 03:18:07 PM PST | SETI@home | Missing coprocessor for task blc56_2bit_guppi_58692_61070_HIP23512_0030.13547.818.21.44.14.vlar_1
Tue 31 Dec 2019 03:18:07 PM PST | SETI@home | Missing coprocessor for task 26dc19aa.3075.3748.16.43.253_1
===
Does this mean my system cannot process CUDA?
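
The two telltale signs in a log like the one above are `driver version unknown` on the GPU line and the absence of any OpenCL detection line. A sketch of a check, assuming the Event Log text has been saved to a file. `check_boinc_log` is a made-up helper, and the strings it looks for are modelled on this thread's log rather than taken from BOINC documentation.

```shell
#!/bin/sh
# check_boinc_log (hypothetical helper): reads a saved BOINC startup log from
# the file named in $1 and reports the two symptoms discussed in this thread.
check_boinc_log() {
    if grep -q 'driver version unknown' "$1"; then
        echo "driver version not detected"
    fi
    # Assumption: a detected OpenCL device is logged as "| OpenCL: ...",
    # mirroring the "| CUDA: ..." line in the log above.
    if ! grep -q '| OpenCL: ' "$1"; then
        echo "no OpenCL device detected"
    fi
}

# Usage:
#   check_boinc_log saved_event_log.txt
```

No output would mean neither symptom was found in the saved text.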
Radjin~
ID: 2025788 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 2025789 - Posted: 31 Dec 2019, 23:43:59 UTC - in response to Message 2025778.  

Or speed things up with this is cc_config.xml
<no_cuda>1</no_cuda>
I'm worried about that line. It doesn't appear as an option in the User Manual, and it isn't written out in the full template when you change a logging option.

Also, it should be written in the config report at startup - and it isn't there in the log Radjin has just posted.
ID: 2025789 · Report as offensive
Profile Radjin Project Donor
Joined: 2 May 00
Posts: 105
Credit: 14,928,529
RAC: 102
United States
Message 2025791 - Posted: 31 Dec 2019, 23:48:37 UTC - in response to Message 2025783.  

I wonder how he got and processed SoG and sah tasks already tho.

It was working until I did the update and added the second GPU. Now, no matter what I do, it does not load. First I tried the standard Debian nvidia-driver install; that locks up the system to the point where I have to do a hard restart and then use recovery mode to remove it. Then I tried installing nvidia-driver as described at https://linuxusers.net/debian/how_install_debian_10_buster_with_nvidia.php (step 5, using backports). That ran, but the drivers were not seen. Lastly I went back to the drivers directly from Nvidia as described above, and that’s where I am now. I have tried each step above with and without the second card installed.
Radjin~
ID: 2025791 · Report as offensive
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 2025799 - Posted: 1 Jan 2020, 0:05:10 UTC

You are going to need to figure out how to get the Nvidia OpenCL drivers installed somehow. I don't know what Debian has as an equivalent to the standard Ubuntu distro command:
sudo apt-get install ocl-icd-libopencl1
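
Before installing anything, it can help to see which of the relevant libraries the dynamic linker already knows about. A sketch, assuming the usual library names: `libcuda.so.1` and `libnvidia-opencl.so.1` from the NVIDIA driver, and `libOpenCL.so.1` from an ICD loader package such as the one named above. The helper `missing_gpu_libs` is made up for illustration.

```shell
#!/bin/sh
# missing_gpu_libs (hypothetical helper): reads `ldconfig -p` output on stdin
# and prints each expected library that is NOT listed.
missing_gpu_libs() {
    cache=$(cat)
    for lib in libcuda.so.1 libnvidia-opencl.so.1 libOpenCL.so.1; do
        # Match the name as a fixed string followed by a space.
        printf '%s\n' "$cache" | grep -qF "$lib " || echo "$lib"
    done
}

# Usage:
#   ldconfig -p | missing_gpu_libs
```

Empty output means all three names are in the linker cache; any name printed is a candidate for what BOINC is failing to find.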


I thought that Jord proved that <no_cuda>1</no_cuda> works over at GitHub. I see it referenced several times in conversation.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 2025799 · Report as offensive
 