Posts by Radjin

1) Message boards : Number crunching : Two Nvidia cards, one showing neither being used (Message 2026936)
Posted 9 Jan 2020 by Profile Radjin Project Donor
Post:
I didn’t notice it before, but looking at an earlier post, both cards were showing an IRQ of 16. If I remember my old BBS days, this causes both devices to fail. What do you guys think?
IRQ sharing has been possible for years, and PCIe doesn't actually use IRQs at all.


Well, burst my bubble. I thought I had it figured out. At least the 1650 appears to be working.
2) Message boards : Number crunching : Two Nvidia cards, one showing neither being used (Message 2026935)
Posted 9 Jan 2020 by Profile Radjin Project Donor
Post:
You still have some sort of big problem there as your error count is mounting fast.

Cheers.


It’s all the CUDA WUs that downloaded. I guess I can’t process them?
3) Message boards : Number crunching : Two Nvidia cards, one showing neither being used (Message 2026932)
Posted 9 Jan 2020 by Profile Radjin Project Donor
Post:
01:00.0 VGA compatible controller: NVIDIA Corporation TU107 (rev a1) (prog-if 00 [VGA controller])
Subsystem: ZOTAC International (MCO) Ltd. TU107
Flags: bus master, fast devsel, latency 0, IRQ 16
Memory at f5000000 (32-bit, non-prefetchable) [size=16M]
Memory at d0000000 (64-bit, prefetchable) [size=256M]
Memory at e0000000 (64-bit, prefetchable) [size=32M]
I/O ports at 4000 [size=128]
[virtual] Expansion ROM at 000c0000 [disabled] [size=128K]
Capabilities: <access denied>
Kernel driver in use: nvidia
Kernel modules: nouveau, nvidia_drm, nvidia

VGA compatible controller: NVIDIA Corporation TU107 (rev a1) (prog-if 00 [VGA controller])
Subsystem: ZOTAC International (MCO) Ltd. TU107
Flags: bus master, fast devsel, latency 0, IRQ 28
Memory at e3000000 (32-bit, non-prefetchable) [size=16M]
Memory at d0000000 (64-bit, prefetchable) [size=256M]
Memory at e0000000 (64-bit, prefetchable) [size=32M]
I/O ports at 3000 [size=128]
[virtual] Expansion ROM at 000c0000 [disabled] [size=128K]
Capabilities: <access denied>
Kernel driver in use: nvidia
Kernel modules: nouveau, nvidia_drm, nvidia

Here is the 1650 card info before (not working) and after (working). The only difference I noticed is the IRQ. I didn’t notice it before, but looking at an earlier post, both cards were showing an IRQ of 16. If I remember my old BBS days, this causes both devices to fail. What do you guys think?
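
For anyone comparing two listings like these, here is a minimal Python sketch that pairs the lines of two `lspci -v` blocks by position and prints the ones that differ. It assumes both blocks list the same fields in the same order; the helper names `block_lines` and `diff_blocks` are made up for illustration, and the sample text is a subset of the lines quoted above.

```python
def block_lines(text):
    """Return the non-empty, stripped lines of one `lspci -v` device block."""
    return [line.strip() for line in text.strip().splitlines() if line.strip()]


def diff_blocks(before, after):
    """Pair lines by position and return the (before, after) pairs that differ.

    Assumption: both blocks have the same fields in the same order.
    """
    pairs = zip(block_lines(before), block_lines(after))
    return [(b, a) for b, a in pairs if b != a]


# Subset of the two listings from the post above.
BEFORE = """
Flags: bus master, fast devsel, latency 0, IRQ 16
Memory at f5000000 (32-bit, non-prefetchable) [size=16M]
Memory at d0000000 (64-bit, prefetchable) [size=256M]
I/O ports at 4000 [size=128]
Kernel driver in use: nvidia
"""

AFTER = """
Flags: bus master, fast devsel, latency 0, IRQ 28
Memory at e3000000 (32-bit, non-prefetchable) [size=16M]
Memory at d0000000 (64-bit, prefetchable) [size=256M]
I/O ports at 3000 [size=128]
Kernel driver in use: nvidia
"""

for old, new in diff_blocks(BEFORE, AFTER):
    print(f"- {old}")
    print(f"+ {new}")
```

On the lines quoted above, this reports the first memory window and the I/O port base as changed in addition to the IRQ.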
4) Message boards : Number crunching : Two Nvidia cards, one showing neither being used (Message 2026931)
Posted 9 Jan 2020 by Profile Radjin Project Donor
Post:
nvidia-smi
Wed Jan  8 20:55:46 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.44       Driver Version: 440.44       CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 1650    Off  | 00000000:01:00.0 Off |                  N/A |
| 54%   55C    P0    46W /  75W |    276MiB /  3911MiB |     86%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1603      C   ..._x86_64-pc-linux-gnu__opencl_nvidia_SoG   265MiB |
+-----------------------------------------------------------------------------+
5) Message boards : Number crunching : Two Nvidia cards, one showing neither being used (Message 2026930)
Posted 9 Jan 2020 by Profile Radjin Project Donor
Post:
Yet the drivers will not run. There must be some missing dependency that keeps the drivers from activating.
Or you have a damaged videocard. You said all worked fine until you added the 710b, so what happens when you take that one out and then install the drivers?

If that works, try exchanging the cards, taking the GTX 1650 out and only putting the GT 710b in. Does that work with those drivers, or does it work when you install the drivers? If it doesn't, you found your culprit.

If the 710b works in the PCIe slot of the 1650, try either the 1650 or this 710b solely in the other PCIe slot that the 710b was in originally, to exclude that it's a damaged PCIe slot.


I did all of the above multiple times as I tried to install the drivers three different ways. However, I took your advice and did it again, except this time I completely purged anything to do with nvidia and opencl, removed the 710B card, and reinstalled using this page: https://www.kinetica.com/docs/install/nvidia_deb.html and it started working. I would like to add the 710 card back in, but I think I will wait until I have a few days together to troubleshoot.

Thanks for the info.
6) Message boards : Number crunching : Two Nvidia cards, one showing neither being used (Message 2026671)
Posted 7 Jan 2020 by Profile Radjin Project Donor
Post:
Whatever you’re comfortable with.

But Server is CLI only. No desktop environment.


Thanks. It sounds like everyone that knows the OS uses the desktop version. I’ll go with that.
7) Message boards : Number crunching : Two Nvidia cards, one showing neither being used (Message 2026664)
Posted 7 Jan 2020 by Profile Radjin Project Donor
Post:
I tried it once (very early version) and didn't really find any advantage on the small & simple set of partitions I needed. In general such things don't really come into play unless you have large disc arrays with multiple (dynamic) partitions which are not the usual case for the home user. Beware that if one gets things wrong it is possible not just to destroy the partition you were working on, but the whole array, and there is very little chance of rescuing it.


Installing Ubuntu on an old laptop to play with it. On my web server rig, do you recommend desktop or server?
8) Message boards : Number crunching : Two Nvidia cards, one showing neither being used (Message 2025989)
Posted 2 Jan 2020 by Profile Radjin Project Donor
Post:
If you are installing the drivers from the command line, you can watch the drivers being compiled into the kernel. There is nothing that needs to be done other than reboot the system to reload the new compiled kernel image with the drivers in it.


Yet the drivers will not run. There must be some missing dependency that keeps the drivers from activating. Is there a command to check the status of the drivers?

I ran the command after purging everything nvidia and I can see that only nouveau is shown where before it was: Kernel modules: nouveau, nvidia_drm, nvidia

lspci | grep ' VGA ' | cut -d" " -f 1 | xargs -i lspci -v -s {}
01:00.0 VGA compatible controller: NVIDIA Corporation TU107 (rev a1) (prog-if 00 [VGA controller])
Subsystem: ZOTAC International (MCO) Ltd. TU107
Flags: bus master, fast devsel, latency 0, IRQ 11
Memory at f5000000 (32-bit, non-prefetchable) [size=16M]
Memory at d0000000 (64-bit, prefetchable) [size=256M]
Memory at e0000000 (64-bit, prefetchable) [size=32M]
I/O ports at 4000 [size=128]
Expansion ROM at 000c0000 [disabled] [size=128K]
Capabilities: <access denied>
Kernel modules: nouveau

04:00.0 VGA compatible controller: NVIDIA Corporation GK208 [GeForce GT 710B] (rev a1) (prog-if 00 [VGA controller])
Subsystem: ZOTAC International (MCO) Ltd. GK208B [GeForce GT 710]
Flags: fast devsel, IRQ 11
Memory at f3000000 (32-bit, non-prefetchable) [disabled] [size=16M]
Memory at e8000000 (64-bit, prefetchable) [disabled] [size=128M]
Memory at f0000000 (64-bit, prefetchable) [disabled] [size=32M]
I/O ports at 2000 [disabled] [size=128]
Expansion ROM at f4000000 [disabled] [size=512K]
Capabilities: <access denied>
Kernel modules: nouveau
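
On the question of checking driver status: in `lspci -v` (or `lspci -k`) output, `Kernel modules:` lists modules that could handle the device, while `Kernel driver in use:` appears only when a driver is actually bound. A minimal Python sketch that reads such output and reports the bound driver per device follows. The function name `driver_status` is hypothetical, and the sample text is trimmed from the listings in this thread.

```python
import re

# A device line starts with a PCI address such as "01:00.0".
DEVICE_RE = re.compile(r"^([0-9a-f]{2}:[0-9a-f]{2}\.[0-9a-f])\s")


def driver_status(lspci_text):
    """Map each PCI address to its bound kernel driver, or None if none is bound."""
    status = {}
    current = None
    for raw in lspci_text.splitlines():
        line = raw.strip()
        match = DEVICE_RE.match(line)
        if match:
            current = match.group(1)
            status[current] = None
        elif current and line.startswith("Kernel driver in use:"):
            status[current] = line.split(":", 1)[1].strip()
    return status


# Trimmed from the output above: no driver bound to either card.
AFTER_PURGE = """
01:00.0 VGA compatible controller: NVIDIA Corporation TU107 (rev a1)
    Kernel modules: nouveau
04:00.0 VGA compatible controller: NVIDIA Corporation GK208 [GeForce GT 710B] (rev a1)
    Kernel modules: nouveau
"""

# Trimmed from the later working listing: nvidia is bound.
WORKING = """
01:00.0 VGA compatible controller: NVIDIA Corporation TU107 (rev a1)
    Kernel driver in use: nvidia
    Kernel modules: nouveau, nvidia_drm, nvidia
"""

print(driver_status(AFTER_PURGE))
print(driver_status(WORKING))
```

A device that maps to `None` has modules available but nothing loaded for it, which matches the "installs without errors but does not activate" symptom.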
9) Message boards : Number crunching : Two Nvidia cards, one showing neither being used (Message 2025987)
Posted 2 Jan 2020 by Profile Radjin Project Donor
Post:
I tried it once (very early version) and didn't really find any advantage on the small & simple set of partitions I needed. In general such things don't really come into play unless you have large disc arrays with multiple (dynamic) partitions which are not the usual case for the home user. Beware that if one gets things wrong it is possible not just to destroy the partition you were working on, but the whole array, and there is very little chance of rescuing it.


Thanks for the replies on this. I have talked to two others and only one uses LVM.
10) Message boards : Number crunching : Two Nvidia cards, one showing neither being used (Message 2025985)
Posted 2 Jan 2020 by Profile Radjin Project Donor
Post:
Using the three different methods to install the drivers I get one of two responses.

With the standard repository install:
apt-get install nvidia-driver
Once I reboot, the system stalls and requires a hard reboot into recovery to purge/autoremove anything Nvidia. Once rebooted, everything is back to normal.

Using https://linuxusers.net/debian/how_install_debian_10_buster_with_nvidia.php (step 5, using backports), or installing the latest Nvidia drivers from their site, the install shows no errors but does not activate the drivers.

There must be a step I am missing. Is there a command I need to run after installing the drivers?
11) Message boards : Number crunching : Two Nvidia cards, one showing neither being used (Message 2025919)
Posted 1 Jan 2020 by Profile Radjin Project Donor
Post:
Looking at the Ubuntu options. Does anyone use encryption or LVM?
12) Message boards : Number crunching : Two Nvidia cards, one showing neither being used (Message 2025911)
Posted 1 Jan 2020 by Profile Radjin Project Donor
Post:
You have rebooted the machine to use the Nvidia drivers . . . . haven't you??

If you don't reboot the machine after installing the Nvidia drivers you never load the new recompiled kernel that contains the Nvidia drivers.


Yes, although overkill, I reboot after any install/upgrade.
13) Message boards : Number crunching : Two Nvidia cards, one showing neither being used (Message 2025909)
Posted 1 Jan 2020 by Profile Radjin Project Donor
Post:
Is there a particular reason you need to use Debian 10 instead of something else? I think you’ll have a much easier time with something more conventional like Ubuntu.

But if you’re insistent to stick with Debian, I would try re-installing the Nvidia drivers from the .run or .deb files like you did before.


If I had to do it over I would install a fresh Ubuntu, as there seems to be a lot more support. But moving now would mean moving my webserver and its database to backup, then back onto the server once installed. At some point in the future I will probably do that. I have a friend who helps now and then, and I may do the above if he has time in the near future. I am not experienced enough to go much further than installs and such using the standard apt. When it gets into compiling or changing configuration files I start getting in over my head.
14) Message boards : Number crunching : Two Nvidia cards, one showing neither being used (Message 2025907)
Posted 1 Jan 2020 by Profile Radjin Project Donor
Post:
I am not sure where to go on this. I installed the
 ocl-icd-libopencl1
with no apparent errors but still do not have it right. I have tried the three options above with the same results.

Any other suggestions?

HAPPY NEW YEAR!
15) Message boards : Number crunching : Two Nvidia cards, one showing neither being used (Message 2025791)
Posted 31 Dec 2019 by Profile Radjin Project Donor
Post:
I wonder how he got and processed SoG and sah tasks already tho.

It was working until I did the update and added the second GPU. Now no matter what I do it does not load. I tried the standard Debian Nvidia-Driver install and that locks the system to where I have to do a hard restart then use recovery to remove it. Then I tried an install using Nvidia-driver install, as described here: https://linuxusers.net/debian/how_install_debian_10_buster_with_nvidia.php on step 5 using back ports. That ran but was not seeing the drivers. Lastly I went back to the drivers directly from Nvidia as described above and that’s where I am now. I have tried each step above with and without the second card installed.
16) Message boards : Number crunching : Two Nvidia cards, one showing neither being used (Message 2025788)
Posted 31 Dec 2019 by Profile Radjin Project Donor
Post:
Here is the Boinc log.

===
Tue 31 Dec 2019 03:18:06 PM PST | | Starting BOINC client version 7.14.2 for x86_64-pc-linux-gnu
Tue 31 Dec 2019 03:18:06 PM PST | | log flags: file_xfer, sched_ops, task
Tue 31 Dec 2019 03:18:06 PM PST | | Libraries: libcurl/7.64.0 OpenSSL/1.1.1d zlib/1.2.11 libidn2/2.0.5 libpsl/0.20.2 (+libidn2/2.0.5) libssh2/1.8.0 nghttp2/1.36.0 librtmp/2.3
Tue 31 Dec 2019 03:18:06 PM PST | | Data directory: /var/lib/boinc-client
Tue 31 Dec 2019 03:18:07 PM PST | | CUDA: NVIDIA GPU 0: GeForce GTX 1650 (driver version unknown, CUDA version 10.2, compute capability 7.5, 3912MB, 3851MB available, 2984 GFLOPS peak)
Tue 31 Dec 2019 03:18:07 PM PST | | App version needs OpenCL but GPU doesn't support it
Tue 31 Dec 2019 03:18:07 PM PST | SETI@home | Application uses missing NVIDIA GPU
Tue 31 Dec 2019 03:18:07 PM PST | | App version needs OpenCL but GPU doesn't support it
Tue 31 Dec 2019 03:18:07 PM PST | SETI@home | Application uses missing NVIDIA GPU
Tue 31 Dec 2019 03:18:07 PM PST | | App version needs OpenCL but GPU doesn't support it
Tue 31 Dec 2019 03:18:07 PM PST | SETI@home | Application uses missing NVIDIA GPU
Tue 31 Dec 2019 03:18:07 PM PST | SETI@home | Missing coprocessor for task 26dc19aa.10543.2112.9.36.254.vlar_0
Tue 31 Dec 2019 03:18:07 PM PST | SETI@home | Missing coprocessor for task blc56_2bit_guppi_58692_58180_HIP21489_0021.22559.409.21.44.208.vlar_0
Tue 31 Dec 2019 03:18:07 PM PST | SETI@home | Missing coprocessor for task 13se08ab.7651.8661.14.41.174_2
Tue 31 Dec 2019 03:18:07 PM PST | SETI@home | Missing coprocessor for task 26dc19aa.5186.12746.6.33.76.vlar_1
Tue 31 Dec 2019 03:18:07 PM PST | SETI@home | Missing coprocessor for task blc56_2bit_guppi_58692_60738_HIP23083_0029.1830.409.21.44.18.vlar_1
Tue 31 Dec 2019 03:18:07 PM PST | SETI@home | Missing coprocessor for task 26dc19aa.5169.17245.5.32.51_1
Tue 31 Dec 2019 03:18:07 PM PST | SETI@home | Missing coprocessor for task 26dc19aa.5186.18063.6.33.3.vlar_1
Tue 31 Dec 2019 03:18:07 PM PST | SETI@home | Missing coprocessor for task 26dc19aa.5169.19699.5.32.49_0
Tue 31 Dec 2019 03:18:07 PM PST | SETI@home | Missing coprocessor for task blc56_2bit_guppi_58692_58497_HIP21594_0022.11036.409.21.44.128.vlar_1
Tue 31 Dec 2019 03:18:07 PM PST | SETI@home | Missing coprocessor for task 26dc19aa.17700.9474.14.41.81_1
Tue 31 Dec 2019 03:18:07 PM PST | SETI@home | Missing coprocessor for task blc56_2bit_guppi_58692_61070_HIP23512_0030.13547.818.21.44.14.vlar_1
Tue 31 Dec 2019 03:18:07 PM PST | SETI@home | Missing coprocessor for task 26dc19aa.3075.3748.16.43.253_1
===
My system does not process CUDA?
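
The log itself reports the GTX 1650 under CUDA, while the complaints are about OpenCL. A rough Python sketch for tallying those messages in a BOINC event log is below, assuming the `timestamp | project | message` layout shown above. The function name `summarize_boinc_log` and the counter keys are invented for illustration.

```python
from collections import Counter


def summarize_boinc_log(log_text):
    """Count GPU-related messages in a BOINC event log.

    Assumes each message line has the form: timestamp | project | message.
    Lines without two '|' separators (e.g. wrapped continuations) are skipped.
    """
    counts = Counter()
    for line in log_text.splitlines():
        parts = [part.strip() for part in line.split("|", 2)]
        if len(parts) != 3:
            continue
        message = parts[2]
        if message.startswith("CUDA:"):
            counts["cuda_gpu_detected"] += 1
        elif "needs OpenCL but GPU doesn't support it" in message:
            counts["opencl_unsupported"] += 1
        elif message.startswith("Missing coprocessor for task"):
            counts["missing_coprocessor"] += 1
    return counts


# A few lines taken from the log above (the CUDA line is shortened).
SAMPLE_LOG = """\
Tue 31 Dec 2019 03:18:07 PM PST | | CUDA: NVIDIA GPU 0: GeForce GTX 1650 (driver version unknown)
Tue 31 Dec 2019 03:18:07 PM PST | | App version needs OpenCL but GPU doesn't support it
Tue 31 Dec 2019 03:18:07 PM PST | SETI@home | Application uses missing NVIDIA GPU
Tue 31 Dec 2019 03:18:07 PM PST | SETI@home | Missing coprocessor for task 13se08ab.7651.8661.14.41.174_2
Tue 31 Dec 2019 03:18:07 PM PST | SETI@home | Missing coprocessor for task 26dc19aa.5169.17245.5.32.51_1
"""

print(dict(summarize_boinc_log(SAMPLE_LOG)))
```

A nonzero CUDA count together with OpenCL complaints suggests the card is visible to the client but the OpenCL side is not, which is worth checking before assuming CUDA itself is unusable.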
17) Message boards : Number crunching : Two Nvidia cards, one showing neither being used (Message 2025766)
Posted 31 Dec 2019 by Profile Radjin Project Donor
Post:
Thanks for that info. At least I know what to look for.
18) Message boards : Number crunching : Two Nvidia cards, one showing neither being used (Message 2025756)
Posted 31 Dec 2019 by Profile Radjin Project Donor
Post:
There's no drivers being listed for that rig so I'd look further into that. ;-)

Cheers.


I saw drivers being used in the output from the first post. That does not mean they are installed?

Same question for when it shows up under Computers on the BOINC site?

In the first post I described running the installer from nvidia; I didn’t get any errors. Is there another step I missed?
19) Message boards : Number crunching : Two Nvidia cards, one showing neither being used (Message 2025742)
Posted 31 Dec 2019 by Profile Radjin Project Donor
Post:
First 30 lines of the start up log from BOINC Manager.

Did you install a cc_config.xml to use all gpus? Otherwise only the most capable GPU will be used.



What is the name of the start up log? Client state? I didn’t see a file named start up in the Boinc-client directory.
20) Message boards : Number crunching : Two Nvidia cards, one showing neither being used (Message 2025741)
Posted 31 Dec 2019 by Profile Radjin Project Donor
Post:
<cc_config>
  <log_flags>
    <task>1</task>
    <file_xfer>1</file_xfer>
    <sched_ops>1</sched_ops>
  </log_flags>
  <options>
    <use_all_gpus>1</use_all_gpus>
  </options>
</cc_config>


Might be a better option



Updated and rebooted again. Thanks for that.
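
To double-check that an edited cc_config.xml still parses and really enables the option, a small Python sketch using the standard library XML parser is below. The function name `uses_all_gpus` is made up for illustration, and the sample string mirrors the config quoted above.

```python
import xml.etree.ElementTree as ET


def uses_all_gpus(cc_config_xml):
    """Return True if <options><use_all_gpus> is set to 1 in the given XML text.

    Raises xml.etree.ElementTree.ParseError if the text is not well-formed.
    """
    root = ET.fromstring(cc_config_xml)
    value = root.findtext("options/use_all_gpus")
    return value is not None and value.strip() == "1"


SAMPLE_CONFIG = """
<cc_config>
  <log_flags>
    <task>1</task>
    <file_xfer>1</file_xfer>
    <sched_ops>1</sched_ops>
  </log_flags>
  <options>
    <use_all_gpus>1</use_all_gpus>
  </options>
</cc_config>
"""

print(uses_all_gpus(SAMPLE_CONFIG))
```

To run it against the real file, read the file's text and pass it in; the location is an assumption here, though the log earlier in the thread names /var/lib/boinc-client as the data directory.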

