BOINC does not identify GPUs correctly

Message boards : Number crunching : BOINC does not identify GPUs correctly

Harri Liljeroos
Joined: 29 May 99
Posts: 4071
Credit: 85,281,665
RAC: 126
Finland
Message 1816178 - Posted: 10 Sep 2016, 18:26:38 UTC
Last modified: 10 Sep 2016, 18:30:20 UTC

BOINC identifies GPU0 and GPU1 incorrectly.

Here is what I see: one of my hosts used to have two Nvidia GPUs, a GTX970 (GPU0) and a GTX650 Ti (GPU1). BOINC, GPU-Z and Nvidia Inspector agreed on which one was GPU0 and which was GPU1. Then this week I bought a new GTX970, identical to the old one (Asus Strix-GTX970-DC2OC-4GD5), and replaced the GTX650 Ti with it. Now BOINC disagrees with GPU-Z and Nvidia Inspector about which one is GPU0 and which is GPU1.

Below is a link to a screenshot taken while one GPU is starting a new task. BOINC shows that GPU1 (d1) is starting the new task, but Nvidia Inspector shows that it is GPU0 whose load is at 0%.

Does anybody else see a similar situation on a host that has two identical GPUs?

https://1drv.ms/i/s!AsrPYtj_-FTpgZRD_5gO3nNmY_e3dQ
ID: 1816178
jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1816229 - Posted: 10 Sep 2016, 22:25:44 UTC - in response to Message 1816178.  

Hi Harri,

The general gist is that with Nvidia CUDA/OpenCL devices, the only guaranteed method for consistent enumeration is by PCI bus + slot number; the order within CUDA or OpenCL may change depending on the driver. With the newest drivers on Windows, nvidia-smi is available, and I would check the order in there. BOINC might be using either the old nvapi interface (which used to be the only one on Windows) or the new nvml interface (which is what nvidia-smi is based on). Mostly, the CUDA device number really means nothing, and anything less than the full PCIe location is a fragile shortcut.
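
For anyone who wants to check the order by PCI location, here is a minimal Python sketch. It assumes nvidia-smi is on the PATH and supports the `--query-gpu` option; the helper names `parse_gpu_list` and `query_gpus` and the sample bus IDs are made up for illustration, and the exact bus ID format may differ between driver versions.

```python
import subprocess

# nvidia-smi query for index, PCI bus ID and name, one GPU per CSV line.
QUERY = ["nvidia-smi", "--query-gpu=index,pci.bus_id,name", "--format=csv,noheader"]


def parse_gpu_list(csv_text):
    """Parse 'index, pci.bus_id, name' lines into dicts, sorted by PCI bus ID."""
    gpus = []
    for line in csv_text.strip().splitlines():
        if not line.strip():
            continue
        index, bus_id, name = [field.strip() for field in line.split(",", 2)]
        gpus.append({"index": int(index), "bus_id": bus_id, "name": name})
    # Bus IDs are fixed-width hex strings, so a plain string sort is assumed to be enough.
    return sorted(gpus, key=lambda g: g["bus_id"])


def query_gpus():
    """Run nvidia-smi and return the GPUs ordered by PCI bus ID (not called here)."""
    out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
    return parse_gpu_list(out)


# Hypothetical sample output for two identical cards (bus IDs are invented).
SAMPLE = """\
0, 00000000:01:00.0, GeForce GTX 970
1, 00000000:02:00.0, GeForce GTX 970
"""

for gpu in parse_gpu_list(SAMPLE):
    print(gpu["bus_id"], "-> nvidia-smi index", gpu["index"], gpu["name"])
```

On a real host, calling `query_gpus()` instead of parsing `SAMPLE` gives a list keyed by PCI location, which can then be compared against whatever device number BOINC, GPU-Z or Nvidia Inspector reports for the same card.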
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 1816229
Harri Liljeroos
Joined: 29 May 99
Posts: 4071
Credit: 85,281,665
RAC: 126
Finland
Message 1816339 - Posted: 11 Sep 2016, 10:09:31 UTC - in response to Message 1816229.  

I am not using the latest driver but version 361.91 (the same as when the GTX650 Ti was in use). I did reinstall it after changing the GPU, though. Anyway, when you have to debug a system, it is good to know that what BOINC shows you may not be the whole truth.
ID: 1816339
Rune Bjørge
Joined: 5 Feb 00
Posts: 45
Credit: 30,508,204
RAC: 5
Norway
Message 1816480 - Posted: 11 Sep 2016, 18:58:29 UTC - in response to Message 1816339.  

I see the same behaviour on my host, which runs 3 GPUs. All of them are Titans, and the device ID does not match GPU-Z.

I have one EVGA Titan with a default clock a little higher than the two other cards. To me it seems like the EVGA always gets device 0 when I start the system. The two other cards seem to be assigned at random when BOINC starts.
ID: 1816480
Darth Beaver
Crowdfunding Project Donor
Special Project $75 donor
Special Project $250 donor
Joined: 20 Aug 99
Posts: 6728
Credit: 21,443,075
RAC: 3
Australia
Message 1816596 - Posted: 12 Sep 2016, 5:13:31 UTC - in response to Message 1816480.  
Last modified: 12 Sep 2016, 5:13:54 UTC

Mine does a similar thing. I think you will find that the fastest GPU always gets slot zero.

I had a 650 and a 680, and they were in the wrong slots:

650 in slot 0
680 in slot 1

but SETI said the 680 was slot 0.

I then upgraded to a 970, put that in slot 1, and moved the 680 to slot 0.

SETI said slot 0 was the 970 even though it was in slot 1.

I then swapped the slots, and SETI still said slot 0 was the 970, which it now was.

So I think whatever the client thinks is the fastest gets slot 0, whether it's in slot 0 or slot 1, and it doesn't seem to matter whether it's the latest card or one that's a bit older.
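
The "fastest card gets device 0" pattern can be sketched as two ordering policies applied to the same cards. This is only an illustration: the `Gpu` class, the clock values and the bus IDs are invented, and clock speed is a stand-in for whatever heuristic the driver or client really uses. For reference, the CUDA documentation describes a `CUDA_DEVICE_ORDER` environment variable whose default, `FASTEST_FIRST`, puts the device it guesses to be fastest at index 0 and leaves the order of the rest unspecified, while `PCI_BUS_ID` orders devices by bus ID. Whether the BOINC client or a given science app follows that setting is an assumption not confirmed in this thread.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gpu:
    name: str
    bus_id: str     # PCI location (invented values below)
    clock_mhz: int  # stand-in "speed" metric, not the real heuristic


def order_by_pci_bus(gpus):
    """Device 0 is the card with the lowest PCI bus ID."""
    return sorted(gpus, key=lambda g: g.bus_id)


def order_fastest_first(gpus):
    """Device 0 is the fastest card; the order of the rest should not be relied on."""
    fastest = max(gpus, key=lambda g: g.clock_mhz)
    rest = [g for g in gpus if g is not fastest]
    return [fastest] + rest


# Hypothetical host: the older card sits in the first slot, the newer one is slightly faster.
cards = [
    Gpu("older GTX970", "01:00.0", 1200),
    Gpu("newer GTX970", "02:00.0", 1250),
]

for label, ordering in (("PCI bus order", order_by_pci_bus(cards)),
                        ("fastest first", order_fastest_first(cards))):
    print(label + ":", ", ".join(f"GPU{i}={g.name}" for i, g in enumerate(ordering)))
```

With these made-up numbers the two policies disagree about which card is GPU0, which matches the kind of mismatch described above between a tool that numbers by slot and one that numbers by speed.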
ID: 1816596
Harri Liljeroos
Joined: 29 May 99
Posts: 4071
Credit: 85,281,665
RAC: 126
Finland
Message 1816604 - Posted: 12 Sep 2016, 6:25:52 UTC

I have a similar situation. My GPUs are factory overclocked. The newer GTX970 has a slightly higher clock than the older one, and BOINC sees the faster one as GPU0.
ID: 1816604
Darth Beaver
Crowdfunding Project Donor
Special Project $75 donor
Special Project $250 donor
Joined: 20 Aug 99
Posts: 6728
Credit: 21,443,075
RAC: 3
Australia
Message 1816605 - Posted: 12 Sep 2016, 6:32:50 UTC - in response to Message 1816604.  

I wouldn't worry about it; if it does bother you, just switch the cards over to the right slots.

I haven't had any errors, so I don't worry.
ID: 1816605
Harri Liljeroos
Joined: 29 May 99
Posts: 4071
Credit: 85,281,665
RAC: 126
Finland
Message 1816606 - Posted: 12 Sep 2016, 6:36:04 UTC - in response to Message 1816605.  

I won't worry. I just have to remember it for future reference. :)
ID: 1816606
