NVIDIA GPU blues (750Ti and 250)

Message boards : Number crunching : NVIDIA GPU blues (750Ti and 250)

ralphw
Volunteer tester

Joined: 7 May 99
Posts: 78
Credit: 18,032,718
RAC: 38
United States
Message 1735911 - Posted: 21 Oct 2015, 11:29:17 UTC
Last modified: 21 Oct 2015, 12:12:49 UTC

I finally upgraded my video cards, going from a single NVIDIA GeForce GTS 250 to two faster cards (GeForce GTX 750 Ti).

First problem: Ubuntu 12.04 hits a kernel panic when both cards are installed.

I temporarily addressed this by removing the second (identical) card.

lspci | grep VGA shows this hardware:

02:00.0 VGA compatible controller: NVIDIA Corporation Device 1380 (rev a2)


Second problem: No GPU workloads are running. Here's the BOINC/SETI log:

Mon 19 Oct 2015 09:22:07 PM EDT | | Starting BOINC client version 7.2.33 for x86_64-pc-linux-gnu
Mon 19 Oct 2015 09:22:07 PM EDT | | log flags: file_xfer, sched_ops, task
Mon 19 Oct 2015 09:22:07 PM EDT | | Libraries: libcurl/7.22.0 OpenSSL/1.0.1 zlib/1.2.3.4 libidn/1.23 librtmp/2.3
Mon 19 Oct 2015 09:22:07 PM EDT | | Data directory: /var/lib/boinc-client
Mon 19 Oct 2015 09:22:07 PM EDT | | CUDA: NVIDIA GPU 0: GeForce GTX 750 Ti (driver version unknown, CUDA version 6.5, compute capability 5.0, 2047MB, 1871MB available, 2409 GFLOPS peak)
Mon 19 Oct 2015 09:22:07 PM EDT | | App version needs OpenCL but GPU doesn't support it
Mon 19 Oct 2015 09:22:07 PM EDT | Milkyway@Home | Application uses missing NVIDIA GPU
Mon 19 Oct 2015 09:22:07 PM EDT | | App version needs OpenCL but GPU doesn't support it
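
A quick way to read a startup log like the one above is to check whether BOINC reported a CUDA device, an OpenCL device, or both. The following is only a sketch: the function name boinc_gpu_status is made up for illustration, and the patterns assume the message formats quoted in this thread.

```shell
#!/bin/sh
# Sketch: summarise GPU detection from BOINC startup messages on stdin.
# boinc_gpu_status is a hypothetical helper, not part of BOINC.
boinc_gpu_status() {
    awk '
        # e.g. "CUDA: NVIDIA GPU 0: GeForce GTX 750 Ti (...)"
        /CUDA: NVIDIA GPU/        { cuda = 1 }
        # e.g. "OpenCL: GPU 0: GeForce GTX 950 (...)"
        /OpenCL: .*GPU [0-9]+:/   { opencl = 1 }
        END {
            printf "cuda=%s opencl=%s\n", (cuda ? "yes" : "no"), (opencl ? "yes" : "no")
        }
    '
}
```

For the log above this would report cuda=yes opencl=no, which matches the "App version needs OpenCL" complaints. The input can be pasted log text or, assuming your BOINC install writes one, a log file in the data directory.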

Suggestions are welcome. I've seen other suggestions that say my updated drivers (340.
ID: 1735911 · Report as offensive
TBar
Volunteer tester

Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1735914 - Posted: 21 Oct 2015, 11:44:15 UTC - in response to Message 1735911.  

The GTS 250 uses a different driver than the GTX 750, so they won't work together; I've tried it myself.
Which driver are you using, the one from Additional Drivers or a manually installed one? Usually you can fix the OpenCL problem by placing a link to libOpenCL.so in /usr/lib; the details depend on which driver you installed.
ID: 1735914 · Report as offensive
ralphw
Volunteer tester

Joined: 7 May 99
Posts: 78
Credit: 18,032,718
RAC: 38
United States
Message 1735915 - Posted: 21 Oct 2015, 11:56:02 UTC - in response to Message 1735914.  

Yes, I grabbed the 340.93 driver from the NVIDIA site (64-bit Linux).

I recently read something about drivers up to 341.<something> not working.

So I grabbed a 352.55 driver to try next, and I will try the link to libOpenCL.so.

The GTS 250 had SLI support, and the 750 has something new called G-SYNC.
Hopefully, once I resolve the kernel panic issue, I can get the second card going.
ID: 1735915 · Report as offensive
TBar
Volunteer tester

Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1735917 - Posted: 21 Oct 2015, 12:12:29 UTC - in response to Message 1735915.  

I've had good success with this one, http://www.nvidia.com/download/driverResults.aspx/83686/en-us with Ubuntu 14.04. I'm not sure how it will work with the stock setiathome MBv7 App, but it works great on the APs and CUDA 6.0 App.
After dropping into the console and stopping lightdm, you might want to run
sudo apt-get remove --purge 'nvidia-*'
or something similar to remove the existing driver before installing the new one.
With that driver on my machine OpenCL worked, but I had to make links to libcuda.so in /usr/lib to get CUDA to work.
ID: 1735917 · Report as offensive
rob smith (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer moderator
Volunteer tester

Joined: 7 Mar 03
Posts: 22189
Credit: 416,307,556
RAC: 380
United Kingdom
Message 1735918 - Posted: 21 Oct 2015, 12:16:11 UTC

SLI is a way of linking graphics cards, and G-SYNC synchronises the monitor's refresh rate with the card's output. Both are useful for graphics work, but totally unnecessary for SETI@Home.
As TBar says, it is almost certain that you will never get the '250 and the '750s to work in the same machine, because they are so different. Indeed, I would say the only use for the '250 is heating the room: it is very power hungry compared to the much faster '750. Experience says that a '250 is, in reality, about 10% as fast as a '750.
To get the '750 working you will need to have OpenCL running. TBar has already described how to set a link to the appropriate library.

(Note this is OpenCL, not OpenGL; they are very different beasts.)
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
ID: 1735918 · Report as offensive
ralphw
Volunteer tester

Joined: 7 May 99
Posts: 78
Credit: 18,032,718
RAC: 38
United States
Message 1735936 - Posted: 21 Oct 2015, 13:11:35 UTC - in response to Message 1735915.  

Thanks for the tips.

I've created the CUDA link from /usr/lib/libcuda.so to the appropriate spot.

/usr/lib/libOpenCL.so is another matter. It runs through /etc/alternatives but ends up being a symlink to nothing (there seems to be no 64-bit OpenCL shared object on my system).

The "Additional Drivers" window in Linux still shows nvidia_340 after I've upgraded, so I'll try purging and then (re)installing the latest NVIDIA driver.

Just want to confirm: OpenCL for Astropulse is what I need?
ID: 1735936 · Report as offensive
TBar
Volunteer tester

Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1735940 - Posted: 21 Oct 2015, 13:31:25 UTC - in response to Message 1735936.  
Last modified: 21 Oct 2015, 14:08:19 UTC

Yes, Astropulse and the new SETI@home MBv7 app both use OpenCL. You have to install the Linux CUDA app manually.

Additional Drivers just shows which drivers are available in the repository for your installed version of Linux, and that differs for each OS version. It will always show the same drivers for your OS no matter which driver you have installed. If you have manually installed a driver from nVidia, Additional Drivers should say something similar to 'continue using manually installed driver'. Which driver did you install? Oops, I forgot clinfo is an AMD thing. Forget about clinfo with nvidia; you'll just have to go with what BOINC says.

On my machine I have the manually installed driver 346.59, libnvidia-opencl.so.346.59 is in /usr/lib/x86_64-linux-gnu, and I didn't have to make any links to get OpenCL to work. It's sometimes different depending on your OS and driver. If you have libnvidia-opencl.so.xxx and BOINC still doesn't see OpenCL, try making links to it in /usr/lib named libOpenCL.so and libOpenCL.so.1, and see how that works.
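
The linking step can be sketched roughly as below. This is an illustration, not a tested recipe: link_opencl is a made-up helper name, and both directories are passed as arguments so you can try it on a scratch directory before pointing it at /usr/lib/x86_64-linux-gnu and /usr/lib (which would need root). It only fills in links that are missing or dangling and leaves working ones alone.

```shell
#!/bin/sh
# Sketch: point libOpenCL.so and libOpenCL.so.1 at the driver's OpenCL library.
# link_opencl is a hypothetical helper; $1 is where libnvidia-opencl.so.xxx
# lives, $2 is where the links should be created.
link_opencl() {
    src_dir=$1
    dest_dir=$2
    # Pick the first versioned library, e.g. libnvidia-opencl.so.346.59
    target=$(ls "$src_dir"/libnvidia-opencl.so.* 2>/dev/null | head -n 1)
    if [ -z "$target" ]; then
        echo "no libnvidia-opencl.so.* found in $src_dir" >&2
        return 1
    fi
    for name in libOpenCL.so libOpenCL.so.1; do
        if [ -e "$dest_dir/$name" ]; then
            # The name already resolves to a real file: leave it alone
            echo "keeping existing $dest_dir/$name" >&2
        else
            # Missing or dangling: (re)create the symlink
            ln -sf "$target" "$dest_dir/$name" || return 1
        fi
    done
    echo "$target"
}
```

Restart BOINC afterwards and check its startup messages to see whether OpenCL is now detected.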
ID: 1735940 · Report as offensive
Zombu2
Volunteer tester

Joined: 24 Feb 01
Posts: 1615
Credit: 49,315,423
RAC: 0
United States
Message 1736136 - Posted: 22 Oct 2015, 5:21:23 UTC

The 250 is a waste of power anyway; grab another 750 Ti. That gets you around 22k RAC, a little more if you overclock the cards.
I came down with a bad case of i don't give a crap
ID: 1736136 · Report as offensive
ralphw
Volunteer tester

Joined: 7 May 99
Posts: 78
Credit: 18,032,718
RAC: 38
United States
Message 1736279 - Posted: 22 Oct 2015, 21:49:45 UTC - in response to Message 1735940.  
Last modified: 22 Oct 2015, 21:51:57 UTC

Despite making the links, something still wasn't right with OpenCL.

So I resolved the problem by purging and re-installing the NVIDIA package:


  • downloading the driver file from the NVIDIA web site
  • shutting down to single-user mode
  • purging the NVIDIA packages
  • installing the driver




Thanks again for the advice. Now on to fix the kernel PANIC issue I have with two 750 Ti cards.


ID: 1736279 · Report as offensive
TBar
Volunteer tester

Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1736307 - Posted: 23 Oct 2015, 1:03:41 UTC - in response to Message 1736279.  
Last modified: 23 Oct 2015, 1:30:57 UTC

It's good to hear you got 352 working. The fallback that seems to work for everyone is to install the nVidia CUDA Toolkit 7.5, which also installs driver 352.

I just installed a fresh copy of Ubuntu 15.10, and it also has driver 352 in Additional Drivers; I believe Ubuntu 14.04 and 15.04 have driver 346. I gave up on the 352 from Additional Drivers after failing to get BOINC to see OpenCL. In fact, I had to give up on every BOINC version before 7.4.22 as well. The only copy that works with Ubuntu 15.10 is BOINC 7.4.22, and it has a bug where the last task started doesn't update its progress. The errors looked like this:
tbar@TBarsIntel:~/BOINC$ ./boincmgr
Fatal Error: Mismatch between the program and library build versions detected.
The library used 2.8 (no debug,Unicode,compiler with C++ ABI 1009,wx containers,compatible with 2.6),
and your program used 2.8 (no debug,Unicode,compiler with C++ ABI 1002,wx containers,compatible with 2.6).
Aborted (core dumped)
tbar@TBarsIntel:~/BOINC$ cd '/home/tbar'
tbar@TBarsIntel:~$ ./boinc_7.4.22_x86_64-pc-linux-gnu.sh
use /home/tbar/BOINC/run_manager to start BOINC
tbar@TBarsIntel:~$ cd '/home/tbar/BOINC'
tbar@TBarsIntel:~/BOINC$ ./boincmgr
./boincmgr: error while loading shared libraries: libwebkitgtk-1.0.so.0: cannot open shared object file: No such file or directory

I pasted libwebkitgtk into the Package Manager's Filter box, installed libwebkitgtk-1.0-0, and then BOINC worked....mostly.

The first thing I saw was NO USABLE GPU FOUND. Great. I installed nvidia-modprobe and that gave me CUDA, but nothing would get BOINC to see OpenCL using driver 352.41 from the repository. So I installed driver 346.59, which I had downloaded from nVidia. That gave me OpenCL but not CUDA. I installed nvidia-modprobe again, since it went away with the purge, but still no CUDA. So I made a link to libcuda.so.346.59, moved it to /usr/lib, named it libcuda.so, and that gave me CUDA.
Success!

So far, it looks to be working about the same as Ubuntu 14.04.3, except the last task started doesn't update...oh well.
ID: 1736307 · Report as offensive
TBar
Volunteer tester

Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1736379 - Posted: 23 Oct 2015, 13:05:07 UTC - in response to Message 1736279.  
Last modified: 23 Oct 2015, 13:36:36 UTC

OK, I see some completed tasks now. It appears you are using the 340 driver from the Ubuntu 12.04.x repository and getting many OpenCL detection Errors, http://setiathome.berkeley.edu/result.php?resultid=4455996022. With my 750Ti I was getting Kernel Panics with my Mac in anything other than Yosemite. Even then I had to install the nVidia WebDriver 346.xx to get it to work in Yosemite. So, I would suggest upgrading to at least Ubuntu 14.04.x. I'm not going to suggest 15.10 because the version of BOINC at Berkeley doesn't work correctly with Ubuntu 15.10. I'm either going to have to relearn how to compile the BOINC Manager or go back to Ubuntu 14.04.3 which I still have on another partition.

I would also suggest downloading the nVidia driver 346.59 I linked to previously, since it gives you OpenCL without any mods and runs APs extremely well with the 750Ti. The way I install a driver is the same with nVidia or AMD:
1) Download the driver, unzip it, move the driver part to your home folder, and set the execute bit.
2) Hit Ctrl+Alt+F1 to drop into the console, and log in.
3) Enter sudo stop lightdm to stop the X server; with 15.xx it's sudo service lightdm stop.
4) Purge the existing driver with something like sudo apt-get purge "nvidia.*"
5) Enter dir to print the driver name, then install the driver with sudo ./whatever
6) Follow the instructions and build for Ubuntu.
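
If it helps to see steps 1 to 5 as one sequence, here is a sketch that only prints the commands for review rather than running them. print_install_plan is a made-up helper name, and the installer file name you pass in is whatever you actually downloaded; nothing here is specific to one driver version.

```shell
#!/bin/sh
# Sketch: print (do not run) the driver-install commands from the steps above.
# print_install_plan is a hypothetical helper; $1 is the downloaded installer
# file in the current directory, $2 is the Ubuntu major version (e.g. 14 or 15).
print_install_plan() {
    installer=$1
    ubuntu_major=$2
    echo "chmod +x $installer"
    # Per the steps above: 15.xx uses "service", earlier releases use "stop"
    if [ "$ubuntu_major" -ge 15 ]; then
        echo "sudo service lightdm stop"
    else
        echo "sudo stop lightdm"
    fi
    echo "sudo apt-get purge \"nvidia.*\""
    echo "sudo ./$installer"
}
```

Run the printed commands by hand from the text console (Ctrl+Alt+F1), not from a terminal inside the desktop session, since stopping lightdm will kill that session.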

Hopefully that will stop the Panics and the Errors.
Oh, if you use dual cards in 14.04.x you will probably hit this problem: https://bugs.launchpad.net/ubuntu/+source/ubuntu-drivers-common/+bug/1310489
You can solve that by commenting out lines in the gpu-manager config, as explained in that bug thread:
a) Edit /etc/init/gpu-manager.conf, commenting out lines until it looks like this:

#start on (starting lightdm
# or starting kdm
# or starting xdm
# or starting lxdm)
task
exec gpu-manager --log /var/log/gpu-manager.log
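
The same edit can be sketched with sed. This assumes the stock file has a multi-line "start on (...)" stanza that begins at the start of a line and ends on a later line closing with ")"; comment_start_on is a made-up helper name. It writes the result to stdout so you can inspect it before replacing anything.

```shell
#!/bin/sh
# Sketch: comment out the multi-line "start on (...)" stanza of an Upstart job.
# comment_start_on is a hypothetical helper; $1 is the path to the .conf file.
# The file itself is not modified; the edited text goes to stdout.
comment_start_on() {
    # Prefix "#" to every line from "start on" through the line ending in ")"
    sed -e '/^start on/,/)[[:space:]]*$/ s/^/#/' "$1"
}
```

One cautious way to use it: write the output to a temporary file, compare it with the original using diff, keep a backup copy of /etc/init/gpu-manager.conf, and only then copy the new version into place with sudo.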

Good luck...
ID: 1736379 · Report as offensive
ralphw
Volunteer tester

Joined: 7 May 99
Posts: 78
Credit: 18,032,718
RAC: 38
United States
Message 1736811 - Posted: 24 Oct 2015, 23:28:00 UTC - in response to Message 1736379.  
Last modified: 24 Oct 2015, 23:49:20 UTC

Things are better now. I adjusted the Memory Low Gap setting in my BIOS (setting it to 3).

Now I'm crunching happily and correctly on GPU 0 and GPU 1, both GTX 750 Ti, with no OpenCL detection problems.

HTTP connectivity to the S@home servers has been spotty today.

Along the way, I:
- updated Ubuntu to 12.04.5
- crashed my box running the nvidia-settings utility from the "upgrade"
- reinstalled my driver (NVIDIA-Linux-x86_64-340.93.run from NVIDIA's site)
- ran apt-get purge nvidia-* and reinstalled the 340 driver

I'll experiment with more recent drivers now that both cards are installed.

I no longer see the OpenCL detection problems you mentioned earlier -
http://setiathome.berkeley.edu/result.php?resultid=4465917759 shows one of the last two.
ID: 1736811 · Report as offensive
TBar
Volunteer tester

Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1736960 - Posted: 25 Oct 2015, 16:16:39 UTC - in response to Message 1736811.  

It appears you are still receiving an OpenCL error:
setiathome_7.08_x86_64-pc-linux-gnu__opencl_nvidia_sah: /usr/lib/x86_64-linux-gnu/libOpenCL.so.1: no version information available...
http://setiathome.berkeley.edu/result.php?resultid=4471342927
You are also getting many 'Validation inconclusive' results. This is what we saw at Beta when using drivers older than 350.xx with setiathome_7.08_x86_64-pc-linux-gnu__opencl_nvidia_sah: most of the tasks were labeled inconclusive, with a few eventually being invalid. The OpenCL version error is probably also caused by the driver. You're probably going to have to update the driver to around 352.xx to solve the inconclusives, which will probably solve the version error as well. You could try the 346.59 driver, but I'm beginning to doubt it will be any better than the other pre-350 drivers with that app.
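
A small sketch of the "is my driver at least 350.xx" check, working on a version string such as 340.93 or 352.55. The helper names driver_major and driver_at_least are made up for illustration, and the 350 threshold is just the one mentioned above, not an official cut-off.

```shell
#!/bin/sh
# Sketch: compare an NVIDIA driver version string against a minimum major version.
# driver_major and driver_at_least are hypothetical helpers.
driver_major() {
    # "352.55" -> "352" (strip everything from the first dot)
    printf '%s\n' "${1%%.*}"
}

driver_at_least() {
    # $1 = version string, $2 = minimum major version; non-numeric input fails
    [ "$(driver_major "$1")" -ge "$2" ] 2>/dev/null
}
```

For example, driver_at_least 340.93 350 fails while driver_at_least 352.55 350 succeeds. The version string can be typed in by hand or, assuming the driver is loaded, read from /proc/driver/nvidia/version.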
ID: 1736960 · Report as offensive
ralphw
Volunteer tester

Joined: 7 May 99
Posts: 78
Credit: 18,032,718
RAC: 38
United States
Message 1737720 - Posted: 28 Oct 2015, 11:33:38 UTC - in response to Message 1736960.  

I added a GTX 950, though I'm not yet seeing that card's higher peak GFLOPS reflected in the time it takes to complete GPU workunits.

OpenCL: GPU 0: GeForce GTX 950 (driver 352.55, device OpenCL 1.2 CUDA, 2047MB, 1790MB available, 3208 GFLOPS peak)
OpenCL: GPU 1: GeForce GTX 750 Ti (driver 352.55, device OpenCL 1.2 CUDA, 2048MB, 2011MB available, 2409 GFLOPS peak)
OpenCL: GPU 2: GeForce GTX 750 Ti (driver 352.55, device OpenCL 1.2 CUDA, 2048MB, 2011MB available, 2409 GFLOPS peak)

At any rate, I updated the driver from 340.X to 352.55 as well. I'll play around to see if I can get CUDA workunits to process, but I think I'm ready to try an optimized client now.
ID: 1737720 · Report as offensive



 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.