No Usable GPU in Linux

Message boards : Number crunching : No Usable GPU in Linux

David Anderson (not *that* DA)
Joined: 5 Dec 09
Posts: 215
Credit: 74,008,558
RAC: 74
United States
Message 1739992 - Posted: 5 Nov 2015, 23:57:52 UTC

I've modified app_info.xml on the dual-760's machine
to read along the lines you suggested, but naming
what apps and shared libraries I have right now.

I have no idea if the result is correct or useful.

<app_info>
<app>
<name>setiathome_v7</name>
</app>
<file_info>
<name>MBv7_7.05r2549_sse42_linux64</name>
<executable/>
</file_info>
<app_version>
<app_name>setiathome_v7</app_name>
<version_num>705</version_num>
<cmdline></cmdline>
<file_ref>
<file_name>MBv7_7.05r2549_sse42_linux64</file_name>
<main_program/>
</file_ref>
</app_version>
<app>
<name>astropulse_v7</name>
</app>
<file_info>
<name>ap_7.05r2728_sse3_linux64</name>
<executable/>
</file_info>
<app_version>
<app_name>astropulse_v7</app_name>
<version_num>705</version_num>
<platform>x86_64-pc-linux-gnu</platform>
<plan_class></plan_class>
<cmdline></cmdline>
<file_ref>
<file_name>ap_7.05r2728_sse3_linux64</file_name>
<main_program/>
</file_ref>
</app_version>


<file_info>
<name>setiathome_x41g_x86_64-pc-linux-gnu_cuda32</name>
<executable/>
</file_info>

<file_info>
<name>libcudart.so.3</name>
<executable/>
</file_info>

<file_info>
<name>libcufft.so.3</name>
<executable/>
</file_info>

<app_version>
<app_name>setiathome_v7</app_name>
<version_num>705</version_num>
<plan_class>cuda32</plan_class>
<avg_ncpus>0.1</avg_ncpus>
<max_ncpus>0.1</max_ncpus>
<coproc>
<type>CUDA</type>
<count>1</count>
</coproc>
<file_ref>
<file_name>setiathome_x41g_x86_64-pc-linux-gnu_cuda32</file_name>
<main_program/>
</file_ref>
<file_ref>
<file_name>libcudart.so.3</file_name>
</file_ref>
<file_ref>
<file_name>libcufft.so.3</file_name>
</file_ref>
</app_version>



</app_info>
ID: 1739992
TBar
Volunteer tester
Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1740003 - Posted: 6 Nov 2015, 0:36:13 UTC - in response to Message 1739992.  

I've modified app_info.xml on the dual-760's machine
to read along the lines you suggested, but naming
what apps and shared libraries I have right now.

I have no idea if the result is correct or useful.

<app_info>
<app>
<name>setiathome_v7</name>
</app>
<file_info>
<name>MBv7_7.05r2549_sse42_linux64</name>
<executable/>
</file_info>
<app_version>
<app_name>setiathome_v7</app_name>
<version_num>705</version_num>
<cmdline></cmdline>
<file_ref>
<file_name>MBv7_7.05r2549_sse42_linux64</file_name>
<main_program/>
</file_ref>
</app_version>
<app>
<name>astropulse_v7</name>
</app>
<file_info>
<name>ap_7.05r2728_sse3_linux64</name>
<executable/>
</file_info>
<app_version>
<app_name>astropulse_v7</app_name>
<version_num>705</version_num>
<platform>x86_64-pc-linux-gnu</platform>
<plan_class></plan_class>
<cmdline></cmdline>
<file_ref>
<file_name>ap_7.05r2728_sse3_linux64</file_name>
<main_program/>
</file_ref>
</app_version>

<file_info>
<name>setiathome_x41g_x86_64-pc-linux-gnu_cuda32</name>
<executable/>
</file_info>

<file_info>
<name>libcudart.so.3</name>
<executable/>
</file_info>

<file_info>
<name>libcufft.so.3</name>
<executable/>
</file_info>

<app_version>
<app_name>setiathome_v7</app_name>
<version_num>705</version_num>
<plan_class>cuda32</plan_class>
<avg_ncpus>0.1</avg_ncpus>
<max_ncpus>0.1</max_ncpus>
<coproc>
<type>CUDA</type>
<count>1</count>
</coproc>
<file_ref>
<file_name>setiathome_x41g_x86_64-pc-linux-gnu_cuda32</file_name>
<main_program/>
</file_ref>
<file_ref>
<file_name>libcudart.so.3</file_name>
</file_ref>
<file_ref>
<file_name>libcufft.so.3</file_name>
</file_ref>
</app_version>

</app_info>

I think the problem is that you have the AstroPulse app in between the SETI@home apps. I like to make complete sections; it's less confusing to me. If you moved the AP app below the SETI@home app it would probably work. The newer CUDA 60 app will probably work better on your 760s; in that case you would just remove the CUDA 32 section and replace it with the section I posted below. Of course, you would need to add the three CUDA 60 files to the setiathome.berkeley.edu folder alongside the app_info.xml file (a copy-in sketch follows the config). The 9500 would work better with the CUDA 32 app.

<app_info>
  <app>
    <name>setiathome_v7</name>
  </app>
  <file_info>
    <name>MBv7_7.05r2549_sse42_linux64</name>
    <executable/>
  </file_info>
  <app_version>
    <app_name>setiathome_v7</app_name>
    <version_num>705</version_num>
    <file_ref>
      <file_name>MBv7_7.05r2549_sse42_linux64</file_name>
      <main_program/>
    </file_ref>
  </app_version>
  <app>
    <name>astropulse_v7</name>
  </app>
  <file_info>
    <name>ap_7.05r2728_sse3_linux64</name>
    <executable/>
  </file_info>
  <app_version>
    <app_name>astropulse_v7</app_name>
    <version_num>705</version_num>
    <platform>x86_64-pc-linux-gnu</platform>
    <file_ref>
      <file_name>ap_7.05r2728_sse3_linux64</file_name>
      <main_program/>
    </file_ref>
  </app_version>

  <app>
    <name>setiathome_v7</name>
  </app>
  <file_info>
    <name>setiathome_x41zc_x86_64-pc-linux-gnu_cuda60</name>
    <executable/>
  </file_info>
  <file_info>
    <name>libcudart.so.6.0</name>
    <executable/>
  </file_info>
  <file_info>
    <name>libcufft.so.6.0</name>
    <executable/>
  </file_info>
  <app_version>
    <app_name>setiathome_v7</app_name>
    <version_num>704</version_num>
    <plan_class>cuda60</plan_class>
    <avg_ncpus>0.1</avg_ncpus>
    <max_ncpus>0.1</max_ncpus>
    <coproc>
      <type>CUDA</type>
      <count>1</count>
    </coproc>
    <file_ref>
      <file_name>setiathome_x41zc_x86_64-pc-linux-gnu_cuda60</file_name>
      <main_program/>
    </file_ref>
    <file_ref>
      <file_name>libcudart.so.6.0</file_name>
    </file_ref>
    <file_ref>
      <file_name>libcufft.so.6.0</file_name>
    </file_ref>
  </app_version>
</app_info>
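
To get the three CUDA 60 files into place, something along these lines should work. This is only a sketch; it assumes the Debian/Ubuntu boinc-client package layout (data directory /var/lib/boinc-client, files owned by the boinc user) and that the files were downloaded to ~/Downloads, so adjust the paths to match your install.

# copy the cuda60 app and its two CUDA libraries into the SETI@home project folder
cd /var/lib/boinc-client/projects/setiathome.berkeley.edu
sudo cp ~/Downloads/setiathome_x41zc_x86_64-pc-linux-gnu_cuda60 .
sudo cp ~/Downloads/libcudart.so.6.0 ~/Downloads/libcufft.so.6.0 .
# the main app needs the executable bit, and the client expects its own user to own the files
sudo chmod +x setiathome_x41zc_x86_64-pc-linux-gnu_cuda60
sudo chown boinc:boinc setiathome_x41zc_x86_64-pc-linux-gnu_cuda60 libcudart.so.6.0 libcufft.so.6.0
# restart the client so it rereads app_info.xml
sudo service boinc-client restart
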
ID: 1740003
jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1740059 - Posted: 6 Nov 2015, 4:07:56 UTC - in response to Message 1739946.  
Last modified: 6 Nov 2015, 4:10:27 UTC

It would be Nice if the mbcuda.cfg settings worked in Linux...and OSX.


Agreed. It would be very useful. :-)


Requests noted, will have to work out how Linux renicing works, and put generic code in place for parsing the cfg file. ;)
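
For reference, renicing on Linux boils down to something like this; the app name here is taken from the cuda60 configs elsewhere in the thread and is only an example, and lowering the nice value below its current setting needs root:

# raise the priority of the running GPU app by lowering its niceness to -5
sudo renice -n -5 -p $(pidof setiathome_x41zc_x86_64-pc-linux-gnu_cuda60)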

On the original issue, my Linux dev machine is Ubuntu 14.04 LTS and, last I checked, up to date. Every time a kernel update or driver install occurred, the X display would break and go to a black screen on boot (the usual gymnastics of entering a text console and reinstalling the display driver would routinely follow).

Recently I worked out that the reason for the driver pre-install script failure message was that Ubuntu's DKMS kernel module management wasn't installed (odd), so I installed that about a week ago and am hoping both kernel and display driver updates go more smoothly next time:

sudo apt-get install dkms
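
Once dkms is installed, and assuming the .run installer was allowed to register the kernel module with it, a quick sanity check is:

# lists the modules DKMS is managing and the kernels they are built for
dkms status
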
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 1740059
TBar
Volunteer tester
Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1740174 - Posted: 6 Nov 2015, 15:41:15 UTC

Some recent experiences that might help. It has become clear there is a problem when using the nVidia driver .run package to uninstall the driver. As best I can tell, if you don't install another driver before rebooting you will probably be stuck in the Login Loop from Hell. There are some posts about it here: http://askubuntu.com/search?q=nvidia+login+loop. I've tried just about every suggestion at Ask Ubuntu, and the only solution I've found is to drop into the console and install another driver. There are also problems with this, as some cards will not even display the console in this situation; the only solution to that is to remove the card and use a different card until you can get a driver installed. This reminds me very much of the problem I was having in Windows 8/8.1 when having the nVidia driver do a clean install: basically the system went into lockdown as soon as the old driver was removed. Some say the Ubuntu problem is caused by not writing the Linux headers back the way they should be written; all I know is it's nasty.

During a recent bout I found it is possible to have both a repository and a proprietary driver installed at the same time. According to the proprietary driver it was 'backing up' the repository driver while installing, but the repository driver was left installed. In my case I had files for libnvidia-opencl.so.304 and libnvidia-opencl.so.343 installed side by side. BOINC refused to see OpenCL until one was removed; in fact, both were removed, as there is a problem with clBuildProgram in the 343 driver, i.e., it doesn't work.

Anyway, for those that have been trying all sorts of remedies to get BOINC to see OpenCL, you might want to try a clean install. It is possible to FUBAR a system to the point where it is not worth saving. I did get this system going again, but I've been stuck in the Login Loop before and just reinstalled everything, even the Home folder, which is where the Login Loop appears to originate.
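
Before going the reinstall route, it may be worth checking whether two drivers' libraries really are sitting side by side like that. A rough sketch; package names and library paths vary by release, so treat these as examples:

# any repository driver packages still installed?
dpkg -l | grep -i nvidia
# look for more than one libnvidia-opencl version on the system
sudo find / -name 'libnvidia-opencl.so*' 2>/dev/null
# the ICD files tell the OpenCL loader which vendor libraries to use
cat /etc/OpenCL/vendors/*.icd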

Have Fun.
ID: 1740174
Baiteh
Joined: 10 Sep 15
Posts: 34
Credit: 7,705,483
RAC: 0
United Kingdom
Message 1740674 - Posted: 8 Nov 2015, 23:17:32 UTC

Installed the toolkit - blimey! Now I get how you guys get the RAC with 3 or 4 980 Ti rigs, lol!
ID: 1740674
David Anderson (not *that* DA)
Joined: 5 Dec 09
Posts: 215
Credit: 74,008,558
RAC: 74
United States
Message 1741370 - Posted: 11 Nov 2015, 17:47:40 UTC

I was running with the nvidia driver installed by sgfxi.
A 14.04 kernel update resulted in absurd
window/text sizes (very, very large).
Apparently the nouveau driver?
Uh oh.

Did:
Ctrl-Alt-F2
(to switch to a text console)
(log in to the text console)
sudo su -
service lightdm stop
sudo apt-get remove `dpkg-query --show '*nvidia*'`
apt-get remove dkms
sgfxi
(and now I'm back to the 355 nvidia driver;
after a reboot to test that this survives reboot,
all is well with boinc on restarting boinc-client).
Still not really getting SETI GPU tasks much, if at all,
but oh well.
ID: 1741370
TBar
Volunteer tester
Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1741395 - Posted: 11 Nov 2015, 19:42:22 UTC - in response to Message 1741370.  
Last modified: 11 Nov 2015, 19:47:49 UTC

For the host with the two 760s you could just rename the app_info.xml to app_info1.xml and see if it will receive work running as stock. It says it has OpenCL, and you don't have any existing tasks.
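
A quick sketch of that; the data directory path is an assumption based on the Ubuntu boinc-client package, and renaming is only safe here because there are no tasks in the cache:

sudo service boinc-client stop
cd /var/lib/boinc-client/projects/setiathome.berkeley.edu
sudo mv app_info.xml app_info1.xml
sudo service boinc-client start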

If the one with the 750 is the one that threw the error with the CUDA Toolkit, I'd reformat and install a fresh copy. It still doesn't list OpenCL, and tracking down the problem would take longer than just reinstalling the OS and installing the Toolkit. If you want to try it with just CUDA, here is my app_info with your CPU apps listed; I'll leave the GPU AstroPulse app listed just in case you get OpenCL working. If you don't get OpenCL working, it would be best to remove the GPU AstroPulse section, or you will get errors without OpenCL.
The AP app is here (a download sketch follows the links):
http://boinc2.ssl.berkeley.edu/beta/download/astropulse_7.08_x86_64-pc-linux-gnu__opencl_nvidia_100
http://boinc2.ssl.berkeley.edu/beta/download/AstroPulse_Kernels_r2751.cl
http://boinc2.ssl.berkeley.edu/beta/download/ap_cmdline_7.08_x86_64-pc-linux-gnu__opencl_nvidia_100.txt
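
Something like this should pull those three files into the right place; the data directory path is an assumption based on the Ubuntu boinc-client package, so adjust as needed:

sudo service boinc-client stop
cd /var/lib/boinc-client/projects/setiathome.berkeley.edu
# fetch the GPU AstroPulse app, its OpenCL kernels file, and the command-line file
sudo wget http://boinc2.ssl.berkeley.edu/beta/download/astropulse_7.08_x86_64-pc-linux-gnu__opencl_nvidia_100
sudo wget http://boinc2.ssl.berkeley.edu/beta/download/AstroPulse_Kernels_r2751.cl
sudo wget http://boinc2.ssl.berkeley.edu/beta/download/ap_cmdline_7.08_x86_64-pc-linux-gnu__opencl_nvidia_100.txt
# only the main app needs the executable bit; hand everything to the boinc user
sudo chmod +x astropulse_7.08_x86_64-pc-linux-gnu__opencl_nvidia_100
sudo chown boinc:boinc astropulse_7.08* AstroPulse_Kernels_r2751.cl ap_cmdline_7.08*
sudo service boinc-client start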

<app_info>
  <app>
    <name>astropulse_v7</name>
  </app>
  <file_info>
    <name>astropulse_7.08_x86_64-pc-linux-gnu__opencl_nvidia_100</name>
    <executable/>
  </file_info>
  <file_info>
    <name>ap_cmdline_7.08_x86_64-pc-linux-gnu__opencl_nvidia_100.txt</name>
  </file_info>
  <app_version>
    <app_name>astropulse_v7</app_name>
    <platform>x86_64-pc-linux-gnu</platform>
    <version_num>708</version_num>
    <plan_class>opencl_nvidia_linux</plan_class>
    <coproc>
      <type>NVIDIA</type>
      <count>1</count>
    </coproc>
    <avg_ncpus>0.1</avg_ncpus>
    <max_ncpus>0.1</max_ncpus>
    <file_ref>
      <file_name>astropulse_7.08_x86_64-pc-linux-gnu__opencl_nvidia_100</file_name>
      <main_program/>
    </file_ref>
    <file_ref>
      <file_name>ap_cmdline_7.08_x86_64-pc-linux-gnu__opencl_nvidia_100.txt</file_name>
      <open_name>ap_cmdline.txt</open_name>
    </file_ref>
  </app_version>
  <app>
    <name>setiathome_v7</name>
  </app>
  <file_info>
    <name>setiathome_x41zc_x86_64-pc-linux-gnu_cuda60</name>
    <executable/>
  </file_info>
  <file_info>
    <name>libcudart.so.6.0</name>
    <executable/>
  </file_info>
  <file_info>
    <name>libcufft.so.6.0</name>
    <executable/>
  </file_info>
  <file_info>
    <name>mbcuda.cfg</name>
  </file_info>
  <app_version>
    <app_name>setiathome_v7</app_name>
    <version_num>704</version_num>
    <plan_class>cuda60</plan_class>
    <avg_ncpus>0.1</avg_ncpus>
    <max_ncpus>0.1</max_ncpus>
    <coproc>
      <type>CUDA</type>
      <count>1</count>
    </coproc>
    <file_ref>
      <file_name>setiathome_x41zc_x86_64-pc-linux-gnu_cuda60</file_name>
      <main_program/>
    </file_ref>
    <file_ref>
      <file_name>libcudart.so.6.0</file_name>
    </file_ref>
    <file_ref>
      <file_name>libcufft.so.6.0</file_name>
    </file_ref>
    <file_ref>
      <file_name>mbcuda.cfg</file_name>
    </file_ref>
  </app_version>
  <app>
    <name>astropulse_v7</name>
  </app>
  <file_info>
    <name>ap_7.05r2728_sse3_linux64</name>
    <executable/>
  </file_info>
  <app_version>
    <app_name>astropulse_v7</app_name>
    <version_num>705</version_num>
    <platform>x86_64-pc-linux-gnu</platform>
    <file_ref>
      <file_name>ap_7.05r2728_sse3_linux64</file_name>
      <main_program/>
    </file_ref>
  </app_version>
  <app>
    <name>setiathome_v7</name>
  </app>
  <file_info>
    <name>MBv7_7.05r2549_sse42_linux64</name>
    <executable/>
  </file_info>
  <app_version>
    <app_name>setiathome_v7</app_name>
    <platform>x86_64-pc-linux-gnu</platform>
    <version_num>705</version_num>
    <file_ref>
      <file_name>MBv7_7.05r2549_sse42_linux64</file_name>
      <main_program/>
    </file_ref>
  </app_version>
</app_info>

That's the same app_info.xml I'm running. I stole the mbcuda.cfg file from Windows; it doesn't work in Linux, though.

Hmmm, now the 750 says it has OpenCL: http://setiathome.berkeley.edu/show_host_detail.php?hostid=7748035
So, how did you get it working?
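
If you want to double-check what the machine itself reports rather than what the website shows, the client prints its GPU and OpenCL detection at the top of the event log on startup; with the Ubuntu boinc-client package the log lives in the data directory:

# GPU and OpenCL detection lines from the client's startup messages
sudo grep -iE 'cuda|opencl|gpu' /var/lib/boinc-client/stdoutdae.txt | head -n 20
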
ID: 1741395
David Anderson (not *that* DA)
Joined: 5 Dec 09
Posts: 215
Credit: 74,008,558
RAC: 74
United States
Message 1741473 - Posted: 12 Nov 2015, 2:57:35 UTC

Been focusing on the host with the one 750.
The event log shows SETI does not see the GPU
or request GPU tasks,
though Einstein and BOINC do see the GPU.
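
If that host is still running with an app_info.xml (anonymous platform), the project will only send GPU tasks when a GPU app_version is listed in it, regardless of what the client detects. A quick check, assuming the Ubuntu boinc-client data directory:

# any GPU plan classes or coprocessor entries in the app_info?
grep -E 'plan_class|coproc|<type>' /var/lib/boinc-client/projects/setiathome.berkeley.edu/app_info.xml
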
ID: 1741473