Trouble with GPU freezing machine

Eric B
Joined: 9 Mar 00
Posts: 88
Credit: 168,875,085
RAC: 762
United States
Message 1789740 - Posted: 23 May 2016, 9:30:34 UTC

I noticed that BOINC downloaded and started running GPU tasks on my NVIDIA GTX 460. A quick check shows three crunching programs running:
setiathome_8.00_x86_64-pc-linux-gnu
setiathome_8.05_i686-pc-linux-gnu
setiathome_8.10_x86_64-pc-linux-gnu__opencl_nvidia_SoG --device 0

It runs OK for a while, then my PC freezes and I have to power-cycle it.
NVIDIA driver is 352.63.
OS is openSUSE 13.1 (x86_64) on a 4.1.2 kernel; quad-core i7 with 8 GB of DRAM.
The PC is dedicated to SETI@home and some weather station software (low impact).
What's the story on NVIDIA GPU computing under Linux with the latest SETI@home apps? Is it finalized now, or are the bugs still being worked out?
Should I update the NVIDIA driver?
Any ideas on how I can prevent these freezes? It's happened twice now, so I have GPU computing suspended.
ID: 1789740
Urs Echternacht
Volunteer tester
Joined: 15 May 99
Posts: 692
Credit: 135,197,781
RAC: 211
Germany
Message 1789754 - Posted: 23 May 2016, 11:50:37 UTC - in response to Message 1789740.  

Hi,
have you checked for dust hindering proper cooling?
How is your PSU? Is it still healthy enough to feed a GTX 460?

There are a few things you could try:
- Check whether the freezes leave any log entries behind, to find out what goes wrong; otherwise you can't prevent them.
- Allow running the cuda60 app for Linux.
- Update the NVIDIA driver from 352.63 to a newer 352.xx release.
- The openSUSE 13.1 mainline kernel is currently 3.12.57; try a downgrade (if your other software allows it), because 4.1.2 is rather old.
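
For the log check in the first item, here is a minimal Python sketch. It assumes a copy of the kernel log has been saved as text (for example from /var/log/messages on distributions that write one) and that the driver managed to log something before the freeze, which a hard lockup does not guarantee. "NVRM: Xid" is the prefix the NVIDIA driver uses for its error reports; the function name, the pattern list, and the sample lines are illustrative assumptions, not output from this host.

```python
import re

# Messages that often accompany GPU or lockup trouble in a Linux kernel log.
# The list is illustrative, not exhaustive.
SUSPECT = re.compile(
    r"NVRM: Xid|GPU has fallen off the bus|soft lockup",
    re.IGNORECASE,
)


def find_suspect_lines(log_text):
    """Return (line_number, line) pairs for lines that match SUSPECT."""
    hits = []
    for number, line in enumerate(log_text.splitlines(), start=1):
        if SUSPECT.search(line):
            hits.append((number, line.strip()))
    return hits


if __name__ == "__main__":
    # Synthetic sample text; in practice, read the saved log file instead.
    sample = (
        "line A: nothing unusual here\n"
        "line B: NVRM: Xid (synthetic example line)\n"
    )
    for number, line in find_suspect_lines(sample):
        print(number, line)
```

If nothing matches, the freeze may simply have happened before anything reached the disk, so an empty result does not clear the GPU or the driver.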

Did you get any error messages in the reported results around the freezes? (Your hosts are hidden, so I have to ask.)

From the applications page:
Linux/x86_64 8.10 (opencl_nvidia_sah) 18 May 2016, 1:10:51 UTC 940 GigaFLOPS
Linux/x86_64 8.10 (opencl_nvidia_SoG) 18 May 2016, 1:10:51 UTC 694 GigaFLOPS

_\|/_
U r s
ID: 1789754
Eric B
Joined: 9 Mar 00
Posts: 88
Credit: 168,875,085
RAC: 762
United States
Message 1789781 - Posted: 23 May 2016, 14:54:58 UTC - in response to Message 1789754.  

What does "allow running the cuda60 app for Linux" mean? How do I do that?
I'll try updating the driver today to the latest NVIDIA version, and if that doesn't fix it I'll downgrade to the 3.11 kernel and see what happens.
ID: 1789781
Urs Echternacht
Volunteer tester
Joined: 15 May 99
Posts: 692
Credit: 135,197,781
RAC: 211
Germany
Message 1789785 - Posted: 23 May 2016, 15:17:06 UTC - in response to Message 1789781.  

> What does it mean? "-allow running the cuda60 app for linux" How do I do that?

If you're using only "stock" apps, you should get CUDA apps automatically through BOINC. That you have not received any so far could have two reasons:
- bad luck of the draw;
- you have somehow excluded CUDA apps through a config setting.

> I'll try updating the driver today to the latest NVIDIA version, and if that doesn't fix it I'll downgrade to the 3.11 kernel and see what happens

Or you could try upgrading to 4.1.25. Just try some other kernel version to rule the kernel out as a cause of the freezes.
_\|/_
U r s
ID: 1789785
Eric B
Joined: 9 Mar 00
Posts: 88
Credit: 168,875,085
RAC: 762
United States
Message 1789877 - Posted: 23 May 2016, 22:42:31 UTC - in response to Message 1789785.  

OK, I just checked and I do see a cuda60 app there (setiathome_8.01_x86_64-pc-linux-gnu__cuda60), but it doesn't seem to be used at all.
Here is what is actually running after updating my NVIDIA driver and rebooting:
setiathome_8.05_i686-pc-linux-gnu
setiathome_8.00_x86_64-pc-linux-gnu
setiathome_8.00_x86_64-pc-linux-gnu
setiathome_8.00_x86_64-pc-linux-gnu
setiathome_8.00_x86_64-pc-linux-gnu
setiathome_8.00_x86_64-pc-linux-gnu
setiathome_8.00_x86_64-pc-linux-gnu
setiathome_8.05_i686-pc-linux-gnu
setiathome_8.10_x86_64-pc-linux-gnu__opencl_nvidia_SoG --device 0

Why is 8.05 running there? Is that normal?
It's a quad core with HT, so 8 CPU threads plus a GPU task means the count is right, but what is 8.05 vs. 8.00?
Anyway, I'll let it run a while and see if it freezes up.
ID: 1789877
Eric B
Joined: 9 Mar 00
Posts: 88
Credit: 168,875,085
RAC: 762
United States
Message 1789917 - Posted: 24 May 2016, 2:17:47 UTC - in response to Message 1789877.  

3.5 hrs in and so far so good after upgrading my NVIDIA driver. I'll know better after 24 hrs or so.
My other PC has the same video card and driver but doesn't get any GPU work at all. Any ideas as to why? Do I need to modify my app_info.xml?

<app_info>
    <app>
      <name>setiathome_v8</name>
    </app>
    <file_info>
      <name>MBv8_8.04r3306_sse42_linux64</name>
      <executable/>
    </file_info>
    <app_version>
      <app_name>setiathome_v8</app_name>
      <version_num>804</version_num>
      <platform>x86_64-pc-linux-gnu</platform>
      <cmdline></cmdline>
      <file_ref>
        <file_name>MBv8_8.04r3306_sse42_linux64</file_name>
        <main_program/>
      </file_ref>
    </app_version>
</app_info>
ID: 1789917
Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1789973 - Posted: 24 May 2016, 6:43:24 UTC - in response to Message 1789917.  

That app_info.xml covers only the CPU SSE4.2 app. So yes, you need to add a section for the GPU app.
ID: 1789973
Urs Echternacht
Volunteer tester
Joined: 15 May 99
Posts: 692
Credit: 135,197,781
RAC: 211
Germany
Message 1790031 - Posted: 24 May 2016, 12:44:37 UTC - in response to Message 1789917.  

Check the downloaded Lunatics package again. Under "example_app_info_files" there are several examples of what an app_info.xml could look like. There is one section per <app></app> that you want to run. Replace the names from the example with your own app data, check the spelling, and try it.
_\|/_
U r s
ID: 1790031
Eric B
Joined: 9 Mar 00
Posts: 88
Credit: 168,875,085
RAC: 762
United States
Message 1790045 - Posted: 24 May 2016, 14:53:58 UTC

Does this look right?
<app_info>
    <app>
      <name>setiathome_v8</name>
    </app>
    <file_info>
      <name>MBv8_8.04r3306_sse42_linux64</name>
      <executable/>
    </file_info>
    <app_version>
      <app_name>setiathome_v8</app_name>
      <version_num>804</version_num>
      <platform>x86_64-pc-linux-gnu</platform>
      <cmdline></cmdline>
      <file_ref>
        <file_name>MBv8_8.04r3306_sse42_linux64</file_name>
        <main_program/>
      </file_ref>
    </app_version>
    
    <app_version>
      <app_name>setiathome_v8</app_name>
      <version_num>810</version_num>
      <platform>x86_64-pc-linux-gnu</platform>
      <coproc>
        <type>NVIDIA</type>
        <count>1</count>
      </coproc>
      <plan_class>opencl_nvidia_SoG</plan_class>
      <avg_ncpus>0.05</avg_ncpus>
      <max_ncpus>0.2</max_ncpus>
      <cmdline></cmdline>
      <file_ref>
        <file_name>setiathome_8.10_x86_64-pc-linux-gnu__opencl_nvidia_SoG</file_name>
        <main_program/>
      </file_ref>
    </app_version>
  
    <app_version>
      <app_name>setiathome_v8</app_name>
      <version_num>801</version_num>
      <platform>x86_64-pc-linux-gnu</platform>
      <coproc>
        <type>NVIDIA</type>
        <count>1</count>
      </coproc>
      <plan_class>cuda60</plan_class>
      <avg_ncpus>0.05</avg_ncpus>
      <max_ncpus>0.2</max_ncpus>
      <cmdline></cmdline>
      <file_ref>
        <file_name>setiathome_8.01_x86_64-pc-linux-gnu__cuda60</file_name>
        <main_program/>
      </file_ref>
    </app_version>
    
    <app_version>
      <app_name>setiathome_v8</app_name>
      <version_num>810</version_num>
      <platform>x86_64-pc-linux-gnu</platform>
      <coproc>
        <type>NVIDIA</type>
        <count>1</count>
      </coproc>
      <plan_class>opencl_nvidia_sah</plan_class>
      <avg_ncpus>0.05</avg_ncpus>
      <max_ncpus>0.2</max_ncpus>
      <cmdline></cmdline>
      <file_ref>
        <file_name>setiathome_8.10_x86_64-pc-linux-gnu__opencl_nvidia_sah</file_name>
        <main_program/>
      </file_ref>
    </app_version>
    
</app_info>

ID: 1790045
Eric B
Joined: 9 Mar 00
Posts: 88
Credit: 168,875,085
RAC: 762
United States
Message 1790046 - Posted: 24 May 2016, 14:57:56 UTC - in response to Message 1790045.  

I guess it isn't right:
Tue 24 May 2016 07:55:11 AM PDT | SETI@home | Found app_info.xml; using anonymous platform
Tue 24 May 2016 07:55:11 AM PDT | SETI@home | [error] State file error: missing application file setiathome_8.10_x86_64-pc-linux-gnu__opencl_nvidia_SoG
Tue 24 May 2016 07:55:11 AM PDT | SETI@home | [error] State file error: missing application file setiathome_8.01_x86_64-pc-linux-gnu__cuda60
Tue 24 May 2016 07:55:11 AM PDT | SETI@home | [error] State file error: missing application file setiathome_8.10_x86_64-pc-linux-gnu__opencl_nvidia_sah

The app files are there in BOINC/projects/setiathome.berkeley.edu.
How do I fix this?
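
One thing that stands out: the three files named in the errors are exactly the ones that appear in a <file_ref> without a matching <file_info> entry; only the CPU app has one. That this mismatch is what produces the "missing application file" message is an inference from the posted file, not something the log states. A minimal Python sketch of that cross-check follows, with a hypothetical helper name and a cut-down sample file:

```python
import os
import xml.etree.ElementTree as ET


def check_app_info(xml_text, project_dir=None):
    """Return (undeclared, missing_on_disk) file names for an app_info.xml.

    undeclared: names used in <file_ref> with no matching <file_info>.
    missing_on_disk: declared names not found in project_dir (if given).
    """
    root = ET.fromstring(xml_text)
    declared = {e.findtext("name", "").strip() for e in root.iter("file_info")}
    referenced = {e.findtext("file_name", "").strip() for e in root.iter("file_ref")}
    undeclared = sorted(referenced - declared)
    missing = []
    if project_dir is not None:
        missing = sorted(
            name for name in declared
            if not os.path.isfile(os.path.join(project_dir, name))
        )
    return undeclared, missing


if __name__ == "__main__":
    # Cut-down sample: the CPU app is declared, the cuda60 app is only referenced.
    sample = """
    <app_info>
      <app><name>setiathome_v8</name></app>
      <file_info><name>MBv8_8.04r3306_sse42_linux64</name><executable/></file_info>
      <app_version>
        <app_name>setiathome_v8</app_name>
        <file_ref><file_name>MBv8_8.04r3306_sse42_linux64</file_name><main_program/></file_ref>
      </app_version>
      <app_version>
        <app_name>setiathome_v8</app_name>
        <file_ref><file_name>setiathome_8.01_x86_64-pc-linux-gnu__cuda60</file_name><main_program/></file_ref>
      </app_version>
    </app_info>
    """
    undeclared, _ = check_app_info(sample)
    print("referenced but not declared:", undeclared)
```

Passing the project directory as the second argument additionally reports declared files that are not actually on disk, which separates a typo in the XML from a file that really is missing.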
ID: 1790046
Eric B
Joined: 9 Mar 00
Posts: 88
Credit: 168,875,085
RAC: 762
United States
Message 1790071 - Posted: 24 May 2016, 21:37:30 UTC

I tweaked it and came up with what seems like a workable app_info.xml. Now I get GPU WUs, but they don't run due to this error:
"Waiting to run (0.05 CPUs + 1 NVIDIA GPU)(Scheduler Wait: Cant read CL file)"

Here's my app_info.xml; if anyone can comment on it, it would be appreciated:
<app_info>
    <app>
        <name>setiathome_v8</name>
    </app>
    <file_info>
        <name>MBv8_8.04r3306_sse42_linux64</name>
        <executable/>
    </file_info>
    <app_version>
        <app_name>setiathome_v8</app_name>
        <version_num>804</version_num>
        <platform>x86_64-pc-linux-gnu</platform>
        <file_ref>
            <file_name>MBv8_8.04r3306_sse42_linux64</file_name>
            <main_program/>
        </file_ref>
    </app_version>
    <app>
      <name>setiathome_v8</name>
    </app>
    <file_info>
      <name>setiathome_8.10_x86_64-pc-linux-gnu__opencl_nvidia_sah</name>
      <executable/>
    </file_info>
    <app_version>
      <app_name>setiathome_v8</app_name>
      <version_num>810</version_num>
      <platform>x86_64-pc-linux-gnu</platform>
      <coproc>
        <type>NVIDIA</type>
        <count>1</count>
      </coproc>
      <plan_class>opencl_nvidia_sah</plan_class>
      <avg_ncpus>0.05</avg_ncpus>
      <max_ncpus>0.2</max_ncpus>
      <cmdline></cmdline>
      <file_ref>
        <file_name>setiathome_8.10_x86_64-pc-linux-gnu__opencl_nvidia_sah</file_name>
        <main_program/>
      </file_ref>
    </app_version>

    <app>
      <name>setiathome_v8</name>
    </app>
    <file_info>
      <name>setiathome_8.10_x86_64-pc-linux-gnu__opencl_nvidia_SoG</name>
      <executable/>
    </file_info>
    <app_version>
      <app_name>setiathome_v8</app_name>
      <version_num>810</version_num>
      <platform>x86_64-pc-linux-gnu</platform>
      <coproc>
        <type>NVIDIA</type>
        <count>1</count>
      </coproc>
      <plan_class>opencl_nvidia_SoG</plan_class>
      <avg_ncpus>0.05</avg_ncpus>
      <max_ncpus>0.2</max_ncpus>
      <cmdline></cmdline>
      <file_ref>
        <file_name>setiathome_8.10_x86_64-pc-linux-gnu__opencl_nvidia_SoG</file_name>
        <main_program/>
      </file_ref>
    </app_version>

    <app>
      <name>setiathome_v8</name>
    </app>
    <file_info>
      <name>setiathome_8.01_x86_64-pc-linux-gnu__cuda60</name>
      <executable/>
    </file_info>
    <app_version>
      <app_name>setiathome_v8</app_name>
      <version_num>801</version_num>
      <platform>x86_64-pc-linux-gnu</platform>
      <coproc>
        <type>NVIDIA</type>
        <count>1</count>
      </coproc>
      <plan_class>cuda60</plan_class>
      <avg_ncpus>0.05</avg_ncpus>
      <max_ncpus>0.2</max_ncpus>
      <cmdline></cmdline>
      <file_ref>
        <file_name>setiathome_8.01_x86_64-pc-linux-gnu__cuda60</file_name>
        <main_program/>
      </file_ref>
    </app_version>

</app_info>
ID: 1790071
Urs Echternacht
Volunteer tester
Joined: 15 May 99
Posts: 692
Credit: 135,197,781
RAC: 211
Germany
Message 1790101 - Posted: 24 May 2016, 23:22:22 UTC
Last modified: 24 May 2016, 23:24:10 UTC

Eric B,

decide which of the two NVIDIA OpenCL apps works better on your host, and run only that one. Check
https://setiathome.berkeley.edu/host_app_versions.php?hostid=[your-host-id] (just replace the placeholder with the host ID you want to look at)
to see which of "opencl_nvidia_sah" or "opencl_nvidia_SoG" works better.

> I twigged it around and came up with what seems like a workable app_info.xml, now i get GPU WU's but they dont run due to this error:
> "Waiting to run (0.05 CPUs + 1 NVIDIA GPU)(Scheduler Wait: Cant read CL file)"

Is the *.cl file still present? BOINC sometimes has the habit of "cleaning up" too many files when switching from stock to anonymous platform.

Set max_ncpus a little higher, e.g. <max_ncpus>0.95</max_ncpus>, and keep one CPU core free.
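
A quick way to check for the *.cl files is sketched below in Python. It assumes the OpenCL kernel files sit directly in the project directory next to the executables, and it uses the relative path quoted earlier in the thread, which needs adjusting to the local BOINC data directory. The helper name is made up for illustration.

```python
from pathlib import Path


def list_cl_files(project_dir):
    """Return sorted names of *.cl files directly inside project_dir."""
    base = Path(project_dir)
    if not base.is_dir():
        # A wrong or missing path is reported as "nothing found".
        return []
    return sorted(p.name for p in base.glob("*.cl") if p.is_file())


if __name__ == "__main__":
    # Path as quoted in the thread; adjust to the local BOINC data directory.
    project_dir = "BOINC/projects/setiathome.berkeley.edu"
    found = list_cl_files(project_dir)
    if found:
        print("OpenCL kernel files present:")
        for name in found:
            print("  " + name)
    else:
        print("No *.cl files found in", project_dir)
```

An empty result would be consistent with the "Cant read CL file" status, though the app may also fail to read a file that is present for other reasons such as permissions.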
_\|/_
U r s
ID: 1790101
Eric B
Joined: 9 Mar 00
Posts: 88
Credit: 168,875,085
RAC: 762
United States
Message 1790123 - Posted: 25 May 2016, 0:02:51 UTC

When I try that URL with my ID (5023), I get an empty page except for the SETI@home headers and footers; there is no host info.
ID: 1790123
Eric B
Joined: 9 Mar 00
Posts: 88
Credit: 168,875,085
RAC: 762
United States
Message 1790127 - Posted: 25 May 2016, 0:10:49 UTC

OK, I got it! Through trial and error, it seems only the cuda60 app will run.
Too bad; I was hoping to use OpenCL, as the GTX 460 supports it according to the BOINC log:
CUDA: NVIDIA GPU 0: GeForce GTX 460 (driver version unknown, CUDA version 8.0, compute capability 2.1, 964MB, 612MB available, 961 GFLOPS peak)
OpenCL: NVIDIA GPU 0: GeForce GTX 460 (driver version 367.18, device version OpenCL 1.1 CUDA, 964MB, 612MB available, 961 GFLOPS peak)
ID: 1790127



 