Trouble with GPU freezing machine
Author | Message |
---|---|
Eric B (Joined: 9 Mar 00, Posts: 88, Credit: 168,875,085, RAC: 762) |
I noticed that BOINC downloaded and started running GPU tasks on my NVIDIA GTX460. A quick check shows three crunching programs running: setiathome_8.00_x86_64-pc-linux-gnu, setiathome_8.05_i686-pc-linux-gnu, and setiathome_8.10_x86_64-pc-linux-gnu__opencl_nvidia_SoG --device 0. It goes OK for a while, then my PC freezes and I have to power cycle it. The NVIDIA driver is 352.63; the OS is openSUSE 13.1 (x86_64) on a 4.1.2 kernel, running on a quad-core i7 with 8 GB of DRAM. The PC is dedicated to SETI and some weather station software (low impact). What's the story on NVIDIA GPU computing under Linux with the latest SETI apps: is it finalized now, or are the bugs still being worked out? Should I update the NVIDIA driver? Any ideas as to how I can prevent this freeze-up? It's happened twice now, so I have GPU computing suspended. |
Urs Echternacht (Joined: 15 May 99, Posts: 692, Credit: 135,197,781, RAC: 211) |
"I noticed that BOINC downloaded and started running GPU tasks on my NVIDIA GTX460. A quick check shows three crunching programs running:" Hi, have you checked for dust hindering proper cooling? How is your PSU? Is it still healthy enough to feed a GTX 460? There are a few things you could try: - check whether the freezes leave any log entries behind to find out what goes wrong, otherwise you can't prevent them! - allow running the cuda60 app for Linux - update the NVIDIA driver from 352.63 to a newer 352.xx release - the openSUSE 13.1 mainline kernel is currently 3.12.57; try a downgrade (if your other software will allow that), because 4.1.2 is rather old. Did you get any error messages in the reported results for the freezes? (Your hosts are hidden, so I have to ask.) From the applications page: Linux/x86_64 8.10 (opencl_nvidia_sah) 18 May 2016, 1:10:51 UTC 940 GigaFLOPS _\|/_ U r s |
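Urs's first suggestion, checking whether the freezes leave log entries behind, can be partly scripted. The Python sketch below is only an illustration. It assumes the kernel log from the boot that froze has been saved to a text file (for example with `journalctl -k -b -1 > kernel.log` on a systemd machine that keeps a persistent journal), and it picks out lines containing `NVRM`, the prefix the NVIDIA kernel module uses for its messages, plus a few generic kernel trouble markers. The function names and the keyword list are hypothetical choices made for this example, not anything BOINC or NVIDIA provide.

```python
from pathlib import Path

# Substrings worth a look after a hard freeze. "NVRM" is the prefix used by the
# NVIDIA kernel module; the others are generic kernel trouble markers.
# This list is an assumption for illustration, not an exhaustive set.
KEYWORDS = ("NVRM", "Xid", "Oops", "soft lockup", "Call Trace")


def suspicious_lines(lines, keywords=KEYWORDS):
    """Return the lines that contain any of the keywords (case-sensitive)."""
    return [line.rstrip("\n") for line in lines if any(k in line for k in keywords)]


def scan_log(path):
    """Read a saved kernel log file and return its suspicious lines."""
    text = Path(path).read_text(errors="replace")
    return suspicious_lines(text.splitlines())
```

Calling `scan_log("kernel.log")` returns the matching lines in file order. An empty result does not prove the GPU is innocent, since a hard freeze can stop the machine before anything reaches the log.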
Eric B (Joined: 9 Mar 00, Posts: 88, Credit: 168,875,085, RAC: 762) |
What does "-allow running the cuda60 app for linux" mean? How do I do that? I'll try updating the driver today to the latest NVIDIA version, and if that doesn't fix it I'll downgrade to a 3.11 kernel and see what happens. |
Urs Echternacht (Joined: 15 May 99, Posts: 692, Credit: 135,197,781, RAC: 211) |
"What does it mean? '-allow running the cuda60 app for linux' How do i do that?" If you're using only "stock" apps, then you should get the cuda apps automatically through BOINC. That you haven't so far could have two reasons: - bad luck of the draw - you have somehow excluded the cuda apps through some config setting. "I'll try updating the driver today to latest nvidia version and if that doesnt fix it I'll downgrade to 3.11 kernel and see what happens" Or you could also try upgrading to 4.1.25. Just try some other kernel version to rule out the kernel as a factor in the freezes. _\|/_ U r s |
Eric B (Joined: 9 Mar 00, Posts: 88, Credit: 168,875,085, RAC: 762) |
OK, I just checked and I do see a cuda60 app there, setiathome_8.01_x86_64-pc-linux-gnu__cuda60, but it doesn't seem to be used at all. Here is what is actually running after updating my NVIDIA driver and rebooting: setiathome_8.05_i686-pc-linux-gnu setiathome_8.00_x86_64-pc-linux-gnu setiathome_8.00_x86_64-pc-linux-gnu setiathome_8.00_x86_64-pc-linux-gnu setiathome_8.00_x86_64-pc-linux-gnu setiathome_8.00_x86_64-pc-linux-gnu setiathome_8.00_x86_64-pc-linux-gnu setiathome_8.05_i686-pc-linux-gnu setiathome_8.10_x86_64-pc-linux-gnu__opencl_nvidia_SoG --device 0 Why is 8.05 running there? Is that normal? It's a quad core with HT, so yes, 8 threads plus a GPU thread, so the count is right, but what's 8.05 vs 8.00? Anyway, I'll let it run a while and see if it freezes up. |
Eric B (Joined: 9 Mar 00, Posts: 88, Credit: 168,875,085, RAC: 762) |
3.5 hrs and so far so good after upgrading my NVIDIA driver. I'll know better after 24 hrs or so. My other PC has the same video card and driver but doesn't get any GPU work at all. Any ideas as to why? Do I need to modify my app_info.xml? <app_info> <app> <name>setiathome_v8</name> </app> <file_info> <name>MBv8_8.04r3306_sse42_linux64</name> <executable/> </file_info> <app_version> <app_name>setiathome_v8</app_name> <version_num>804</version_num> <platform>x86_64-pc-linux-gnu</platform> <cmdline></cmdline> <file_ref> <file_name>MBv8_8.04r3306_sse42_linux64</file_name> <main_program/> </file_ref> </app_version> </app_info> |
Raistmer (Joined: 16 Jun 01, Posts: 6325, Credit: 106,370,077, RAC: 121) |
"3.5 hrs and so far so good after upgrading my nvidia driver - I'll know better after 24 hrs or so" That's an app_info for the CPU SSE4.2 app only. So yes, you need to add a section for the GPU app. |
Urs Echternacht (Joined: 15 May 99, Posts: 692, Credit: 135,197,781, RAC: 211) |
"3.5 hrs and so far so good after upgrading my nvidia driver - I'll know better after 24 hrs or so" Check the downloaded Lunatics package again. In "example_app_info_files" there are several examples of what an app_info.xml could look like. There is one section per <app></app> that you want to run. Replace the names from the example with your own app data, check that everything is spelled correctly, and try it. _\|/_ U r s |
Eric B (Joined: 9 Mar 00, Posts: 88, Credit: 168,875,085, RAC: 762) |
Does this look right? <app_info> <app> <name>setiathome_v8</name> </app> <file_info> <name>MBv8_8.04r3306_sse42_linux64</name> <executable/> </file_info> <app_version> <app_name>setiathome_v8</app_name> <version_num>804</version_num> <platform>x86_64-pc-linux-gnu</platform> <cmdline></cmdline> <file_ref> <file_name>MBv8_8.04r3306_sse42_linux64</file_name> <main_program/> </file_ref> </app_version> <app_version> <app_name>setiathome_v8</app_name> <version_num>810</version_num> <platform>x86_64-pc-linux-gnu</platform> <coproc> <type>NVIDIA</type> <count>1</count> </coproc> <plan_class>opencl_nvidia_SoG</plan_class> <avg_ncpus>0.05</avg_ncpus> <max_ncpus>0.2</max_ncpus> <cmdline></cmdline> <file_ref> <file_name>setiathome_8.10_x86_64-pc-linux-gnu__opencl_nvidia_SoG</file_name> <main_program/> </file_ref> </app_version> <app_version> <app_name>setiathome_v8</app_name> <version_num>801</version_num> <platform>x86_64-pc-linux-gnu</platform> <coproc> <type>NVIDIA</type> <count>1</count> </coproc> <plan_class>cuda60</plan_class> <avg_ncpus>0.05</avg_ncpus> <max_ncpus>0.2</max_ncpus> <cmdline></cmdline> <file_ref> <file_name>setiathome_8.01_x86_64-pc-linux-gnu__cuda60</file_name> <main_program/> </file_ref> </app_version> <app_version> <app_name>setiathome_v8</app_name> <version_num>810</version_num> <platform>x86_64-pc-linux-gnu</platform> <coproc> <type>NVIDIA</type> <count>1</count> </coproc> <plan_class>opencl_nvidia_sah</plan_class> <avg_ncpus>0.05</avg_ncpus> <max_ncpus>0.2</max_ncpus> <cmdline></cmdline> <file_ref> <file_name>setiathome_8.10_x86_64-pc-linux-gnu__opencl_nvidia_sah</file_name> <main_program/> </file_ref> </app_version> </app_info> |
Eric B (Joined: 9 Mar 00, Posts: 88, Credit: 168,875,085, RAC: 762) |
I guess it isn't right: Tue 24 May 2016 07:55:11 AM PDT | SETI@home | Found app_info.xml; using anonymous platform Tue 24 May 2016 07:55:11 AM PDT | SETI@home | [error] State file error: missing application file setiathome_8.10_x86_64-pc-linux-gnu__opencl_nvidia_SoG Tue 24 May 2016 07:55:11 AM PDT | SETI@home | [error] State file error: missing application file setiathome_8.01_x86_64-pc-linux-gnu__cuda60 Tue 24 May 2016 07:55:11 AM PDT | SETI@home | [error] State file error: missing application file setiathome_8.10_x86_64-pc-linux-gnu__opencl_nvidia_sah The apps are there in BOINC/projects/setiathome.berkeley.edu. How do I fix this? |
Eric B (Joined: 9 Mar 00, Posts: 88, Credit: 168,875,085, RAC: 762) |
I tweaked it and came up with what seems like a workable app_info.xml. Now I get GPU WUs, but they don't run because of this error: "Waiting to run (0.05 CPUs + 1 NVIDIA GPU)(Scheduler Wait: Cant read CL file)" Here's my app_info; if anyone can comment on it, it would be appreciated: <app_info> <app> <name>setiathome_v8</name> </app> <file_info> <name>MBv8_8.04r3306_sse42_linux64</name> <executable/> </file_info> <app_version> <app_name>setiathome_v8</app_name> <version_num>804</version_num> <platform>x86_64-pc-linux-gnu</platform> <file_ref> <file_name>MBv8_8.04r3306_sse42_linux64</file_name> <main_program/> </file_ref> </app_version> <app> <name>setiathome_v8</name> </app> <file_info> <name>setiathome_8.10_x86_64-pc-linux-gnu__opencl_nvidia_sah</name> <executable/> </file_info> <app_version> <app_name>setiathome_v8</app_name> <version_num>810</version_num> <platform>x86_64-pc-linux-gnu</platform> <coproc> <type>NVIDIA</type> <count>1</count> </coproc> <plan_class>opencl_nvidia_sah</plan_class> <avg_ncpus>0.05</avg_ncpus> <max_ncpus>0.2</max_ncpus> <cmdline></cmdline> <file_ref> <file_name>setiathome_8.10_x86_64-pc-linux-gnu__opencl_nvidia_sah</file_name> <main_program/> </file_ref> </app_version> <app> <name>setiathome_v8</name> </app> <file_info> <name>setiathome_8.10_x86_64-pc-linux-gnu__opencl_nvidia_SoG</name> <executable/> </file_info> <app_version> <app_name>setiathome_v8</app_name> <version_num>810</version_num> <platform>x86_64-pc-linux-gnu</platform> <coproc> <type>NVIDIA</type> <count>1</count> </coproc> <plan_class>opencl_nvidia_SoG</plan_class> <avg_ncpus>0.05</avg_ncpus> <max_ncpus>0.2</max_ncpus> <cmdline></cmdline> <file_ref> <file_name>setiathome_8.10_x86_64-pc-linux-gnu__opencl_nvidia_SoG</file_name> <main_program/> </file_ref> </app_version> <app> <name>setiathome_v8</name> </app> <file_info> <name>setiathome_8.01_x86_64-pc-linux-gnu__cuda60</name> <executable/> </file_info> <app_version> <app_name>setiathome_v8</app_name> <version_num>801</version_num> <platform>x86_64-pc-linux-gnu</platform> <coproc> <type>NVIDIA</type> <count>1</count> </coproc> <plan_class>cuda60</plan_class> <avg_ncpus>0.05</avg_ncpus> <max_ncpus>0.2</max_ncpus> <cmdline></cmdline> <file_ref> <file_name>setiathome_8.01_x86_64-pc-linux-gnu__cuda60</file_name> <main_program/> </file_ref> </app_version> </app_info> |
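Comparing the app_info.xml that triggered the "missing application file" errors with the later version just above, one visible difference is that the later file declares a <file_info> entry for every GPU binary it references. A small Python check for that one condition might look like the sketch below. It uses only the standard library; the function name `missing_file_infos` is hypothetical, and the check covers just this single rule, not everything the BOINC client validates.

```python
import xml.etree.ElementTree as ET


def missing_file_infos(app_info_xml):
    """Return file names used in <file_ref> that have no matching <file_info>.

    app_info_xml is the text of an app_info.xml file.
    """
    root = ET.fromstring(app_info_xml)
    # Names declared through <file_info><name>...</name></file_info>.
    declared = {
        (info.findtext("name") or "").strip()
        for info in root.iter("file_info")
    }
    # Names referenced through <file_ref><file_name>...</file_name></file_ref>.
    referenced = [
        (ref.findtext("file_name") or "").strip()
        for ref in root.iter("file_ref")
    ]
    missing = []
    for name in referenced:
        if name not in declared and name not in missing:
            missing.append(name)
    return missing
```

Reading the file with `open("app_info.xml").read()` and passing the text to `missing_file_infos` gives an empty list when every referenced binary is declared. It says nothing about whether the binaries, or the *.cl files the OpenCL apps load at run time, are actually present on disk.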
Urs Echternacht (Joined: 15 May 99, Posts: 692, Credit: 135,197,781, RAC: 211) |
Eric B, decide which of the two NVIDIA OpenCL apps works better on your host and run only that one. Check https://setiathome.berkeley.edu/host_app_versions.php?hostid=[your-host-id] (just replace [your-host-id] with the host ID you want to look at) to see which of "opencl_nvidia_sah" or "opencl_nvidia_SoG" works better. "I twigged it around and came up with what seems like a workable app_info.xml, now i get GPU WU's but they dont run due to this error:" Is the *.cl file still present? BOINC sometimes has a habit of "cleaning up" too many files when switching from stock to anonymous platform. Set max_ncpus a little higher, e.g. <max_ncpus>0.95</max_ncpus>, and keep one CPU core free. _\|/_ U r s |
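Urs's question about the *.cl file can be answered with a quick directory listing. The Python sketch below assumes the project directory is the BOINC/projects/setiathome.berkeley.edu path Eric mentioned, relative to wherever the BOINC data directory lives on the host. The function name is hypothetical, and the thread does not say which *.cl file names the OpenCL apps expect, so the sketch only reports what is there.

```python
from pathlib import Path


def find_cl_files(project_dir):
    """Return the sorted names of files ending in .cl directly inside project_dir.

    Returns an empty list if the directory does not exist.
    """
    directory = Path(project_dir)
    if not directory.is_dir():
        return []
    return sorted(p.name for p in directory.glob("*.cl") if p.is_file())
```

For example, `find_cl_files("BOINC/projects/setiathome.berkeley.edu")` returning an empty list would be consistent with the "Cant read CL file" message, though it would not by itself show which file the app was looking for.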
Eric B (Joined: 9 Mar 00, Posts: 88, Credit: 168,875,085, RAC: 762) |
When I try that URL with my ID (5023), I get an empty page except for the SETI headers and footers; there is no host info. |
Eric B (Joined: 9 Mar 00, Posts: 88, Credit: 168,875,085, RAC: 762) |
OK, I got it! Through trial and error, it seems only the cuda60 app will run. Too bad; I was hoping to use OpenCL, as the GTX460 supports it according to the BOINC log: CUDA: NVIDIA GPU 0: GeForce GTX 460 (driver version unknown, CUDA version 8.0, compute capability 2.1, 964MB, 612MB available, 961 GFLOPS peak) OpenCL: NVIDIA GPU 0: GeForce GTX 460 (driver version 367.18, device version OpenCL 1.1 CUDA, 964MB, 612MB available, 961 GFLOPS peak) |
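The thread does not show the app_info.xml Eric ended up with. As a rough illustration only, a file limited to the CPU app and the cuda60 app could be generated with the Python sketch below. The element names and values (version numbers, plan class, avg_ncpus, max_ncpus) are copied from the files posted earlier in the thread; the helper names and the choice to drop the empty <cmdline> element are assumptions made for this example, and nothing here is confirmed to match what the BOINC client requires beyond what those posts show.

```python
import xml.etree.ElementTree as ET

APP_NAME = "setiathome_v8"
PLATFORM = "x86_64-pc-linux-gnu"


def child(parent, tag, text=None):
    """Append <tag> to parent, optionally with text, and return the new element."""
    element = ET.SubElement(parent, tag)
    element.text = text
    return element


def add_app_version(root, version_num, file_name, plan_class=None):
    """Append one <app_version>; passing a plan_class marks it as an NVIDIA GPU version."""
    version = child(root, "app_version")
    child(version, "app_name", APP_NAME)
    child(version, "version_num", version_num)
    child(version, "platform", PLATFORM)
    if plan_class is not None:
        coproc = child(version, "coproc")
        child(coproc, "type", "NVIDIA")
        child(coproc, "count", "1")
        child(version, "plan_class", plan_class)
        child(version, "avg_ncpus", "0.05")
        child(version, "max_ncpus", "0.2")
    file_ref = child(version, "file_ref")
    child(file_ref, "file_name", file_name)
    child(file_ref, "main_program")


def build_app_info(cpu_file, gpu_file):
    """Return app_info.xml text with one CPU version and one cuda60 version."""
    root = ET.Element("app_info")
    child(child(root, "app"), "name", APP_NAME)
    for name in (cpu_file, gpu_file):  # declare every binary that is referenced
        info = child(root, "file_info")
        child(info, "name", name)
        child(info, "executable")
    add_app_version(root, "804", cpu_file)
    add_app_version(root, "801", gpu_file, plan_class="cuda60")
    ET.indent(root)  # available from Python 3.9
    text = ET.tostring(root, encoding="unicode")
    # ElementTree writes empty tags as "<executable />"; drop the space so the
    # output uses the same "<executable/>" form as the files in the thread.
    return text.replace(" />", "/>")
```

Calling `build_app_info("MBv8_8.04r3306_sse42_linux64", "setiathome_8.01_x86_64-pc-linux-gnu__cuda60")` yields the text to save as app_info.xml. Generating the file rather than hand-editing it keeps each <file_ref> paired with a <file_info>, which is where the earlier hand-written version went wrong.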
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.