GPU AP's error out on one host
Author | Message |
---|---|
Fawkesguy Send message Joined: 8 Jan 01 Posts: 108 Credit: 188,578,766 RAC: 0 |
On one host, every AP task that reaches my GPUs errors out with exit status "193 (0xc1) EXIT_SIGNAL" and: INFO: can't open binary kernel file: /home/jpsoifer/BOINC/projects/setiathome.berkeley.edu/AstroPulse_Kernels_r2751.cl_GeForceGTX750Ti.bin_V7_TWIN_FFA_35511, continue with recompile... terminate called after throwing an instance of 'std::logic_error' what(): basic_string::_S_construct null not valid SIGABRT: abort called Any idea what's causing this and how I might fix it? This is the host: http://setiathome.berkeley.edu/show_host_detail.php?hostid=7772630 |
TBar Send message Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
Try removing the 2 TUNE cmdline settings and see if that helps: TUNE: kernel 1 now has workgroup size of (64,8,1) TUNE: kernel 2 now has workgroup size of (128,8,1) Have you had any APs work on a 750 Ti with those settings? Also, Linux usually likes lower FFA numbers. I found my cards like: FFA thread block override value:3072 FFA thread fetchblock override value:1536 Now that I think about it, I remember having a problem with the FFA thread block override value set to 6144 or above. The post is somewhere at Beta. So, you might try using: FFA thread block override value:3072 FFA thread fetchblock override value:1536 on the 750s. |
Fawkesguy Send message Joined: 8 Jan 01 Posts: 108 Credit: 188,578,766 RAC: 0 |
Hi TBar, Thanks, I'll make those changes and see what happens. This is what I've got now: <cmdline>-unroll 12 -ffa_block 3072 -ffa_block_fetch 1536 -oclFFT_plan 256 16 256 -hp</cmdline> |
Brent Norman Send message Joined: 1 Dec 99 Posts: 2786 Credit: 685,657,289 RAC: 835 |
You are overclocking your card; 750 Tis don't like to be pushed that hard. You have 1254 MHz, I have 1150 MHz on mine. |
petri33 Send message Joined: 6 Jun 02 Posts: 1668 Credit: 623,086,772 RAC: 156 |
Hi, You should check that the file BOINC/projects/setiathome.berkeley.edu/AstroPulse_Kernels_r2751.cl exists, has read permissions, and is not empty. You can set the permissions in xterm with chmod ugo+r BOINC/projects/setiathome.berkeley.edu/AstroPulse_Kernels_r2751.cl which gives read access to all. To overcome Heisenbergs: "You can't always get what you want / but if you try sometimes you just might find / you get what you need." -- Rolling Stones |
Fawkesguy Send message Joined: 8 Jan 01 Posts: 108 Credit: 188,578,766 RAC: 0 |
"You are over clocking your card, 750Ti's don't like to be pushed that hard." I don't overclock any of my cards. Those are EVGA 750 Ti SCs, which have the following factory clock speeds: 1176 MHz Base Clock, 1255 MHz Boost Clock. |
Fawkesguy Send message Joined: 8 Jan 01 Posts: 108 Credit: 188,578,766 RAC: 0 |
Hi petri, No, that file does not exist. Shouldn't it have been created automatically? On this host [http://setiathome.berkeley.edu/show_host_detail.php?hostid=7772567] I see AstroPulse_Kernels_r2751.cl_GeForceGTX970.bin_V7_TWIN_FFA_35511. Is it something that is created during the driver install? Should I try reinstalling the Nvidia drivers? |
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14650 Credit: 200,643,578 RAC: 874 |
No, it's something that is written and supplied by the application developer. You need it: it should have been supplied with the application. |
Fawkesguy Send message Joined: 8 Jan 01 Posts: 108 Credit: 188,578,766 RAC: 0 |
I'm running the stock Linux AP OpenCL app, astropulse_7.08_x86_64-pc-linux-gnu__opencl_nvidia_100. Shouldn't AstroPulse_Kernels_r2751.cl have been downloaded or created when I first received GPU AP work? |
TBar Send message Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
If you were running Stock the file would have been downloaded, however, both your 750Ti Hosts are running Anonymous platform. The other Host is now giving the same Error, http://setiathome.berkeley.edu/results.php?hostid=7772598&state=6&appid=. If you are missing AstroPulse_Kernels_r2751.cl on those Hosts you're going to have to supply them manually. |
Fawkesguy Send message Joined: 8 Jan 01 Posts: 108 Credit: 188,578,766 RAC: 0 |
OK, where can I get the file? Do you have a link? |
TBar Send message Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
You should be able to use the .cl file from one of your other machines; they are basically the same even across GPU vendors. The link from another post should still work though: http://boinc2.ssl.berkeley.edu/beta/download/AstroPulse_Kernels_r2751.cl Hmmm, change the ati to nvidia and this one works too: http://boinc2.ssl.berkeley.edu/beta/download/astropulse_7.08_x86_64-pc-linux-gnu__opencl_nvidia_100 It appears you have quite a few Ghosts on those machines. You can recover them, 20 at a time, if you have cache space and want to jump through the hoops. You basically have to report a task twice to trigger a resend event, then it will send 20 tasks per event. It might take a while with all those Ghosts ;-) |
Fawkesguy Send message Joined: 8 Jan 01 Posts: 108 Credit: 188,578,766 RAC: 0 |
Thank you so much! Let's see how things go with the kernel file, then maybe I'll try tackling the ghosts. :-) |
Fawkesguy Send message Joined: 8 Jan 01 Posts: 108 Credit: 188,578,766 RAC: 0 |
Now I'm getting errors on a different host. All seemed fine through yesterday: http://setiathome.berkeley.edu/result.php?resultid=4430016187 This started happening today: http://setiathome.berkeley.edu/result.php?resultid=4430588341 I have no clue what's going on. |
castor Send message Joined: 2 Jan 02 Posts: 13 Credit: 17,721,708 RAC: 0 |
Was there by any chance a Linux kernel update recently, messing up the nvidia driver install? |
petri33 Send message Joined: 6 Jun 02 Posts: 1668 Credit: 623,086,772 RAC: 156 |
Hi, Your host 7772567 used to have 4 GPUs, now it has 3. Have you tried a reboot? Maybe one of the GPUs has overheated or suffered some kind of crash. |
Fawkesguy Send message Joined: 8 Jan 01 Posts: 108 Credit: 188,578,766 RAC: 0 |
Hi, I'm seeing 4. http://setiathome.berkeley.edu/show_host_detail.php?hostid=7772567 So strange. |
Fawkesguy Send message Joined: 8 Jan 01 Posts: 108 Credit: 188,578,766 RAC: 0 |
"Was there by any chance a Linux kernel update recently, messing up the nvidia driver install?" No, I haven't done any kernel updates. |
petri33 Send message Joined: 6 Jun 02 Posts: 1668 Credit: 623,086,772 RAC: 156 |
Hi, <core_client_version>7.2.42</core_client_version> <![CDATA[ <message> process exited with code 193 (0xc1, -63) </message> <stderr_txt> Not using ap_cmdline.txt-file, using commandline options. DATA_CHUNK_UNROLL set to:18 oclFFT plan class overrides requested: global radix 256; local radix 16; max workgroup size 256 FFA thread block override value:16384 FFA thread fetchblock override value:8192 TUNE: kernel 1 now has workgroup size of (64,8,1) TUNE: kernel 2 now has workgroup size of (64,8,1) Running on device number: 3 GPU not found: type=NVIDIA, opencl_device_index=3, device_num=-1 WARNING: boinc_get_opencl_ids failed with code -1 OpenCL platform detected: NVIDIA Corporation WARNING: BOINC supplied wrong platform! Number of OpenCL devices found : 3 BOINC assigns slot on device #3. Yes, it should have four. The application finds only three. See the second-to-last line: Number of OpenCL devices found : 3 Strange. |
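
For anyone landing here with the same symptoms, the two checks suggested in this thread can be sketched in Python. This is only a rough illustration. The function names `kernel_source_problems` and `device_index_mismatch` are made up for this example and are not part of BOINC or the AstroPulse app. The second check assumes the device number in the log is zero-based, so with 3 devices found only 0, 1 and 2 would be valid; that reading fits petri33's observation but is not confirmed anywhere in the thread.

```python
import os
import re


def kernel_source_problems(path):
    """Return a list of reasons the kernel source file at `path` is unusable.

    Mirrors the advice above: the .cl file must exist, be readable,
    and not be empty. An empty list means none of these problems was found.
    """
    if not os.path.isfile(path):
        return ["missing"]
    problems = []
    if not os.access(path, os.R_OK):
        problems.append("not readable")
    if os.path.getsize(path) == 0:
        problems.append("empty")
    return problems


def device_index_mismatch(stderr_text):
    """Return (assigned_index, devices_found) if the index is out of range.

    Assumption (not confirmed in the thread): device numbers are zero-based,
    so an index equal to or above the device count cannot be satisfied.
    Returns None when the index is in range or either log line is absent.
    """
    assigned = re.search(r"Running on device number:\s*(\d+)", stderr_text)
    found = re.search(r"Number of OpenCL devices found\s*:\s*(\d+)", stderr_text)
    if not assigned or not found:
        return None
    index, count = int(assigned.group(1)), int(found.group(1))
    return (index, count) if index >= count else None
```

If the first check reports "missing", the file has to be copied in by hand on an Anonymous platform host, as TBar explains. If it reports "not readable", the chmod ugo+r command from petri33's post is the fix. The second check only flags the mismatch; it does not say why the driver exposes fewer devices than BOINC expects.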