OpenCL kernel/call clGetEventProfilingInfo call failed
Urs Echternacht · Joined: 15 May 99 · Posts: 692 · Credit: 135,197,781 · RAC: 211
Hi,

So far it has only been reported on Ubuntu, so it should not be a general Linux problem.
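For context, the failing call in the thread title belongs to OpenCL's event-profiling API. Below is a minimal sketch of the usual pattern, with hypothetical names and setup (this is not the SETI app's actual code): the command queue must be created with CL_QUEUE_PROFILING_ENABLE and the event must have completed, otherwise clGetEventProfilingInfo returns CL_PROFILING_INFO_NOT_AVAILABLE, which are the common non-driver causes of this failure.

```c
/* Minimal sketch of OpenCL event profiling; hypothetical, not app code. */
#include <stdio.h>
#include <CL/cl.h>

/* Assumes context, device, kernel and global_size were set up earlier. */
void time_kernel(cl_context ctx, cl_device_id dev, cl_kernel kernel,
                 size_t global_size)
{
    cl_int err;
    /* Profiling must be enabled when the queue is created. */
    cl_command_queue queue =
        clCreateCommandQueue(ctx, dev, CL_QUEUE_PROFILING_ENABLE, &err);

    cl_event ev;
    clEnqueueNDRangeKernel(queue, kernel, 1, NULL, &global_size, NULL,
                           0, NULL, &ev);
    clWaitForEvents(1, &ev);  /* event must be complete before querying */

    cl_ulong start = 0, end = 0;
    err  = clGetEventProfilingInfo(ev, CL_PROFILING_COMMAND_START,
                                   sizeof start, &start, NULL);
    err |= clGetEventProfilingInfo(ev, CL_PROFILING_COMMAND_END,
                                   sizeof end, &end, NULL);
    if (err != CL_SUCCESS)  /* error codes are negative, so |= catches either */
        fprintf(stderr, "clGetEventProfilingInfo call failed: %d\n", err);
    else
        printf("kernel time: %.3f ms\n", (end - start) * 1e-6);

    clReleaseEvent(ev);
    clReleaseCommandQueue(queue);
}
```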
petri33 · Joined: 6 Jun 02 · Posts: 1668 · Credit: 623,086,772 · RAC: 156
Hi,

Yeah, thanks. I'm running Ubuntu too, though not the latest:

petri@Linux1:~$ cat /etc/*-release
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=15.10
DISTRIB_CODENAME=wily
DISTRIB_DESCRIPTION="Ubuntu 15.10"
NAME="Ubuntu"
VERSION="15.10 (Wily Werewolf)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 15.10"
VERSION_ID="15.10"
HOME_URL="http://www.ubuntu.com/"
SUPPORT_URL="http://help.ubuntu.com/"
BUG_REPORT_URL="http://bugs.launchpad.net/ubuntu/"

And yes, I know: my software is totally different from the OpenCL one, and I may have some dust bunnies hiding in the cooling systems of my GPUs. I just happen to get an error or two every day that grinds my machine to near-zero productivity and results in errors saying "Cuda error 'Couldn't get cuda device count'". When I get home from work, the "nvidia-smi -l" window says ERR on one or two GPUs.

I can revert back a version or two with the drivers just to test. I'll try that tomorrow.
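The "Couldn't get cuda device count" message corresponds to the very first CUDA runtime call an application makes failing outright. A minimal sketch of that check (illustrative only, assuming the CUDA toolkit is installed; compile with nvcc):

```c
/* Sketch of the device-count check behind "Couldn't get cuda device count". */
#include <stdio.h>
#include <cuda_runtime.h>

int main(void)
{
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        /* The failure mode described above: the runtime cannot even
         * enumerate devices, so nothing else can work either. */
        fprintf(stderr, "Couldn't get cuda device count: %s\n",
                cudaGetErrorString(err));
        return 1;
    }
    printf("CUDA devices visible: %d\n", count);
    return 0;
}
```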
Raistmer · Joined: 16 Jun 01 · Posts: 6325 · Credit: 106,370,077 · RAC: 121
> My software is totally different from the OpenCL one, and I may have some dust bunnies hiding in the cooling systems of my GPUs. I just happen to get an error or two every day that grinds my machine to near-zero productivity and results in errors saying "Cuda error 'Couldn't get cuda device count'". When I get home from work, the "nvidia-smi -l" window says ERR on one or two GPUs.

It's quite an easy way to check whether the OpenCL and CUDA device disappearances have a common root. When the errors start again on the host with the OpenCL app, it's worth running the same command (nvidia-smi -l) and seeing what GPU state it reports. I suppose that if it reports an error no matter which runtime is in use, CUDA or OpenCL, the issue is at a deeper level than just the runtime API.
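The OpenCL half of the cross-check would be device enumeration via clGetDeviceIDs: if both runtimes fail to see the GPUs at the same time, the fault sits below either API, in the driver or the hardware. A hedged sketch, assuming an OpenCL ICD and headers are installed (link with -lOpenCL):

```c
/* Sketch: enumerate OpenCL GPU devices, mirroring cudaGetDeviceCount. */
#include <stdio.h>
#include <CL/cl.h>

int main(void)
{
    cl_platform_id platform;
    cl_uint nplatforms = 0, ndevices = 0;

    if (clGetPlatformIDs(1, &platform, &nplatforms) != CL_SUCCESS
        || nplatforms == 0) {
        fprintf(stderr, "No OpenCL platform visible\n");
        return 1;
    }
    /* Query only the device count; no device handles needed. */
    cl_int err = clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU,
                                0, NULL, &ndevices);
    if (err != CL_SUCCESS) {
        fprintf(stderr, "clGetDeviceIDs failed: %d\n", err);
        return 1;
    }
    printf("OpenCL GPU devices visible: %u\n", ndevices);
    return 0;
}
```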
jnamath · Joined: 5 Jan 16 · Posts: 4 · Credit: 4,659,687 · RAC: 56
I'm seeing the sporadic profiling error on a second machine, SuSE 42.1 this time. It doesn't hurt much: the task is aborted after 10 s, so little harm done. The two machines have different driver versions, but both GPUs are 750 Tis.

Cheers,
Holger
petri33 · Joined: 6 Jun 02 · Posts: 1668 · Credit: 623,086,772 · RAC: 156
> My software is totally different from the OpenCL one, and I may have some dust bunnies hiding in the cooling systems of my GPUs. I just happen to get an error or two every day that grinds my machine to near-zero productivity and results in errors saying "Cuda error 'Couldn't get cuda device count'". When I get home from work, the "nvidia-smi -l" window says ERR on one or two GPUs.

Affirmative. A terminal window running solely nvidia-smi -l reports ERR on one or two GPUs when this happens. That is on an Ubuntu NVIDIA machine.

EDIT: The machine grinds to a halt. You have to have the window open before launching BOINC; just let it run for a few days.
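nvidia-smi is a front end to NVIDIA's NVML library, so rather than keeping an nvidia-smi -l terminal open before launching BOINC, the same health poll can be done (and logged) from a small program. A sketch under the assumption that nvml.h and libnvidia-ml from the driver package are available (link with -lnvidia-ml):

```c
/* Sketch: poll GPU health via NVML, roughly what nvidia-smi -l does. */
#include <stdio.h>
#include <unistd.h>
#include <nvml.h>

int main(void)
{
    if (nvmlInit() != NVML_SUCCESS) {
        fprintf(stderr, "NVML init failed -- driver already unhealthy?\n");
        return 1;
    }
    unsigned int count = 0;
    nvmlDeviceGetCount(&count);

    for (;;) {  /* loop like nvidia-smi -l; stop with Ctrl-C */
        for (unsigned int i = 0; i < count; ++i) {
            nvmlDevice_t dev;
            unsigned int temp = 0;
            nvmlReturn_t r = nvmlDeviceGetHandleByIndex(i, &dev);
            if (r == NVML_SUCCESS)
                r = nvmlDeviceGetTemperature(dev, NVML_TEMPERATURE_GPU, &temp);
            if (r != NVML_SUCCESS)
                printf("GPU %u: ERR (%s)\n", i, nvmlErrorString(r));
            else
                printf("GPU %u: %u C\n", i, temp);
        }
        sleep(5);
    }
    /* not reached; nvmlShutdown() would go here in a bounded loop */
}
```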
David Anderson (not *that* DA) · Joined: 5 Dec 09 · Posts: 215 · Credit: 74,008,558 · RAC: 74
Problem back again. It seems (?) to arise whenever an update from Ubuntu wants a reboot. The following is post-reboot (I rebooted yesterday at 10 PM PST, after removing the seti bin and wisdom files again when I noticed the errors). I only noticed Raistmer's mention of 'nvidia-smi -l' just now.

PST: Wed Feb 8 08:57:33 2017

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 367.57                 Driver Version: 367.57                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 750     Off  | 0000:01:00.0      On |                  N/A |
| N/A   58C    P8     1W /  38W |    148MiB /  1998MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0      1099    G   /usr/lib/xorg/Xorg                             146MiB |
+-----------------------------------------------------------------------------+
Raistmer · Joined: 16 Jun 01 · Posts: 6325 · Credit: 106,370,077 · RAC: 121
That's the list from a correctly operating state, right? It would be interesting to get the same list when the app failures start again (before rebooting).
Urs Echternacht · Joined: 15 May 99 · Posts: 692 · Credit: 135,197,781 · RAC: 211
> Problem back again. It seems (?) to arise whenever an update from Ubuntu wants a reboot.

You could also try the newer version; it's now downloadable at Lunatics. Snippets for app_info.xml usage are also included, but if you have problems getting the app_info.xml working, ask.

MBv8_r3602_Beta
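For anyone who has not set up an anonymous-platform app before: app_info.xml tells the BOINC client which local binary handles which application. A skeletal illustration of its usual shape (the file name, version number, and plan class below are placeholders; use the snippets included in the Lunatics download rather than this sketch):

```xml
<app_info>
  <app>
    <name>setiathome_v8</name>
  </app>
  <file_info>
    <!-- placeholder binary name -->
    <name>MBv8_r3602_Beta_x86_64-pc-linux-gnu</name>
    <executable/>
  </file_info>
  <app_version>
    <app_name>setiathome_v8</app_name>
    <version_num>800</version_num>
    <plan_class>opencl_nvidia_sah</plan_class> <!-- placeholder plan class -->
    <avg_ncpus>0.1</avg_ncpus>
    <coproc>
      <type>NVIDIA</type>
      <count>1</count>
    </coproc>
    <file_ref>
      <file_name>MBv8_r3602_Beta_x86_64-pc-linux-gnu</file_name>
      <main_program/>
    </file_ref>
  </app_version>
</app_info>
```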
David Anderson (not *that* DA) · Joined: 5 Dec 09 · Posts: 215 · Credit: 74,008,558 · RAC: 74
No reboot since the last report (which was 8 Feb 2017).

Thu Feb 9 19:29:52 2017

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 367.57                 Driver Version: 367.57                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 750     Off  | 0000:01:00.0      On |                  N/A |
| N/A   79C    P0    11W /  38W |    937MiB /  1998MiB |     46%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0      1099    G   /usr/lib/xorg/Xorg                             170MiB |
|    0     13063    C   ...17_x86_64-pc-linux-gnu__FGRPopencl-nvidia   764MiB |
+-----------------------------------------------------------------------------+
David Anderson (not *that* DA) · Joined: 5 Dec 09 · Posts: 215 · Credit: 74,008,558 · RAC: 74
If I read the reports correctly, https://setiathome.berkeley.edu/result.php?resultid=5497421752 shows an error after the last reboot, i.e. before the nvidia-smi -l output I posted a moment ago.
Raistmer · Joined: 16 Jun 01 · Posts: 6325 · Credit: 106,370,077 · RAC: 121
Then it looks like a different issue.