OpenCL kernel/call clGetEventProfilingInfo call failed

Urs Echternacht
Volunteer tester
Joined: 15 May 99
Posts: 692
Credit: 135,197,781
RAC: 211
Germany
Message 1846001 - Posted: 2 Feb 2017, 22:14:10 UTC - in response to Message 1845996.  

> Hi,
>
> I'm running linux and NVIDIA cuda, but... I've experienced random errors when using drivers 376-8.yyy. (Cannot get GPU count or similar).
>
> The OpenCL 'may' use the same libraries as CUDA during compilation and when executing code and allocating resources.
>
> --
> p.

So far it has only been reported on Ubuntu, so it should not be a general Linux problem.
_\|/_
U r s
ID: 1846001

petri33
Volunteer tester
Joined: 6 Jun 02
Posts: 1668
Credit: 623,086,772
RAC: 156
Finland
Message 1846011 - Posted: 2 Feb 2017, 22:59:03 UTC - in response to Message 1846001.  
Last modified: 2 Feb 2017, 23:07:20 UTC

> Hi,
>
> I'm running linux and NVIDIA cuda, but... I've experienced random errors when using drivers 376-8.yyy. (Cannot get GPU count or similar).
>
> The OpenCL 'may' use the same libraries as CUDA during compilation and when executing code and allocating resources.
>
> --
> p.
>
> So far it has only been reported on Ubuntu, so it should not be a general Linux problem.

Yeah, thanks. I'm running Ubuntu too. Not the latest though.

petri@Linux1:~$ cat /etc/*-release
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=15.10
DISTRIB_CODENAME=wily
DISTRIB_DESCRIPTION="Ubuntu 15.10"
NAME="Ubuntu"
VERSION="15.10 (Wily Werewolf)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 15.10"
VERSION_ID="15.10"
HOME_URL="http://www.ubuntu.com/"
SUPPORT_URL="http://help.ubuntu.com/"
BUG_REPORT_URL="http://bugs.launchpad.net/ubuntu/"


And yes, I know. My SW is totally different from the OpenCL one, and I may have some dust bunnies hiding in the cooling system of my GPUs. I just happen to get an error or two every day, grinding my machine to a halt with near-zero productivity and resulting in errors saying "Cuda error: 'Couldn't get cuda device count'". When I get home from work, the "nvidia-smi -l" window says ERR on one or two GPUs.

I can revert the drivers back a version or two just to test. I'll try that tomorrow.
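For reference, here is a minimal sketch (my own illustration, not the actual special-app code) of the kind of startup check that produces that message; when the driver gets into the state where nvidia-smi shows ERR, this call starts failing:

/* Minimal sketch, not the actual app code: the kind of check that yields
 * "Couldn't get cuda device count".
 * Build (paths may differ): nvcc devcount.cu -o devcount
 */
#include <stdio.h>
#include <cuda_runtime.h>

int main(void)
{
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        /* This is the failure mode seen while nvidia-smi reports ERR. */
        fprintf(stderr, "Couldn't get cuda device count: %s\n",
                cudaGetErrorString(err));
        return 1;
    }
    printf("CUDA devices visible: %d\n", count);
    return 0;
}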
To overcome Heisenbergs:
"You can't always get what you want / but if you try sometimes you just might find / you get what you need." -- Rolling Stones
ID: 1846011

Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1846131 - Posted: 3 Feb 2017, 10:56:06 UTC - in response to Message 1846011.  

> My SW is totally different from the OpenCL one, and I may have some dust bunnies hiding in the cooling system of my GPUs. I just happen to get an error or two every day, grinding my machine to a halt with near-zero productivity and resulting in errors saying "Cuda error: 'Couldn't get cuda device count'". When I get home from work, the "nvidia-smi -l" window says ERR on one or two GPUs.

There is an easy way to check whether the OpenCL and CUDA device disappearances have common roots.

When the errors start again on the host running the OpenCL app, it is worth running the same command (nvidia-smi -l) and seeing what GPU state it reports. If it reports an error no matter which runtime is used, CUDA or OpenCL, then the issue lies at a deeper level than the runtime API.
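The OpenCL half of that cross-check might look like the sketch below (my own illustration, not project code; the file name and build line are assumptions). Run it alongside the CUDA device-count check shown earlier and nvidia-smi -l; if both runtimes lose the devices at the same moment, the fault sits below the runtime APIs, in the driver or the hardware.

/* Sketch only: enumerate OpenCL GPU devices, the counterpart of the CUDA
 * device-count check above.
 * Build (libs/paths may differ): gcc cl_devcount.c -o cl_devcount -lOpenCL
 */
#include <stdio.h>
#include <CL/cl.h>

int main(void)
{
    cl_uint num_platforms = 0;
    cl_int err = clGetPlatformIDs(0, NULL, &num_platforms);
    if (err != CL_SUCCESS || num_platforms == 0) {
        fprintf(stderr, "No OpenCL platforms found (err %d)\n", err);
        return 1;
    }
    if (num_platforms > 8)
        num_platforms = 8;                /* plenty for a typical host */
    cl_platform_id platforms[8];
    clGetPlatformIDs(num_platforms, platforms, NULL);

    for (cl_uint i = 0; i < num_platforms; ++i) {
        cl_uint gpus = 0;
        err = clGetDeviceIDs(platforms[i], CL_DEVICE_TYPE_GPU, 0, NULL, &gpus);
        if (err != CL_SUCCESS)
            fprintf(stderr, "Platform %u: clGetDeviceIDs failed (err %d)\n", i, err);
        else
            printf("Platform %u: %u GPU device(s)\n", i, gpus);
    }
    return 0;
}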
SETI apps news
We're not gonna fight them. We're gonna transcend them.
ID: 1846131

jnamath
Joined: 5 Jan 16
Posts: 4
Credit: 4,659,687
RAC: 56
Germany
Message 1846486 - Posted: 4 Feb 2017, 14:34:52 UTC

I'm seeing the sporadic profiling error on a second machine, SuSE 42.1 this time.
It doesn't hurt much, since the task is aborted after 10 seconds.

The two machines have different driver versions, but both have GTX 750 Tis.
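For anyone wondering what the failing call in the thread title actually looks like, here is a minimal, error-checked sketch of an OpenCL event-profiling query (my own illustration, not the SETI app's code):

/* Sketch only: the kind of profiling query that reports
 * "clGetEventProfilingInfo call failed" when the event or its context has
 * gone bad.  Requires a queue created with CL_QUEUE_PROFILING_ENABLE and a
 * completed event.
 */
#include <stdio.h>
#include <CL/cl.h>

/* Returns the kernel run time in nanoseconds, or -1 on failure. */
long long kernel_time_ns(cl_event ev)
{
    cl_ulong start = 0, end = 0;
    cl_int err;

    err = clGetEventProfilingInfo(ev, CL_PROFILING_COMMAND_START,
                                  sizeof(start), &start, NULL);
    if (err != CL_SUCCESS) {
        fprintf(stderr, "clGetEventProfilingInfo(START) failed: %d\n", err);
        return -1;
    }
    err = clGetEventProfilingInfo(ev, CL_PROFILING_COMMAND_END,
                                  sizeof(end), &end, NULL);
    if (err != CL_SUCCESS) {
        fprintf(stderr, "clGetEventProfilingInfo(END) failed: %d\n", err);
        return -1;
    }
    return (long long)(end - start);
}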

Cheers
Holger
ID: 1846486

petri33
Volunteer tester
Joined: 6 Jun 02
Posts: 1668
Credit: 623,086,772
RAC: 156
Finland
Message 1846711 - Posted: 4 Feb 2017, 22:11:03 UTC - in response to Message 1846131.  
Last modified: 4 Feb 2017, 22:14:12 UTC

> My SW is totally different from the OpenCL one, and I may have some dust bunnies hiding in the cooling system of my GPUs. I just happen to get an error or two every day, grinding my machine to a halt with near-zero productivity and resulting in errors saying "Cuda error: 'Couldn't get cuda device count'". When I get home from work, the "nvidia-smi -l" window says ERR on one or two GPUs.
>
> There is an easy way to check whether the OpenCL and CUDA device disappearances have common roots.
>
> When the errors start again on the host running the OpenCL app, it is worth running the same command (nvidia-smi -l) and seeing what GPU state it reports. If it reports an error no matter which runtime is used, CUDA or OpenCL, then the issue lies at a deeper level than the runtime API.

Affirmative. A terminal window running solely nvidia-smi -l reports ERR on one or two GPUs when this happens. That is on an Ubuntu NVIDIA machine.

EDIT: The machine is ground to a halt. You have to have the window open before launching BOINC. Just let it run a few days.
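As an alternative to keeping an nvidia-smi window open in advance, a small logger along the lines of the sketch below (my own illustration using NVML; the file name, 60-second interval and output format are arbitrary) can append GPU state to a log file, so the moment of failure is captured even if the machine grinds to a halt.

/* Sketch only: periodically log GPU state via NVML so the failure moment is
 * captured without keeping a terminal window open.
 * Build (library name may differ per distro): gcc gpulog.c -o gpulog -lnvidia-ml
 * Run with: ./gpulog >> gpu.log &
 */
#include <stdio.h>
#include <time.h>
#include <unistd.h>
#include <nvml.h>

int main(void)
{
    if (nvmlInit() != NVML_SUCCESS) {
        fprintf(stderr, "nvmlInit failed\n");
        return 1;
    }
    for (;;) {
        time_t now = time(NULL);
        unsigned int count = 0;
        nvmlReturn_t r = nvmlDeviceGetCount(&count);
        if (r != NVML_SUCCESS) {
            /* Same symptom as "Couldn't get cuda device count" / ERR in nvidia-smi */
            printf("%ld: device count query failed: %s\n", (long)now, nvmlErrorString(r));
        } else {
            for (unsigned int i = 0; i < count; ++i) {
                nvmlDevice_t dev;
                unsigned int temp = 0;
                if (nvmlDeviceGetHandleByIndex(i, &dev) == NVML_SUCCESS &&
                    nvmlDeviceGetTemperature(dev, NVML_TEMPERATURE_GPU, &temp) == NVML_SUCCESS)
                    printf("%ld: GPU %u: %u C\n", (long)now, i, temp);
                else
                    printf("%ld: GPU %u: ERR\n", (long)now, i);
            }
        }
        fflush(stdout);
        sleep(60);  /* poll once a minute */
    }
}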
To overcome Heisenbergs:
"You can't always get what you want / but if you try sometimes you just might find / you get what you need." -- Rolling Stones
ID: 1846711

David Anderson (not *that* DA) Project Donor
Joined: 5 Dec 09
Posts: 215
Credit: 74,008,558
RAC: 74
United States
Message 1847347 - Posted: 8 Feb 2017, 17:04:54 UTC

The problem is back again. It seems (?) to arise whenever an update from Ubuntu wants a reboot.

The following is post-reboot (which I did yesterday at 10 PM PST after removing the seti bin and wisdom files again when I noticed the errors). I only noticed Raistmer's mention of 'nvidia-smi -l' just now.

PST:
Wed Feb 8 08:57:33 2017
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 367.57                 Driver Version: 367.57                     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 750      Off | 0000:01:00.0     On  |                  N/A |
| N/A   58C    P8     1W /  38W |    148MiB /  1998MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0      1099    G   /usr/lib/xorg/Xorg                             146MiB |
+-----------------------------------------------------------------------------+
ID: 1847347

Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1847350 - Posted: 8 Feb 2017, 17:17:55 UTC - in response to Message 1847347.  

That's the listing from a correctly operating state, right?
It would be interesting to get the same listing when the app failures start again (before rebooting).
SETI apps news
We're not gonna fight them. We're gonna transcend them.
ID: 1847350

Urs Echternacht
Volunteer tester
Joined: 15 May 99
Posts: 692
Credit: 135,197,781
RAC: 211
Germany
Message 1847624 - Posted: 9 Feb 2017, 22:04:22 UTC - in response to Message 1847347.  

> The problem is back again. It seems (?) to arise whenever an update from Ubuntu wants a reboot.
>
> The following is post-reboot (which I did yesterday at 10 PM PST after removing the seti bin and wisdom files again when I noticed the errors). I only noticed Raistmer's mention of 'nvidia-smi -l' just now.
> ...

You could also try the newer version. It's now downloadable at Lunatics. Snippets for app_info.xml usage are also included, but if you have problems getting the app_info.xml working, ask.
MBv8_r3602_Beta
_\|/_
U r s
ID: 1847624

David Anderson (not *that* DA) Project Donor
Joined: 5 Dec 09
Posts: 215
Credit: 74,008,558
RAC: 74
United States
Message 1847680 - Posted: 10 Feb 2017, 3:35:56 UTC

No reboot since the last report (which was 8 Feb 2017).
Thu Feb 9 19:29:52 2017
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 367.57                 Driver Version: 367.57                     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 750      Off | 0000:01:00.0     On  |                  N/A |
| N/A   79C    P0    11W /  38W |    937MiB /  1998MiB |     46%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0      1099    G   /usr/lib/xorg/Xorg                             170MiB |
|    0     13063    C   ...17_x86_64-pc-linux-gnu__FGRPopencl-nvidia   764MiB |
+-----------------------------------------------------------------------------+
ID: 1847680

David Anderson (not *that* DA) Project Donor
Joined: 5 Dec 09
Posts: 215
Credit: 74,008,558
RAC: 74
United States
Message 1847682 - Posted: 10 Feb 2017, 3:37:42 UTC

If I read the reports correctly, https://setiathome.berkeley.edu/result.php?resultid=5497421752 shows an error after the last reboot, before the previous post a moment ago with the nvidia-smi -l output.
ID: 1847682

Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1847745 - Posted: 10 Feb 2017, 8:26:33 UTC - in response to Message 1847682.  

Then it looks like a different issue.
SETI apps news
We're not gonna fight them. We're gonna transcend them.
ID: 1847745