OpenCL kernel/call clGetEventProfilingInfo call failed

Urs Echternacht
Volunteer tester
Joined: 15 May 99
Posts: 692
Credit: 135,197,781
RAC: 211
Germany
Message 1846001 - Posted: 2 Feb 2017, 22:14:10 UTC - in response to Message 1845996.  

> Hi,
>
> I'm running linux and NVIDIA cuda, but... I've experienced random errors when using drivers 376-8.yyy. (Cannot get GPU count or similar).
>
> The OpenCL 'may' use the same libraries as CUDA during compilation and when executing code and allocating resources.
>
> --
> p.

So far it has only been reported on Ubuntu, so it should not be a general Linux problem.
_\|/_
U r s
ID: 1846001

petri33
Volunteer tester
Joined: 6 Jun 02
Posts: 1668
Credit: 623,086,772
RAC: 156
Finland
Message 1846011 - Posted: 2 Feb 2017, 22:59:03 UTC - in response to Message 1846001.  
Last modified: 2 Feb 2017, 23:07:20 UTC

> Hi,
>
> I'm running linux and NVIDIA cuda, but... I've experienced random errors when using drivers 376-8.yyy. (Cannot get GPU count or similar).
>
> The OpenCL 'may' use the same libraries as CUDA during compilation and when executing code and allocating resources.
>
> --
> p.
>
> So far it has only been reported on Ubuntu, so it should not be a general Linux problem.

Yeah, thanks. I'm running Ubuntu too. Not the latest though.

petri@Linux1:~$ cat /etc/*-release
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=15.10
DISTRIB_CODENAME=wily
DISTRIB_DESCRIPTION="Ubuntu 15.10"
NAME="Ubuntu"
VERSION="15.10 (Wily Werewolf)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 15.10"
VERSION_ID="15.10"
HOME_URL="http://www.ubuntu.com/"
SUPPORT_URL="http://help.ubuntu.com/"
BUG_REPORT_URL="http://bugs.launchpad.net/ubuntu/"


And yes, I know. My SW is totally different from the OpenCL one, and I may have some dust bunnies hiding in the cooling system of my GPUs. I just happen to get an error or two every day, grinding my machine to a halt with near-zero productivity and resulting in errors saying "Cuda error: 'Couldn't get cuda device count'". When I get home from work, the "nvidia-smi -l" window says ERR on one or two GPUs.

I can revert the drivers back a version or two just to test. I'll try that tomorrow.
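For reference, here is a minimal sketch (my own illustration, not the actual special-app code) of the kind of startup check that produces that message; when the driver gets into the state where nvidia-smi shows ERR, this call starts failing:

/* Minimal sketch, not the actual app code: the kind of check that yields
 * "Couldn't get cuda device count".
 * Build (paths may differ): nvcc devcount.cu -o devcount
 */
#include <stdio.h>
#include <cuda_runtime.h>

int main(void)
{
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        /* This is the failure mode seen while nvidia-smi reports ERR. */
        fprintf(stderr, "Couldn't get cuda device count: %s\n",
                cudaGetErrorString(err));
        return 1;
    }
    printf("CUDA devices visible: %d\n", count);
    return 0;
}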
To overcome Heisenbergs:
"You can't always get what you want / but if you try sometimes you just might find / you get what you need." -- Rolling Stones
ID: 1846011

Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1846131 - Posted: 3 Feb 2017, 10:56:06 UTC - in response to Message 1846011.  

> My SW is totally different from the OpenCL one, and I may have some dust bunnies hiding in the cooling system of my GPUs. I just happen to get an error or two every day, grinding my machine to a halt with near-zero productivity and resulting in errors saying "Cuda error: 'Couldn't get cuda device count'". When I get home from work, the "nvidia-smi -l" window says ERR on one or two GPUs.

There is an easy way to check whether the OpenCL and CUDA device disappearances have common roots.

When the errors start again on the host running the OpenCL app, it is worth running the same command (nvidia-smi -l) and seeing what GPU state it reports. If it reports an error no matter which runtime is used, CUDA or OpenCL, then the issue lies at a deeper level than the runtime API.
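The OpenCL half of that cross-check might look like the sketch below (my own illustration, not project code; the file name and build line are assumptions). Run it alongside the CUDA device-count check shown earlier and nvidia-smi -l; if both runtimes lose the devices at the same moment, the fault sits below the runtime APIs, in the driver or the hardware.

/* Sketch only: enumerate OpenCL GPU devices, the counterpart of the CUDA
 * device-count check above.
 * Build (libs/paths may differ): gcc cl_devcount.c -o cl_devcount -lOpenCL
 */
#include <stdio.h>
#include <CL/cl.h>

int main(void)
{
    cl_uint num_platforms = 0;
    cl_int err = clGetPlatformIDs(0, NULL, &num_platforms);
    if (err != CL_SUCCESS || num_platforms == 0) {
        fprintf(stderr, "No OpenCL platforms found (err %d)\n", err);
        return 1;
    }
    if (num_platforms > 8)
        num_platforms = 8;                /* plenty for a typical host */
    cl_platform_id platforms[8];
    clGetPlatformIDs(num_platforms, platforms, NULL);

    for (cl_uint i = 0; i < num_platforms; ++i) {
        cl_uint gpus = 0;
        err = clGetDeviceIDs(platforms[i], CL_DEVICE_TYPE_GPU, 0, NULL, &gpus);
        if (err != CL_SUCCESS)
            fprintf(stderr, "Platform %u: clGetDeviceIDs failed (err %d)\n", i, err);
        else
            printf("Platform %u: %u GPU device(s)\n", i, gpus);
    }
    return 0;
}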
SETI apps news
We're not gonna fight them. We're gonna transcend them.
ID: 1846131

jnamath
Joined: 5 Jan 16
Posts: 4
Credit: 4,659,687
RAC: 56
Germany
Message 1846486 - Posted: 4 Feb 2017, 14:34:52 UTC

I'm seeing the sporadic profiling error on a second machine, SuSE 42.1 this time.
It doesn't hurt much, since the task is aborted after 10 seconds.

The two machines have different driver versions, but both have GTX 750 Tis.
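For anyone wondering what the failing call in the thread title actually looks like, here is a minimal, error-checked sketch of an OpenCL event-profiling query (my own illustration, not the SETI app's code):

/* Sketch only: the kind of profiling query that reports
 * "clGetEventProfilingInfo call failed" when the event or its context has
 * gone bad.  Requires a queue created with CL_QUEUE_PROFILING_ENABLE and a
 * completed event.
 */
#include <stdio.h>
#include <CL/cl.h>

/* Returns the kernel run time in nanoseconds, or -1 on failure. */
long long kernel_time_ns(cl_event ev)
{
    cl_ulong start = 0, end = 0;
    cl_int err;

    err = clGetEventProfilingInfo(ev, CL_PROFILING_COMMAND_START,
                                  sizeof(start), &start, NULL);
    if (err != CL_SUCCESS) {
        fprintf(stderr, "clGetEventProfilingInfo(START) failed: %d\n", err);
        return -1;
    }
    err = clGetEventProfilingInfo(ev, CL_PROFILING_COMMAND_END,
                                  sizeof(end), &end, NULL);
    if (err != CL_SUCCESS) {
        fprintf(stderr, "clGetEventProfilingInfo(END) failed: %d\n", err);
        return -1;
    }
    return (long long)(end - start);
}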

Cheers
Holger
ID: 1846486

petri33
Volunteer tester
Joined: 6 Jun 02
Posts: 1668
Credit: 623,086,772
RAC: 156
Finland
Message 1846711 - Posted: 4 Feb 2017, 22:11:03 UTC - in response to Message 1846131.  
Last modified: 4 Feb 2017, 22:14:12 UTC

> My SW is totally different from the OpenCL one, and I may have some dust bunnies hiding in the cooling system of my GPUs. I just happen to get an error or two every day, grinding my machine to a halt with near-zero productivity and resulting in errors saying "Cuda error: 'Couldn't get cuda device count'". When I get home from work, the "nvidia-smi -l" window says ERR on one or two GPUs.
>
> There is an easy way to check whether the OpenCL and CUDA device disappearances have common roots.
>
> When the errors start again on the host running the OpenCL app, it is worth running the same command (nvidia-smi -l) and seeing what GPU state it reports. If it reports an error no matter which runtime is used, CUDA or OpenCL, then the issue lies at a deeper level than the runtime API.

Affirmative. A terminal window running solely nvidia-smi -l reports ERR on one or two GPUs when this happens. That is on an Ubuntu NVIDIA machine.

EDIT: The machine is ground to a halt. You have to have the window open before launching BOINC. Just let it run a few days.
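As an alternative to keeping an nvidia-smi window open in advance, a small logger along the lines of the sketch below (my own illustration using NVML; the file name, 60-second interval and output format are arbitrary) can append GPU state to a log file, so the moment of failure is captured even if the machine grinds to a halt.

/* Sketch only: periodically log GPU state via NVML so the failure moment is
 * captured without keeping a terminal window open.
 * Build (library name may differ per distro): gcc gpulog.c -o gpulog -lnvidia-ml
 * Run with: ./gpulog >> gpu.log &
 */
#include <stdio.h>
#include <time.h>
#include <unistd.h>
#include <nvml.h>

int main(void)
{
    if (nvmlInit() != NVML_SUCCESS) {
        fprintf(stderr, "nvmlInit failed\n");
        return 1;
    }
    for (;;) {
        time_t now = time(NULL);
        unsigned int count = 0;
        nvmlReturn_t r = nvmlDeviceGetCount(&count);
        if (r != NVML_SUCCESS) {
            /* Same symptom as "Couldn't get cuda device count" / ERR in nvidia-smi */
            printf("%ld: device count query failed: %s\n", (long)now, nvmlErrorString(r));
        } else {
            for (unsigned int i = 0; i < count; ++i) {
                nvmlDevice_t dev;
                unsigned int temp = 0;
                if (nvmlDeviceGetHandleByIndex(i, &dev) == NVML_SUCCESS &&
                    nvmlDeviceGetTemperature(dev, NVML_TEMPERATURE_GPU, &temp) == NVML_SUCCESS)
                    printf("%ld: GPU %u: %u C\n", (long)now, i, temp);
                else
                    printf("%ld: GPU %u: ERR\n", (long)now, i);
            }
        }
        fflush(stdout);
        sleep(60);  /* poll once a minute */
    }
}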
To overcome Heisenbergs:
"You can't always get what you want / but if you try sometimes you just might find / you get what you need." -- Rolling Stones
ID: 1846711

David Anderson (not *that* DA) Project Donor
Joined: 5 Dec 09
Posts: 215
Credit: 74,008,558
RAC: 74
United States
Message 1847347 - Posted: 8 Feb 2017, 17:04:54 UTC

The problem is back again. It seems (?) to arise whenever an update from Ubuntu wants a reboot.

The following is post-reboot (which I did yesterday at 10 PM PST after removing the seti bin and wisdom files again when I noticed the errors). I only noticed Raistmer's mention of 'nvidia-smi -l' just now.

PST:
Wed Feb 8 08:57:33 2017
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 367.57                 Driver Version: 367.57                     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 750      Off | 0000:01:00.0     On  |                  N/A |
| N/A   58C    P8     1W /  38W |    148MiB /  1998MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0      1099    G   /usr/lib/xorg/Xorg                             146MiB |
+-----------------------------------------------------------------------------+
ID: 1847347

Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1847350 - Posted: 8 Feb 2017, 17:17:55 UTC - in response to Message 1847347.  

That's the listing from a correctly operating state, right?
It would be interesting to get the same listing when the app failures start again (before rebooting).
SETI apps news
We're not gonna fight them. We're gonna transcend them.
ID: 1847350

Urs Echternacht
Volunteer tester
Joined: 15 May 99
Posts: 692
Credit: 135,197,781
RAC: 211
Germany
Message 1847624 - Posted: 9 Feb 2017, 22:04:22 UTC - in response to Message 1847347.  

> The problem is back again. It seems (?) to arise whenever an update from Ubuntu wants a reboot.
>
> The following is post-reboot (which I did yesterday at 10 PM PST after removing the seti bin and wisdom files again when I noticed the errors). I only noticed Raistmer's mention of 'nvidia-smi -l' just now.
> ...

You could also try the newer version. It's now downloadable at Lunatics. Snippets for app_info.xml usage are also included, but if you have problems getting the app_info.xml working, ask.
MBv8_r3602_Beta
_\|/_
U r s
ID: 1847624

David Anderson (not *that* DA) Project Donor
Joined: 5 Dec 09
Posts: 215
Credit: 74,008,558
RAC: 74
United States
Message 1847680 - Posted: 10 Feb 2017, 3:35:56 UTC

No reboot since the last report (which was 8 Feb 2017).
Thu Feb 9 19:29:52 2017
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 367.57                 Driver Version: 367.57                     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 750      Off | 0000:01:00.0     On  |                  N/A |
| N/A   79C    P0    11W /  38W |    937MiB /  1998MiB |     46%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0      1099    G   /usr/lib/xorg/Xorg                             170MiB |
|    0     13063    C   ...17_x86_64-pc-linux-gnu__FGRPopencl-nvidia   764MiB |
+-----------------------------------------------------------------------------+
ID: 1847680

David Anderson (not *that* DA) Project Donor
Joined: 5 Dec 09
Posts: 215
Credit: 74,008,558
RAC: 74
United States
Message 1847682 - Posted: 10 Feb 2017, 3:37:42 UTC

If I read the reports correctly, https://setiathome.berkeley.edu/result.php?resultid=5497421752 shows an error after the last reboot, before the previous post a moment ago with the nvidia-smi -l output.
ID: 1847682

Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1847745 - Posted: 10 Feb 2017, 8:26:33 UTC - in response to Message 1847682.  

Then it looks like a different issue.
SETI apps news
We're not gonna fight them. We're gonna transcend them.
ID: 1847745