OpenCL kernel/call clGetEventProfilingInfo call failed

Profile David Anderson (not *that* DA) Project Donor
Joined: 5 Dec 09
Posts: 215
Credit: 74,008,558
RAC: 74
United States
Message 1841797 - Posted: 13 Jan 2017, 5:00:17 UTC
Last modified: 13 Jan 2017, 5:01:21 UTC

After a recent Ubuntu update I get
ERROR: OpenCL kernel/call 'clGetEventProfilingInfo' call failed (-7) in file ../../src/GPU_lock.cpp near line 550.


on task:
https://setiathome.berkeley.edu/result.php?resultid=5423147715
and many others today.
SETI@home v8 v8.22 (opencl_nvidia_SoG) x86_64-pc-linux-gnu

Been fine on the GPU till today. (there was a security related update
yesterday, I recall vaguely).
Suspended Seti on this machine till I figure out something.
(This is with stock apps)
ID: 1841797 · Report as offensive
Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1841821 - Posted: 13 Jan 2017, 6:04:57 UTC - in response to Message 1841797.  

#define CL_PROFILING_INFO_NOT_AVAILABLE -7

Reboot or roll back.
SETI apps news
We're not gonna fight them. We're gonna transcend them.
ID: 1841821 · Report as offensive
Profile David Anderson (not *that* DA) Project Donor
Joined: 5 Dec 09
Posts: 215
Credit: 74,008,558
RAC: 74
United States
Message 1841968 - Posted: 13 Jan 2017, 19:45:18 UTC - in response to Message 1841821.  

Rebooted and the first opencl_nvidia_SoG task finished in seconds
and I'm sure that will show up as an error before long.
I'm unsure how to roll recent updates back. I'll look at the logs
and see what I can see.
ID: 1841968 · Report as offensive
Profile David Anderson (not *that* DA) Project Donor
Joined: 5 Dec 09
Posts: 215
Credit: 74,008,558
RAC: 74
United States
Message 1841972 - Posted: 13 Jan 2017, 20:12:00 UTC - in response to Message 1841968.  

Here are the recent changes (lines folded so viewing a tiny
bit easier).

Start-Date: 2017-01-07 13:27:18
Commandline: apt-get install libgnome-keyring-dev build-essential
Requested-By: davea (1000)


Start-Date: 2017-01-10 15:30:53
Commandline: aptdaemon role='role-commit-packages' sender=':1.164'
Install: linux-image-4.4.0-59-generic:amd64 (4.4.0-59.80,
automatic), linux-image-extra-4.4.0-59-generic:amd64
(4.4.0-59.80, automatic), linux-headers-4.4.0-59:amd64
(4.4.0-59.80, automatic), linux-headers-4.4.0-59-generic:amd64
(4.4.0-59.80, automatic)


Start-Date: 2017-01-12 07:29:21
Commandline: aptdaemon role='role-commit-packages' sender=':1.66'
Upgrade: libvncclient1:amd64 (0.9.10+dfsg-3build1,
0.9.10+dfsg-3ubuntu0.16.04.1)


Start-Date: 2017-01-13 08:11:56
Commandline: aptdaemon role='role-commit-packages' sender=':1.66'
Upgrade: libdns-export162:amd64 (1:9.10.3.dfsg.P4-8ubuntu1.3,
1:9.10.3.dfsg.P4-8ubuntu1.4), libisccfg140:amd64
(1:9.10.3.dfsg.P4-8ubuntu1.3, 1:9.10.3.dfsg.P4-8ubuntu1.4),
libapt-inst2.0:amd64 (1.2.15ubuntu0.2, 1.2.18),
update-notifier-common:amd64 (3.168.2, 3.168.3),
apt:amd64 (1.2.15ubuntu0.2, 1.2.18), bind9-host:amd64
(1:9.10.3.dfsg.P4-8ubuntu1.3, 1:9.10.3.dfsg.P4-8ubuntu1.4),
dnsutils:amd64 (1:9.10.3.dfsg.P4-8ubuntu1.3,
1:9.10.3.dfsg.P4-8ubuntu1.4), libisc160:amd64
(1:9.10.3.dfsg.P4-8ubuntu1.3, 1:9.10.3.dfsg.P4-8ubuntu1.4),
libapt-pkg5.0:amd64 (1.2.15ubuntu0.2, 1.2.18),
libisc-export160:amd64 (1:9.10.3.dfsg.P4-8ubuntu1.3,
1:9.10.3.dfsg.P4-8ubuntu1.4), liblwres141:amd64
(1:9.10.3.dfsg.P4-8ubuntu1.3, 1:9.10.3.dfsg.P4-8ubuntu1.4),
apt-utils:amd64 (1.2.15ubuntu0.2, 1.2.18), libdns162:amd64
(1:9.10.3.dfsg.P4-8ubuntu1.3, 1:9.10.3.dfsg.P4-8ubuntu1.4),
libisccc140:amd64 (1:9.10.3.dfsg.P4-8ubuntu1.3,
1:9.10.3.dfsg.P4-8ubuntu1.4), apt-transport-https:amd64
(1.2.15ubuntu0.2, 1.2.18), libbind9-140:amd64
(1:9.10.3.dfsg.P4-8ubuntu1.3, 1:9.10.3.dfsg.P4-8ubuntu1.4),
update-notifier:amd64 (3.168.2, 3.168.3)
ID: 1841972 · Report as offensive
Profile David Anderson (not *that* DA) Project Donor
Joined: 5 Dec 09
Posts: 215
Credit: 74,008,558
RAC: 74
United States
Message 1841976 - Posted: 13 Jan 2017, 20:15:54 UTC - in response to Message 1841968.  

Wait. I just realized: Raistmer, I don't understand what your
#define line means for me to do/try.
ID: 1841976 · Report as offensive
Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1841977 - Posted: 13 Jan 2017, 20:18:40 UTC - in response to Message 1841976.  

Wait. I just realized: Raistmer, I don't understand what your
#define line means for me to do/try.


It decodes the -7 error.
Your OpenCL runtime now lacks profiling support.
That's abnormal, especially since it had it before.
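
For readers unfamiliar with the notation, the decoding step described here can be sketched in C. This is only an illustration: the enum names and the helper ocl_error_name are hypothetical, and the numeric values are copied from the standard OpenCL header (CL/cl.h) so that the snippet builds without an OpenCL SDK installed.

```c
#include <stdio.h>

/* Status values as defined in the Khronos OpenCL header (CL/cl.h).
   They are repeated under local, hypothetical names so that this
   sketch compiles without an OpenCL SDK. */
enum {
    OCL_SUCCESS                      =  0,
    OCL_DEVICE_NOT_FOUND             = -1,
    OCL_DEVICE_NOT_AVAILABLE         = -2,
    OCL_OUT_OF_RESOURCES             = -5,
    OCL_OUT_OF_HOST_MEMORY           = -6,
    OCL_PROFILING_INFO_NOT_AVAILABLE = -7
};

/* Hypothetical helper: translate a numeric OpenCL status into the
   symbolic name used in the OpenCL headers. Only a few codes are
   covered; everything else falls through to a generic string. */
const char *ocl_error_name(int status)
{
    switch (status) {
    case OCL_SUCCESS:                      return "CL_SUCCESS";
    case OCL_DEVICE_NOT_FOUND:             return "CL_DEVICE_NOT_FOUND";
    case OCL_DEVICE_NOT_AVAILABLE:         return "CL_DEVICE_NOT_AVAILABLE";
    case OCL_OUT_OF_RESOURCES:             return "CL_OUT_OF_RESOURCES";
    case OCL_OUT_OF_HOST_MEMORY:           return "CL_OUT_OF_HOST_MEMORY";
    case OCL_PROFILING_INFO_NOT_AVAILABLE: return "CL_PROFILING_INFO_NOT_AVAILABLE";
    default:                               return "unknown OpenCL error";
    }
}
```

Calling ocl_error_name(-7) yields the string CL_PROFILING_INFO_NOT_AVAILABLE, which matches the #define quoted earlier in the thread.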
SETI apps news
We're not gonna fight them. We're gonna transcend them.
ID: 1841977 · Report as offensive
Urs Echternacht
Volunteer tester
Joined: 15 May 99
Posts: 692
Credit: 135,197,781
RAC: 211
Germany
Message 1841988 - Posted: 13 Jan 2017, 21:14:37 UTC
Last modified: 13 Jan 2017, 21:15:16 UTC

The problem could be the kernel update. Rebuilding the .wisdom and .bin files in your projects/setiathome.berkeley.edu folder might help.

If not, a complete shutdown, wait and restart cycle might reinitialize the OpenCL driver correctly; at least that worked for me.
_\|/_
U r s
ID: 1841988 · Report as offensive
Profile David Anderson (not *that* DA) Project Donor
Joined: 5 Dec 09
Posts: 215
Credit: 74,008,558
RAC: 74
United States
Message 1841995 - Posted: 13 Jan 2017, 21:53:05 UTC - in response to Message 1841977.  

The last *SoG gpu task that ran ok was Dec 16, 2016,
looking at pending tasks. *SoG tasks in pending
show the -7 error.
*sah gpu tasks seem ok still.

Looking at valid tasks changes that picture:
task 5423147684 wu 2390938683 on Jan 11, 2017
for example, is a *SoG that
completed just fine on the problem host 7748035.

My other systems are not having trouble, AFAIK.
ID: 1841995 · Report as offensive
Profile David Anderson (not *that* DA) Project Donor
Joined: 5 Dec 09
Posts: 215
Credit: 74,008,558
RAC: 74
United States
Message 1842014 - Posted: 13 Jan 2017, 22:55:01 UTC - in response to Message 1841988.  
Last modified: 13 Jan 2017, 22:57:36 UTC

The problem could be the kernel update. Rebuilding the .wisdom and .bin files in your projects/setiathome.berkeley.edu folder might help.

If not, a complete shutdown, wait and restart cycle might reinitialize the OpenCL driver correctly; at least that worked for me.


I don't understand what I could do with the *bin* files (96 of them in the directory) or the *wisdom* files (6 of them present).
Some are newish Jan 8 2017 but some from as far back as 2015.

I did shutdown, wait, and restart earlier today and an *SoG graphics task started and went away in seconds (and will likely
show up in Invalid at some point today).
ID: 1842014 · Report as offensive
Profile Raistmer
Volunteer developer
Volunteer tester
Avatar

Send message
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1842035 - Posted: 14 Jan 2017, 0:15:18 UTC - in response to Message 1842014.  


I don't understand what I could do with the *bin* files (96 of them in the directory) or the *wisdom* files (6 of them present).

delete them all
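
Before deleting anything, it may help to see which files would be affected. The following C sketch only lists candidates and removes nothing. It assumes that the cached files can be recognised by ".bin" or ".wisdom" in their names, which the thread suggests but does not spell out; the helper names is_cache_file and list_cache_files are hypothetical, and the directory to scan is left to the caller.

```c
#include <dirent.h>
#include <stdio.h>
#include <string.h>

/* Assumption: a cached file has ".bin" or ".wisdom" somewhere in its
   name. This is a loose match, so review the output before deleting. */
int is_cache_file(const char *name)
{
    return strstr(name, ".bin") != NULL || strstr(name, ".wisdom") != NULL;
}

/* Dry run: print every matching entry in dir_path and return how many
   were found, or -1 if the directory cannot be opened.
   Nothing is deleted here. */
int list_cache_files(const char *dir_path)
{
    DIR *dir = opendir(dir_path);
    if (dir == NULL) {
        perror(dir_path);
        return -1;
    }

    int count = 0;
    struct dirent *entry;
    while ((entry = readdir(dir)) != NULL) {
        if (is_cache_file(entry->d_name)) {
            printf("%s/%s\n", dir_path, entry->d_name);
            count++;
        }
    }

    closedir(dir);
    return count;
}
```

Pass the path of the project folder mentioned above to list_cache_files, review the printed names, and then delete the files by hand while the tasks are suspended.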
SETI apps news
We're not gonna fight them. We're gonna transcend them.
ID: 1842035 · Report as offensive
Profile David Anderson (not *that* DA) Project Donor
Joined: 5 Dec 09
Posts: 215
Credit: 74,008,558
RAC: 74
United States
Message 1842073 - Posted: 14 Jan 2017, 3:23:36 UTC

Deleted the bin and wisdom files.
It's too soon to be sure, but the SoG tasks just going away seems to have stopped,
so I hope things are back to normal good results.
ID: 1842073 · Report as offensive
Profile David Anderson (not *that* DA) Project Donor
Joined: 5 Dec 09
Posts: 215
Credit: 74,008,558
RAC: 74
United States
Message 1842181 - Posted: 14 Jan 2017, 16:05:31 UTC

opencl_nvidia_SoG tasks getting the same error, still.
opencl_nvidia_sah tasks working ok.
ID: 1842181 · Report as offensive
Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1842272 - Posted: 14 Jan 2017, 22:14:33 UTC - in response to Message 1842181.  

Did the reboot Urs suggested help?
SETI apps news
We're not gonna fight them. We're gonna transcend them.
ID: 1842272 · Report as offensive
Profile David Anderson (not *that* DA) Project Donor
Joined: 5 Dec 09
Posts: 215
Credit: 74,008,558
RAC: 74
United States
Message 1842292 - Posted: 14 Jan 2017, 23:34:51 UTC

Not sure I did things in the right order yesterday.
So just now: Jan 14, 2017 3:28PM PST
I deleted wisdom and bin files (there were not many, all new)
while all tasks suspended
and rebooted. We'll see how things go now.
ID: 1842292 · Report as offensive
jnamath

Joined: 5 Jan 16
Posts: 4
Credit: 4,659,687
RAC: 56
Germany
Message 1842749 - Posted: 17 Jan 2017, 10:48:43 UTC - in response to Message 1842292.  

Just to add to the discussion:
I got the same error on 6 opencl_nvidia_SoG tasks yesterday. However, the system was putting out many valid tasks of the same type before and after, and continues to do so. I have not touched the system at all.
Running Ubuntu Linux.

Cheers
ID: 1842749 · Report as offensive
Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1842839 - Posted: 18 Jan 2017, 8:38:30 UTC - in response to Message 1842749.  
Last modified: 18 Jan 2017, 8:48:14 UTC

Thanks for the report. Such a transient error could mean an issue with the app itself.
I will look closely at that area.

EDIT: there is a third class of results for that host: the ones that hit the -7 error but were able to complete after restarting from that error. In such cases the result validates OK.
So the point in the processing sequence where this happens can sometimes be passed within relatively few attempts and is sometimes almost impassable (or there are a lot of such points per single task).

There are also results without any restarts at all. And the AR value doesn't look different enough across these cases to trigger different behavior.
SETI apps news
We're not gonna fight them. We're gonna transcend them.
ID: 1842839 · Report as offensive
Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1842932 - Posted: 18 Jan 2017, 22:01:18 UTC

I haven't spotted any issues in the code so far, so this is still marked as a driver/config/card issue.
But this kind of error allows normal processing after recovery, so in revision 3602 it will be a recoverable error, without process termination.
The app will just report the error in stderr and continue.
So far no Windows-based app has reported this issue, so I will not rebuild the Windows binaries.
Maybe Urs could provide a standalone updated binary for Linux hosts with drivers/cards affected by this loss of profiling ability.
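
The change described for revision 3602 could look roughly like the following C sketch. It is not the application's actual code: query_kernel_time_ns is a hypothetical stand-in for a profiling call such as clGetEventProfilingInfo, and read_timing_or_continue is an invented name. Only the -7 value comes from the thread.

```c
#include <stdio.h>

/* Same value as CL_PROFILING_INFO_NOT_AVAILABLE in the OpenCL headers. */
#define PROFILING_INFO_NOT_AVAILABLE (-7)

/* Hypothetical stand-in for a real profiling query. It fails with the
   given status, or reports a fixed duration when the status is 0. */
static int query_kernel_time_ns(int simulated_status,
                                unsigned long long *elapsed_ns)
{
    if (simulated_status != 0)
        return simulated_status;
    *elapsed_ns = 1500000ULL;
    return 0;
}

/* Returns 1 if processing may continue and 0 if the caller should stop.
   Missing profiling data is logged to stderr and tolerated (the timing
   is set to 0); any other failure is still treated as fatal. */
int read_timing_or_continue(int simulated_status,
                            unsigned long long *elapsed_ns)
{
    int status = query_kernel_time_ns(simulated_status, elapsed_ns);

    if (status == 0)
        return 1;

    if (status == PROFILING_INFO_NOT_AVAILABLE) {
        fprintf(stderr,
                "WARNING: profiling info not available (%d), "
                "continuing without timing\n", status);
        *elapsed_ns = 0;
        return 1;
    }

    fprintf(stderr, "ERROR: profiling query failed (%d)\n", status);
    return 0;
}
```

The design choice in this sketch is that only the one status known to be harmless is downgraded to a warning, so other OpenCL failures still stop processing.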
SETI apps news
We're not gonna fight them. We're gonna transcend them.
ID: 1842932 · Report as offensive
Profile David Anderson (not *that* DA) Project Donor
Joined: 5 Dec 09
Posts: 215
Credit: 74,008,558
RAC: 74
United States
Message 1845939 - Posted: 2 Feb 2017, 15:27:50 UTC

On Jan 28 the profiling error returned (I just noticed today).
Five errors over 3 days.

There were a few Ubuntu updates (some requiring reboot)
recently though none were the kernel itself.

I just now removed the bin and wisdom files and rebooted.
It made the problem vanish for two weeks (Jan 14-28) before,
so I hope the problem will be gone again. My other two machines
(with very different hardware) have not had this problem.
ID: 1845939 · Report as offensive
Urs Echternacht
Volunteer tester
Joined: 15 May 99
Posts: 692
Credit: 135,197,781
RAC: 211
Germany
Message 1845995 - Posted: 2 Feb 2017, 21:56:18 UTC - in response to Message 1845939.  

In the meantime I've built a newer version with Raistmer's changes and am running offline tests. Looks OK so far.
The OpenCL detection problem did not reappear on the first few repetitions of the loop.
I'm going to run tests for another week to see whether this sporadic problem reappears with the newer version as well.
_\|/_
U r s
ID: 1845995 · Report as offensive
Profile petri33
Volunteer tester

Joined: 6 Jun 02
Posts: 1668
Credit: 623,086,772
RAC: 156
Finland
Message 1845996 - Posted: 2 Feb 2017, 21:59:08 UTC

Hi,

I'm running Linux and NVIDIA CUDA, but... I've experienced random errors when using drivers 376-8.yyy (cannot get GPU count, or similar).

OpenCL 'may' use the same libraries as CUDA during compilation and when executing code and allocating resources.


--
p.
To overcome Heisenbergs:
"You can't always get what you want / but if you try sometimes you just might find / you get what you need." -- Rolling Stones
ID: 1845996 · Report as offensive