OpenCL kernel/call clGetEventProfilingInfo call failed
Author | Message |
---|---|
David Anderson (not *that* DA) Send message Joined: 5 Dec 09 Posts: 215 Credit: 74,008,558 RAC: 74 |
After a recent Ubuntu update I get: ERROR: OpenCL kernel/call 'clGetEventProfilingInfo' call failed (-7) in file ../../src/GPU_lock.cpp near line 550. This happened on task https://setiathome.berkeley.edu/result.php?resultid=5423147715 and many others today. The app is SETI@home v8 v8.22 (opencl_nvidia_SoG) x86_64-pc-linux-gnu. It had been fine on the GPU until today. (There was a security-related update yesterday, as I vaguely recall.) I have suspended SETI on this machine until I figure something out. (This is with stock apps.) |
Raistmer Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
#define CL_PROFILING_INFO_NOT_AVAILABLE -7 Reboot or roll back. SETI apps news We're not gonna fight them. We're gonna transcend them. |
David Anderson (not *that* DA) Send message Joined: 5 Dec 09 Posts: 215 Credit: 74,008,558 RAC: 74 |
Rebooted and the first opencl_nvidia_SoG task finished in seconds and I'm sure that will show up as an error before long. I'm unsure how to roll recent updates back. I'll look at the logs and see what I can see. |
David Anderson (not *that* DA) Send message Joined: 5 Dec 09 Posts: 215 Credit: 74,008,558 RAC: 74 |
Here are the recent changes (lines folded so viewing a tiny bit easier). Start-Date: 2017-01-07 13:27:18 Commandline: apt-get install libgnome-keyring-dev build-essential Requested-By: davea (1000) Start-Date: 2017-01-10 15:30:53 Commandline: aptdaemon role='role-commit-packages' sender=':1.164' Install: linux-image-4.4.0-59-generic:amd64 (4.4.0-59.80, automatic), linux-image-extra-4.4.0-59-generic:amd64 (4.4.0-59.80, automatic), linux-headers-4.4.0-59:amd64 (4.4.0-59.80, automatic), linux-headers-4.4.0-59-generic:amd64 (4.4.0-59.80, automatic) Start-Date: 2017-01-12 07:29:21 Commandline: aptdaemon role='role-commit-packages' sender=':1.66' Upgrade: libvncclient1:amd64 (0.9.10+dfsg-3build1, 0.9.10+dfsg-3ubuntu0.16.04.1) Start-Date: 2017-01-13 08:11:56 Commandline: aptdaemon role='role-commit-packages' sender=':1.66' Upgrade: libdns-export162:amd64 (1:9.10.3.dfsg.P4-8ubuntu1.3, 1:9.10.3.dfsg.P4-8ubuntu1.4), libisccfg140:amd64 (1:9.10.3.dfsg.P4-8ubuntu1.3, 1:9.10.3.dfsg.P4-8ubuntu1.4), libapt-inst2.0:amd64 (1.2.15ubuntu0.2, 1.2.18), update-notifier-common:amd64 (3.168.2, 3.168.3), apt:amd64 (1.2.15ubuntu0.2, 1.2.18), bind9-host:amd64 (1:9.10.3.dfsg.P4-8ubuntu1.3, 1:9.10.3.dfsg.P4-8ubuntu1.4), dnsutils:amd64 (1:9.10.3.dfsg.P4-8ubuntu1.3, 1:9.10.3.dfsg.P4-8ubuntu1.4), libisc160:amd64 (1:9.10.3.dfsg.P4-8ubuntu1.3, 1:9.10.3.dfsg.P4-8ubuntu1.4), libapt-pkg5.0:amd64 (1.2.15ubuntu0.2, 1.2.18), libisc-export160:amd64 (1:9.10.3.dfsg.P4-8ubuntu1.3, 1:9.10.3.dfsg.P4-8ubuntu1.4), liblwres141:amd64 (1:9.10.3.dfsg.P4-8ubuntu1.3, 1:9.10.3.dfsg.P4-8ubuntu1.4), apt-utils:amd64 (1.2.15ubuntu0.2, 1.2.18), libdns162:amd64 (1:9.10.3.dfsg.P4-8ubuntu1.3, 1:9.10.3.dfsg.P4-8ubuntu1.4), libisccc140:amd64 (1:9.10.3.dfsg.P4-8ubuntu1.3, 1:9.10.3.dfsg.P4-8ubuntu1.4), apt-transport-https:amd64 (1.2.15ubuntu0.2, 1.2.18), libbind9-140:amd64 (1:9.10.3.dfsg.P4-8ubuntu1.3, 1:9.10.3.dfsg.P4-8ubuntu1.4), update-notifier:amd64 (3.168.2, 3.168.3) |
David Anderson (not *that* DA) Send message Joined: 5 Dec 09 Posts: 215 Credit: 74,008,558 RAC: 74 |
Wait. I just realized: Raistmer, I don't understand what your #define line means for me to do/try. |
Raistmer Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
It decodes the -7 error. Your OpenCL runtime now lacks profiling support. That is abnormal, especially if profiling worked before. |
Urs Echternacht Send message Joined: 15 May 99 Posts: 692 Credit: 135,197,781 RAC: 211 |
The problem could be the kernel update. Rebuilding the .wisdom and .bin files in your projects/setiathome.berkeley.edu folder might help. If not, a complete shutdown, wait, and restart cycle may reinitialize the OpenCL driver correctly; at least that worked for me. _\|/_ U r s |
David Anderson (not *that* DA) Send message Joined: 5 Dec 09 Posts: 215 Credit: 74,008,558 RAC: 74 |
Looking at pending tasks, the last *SoG GPU task that ran OK was on Dec 16, 2016. *SoG tasks in pending show the -7 error. *sah GPU tasks still seem OK. Looking at valid tasks changes that picture: task 5423147684 (wu 2390938683) on Jan 11, 2017, for example, is a *SoG that completed just fine on the problem host 7748035. My other systems are not having trouble, AFAIK. |
David Anderson (not *that* DA) Send message Joined: 5 Dec 09 Posts: 215 Credit: 74,008,558 RAC: 74 |
Urs wrote: "Problem could be the kernel update. Rebuilding the .wisdom and .bin files in your projects/setiathome.berkeley.edu folder might help." I don't understand what I could do with the *bin* files (96 of them in the directory) or the *wisdom* files (6 of them present). Some are newish (Jan 8, 2017) but some date from as far back as 2015. I did shut down, wait, and restart earlier today, and an *SoG graphics task started and went away in seconds (and will likely show up in Invalid at some point today). |
Raistmer Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
Delete them all. |
David Anderson (not *that* DA) Send message Joined: 5 Dec 09 Posts: 215 Credit: 74,008,558 RAC: 74 |
Deleted the bin and wisdom files. It's too soon to be sure, but the SoG tasks just going away seems to have stopped, so I hope things are back to normal good results. |
David Anderson (not *that* DA) Send message Joined: 5 Dec 09 Posts: 215 Credit: 74,008,558 RAC: 74 |
opencl_nvidia_SoG tasks getting the same error, still. opencl_nvidia_sah tasks working ok. |
Raistmer Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
Did rebooting, as Urs suggested, help? |
David Anderson (not *that* DA) Send message Joined: 5 Dec 09 Posts: 215 Credit: 74,008,558 RAC: 74 |
I'm not sure I did things in the right order yesterday. So just now (Jan 14, 2017, 3:28 PM PST), with all tasks suspended, I deleted the wisdom and bin files (there were not many, all new) and rebooted. We'll see how things go now. |
jnamath Send message Joined: 5 Jan 16 Posts: 4 Credit: 4,659,687 RAC: 56 |
Just to add to the discussion: I got the same error on 6 opencl_nvidia_SoG tasks yesterday. However, the system put out many valid tasks of the same type before and after, and continues to do so. I have not touched the system at all. Running Ubuntu Linux. Cheers |
Raistmer Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
Thanks for the report. Such a transient error could mean an issue with the app itself. I will look closely at that area. EDIT: there is a third class of results for that host: the ones that hit the -7 error but were able to complete after restarting from it. In such cases the result validates OK. So that point in the processing sequence can sometimes be passed after relatively few attempts and is sometimes almost impassable (or there are a lot of such points per single task). There are also results without any restarts at all. And the AR value doesn't look drastically different across these cases, so it does not seem to be what triggers the different behavior. |
Raistmer Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
I haven't spotted any issues in the code so far. That is, it is still marked as a driver/config/card issue. But this kind of error allows normal processing after recovery, so in revision 3602 it will be a recoverable error, without process termination. The app will just report the error in stderr and continue. So far no Windows-based app has reported such an issue, so I will not rebuild the Windows binaries. Maybe Urs could provide a standalone updated binary for Linux hosts with drivers/cards affected by this loss of profiling ability. |
David Anderson (not *that* DA) Send message Joined: 5 Dec 09 Posts: 215 Credit: 74,008,558 RAC: 74 |
On Jan 28 the profiling error returned (I just noticed today). Five errors over 3 days. There were a few Ubuntu updates (some requiring reboot) recently though none were the kernel itself. I just now removed the bin and wisdom files and rebooted. It made the problem vanish for two weeks (Jan 14-28) before, so I hope the problem will be gone again. My other two machines (with very different hardware) have not had this problem. |
Urs Echternacht Send message Joined: 15 May 99 Posts: 692 Credit: 135,197,781 RAC: 211 |
In the meantime I've built a newer version with Raistmer's changes and am running offline tests. It looks OK so far. The OpenCL detection problem did not reappear in the first few repetitions of the loop. I'm going to run tests for another week to see whether this sporadic problem reappears with the newer version too. _\|/_ U r s |
petri33 Send message Joined: 6 Jun 02 Posts: 1668 Credit: 623,086,772 RAC: 156 |
Hi, I'm running Linux and NVIDIA CUDA, but... I've experienced random errors when using drivers 376-8.yyy (cannot get GPU count, or similar). OpenCL 'may' use the same libraries as CUDA during compilation, when executing code, and when allocating resources. -- p. To overcome Heisenbergs: "You can't always get what you want / but if you try sometimes you just might find / you get what you need." -- Rolling Stones |
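
As a rough illustration of Raistmer's point that the #define line simply decodes the -7 in the error message, here is a minimal Python sketch. The status names and values for 0 through -7 are the ones in the Khronos cl.h header; the helper names `decode_cl_error` and `parse_error_line` are made up for this example, and the regular expression assumes the stderr format quoted in the first post.

```python
import re

# Subset of OpenCL status codes, with the values the Khronos cl.h header assigns.
CL_ERROR_NAMES = {
    0: "CL_SUCCESS",
    -1: "CL_DEVICE_NOT_FOUND",
    -2: "CL_DEVICE_NOT_AVAILABLE",
    -3: "CL_COMPILER_NOT_AVAILABLE",
    -4: "CL_MEM_OBJECT_ALLOCATION_FAILURE",
    -5: "CL_OUT_OF_RESOURCES",
    -6: "CL_OUT_OF_HOST_MEMORY",
    -7: "CL_PROFILING_INFO_NOT_AVAILABLE",
}

# Assumed line format, taken from the error quoted in the first post.
ERROR_LINE = re.compile(
    r"OpenCL kernel/call '(?P<call>\w+)' call failed \((?P<code>-?\d+)\)"
)


def decode_cl_error(code):
    """Return the symbolic name for an OpenCL status code, if it is in the table."""
    return CL_ERROR_NAMES.get(code, f"unknown OpenCL status ({code})")


def parse_error_line(line):
    """Extract (call name, status code, symbolic name) from a stderr line.

    Returns None when the line does not match the assumed format.
    """
    match = ERROR_LINE.search(line)
    if match is None:
        return None
    code = int(match.group("code"))
    return match.group("call"), code, decode_cl_error(code)


line = ("ERROR: OpenCL kernel/call 'clGetEventProfilingInfo' call failed (-7) "
        "in file ../../src/GPU_lock.cpp near line 550.")
print(parse_error_line(line))
```

The table is deliberately partial; codes outside it are reported as unknown rather than guessed.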
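
Raistmer describes making the -7 error recoverable in revision 3602: the app reports it in stderr and continues instead of terminating. The sketch below shows that general pattern only; it is not the app's actual code. `query_profiling` is a hypothetical stand-in for a wrapper around clGetEventProfilingInfo, and `kernel_time_ns` and `OpenCLError` are names invented for this example.

```python
import sys

CL_SUCCESS = 0
CL_PROFILING_INFO_NOT_AVAILABLE = -7


class OpenCLError(RuntimeError):
    """Raised for OpenCL failures that this sketch treats as fatal."""

    def __init__(self, call, status):
        super().__init__(f"{call} failed ({status})")
        self.call = call
        self.status = status


def kernel_time_ns(query_profiling, log=sys.stderr):
    """Return elapsed kernel time in ns, or None if profiling is unavailable.

    query_profiling is a hypothetical callable standing in for a wrapper
    around clGetEventProfilingInfo; it returns (status, start_ns, end_ns).
    """
    status, start_ns, end_ns = query_profiling()
    if status == CL_SUCCESS:
        return end_ns - start_ns
    if status == CL_PROFILING_INFO_NOT_AVAILABLE:
        # Recoverable case: report it and let processing continue without timing.
        print(
            "WARNING: clGetEventProfilingInfo returned -7; "
            "continuing without profiling data",
            file=log,
        )
        return None
    # Any other failure is still treated as fatal in this sketch.
    raise OpenCLError("clGetEventProfilingInfo", status)
```

A caller would then skip whatever tuning depends on the timing when the function returns None, rather than aborting the task.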
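
The "delete them all" step for the cached bin and wisdom files could be scripted roughly as follows. This is a sketch under assumptions: the thread does not give the exact file names, so matching any file whose name contains ".bin" or ".wisdom" is a guess, and the project directory must be supplied by the caller because its location varies between installations. The function names are invented for this example.

```python
from pathlib import Path


def find_cached_files(project_dir):
    """List files in project_dir whose names contain '.bin' or '.wisdom'.

    The name patterns are an assumption based on how the files are
    described in the thread, not a documented naming scheme.
    """
    project_dir = Path(project_dir)
    return sorted(
        p for p in project_dir.iterdir()
        if p.is_file() and (".bin" in p.name or ".wisdom" in p.name)
    )


def clear_cached_files(project_dir, dry_run=True):
    """Delete the cached files, or only report them when dry_run is True."""
    targets = find_cached_files(project_dir)
    for path in targets:
        if not dry_run:
            path.unlink()
    return targets
```

Running `clear_cached_files(path_to_project_dir)` first with the default `dry_run=True` lists what would be removed; after reviewing that list, and with all tasks suspended as David did, `dry_run=False` performs the deletion.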