Optimized CUDA Issues & '-12 Unknown error'

Author	Message
Lint trap Joined: 30 May 03 Posts: 871 Credit: 28,092,319 RAC: 0	Message 923262 - Posted: 3 Aug 2009, 0:46:22 UTC - in response to Message 922727. Last modified: 3 Aug 2009, 0:50:16 UTC
AFAIK, the 'Unhandled Exception Record - Reason: Out Of Memory (C++ Exception) (0xe06d7363) at address 0x7C812AFB' errors are the same as the '-12 Unknown error cudaAcc_find_triplets erroneously found a triplet twice in find_triplets_kernel' errors, but the app has a small bug in its error detection/description. Nothing to worry about; I get a lot of these errors. EDIT: BTW, I would update the NVIDIA driver to 190.38 and also use the CUDA 2.3 DLLs with Raistmer's new CUDA V12 app. Look in the CUDA area here and at the Lunatics crew site.
Yes, I saw the Out of Memory text. That happens, no problem. What was unusual is that the error report is truncated. Maybe it is new debugger code or another problem. The video card is fine, the PC is fine - no restarts or anything. I thought somebody might want to know about the truncated error report. Thanks for your input. Martin

mr.mac52 Joined: 18 Mar 03 Posts: 67 Credit: 245,882,461 RAC: 0	Message 923347 - Posted: 3 Aug 2009, 12:46:22 UTC
http://setiathome.berkeley.edu/workunit.php?wuid=487147003
WU true angle range is : 0.381546
After app init: total GPU memory 671088640 free GPU memory 538587136
Exception detected inside cudaAcc_find_triplets, dumping client state
icfft=196155, PoT_activity=0, PoT_freq_bin=-1SETI@home error -12 Unknown error
cudaAcc_find_triplets erroneously found a triplet twice in find_triplets_kernel
File: ..\analyzePoT.cpp
Line: 348

Fred W Volunteer tester Joined: 13 Jun 99 Posts: 2524 Credit: 11,954,210 RAC: 0	Message 923348 - Posted: 3 Aug 2009, 12:49:38 UTC - in response to Message 923347.
http://setiathome.berkeley.edu/workunit.php?wuid=487147003 WU true angle range is : 0.381546 After app init: total GPU memory 671088640 free GPU memory 538587136 Exception detected inside cudaAcc_find_triplets, dumping client state icfft=196155, PoT_activity=0, PoT_freq_bin=-1SETI@home error -12 Unknown error cudaAcc_find_triplets erroneously found a triplet twice in find_triplets_kernel File: ..\analyzePoT.cpp Line: 348
A relatively frequent problem that has not been tracked down yet, as far as I know. Just ignore it - not a problem at your end. F.

mr.mac52 Joined: 18 Mar 03 Posts: 67 Credit: 245,882,461 RAC: 0	Message 923349 - Posted: 3 Aug 2009, 12:51:02 UTC
Thanks Fred!

Josef W. Segur Volunteer developer Volunteer tester Joined: 30 Oct 99 Posts: 4504 Credit: 1,414,761 RAC: 0	Message 923363 - Posted: 3 Aug 2009, 15:44:59 UTC - in response to Message 923348.
http://setiathome.berkeley.edu/workunit.php?wuid=487147003 WU true angle range is : 0.381546 After app init: total GPU memory 671088640 free GPU memory 538587136 Exception detected inside cudaAcc_find_triplets, dumping client state icfft=196155, PoT_activity=0, PoT_freq_bin=-1SETI@home error -12 Unknown error cudaAcc_find_triplets erroneously found a triplet twice in find_triplets_kernel File: ..\analyzePoT.cpp Line: 348
A relatively frequent problem that has not been tracked down yet, as far as I know. Just ignore it - not a problem at your end. F.
It's actually thoroughly understood: a design decision was made that the find_triplets_kernel running on the GPU would bail out if more than one triplet was found in the array it was checking. It sets a flag and quits; the comment attached is:
// Reporting Error, more than one result per PoT, redo the calculations on CPU
When the part of the CUDA code which runs on the CPU sees the flag, rather than trying to redo the calculation, it does:
SETIERROR(UNSUPPORTED_FUNCTION, "cudaAcc_find_triplets erroneously found a triplet twice in find_triplets_kernel");
The "UNSUPPORTED_FUNCTION" becomes an exit code of -12, which BOINC shows as unknown. As has been seen, the occurrence of a triplet which is actually part of a quadruplet, or of one of the other patterns which can cause more than one triplet in a PoT, is fairly rare. But given hundreds of thousands of tasks a day, it happens often enough that the design decision has been proven bad. If the VLAR problem didn't exist, perhaps this lesser issue would have been cleaned up by now. Joe
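A minimal sketch of the control flow Joe describes, written in Python for brevity rather than in the app's own C++/CUDA. Everything here is a hypothetical model: the function names, the `SetiError` class and the idea of passing the triplets in as a list are assumptions made for illustration. Only the error text and the mapping of UNSUPPORTED_FUNCTION to exit code -12 come from the post.

```python
UNSUPPORTED_FUNCTION = -12  # exit code from the post; BOINC shows it as "Unknown error"


class SetiError(Exception):
    """Hypothetical stand-in for the app's SETIERROR macro."""

    def __init__(self, code, message):
        super().__init__(message)
        self.code = code


def gpu_find_triplets(candidates):
    """Model of the GPU kernel, which reports at most one triplet per PoT.

    `candidates` is a hypothetical list of the triplets present in one PoT
    array. Returns (result, error_flag); the flag is set when more than one
    triplet exists, mirroring "sets a flag and quits".
    """
    if len(candidates) > 1:
        return None, True
    return (candidates[0] if candidates else None), False


def cpu_find_triplets(candidates):
    """Model of the CPU path, which can report every triplet in the PoT."""
    return list(candidates)


def host_current(candidates):
    """Behaviour described in the post: the flag aborts the task with -12."""
    result, flag = gpu_find_triplets(candidates)
    if flag:
        raise SetiError(
            UNSUPPORTED_FUNCTION,
            "cudaAcc_find_triplets erroneously found a triplet twice "
            "in find_triplets_kernel",
        )
    return [] if result is None else [result]


def host_with_fallback(candidates):
    """What the kernel comment asks for: redo the calculation on the CPU."""
    result, flag = gpu_find_triplets(candidates)
    if flag:
        return cpu_find_triplets(candidates)
    return [] if result is None else [result]
```

With two triplets in the same PoT, `host_current` raises the -12 error while `host_with_fallback` returns both triplets, which is the difference between the behaviour reported in this thread and the one the kernel comment intended.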

Fred W Volunteer tester Joined: 13 Jun 99 Posts: 2524 Credit: 11,954,210 RAC: 0	Message 923371 - Posted: 3 Aug 2009, 16:19:20 UTC - in response to Message 923363.
My apologies, Joe. I had not seen any explanation of the cause of this previously (I obviously don't frequent the right Boards). F.

Josef W. Segur Volunteer developer Volunteer tester Joined: 30 Oct 99 Posts: 4504 Credit: 1,414,761 RAC: 0	Message 923391 - Posted: 3 Aug 2009, 17:36:20 UTC - in response to Message 923371.
My apologies, Joe. I had not seen any explanation of the cause of this previously (I obviously don't frequent the right Boards). F.
No apology needed; I meant that those working on the code "thoroughly understood" the logic. I may have posted a similar analysis at SETI Beta when the issue was first seen, or in the public forum area at Lunatics later, but I don't remember for sure. I don't know whether it has come up in the CUDA Q&A forums; those aren't on my reading list. I think it's very sensible to report occurrences as they're noticed. It gives a rough idea of how often they happen and keeps some awareness of the issue alive. I suppose Eric's todo list has a relevant entry, but it's unlikely to reach top priority on that list anytime soon. Joe

Anthony Byrnes Joined: 27 Jul 09 Posts: 2 Credit: 132,591 RAC: 0	Message 924466 - Posted: 8 Aug 2009, 1:27:03 UTC
Hi, Perhaps someone can help me. I don't know if I am in the right place. Config is Windows XP SP3, CUDA card, driver 190.38, CUDA version 2.3. My CUDA crunching was working fine. In the last day or so it has gone absolutely haywire. If I allow CUDA to crunch, explorer.exe goes superbusy and the machine is no longer usable. If I suspend SETI, the machine goes back to normal immediately; if I resume, it dies when explorer.exe goes superbusy. I can do this ad nauseam, with the same result. It is definitely caused by the CUDA app. Has anything changed in the last day or so? I have rebooted the machine to be clean. Same result. I have now aborted all samples until I can find some info on this. Would it pay me to detach and re-attach? Get a new version of the CUDA app? Does anyone have info on this problem? I see others have had similar problems. Regards, Anthony.

Gundolf Jahn Joined: 19 Sep 00 Posts: 3184 Credit: 446,358 RAC: 0	Message 924496 - Posted: 8 Aug 2009, 4:01:07 UTC - in response to Message 924466.
You should do a forum search on VLAR (Very Low Angle Range). Regards, Gundolf

Fred W Volunteer tester Joined: 13 Jun 99 Posts: 2524 Credit: 11,954,210 RAC: 0	Message 924525 - Posted: 8 Aug 2009, 6:39:48 UTC - in response to Message 924496.
You should do a forum search on VLAR (Very Low Angle Range). Regards, Gundolf
But he is using VLARkill... F.

jason_gee Volunteer developer Volunteer tester Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0	Message 924527 - Posted: 8 Aug 2009, 6:56:46 UTC - in response to Message 924525. Last modified: 8 Aug 2009, 6:58:19 UTC
But he is using VLARkill...
I don't see where you're seeing that. In the results all I can see is manual aborts of the stock application... My suggestion is the Lunatics apps via the installer, plus optionally rebranding VLAR tasks to the CPU. Then at least we could see how much memory is free on that 256MiB card, and work out if there are any further issues.
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
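The optimized app's stderr reports GPU memory in bytes, as in the "total GPU memory ... free GPU memory ..." lines quoted elsewhere in this thread. The following is a small Python sketch for turning such a line into MiB so it can be compared with a card's nominal size. The function name `gpu_memory_mib` is hypothetical, and the pattern assumes the wording shown in those log excerpts.

```python
import re

# Matches both "total GPU memory N free GPU memory M" and the variant that
# starts with a capital "Total", as seen in the posted logs.
MEM_LINE = re.compile(
    r"total GPU memory (\d+)\s+free GPU memory (\d+)", re.IGNORECASE
)


def gpu_memory_mib(stderr_text):
    """Return (total_mib, free_mib) from the last memory report, or None."""
    matches = MEM_LINE.findall(stderr_text)
    if not matches:
        return None
    total_bytes, free_bytes = (int(value) for value in matches[-1])
    mib = 1024 * 1024
    return total_bytes / mib, free_bytes / mib
```

For the line "After app init: total GPU memory 671088640 free GPU memory 538587136" this gives a total of 640 MiB and a little under 514 MiB free.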

Fred W Volunteer tester Joined: 13 Jun 99 Posts: 2524 Credit: 11,954,210 RAC: 0	Message 924537 - Posted: 8 Aug 2009, 7:41:28 UTC - in response to Message 924527.
Aarghhh! Mea culpa. Sorry. That's what comes of browsing the Boards at ridiculous hours because you can't sleep... F.

Si Joined: 26 May 09 Posts: 3 Credit: 101,569 RAC: 0	Message 924874 - Posted: 9 Aug 2009, 8:21:18 UTC
Hi, Can anyone help me with this error? I'm getting quite a lot of them. i7 920, GTX 295, 190.56 and 190.38, W7 RC1 x64
<core_client_version>6.6.38</core_client_version>
<![CDATA[
<message>
Incorrect function. (0x1) - exit code 1 (0x1)
</message>
<stderr_txt>
setiathome_CUDA: Found 2 CUDA device(s):
Device 1 : GeForce GTX 295
totalGlobalMem = 939524096
sharedMemPerBlock = 16384
regsPerBlock = 16384
warpSize = 32
memPitch = 262144
maxThreadsPerBlock = 512
clockRate = 1242000
totalConstMem = 65536
major = 1
minor = 3
textureAlignment = 256
deviceOverlap = 1
multiProcessorCount = 30
Device 2 : GeForce GTX 295
totalGlobalMem = 939524096
sharedMemPerBlock = 16384
regsPerBlock = 16384
warpSize = 32
memPitch = 262144
maxThreadsPerBlock = 512
clockRate = 1242000
totalConstMem = 65536
major = 1
minor = 3
textureAlignment = 256
deviceOverlap = 1
multiProcessorCount = 30
setiathome_CUDA: CUDA Device 1 specified, checking...
Device 1: GeForce GTX 295 is okay
SETI@home using CUDA accelerated device GeForce GTX 295
V12 modification by Raistmer
Priority of worker thread rised successfully
Priority of process adjusted successfully
Total GPU memory 939524096 free GPU memory 821575680
setiathome_enhanced 6.02 Visual Studio/Microsoft C++
Build features: Non-graphics CUDA VLAR autokill enabled FFTW USE_SSE x86
CPUID: Intel(R) Core(TM) i7 CPU 920 @ 2.67GHz
Cache: L1=64K L2=256K
CPU features: FPU TSC PAE CMPXCHG8B APIC SYSENTER MTRR CMOV/CCMP MMX FXSAVE/FXRSTOR SSE SSE2 HT SSE3
libboinc: 6.3.22
Work Unit Info:
...............
WU true angle range is : 0.432403
After app init: total GPU memory 939524096 free GPU memory 773079040
Cuda error 'GetFixedPoT_kernel' in file 'd:/BoincSeti_Prog/sinbad_repositories/LunaticsUnited/SETI_CUDA_MB_exp/client/cuda/cudaAcc_gaussfit.cu' in line 487 : unknown error.
Cuda error 'NormalizePoT_kernel' in file 'd:/BoincSeti_Prog/sinbad_repositories/LunaticsUnited/SETI_CUDA_MB_exp/client/cuda/cudaAcc_gaussfit.cu' in line 496 : unknown error.
Cuda error 'NormalizePoT_kernel' in file 'd:/BoincSeti_Prog/sinbad_repositories/LunaticsUnited/SETI_CUDA_MB_exp/client/cuda/cudaAcc_gaussfit.cu' in line 496 : unknown error.
Cuda error 'cudaMemset(dev_flag, 0, sizeof(*dev_flag))' in file 'd:/BoincSeti_Prog/sinbad_repositories/LunaticsUnited/SETI_CUDA_MB_exp/client/cuda/cudaAcc_gaussfit.cu' in line 499 : unknown error.

Claggy Volunteer tester Joined: 5 Jul 99 Posts: 4654 Credit: 47,537,079 RAC: 4	Message 924961 - Posted: 9 Aug 2009, 17:30:27 UTC - in response to Message 924874. Last modified: 9 Aug 2009, 17:32:21 UTC
Hi, Can anyone help me with this error? I'm getting quite a lot of them. i7 920, GTX 295, 190.56 and 190.38, W7 RC1 x64
Which CUDA DLLs are you running with V12VlarKill: CUDA 2.0, 2.1, 2.2 or 2.3? And where did you get the V12 app from? Did you download it on its own, or as part of one of the installer packages? If so, which version? Claggy

Si Joined: 26 May 09 Posts: 3 Credit: 101,569 RAC: 0	Message 925099 - Posted: 10 Aug 2009, 7:49:09 UTC - in response to Message 924961.
Hi, I'm using the 2.3 DLLs and I used the Lunatics_Win64v0.2_(SSE3+)_AP505r168_AKv8bx64_CudaV12 unified installer.

Claggy Volunteer tester Joined: 5 Jul 99 Posts: 4654 Credit: 47,537,079 RAC: 4	Message 925207 - Posted: 10 Aug 2009, 19:35:50 UTC - in response to Message 925099. Last modified: 10 Aug 2009, 19:40:51 UTC
I had a better look at your results today. I can't say that you have a lot of errors: quite a few are VLAR killed, a couple don't give any useful info, and only four have CUDA errors in them. Two of those are timeout errors, and the other two just say unknown. The reason I asked about the CUDA DLLs used is that your CPU times for CUDA WUs are about double my times, and I think the CUDA 2.3 DLLs are supposed to reduce CPU usage. Did you drop the DLLs in your setiathome project folder, overwriting the ones already there, and not in the BOINC Data folder? Or it could just be the difference between my 4GHz dual core and your Hyperthreaded i7. You could use ReSchedule1.9 to rebrand any GPU VLAR tasks to the CPU so they don't get aborted. Claggy

Si Joined: 26 May 09 Posts: 3 Credit: 101,569 RAC: 0	Message 925328 - Posted: 11 Aug 2009, 7:00:30 UTC - in response to Message 925207.
Hi, Thanks for the help. I put the DLLs in the right folder. I looked at my results too, and I seem to be missing a lot of error results. The WU would run right to the end, so it was not VLAR killed, and just before upload the NVIDIA driver would crash and restart; the WU would then show a computation error. I was getting 4 to 5 a day. I reinstalled W7 yesterday and so far it is OK. Thanks, S

Sutaru Tsureku Volunteer tester Joined: 6 Apr 07 Posts: 7105 Credit: 147,663,825 RAC: 5	Message 925330 - Posted: 11 Aug 2009, 7:28:31 UTC - in response to Message 909296.
In Raistmer's old thread it was useful to post the 'cudaAcc_find_triplets erroneously found a triplet twice in find_triplets_kernel' errors. [...] If you would like to help the opt. crew, please post your '-12 Unknown error' reports. If you run CUDA WUs, look in your PC's tasks overview, click on 'error' and look at the 'CPU time'. If you run Raistmer's CUDA app, a 'VLAR kill' can be identified by its ~1 sec. CPU time. Those results aren't of interest. The '-12 Unknown error' happens during the calculation of the WU, so some seconds of CPU time are shown. Then click on the Task ID and copy/paste the relevant part of the <stderr_txt>. It could look like this:
Exception detected inside cudaAcc_find_triplets, dumping client state
icfft=98384, PoT_activity=0, PoT_freq_bin=-1SETI@home error -12 Unknown error
cudaAcc_find_triplets erroneously found a triplet twice in find_triplets_kernel
File: ..\analyzePoT.cpp
Line: 348
Only the bolded line (the one starting with icfft=) is needed. ...
I had a quick look at my GPU cruncher's results and found some:
http://setiathome.berkeley.edu/result.php?resultid=1324644157
icfft=81939, PoT_activity=0, PoT_freq_bin=-1SETI@home error -12 Unknown error
http://setiathome.berkeley.edu/result.php?resultid=1324195175
icfft=140743, PoT_activity=0, PoT_freq_bin=-1SETI@home error -12 Unknown error
http://setiathome.berkeley.edu/result.php?resultid=1324030165
icfft=164295, PoT_activity=0, PoT_freq_bin=-1SETI@home error -12 Unknown error
http://setiathome.berkeley.edu/result.php?resultid=1323750030
icfft=177393, PoT_activity=0, PoT_freq_bin=-1SETI@home error -12 Unknown error
http://setiathome.berkeley.edu/result.php?resultid=1323647918
icfft=119484, PoT_activity=0, PoT_freq_bin=-1SETI@home error -12 Unknown error
http://setiathome.berkeley.edu/result.php?resultid=1323350002
icfft=130527, PoT_activity=0, PoT_freq_bin=-1SETI@home error -12 Unknown error
http://setiathome.berkeley.edu/result.php?resultid=1323203454
icfft=106857, PoT_activity=0, PoT_freq_bin=-1SETI@home error -12 Unknown error
http://setiathome.berkeley.edu/result.php?resultid=1323167228
icfft=170054, PoT_activity=0, PoT_freq_bin=-1SETI@home error -12 Unknown error
http://setiathome.berkeley.edu/result.php?resultid=1323115634
icfft=155657, PoT_activity=0, PoT_freq_bin=-1SETI@home error -12 Unknown error
http://setiathome.berkeley.edu/result.php?resultid=1321624734
icfft=144759, PoT_activity=0, PoT_freq_bin=-1SETI@home error -12 Unknown error
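For anyone collecting these reports, here is a rough Python sketch that pulls the client-state line out of a pasted stderr excerpt. The function name `find_minus12_reports` and the returned dictionary layout are hypothetical. The pattern is based only on the lines shown in this thread, where the PoT_freq_bin value runs straight into "SETI@home" with no separator.

```python
import re

# Client-state dump followed by the error code. "\s*" before "SETI@home"
# accepts the fused form seen in the posted logs as well as a spaced one.
DUMP_LINE = re.compile(
    r"icfft=(?P<icfft>\d+),\s*"
    r"PoT_activity=(?P<activity>-?\d+),\s*"
    r"PoT_freq_bin=(?P<freq_bin>-?\d+)\s*"
    r"SETI@home error (?P<code>-?\d+)"
)


def find_minus12_reports(stderr_text):
    """Return one dict per '-12' client-state dump found in the text."""
    reports = []
    for match in DUMP_LINE.finditer(stderr_text):
        if int(match.group("code")) != -12:
            continue  # some other SETI@home error code
        reports.append(
            {
                "icfft": int(match.group("icfft")),
                "PoT_activity": int(match.group("activity")),
                "PoT_freq_bin": int(match.group("freq_bin")),
            }
        )
    return reports
```

A dump that was cut off by "[stderr_txt truncated]" before the error code does not match and is skipped, so truncated results still need to be checked by hand.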

samuel7 Volunteer tester Joined: 2 Jan 00 Posts: 47 Credit: 2,194,240 RAC: 0	Message 925612 - Posted: 12 Aug 2009, 17:48:41 UTC
Here's a new -12 error. So new I haven't managed to upload it, let alone report it, yet.
http://setiathome.berkeley.edu/result.php?resultid=1329923998
icfft=103904, PoT_activity=0, PoT_freq_bin=-1SETI@home error -12 Unknown error
One wingmate shows the same:
http://setiathome.berkeley.edu/result.php?resultid=1328586677
icfft=65575, PoT_activity=0, PoT_fre [stderr_txt truncated]

Josef W. Segur Volunteer developer Volunteer tester Joined: 30 Oct 99 Posts: 4504 Credit: 1,414,761 RAC: 0	Message 925694 - Posted: 12 Aug 2009, 22:39:12 UTC - in response to Message 925612.
Here's a new -12 error. So new I haven't managed to upload it, let alone report it, yet. http://setiathome.berkeley.edu/result.php?resultid=1329923998 icfft=103904, PoT_activity=0, PoT_freq_bin=-1SETI@home error -12 Unknown error One wingmate shows the same: http://setiathome.berkeley.edu/result.php?resultid=1328586677 icfft=65575, PoT_activity=0, PoT_fre [stderr_txt truncated]
And the successful result on CPU shows result_overflow on triplets, so again it suggests the CUDA app should be rewritten to be able to report more than one triplet per array. Joe
