Optimized CUDA Issues & '-12 Unknown error'
Author | Message |
---|---|
Lint trap Joined: 30 May 03 Posts: 871 Credit: 28,092,319 RAC: 0 |
Yes, I saw the Out of Memory text. That happens, no problem. What was unusual is the error report is truncated. Maybe new debugger code or another problem. The video card is fine, the pc is fine - no restarts or anything. I thought somebody might want to know about the truncated error report. Thanks for your input. Martin [edited/] |
mr.mac52 Joined: 18 Mar 03 Posts: 67 Credit: 245,882,461 RAC: 0 |
http://setiathome.berkeley.edu/workunit.php?wuid=487147003 WU true angle range is : 0.381546 After app init: total GPU memory 671088640 free GPU memory 538587136 Exception detected inside cudaAcc_find_triplets, dumping client state icfft=196155, PoT_activity=0, PoT_freq_bin=-1 SETI@home error -12 Unknown error cudaAcc_find_triplets erroneously found a triplet twice in find_triplets_kernel File: ..\analyzePoT.cpp Line: 348 |
Fred W Joined: 13 Jun 99 Posts: 2524 Credit: 11,954,210 RAC: 0 |
http://setiathome.berkeley.edu/workunit.php?wuid=487147003 A relatively frequent problem that has not been tracked down yet, as far as I know. Just ignore it; it's not a problem at your end. F. |
mr.mac52 Joined: 18 Mar 03 Posts: 67 Credit: 245,882,461 RAC: 0 |
Thanks Fred! |
Josef W. Segur Joined: 30 Oct 99 Posts: 4504 Credit: 1,414,761 RAC: 0 |
http://setiathome.berkeley.edu/workunit.php?wuid=487147003 It's actually thoroughly understood: a design decision was made that the find_triplets_kernel running on the GPU would bail out if more than one triplet was found in the array it was checking. It sets a flag and quits; the attached comment is: "// Reporting Error, more than one result per PoT, redo the calculations on CPU". When the part of the CUDA code which runs on the CPU sees the flag, rather than trying to redo the calculation, it does: SETIERROR(UNSUPPORTED_FUNCTION, "cudaAcc_find_triplets erroneously found a triplet twice in find_triplets_kernel"); The "UNSUPPORTED_FUNCTION" becomes an exit code of -12, which BOINC shows as unknown. As has been seen, the occurrence of a triplet which is actually part of a quadruplet, or of one of the other patterns which can cause more than one triplet in a PoT, is fairly rare. But given hundreds of thousands of tasks a day, it happens often enough that the design decision has been proven bad. If the VLAR problem didn't exist, perhaps this lesser issue would have been cleaned up by now. Joe |
Fred W Joined: 13 Jun 99 Posts: 2524 Credit: 11,954,210 RAC: 0 |
My apologies, Joe. I had not seen any explanation of the cause of this previously (I obviously don't frequent the right Boards). F. |
Josef W. Segur Joined: 30 Oct 99 Posts: 4504 Credit: 1,414,761 RAC: 0 |
"My apologies, Joe. I had not seen any explanation of the cause of this previously (I obviously don't frequent the right Boards)." No apology needed; I meant that those working on the code "thoroughly understood" the logic. I may have posted a similar analysis at SETI Beta when the issue was first seen, or in the public forum area at Lunatics later, but I don't remember for sure. I don't know whether it has come up in the CUDA Q&A forums; those aren't on my reading list. I think it's very sensible to report occurrences as they're noticed: it gives a rough idea of how often they happen and keeps some awareness of the issue alive. I suppose Eric's todo list has a relevant entry, but it's unlikely to reach top priority on that list anytime soon. Joe |
Anthony Byrnes Joined: 27 Jul 09 Posts: 2 Credit: 132,591 RAC: 0 |
Hi, Perhaps someone can help me. I don't know if I am in the right place. Config is Windows XP SP3, Cuda card, driver 190.38, Cuda version 2.3. My CUDA crunching was working fine. In the last day or so, it has gone absolutely haywire. If I allow CUDA to crunch, then explorer.exe goes superbusy and the machine is no longer usable. If I suspend SETI, the machine goes back to normal immediately; if I RESUME, it dies when explorer.exe goes superbusy. I can do this ad nauseam. Same result. Definitely, guaranteed caused by the CUDA app. Question: has anything changed in the last day or so? I have rebooted the machine to be clean. Same result. I have now aborted all samples until I can find some info on this. Would it pay me to detach and re-attach? Get a new version of the CUDA app? Does anyone have info on this problem? I see others have had similar problems. Regards, Anthony. |
Gundolf Jahn Joined: 19 Sep 00 Posts: 3184 Credit: 446,358 RAC: 0 |
You should do a forum search on VLAR (Very Low Angle Range). Gruß, Gundolf |
Fred W Joined: 13 Jun 99 Posts: 2524 Credit: 11,954,210 RAC: 0 |
"You should do a forum search on VLAR (Very Low Angle Range)." But he is using VLARkill... F. |
jason_gee Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
"But he is using VLARkill..." I don't see where you're seeing that. In the results, all I can see is manual aborts of the stock application... My suggestion is Lunatics apps via the installer, plus optionally rebranding VLAR tasks to CPU. Then at least we could see how much memory is free on that 256MiB card, and work out if there are any further issues. "Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions. |
Fred W Joined: 13 Jun 99 Posts: 2524 Credit: 11,954,210 RAC: 0 |
Aarghhh! Mea culpa. Sorry. That's what comes of browsing the Boards at ridiculous hours because you can't sleep... F. |
Si Joined: 26 May 09 Posts: 3 Credit: 101,569 RAC: 0 |
Hi, can anyone help me with this error? I'm getting quite a lot of them. i7 920, 295 GTX, 190.56 and 190.38, W7 RC1 x64 <core_client_version>6.6.38</core_client_version> <![CDATA[ <message> Incorrect function. (0x1) - exit code 1 (0x1) </message> <stderr_txt> setiathome_CUDA: Found 2 CUDA device(s): Device 1 : GeForce GTX 295 totalGlobalMem = 939524096 sharedMemPerBlock = 16384 regsPerBlock = 16384 warpSize = 32 memPitch = 262144 maxThreadsPerBlock = 512 clockRate = 1242000 totalConstMem = 65536 major = 1 minor = 3 textureAlignment = 256 deviceOverlap = 1 multiProcessorCount = 30 Device 2 : GeForce GTX 295 totalGlobalMem = 939524096 sharedMemPerBlock = 16384 regsPerBlock = 16384 warpSize = 32 memPitch = 262144 maxThreadsPerBlock = 512 clockRate = 1242000 totalConstMem = 65536 major = 1 minor = 3 textureAlignment = 256 deviceOverlap = 1 multiProcessorCount = 30 setiathome_CUDA: CUDA Device 1 specified, checking... Device 1: GeForce GTX 295 is okay SETI@home using CUDA accelerated device GeForce GTX 295 V12 modification by Raistmer Priority of worker thread rised successfully Priority of process adjusted successfully Total GPU memory 939524096 free GPU memory 821575680 setiathome_enhanced 6.02 Visual Studio/Microsoft C++ Build features: Non-graphics CUDA VLAR autokill enabled FFTW USE_SSE x86 CPUID: Intel(R) Core(TM) i7 CPU 920 @ 2.67GHz Cache: L1=64K L2=256K CPU features: FPU TSC PAE CMPXCHG8B APIC SYSENTER MTRR CMOV/CCMP MMX FXSAVE/FXRSTOR SSE SSE2 HT SSE3 libboinc: 6.3.22 Work Unit Info: ............... WU true angle range is : 0.432403 After app init: total GPU memory 939524096 free GPU memory 773079040 Cuda error 'GetFixedPoT_kernel' in file 'd:/BoincSeti_Prog/sinbad_repositories/LunaticsUnited/SETI_CUDA_MB_exp/client/cuda/cudaAcc_gaussfit.cu' in line 487 : unknown error. Cuda error 'NormalizePoT_kernel' in file 'd:/BoincSeti_Prog/sinbad_repositories/LunaticsUnited/SETI_CUDA_MB_exp/client/cuda/cudaAcc_gaussfit.cu' in line 496 : unknown error.
Cuda error 'NormalizePoT_kernel' in file 'd:/BoincSeti_Prog/sinbad_repositories/LunaticsUnited/SETI_CUDA_MB_exp/client/cuda/cudaAcc_gaussfit.cu' in line 496 : unknown error. Cuda error 'cudaMemset(dev_flag, 0, sizeof(*dev_flag))' in file 'd:/BoincSeti_Prog/sinbad_repositories/LunaticsUnited/SETI_CUDA_MB_exp/client/cuda/cudaAcc_gaussfit.cu' in line 499 : unknown error. |
Claggy Joined: 5 Jul 99 Posts: 4654 Credit: 47,537,079 RAC: 4 |
Hi, Which Cuda DLLs are you running with V12VlarKill: Cuda 2.0, 2.1, 2.2 or 2.3? And where did you get the V12 app from? Did you download it on its own, or as part of one of the installer packages? If so, which version? Claggy |
Si Joined: 26 May 09 Posts: 3 Credit: 101,569 RAC: 0 |
Hi, I'm using the 2.3 DLLs and I used the Lunatics_Win64v0.2_(SSE3+)_AP505r168_AKv8bx64_CudaV12 unified installer. |
Claggy Joined: 5 Jul 99 Posts: 4654 Credit: 47,537,079 RAC: 4 |
I had a better look at your results today. I can't say that you have a lot of errors: quite a few are VLAR killed, a couple don't give any useful info, and only four have Cuda errors in them. Two of those are timeout errors, and the other two just say unknown. The reason I asked about the Cuda DLLs used is that your CPU times for Cuda WUs are about double my times, and I think the Cuda 2.3 DLLs are supposed to reduce CPU usage. You did drop the DLLs in your setiathome project folder, overwriting the ones there already, and not in the BOINC Data folder? Or it could just be the difference between my 4 GHz dual core and your Hyperthreaded i7. You could use ReSchedule1.9 to rebrand any GPU VLAR tasks to the CPU so they don't get aborted. Claggy |
Si Joined: 26 May 09 Posts: 3 Credit: 101,569 RAC: 0 |
Hi, Thanks for the help. I put the DLLs in the right folder. I looked at my results too, and I seem to be missing a lot of error results. The WU would run right to the end, so it was not VLAR killed; just before upload the NVIDIA driver would crash and restart, and the WU would then show a computation error. I was getting 4 to 5 a day. I reinstalled W7 yesterday and so far it's OK. Thanks, S |
Sutaru Tsureku Joined: 6 Apr 07 Posts: 7105 Credit: 147,663,825 RAC: 5 |
In Raistmer's old thread, it was welcome to post the 'cudaAcc_find_triplets erroneously found a triplet twice in find_triplets_kernel' errors. I had a quick look at my GPU cruncher results and found some: http://setiathome.berkeley.edu/result.php?resultid=1324644157 icfft=81939, PoT_activity=0, PoT_freq_bin=-1 SETI@home error -12 Unknown error http://setiathome.berkeley.edu/result.php?resultid=1324195175 icfft=140743, PoT_activity=0, PoT_freq_bin=-1 SETI@home error -12 Unknown error http://setiathome.berkeley.edu/result.php?resultid=1324030165 icfft=164295, PoT_activity=0, PoT_freq_bin=-1 SETI@home error -12 Unknown error http://setiathome.berkeley.edu/result.php?resultid=1323750030 icfft=177393, PoT_activity=0, PoT_freq_bin=-1 SETI@home error -12 Unknown error http://setiathome.berkeley.edu/result.php?resultid=1323647918 icfft=119484, PoT_activity=0, PoT_freq_bin=-1 SETI@home error -12 Unknown error http://setiathome.berkeley.edu/result.php?resultid=1323350002 icfft=130527, PoT_activity=0, PoT_freq_bin=-1 SETI@home error -12 Unknown error http://setiathome.berkeley.edu/result.php?resultid=1323203454 icfft=106857, PoT_activity=0, PoT_freq_bin=-1 SETI@home error -12 Unknown error http://setiathome.berkeley.edu/result.php?resultid=1323167228 icfft=170054, PoT_activity=0, PoT_freq_bin=-1 SETI@home error -12 Unknown error http://setiathome.berkeley.edu/result.php?resultid=1323115634 icfft=155657, PoT_activity=0, PoT_freq_bin=-1 SETI@home error -12 Unknown error http://setiathome.berkeley.edu/result.php?resultid=1321624734 icfft=144759, PoT_activity=0, PoT_freq_bin=-1 SETI@home error -12 Unknown error |
samuel7 Joined: 2 Jan 00 Posts: 47 Credit: 2,194,240 RAC: 0 |
Here's a new -12 error. So new I haven't managed to upload let alone report it yet. http://setiathome.berkeley.edu/result.php?resultid=1329923998 icfft=103904, PoT_activity=0, PoT_freq_bin=-1 SETI@home error -12 Unknown error One wingmate shows the same: http://setiathome.berkeley.edu/result.php?resultid=1328586677 icfft=65575, PoT_activity=0, PoT_fre [stderr_txt truncated] |
Josef W. Segur Joined: 30 Oct 99 Posts: 4504 Credit: 1,414,761 RAC: 0 |
"Here's a new -12 error. So new I haven't managed to upload let alone report it yet." And the successful result on CPU shows result_overflow on triplets, so again it suggests the CUDA app should be rewritten to be able to report more than one triplet per array. Joe |
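
For readers following Joe's explanation (the GPU kernel sets a flag and quits when it sees a second triplet, and the host raises error -12 instead of redoing the work), here is a small host-only C++ sketch of the fallback that the quoted code comment describes. Everything in it is hypothetical: the names SingleSlotResult, is_triplet_at, find_triplets_single_slot, find_triplets_cpu and find_triplets_with_fallback, and the toy detection rule, are assumptions made for illustration and are not taken from the SETI@home or Lunatics sources. The single-slot finder merely stands in for the kernel.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical result slot, mimicking a device buffer that can hold one hit.
struct SingleSlotResult {
    bool found = false;     // one triplet was stored
    bool overflow = false;  // a second triplet was seen, so the result is incomplete
    std::size_t index = 0;  // start position of the stored triplet
};

// Toy detection rule (an assumption, NOT the real SETI@home algorithm):
// a "triplet" starts at i when samples i, i+step and i+2*step all exceed threshold.
bool is_triplet_at(const std::vector<float>& pot, std::size_t i,
                   std::size_t step, float threshold) {
    assert(step > 0);
    if (i + 2 * step >= pot.size()) return false;  // would read past the array
    return pot[i] > threshold && pot[i + step] > threshold &&
           pot[i + 2 * step] > threshold;
}

// Stand-in for the GPU kernel: keeps the first triplet, then sets a flag
// and bails out as soon as a second one turns up.
SingleSlotResult find_triplets_single_slot(const std::vector<float>& pot,
                                           std::size_t step, float threshold) {
    SingleSlotResult r;
    for (std::size_t i = 0; i < pot.size(); ++i) {
        if (!is_triplet_at(pot, i, step, threshold)) continue;
        if (r.found) {
            r.overflow = true;  // "more than one result per PoT"
            return r;
        }
        r.found = true;
        r.index = i;
    }
    return r;
}

// CPU path: collects every triplet start position in the array.
std::vector<std::size_t> find_triplets_cpu(const std::vector<float>& pot,
                                           std::size_t step, float threshold) {
    std::vector<std::size_t> hits;
    for (std::size_t i = 0; i < pot.size(); ++i) {
        if (is_triplet_at(pot, i, step, threshold)) hits.push_back(i);
    }
    return hits;
}

// Host-side driver: on overflow, redo the array on the CPU instead of
// raising an error.
std::vector<std::size_t> find_triplets_with_fallback(const std::vector<float>& pot,
                                                     std::size_t step,
                                                     float threshold) {
    const SingleSlotResult r = find_triplets_single_slot(pot, step, threshold);
    if (r.overflow) return find_triplets_cpu(pot, step, threshold);
    if (r.found) return {r.index};
    return {};
}
```

With the toy rule, calling find_triplets_with_fallback on the samples {0, 5, 5, 5, 5, 0} with step 1 and threshold 1 trips the overflow flag in the single-slot finder, and the driver then returns both start positions (1 and 2) from the CPU path. Whether a real fix would fall back to the CPU or enlarge the device-side result buffer is a design choice this sketch does not settle.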
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.