Optimized CUDA Issues & '-12 Unknown error'


Message boards : Number crunching : Optimized CUDA Issues & '-12 Unknown error'

Lint trap
Joined: 30 May 03
Posts: 871
Credit: 28,060,519
RAC: 12,220
United States
Message 923262 - Posted: 3 Aug 2009, 0:46:22 UTC - in response to Message 922727.
Last modified: 3 Aug 2009, 0:50:16 UTC


AFAIK, errors like

- Unhandled Exception Record -
Reason: Out Of Memory (C++ Exception) (0xe06d7363) at address 0x7C812AFB

are the same as errors like

-12 Unknown error
cudaAcc_find_triplets erroneously found a triplet twice in find_triplets_kernel

but the app has a small bug in its error detection/description.

Nothing to worry about; I get a lot of these errors.


EDIT:
BTW, I would update the NVIDIA driver to 190.38 and also use the CUDA 2.3 DLLs with Raistmer's new CUDA V12 app.
Look in the CUDA area here and at the Lunatics crew site.


Yes, I saw the Out of Memory text. That happens, no problem.

What was unusual is that the error report is truncated. Maybe it's new debugger code or another problem. The video card is fine and the PC is fine, with no restarts or anything.

I thought somebody might want to know about the truncated error report.

Thanks for your input.

Martin

[edited/]

mr.mac52
Joined: 18 Mar 03
Posts: 30
Credit: 89,680,237
RAC: 83,044
United States
Message 923347 - Posted: 3 Aug 2009, 12:46:22 UTC

http://setiathome.berkeley.edu/workunit.php?wuid=487147003

WU true angle range is : 0.381546
After app init: total GPU memory 671088640 free GPU memory 538587136
Exception detected inside cudaAcc_find_triplets, dumping client state
icfft=196155, PoT_activity=0, PoT_freq_bin=-1SETI@home error -12 Unknown error
cudaAcc_find_triplets erroneously found a triplet twice in find_triplets_kernel
File: ..\analyzePoT.cpp
Line: 348


____________

Fred W
Volunteer tester
Joined: 13 Jun 99
Posts: 2524
Credit: 11,954,210
RAC: 0
United Kingdom
Message 923348 - Posted: 3 Aug 2009, 12:49:38 UTC - in response to Message 923347.

http://setiathome.berkeley.edu/workunit.php?wuid=487147003

WU true angle range is : 0.381546
After app init: total GPU memory 671088640 free GPU memory 538587136
Exception detected inside cudaAcc_find_triplets, dumping client state
icfft=196155, PoT_activity=0, PoT_freq_bin=-1SETI@home error -12 Unknown error
cudaAcc_find_triplets erroneously found a triplet twice in find_triplets_kernel
File: ..\analyzePoT.cpp
Line: 348


A relatively frequent problem that has not been tracked down yet, as far as I know. Just ignore it - not a problem at your end.

F.
____________

mr.mac52
Joined: 18 Mar 03
Posts: 30
Credit: 89,680,237
RAC: 83,044
United States
Message 923349 - Posted: 3 Aug 2009, 12:51:02 UTC

Thanks Fred!
____________

Josef W. Segur
Project donor
Volunteer developer
Volunteer tester
Joined: 30 Oct 99
Posts: 4346
Credit: 1,123,775
RAC: 740
United States
Message 923363 - Posted: 3 Aug 2009, 15:44:59 UTC - in response to Message 923348.

http://setiathome.berkeley.edu/workunit.php?wuid=487147003

WU true angle range is : 0.381546
After app init: total GPU memory 671088640 free GPU memory 538587136
Exception detected inside cudaAcc_find_triplets, dumping client state
icfft=196155, PoT_activity=0, PoT_freq_bin=-1SETI@home error -12 Unknown error
cudaAcc_find_triplets erroneously found a triplet twice in find_triplets_kernel
File: ..\analyzePoT.cpp
Line: 348


A relatively frequent problem that has not been tracked down yet, as far as I know. Just ignore it - not a problem at your end.

F.

It's actually thoroughly understood: a design decision was made that the find_triplets_kernel running on the GPU would bail out if more than one triplet was found in the array it was checking. It sets a flag and quits; the attached comment is:

// Reporting Error, more than one result per PoT, redo the calculations on CPU

When the part of the CUDA code which runs on the CPU sees the flag, rather than trying to redo the calculation, it does:

SETIERROR(UNSUPPORTED_FUNCTION, "cudaAcc_find_triplets erroneously found a triplet twice in find_triplets_kernel");

The "UNSUPPORTED_FUNCTION" becomes an exit code of -12, which BOINC shows as unknown.

As has been seen, occurrence of a triplet which is actually part of a quadruplet or one of the other patterns which can cause more than one triplet in a PoT is fairly rare. But given hundreds of thousands of tasks a day it happens often enough that the design decision has been proven bad. If the VLAR problem didn't exist perhaps this lesser issue would have been cleaned up by now.
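
A minimal sketch of the control flow described above, in plain host-side C++ with the GPU kernel replaced by a stand-in function. Every name in it (`find_hits_gpu_stub`, `find_hits_cpu`, `OnOverflow`, `analyze`) is hypothetical, and a simple threshold test stands in for the real triplet search; only the -12 value and the error text come from this thread. It contrasts the behaviour that was shipped (raise the error) with what the kernel comment asks for (redo on the CPU).

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Assumption: UNSUPPORTED_FUNCTION maps to the exit code -12 reported in this thread.
constexpr int UNSUPPORTED_FUNCTION = -12;

struct SetiError : std::runtime_error {
    int code;
    SetiError(int c, const std::string& msg) : std::runtime_error(msg), code(c) {}
};

// Stand-in for the GPU kernel: it can report only one hit and raises a flag
// as soon as it sees a second one, mirroring the "bail out" design.
struct GpuResult {
    bool overflow_flag = false;  // set when more than one hit is present
    int first_hit = -1;          // index of the first hit, or -1 if none
};

GpuResult find_hits_gpu_stub(const std::vector<float>& pot, float threshold) {
    GpuResult r;
    for (std::size_t i = 0; i < pot.size(); ++i) {
        if (pot[i] <= threshold) continue;
        if (r.first_hit >= 0) {  // second hit: set the flag and quit
            r.overflow_flag = true;
            return r;
        }
        r.first_hit = static_cast<int>(i);
    }
    return r;
}

// CPU path: no limit on the number of hits it can report.
std::vector<int> find_hits_cpu(const std::vector<float>& pot, float threshold) {
    std::vector<int> hits;
    for (std::size_t i = 0; i < pot.size(); ++i)
        if (pot[i] > threshold) hits.push_back(static_cast<int>(i));
    return hits;
}

enum class OnOverflow { RaiseError, RedoOnCpu };

// Host-side handling of the flag set by the kernel stand-in.
std::vector<int> analyze(const std::vector<float>& pot, float threshold, OnOverflow mode) {
    const GpuResult r = find_hits_gpu_stub(pot, threshold);
    if (!r.overflow_flag) {
        if (r.first_hit < 0) return {};
        return {r.first_hit};
    }
    if (mode == OnOverflow::RaiseError) {
        // What the thread says actually happens: the task ends with -12.
        throw SetiError(UNSUPPORTED_FUNCTION,
                        "cudaAcc_find_triplets erroneously found a triplet twice "
                        "in find_triplets_kernel");
    }
    // What the kernel comment asks for: redo the calculation on the CPU.
    return find_hits_cpu(pot, threshold);
}
```

With `OnOverflow::RaiseError`, an input containing two hits ends in a `SetiError` whose `code` is -12; with `OnOverflow::RedoOnCpu`, the same input returns both hit positions.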
Joe

Fred W
Volunteer tester
Joined: 13 Jun 99
Posts: 2524
Credit: 11,954,210
RAC: 0
United Kingdom
Message 923371 - Posted: 3 Aug 2009, 16:19:20 UTC - in response to Message 923363.

My apologies, Joe. I had not seen any explanation of the cause of this previously (I obviously don't frequent the right Boards).

F.
____________

Josef W. Segur
Project donor
Volunteer developer
Volunteer tester
Joined: 30 Oct 99
Posts: 4346
Credit: 1,123,775
RAC: 740
United States
Message 923391 - Posted: 3 Aug 2009, 17:36:20 UTC - in response to Message 923371.

My apologies, Joe. I had not seen any explanation of the cause of this previously (I obviously don't frequent the right Boards).

F.

No apology needed; I meant that those working on the code "thoroughly understood" the logic. I may have posted a similar analysis at SETI Beta when the issue was first seen, or in the public forum area at Lunatics later, but I don't remember for sure. I don't know whether it has come up in the CUDA Q&A forums; those aren't on my reading list.

I think it's very sensible to report occurrences as they're noticed; it gives a rough idea of how often they happen and keeps some awareness of the issue alive. I suppose Eric's to-do list has a relevant entry, but it's unlikely to reach top priority on that list anytime soon.
Joe

Anthony Byrnes
Joined: 27 Jul 09
Posts: 2
Credit: 132,591
RAC: 0
New Zealand
Message 924466 - Posted: 8 Aug 2009, 1:27:03 UTC

Hi,

Perhaps someone can help me. I don't know if I am in the right place.
Config is Windows XP SP3, CUDA card, driver 190.38, CUDA version 2.3.
My CUDA crunching was working fine.
In the last day or so, it has gone absolutely haywire.
If I allow CUDA to crunch, explorer.exe goes super busy and the machine is no longer usable.
If I suspend SETI, the machine goes back to normal immediately; if I resume, it dies again when explorer.exe goes super busy. I can do this ad nauseam with the same result. It is definitely caused by the CUDA app.
Question: has anything changed in the last day or so? I have rebooted the machine to start clean, with the same result. I have now aborted all samples until I can find some info on this.
Would it pay me to detach and re-attach? Get a new version of the CUDA app?
Does anyone have info on this problem? I see others have had similar problems.

Regards,
Anthony.

Gundolf Jahn
Joined: 19 Sep 00
Posts: 3184
Credit: 361,286
RAC: 37
Germany
Message 924496 - Posted: 8 Aug 2009, 4:01:07 UTC - in response to Message 924466.

You should do a forum search on VLAR (Very Low Angle Range).

Gruß,
Gundolf

Fred W
Volunteer tester
Joined: 13 Jun 99
Posts: 2524
Credit: 11,954,210
RAC: 0
United Kingdom
Message 924525 - Posted: 8 Aug 2009, 6:39:48 UTC - in response to Message 924496.

You should do a forum search on VLAR (Very Low Angle Range).

Gruß,
Gundolf

But he is using VLARkill...

F.
____________

jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 5087
Credit: 74,232,236
RAC: 7,090
Australia
Message 924527 - Posted: 8 Aug 2009, 6:56:46 UTC - in response to Message 924525.
Last modified: 8 Aug 2009, 6:58:19 UTC

But he is using VLARkill...


I don't see where you're seeing that. In the results all I can see is manual aborts of the stock application...

My suggestion is Lunatics apps via the installer, plus optionally rebranding VLAR tasks to the CPU. Then at least we could see how much memory is free on that 256MiB card, and work out if there are any further issues.
____________
"It is not the strongest of the species that survives, nor the most intelligent that survives. It is the one that is the most adaptable to change."
Charles Darwin

Fred W
Volunteer tester
Joined: 13 Jun 99
Posts: 2524
Credit: 11,954,210
RAC: 0
United Kingdom
Message 924537 - Posted: 8 Aug 2009, 7:41:28 UTC - in response to Message 924527.

Aarghhh! Mea culpa. Sorry. That's what comes of browsing the Boards at ridiculous hours because you can't sleep...

F.
____________

Si
Joined: 26 May 09
Posts: 3
Credit: 101,569
RAC: 0
United Kingdom
Message 924874 - Posted: 9 Aug 2009, 8:21:18 UTC

Hi,
Can anyone help me with this error? I'm getting quite a lot of them.
i7 920
295 GTX 190.56 and 190.38
W7 RC1 x64

<core_client_version>6.6.38</core_client_version>
<![CDATA[
<message>
Incorrect function. (0x1) - exit code 1 (0x1)
</message>
<stderr_txt>
setiathome_CUDA: Found 2 CUDA device(s):
Device 1 : GeForce GTX 295
totalGlobalMem = 939524096
sharedMemPerBlock = 16384
regsPerBlock = 16384
warpSize = 32
memPitch = 262144
maxThreadsPerBlock = 512
clockRate = 1242000
totalConstMem = 65536
major = 1
minor = 3
textureAlignment = 256
deviceOverlap = 1
multiProcessorCount = 30
Device 2 : GeForce GTX 295
totalGlobalMem = 939524096
sharedMemPerBlock = 16384
regsPerBlock = 16384
warpSize = 32
memPitch = 262144
maxThreadsPerBlock = 512
clockRate = 1242000
totalConstMem = 65536
major = 1
minor = 3
textureAlignment = 256
deviceOverlap = 1
multiProcessorCount = 30
setiathome_CUDA: CUDA Device 1 specified, checking...
Device 1: GeForce GTX 295 is okay
SETI@home using CUDA accelerated device GeForce GTX 295
V12 modification by Raistmer
Priority of worker thread rised successfully
Priority of process adjusted successfully
Total GPU memory 939524096 free GPU memory 821575680
setiathome_enhanced 6.02 Visual Studio/Microsoft C++

Build features: Non-graphics CUDA VLAR autokill enabled FFTW USE_SSE x86
CPUID: Intel(R) Core(TM) i7 CPU 920 @ 2.67GHz

Cache: L1=64K L2=256K

CPU features: FPU TSC PAE CMPXCHG8B APIC SYSENTER MTRR CMOV/CCMP MMX FXSAVE/FXRSTOR SSE SSE2 HT SSE3
libboinc: 6.3.22

Work Unit Info:
...............
WU true angle range is : 0.432403
After app init: total GPU memory 939524096 free GPU memory 773079040
Cuda error 'GetFixedPoT_kernel' in file 'd:/BoincSeti_Prog/sinbad_repositories/LunaticsUnited/SETI_CUDA_MB_exp/client/cuda/cudaAcc_gaussfit.cu' in line 487 : unknown error.
Cuda error 'NormalizePoT_kernel' in file 'd:/BoincSeti_Prog/sinbad_repositories/LunaticsUnited/SETI_CUDA_MB_exp/client/cuda/cudaAcc_gaussfit.cu' in line 496 : unknown error.
Cuda error 'NormalizePoT_kernel' in file 'd:/BoincSeti_Prog/sinbad_repositories/LunaticsUnited/SETI_CUDA_MB_exp/client/cuda/cudaAcc_gaussfit.cu' in line 496 : unknown error.
Cuda error 'cudaMemset(dev_flag, 0, sizeof(*dev_flag))' in file 'd:/BoincSeti_Prog/sinbad_repositories/LunaticsUnited/SETI_CUDA_MB_exp/client/cuda/cudaAcc_gaussfit.cu' in line 499 : unknown error.

Claggy
Project donor
Volunteer tester
Joined: 5 Jul 99
Posts: 4239
Credit: 34,931,101
RAC: 23,457
United Kingdom
Message 924961 - Posted: 9 Aug 2009, 17:30:27 UTC - in response to Message 924874.
Last modified: 9 Aug 2009, 17:32:21 UTC

Hi,
Can anyone help me with this error? I'm getting quite a lot of them.
i7 920
295 GTX 190.56 and 190.38
W7 RC1 x64


Which CUDA DLLs are you running with V12VlarKill: CUDA 2.0, 2.1, 2.2 or 2.3?
And where did you get the V12 app from? Did you download it on its own,
or as part of one of the installer packages? If so, which version?

Claggy

Si
Joined: 26 May 09
Posts: 3
Credit: 101,569
RAC: 0
United Kingdom
Message 925099 - Posted: 10 Aug 2009, 7:49:09 UTC - in response to Message 924961.

Hi,
I'm using the 2.3 DLLs and I used the Lunatics_Win64v0.2_(SSE3+)_AP505r168_AKv8bx64_CudaV12 unified installer.

Claggy
Project donor
Volunteer tester
Joined: 5 Jul 99
Posts: 4239
Credit: 34,931,101
RAC: 23,457
United Kingdom
Message 925207 - Posted: 10 Aug 2009, 19:35:50 UTC - in response to Message 925099.
Last modified: 10 Aug 2009, 19:40:51 UTC

I had a better look at your results today, and I can't say that you have a lot of errors:
quite a few are VLAR killed, a couple don't give any useful info,
and only four have CUDA errors in them. Two of those are timeout errors, and the other two just say unknown.

The reason I asked about the CUDA DLLs is that your CPU times for CUDA WUs are about double mine,
and I think the CUDA 2.3 DLLs are supposed to reduce CPU usage.
You did drop the DLLs in your setiathome project folder, overwriting the ones already there, and not in the BOINC Data folder?
Or it could just be the difference between my 4GHz dual core and your Hyperthreaded i7.

You could use ReSchedule1.9 to rebrand any GPU VLAR tasks to the CPU so they don't get aborted.

Claggy
____________


Si
Joined: 26 May 09
Posts: 3
Credit: 101,569
RAC: 0
United Kingdom
Message 925328 - Posted: 11 Aug 2009, 7:00:30 UTC - in response to Message 925207.

Hi,
Thanks for the help. I put the DLLs in the right folder. I looked at my results too, and I seem to be missing a lot of error results. The WU would run right to the end (so not VLAR killed), and just before upload the NVIDIA driver would crash and restart; the WU would then show a computation error. I was getting 4 to 5 a day.
I reinstalled W7 yesterday and so far it's OK.
Thanks,
S

[seti.international] Dirk Sadowski
Volunteer tester
Joined: 6 Apr 07
Posts: 7122
Credit: 61,590,723
RAC: 16,404
Germany
Message 925330 - Posted: 11 Aug 2009, 7:28:31 UTC - in response to Message 909296.

In Raistmer's old thread it was helpful to post the 'cudaAcc_find_triplets erroneously found a triplet twice in find_triplets_kernel' errors.
[...]
If you would like to help the optimization crew, please post your '-12 Unknown error' results.
If you run CUDA WUs, look at the tasks overview for your PC, click on 'error' and look at the 'CPU time'.

If you run Raistmer's CUDA app, a 'VLAR kill' can be identified by its CPU time of about 1 second. Those results are not of interest.

The '-12 Unknown error' happens during the calculation of the WU, so several seconds of CPU time are shown.
Then click on the Task ID and copy/paste the relevant part of the <stderr_txt>.


It could look like this:
Exception detected inside cudaAcc_find_triplets, dumping client state
icfft=98384, PoT_activity=0, PoT_freq_bin=-1SETI@home error -12 Unknown error
cudaAcc_find_triplets erroneously found a triplet twice in find_triplets_kernel
File: ..\analyzePoT.cpp
Line: 348

And only the [bolded] line (the one starting with icfft=) is needed.
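
For anyone collecting these reports in bulk, the copy/paste step above can be sketched in code. This is only an illustration, assuming the stderr text has already been saved as a string; the helper names `find_minus12_line` and `parse_icfft` are made up for this example and are not part of any BOINC or SETI@home tool.

```cpp
#include <cctype>
#include <cstddef>
#include <sstream>
#include <string>

// Return the first line of the stderr text that starts with "icfft=" and
// mentions "error -12". Returns an empty string if no such line exists.
std::string find_minus12_line(const std::string& stderr_txt) {
    const std::string key = "icfft=";
    std::istringstream in(stderr_txt);
    std::string line;
    while (std::getline(in, line)) {
        if (!line.empty() && line.back() == '\r') line.pop_back();  // tolerate CRLF
        const bool starts_with_key = line.compare(0, key.size(), key) == 0;
        if (starts_with_key && line.find("error -12") != std::string::npos)
            return line;
    }
    return std::string();
}

// Extract the number that follows "icfft=" at the start of a line.
// Returns -1 if the line does not start with "icfft=" followed by digits.
long parse_icfft(const std::string& line) {
    const std::string key = "icfft=";
    if (line.compare(0, key.size(), key) != 0) return -1;
    std::size_t end = key.size();
    while (end < line.size() && std::isdigit(static_cast<unsigned char>(line[end])))
        ++end;
    if (end == key.size()) return -1;
    return std::stol(line.substr(key.size(), end - key.size()));
}
```

A report that was cut off before the error text will not match, so truncated results still need to be checked by hand.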
...



I had a quick look at my GPU cruncher's results and found some:


http://setiathome.berkeley.edu/result.php?resultid=1324644157
icfft=81939, PoT_activity=0, PoT_freq_bin=-1SETI@home error -12 Unknown error

http://setiathome.berkeley.edu/result.php?resultid=1324195175
icfft=140743, PoT_activity=0, PoT_freq_bin=-1SETI@home error -12 Unknown error

http://setiathome.berkeley.edu/result.php?resultid=1324030165
icfft=164295, PoT_activity=0, PoT_freq_bin=-1SETI@home error -12 Unknown error

http://setiathome.berkeley.edu/result.php?resultid=1323750030
icfft=177393, PoT_activity=0, PoT_freq_bin=-1SETI@home error -12 Unknown error

http://setiathome.berkeley.edu/result.php?resultid=1323647918
icfft=119484, PoT_activity=0, PoT_freq_bin=-1SETI@home error -12 Unknown error

http://setiathome.berkeley.edu/result.php?resultid=1323350002
icfft=130527, PoT_activity=0, PoT_freq_bin=-1SETI@home error -12 Unknown error

http://setiathome.berkeley.edu/result.php?resultid=1323203454
icfft=106857, PoT_activity=0, PoT_freq_bin=-1SETI@home error -12 Unknown error

http://setiathome.berkeley.edu/result.php?resultid=1323167228
icfft=170054, PoT_activity=0, PoT_freq_bin=-1SETI@home error -12 Unknown error

http://setiathome.berkeley.edu/result.php?resultid=1323115634
icfft=155657, PoT_activity=0, PoT_freq_bin=-1SETI@home error -12 Unknown error

http://setiathome.berkeley.edu/result.php?resultid=1321624734
icfft=144759, PoT_activity=0, PoT_freq_bin=-1SETI@home error -12 Unknown error

____________
BR


samuel7
Volunteer tester
Joined: 2 Jan 00
Posts: 47
Credit: 2,194,240
RAC: 0
Finland
Message 925612 - Posted: 12 Aug 2009, 17:48:41 UTC

Here's a new -12 error. So new I haven't managed to upload let alone report it yet.

http://setiathome.berkeley.edu/result.php?resultid=1329923998
icfft=103904, PoT_activity=0, PoT_freq_bin=-1SETI@home error -12 Unknown error

One wingmate shows the same:
http://setiathome.berkeley.edu/result.php?resultid=1328586677
icfft=65575, PoT_activity=0, PoT_fre
[stderr_txt truncated]

____________

Josef W. Segur
Project donor
Volunteer developer
Volunteer tester
Joined: 30 Oct 99
Posts: 4346
Credit: 1,123,775
RAC: 740
United States
Message 925694 - Posted: 12 Aug 2009, 22:39:12 UTC - in response to Message 925612.

Here's a new -12 error. So new I haven't managed to upload let alone report it yet.

http://setiathome.berkeley.edu/result.php?resultid=1329923998
icfft=103904, PoT_activity=0, PoT_freq_bin=-1SETI@home error -12 Unknown error

One wingmate shows the same:
http://setiathome.berkeley.edu/result.php?resultid=1328586677
icfft=65575, PoT_activity=0, PoT_fre
[stderr_txt truncated]

And the successful result on CPU shows result_overflow on triplets, so again it suggests the CUDA app should be rewritten to be able to report more than one triplet per array.
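
One way such reporting could look, as a rough host-side sketch rather than anything taken from the actual application: each hit claims a slot in a fixed-size buffer through an atomic counter, so several hits can be recorded and an overflow stays detectable. On a GPU the slot-claiming step is usually done with an atomic increment as well. The capacity `kMaxHits` and the `HitBuffer` type are assumptions made for this example.

```cpp
#include <array>
#include <atomic>
#include <cstddef>

// Hypothetical capacity; the real limit would depend on the application.
constexpr std::size_t kMaxHits = 8;

struct HitBuffer {
    std::atomic<unsigned> count{0};     // hits seen so far, may exceed capacity
    std::array<int, kMaxHits> index{};  // positions of the hits that were stored

    // Claim the next slot with an atomic increment and store the position.
    // Returns false when the buffer is full; the hit is still counted.
    bool report(int pos) {
        const unsigned slot = count.fetch_add(1);
        if (slot >= kMaxHits) return false;
        index[slot] = pos;
        return true;
    }

    // Number of valid entries in index.
    std::size_t stored() const {
        const unsigned c = count.load();
        return c < kMaxHits ? c : kMaxHits;
    }

    // True when more hits were reported than the buffer can hold.
    bool overflowed() const { return count.load() > kMaxHits; }
};
```

The host would read `stored()` entries after the kernel finishes and could treat `overflowed()` as the only case that needs special handling.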
Joe


Copyright © 2014 University of California