CUDA error, fall back to CPU

Golden_Frog
Volunteer tester
Joined: 28 Oct 99
Posts: 27
Credit: 1,650,057
RAC: 0
United States
Message 887026 - Posted: 21 Apr 2009, 22:12:48 UTC

Anyone know what would cause this error and how to prevent it?

<core_client_version>6.6.20</core_client_version>
<![CDATA[
<stderr_txt>
setiathome_CUDA: Found 1 CUDA device(s):
Device 1 : GeForce 8800 GS
totalGlobalMem = 402653184
sharedMemPerBlock = 16384
regsPerBlock = 8192
warpSize = 32
memPitch = 262144
maxThreadsPerBlock = 512
clockRate = 1600000
totalConstMem = 65536
major = 1
minor = 1
textureAlignment = 256
deviceOverlap = 0
multiProcessorCount = 12
setiathome_CUDA: CUDA Device 1 specified, checking...
Device 1: GeForce 8800 GS is okay
SETI@home using CUDA accelerated device GeForce 8800 GS
V10 modification by Raistmer
Priority of worker thread rised successfully
Priority of process adjusted successfully
Affinity of worker thread adjusted successfully
Total GPU memory 402653184 free GPU memory 143392768
setiathome_enhanced 6.02 Visual Studio/Microsoft C++

Build features: Non-graphics VLAR autokill enabled Affinity lock FFTW x86
CPUID: AMD Athlon(tm) Dual Core Processor 4450e

Cache: L1=64K L2=512K

CPU features: FPU TSC PAE CMPXCHG8B APIC SYSENTER MTRR CMOV/CCMP MMX FXSAVE/FXRSTOR SSE SSE2 HT SSE3
libboinc: 6.4.5

Work Unit Info:
...............
WU true angle range is : 0.411847
Cuda error 'cudaMalloc((void**) &dev_GaussFitResults' in file 'd:/BTR/SETI6/SETI_MB_CUDA/client/cuda/cudaAcceleration.cu' in line 317 : out of memory.
setiathome_CUDA: CUDA runtime ERROR in device memory allocation (Step 1 of 3). Falling back to HOST CPU processing...


Flopcounter: 16275728075702.543000

Spike count: 0
Pulse count: 2
Triplet count: 0
Gaussian count: 0

Wall-clock time elapsed since last restart: 23615.7 seconds
called boinc_finish

</stderr_txt>
]]>

zoom3+1=4
Volunteer tester
Joined: 30 Nov 03
Posts: 65709
Credit: 55,293,173
RAC: 49
United States
Message 887033 - Posted: 21 Apr 2009, 22:28:08 UTC - in response to Message 887026.  


Galadriel
Joined: 24 Jan 09
Posts: 42
Credit: 8,422,996
RAC: 0
Romania
Message 887049 - Posted: 21 Apr 2009, 22:55:24 UTC - in response to Message 887033.  

I do not agree with your answer. The graphics card ran out of memory, so the user probably did some intensive rendering which filled up the memory.

Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14649
Credit: 200,643,578
RAC: 874
United Kingdom
Message 887050 - Posted: 21 Apr 2009, 22:56:31 UTC
Last modified: 21 Apr 2009, 22:57:08 UTC

AR 0.411847 is nowhere near VLAR, the subject of the thread SJ has directed you to. VLAR is a true description only of tasks with AR less than 0.05.
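As a rough illustration of that rule, here is a small Python sketch that pulls the angle range out of a task's stderr text and checks it against the 0.05 limit quoted above. The function names and the constant are made up for this example; they are not part of BOINC or the SETI@home application.

```python
import re

# Limit quoted in this thread; the name is hypothetical, chosen for this sketch.
VLAR_AR_LIMIT = 0.05


def angle_range(stderr_text):
    """Return the 'WU true angle range' value from stderr text, or None if absent."""
    match = re.search(r"WU true angle range is\s*:\s*([0-9.]+)", stderr_text)
    return float(match.group(1)) if match else None


def is_vlar(ar):
    """True only when the angle range is below the VLAR limit."""
    return ar < VLAR_AR_LIMIT


sample = "WU true angle range is : 0.411847"
ar = angle_range(sample)
print(ar, "VLAR" if is_vlar(ar) else "not VLAR")
```

Run against the line from the failed task, this reports the task as not VLAR, which matches the point made here.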

You may have suffered the problem which I warned about in the BOINC 6.6.20 released ... thread.

If you have that problem - specifically, if you see one or more CUDA tasks 'waiting to run' - then a simple computer reboot should restore you to full crunching speed.

Voyager
Volunteer tester
Joined: 2 Nov 99
Posts: 602
Credit: 3,264,813
RAC: 0
United States
Message 887053 - Posted: 21 Apr 2009, 23:02:40 UTC

If you have that problem - specifically, if you see one or more CUDA tasks 'waiting to run' - then a simple computer reboot should restore you to full crunching speed.

I have had it a few times. I only notice it when my GPU temperature drops way down. Just exiting BOINC and restarting it fixes it.

Golden_Frog
Volunteer tester
Joined: 28 Oct 99
Posts: 27
Credit: 1,650,057
RAC: 0
United States
Message 887070 - Posted: 21 Apr 2009, 23:29:43 UTC

This is a pure BOINC comp, SETI GPU and WGC CPU. It runs headless in a corner. I have to unhook the monitor from my main rig and carry it back to this one to make changes. I'll have a look at it later tonight, give it a reboot and make sure it's not overheating or something.

DJStarfox
Joined: 23 May 01
Posts: 1066
Credit: 1,226,053
RAC: 2
United States
Message 887074 - Posted: 21 Apr 2009, 23:36:47 UTC - in response to Message 887026.  

Golden_Frog,
My first guess is that you're running an out-of-date video driver. What version of NVIDIA driver are you running? Have you tried upgrading to the latest version?

Golden_Frog
Volunteer tester
Joined: 28 Oct 99
Posts: 27
Credit: 1,650,057
RAC: 0
United States
Message 887083 - Posted: 21 Apr 2009, 23:46:40 UTC
Last modified: 21 Apr 2009, 23:51:02 UTC

It's all up to date. I gave the comp a good cleaning and installed the newest drivers just this last Saturday.

Golden_Frog
Volunteer tester
Joined: 28 Oct 99
Posts: 27
Credit: 1,650,057
RAC: 0
United States
Message 887110 - Posted: 22 Apr 2009, 1:19:24 UTC

I looked and I did have a few WUs started but "waiting to run". I went ahead and gave the comp a reboot and it looks to be back on track. Thanks, guys.

If this is going to continue to be a problem I might have to rethink running it headless.

piper69
Joined: 25 Sep 08
Posts: 49
Credit: 3,042,244
RAC: 0
Romania
Message 887116 - Posted: 22 Apr 2009, 1:44:56 UTC

Try installing teamviewer_host. It does a really good job. I am very pleased with it. No more need to carry monitors around. :P

DJStarfox
Joined: 23 May 01
Posts: 1066
Credit: 1,226,053
RAC: 2
United States
Message 887194 - Posted: 22 Apr 2009, 4:26:09 UTC - in response to Message 887110.  
Last modified: 22 Apr 2009, 4:26:39 UTC

I looked and I did have a few WUs started but "waiting to run". I went ahead and gave the comp a reboot and it looks to be back on track. Thanks, guys.

If this is going to continue to be a problem I might have to rethink running it headless.


My other suggestion was that running headless may be the culprit. It would not surprise me if the newer video cards save power by turning off components when no monitor is connected. My 8600 GT will not render a display unless there is a monitor connected upon driver startup. I can't later connect a monitor to it unless I reboot. Strange. They do make "dummy plugs" for video cards to fool them into thinking there is a monitor attached. Not sure where to get those, but I've heard they work.

Anyway, glad the CUDA issue was nothing serious.

Fred W
Volunteer tester
Joined: 13 Jun 99
Posts: 2524
Credit: 11,954,210
RAC: 0
United Kingdom
Message 887254 - Posted: 22 Apr 2009, 11:43:20 UTC - in response to Message 887194.  

They do make "dummy plugs" for video cards to fool them into thinking there is a monitor attached. Not sure where to get those, but I've heard they work.

I haven't come across the "they" that make them. Basically a couple of 100 ohm resistors shoved in a plug header (Google is your friend) and yes, they do work.

F.

Golden_Frog
Volunteer tester
Joined: 28 Oct 99
Posts: 27
Credit: 1,650,057
RAC: 0
United States
Message 887332 - Posted: 22 Apr 2009, 17:24:42 UTC

I boot the comp with a monitor, mouse and keyboard attached. After everything is up and running I take the monitor off. I have 3 comps that run this way, only one has had that problem.

The GPU was also running with a slight overclock. I reset it back to default.

The card ran all night and appears to only have completed and uploaded 1 wu.

Looks like I need to do a little more troubleshooting.

Jord
Volunteer tester
Joined: 9 Jun 99
Posts: 15184
Credit: 4,362,181
RAC: 3
Netherlands
Message 887336 - Posted: 22 Apr 2009, 17:33:30 UTC - in response to Message 887332.  

After everything is up and running I take the monitor off.

That's a nice way to blow up your video outlet or, worse, the whole video card. Don't do it too often. Never heard of KVM switches?

Andy Williams
Volunteer tester
Joined: 11 May 01
Posts: 187
Credit: 112,464,820
RAC: 0
United States
Message 887337 - Posted: 22 Apr 2009, 17:33:44 UTC - in response to Message 887332.  

The card ran all night and appears to only have completed and uploaded 1 wu.


It may have just been a VLAR.
--
Classic 82353 WU / 400979 h

1mp0£173
Volunteer tester
Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 887340 - Posted: 22 Apr 2009, 17:40:38 UTC - in response to Message 887116.  

Try installing teamviewer_host. It does a really good job. I am very pleased with it. No more need to carry monitors around. :P

VNC works too.

molson
Joined: 4 Nov 02
Posts: 12
Credit: 57,600,502
RAC: 0
United States
Message 887352 - Posted: 22 Apr 2009, 18:02:02 UTC

I have two computers running 6.6.20 and CUDA on 8500/8600 cards. Both computers consistently get errors like the following. What's worse is then the revert-to-CPU kicks in and I end up running 5 tasks on a 4-core machine, which really slows everything else down (4 other AP tasks):

<core_client_version>6.6.20</core_client_version>
<![CDATA[
<stderr_txt>
setiathome_CUDA: Found 1 CUDA device(s):
Device 1 : Device Emulation (CPU)
totalGlobalMem = -1
sharedMemPerBlock = 16384
regsPerBlock = 8192
warpSize = 1
memPitch = 262144
maxThreadsPerBlock = 512
clockRate = 1350000
totalConstMem = 65536
major = 9999
minor = 9999
textureAlignment = 256
deviceOverlap = 0
multiProcessorCount = 16
setiathome_CUDA: device 1 is emulation device and should not be used, supports 9999.9999
setiathome_CUDA: CUDA Device 1 specified, checking...
Device 1: Device Emulation (CPU) is okay
SETI@home using CUDA accelerated device Device Emulation (CPU)
Cuda error 'cufftPlan1d(&fft_analysis_plans[FftNum], FftLen, CUFFT_C2C, NumDataPoints / FftLen)' in file 'c:/sw/gpgpu/seti/seti_boinc/client/cuda/cudaAcc_fft.cu' in line 49 : initialization error.
Cuda error 'cufftPlan1d(&fft_analysis_plans[FftNum], FftLen, CUFFT_C2C, NumDataPoints / FftLen)' in file 'c:/sw/gpgpu/seti/seti_boinc/client/cuda/cudaAcc_fft.cu' in line 49 : initialization error.
setiathome_CUDA: CUDA runtime ERROR in plan FFT. Falling back to HOST CPU processing...
setiathome_enhanced 6.03 Visual Studio/Microsoft C++
libboinc: 6.3.22


Should I even be running CUDA on these cards? When they run, they appear to efficiently crunch, using .03/.01 CPU.
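Worth noting about the log above: it names the device as "Device Emulation (CPU)" with compute capability 9999.9999, so at that moment the application was not seeing the real 8500/8600 at all. A minimal Python sketch for spotting this in a saved stderr file follows. The helper names are invented for this example, and it relies only on the strings and values visible in the log.

```python
import re


def parse_device_props(stderr_text):
    """Collect the 'name = integer' device properties printed in the device listing."""
    pairs = re.findall(r"^\s*(\w+) = (-?\d+)\s*$", stderr_text, re.MULTILINE)
    return {name: int(value) for name, value in pairs}


def is_emulation_device(stderr_text):
    """Heuristic: the emulation device is named as such or reports capability 9999.9999."""
    props = parse_device_props(stderr_text)
    named = "Device Emulation" in stderr_text
    fake_capability = props.get("major") == 9999 and props.get("minor") == 9999
    return named or fake_capability


sample = """Device 1 : Device Emulation (CPU)
totalGlobalMem = -1
major = 9999
minor = 9999
"""
print("emulation device:", is_emulation_device(sample))
```

If this flags a task, the fallback to CPU is expected, and the thing to chase is why no real CUDA device was reported rather than whether the card can crunch.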

Golden_Frog
Volunteer tester
Joined: 28 Oct 99
Posts: 27
Credit: 1,650,057
RAC: 0
United States
Message 887353 - Posted: 22 Apr 2009, 18:03:24 UTC - in response to Message 887336.  

That's a nice way to blow up your video outlet or, worse, the whole video card. Don't do it too often. Never heard of KVM switches?


I would love to have a KVM switch and a dedicated monitor but I just don't have the funds.

Jord
Volunteer tester
Joined: 9 Jun 99
Posts: 15184
Credit: 4,362,181
RAC: 3
Netherlands
Message 887356 - Posted: 22 Apr 2009, 18:14:55 UTC - in response to Message 887353.  

There's also TightVNC, which is a free open source version of VNC. :-)

Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 887360 - Posted: 22 Apr 2009, 18:27:56 UTC - in response to Message 887026.  

So there is a true low GPU memory condition (there should be ~250 MB of free GPU RAM; here there is much less). It's not a VLAR issue in any way. The question is why this GPU had so little RAM available (and "who" took all the other onboard RAM).
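To put numbers on it, the "Total GPU memory ... free GPU memory ..." line from the failed task's stderr can be checked with a few lines of Python. This is only a sketch: the function name is made up, and the 250 MB threshold is just the rough figure mentioned here, not an official requirement of the application.

```python
import re

# Rough figure from this post, expressed in bytes; an assumption, not a documented limit.
MIN_FREE_BYTES = 250 * 1024 * 1024


def gpu_memory(stderr_text):
    """Return (total_bytes, free_bytes) from the stderr memory line, or None if absent."""
    m = re.search(r"Total GPU memory (\d+) free GPU memory (\d+)", stderr_text)
    return (int(m.group(1)), int(m.group(2))) if m else None


line = "Total GPU memory 402653184 free GPU memory 143392768"
total, free = gpu_memory(line)
mib = 2 ** 20
print(f"total {total / mib:.2f} MiB, free {free / mib:.2f} MiB, "
      f"held by something else {(total - free) / mib:.2f} MiB")
print("enough free memory:", free >= MIN_FREE_BYTES)
```

For this task the card has 384 MiB in total but only about 137 MiB free, so roughly 247 MiB was already held by something else when the application started.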