Cuda stopped working


Questions and Answers : GPU applications : Cuda stopped working

Zombu2 (Project donor)
Volunteer tester
Joined: 24 Feb 01
Posts: 99
Credit: 4,981,656
RAC: 3
United States
Message 1007627 - Posted: 24 Jun 2010, 0:15:15 UTC

Hello,
I have had an odd problem since this morning.
Both CUDA cards (GTX 480 and GTX 260) started finishing all their work in less than 3 seconds, which I find hard to believe.
Has anyone here experienced the same problem?

Funnily enough, the WUs do not get marked as computation errors; they seem to have run to completion.
____________


BilBg
Volunteer tester
Joined: 27 May 07
Posts: 2632
Credit: 5,977,591
RAC: 3,956
Bulgaria
Message 1007655 - Posted: 24 Jun 2010, 2:14:54 UTC - in response to Message 1007627.


Since the results have not been reported yet, search the client_state.xml file for the string <stderr_txt> and see for yourself.

(Copy client_state.xml to another directory and "play" with the copy.)
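A minimal Python sketch of that search, assuming the copied client_state.xml is read as plain text. It uses a regular expression rather than an XML parser because the stderr text embedded in the file is not guaranteed to be well-formed XML. The script name `extract_stderr.py` and the function `extract_stderr_blocks` are hypothetical, not part of BOINC.

```python
import re
import sys
from pathlib import Path

# Non-greedy match between the opening and closing tags; re.DOTALL lets "."
# span the newlines inside each multi-line stderr dump.
STDERR_RE = re.compile(r"<stderr_txt>(.*?)</stderr_txt>", re.DOTALL)


def extract_stderr_blocks(text: str) -> list[str]:
    """Return the stripped contents of every <stderr_txt> element in text."""
    return [block.strip() for block in STDERR_RE.findall(text)]


def main(argv: list[str]) -> int:
    if len(argv) != 2:
        print("usage: python extract_stderr.py <copy-of-client_state.xml>")
        return 1
    # Read the COPY of client_state.xml, never the live file BOINC is using.
    text = Path(argv[1]).read_text(encoding="utf-8", errors="replace")
    for i, block in enumerate(extract_stderr_blocks(text), start=1):
        print(f"--- task {i} ---")
        print(block)
    return 0


if __name__ == "__main__":
    main(sys.argv)
```

Each printed block is the same text you would otherwise hunt for by hand, one per unreported task.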


____________



- ALF - "Find out what you don't do well ..... then don't do it!" :)

Zombu2 (Project donor)
Volunteer tester
Joined: 24 Feb 01
Posts: 99
Credit: 4,981,656
RAC: 3
United States
Message 1007664 - Posted: 24 Jun 2010, 2:41:56 UTC

Thanks, I'll get right on that as soon as I'm back at the machine.
____________


FiveHamlet
Joined: 5 Oct 99
Posts: 783
Credit: 32,638,578
RAC: 0
United Kingdom
Message 1007824 - Posted: 24 Jun 2010, 18:19:21 UTC

If you have <flops> entries in your app_info.xml file, they might be causing the problem.
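To check whether such entries are present, a short Python sketch can list every <flops> value in app_info.xml. It assumes the file parses as XML and that the entries appear as <flops> elements (in BOINC's anonymous-platform app_info.xml they normally sit inside <app_version>, but the code searches at any depth). The script name `find_flops.py` and the function `find_flops` are hypothetical.

```python
import sys
import xml.etree.ElementTree as ET
from pathlib import Path


def find_flops(xml_text: str) -> list[float]:
    """Return the value of every <flops> element anywhere in the document."""
    root = ET.fromstring(xml_text)
    # root.iter() walks the whole tree, so it finds <flops> at any depth,
    # e.g. inside each <app_version> block.
    return [
        float(el.text)
        for el in root.iter("flops")
        if el.text is not None and el.text.strip()
    ]


def main(argv: list[str]) -> None:
    if len(argv) != 2:
        print("usage: python find_flops.py <path-to-app_info.xml>")
        return
    values = find_flops(Path(argv[1]).read_text(encoding="utf-8"))
    if values:
        for v in values:
            print(f"<flops> entry: {v:.3e}")
    else:
        print("no <flops> entries found")


if __name__ == "__main__":
    main(sys.argv)
```

If nothing is printed besides "no <flops> entries found", this particular suggestion can be ruled out.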

Dave
____________

Zombu2 (Project donor)
Volunteer tester
Joined: 24 Feb 01
Posts: 99
Credit: 4,981,656
RAC: 3
United States
Message 1008007 - Posted: 25 Jun 2010, 2:09:41 UTC
Last modified: 25 Jun 2010, 2:30:32 UTC

Here is what I've found.

There doesn't seem to be an error:

<stderr_txt>
setiathome_CUDA: Found 2 CUDA device(s):
Device 1 : GeForce GTX 480
totalGlobalMem = 1576468480
sharedMemPerBlock = 49152
regsPerBlock = 32768
warpSize = 32
memPitch = 2147483647
maxThreadsPerBlock = 1024
clockRate = 810000
totalConstMem = 65536
major = 2
minor = 0
textureAlignment = 512
deviceOverlap = 1
multiProcessorCount = 15
Device 2 : GeForce GTX 260
totalGlobalMem = 920125440
sharedMemPerBlock = 16384
regsPerBlock = 16384
warpSize = 32
memPitch = 2147483647
maxThreadsPerBlock = 512
clockRate = 1350000
totalConstMem = 65536
major = 1
minor = 3
textureAlignment = 256
deviceOverlap = 1
multiProcessorCount = 27
setiathome_CUDA: CUDA Device 1 specified, checking...
Device 1: GeForce GTX 480 is okay
SETI@home using CUDA accelerated device GeForce GTX 480
V12 modification by Raistmer
Priority of worker thread rised successfully
Priority of process adjusted successfully
Total GPU memory 1576468480 free GPU memory 1083334656
setiathome_enhanced 6.02 Visual Studio/Microsoft C++

Build features: Non-graphics CUDA VLAR autokill enabled FFTW USE_SSE x86
CPUID: Intel(R) Core(TM) i7 CPU 920 @ 2.67GHz

Cache: L1=64K L2=256K

CPU features: FPU TSC PAE CMPXCHG8B APIC SYSENTER MTRR CMOV/CCMP MMX FXSAVE/FXRSTOR SSE SSE2 HT SSE3
libboinc: 6.3.22

Work Unit Info:
...............
WU true angle range is : 0.444691
After app init: total GPU memory 1576468480 free GPU memory 1083334656
SETI@Home Informational message -9 result_overflow
NOTE: The number of results detected exceeds the storage space allocated.

Flopcounter: 204570588.421281

Spike count: 0
Pulse count: 31
Triplet count: 0
Gaussian count: 0

Wall-clock time elapsed since last restart: 14.5 seconds
class T_FFT<0>: total=0.00e+000, N=0, <>=0 (0.00e+000), min=18446744073709551615 (1.84e+019)
class T_FFT<8>: total=0.00e+000, N=1, <>=0 (0.00e+000), min=0 (0.00e+000)
class T_FFT<16>: total=0.00e+000, N=0, <>=0 (0.00e+000), min=18446744073709551615 (1.84e+019)
class T_FFT<64>: total=0.00e+000, N=0, <>=0 (0.00e+000), min=18446744073709551615 (1.84e+019)
class T_FFT<256>: total=0.00e+000, N=0, <>=0 (0.00e+000), min=18446744073709551615 (1.84e+019)
class T_FFT<512>: total=0.00e+000, N=0, <>=0 (0.00e+000), min=18446744073709551615 (1.84e+019)
class T_FFT<1024>: total=0.00e+000, N=0, <>=0 (0.00e+000), min=18446744073709551615 (1.84e+019)
class T_FFT<2048>: total=0.00e+000, N=0, <>=0 (0.00e+000), min=18446744073709551615 (1.84e+019)
class T_FFT<4096>: total=0.00e+000, N=0, <>=0 (0.00e+000), min=18446744073709551615 (1.84e+019)
class T_FFT<8192>: total=0.00e+000, N=0, <>=0 (0.00e+000), min=18446744073709551615 (1.84e+019)
called boinc_finish

</stderr_txt>
____________


BilBg
Volunteer tester
Joined: 27 May 07
Posts: 2632
Credit: 5,977,591
RAC: 3,956
Bulgaria
Message 1008065 - Posted: 25 Jun 2010, 5:56:30 UTC - in response to Message 1008007.
Last modified: 25 Jun 2010, 6:17:55 UTC

Normally this info:

"SETI@Home Informational message -9 result_overflow"
"Pulse count: 31"

is not considered an error. It says that too many "signals" were found, "signals" that are caused by our own (human) radio emissions (radars near Arecibo).

But Fermi GPUs are known to produce -9 overflows on good data.

Read:
Possible problem with cuda WU's i'm having.
http://setiathome.berkeley.edu/forum_thread.php?id=60249&nowrap=true
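The symptom discussed here (a task ending within seconds on a -9 overflow) can be spotted mechanically in the stderr text. A minimal Python sketch, assuming stderr in the format quoted in this thread; the function names and the 60-second threshold are hypothetical. Because a genuine overflow on radar-contaminated data also ends early, it only flags tasks worth a closer look and cannot by itself distinguish a real overflow from the Fermi false-overflow problem.

```python
import re

# Patterns copied from the stderr lines quoted in this thread.
OVERFLOW_RE = re.compile(r"-9 result_overflow")
WALLCLOCK_RE = re.compile(
    r"Wall-clock time elapsed since last restart:\s*([\d.]+) seconds"
)
PULSE_RE = re.compile(r"Pulse count:\s*(\d+)")


def summarize(stderr: str) -> dict:
    """Pull the overflow flag, wall-clock seconds and pulse count from stderr."""
    wall = WALLCLOCK_RE.search(stderr)
    pulses = PULSE_RE.search(stderr)
    return {
        "overflow": OVERFLOW_RE.search(stderr) is not None,
        "seconds": float(wall.group(1)) if wall else None,
        "pulses": int(pulses.group(1)) if pulses else None,
    }


def looks_like_false_overflow(stderr: str, max_seconds: float = 60.0) -> bool:
    """Flag a task that overflowed AND finished suspiciously fast.

    A genuine overflow on RFI-heavy data also ends early, so this only marks
    tasks worth a closer look; it cannot tell the two cases apart by itself.
    """
    s = summarize(stderr)
    return s["overflow"] and s["seconds"] is not None and s["seconds"] < max_seconds


if __name__ == "__main__":
    excerpt = (
        "SETI@Home Informational message -9 result_overflow\n"
        "Pulse count: 31\n"
        "Wall-clock time elapsed since last restart: 14.5 seconds\n"
    )
    print(summarize(excerpt))  # {'overflow': True, 'seconds': 14.5, 'pulses': 31}
    print(looks_like_false_overflow(excerpt))  # True
```

When every task on a card is flagged this way, as in this thread, the application rather than the data is the more likely cause.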





Zombu2 (Project donor)
Volunteer tester
Joined: 24 Feb 01
Posts: 99
Credit: 4,981,656
RAC: 3
United States
Message 1008131 - Posted: 25 Jun 2010, 12:11:38 UTC - in response to Message 1008065.

Here's the funny thing: I run a 260 in the same machine just for crunching, and it does the same thing.
____________


Gundolf Jahn
Joined: 19 Sep 00
Posts: 3184
Credit: 357,953
RAC: 37
Germany
Message 1008139 - Posted: 25 Jun 2010, 13:21:47 UTC - in response to Message 1008131.

When did you last reboot? Sometimes something gets stuck in the GPU memory, which can only be solved by a reboot.

Regards,
Gundolf
____________
Computers aren't everything in life. (Just kidding)

SETI@home classic workunits 3,758
SETI@home classic CPU time 66,520 hours

Zombu2 (Project donor)
Volunteer tester
Joined: 24 Feb 01
Posts: 99
Credit: 4,981,656
RAC: 3
United States
Message 1008375 - Posted: 25 Jun 2010, 23:33:29 UTC

Well, I rebooted multiple times, since I figured it would purge the memory, but both cards still do the same thing on all work units.

I'm puzzled.
____________


Gundolf Jahn
Joined: 19 Sep 00
Posts: 3184
Credit: 357,953
RAC: 37
Germany
Message 1008542 - Posted: 26 Jun 2010, 7:38:14 UTC - in response to Message 1008375.

I just checked this task that ran on the Fermi GPU, and there's no -9 result_overflow. So at least not all tasks show this problem, unlike what happened to others who used the wrong application for the 4xx cards.

Regards,
Gundolf

Zombu2 (Project donor)
Volunteer tester
Joined: 24 Feb 01
Posts: 99
Credit: 4,981,656
RAC: 3
United States
Message 1008596 - Posted: 26 Jun 2010, 13:17:05 UTC

Yeah.
I reinstalled the whole BOINC application using the stock CUDA app, and everything is back to normal. I wonder what happened to the optimized app.
____________


Gundolf Jahn
Joined: 19 Sep 00
Posts: 3184
Credit: 357,953
RAC: 37
Germany
Message 1008661 - Posted: 26 Jun 2010, 16:53:35 UTC - in response to Message 1008596.

i wonder what happened to the optimized app

Have a look at "Running SETI@home on an nVidia Fermi GPU" and "CUDA MB V12b rebuild supposed to work with Fermi GPUs". There were other threads before those explaining how running the wrong optimised application on a Fermi card produced only -9 overflows on good work units.

Regards,
Gundolf

Zombu2 (Project donor)
Volunteer tester
Joined: 24 Feb 01
Posts: 99
Credit: 4,981,656
RAC: 3
United States
Message 1008953 - Posted: 27 Jun 2010, 17:51:49 UTC

I have finally managed, with the help of SciManStev, to get both my optimized CPU apps and my Fermi to run correctly.

The trick is to install the unified installer with CUDA, then replace the Lunatics CUDA files with the stock Fermi files and the right DLLs, make the appropriate changes to the app_info.xml file, and voilà, everything is running as intended.
(Your mileage may vary.)


____________



Copyright © 2014 University of California