Cuda stopped working

Questions and Answers : GPU applications : Cuda stopped working
Zombu2
Volunteer tester
Joined: 24 Feb 01
Posts: 1615
Credit: 49,315,423
RAC: 0
United States
Message 1007627 - Posted: 24 Jun 2010, 0:15:15 UTC

Hello,
I have had an odd problem since this morning.
Both CUDA cards (GTX 480 and GTX 260) started to finish all their work in less than 3 seconds, which I find very hard to believe.
Has anyone here experienced the same problem?

Funnily enough, the WUs do not get marked as computation errors; they seem to have run completely.
I came down with a bad case of i don't give a crap

BilBg
Volunteer tester
Joined: 27 May 07
Posts: 3720
Credit: 9,385,827
RAC: 0
Bulgaria
Message 1007655 - Posted: 24 Jun 2010, 2:14:54 UTC - in response to Message 1007627.  


Since the results have not been reported yet,
search the client_state.xml file for the string <stderr_txt> and see for yourself.

(Copy client_state.xml to another directory and "play" with the copy.)
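
A minimal sketch of that search in Python, assuming the copy is a plain-text file in which each result's output sits between literal <stderr_txt> and </stderr_txt> markers. The function names and the path passed to read_stderr_from_copy are made up for illustration. The text is scanned with a regular expression rather than an XML parser, so nothing is assumed about the rest of the file's layout.

```python
import re
from pathlib import Path
from typing import List

# Non-greedy match so each <stderr_txt> block is captured separately.
STDERR_BLOCK = re.compile(r"<stderr_txt>(.*?)</stderr_txt>", re.DOTALL)


def extract_stderr_blocks(state_text: str) -> List[str]:
    """Return the stripped text of every <stderr_txt>...</stderr_txt> block."""
    return [block.strip() for block in STDERR_BLOCK.findall(state_text)]


def read_stderr_from_copy(copy_path: str) -> List[str]:
    """Read a COPY of client_state.xml (hypothetical path) and extract the blocks."""
    text = Path(copy_path).read_text(encoding="utf-8", errors="replace")
    return extract_stderr_blocks(text)


if __name__ == "__main__":
    # Tiny inline sample so the sketch runs without a real BOINC install.
    sample = "<result><stderr_txt>\ncalled boinc_finish\n</stderr_txt></result>"
    for block in extract_stderr_blocks(sample):
        print(block)
```

Point read_stderr_from_copy at the copied file, not the live one in the BOINC data directory, so the client's own state is never touched.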


 


- ALF - "Find out what you don't do well ..... then don't do it!" :)
 

Zombu2
Volunteer tester
Joined: 24 Feb 01
Posts: 1615
Credit: 49,315,423
RAC: 0
United States
Message 1007664 - Posted: 24 Jun 2010, 2:41:56 UTC

Thanks, I'll get right on that as soon as I'm back at the machine.
I came down with a bad case of i don't give a crap

FiveHamlet
Joined: 5 Oct 99
Posts: 783
Credit: 32,638,578
RAC: 0
United Kingdom
Message 1007824 - Posted: 24 Jun 2010, 18:19:21 UTC

If you have a <flops> entry in your app_info.xml file, that might be causing the problem.
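
One way to check is sketched below in Python. It assumes app_info.xml is well-formed XML with <app_version> blocks that may carry a <flops> element alongside <app_name> and <plan_class>. The function name and the sample values are hypothetical and not taken from this thread.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

# Hypothetical sample; a real app_info.xml will contain more elements.
SAMPLE = """<app_info>
  <app_version>
    <app_name>setiathome_enhanced</app_name>
    <version_num>608</version_num>
    <plan_class>cuda</plan_class>
    <flops>1.0e10</flops>
  </app_version>
  <app_version>
    <app_name>setiathome_enhanced</app_name>
    <version_num>603</version_num>
  </app_version>
</app_info>"""


def find_flops_entries(app_info_xml: str) -> List[Tuple[str, str, float]]:
    """List (app_name, plan_class, flops) for every app_version that sets <flops>."""
    root = ET.fromstring(app_info_xml)
    entries = []
    for app_version in root.iter("app_version"):
        flops = app_version.findtext("flops")
        if flops is None:
            continue  # this app_version has no <flops> entry
        name = (app_version.findtext("app_name") or "").strip()
        plan = (app_version.findtext("plan_class") or "").strip()
        entries.append((name, plan, float(flops)))
    return entries


if __name__ == "__main__":
    for name, plan, flops in find_flops_entries(SAMPLE):
        print(name, plan or "(no plan class)", flops)
```

An empty list means no <flops> entries were found, in which case the cause is probably elsewhere.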

Dave

Zombu2
Volunteer tester
Joined: 24 Feb 01
Posts: 1615
Credit: 49,315,423
RAC: 0
United States
Message 1008007 - Posted: 25 Jun 2010, 2:09:41 UTC
Last modified: 25 Jun 2010, 2:30:32 UTC

Here is what I've found.

There doesn't seem to be an error.

<stderr_txt>
setiathome_CUDA: Found 2 CUDA device(s):
Device 1 : GeForce GTX 480
totalGlobalMem = 1576468480
sharedMemPerBlock = 49152
regsPerBlock = 32768
warpSize = 32
memPitch = 2147483647
maxThreadsPerBlock = 1024
clockRate = 810000
totalConstMem = 65536
major = 2
minor = 0
textureAlignment = 512
deviceOverlap = 1
multiProcessorCount = 15
Device 2 : GeForce GTX 260
totalGlobalMem = 920125440
sharedMemPerBlock = 16384
regsPerBlock = 16384
warpSize = 32
memPitch = 2147483647
maxThreadsPerBlock = 512
clockRate = 1350000
totalConstMem = 65536
major = 1
minor = 3
textureAlignment = 256
deviceOverlap = 1
multiProcessorCount = 27
setiathome_CUDA: CUDA Device 1 specified, checking...
Device 1: GeForce GTX 480 is okay
SETI@home using CUDA accelerated device GeForce GTX 480
V12 modification by Raistmer
Priority of worker thread rised successfully
Priority of process adjusted successfully
Total GPU memory 1576468480 free GPU memory 1083334656
setiathome_enhanced 6.02 Visual Studio/Microsoft C++

Build features: Non-graphics CUDA VLAR autokill enabled FFTW USE_SSE x86
CPUID: Intel(R) Core(TM) i7 CPU 920 @ 2.67GHz

Cache: L1=64K L2=256K

CPU features: FPU TSC PAE CMPXCHG8B APIC SYSENTER MTRR CMOV/CCMP MMX FXSAVE/FXRSTOR SSE SSE2 HT SSE3
libboinc: 6.3.22

Work Unit Info:
...............
WU true angle range is : 0.444691
After app init: total GPU memory 1576468480 free GPU memory 1083334656
SETI@Home Informational message -9 result_overflow
NOTE: The number of results detected exceeds the storage space allocated.

Flopcounter: 204570588.421281

Spike count: 0
Pulse count: 31
Triplet count: 0
Gaussian count: 0

Wall-clock time elapsed since last restart: 14.5 seconds
class T_FFT<0>: total=0.00e+000, N=0, <>=0 (0.00e+000), min=18446744073709551615 (1.84e+019)
class T_FFT<8>: total=0.00e+000, N=1, <>=0 (0.00e+000), min=0 (0.00e+000)
class T_FFT<16>: total=0.00e+000, N=0, <>=0 (0.00e+000), min=18446744073709551615 (1.84e+019)
class T_FFT<64>: total=0.00e+000, N=0, <>=0 (0.00e+000), min=18446744073709551615 (1.84e+019)
class T_FFT<256>: total=0.00e+000, N=0, <>=0 (0.00e+000), min=18446744073709551615 (1.84e+019)
class T_FFT<512>: total=0.00e+000, N=0, <>=0 (0.00e+000), min=18446744073709551615 (1.84e+019)
class T_FFT<1024>: total=0.00e+000, N=0, <>=0 (0.00e+000), min=18446744073709551615 (1.84e+019)
class T_FFT<2048>: total=0.00e+000, N=0, <>=0 (0.00e+000), min=18446744073709551615 (1.84e+019)
class T_FFT<4096>: total=0.00e+000, N=0, <>=0 (0.00e+000), min=18446744073709551615 (1.84e+019)
class T_FFT<8192>: total=0.00e+000, N=0, <>=0 (0.00e+000), min=18446744073709551615 (1.84e+019)
called boinc_finish

</stderr_txt>
I came down with a bad case of i don't give a crap

BilBg
Volunteer tester
Joined: 27 May 07
Posts: 3720
Credit: 9,385,827
RAC: 0
Bulgaria
Message 1008065 - Posted: 25 Jun 2010, 5:56:30 UTC - in response to Message 1008007.  
Last modified: 25 Jun 2010, 6:17:55 UTC

Normally this info:

"SETI@Home Informational message -9 result_overflow"
"Pulse count: 31"

is not considered an error.
It says that too many "signals" were found,
"signals" that are caused by our own (human) radio emissions
(radars near Arecibo).


But Fermi GPUs are known to produce -9 overflows on good data.

Read:
Possible problem with cuda WU's i'm having.
http://setiathome.berkeley.edu/forum_thread.php?id=60249&nowrap=true
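
To spot this pattern across many results, a small Python sketch can pull the relevant lines out of a stderr block. It assumes the output looks like the block posted above (the -9 result_overflow message, the four "count" lines and the wall-clock line). The function name and the returned keys are illustrative only.

```python
import re
from typing import Dict

# Line formats are taken from the stderr output quoted in this thread.
COUNT_LINE = re.compile(r"^(Spike|Pulse|Triplet|Gaussian) count:\s*(\d+)", re.MULTILINE)
WALL_CLOCK = re.compile(r"Wall-clock time elapsed since last restart:\s*([\d.]+) seconds")


def summarize_stderr(stderr_text: str) -> Dict[str, object]:
    """Collect the overflow flag, signal counts and wall-clock time from one block."""
    counts = {kind.lower(): int(n) for kind, n in COUNT_LINE.findall(stderr_text)}
    wall = WALL_CLOCK.search(stderr_text)
    return {
        "overflow": "-9 result_overflow" in stderr_text,
        "counts": counts,
        "wall_clock_seconds": float(wall.group(1)) if wall else None,
    }


if __name__ == "__main__":
    sample = (
        "SETI@Home Informational message -9 result_overflow\n"
        "Spike count: 0\nPulse count: 31\nTriplet count: 0\nGaussian count: 0\n"
        "Wall-clock time elapsed since last restart: 14.5 seconds\n"
    )
    print(summarize_stderr(sample))
```

A single overflow proves little on its own, since noisy data produces them legitimately. If every task on a card overflows within seconds, that points at the application rather than the work units.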




 


- ALF - "Find out what you don't do well ..... then don't do it!" :)
 

Zombu2
Volunteer tester
Joined: 24 Feb 01
Posts: 1615
Credit: 49,315,423
RAC: 0
United States
Message 1008131 - Posted: 25 Jun 2010, 12:11:38 UTC - in response to Message 1008065.  

Here's the funny thing: I run a 260 in the same machine just for crunching, and it does the same thing.
I came down with a bad case of i don't give a crap

Gundolf Jahn
Joined: 19 Sep 00
Posts: 3184
Credit: 446,358
RAC: 0
Germany
Message 1008139 - Posted: 25 Jun 2010, 13:21:47 UTC - in response to Message 1008131.  

When did you last reboot? Sometimes something gets stuck in the GPU memory, which can only be solved by a reboot.

Regards,
Gundolf
Computers are not everything in life. (Just kidding)

SETI@home classic workunits 3,758
SETI@home classic CPU time 66,520 hours

Zombu2
Volunteer tester
Joined: 24 Feb 01
Posts: 1615
Credit: 49,315,423
RAC: 0
United States
Message 1008375 - Posted: 25 Jun 2010, 23:33:29 UTC

Well, I rebooted multiple times, since I figured it would purge the memory, but both cards still do the same thing on all work units.

I'm puzzled.
I came down with a bad case of i don't give a crap

Gundolf Jahn
Joined: 19 Sep 00
Posts: 3184
Credit: 446,358
RAC: 0
Germany
Message 1008542 - Posted: 26 Jun 2010, 7:38:14 UTC - in response to Message 1008375.  

I just checked this task that ran on the Fermi GPU and there's no -9 result_overflow. So, at least not all tasks generate this problem, as has happened for others when they used the wrong application for the 4xx cards.

Regards,
Gundolf

Zombu2
Volunteer tester
Joined: 24 Feb 01
Posts: 1615
Credit: 49,315,423
RAC: 0
United States
Message 1008596 - Posted: 26 Jun 2010, 13:17:05 UTC

Yeah.
I reinstalled the whole BOINC app using the stock CUDA app and everything is back to normal. I wonder what happened to the optimized app.
I came down with a bad case of i don't give a crap

Gundolf Jahn
Joined: 19 Sep 00
Posts: 3184
Credit: 446,358
RAC: 0
Germany
Message 1008661 - Posted: 26 Jun 2010, 16:53:35 UTC - in response to Message 1008596.  

i wonder what happened to the optimized app

Have a look at the threads "Running SETI@home on an nVidia Fermi GPU" and "CUDA MB V12b rebuild supposed to work with Fermi GPUs". Earlier threads also explain how running the wrong optimised application on a Fermi card generated only -9 overflows on good work units.

Regards,
Gundolf
Computers are not everything in life. (Just kidding)

SETI@home classic workunits 3,758
SETI@home classic CPU time 66,520 hours

Zombu2
Volunteer tester
Joined: 24 Feb 01
Posts: 1615
Credit: 49,315,423
RAC: 0
United States
Message 1008953 - Posted: 27 Jun 2010, 17:51:49 UTC

I have finally managed, with the help of SciManStev, to get both my optimized CPU apps and my Fermi to run right.

The trick is to install the unified installer with CUDA, then replace the Lunatics
CUDA files with the stock Fermi files and the right DLLs, and make the appropriate
changes to the app_info.xml file. Voilà, everything is running as intended
(your mileage may vary).


I came down with a bad case of i don't give a crap