Cuda stopped working

Zombu2 · Volunteer tester · Joined: 24 Feb 01 · Posts: 1615 · Credit: 49,315,423 · RAC: 0
Message 1007627 - Posted: 24 Jun 2010, 0:15:15 UTC
Hello, I have an odd problem. Since this morning both CUDA cards (GTX 480 and GTX 260) have started to finish all their work in less than 3 seconds, which I find hard to believe. Has anyone here experienced the same problem? Funnily enough, the WUs do not get marked as computation errors; they seem to have run to completion.
I came down with a bad case of i don't give a crap

BilBg · Volunteer tester · Joined: 27 May 07 · Posts: 3720 · Credit: 9,385,827 · RAC: 0
Message 1007655 - Posted: 24 Jun 2010, 2:14:54 UTC - in response to Message 1007627.
Since the results haven't been reported yet, search the client_state.xml file for the string <stderr_txt> and see for yourself. (Copy client_state.xml to another directory and "play" with the copy.)
- ALF - "Find out what you don't do well ..... then don't do it!" :)
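BilBg's suggestion (copy client_state.xml, then look for <stderr_txt>) can be scripted. The sketch below is a minimal illustration, not an official BOINC tool: it assumes the file can be treated as plain text and simply pulls out everything between <stderr_txt> and </stderr_txt> with a regular expression instead of parsing the XML. The function names and the scratch-directory argument are hypothetical; the location of client_state.xml depends on your BOINC data directory.

```python
import re
import shutil
from pathlib import Path

# Non-greedy match of everything between <stderr_txt> and </stderr_txt>;
# DOTALL lets "." span the newlines inside the captured stderr text.
STDERR_RE = re.compile(r"<stderr_txt>(.*?)</stderr_txt>", re.DOTALL)


def extract_stderr_blocks(xml_text: str) -> list[str]:
    """Return the text of every <stderr_txt> section, trimmed of surrounding whitespace."""
    return [block.strip() for block in STDERR_RE.findall(xml_text)]


def read_copy(original: Path, scratch_dir: Path) -> str:
    """Copy client_state.xml into a scratch directory and read the copy,
    so the live file used by the BOINC client is never opened for editing."""
    scratch_dir.mkdir(parents=True, exist_ok=True)
    copy_path = scratch_dir / original.name
    shutil.copy2(original, copy_path)
    # errors="replace" keeps going if the stderr text contains odd bytes.
    return copy_path.read_text(encoding="utf-8", errors="replace")
```

Typical use would be `for block in extract_stderr_blocks(read_copy(Path("client_state.xml"), Path("scratch"))): print(block)`, with both paths adjusted to your own setup.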

Zombu2 · Message 1007664 - Posted: 24 Jun 2010, 2:41:56 UTC
Thanks, I'll get right on that as soon as I'm back at the machine.

FiveHamlet · Joined: 5 Oct 99 · Posts: 783 · Credit: 32,638,578 · RAC: 0
Message 1007824 - Posted: 24 Jun 2010, 18:19:21 UTC
If you have flops entries in your app_info.xml file, they might be causing the problem.
Dave

Zombu2 · Message 1008007 - Posted: 25 Jun 2010, 2:09:41 UTC · Last modified: 25 Jun 2010, 2:30:32 UTC
Here is what I've found. There doesn't seem to be an error:

<stderr_txt>
setiathome_CUDA: Found 2 CUDA device(s):
  Device 1 : GeForce GTX 480
     totalGlobalMem = 1576468480
     sharedMemPerBlock = 49152
     regsPerBlock = 32768
     warpSize = 32
     memPitch = 2147483647
     maxThreadsPerBlock = 1024
     clockRate = 810000
     totalConstMem = 65536
     major = 2
     minor = 0
     textureAlignment = 512
     deviceOverlap = 1
     multiProcessorCount = 15
  Device 2 : GeForce GTX 260
     totalGlobalMem = 920125440
     sharedMemPerBlock = 16384
     regsPerBlock = 16384
     warpSize = 32
     memPitch = 2147483647
     maxThreadsPerBlock = 512
     clockRate = 1350000
     totalConstMem = 65536
     major = 1
     minor = 3
     textureAlignment = 256
     deviceOverlap = 1
     multiProcessorCount = 27
setiathome_CUDA: CUDA Device 1 specified, checking...
   Device 1: GeForce GTX 480 is okay
SETI@home using CUDA accelerated device GeForce GTX 480
V12 modification by Raistmer
Priority of worker thread rised successfully
Priority of process adjusted successfully
Total GPU memory 1576468480 free GPU memory 1083334656
setiathome_enhanced 6.02 Visual Studio/Microsoft C++
Build features: Non-graphics CUDA VLAR autokill enabled FFTW USE_SSE x86
CPUID: Intel(R) Core(TM) i7 CPU 920 @ 2.67GHz
Cache: L1=64K L2=256K
CPU features: FPU TSC PAE CMPXCHG8B APIC SYSENTER MTRR CMOV/CCMP MMX FXSAVE/FXRSTOR SSE SSE2 HT SSE3
libboinc: 6.3.22
Work Unit Info:
...............
WU true angle range is : 0.444691
After app init: total GPU memory 1576468480 free GPU memory 1083334656
SETI@Home Informational message -9 result_overflow
NOTE: The number of results detected exceeds the storage space allocated.
Flopcounter: 204570588.421281
Spike count: 0
Pulse count: 31
Triplet count: 0
Gaussian count: 0
Wall-clock time elapsed since last restart: 14.5 seconds
class T_FFT<0>: total=0.00e+000, N=0, <>=0 (0.00e+000), min=18446744073709551615 (1.84e+019)
class T_FFT<8>: total=0.00e+000, N=1, <>=0 (0.00e+000), min=0 (0.00e+000)
class T_FFT<16>: total=0.00e+000, N=0, <>=0 (0.00e+000), min=18446744073709551615 (1.84e+019)
class T_FFT<64>: total=0.00e+000, N=0, <>=0 (0.00e+000), min=18446744073709551615 (1.84e+019)
class T_FFT<256>: total=0.00e+000, N=0, <>=0 (0.00e+000), min=18446744073709551615 (1.84e+019)
class T_FFT<512>: total=0.00e+000, N=0, <>=0 (0.00e+000), min=18446744073709551615 (1.84e+019)
class T_FFT<1024>: total=0.00e+000, N=0, <>=0 (0.00e+000), min=18446744073709551615 (1.84e+019)
class T_FFT<2048>: total=0.00e+000, N=0, <>=0 (0.00e+000), min=18446744073709551615 (1.84e+019)
class T_FFT<4096>: total=0.00e+000, N=0, <>=0 (0.00e+000), min=18446744073709551615 (1.84e+019)
class T_FFT<8192>: total=0.00e+000, N=0, <>=0 (0.00e+000), min=18446744073709551615 (1.84e+019)
called boinc_finish
</stderr_txt>

BilBg · Message 1008065 - Posted: 25 Jun 2010, 5:56:30 UTC - in response to Message 1008007. · Last modified: 25 Jun 2010, 6:17:55 UTC
Normally this info:
"SETI@Home Informational message -9 result_overflow"
"Pulse count: 31"
is not considered an error. It says that too many "signals" were found, "signals" that are caused by our own (human) radio emissions (radars near Arecibo).
But Fermi GPUs are known to produce -9 overflows on good data. Read:
Possible problem with cuda WU's i'm having.
http://setiathome.berkeley.edu/forum_thread.php?id=60249&nowrap=true
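BilBg's point suggests a simple check: a single -9 overflow is normal, but when nearly every task overflows within seconds, the application itself becomes the likelier suspect. The sketch below reads the summary lines shown in the stderr above ("Pulse count: 31", "Wall-clock time elapsed ... 14.5 seconds", "result_overflow") and applies that reasoning to a batch of tasks. The regexes assume the exact wording of this particular log, and the `max_fraction` and `short_s` thresholds are hypothetical values chosen for illustration, not SETI@home limits.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Assumes the "<Kind> count: N" wording seen in the stderr output above.
COUNT_RE = re.compile(r"(Spike|Pulse|Triplet|Gaussian) count:\s*(\d+)")
ELAPSED_RE = re.compile(r"Wall-clock time elapsed since last restart:\s*([\d.]+)\s*seconds")


@dataclass
class TaskSummary:
    counts: dict                 # e.g. {"spike": 0, "pulse": 31, ...}
    overflow: bool               # True if the -9 result_overflow message appears
    elapsed_s: Optional[float]   # wall-clock seconds, if reported


def summarize(stderr: str) -> TaskSummary:
    """Extract signal counts, the overflow flag and elapsed time from one task's stderr."""
    counts = {kind.lower(): int(n) for kind, n in COUNT_RE.findall(stderr)}
    match = ELAPSED_RE.search(stderr)
    return TaskSummary(
        counts=counts,
        overflow="result_overflow" in stderr,
        elapsed_s=float(match.group(1)) if match else None,
    )


def looks_like_bad_app(summaries: list, max_fraction: float = 0.5, short_s: float = 60.0) -> bool:
    """Hypothetical heuristic: flag the app when more than max_fraction of tasks
    overflowed in under short_s seconds. One quick overflow alone is not flagged."""
    if not summaries:
        return False
    quick = sum(
        1 for s in summaries
        if s.overflow and s.elapsed_s is not None and s.elapsed_s < short_s
    )
    return quick / len(summaries) > max_fraction
```

Feeding it the stderr blocks from every task on a card gives a rough yes/no. The fraction-based design mirrors the thread's own reasoning: an occasional overflow points to noisy data, while overflows on all work units point to the wrong application for the card.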

Zombu2 · Message 1008131 - Posted: 25 Jun 2010, 12:11:38 UTC - in response to Message 1008065.
Here's the funny thing: I run a 260 in the same machine just for crunching, and it does the same thing.

Gundolf Jahn · Joined: 19 Sep 00 · Posts: 3184 · Credit: 446,358 · RAC: 0
Message 1008139 - Posted: 25 Jun 2010, 13:21:47 UTC - in response to Message 1008131.
When did you last reboot? Sometimes something gets stuck in the GPU memory that can only be cleared by a reboot.
Regards, Gundolf
Computers aren't everything in life. (Just kidding)
SETI@home classic workunits 3,758
SETI@home classic CPU time 66,520 hours

Zombu2 · Message 1008375 - Posted: 25 Jun 2010, 23:33:29 UTC
Well, I've rebooted multiple times, since I figured that would purge the memory, but both cards still do the same thing on all work units. I'm puzzled.

Gundolf Jahn · Message 1008542 - Posted: 26 Jun 2010, 7:38:14 UTC - in response to Message 1008375.
I just checked this task that ran on the Fermi GPU, and there's no -9 result_overflow. So at least not all of your tasks show this problem, unlike what happened to others who used the wrong application for the 4xx cards.
Regards, Gundolf

Zombu2 · Message 1008596 - Posted: 26 Jun 2010, 13:17:05 UTC
Yeah, I reinstalled the whole BOINC app using the stock CUDA app, and everything is back to normal. I wonder what happened to the optimized app.

Gundolf Jahn · Message 1008661 - Posted: 26 Jun 2010, 16:53:35 UTC - in response to Message 1008596.
Quote: "i wonder what happened to the optimized app"
Have a look at "Running SETI@home on an nVidia Fermi GPU" and "CUDA MB V12b rebuild supposed to work with Fermi GPUs". There were other threads before those explaining how running the wrong optimised application on a Fermi card generated nothing but -9 overflows on good work units.
Regards, Gundolf

Zombu2 · Message 1008953 - Posted: 27 Jun 2010, 17:51:49 UTC
I have finally managed, with the help of SciManStev, to get both my optimized CPU apps and my Fermi to run right. The trick is to install the Unified Installer with CUDA, then replace the Lunatics CUDA files with the stock Fermi files and the right DLLs, and make the appropriate changes to the app_info.xml file. Voilà, everything is running as intended (your mileage may vary).

©2024 University of California

SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.