CUDA task runs 8h at 0.00%, seems to be stuck

Questions and Answers : GPU applications : CUDA task runs 8h at 0.00%, seems to be stuck

Maik

Joined: 15 May 99
Posts: 163
Credit: 9,208,555
RAC: 0
Germany
Message 842012 - Posted: 19 Dec 2008, 14:54:24 UTC

After closing BOINC and restarting, I got a compute error as the result in the BOINC Manager overview. After the result was uploaded, I found this:

task-ID: 1092856149
work unit ID: 381499165

Copy of the task details:

Work Unit Info:
...............
WU true angle range is : 0.010346
Optimal function choices:
-----------------------------------------------------
name
-----------------------------------------------------
v_BaseLineSmooth (no other)
v_GetPowerSpectrum 0.00020 0.00000
v_ChirpData 0.01657 0.00000
v_Transpose4 0.01054 0.00000
FPU opt folding 0.00695 0.00000
SETI@home error -12 Unknown error
cudaAcc_find_triplets doesn't support more than MAX_TRIPLETS_ABOVE_THRESHOLD numBinsAboveThreshold in find_triplets_kernel
File: c:/sw/gpgpu/seti/seti_boinc/client/cuda/cudaAcc_pulsefind.cu
Line: 232


I tried searching but didn't find any information.
I'm also wondering about the "File: c:/sw/...." line. I can't find this directory or the file on my PC.
If you speak German, a copy of your answer in German would be nice ;)

Some more info about my PC:
19.12.2008 15:17:31||Starting BOINC client version 6.4.5 for windows_intelx86
19.12.2008 15:17:31||log flags: task, file_xfer, sched_ops
19.12.2008 15:17:31||Libraries: libcurl/7.19.0 OpenSSL/0.9.8i zlib/1.2.3
19.12.2008 15:17:31||Data directory: D:\boinc_data
19.12.2008 15:17:31||Processor: 4 GenuineIntel Intel(R) Core(TM)2 Quad CPU Q9300 @ 2.50GHz [x86 Family 6 Model 23 Stepping 7]
19.12.2008 15:17:31||Processor features: fpu tsc pae nx sse sse2 mmx
19.12.2008 15:17:31||OS: Microsoft Windows XP: Professional x86 Editon, Service Pack 3, (05.01.2600.00)
19.12.2008 15:17:31||Memory: 2.00 GB physical, 4.85 GB virtual
19.12.2008 15:17:31||Disk: 455.76 GB total, 155.01 GB free
19.12.2008 15:17:31||Local time is UTC +1 hours
19.12.2008 15:17:31||Not using a proxy
19.12.2008 15:17:31||CUDA devices found
19.12.2008 15:17:31||Coprocessor: GeForce 9600 GT (1)

The GPU drivers are up to date; I downloaded and installed them yesterday.
Maik

Joined: 15 May 99
Posts: 163
Credit: 9,208,555
RAC: 0
Germany
Message 842101 - Posted: 19 Dec 2008, 17:44:37 UTC

I've updated the graphics driver again and now it seems to work.
BMaytum
Volunteer tester
Joined: 3 Apr 99
Posts: 104
Credit: 4,382,041
RAC: 2
United States
Message 842603 - Posted: 20 Dec 2008, 18:04:36 UTC - in response to Message 842012.  

Yesterday I had my first (so far my ONLY) Seti@home Beta Test WU terminate with a Compute Error:
http://setiweb.ssl.berkeley.edu/beta/result.php?resultid=5020623
The stderr output reported:
Cuda error 'cudaMemcpy(best_PoT, dev_tmp_pot, max_nb_of_elems * sizeof(float), cudaMemcpyDeviceToHost)' in file 'c:/sw/gpgpu/seti/seti_boinc/client/cuda/cudaAcc_pulsefind.cu' in line 1265 : unknown error.


This WU was crunched using the v6.06-cuda application (BOINC v6.4.5), with the nVidia 32-bit v180.60 driver package (includes CUDA v2.1 Beta) on WinXP32 SP3. This WU likewise showed 0.000% progress and an increasing (not decreasing) time to completion whilst it ran for ~5 minutes of wall-clock time.
Sabertooth Z77, i7-3770K@4.2GHz, GTX680, W8.1Pro x64
P5N32-E SLI, C2D E8400@3Ghz, GTX580, Win7SP1Pro x64 & PCLinuxOS2015 x64
Maik

Joined: 15 May 99
Posts: 163
Credit: 9,208,555
RAC: 0
Germany
Message 842791 - Posted: 20 Dec 2008, 23:04:43 UTC
Last modified: 20 Dec 2008, 23:08:35 UTC

Your BOINC is using the v6.06-cuda application?!?
Hmm... mine is 6.05... I'll check that.
I'm using a GeForce 9600GT with driver package nVidia 32-bit v180.48 / Cuda v2.0 on WinXP SP3

WUs are still getting stuck (around 5% of all CUDA WUs).
Some get stuck at 5%, some at 58%...

Another thing: some WUs are reported with:
"SETI@Home Informational message -9 result_overflow
NOTE: The number of results detected exceeds the storage space allocated."
Take a look at this:
http://setiathome.berkeley.edu/workunit.php?wuid=382225403
Non-CUDA applications report "normal" results for the same WU. How can that be?
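Both error messages quoted in this thread (cudaAcc_find_triplets refusing more than MAX_TRIPLETS_ABOVE_THRESHOLD bins, and result_overflow saying "the number of results detected exceeds the storage space allocated") describe a fixed-size store that fills up. As a rough illustration only, and not the actual SETI@home code, here is a minimal Python sketch of such a buffer. The names MAX_SIGNALS, SignalBuffer, ResultOverflow and collect are hypothetical, and the limit of 30 is made up for the example.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Tuple

# HYPOTHETICAL capacity for illustration; not the real SETI@home limit.
MAX_SIGNALS = 30


class ResultOverflow(Exception):
    """Raised when more signals are detected than the buffer can hold."""


@dataclass
class SignalBuffer:
    """Fixed-capacity store for detected signal powers."""
    capacity: int = MAX_SIGNALS
    signals: List[float] = field(default_factory=list)

    def add(self, power: float) -> None:
        # Refuse to grow past the preallocated capacity.
        if len(self.signals) >= self.capacity:
            raise ResultOverflow(f"more than {self.capacity} signals detected")
        self.signals.append(power)


def collect(powers: Iterable[float], threshold: float,
            capacity: int = MAX_SIGNALS) -> Tuple[List[float], bool]:
    """Keep every power above `threshold`; stop early and flag an overflow
    as soon as the buffer is full. Returns (signals kept, overflowed)."""
    buf = SignalBuffer(capacity)
    try:
        for p in powers:
            if p > threshold:
                buf.add(p)
    except ResultOverflow:
        return buf.signals, True
    return buf.signals, False
```

For example, collect([1, 5, 6, 7], threshold=4, capacity=2) returns ([5, 6], True). Because the limit is fixed, an application whose computation reports more above-threshold values than another's would overflow while the other finishes normally; the sketch itself says nothing about why that would happen for this particular WU.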
Eric Korpela Project Donor
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 3 Apr 99
Posts: 1382
Credit: 54,506,847
RAC: 60
United States
Message 843744 - Posted: 22 Dec 2008, 18:34:00 UTC - in response to Message 842791.  

Sounds like general numerical problems. Have you checked your GPU temperature?


@SETIEric@qoto.org (Mastodon)

Maik

Joined: 15 May 99
Posts: 163
Credit: 9,208,555
RAC: 0
Germany
Message 843774 - Posted: 22 Dec 2008, 19:39:10 UTC - in response to Message 843744.  
Last modified: 22 Dec 2008, 19:45:09 UTC

The GPU temperature looks fine; Everest currently reports (with CUDA running) around 35 °C.
BOINC and CUDA ran fine for about 24 hours, then I tried the beta drivers.
Loads of stuck WUs (0.000% for minutes/hours) again... so I'm back on CUDA v2.0 and display driver v180.48 with the MB_6.04_Winx86_CUDA application. That was running fine before I installed the beta drivers.

Hmm... I was typing that CUDA is running at the moment... I was wrong. The WU is stuck again, at 0.000% for 20 minutes. Windows Task Manager reports 00 CPU time and unchanging memory usage (72,964K).

I aborted the last WU, which got stuck 4 times.
There is a debugging log. Could you take a look at it to find out what the problem is? I don't understand any of it ;)

http://setiathome.berkeley.edu/result.php?resultid=1097953246

Next problem: I get a lot of "lags" while CUDA is running.
It doesn't matter which driver or application version I use.
The system freezes for around 5 seconds and then carries on normally.
Spectrum
Joined: 14 Jun 99
Posts: 468
Credit: 53,129,336
RAC: 0
Australia
Message 843777 - Posted: 22 Dec 2008, 19:44:03 UTC

I have abandoned CUDA as a bad joke. Until it's proven 100% stable I will let both my CPUs do the work; I keep getting units stuck at 0 progress after 8+ hours.

I am not saying CUDA is a bad idea, just that it should have stayed in beta a while longer.
Byron S Goodgame
Volunteer tester
Joined: 16 Jan 06
Posts: 1145
Credit: 3,936,993
RAC: 0
United States
Message 843779 - Posted: 22 Dec 2008, 19:48:34 UTC - in response to Message 843774.  
Last modified: 22 Dec 2008, 20:04:08 UTC

http://setiathome.berkeley.edu/result.php?resultid=1097953246


The example you've given is a VLAR (very low angle range) WU, which is a known bug in CUDA for everyone. If this is the AR (angle range) of the WUs that are erroring out for you, it's probably the reason for the problems.

There's also this message that can show you how to spot VLAR in the client_state.xml file.
Maik

Joined: 15 May 99
Posts: 163
Credit: 9,208,555
RAC: 0
Germany
Message 843786 - Posted: 22 Dec 2008, 20:08:24 UTC
Last modified: 22 Dec 2008, 20:09:55 UTC

Thanks, that must be the answer to the problem I have.
I've checked: most results with a compute error have an AR lower than 0.00
And the only solution to this problem is to abort the WUs / delete the files manually o_O?
Byron S Goodgame
Volunteer tester
Joined: 16 Jan 06
Posts: 1145
Credit: 3,936,993
RAC: 0
United States
Message 843790 - Posted: 22 Dec 2008, 20:16:31 UTC - in response to Message 843786.  
Last modified: 22 Dec 2008, 20:17:22 UTC

Thanks, that must be the answer to the problem I have.
I've checked: most results with a compute error have an AR lower than 0.00
And the only solution to this problem is to abort the WUs / delete the files manually o_O?

I don't get that many, and most of the time they just get stuck. So for me they're easy to see. When I see one I usually go into the stderr and verify. Once I've done that I just abort them and then report them.

You could go through the client state file and see whether you're about to get out of that AR or have considerably more of them to do. At least you'll know what to expect, and if it's that bad, you could match them up in BM (BOINC Manager) and abort them before they start and freeze up your system.

If you do, just make sure to post here about the ones you found.
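The check described above (read the angle range, compare it with a VLAR cut-off, then abort the matches in BOINC Manager) can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the stderr line format "WU true angle range is : 0.010346" is taken from this thread, but the XML tag <true_angle_range>, where such text lives on disk, and the 0.05 cut-off are assumptions, and the function names are hypothetical. It only lists candidates; aborting is still done by hand.

```python
import re
from typing import Dict, List, Optional, Tuple

# Two ways an angle range (AR) might appear in a task's text:
#  1. the stderr line quoted in this thread: "WU true angle range is : 0.010346"
#  2. ASSUMPTION: an XML-style tag <true_angle_range>...</true_angle_range>
#     (tag name and file location are assumptions, not verified)
AR_PATTERNS = [
    re.compile(r"WU true angle range is\s*:\s*([0-9]*\.?[0-9]+)"),
    re.compile(r"<true_angle_range>\s*([0-9]*\.?[0-9]+)\s*</true_angle_range>"),
]

# HYPOTHETICAL cut-off below which a task is treated as VLAR; adjust as needed.
VLAR_THRESHOLD = 0.05


def angle_range(text: str) -> Optional[float]:
    """Return the first angle range found in `text`, or None if there is none."""
    for pattern in AR_PATTERNS:
        match = pattern.search(text)
        if match:
            return float(match.group(1))
    return None


def find_vlar(tasks: Dict[str, str],
              threshold: float = VLAR_THRESHOLD) -> List[Tuple[str, float]]:
    """List (task name, AR) for every task whose AR is below `threshold`,
    sorted by task name so the output is stable."""
    hits = []
    for name, text in tasks.items():
        ar = angle_range(text)
        if ar is not None and ar < threshold:
            hits.append((name, ar))
    return sorted(hits)


if __name__ == "__main__":
    # Example text; the first value is the AR reported earlier in this thread.
    tasks = {
        "task_a": "WU true angle range is : 0.010346",
        "task_b": "<true_angle_range>0.42</true_angle_range>",
    }
    for name, ar in find_vlar(tasks):
        print(f"{name}: AR {ar} looks like VLAR, consider aborting it")
```

Keeping the patterns in a list makes it easy to add whatever format your own files actually use once you have confirmed it.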
Maik

Joined: 15 May 99
Posts: 163
Credit: 9,208,555
RAC: 0
Germany
Message 844082 - Posted: 23 Dec 2008, 7:59:08 UTC - in response to Message 843790.  

I checked client_state.xml as described in that post.
I have stopped all the affected WUs I found. CUDA is running very well without errors.

Thanks for this hint ;)

©2024 University of California

SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.