CUDA still hangs!

Questions and Answers : GPU applications : CUDA still hangs!

Profile Alex

Joined: 9 Oct 00
Posts: 13
Credit: 3,766,221
RAC: 0
United States
Message 892269 - Posted: 7 May 2009, 12:20:13 UTC

Hey all. I've been getting CUDA tasks that hang randomly until I either suspend and restart them (and BOINC) or abort them completely. I've gone through the forums and I thought I had the problem fixed, but it seems to be doing it again. I was hoping someone could look at my app_info file and let me know if I'm using the wrong apps, and what I'm doing wrong. Thanks, I really appreciate any help you can give!

Machine specs:
Phenom 8450 - Triple-core
2GB RAM
9600 GT 512MB
Windows XP SP2 32bit

<app_info>
    <app>
        <name>astropulse</name>
    </app>
    <file_info>
        <name>ap_5.00r103_SSE3.exe</name>
        <executable/>
    </file_info>
    <app_version>
        <app_name>astropulse</app_name>
        <version_num>500</version_num>
        <flops>5633505003</flops>
        <file_ref>
            <file_name>ap_5.00r103_SSE3.exe</file_name>
            <main_program/>
        </file_ref>
    </app_version>

    <app>
        <name>astropulse_v5</name>
    </app>
    <file_info>
        <name>ap_5.03r112_SSE3.exe</name>
        <executable/>
    </file_info>
    <app_version>
        <app_name>astropulse_v5</app_name>
        <version_num>503</version_num>
        <flops>6509828003</flops>
        <file_ref>
            <file_name>ap_5.03r112_SSE3.exe</file_name>
            <main_program/>
        </file_ref>
    </app_version>

    <app>
        <name>setiathome_enhanced</name>
    </app>
    <file_info>
        <name>AK_v8_win_SSE3.exe</name>
        <executable/>
    </file_info>
    <file_info>
        <name>MB_6.08_mod_CUDA_V11_NoPerfLog.exe</name>
        <executable/>
    </file_info>
    <file_info>
        <name>cudart.dll</name>
        <executable/>
    </file_info>
    <file_info>
        <name>cufft.dll</name>
        <executable/>
    </file_info>
    <file_info>
        <name>libfftw3f-3-1-1a_upx.dll</name>
        <executable/>
    </file_info>
    <app_version>
        <app_name>setiathome_enhanced</app_name>
        <version_num>603</version_num>
        <platform>windows_intelx86</platform>
        <flops>4381615002</flops>
        <file_ref>
            <file_name>AK_v8_win_SSE3.exe</file_name>
            <main_program/>
        </file_ref>
    </app_version>

    <app_version>
        <app_name>setiathome_enhanced</app_name>
        <version_num>608</version_num>
        <platform>windows_intelx86</platform>
        <avg_ncpus>0.127970</avg_ncpus>
        <max_ncpus>0.127970</max_ncpus>
        <flops>7400000000</flops>
        <plan_class>cuda</plan_class>
        <file_ref>
            <file_name>MB_6.08_mod_CUDA_V11_NoPerfLog.exe</file_name>
            <main_program/>
        </file_ref>
        <file_ref>
            <file_name>cudart.dll</file_name>
        </file_ref>
        <file_ref>
            <file_name>cufft.dll</file_name>
        </file_ref>
        <file_ref>
            <file_name>libfftw3f-3-1-1a_upx.dll</file_name>
        </file_ref>
        <coproc>
            <type>CUDA</type>
            <count>1</count>
        </coproc>
    </app_version>
</app_info>
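
For anyone checking a similar file by eye: the cross-references in an app_info.xml can also be checked mechanically. The following is a minimal sketch, assuming only Python's standard library. The function name `check_app_info` and the three checks it performs (every `<app_version>` names a declared `<app>`, every `<file_ref>` names a declared `<file_info>`, and each version has exactly one `<main_program/>`) are illustrative assumptions, not the BOINC client's own validation. It cannot tell whether a given executable will hang.

```python
import xml.etree.ElementTree as ET


def check_app_info(xml_text):
    """Return a list of cross-reference problems found in app_info.xml text.

    The checks are illustrative assumptions, not BOINC's own validation rules.
    """
    root = ET.fromstring(xml_text)
    declared_apps = {(a.findtext("name") or "").strip() for a in root.findall("app")}
    declared_files = {(f.findtext("name") or "").strip() for f in root.findall("file_info")}

    problems = []
    for version in root.findall("app_version"):
        app_name = (version.findtext("app_name") or "").strip()
        version_num = (version.findtext("version_num") or "?").strip()
        label = f"{app_name} v{version_num}"

        # The version must point at an <app> declared in the same file.
        if app_name not in declared_apps:
            problems.append(f"{label}: app is not declared")

        refs = version.findall("file_ref")
        for ref in refs:
            file_name = (ref.findtext("file_name") or "").strip()
            # Each referenced file needs a matching <file_info> entry.
            if file_name not in declared_files:
                problems.append(f"{label}: {file_name} has no <file_info>")

        # Assumed rule: one and only one file is marked as the main program.
        mains = [r for r in refs if r.find("main_program") is not None]
        if len(mains) != 1:
            problems.append(f"{label}: expected 1 main_program, found {len(mains)}")
    return problems


# Cut-down sample for illustration; cufft.dll is referenced but not declared.
SAMPLE = """
<app_info>
    <app><name>setiathome_enhanced</name></app>
    <file_info><name>MB_6.08_mod_CUDA_V11_NoPerfLog.exe</name><executable/></file_info>
    <file_info><name>cudart.dll</name><executable/></file_info>
    <app_version>
        <app_name>setiathome_enhanced</app_name>
        <version_num>608</version_num>
        <plan_class>cuda</plan_class>
        <file_ref><file_name>MB_6.08_mod_CUDA_V11_NoPerfLog.exe</file_name><main_program/></file_ref>
        <file_ref><file_name>cudart.dll</file_name></file_ref>
        <file_ref><file_name>cufft.dll</file_name></file_ref>
    </app_version>
</app_info>
"""

for problem in check_app_info(SAMPLE):
    print(problem)
```

An empty result only means the names line up; it says nothing about whether the files exist on disk or whether the apps are the right ones for the card.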
ID: 892269
Default
Joined: 23 Aug 08
Posts: 50
Credit: 2,222,384
RAC: 0
United States
Message 892479 - Posted: 7 May 2009, 22:58:16 UTC - in response to Message 892269.  

Alex,
Have you looked at the hardware side of things yet? Open Task Manager with BOINC running and see what percentage your CPU cores are running at and how much available memory you have. How much CPU and memory BOINC is allowed to use can be changed in "Preferences". 2 GB of RAM may not be adequate for a three- or four-core CPU with a lot of other processes running, especially under Vista. Also, check the fan speed and temperature of your GPU using Nvidia nTune or RivaTuner. I run my GPU fans at 100 percent all the time. You might also try unchecking the box in preferences that allows GPU work while the computer is in use.
ID: 892479
Profile Alex

Joined: 9 Oct 00
Posts: 13
Credit: 3,766,221
RAC: 0
United States
Message 892608 - Posted: 8 May 2009, 5:25:11 UTC - in response to Message 892479.  

Right now my CPU cores are all at 100% according to SpeedFan, and the GPU is maxing out at about 57 degrees C. Task Manager says I'm using 516MB of the RAM I have right now, and BOINC is set to use up to 90% of memory while the computer is in use. Vista's a bit of a memory hog, so I'm still running XP right now.

I usually seem to have to abort the hanging tasks, and afterwards it works fine for another dozen CUDA tasks until it finds one it hangs on again. I'm not sure it's hardware related - I know there are some issues with BOINC and CUDA still. Are there any optimized apps out there that have gotten around these issues?
ID: 892608
Default
Joined: 23 Aug 08
Posts: 50
Credit: 2,222,384
RAC: 0
United States
Message 892635 - Posted: 8 May 2009, 10:05:33 UTC - in response to Message 892608.  

Your hardware seems OK. Have you tried this optimized app yet: MB_6.08_mod_CUDA_V11_VLARKill_refined.exe? I had a problem with the display freezing on my 9600GT rig until I tried this one. Also, I would close any unnecessary background processes.
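
Switching to a different build means renaming the executable in both the `<file_info>` and the `<file_ref>` entries of app_info.xml. A minimal sketch of that rename in Python, assuming the replacement build uses the same DLLs and plan class as the one it replaces (an assumption to verify against the package's own instructions). The helper name `swap_executable` and the cut-down `SNIPPET` are hypothetical and only for illustration.

```python
import xml.etree.ElementTree as ET

OLD_EXE = "MB_6.08_mod_CUDA_V11_NoPerfLog.exe"
NEW_EXE = "MB_6.08_mod_CUDA_V11_VLARKill_refined.exe"


def swap_executable(xml_text, old_name, new_name):
    """Rename an executable everywhere app_info.xml mentions it.

    Returns the rewritten XML text and the number of elements changed.
    """
    root = ET.fromstring(xml_text)
    changed = 0
    for elem in root.iter():
        # <file_info><name> declares the file; <file_ref><file_name> uses it.
        if elem.tag in ("name", "file_name") and (elem.text or "").strip() == old_name:
            elem.text = new_name
            changed += 1
    return ET.tostring(root, encoding="unicode"), changed


# Cut-down fragment for illustration only, not a complete app_info.xml.
SNIPPET = f"""
<app_info>
    <file_info><name>{OLD_EXE}</name><executable/></file_info>
    <app_version>
        <app_name>setiathome_enhanced</app_name>
        <version_num>608</version_num>
        <plan_class>cuda</plan_class>
        <file_ref><file_name>{OLD_EXE}</file_name><main_program/></file_ref>
    </app_version>
</app_info>
"""

new_text, count = swap_executable(SNIPPET, OLD_EXE, NEW_EXE)
print(count, "entries renamed")
```

This only edits the text. The new executable still has to be copied into the project directory alongside the file.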
ID: 892635
Default
Joined: 23 Aug 08
Posts: 50
Credit: 2,222,384
RAC: 0
United States
Message 892637 - Posted: 8 May 2009, 10:09:32 UTC - in response to Message 892635.  

One other thing that seems to work sometimes is to disable the BOINC screensaver. Just set the screensaver to none in Display Properties, then in the power settings turn off the display after however many minutes you like. Don't allow the PC to go to sleep.
ID: 892637
Profile Alex

Joined: 9 Oct 00
Posts: 13
Credit: 3,766,221
RAC: 0
United States
Message 892719 - Posted: 8 May 2009, 16:02:01 UTC

Thanks for your help! I've disabled anything else running on this machine, and I never have the screensaver on. I can't for the life of me find MB_6.08_mod_CUDA_V11_VLARKill_refined.exe. I find a lot of references to it, but not a place to download it. Where can I find this app?

Thanks!
ID: 892719
Profile perryjay
Volunteer tester
Joined: 20 Aug 02
Posts: 3377
Credit: 20,676,751
RAC: 0
United States
Message 892729 - Posted: 8 May 2009, 16:34:15 UTC - in response to Message 892719.  
Last modified: 8 May 2009, 16:37:22 UTC

http://lunatics.kwsn.net/12-gpu-crunching/v10-of-modified-seti-mb-cuda-opt-ap-package-for-full-multi-gpucpu-use.msg16812.html#msg16812


You do have to register on the site to download but it is a very good site.

Ok, this should be a more direct link to the app. You may still have to register though... http://lunatics.kwsn.net/index.php?action=dlattach;topic=543.0;id=2572


PROUD MEMBER OF Team Starfire World BOINC
ID: 892729
Profile Alex

Joined: 9 Oct 00
Posts: 13
Credit: 3,766,221
RAC: 0
United States
Message 892802 - Posted: 8 May 2009, 21:36:01 UTC - in response to Message 892729.  

Thanks for the link! Hopefully that'll work without a hitch. Everything seems to be going ok for right now, but I'll post if they start sticking again.

Thanks!
ID: 892802
Profile Alex

Joined: 9 Oct 00
Posts: 13
Credit: 3,766,221
RAC: 0
United States
Message 893121 - Posted: 9 May 2009, 19:17:42 UTC

Nope, I'm getting work units that hang again! Just came home and it had gotten stuck on one at 0% for 4 hours.

Any more ideas on how I can fix this?
ID: 893121
Profile perryjay
Volunteer tester
Joined: 20 Aug 02
Posts: 3377
Credit: 20,676,751
RAC: 0
United States
Message 893134 - Posted: 9 May 2009, 19:49:21 UTC - in response to Message 893121.  

You might try the idea posted here.. http://setiathome.berkeley.edu/forum_thread.php?id=53153&nowrap=true#886422


PROUD MEMBER OF Team Starfire World BOINC
ID: 893134
Profile Geek@Play
Volunteer tester
Joined: 31 Jul 01
Posts: 2467
Credit: 86,146,931
RAC: 0
United States
Message 893175 - Posted: 9 May 2009, 21:35:12 UTC - in response to Message 893121.  

If you still have the option to RMA it back to the manufacturer, I would do that. I had to do that with one of my 9800 GTX GPUs and have not had a problem since. It appears to me that you have exhausted all other possible resolutions.

Boinc....Boinc....Boinc....Boinc....
ID: 893175
Profile Alex

Joined: 9 Oct 00
Posts: 13
Credit: 3,766,221
RAC: 0
United States
Message 893363 - Posted: 10 May 2009, 14:38:31 UTC

Well, I can't RMA it (only a 30-day warranty), and even when I shut down and restart BOINC, it still tries to run the hung task. I guess I'm stuck just checking the machine every few hours and hoping it hasn't hung! Hopefully a future app release will fix these issues.
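
Checking by hand every few hours can be approximated in code. A minimal sketch of the stall test, assuming progress samples of (timestamp in seconds, fraction done) are collected by some other means that is not shown here. The function name `is_stalled`, its parameters, and the 30-minute default window are hypothetical choices for illustration, not part of BOINC.

```python
def is_stalled(samples, window_seconds=1800.0, min_progress=0.0001):
    """Report whether a task made no measurable progress over the window.

    samples: list of (timestamp_seconds, fraction_done), oldest first.
    Returns False when there is not yet enough history to judge.
    """
    if not samples:
        return False
    latest_time, latest_fraction = samples[-1]

    # Pick the newest sample that is at least window_seconds old.
    baseline = None
    for timestamp, fraction in samples:
        if latest_time - timestamp >= window_seconds:
            baseline = fraction
        else:
            break
    if baseline is None:
        return False  # history is shorter than the window

    return latest_fraction - baseline < min_progress


# A task sitting at 0% for four hours, sampled hourly.
hung = [(hour * 3600.0, 0.0) for hour in range(5)]
# A task advancing steadily, sampled every 30 minutes.
healthy = [(0.0, 0.0), (1800.0, 0.2), (3600.0, 0.4)]

print(is_stalled(hung), is_stalled(healthy))
```

What to do once a stall is detected (suspend, restart, or abort the task) is a separate decision and is left out of the sketch.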
ID: 893363
Default
Joined: 23 Aug 08
Posts: 50
Credit: 2,222,384
RAC: 0
United States
Message 893520 - Posted: 10 May 2009, 21:27:10 UTC

I just found this posted in another thread; it's worth a shot. It may be useful in determining whether you have a faulty video card:

https://simtk.org/home/memtest/
ID: 893520
Profile Alex

Joined: 9 Oct 00
Posts: 13
Credit: 3,766,221
RAC: 0
United States
Message 893526 - Posted: 10 May 2009, 22:31:01 UTC - in response to Message 893520.  

Thanks! I've been running it for about 1000 iterations now, and it's actually found 32 errors so far. I'm going to call the card manufacturer tomorrow, and see if I can get this replaced or repaired.
ID: 893526
