CUDA still hangs!
Author | Message |
---|---|
Alex Joined: 9 Oct 00 Posts: 13 Credit: 3,766,221 RAC: 0 |
Hey all. I've been getting CUDA tasks that hang randomly until I either suspend and restart them (and BOINC) or abort them completely. I've gone through the forums and I thought I had the problem fixed, but it seems to be doing it again. I was hoping someone could look at my app_info file and let me know if I'm using the wrong apps, and what I'm doing wrong. Thanks, I really appreciate any help you can give!

Machine specs:

- Phenom 8450 (triple-core)
- 2GB RAM
- 9600 GT 512MB
- Windows XP SP2 32bit

My app_info file:

    <app_info>
      <app>
        <name>astropulse</name>
      </app>
      <file_info>
        <name>ap_5.00r103_SSE3.exe</name>
        <executable/>
      </file_info>
      <app_version>
        <app_name>astropulse</app_name>
        <version_num>500</version_num>
        <flops>5633505003</flops>
        <file_ref>
          <file_name>ap_5.00r103_SSE3.exe</file_name>
          <main_program/>
        </file_ref>
      </app_version>
      <app>
        <name>astropulse_v5</name>
      </app>
      <file_info>
        <name>ap_5.03r112_SSE3.exe</name>
        <executable/>
      </file_info>
      <app_version>
        <app_name>astropulse_v5</app_name>
        <version_num>503</version_num>
        <flops>6509828003</flops>
        <file_ref>
          <file_name>ap_5.03r112_SSE3.exe</file_name>
          <main_program/>
        </file_ref>
      </app_version>
      <app>
        <name>setiathome_enhanced</name>
      </app>
      <file_info>
        <name>AK_v8_win_SSE3.exe</name>
        <executable/>
      </file_info>
      <file_info>
        <name>MB_6.08_mod_CUDA_V11_NoPerfLog.exe</name>
        <executable/>
      </file_info>
      <file_info>
        <name>cudart.dll</name>
        <executable/>
      </file_info>
      <file_info>
        <name>cufft.dll</name>
        <executable/>
      </file_info>
      <file_info>
        <name>libfftw3f-3-1-1a_upx.dll</name>
        <executable/>
      </file_info>
      <app_version>
        <app_name>setiathome_enhanced</app_name>
        <version_num>603</version_num>
        <platform>windows_intelx86</platform>
        <flops>4381615002</flops>
        <file_ref>
          <file_name>AK_v8_win_SSE3.exe</file_name>
          <main_program/>
        </file_ref>
      </app_version>
      <app_version>
        <app_name>setiathome_enhanced</app_name>
        <version_num>608</version_num>
        <platform>windows_intelx86</platform>
        <avg_ncpus>0.127970</avg_ncpus>
        <max_ncpus>0.127970</max_ncpus>
        <flops>7400000000</flops>
        <plan_class>cuda</plan_class>
        <file_ref>
          <file_name>MB_6.08_mod_CUDA_V11_NoPerfLog.exe</file_name>
          <main_program/>
        </file_ref>
        <file_ref>
          <file_name>cudart.dll</file_name>
        </file_ref>
        <file_ref>
          <file_name>cufft.dll</file_name>
        </file_ref>
        <file_ref>
          <file_name>libfftw3f-3-1-1a_upx.dll</file_name>
        </file_ref>
        <coproc>
          <type>CUDA</type>
          <count>1</count>
        </coproc>
      </app_version>
    </app_info> |
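For anyone wanting to sanity-check a hand-edited app_info file like the one above, here is a minimal Python sketch. It is not BOINC's own validation: the function name `check_app_info`, the shortened `SAMPLE` document, and the three rules it checks (every `<app_version>` names a declared `<app>`, every `<file_ref>` has a matching `<file_info>`, and each version has exactly one `<main_program/>`) are assumptions about what a self-consistent file looks like. A file that passes can still hang for other reasons.

```python
import xml.etree.ElementTree as ET


def check_app_info(xml_text):
    """Return a list of consistency problems found in an app_info document.

    These checks are illustrative assumptions, not BOINC's official rules.
    """
    root = ET.fromstring(xml_text)
    problems = []

    # Names declared at the top level of the file.
    declared_apps = {a.findtext("name") for a in root.findall("app")}
    declared_files = {f.findtext("name") for f in root.findall("file_info")}

    for version in root.findall("app_version"):
        app_name = version.findtext("app_name")
        label = f"{app_name} v{version.findtext('version_num')}"

        if app_name not in declared_apps:
            problems.append(f"{label}: no <app> entry named {app_name!r}")

        refs = version.findall("file_ref")
        for ref in refs:
            file_name = ref.findtext("file_name")
            if file_name not in declared_files:
                problems.append(f"{label}: {file_name!r} has no <file_info>")

        # Count the file_refs flagged as the main program.
        mains = [r for r in refs if r.find("main_program") is not None]
        if len(mains) != 1:
            problems.append(
                f"{label}: expected 1 <main_program/>, found {len(mains)}"
            )

    return problems


# Shortened, hypothetical fragment modelled on the CUDA entry in the post.
SAMPLE = """<app_info>
  <app><name>setiathome_enhanced</name></app>
  <file_info><name>MB_6.08_mod_CUDA_V11_NoPerfLog.exe</name><executable/></file_info>
  <file_info><name>cudart.dll</name><executable/></file_info>
  <app_version>
    <app_name>setiathome_enhanced</app_name>
    <version_num>608</version_num>
    <plan_class>cuda</plan_class>
    <file_ref><file_name>MB_6.08_mod_CUDA_V11_NoPerfLog.exe</file_name><main_program/></file_ref>
    <file_ref><file_name>cudart.dll</file_name></file_ref>
  </app_version>
</app_info>"""

for problem in check_app_info(SAMPLE):
    print(problem)
```

To check a real file, read it into a string first and pass that string to `check_app_info`. An empty result only means the cross-references line up.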
Default Joined: 23 Aug 08 Posts: 50 Credit: 2,222,384 RAC: 0 |
Alex, have you looked at the hardware side of things yet? Open Task Manager with BOINC running and see what percentage your CPU cores are running at and how much memory you have available. BOINC's CPU and memory limits can be changed in "Preferences". 2 GB of RAM may not be adequate for a three- or four-core CPU with a lot of other processes running, especially under Vista. Also, check the fan speed and temperature of your GPU using NVIDIA nTune or RivaTuner. I run my GPU fans at 100 percent all the time. You might also try unchecking the box in preferences that allows GPU work while the computer is in use. |
Alex Joined: 9 Oct 00 Posts: 13 Credit: 3,766,221 RAC: 0 |
Right now my CPU cores are all at 100% according to SpeedFan, and the GPU is maxing out at about 57 degrees C. Task Manager says I'm using 516MB of RAM right now, and BOINC is set to use 90% of memory while the computer is in use. Vista's a bit of a memory hog, so I'm still running XP. I usually have to abort the hanging tasks, and afterwards it works fine for another dozen CUDA tasks until it hits one it hangs on again. I'm not sure it's hardware-related - I know there are still some issues with BOINC and CUDA. Are there any optimized apps out there that have gotten around these issues? |
Default Joined: 23 Aug 08 Posts: 50 Credit: 2,222,384 RAC: 0 |
Your hardware seems OK. Have you tried this optimized app yet: MB_6.08_mod_CUDA_V11_VLARKill_refined.exe? I had a problem with the display freezing on my 9600GT rig until I tried this one. I would also close any unnecessary background processes. |
Default Joined: 23 Aug 08 Posts: 50 Credit: 2,222,384 RAC: 0 |
One other thing that sometimes works is to disable the BOINC screensaver. Set the screensaver to none in Display Properties, then in the power settings turn off the display after however many minutes you like. Don't allow the PC to go to sleep. |
Alex Joined: 9 Oct 00 Posts: 13 Credit: 3,766,221 RAC: 0 |
Thanks for your help! I've disabled everything else running on this machine, and I never have the screensaver on. I can't for the life of me find MB_6.08_mod_CUDA_V11_VLARKill_refined.exe. I find a lot of references to it, but no place to download it. Where can I find this app? Thanks! |
perryjay Joined: 20 Aug 02 Posts: 3377 Credit: 20,676,751 RAC: 0 |
http://lunatics.kwsn.net/12-gpu-crunching/v10-of-modified-seti-mb-cuda-opt-ap-package-for-full-multi-gpucpu-use.msg16812.html#msg16812 You do have to register on the site to download but it is a very good site. Ok, this should be a more direct link to the app. You may still have to register though... http://lunatics.kwsn.net/index.php?action=dlattach;topic=543.0;id=2572 PROUD MEMBER OF Team Starfire World BOINC |
Alex Joined: 9 Oct 00 Posts: 13 Credit: 3,766,221 RAC: 0 |
Thanks for the link! Hopefully that'll work without a hitch. Everything seems to be going ok for right now, but I'll post if they start sticking again. Thanks! |
Alex Joined: 9 Oct 00 Posts: 13 Credit: 3,766,221 RAC: 0 |
Nope, I'm getting work units that hang again! Just came home and it had gotten stuck on one at 0% for 4 hours. Any more ideas on how I can fix this? |
perryjay Joined: 20 Aug 02 Posts: 3377 Credit: 20,676,751 RAC: 0 |
You might try the idea posted here: http://setiathome.berkeley.edu/forum_thread.php?id=53153&nowrap=true#886422 PROUD MEMBER OF Team Starfire World BOINC |
Geek@Play Joined: 31 Jul 01 Posts: 2467 Credit: 86,146,931 RAC: 0 |
"Nope, I'm getting work units that hang again! Just came home and it had gotten stuck on one at 0% for 4 hours." If you still have the option to RMA the card back to the manufacturer, I would do that. I had to do that with one of my 9800 GTX GPUs and have not had a problem since. It appears to me that you have exhausted all other possible resolutions. Boinc....Boinc....Boinc....Boinc.... |
Alex Joined: 9 Oct 00 Posts: 13 Credit: 3,766,221 RAC: 0 |
Well, I can't RMA it (only a 30-day warranty), and even when I shut down and restart BOINC, it still tries to run the hung task. I guess I'm stuck checking the machine every few hours and hoping it hasn't hung! Hopefully a future app release will fix these issues. |
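Checking the machine by hand every few hours could in principle be automated by comparing two progress readings taken some time apart. The Python sketch below shows only that comparison and does not talk to BOINC: `TaskSnapshot`, `find_stalled`, the 30-minute threshold, and the task names are all hypothetical, and collecting the snapshots is left to whatever tool is available. A task that is merely suspended would also be flagged by this logic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskSnapshot:
    """One progress reading for a task (hypothetical structure)."""
    name: str
    fraction_done: float    # 0.0 (not started) to 1.0 (finished)
    elapsed_seconds: float  # time the task has been running so far


def find_stalled(previous, current, min_elapsed=1800.0):
    """Return names of tasks that made no progress between two readings.

    A task counts as stalled when its fraction_done has not increased
    since the previous reading and it has run for at least min_elapsed
    seconds. Tasks with no previous reading are skipped.
    """
    prev_by_name = {task.name: task for task in previous}
    stalled = []
    for task in current:
        before = prev_by_name.get(task.name)
        if before is None:
            continue  # first time we see this task, nothing to compare
        no_progress = task.fraction_done <= before.fraction_done
        if no_progress and task.elapsed_seconds >= min_elapsed:
            stalled.append(task.name)
    return stalled


# Made-up readings taken an hour apart: wu_a sits at 0%, wu_b advances.
earlier = [TaskSnapshot("wu_a", 0.0, 600), TaskSnapshot("wu_b", 0.40, 600)]
later = [TaskSnapshot("wu_a", 0.0, 4200), TaskSnapshot("wu_b", 0.55, 4200)]

print(find_stalled(earlier, later))  # ['wu_a']
```

Comparing two readings rather than looking at a single 0% value avoids flagging a task that has only just started.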
Default Joined: 23 Aug 08 Posts: 50 Credit: 2,222,384 RAC: 0 |
I just found this posted in another thread, and it may be worth a shot. It could help determine whether you have a faulty video card: https://simtk.org/home/memtest/ |
Alex Joined: 9 Oct 00 Posts: 13 Credit: 3,766,221 RAC: 0 |
"I just found this posted in another thread, but worth a shot. May be useful in determining if you have a faulty vidcard:" Thanks! I've been running it for about 1000 iterations now, and it has found 32 errors so far. I'm going to call the card manufacturer tomorrow and see if I can get the card replaced or repaired. |