Is your CUDA client crashing/giving computation errors?

enusbaum
Volunteer tester
Joined: 29 Apr 00
Posts: 15
Credit: 5,921,750
RAC: 0
United States
Message 844484 - Posted: 24 Dec 2008, 7:10:36 UTC
Last modified: 24 Dec 2008, 7:15:14 UTC

I had the same problem. I tried three different drivers (178.74, 180.48, 180.84) and also updated to the latest BOINC client (6.5.0), but none of that helped.

What fixed my issue? Cooling!

I installed an application called SpeedFan, which showed that my GPU temperature was climbing as high as 90°C (194°F) while a SETI CUDA client was running! Holy smokes!

It seems that the NVIDIA drivers, for whatever reason, don't realize that CUDA work is loading the GPU (because it's not DirectX or OpenGL), so they keep the fan speed low to avoid making too much noise.

The problem? The heat then causes system instability! I even had my Vista system reboot on me with a BSOD!

The solution? An application called RivaTuner. It let me set my GPU fan to 100% permanently, which has solved my issue (for now at least)! My GPU temp now hovers around 65°C (149°F), which is hot but within safe limits :)

My only request to the SETI CUDA developers would be to hook into the NVIDIA API and raise the fan speed while crunching GPU work units.
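
To make the request a bit more concrete, here is a minimal Python sketch of the kind of temperature-to-fan-speed curve such a hook might apply. Everything in it is an assumption for illustration only: the function name `fan_percent_for`, the 50°C and 80°C breakpoints, and the 30% floor are made up, and the sketch deliberately does not call any NVIDIA API. Actually applying the value would depend on whatever interface the driver exposes.

```python
def fan_percent_for(temp_c, low_c=50.0, high_c=80.0, min_pct=30, max_pct=100):
    """Map a GPU temperature in Celsius to a fan duty cycle in percent.

    Below low_c the fan stays at min_pct, above high_c it runs at max_pct,
    and in between the duty cycle rises linearly.
    All default thresholds are illustrative assumptions, not vendor values.
    """
    if low_c >= high_c:
        raise ValueError("low_c must be below high_c")
    if temp_c <= low_c:
        return min_pct
    if temp_c >= high_c:
        return max_pct
    fraction = (temp_c - low_c) / (high_c - low_c)
    return round(min_pct + fraction * (max_pct - min_pct))
```

With these made-up defaults, a card at 90°C would be driven at full fan speed, which matches the manual fix described above.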

Please give this solution a go and let me know if it works for anyone else! I think GPU computing is the future for SETI and if we can stabilize the situation quickly it'll be much better for the project!

Cheers!

SpeedFan: http://www.almico.com/sfdownload.php
Riva Tuner: http://downloads.guru3d.com/downloadget.php?id=163&file=7&evp=2d213e528fb556c354432a3976bff55a
ID: 844484
Jord
Volunteer tester
Joined: 9 Jun 99
Posts: 15184
Credit: 4,362,181
RAC: 3
Netherlands
Message 844767 - Posted: 24 Dec 2008, 23:39:45 UTC - in response to Message 844484.  

I have forwarded your request to the developers. They just confirmed to me that they are looking into options to do thermal throttling on the GPU or let the drivers control the fan speed on the graphics card.
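
For readers wondering what thermal throttling could look like in practice, below is a small Python sketch, not the developers' actual design. It pauses work when the GPU gets too hot and resumes only after it has cooled by a margin (hysteresis), so the client does not flip on and off around a single threshold. The class name `ThermalThrottle` and the 85°C and 75°C limits are hypothetical, and the temperature reading is assumed to come from some external source.

```python
class ThermalThrottle:
    """Decide whether GPU work should be paused, using two thresholds."""

    def __init__(self, pause_at_c=85.0, resume_at_c=75.0):
        # Both limits are illustrative assumptions, not recommended values.
        if resume_at_c >= pause_at_c:
            raise ValueError("resume_at_c must be below pause_at_c")
        self.pause_at_c = pause_at_c
        self.resume_at_c = resume_at_c
        self.paused = False

    def update(self, temp_c):
        """Feed in the latest temperature; return True while work should stay paused."""
        if not self.paused and temp_c >= self.pause_at_c:
            self.paused = True   # too hot: stop crunching
        elif self.paused and temp_c <= self.resume_at_c:
            self.paused = False  # cooled down enough: resume
        return self.paused
```

A client loop would call `update()` with each new reading and skip launching GPU work while it returns True.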
ID: 844767
Ian Terry
Joined: 22 May 99
Posts: 1
Credit: 1,725,013
RAC: 0
United Kingdom
Message 845088 - Posted: 25 Dec 2008, 20:22:29 UTC

After running on NVIDIA CUDA for 3 or 4 hours I got a pixellated screen and "Driver not responding" errors. Tried this three times - same effect. Suspending SETI fixed this. I ran these recommended utilities and found that my fan speed was fixed at 26%, giving 58°C without SETI, so I guess it was a cooling issue. I've increased the fan speed to 70%, restarted SETI on CUDA and monitored it, and all seems fine now - steady at 53°C, 5 degrees lower than normal. SETI for CUDA needs to automate this real soon or users will cook their NVIDIA cards - not good P.R. for the project!
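
The manual monitoring described here could also be scripted. The Python sketch below is one possible approach and not what anyone in this thread used: it assumes a driver recent enough to ship the `nvidia-smi` command-line tool on the PATH, and the helper names and the 80°C warning limit are hypothetical.

```python
import subprocess


def parse_temperatures(output):
    """Parse one integer temperature (Celsius) per non-empty line, one line per GPU."""
    return [int(line.strip()) for line in output.splitlines() if line.strip()]


def read_gpu_temperatures():
    """Query nvidia-smi for the current temperature of every GPU.

    Assumes nvidia-smi is installed; raises an exception if the command fails.
    """
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=temperature.gpu", "--format=csv,noheader,nounits"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_temperatures(result.stdout)


def is_running_hot(temps_c, limit_c=80):
    """Return True if any GPU is at or above limit_c (an illustrative limit)."""
    return any(t >= limit_c for t in temps_c)
```

Calling `is_running_hot(read_gpu_temperatures())` every minute or so while SETI is crunching would flag the kind of overheating reported in this thread.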
ID: 845088
