CUDA running much better when GPU under 60 C

Profile MrJeep
Joined: 29 May 99
Posts: 29
Credit: 38,476,981
RAC: 0
United States
Message 849298 - Posted: 4 Jan 2009, 16:01:32 UTC

I may be wrong on this, so correct me if I am. I was getting a whole bunch of failed work units at the beginning, after upgrading to BOINC 6.4.5. Then I got hold of EVGA Precision V 1.4.0 and started running the cooling fan at 50% and faster; the default (auto) setting was 30%. Now the GPU cooling fan is running at 80% and the temperature has gone down to below 60 C. Ever since, I have had only a few work units fail. I do not overclock the graphics card at all and am not running an optimized application on this computer. Hey, it works for me; maybe it'll work for you.
MrJeep
ID: 849298
Profile Joseph Stateson Project Donor
Volunteer tester
Joined: 27 May 99
Posts: 309
Credit: 70,759,933
RAC: 3
United States
Message 849356 - Posted: 4 Jan 2009, 18:09:33 UTC - in response to Message 849298.  
Last modified: 4 Jan 2009, 18:10:14 UTC

I may be wrong on this, so correct me if I am. I was getting a whole bunch of failed work units at the beginning, after upgrading to BOINC 6.4.5. Then I got hold of EVGA Precision V 1.4.0 and started running the cooling fan at 50% and faster; the default (auto) setting was 30%. Now the GPU cooling fan is running at 80% and the temperature has gone down to below 60 C. Ever since, I have had only a few work units fail. I do not overclock the graphics card at all and am not running an optimized application on this computer. Hey, it works for me; maybe it'll work for you.
MrJeep


I think that is just a coincidence. Supposedly, the GPU can take temps up to 110 C (per the CUDA forum at NVIDIA), and I am running at 57 C (no load) to 63 C (full load). I have two systems, and both of them get a bunch of bad WUs in a row and then seem to get some good ones. Switching to GPUGRID does not generate any similar problems, so I assume it is a problem processing the data.

There could also be a memory release problem (a leak), and that would be most evident in SETI since it has very short duration WUs compared to GPUGRID. I cannot seem to go over about 24 hours without a reboot when running SETI or SETI Beta, but I can easily last an entire weekend or more with GPUGRID.
ID: 849356
Profile MrJeep
Joined: 29 May 99
Posts: 29
Credit: 38,476,981
RAC: 0
United States
Message 849787 - Posted: 5 Jan 2009, 19:17:05 UTC - in response to Message 849356.  


I think that is just a coincidence. Supposedly, the GPU can take temps up to 110 C (per the CUDA forum at NVIDIA), and I am running at 57 C (no load) to 63 C (full load). I have two systems, and both of them get a bunch of bad WUs in a row and then seem to get some good ones. Switching to GPUGRID does not generate any similar problems, so I assume it is a problem processing the data.

There could also be a memory release problem (a leak), and that would be most evident in SETI since it has very short duration WUs compared to GPUGRID. I cannot seem to go over about 24 hours without a reboot when running SETI or SETI Beta, but I can easily last an entire weekend or more with GPUGRID.

Yeah... I think you may be right. After letting EVGA Precision run in auto again, it seems that nothing has changed. Even though the GPU temperature hovers around 70 to 80 C, there are no more failures, or at least not as often.
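One way to check whether temperature and failures are actually related, rather than going by impression, is to log the GPU temperature alongside each work unit's outcome and compare failure rates on either side of a threshold. The following is a minimal sketch under stated assumptions: the function name, the record layout, and the 60 C default are all hypothetical, and the sample log is made up for illustration, not taken from any machine in this thread.

```python
def failure_rate_by_temperature(records, threshold_c=60.0):
    """Return (rate_at_or_below, rate_above) for work-unit failure rates.

    records: iterable of (gpu_temp_c, failed) pairs, one per work unit.
    A rate is None when no work units fall in that bucket.
    """
    cool_total = cool_failed = hot_total = hot_failed = 0
    for temp_c, failed in records:
        if temp_c <= threshold_c:
            cool_total += 1
            cool_failed += bool(failed)
        else:
            hot_total += 1
            hot_failed += bool(failed)
    cool_rate = cool_failed / cool_total if cool_total else None
    hot_rate = hot_failed / hot_total if hot_total else None
    return cool_rate, hot_rate


# Made-up example log (temperature in C, whether the work unit failed).
# These are NOT measurements from this thread.
example_log = [(57, False), (58, False), (59, True),
               (72, False), (78, True), (80, False)]
print(failure_rate_by_temperature(example_log))
```

With only a handful of work units the two rates will bounce around by chance, so a comparison like this is only suggestive until a fair number of results have been logged in both temperature ranges.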
ID: 849787
Eric Korpela Project Donor
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 3 Apr 99
Posts: 1382
Credit: 54,506,847
RAC: 60
United States
Message 849859 - Posted: 5 Jan 2009, 22:30:32 UTC - in response to Message 849356.  

Supposedly, the GPU can take temps up to 110 C (per the CUDA forum at NVIDIA), and I am running at 57 C (no load) to 63 C (full load).


That depends. I've seen earlier generation GPUs (not necessarily NVIDIA GPUs) that needed to be underclocked straight out of the box in order to function properly. If a GPU consistently fails at what should be a normal operating temperature, and if underclocking or boosting the fan from the auto setting solves the problem, you might have a problem card, or a problem with the heat sink mounting. It's also possible that the automatic fan setting isn't speeding up the fan soon enough.

Try out some 3D benchmarks. See if they operate properly at the same temperatures. That's not necessarily proof that numerics should work, since most 3D benchmarks don't check whether the results are mathematically correct. Many just look for image defects (pixels that contain obviously bad values).
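To illustrate the difference between "looks fine" and "mathematically correct", here is a minimal sketch of a numeric check: the same calculation is compared element by element against a trusted reference, within a tolerance. The function name and the tolerances are assumptions chosen for illustration, and `gpu_result` is only a stand-in list; a real test would fill it with output computed on the card.

```python
import math


def count_mismatches(reference, candidate, rel_tol=1e-5, abs_tol=1e-8):
    """Count positions where candidate differs from reference beyond tolerance.

    NaN values in candidate are counted as mismatches, because
    math.isclose() never treats NaN as close to anything.
    """
    if len(reference) != len(candidate):
        raise ValueError("result arrays differ in length")
    return sum(
        1
        for ref, got in zip(reference, candidate)
        if not math.isclose(ref, got, rel_tol=rel_tol, abs_tol=abs_tol)
    )


# Reference values computed on the CPU.
reference = [math.sqrt(i) for i in range(1000)]

# Stand-in for results read back from the GPU, with one value
# deliberately made 0.1% wrong to simulate a subtle numeric error.
gpu_result = list(reference)
gpu_result[123] *= 1.001

print(count_mismatches(reference, gpu_result))
```

An error of this size would be invisible as a pixel in a rendered frame, which is why a benchmark that only looks for image defects can pass on a card that still produces bad numeric results.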

If 3D benchmarks also fail at normal operating temperatures, and if you are under warranty, don't do anything that might void the warranty. You might want to investigate getting a replacement if you are convinced it's a hardware problem.



@SETIEric@qoto.org (Mastodon)

ID: 849859



 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.