CUDA making video cards overheat!

Profile Robert Curtiss
Joined: 9 Apr 08
Posts: 49
Credit: 3,105,422
RAC: 0
United States
Message 896338 - Posted: 18 May 2009, 6:40:51 UTC

Then the system reboots. The GPUs get to about 70C, then the system reboots. Any way to cool them down? I have 2x GeForce GTX 285s, overclocked. I tried turning SLI on so it would run just 1 GPU, but it still overheated.


Here is the machine:

http://setiathome.berkeley.edu/show_host_detail.php?hostid=4934218
ID: 896338 · Report as offensive
Profile Robert Curtiss
Joined: 9 Apr 08
Posts: 49
Credit: 3,105,422
RAC: 0
United States
Message 896340 - Posted: 18 May 2009, 6:50:54 UTC - in response to Message 896338.  

This machine can reach well over 20k RAC if I can get the video cards cooled down.
ID: 896340 · Report as offensive
Profile popandbob
Volunteer tester

Joined: 19 Mar 05
Posts: 551
Credit: 4,673,015
RAC: 0
Canada
Message 896342 - Posted: 18 May 2009, 7:04:48 UTC

70C is nowhere near the max temp for any GPU. It must be some other component of the computer overheating/failing.

Could you try:
a) running the CUDA memory tester Here
b) setting the cards back to stock clocks
c) rolling back to driver version 182.xx

Bob


Do you Good Search for Seti@Home? http://www.goodsearch.com/?charityid=888957
Or Good Shop? http://www.goodshop.com/?charityid=888957
ID: 896342 · Report as offensive
Chuck Gorish

Joined: 19 Jun 00
Posts: 156
Credit: 29,589,106
RAC: 0
United States
Message 896360 - Posted: 18 May 2009, 9:36:45 UTC - in response to Message 896338.  
Last modified: 18 May 2009, 9:41:15 UTC

Are your video cards in adjacent slots? Try separating them by 1 slot, and also use a 120mm fan near the front aiming toward the back. It sounds like they are in a 'hot spot' in the case and there is insufficient airflow to keep the ambient temperature cool enough. A machine like that will need at minimum 90 to 120 CFM of airflow out the back and top. There is no such thing as 'quiet' fan operation for this powerhouse.

Which case do you have?

What size power supply do you have? Insufficient power can cause random reboots too.

Also, 70C is considered warm, not hot, for these video cards, so there may be a motherboard setting that is set improperly as well, such as too fast a PCI-E bus speed. It is also possible that one of the video cards has some bad RAM. Try running one card at a time to see if one of them is causing this. If neither card causes a reboot when run alone, that points to a probably insufficient power supply, especially if anything is overclocked.

Do not overlook PopandBob's advice. He has given solid recommendations.
ID: 896360 · Report as offensive
Profile Byron S Goodgame
Volunteer tester
Joined: 16 Jan 06
Posts: 1145
Credit: 3,936,993
RAC: 0
United States
Message 896363 - Posted: 18 May 2009, 9:54:27 UTC - in response to Message 896338.  

Just curious, what do you have the cards' fan speed set at? I didn't see any mention of it in your post, so if you can set that to a higher %, that could bring the temp down too.
ID: 896363 · Report as offensive
Profile Robert Curtiss
Joined: 9 Apr 08
Posts: 49
Credit: 3,105,422
RAC: 0
United States
Message 896399 - Posted: 18 May 2009, 13:00:23 UTC - in response to Message 896363.  

The cards are sitting right next to each other, and the fans are going at 1725 RPM at about 60-70C. When I get home I'll try moving them apart. I'll keep you posted.
ID: 896399 · Report as offensive
Profile Robert Curtiss
Joined: 9 Apr 08
Posts: 49
Credit: 3,105,422
RAC: 0
United States
Message 896401 - Posted: 18 May 2009, 13:06:25 UTC - in response to Message 896360.  

It has a 1600 Watt PSU. I have a Thermaltake Speedo with 15 fans modded into it (including 2 230mm fans, one on the side and one on top). The motherboard is an ASUS Rampage II Extreme.
ID: 896401 · Report as offensive
Profile Robert Curtiss
Joined: 9 Apr 08
Posts: 49
Credit: 3,105,422
RAC: 0
United States
Message 896403 - Posted: 18 May 2009, 13:13:16 UTC - in response to Message 896401.  

That mobo has 3 PCI-E slots, but only 2 of them are PCI-E 2.0. The other is a normal PCI-E 16x. How much of a performance decrease do you think I'll be looking at?
ID: 896403 · Report as offensive
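A rough way to frame the slot question above, as a sketch only: it assumes the "normal" slot is a PCI-E 1.x x16 link (the post does not say so explicitly) and uses the standard theoretical per-lane rates of 250 MB/s for 1.x and 500 MB/s for 2.0, per direction. The function name is made up for illustration. It shows peak bus bandwidth only; how much a CUDA app actually slows down depends on how much data it moves over the bus, which this does not measure.

```python
# Theoretical per-lane throughput in MB/s, per direction.
# Assumption: the "normal" slot in the post is PCI-E 1.x.
PER_LANE_MB_S = {"1.x": 250, "2.0": 500}


def slot_bandwidth_gb_s(generation: str, lanes: int = 16) -> float:
    """Peak one-way bandwidth of a PCI-E slot in GB/s (hypothetical helper)."""
    return PER_LANE_MB_S[generation] * lanes / 1000


if __name__ == "__main__":
    for gen in ("1.x", "2.0"):
        print(f"PCI-E {gen} x16: {slot_bandwidth_gb_s(gen):.1f} GB/s peak")
```

On paper the older slot has half the peak bandwidth, but a card that spends most of its time computing rather than transferring may see a much smaller difference.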
Profile Westsail and *Pyxey*
Volunteer tester
Joined: 26 Jul 99
Posts: 338
Credit: 20,544,999
RAC: 0
United States
Message 896406 - Posted: 18 May 2009, 13:20:19 UTC
Last modified: 18 May 2009, 13:29:48 UTC

How many watts is your PSU really supplying? This sounds like what happens every time I try to run a card like that without upgrading the PSU: I have to underclock and undervolt to prevent the machine turning off at random. That indicates voltages (particularly the 12v bus) are at or near spec limits.

Do you have the ability to check operating voltages under full load?

Download Hardware Monitor 1.1.4.

Use this program to check your voltages. I would be interested in your 12v, Vcore, 5v, 3.3v, NB and memory voltages. When I installed my 1060 and 9800GTX with an Antec 750w PS, my 12v bus got down to 10.4v before the computer would turn off. That definitely can't be good; a too-low voltage condition is scary for components. Also, 5V was less than 1v. Currently with 3 cards my system draws ~780w @ the wall. Obviously anything less than a 1200 watt PS would be flaky at best with this configuration. The problem with PSUs is that I can buy a $50-$500 PSU, all having the same wattage rating. They aren't all the same. Many companies, especially those targeting gaming/high-end applications, will overrate their PSUs.

Only way to know for sure is check what the voltage regulators are delivering to the sensors under max loading.

Good luck. Let us know. I would look to a low voltage condition rather than too much heat.
It's easy to find out: undervolt and underclock until voltages remain in spec. Card heating should remain constant, so if the system is stable then, you know the problem is PS related and not directly card related.
"The most exciting phrase to hear in science, the one that heralds new discoveries, is not Eureka! (I found it!) but rather, 'hmm... that's funny...'" -- Isaac Asimov
ID: 896406 · Report as offensive
Profile Robert Curtiss
Joined: 9 Apr 08
Posts: 49
Credit: 3,105,422
RAC: 0
United States
Message 896408 - Posted: 18 May 2009, 13:30:46 UTC - in response to Message 896406.  

CPUID Hardware Monitor 1.1.4.0
-----------------------------------------------------

Mainboard Vendor ASUSTeK Computer INC.
Mainboard Model Rampage II Extreme (0x669 - 0xCCE9B6A0)

LPCIO
-----------------------------------------------------
Vendor Winbond
Model W83667HG
Vendor ID 0x5CA3
Chip ID 0xA5
Revision ID 0x13
Config Mode I/O address 0x2E
Hardware monitor
-----------------------------------------------------

Winbond W83627DHG hardware monitor

Voltage sensor 0 1.14 Volts [0x8F] (CPU VCORE)
Voltage sensor 1 12.43 Volts [0xDF] (+12V)
Voltage sensor 2 3.36 Volts [0xD2] (AVCC)
Voltage sensor 3 3.34 Volts [0xD1] (+3.3V)
Voltage sensor 4 5.06 Volts [0xD3] (+5V)
Voltage sensor 6 1.70 Volts [0xD4] (DRAM)
Temperature sensor 0 31°C (87°F) [0x1F] (SYSTIN)
Temperature sensor 1 51°C (123°F) [0x66] (CPUTIN)
Temperature sensor 2 6°C (42°F) [0x1F4] (AUXTIN)
Fan sensor 1 2280 RPM [0x25] (CPUFANIN0)
Intel Core i7 920 hardware monitor

Power sensor 0 149.84 W [0xFFFFFFFF] (Processor)
Temperature sensor 0 65°C (149°F) [0x23] (Core #0)
Temperature sensor 1 61°C (141°F) [0x27] (Core #1)
Temperature sensor 2 64°C (147°F) [0x24] (Core #2)
Temperature sensor 3 61°C (141°F) [0x27] (Core #3)

Dump hardware monitor



Hardware monitor
-----------------------------------------------------

GeForce GTX 285 hardware monitor

Temperature sensor 0 68°C (154°F) [0x44] (GPU Core)

Dump hardware monitor



Hardware monitor
-----------------------------------------------------

ST31000333AS hardware monitor

Temperature sensor 0 33°C (91°F) [0x21] (HDD)

Dump hardware monitor



Hardware monitor
-----------------------------------------------------

ST31500341AS hardware monitor

Temperature sensor 0 35°C (95°F) [0x23] (HDD)

Dump hardware monitor


ID: 896408 · Report as offensive
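For anyone wanting to sanity-check readings like the ones posted above, here is a minimal sketch. It assumes the usual ATX tolerance of +/-5% on the +12V, +5V and +3.3V rails (that figure is not from this thread), and the function names are made up for illustration. Software sensor readings can also be off, so treat the result as a hint rather than a measurement.

```python
# Nominal ATX rail voltages. The +/-5% tolerance is the usual ATX figure
# for the positive rails and is an assumption here, not from the thread.
NOMINAL_RAILS = {"+12V": 12.0, "+5V": 5.0, "+3.3V": 3.3}
TOLERANCE = 0.05


def rail_in_spec(rail: str, measured: float, tolerance: float = TOLERANCE) -> bool:
    """Return True if the measured voltage is within tolerance of nominal."""
    nominal = NOMINAL_RAILS[rail]
    return abs(measured - nominal) <= nominal * tolerance


def check_rails(readings: dict) -> dict:
    """Map each rail name to True (in spec) or False (out of spec)."""
    return {rail: rail_in_spec(rail, volts) for rail, volts in readings.items()}


if __name__ == "__main__":
    # Values copied from the Hardware Monitor dump in this post.
    posted = {"+12V": 12.43, "+5V": 5.06, "+3.3V": 3.34}
    for rail, ok in check_rails(posted).items():
        print(rail, "in spec" if ok else "OUT OF SPEC")
    # The 10.4v reading mentioned earlier in the thread, for comparison.
    print("+12V at 10.4v in spec?", rail_in_spec("+12V", 10.4))
```

Under that assumption the posted rails all fall inside the window, while a 12v rail sagging to 10.4v is well outside it.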
Profile Westsail and *Pyxey*
Volunteer tester
Joined: 26 Jul 99
Posts: 338
Credit: 20,544,999
RAC: 0
United States
Message 896410 - Posted: 18 May 2009, 13:36:46 UTC
Last modified: 18 May 2009, 13:41:11 UTC

LOL...

*shrugs*
Ermm, never mind... Take your cases off (if you are comfortable with that) and clean all the stuff out of the fins deep inside and out of the squirrel fan. You wouldn't believe what these cards pick up in a month in a seemingly clean environment...

Also, I have a card from each of the 8xxx, 9xxx, 2xxx and 1xxx series, and all run at about 130F under full load. I have tested each of them as stable all the way up to 200F for 24 hours.
My wife's 9500GT sits at about 212F 24/7 due to overclock and micro case.

Edit to add:

Voltage sensor 0 1.14 Volts [0x8F] (CPU VCORE)

Is this sufficient?
ID: 896410 · Report as offensive
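Since the rest of the thread quotes temperatures in Celsius, a quick conversion helps compare the Fahrenheit figures in this post. This is just the standard formula; the function name is made up for illustration.

```python
def fahrenheit_to_celsius(temp_f: float) -> float:
    """Convert degrees Fahrenheit to degrees Celsius."""
    return (temp_f - 32.0) * 5.0 / 9.0


if __name__ == "__main__":
    # The three temperatures quoted in the post above.
    for temp_f in (130, 200, 212):
        print(f"{temp_f}F = {fahrenheit_to_celsius(temp_f):.1f}C")
```

So 130F is roughly 54C, 200F roughly 93C, and 212F is 100C, all of which sit on either side of the ~70C reported at the start of the thread.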
Profile Robert Curtiss
Joined: 9 Apr 08
Posts: 49
Credit: 3,105,422
RAC: 0
United States
Message 896412 - Posted: 18 May 2009, 13:41:22 UTC - in response to Message 896410.  

Haha, there is no dust; everything just came out of the box new!
ID: 896412 · Report as offensive
Chuck Gorish

Joined: 19 Jun 00
Posts: 156
Credit: 29,589,106
RAC: 0
United States
Message 896413 - Posted: 18 May 2009, 13:54:55 UTC - in response to Message 896410.  

That 1.14v Vcore does seem low; mine is 1.23v. What CPU?

Hehe yeah, literally once a month I give my workstation a complete 'bath' along with an air blow-out of all heatsinks everywhere and a thorough vacuuming; I remove the case fans, clean the fins, etc. It is amazing the junk they pick up!

Every 2 months I remove my CPU cooler (a Zalman 9700N, which gathers dirt like a magnet!) to clean the lower fins that nothing can get to, since they are circular and the bottom fins are positioned such that little can get through besides an air blow from the front. Unfortunately this wastes thermal compound, since I have to clean and re-apply it every time I remove the cooler, but my machine runs nice and cool in a hot Florida environment, so I have no complaints.

ID: 896413 · Report as offensive
Profile Robert Curtiss
Joined: 9 Apr 08
Posts: 49
Credit: 3,105,422
RAC: 0
United States
Message 896825 - Posted: 19 May 2009, 3:00:54 UTC - in response to Message 896413.  

So.... my GPUs' fans were only going at 40%. I cranked them to 100% (loud as hell), but it has been running stable for 1 hour now (crosses fingers).
ID: 896825 · Report as offensive
Profile KW2E
Joined: 18 May 99
Posts: 346
Credit: 104,396,190
RAC: 34
United States
Message 896847 - Posted: 19 May 2009, 4:37:16 UTC - in response to Message 896825.  

The fans on my 295s are at 100% all the time.

I just installed a heat exhaust system on my machine that sucks the hot air out of the back of the computer and blows it outside. This has helped a great deal in cooling things down a bit in here.

My 295s are running between 68 and 74C at full load now.

Rob
ID: 896847 · Report as offensive
Profile Westsail and *Pyxey*
Volunteer tester
Joined: 26 Jul 99
Posts: 338
Credit: 20,544,999
RAC: 0
United States
Message 896861 - Posted: 19 May 2009, 5:34:34 UTC

Oh yeah... there's that...
"stupid auto fan control !#!@#%$^%@" <- in Homer Simpson's voice

Glad you got it sorted! *thumbs up*


--------
off topic:
@Rob... How's that RAC? ;)
Have to giggle every time I see it. *grins*

ID: 896861 · Report as offensive
Chuck Gorish

Joined: 19 Jun 00
Posts: 156
Credit: 29,589,106
RAC: 0
United States
Message 896917 - Posted: 19 May 2009, 11:21:58 UTC - in response to Message 896861.  

Does auto fan work on your 1060? Mine says auto, but I never see the speed change.
When it boots it's on manual, so I set it via the driver.
ID: 896917 · Report as offensive
Profile Westsail and *Pyxey*
Volunteer tester
Joined: 26 Jul 99
Posts: 338
Credit: 20,544,999
RAC: 0
United States
Message 896921 - Posted: 19 May 2009, 11:30:11 UTC - in response to Message 896917.  

Same, same... I've always used manual set to 100%, so I never really thought about it. I think it may have worked with the 182.50 Quadro drivers? I just recently discovered it no worky now, thought it odd, and wondered if it had always been that way.
ID: 896921 · Report as offensive
Profile Bob Mahoney Design
Joined: 4 Apr 04
Posts: 178
Credit: 9,205,632
RAC: 0
United States
Message 897234 - Posted: 20 May 2009, 13:36:23 UTC

Before anyone decides their CUDA card lacks auto fan control, you need to let it get to 74C or more. My GTX295 and GTX285 cards sit at 40% fan until the temp gets up towards 80C, then the fans slowly increase in speed. For example, the two GTX285s sweltering in my new build are running at 82C and 79C; their respective fan speeds (on auto, exactly as new out of the box) are 58% and 47%.

My priority is noise abatement before equipment longevity, and the auto fan setting has kept noise very low.

Disclaimer: I can't guarantee how long they will live at those temperatures. I have had a few GTX295s running in the low 80s for 4 months now.

These temps are what I see in Rivatuner, EVGA Precision, and Realtemp, which all agree with each other, a nice surprise.

Bob
Opinion stated as fact? Who, me?
ID: 897234 · Report as offensive
Chuck Gorish

Joined: 19 Jun 00
Posts: 156
Credit: 29,589,106
RAC: 0
United States
Message 897311 - Posted: 20 May 2009, 16:49:19 UTC - in response to Message 897234.  

Interesting... I wonder if there is a way to set the threshold lower. I don't like my card running hotter than 65C (without CUDA it stays at a comfy 45-47C), and fan noise is not a concern, so I would think 100% by 70C is plenty. My office sounds like a jet engine warming up anyway :)

I will test that out though, just to see if auto really does work.
ID: 897311 · Report as offensive
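The lower threshold asked about here can be sketched as a simple linear fan curve. This is only an illustration of the idea: the 40% floor and the "100% by 70C" target come from the thread, while the 50C starting point and the function name are made-up assumptions. It does not talk to the card; applying a curve like this would need a tool such as the ones named in the thread.

```python
def fan_percent(temp_c: float,
                low_temp: float = 50.0,   # assumed point where the ramp starts
                high_temp: float = 70.0,  # "100% by 70C" from the post above
                min_fan: float = 40.0,    # idle fan speed reported in the thread
                max_fan: float = 100.0) -> float:
    """Return a fan duty cycle (%) that rises linearly between two temperatures."""
    if temp_c <= low_temp:
        return min_fan
    if temp_c >= high_temp:
        return max_fan
    fraction = (temp_c - low_temp) / (high_temp - low_temp)
    return min_fan + fraction * (max_fan - min_fan)


if __name__ == "__main__":
    for temp in (45, 60, 70, 82):
        print(f"{temp}C -> {fan_percent(temp):.0f}% fan")
```

With these assumed numbers the fan would already be at 100% at the 79-82C where the stock auto setting was reported at 47-58%, trading noise for lower temperatures.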