Noobie CUDA GPU temperature worry

Message boards : Number crunching : Noobie CUDA GPU temperature worry

Tony Habergham
Joined: 17 May 99
Posts: 10
Credit: 9,453,098
RAC: 7
United Kingdom
Message 940734 - Posted: 17 Oct 2009, 9:43:50 UTC

Please excuse if this is going over old ground, but a half-hour search didn't turn up anything relevant.

After passively boincing for years, I checked the seti forums a few weeks ago, and learned about CUDA.

Persuading myself that my eighteen-month-old ATI card was on its last legs, I splashed out on eBay and yesterday acquired an 8800 GTS OC with 320 MB.

Installed it last night; seems OK: processing WUs in 12-40 minutes, compared to 8-10 hours for the CPU.

However, GPU-Z is reporting temperatures of 88-93C, which is worrying me a lot - the old ATI card peaked at 60C in games, and my CPU claims to be a tepid 37C these days.

I've browsed the Nvidia forum and can't get a clear answer - it varies from 'anything over 70C is slowly cooking your memory' to 'the GPUs are fine up to 120C'.

I'm thinking of using an app (any recommendations?) to throttle back the card - I understand the 'OC' means it's factory overclocked already. On the other hand, it's not actually giving any problems, even with 'utilize GPU when computer is active' enabled.

Any considered advice on the long-term impacts of this temperature?

TonyH
ID: 940734 · Report as offensive
Fred W
Volunteer tester

Joined: 13 Jun 99
Posts: 2524
Credit: 11,954,210
RAC: 0
United Kingdom
Message 940735 - Posted: 17 Oct 2009, 10:01:26 UTC - in response to Message 940734.  
Last modified: 17 Oct 2009, 10:02:08 UTC

Probably not too worrying in itself but I assume all that heat is being dumped into the case which can also affect other components.

If you do want to play around with the settings, the easiest tool I have found is EVGA Precision (available here), which seems to work fine with any make of graphics card.

I've no experience with the 8800 but you may find that the automatic fan speed does not kick in until the temps get quite high and setting the fan speed manually to 100% with Precision will bring the temp down to more comfortable levels. Alternatively, the tool will allow you to underclock the card.

F.

[edit]And - welcome to the Boards[/edit]
ID: 940735 · Report as offensive
FalconFly
Joined: 5 Oct 99
Posts: 394
Credit: 18,053,892
RAC: 0
Germany
Message 940746 - Posted: 17 Oct 2009, 11:44:23 UTC
Last modified: 17 Oct 2009, 11:46:12 UTC

Just be aware that conventional video cards are neither designed nor specced for 24/7 data crunching and its ensuing thermal stress.

For a few, irregular hours of gaming each day, such temperatures may be okay and will result in a normal life span of the device.

But as a permanent load, critical parts like VRMs, capacitors and to some extent the VRAM will experience a reduced life span. Once these fail, a still intact GPU (which can accept such temperatures for long periods of time) becomes worthless. The aforementioned components are the weakest links in that chain.

For anything like 24/7 GPU crunching, one should have clearly above-average case cooling, specifically cooling that directly benefits the video card.
Stuffed into generic, run-of-the-mill ("08/15") cases or given inadequate cooling for the job, those modern cards will run for a variable while, then die.
ID: 940746 · Report as offensive
Nick

Joined: 16 Oct 09
Posts: 81
Credit: 112,909
RAC: 0
Canada
Message 940762 - Posted: 17 Oct 2009, 13:04:51 UTC - in response to Message 940734.  



88-93C seems on the hot side. What is GPU-Z saying the fan speed is? This may be a clue for you. My GTS250 has been running now for 18 hrs non-stop and it's at 67-68C with a fan speed of 49-50%.

My ATX tower has one HD and CD drive. I had to remove the cover where the second CD goes to get more air into the box because it was starving for cold air. This lowered the temp by 5C.

See if you can get more fresh air into the box.
ID: 940762 · Report as offensive
ML1
Volunteer moderator
Volunteer tester
Joined: 25 Nov 01
Posts: 15956
Credit: 7,508,002
RAC: 20
United Kingdom
Message 940765 - Posted: 17 Oct 2009, 13:16:38 UTC - in response to Message 940746.  
Last modified: 17 Oct 2009, 13:23:02 UTC

Just be aware that conventional video cards are neither designed nor specced for 24/7 data crunching and its ensuing thermal stress.

[...]

For anything like 24/7 GPU crunching, one should have clearly above-average case cooling, specifically cooling that directly benefits the video card.
Stuffed into generic, run-of-the-mill ("08/15") cases or given inadequate cooling for the job, those modern cards will run for a variable while, then die.

OK... So...

What is 'different' for a graphics card after running at 100% for 24 hours over running at 100% for 1 hour?

If cooling isn't adequate, then it isn't adequate. That makes no difference for whether the cooling is inadequate for 1 hour or 24 hours, it's still inadequate just the same!

The real test is whether the temperatures seen are considered 'reasonable' or not.


Happy fast crunchin',
Martin

[edit]

Aside: I've been running CUDA on a passively cooled GPU for months now, 24/7, without any issues at all. The GPU reports 60 - 68 deg C depending on what WU.

[/edit]
See new freedom: Mageia Linux
Take a look for yourself: Linux Format
The Future is what We all make IT (GPLv3)
ID: 940765 · Report as offensive
Tony Habergham
Joined: 17 May 99
Posts: 10
Credit: 9,453,098
RAC: 7
United Kingdom
Message 940778 - Posted: 17 Oct 2009, 14:02:13 UTC

Thanks to all who've responded, several nice pointers to go on.

I do think the temps were too high, having had one 'Computation error' today, which I've never seen before. On the other hand, it's the fans which seem to fail on graphics cards these days, rather than the electronics directly. It was getting blue screens under high GPU load that started me paying attention to GPU temp and running GPU-Z. (I had been running passively cooled, without knowing it, for months.)

I don't have my machine on 24/7 since they invented global warming (and it seems kind of silly when I have a 60% dedication to climateprediction.net) - just about 6 hours straight in the evenings and 14 hours at weekends - just so I can pop back for a quick surf as the mood takes me.

EVGA Precision is very nice (there's a more recent release on the link given). Throttling back to the minimum 415 MHz core clock seems to give temps in the mid/high 80s, which seems a more comfortable place to me. Does anyone have any advice on tweaking the settings for SETI? The shader clock is independently settable, and I got the impression that the shaders were the beasties doing the work for CUDA/BOINC. And the memory clock... the 800MHz DDR3 seems to be set to run at 792 as standard. I've heard faster memory is better for SETI, and it wouldn't generate heat, I'd think. I'm open to input.

Better case cooling is an obvious step, but I am sat in front of the thing for a fair proportion of the time while crunching (either surfing or running 15 year old games with DosBox) so I don't want too much noise. In any event, there's not much room left, without a major restructure... the dual-width graphics card (which is venting outside the case) leaves no room for the modified case (quieter fan) cooler I had in as well, before the graphics upgrade.

I was quite pleased with the main setup I'd just got to - picked up a massive dual-heatpipe cpu cooler cheaply, which was capable of keeping my Athlon 64 at below 55C at 100% load, even with the CPU fan off completely. It produced CPU temps in the high 30's at minimum fan speed, producing a nice airflow over the ramsinks and chipset sink as well.

Anyway, thanks to all.

TonyH
ID: 940778 · Report as offensive
kittyman
Volunteer tester
Joined: 9 Jul 00
Posts: 51445
Credit: 1,018,363,574
RAC: 1,004
United States
Message 940782 - Posted: 17 Oct 2009, 14:18:53 UTC

I asked the question directly to an EVGA tech.....

He simply responded........they just run that hot.
Flat response....

I guess that's why most of the top end cards at EVGA have a lifetime warranty if you register them.
They are gonna cook sooner or later. No electronics can run that hot forever......or can they?
LOL. I should know..........

My last known surviving Core 2 Duo is still running strong at 59C.
And has been for years now.

It's the last dual core I am running. And do so mostly for its keepsake value.

I have considered taking it offline in deference to a quad, but just can't quite bring myself to do it.
"Freedom is just Chaos, with better lighting." Alan Dean Foster

ID: 940782 · Report as offensive
FalconFly
Joined: 5 Oct 99
Posts: 394
Credit: 18,053,892
RAC: 0
Germany
Message 942794 - Posted: 25 Oct 2009, 19:42:11 UTC - in response to Message 940765.  
Last modified: 25 Oct 2009, 19:46:41 UTC

OK... So...

What is 'different' for a graphics card after running at 100% for 24 hours over running at 100% for 1 hour?

If cooling isn't adequate, then it isn't adequate. That makes no difference for whether the cooling is inadequate for 1 hour or 24 hours, it's still inadequate just the same!


Not at all unfortunately.

What 24/7 operation does to the hardware is mostly invisible to the typical owner who doesn't know or respect the hardware's limitations: it shortens the lifespan of its components.

The capacitors especially have a fairly well-defined lifespan, which depends almost exclusively on running hours and is very sensitive to operating temperature. As these cards are designed to operate for a limited amount of time per day, their expected lifespan will normally be several years - even when used under merely average conditions.

Put under 24/7 load, however, this lifespan is quickly reduced to a fraction of the intended period - and further decreased by high operating temperatures.
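
To put rough numbers on the temperature sensitivity described above, here is a minimal sketch using the commonly quoted rule of thumb that an aluminium electrolytic capacitor's life roughly doubles for every 10 C it runs below its rated temperature. The rule is an approximation, not something stated in this thread, and the 2,000-hour / 105 C rating and the function name are hypothetical values chosen for illustration. Note also that capacitor temperature is not the same as the GPU temperature a monitoring tool reports.

```python
def estimated_capacitor_life_hours(rated_life_hours: float,
                                   rated_temp_c: float,
                                   operating_temp_c: float) -> float:
    """Rule-of-thumb estimate: life doubles for every 10 C below the rated temperature."""
    return rated_life_hours * 2.0 ** ((rated_temp_c - operating_temp_c) / 10.0)


# Hypothetical part rated for 2,000 hours at 105 C (assumed figures, not from the thread).
for temp_c in (65, 85, 95):
    hours = estimated_capacitor_life_hours(2000, 105, temp_c)
    months_24_7 = hours / 24 / 30.44  # calendar months if powered around the clock
    print(f"{temp_c} C: ~{hours:,.0f} h, ~{months_24_7:.0f} months at 24/7")
```

Under these assumptions, the same part that lasts about 8,000 hours at 85 C lasts about 32,000 hours at 65 C, which is why a modest drop in temperature matters so much for a card that never switches off.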

You'll see the same with hard drives: only a few normal desktop drives carry a 24/7 specification. For server drives, this specification is standard and the drives are built for that purpose.

24/7 vs. "normal use" may sound like mere marketing, but it reflects important design differences based on the different specifications.

You'll find this forum has plenty of reports of dead video cards. GPU failure is seldom the cause; literally burned VRMs or blown capacitors usually are.

Therefore my advice:
- there's far more to running a video card 24/7 than just keeping GPU temps in mind
- VRMs and especially capacitors will need significantly increased cooling to preserve their lifespan (this is the reason many modern cards have dedicated, additional temperature sensors, e.g. for VRMs and VRAM - as their temps are just as critical as GPU temps, the fan control logic has to be able to react to these as well)

You can choose to ignore these requirements and may get lucky. Otherwise you will destroy your $xxx video card for lack of, e.g., $10 of additional cooling, or because of a plainly inadequate case.
ID: 942794 · Report as offensive
John G

Joined: 29 Dec 01
Posts: 68
Credit: 10,932,850
RAC: 0
Canada
Message 942989 - Posted: 26 Oct 2009, 21:55:36 UTC

Falcon is right!!! I had an 8800GTX from BFG and I fried it in less than 6 weeks!!!
Then I bought an MSI 260 GTX, which ran for 6 months on my quad core without any problems. Decided to up the ante a bit and spent $500 on a super cooler case (all aluminum, ships with 4 fans but has room for 8) --- MSI motherboard --- i7 CPU with a 295 GTX --- 6GB DDR3. The temp on my video card is about 76C, but the card is good for up to 118C. The CPU is running at 58C. I run everything full out 24/7. It's worth the extra bucks to invest in something worthwhile. Besides, I will never buy another BFG card ever. The card was under warranty with full registration and all they did was sideline me for weeks and weeks!!!! Never got a return number from them!!!!
ID: 942989 · Report as offensive
Geek@Play
Volunteer tester
Joined: 31 Jul 01
Posts: 2467
Credit: 86,146,931
RAC: 0
United States
Message 943035 - Posted: 27 Oct 2009, 1:44:07 UTC

I was the former owner of a BFG GPU. When it failed they would not even talk about repair or replacement. Especially when I mentioned that it was running CUDA scientific applications.

On the other hand EVGA has RMA'd one GPU for me without question and I also mentioned the CUDA apps to them. UPS has it on the way home right now.

If you're going to crunch with a GPU, EVGA is the one to own. Just don't forget to register your GPU with them and you're good to go.
Boinc....Boinc....Boinc....Boinc....
ID: 943035 · Report as offensive
Lint trap
Joined: 30 May 03
Posts: 871
Credit: 28,092,319
RAC: 0
United States
Message 943040 - Posted: 27 Oct 2009, 2:02:01 UTC - in response to Message 943035.  
Last modified: 27 Oct 2009, 2:45:30 UTC

If you're going to crunch with a GPU, EVGA is the one to own. Just don't forget to register your GPU with them and you're good to go.


Not sure if this applies to other/all video makers, but EVGA's customer support told me they will not honor the warranty on anything purchased through eBay. I was inquiring about getting a replacement faceplate for a dual slot card.

But so far, no problems with either of the two 8800 GTS's I purchased on eBay. One running CUDA 2.3 24/7 and the other, in my daughter's machine, running WOW @ almost 24/7..:) Both at stock clock settings & 65% fan speed.

Martin
[edited] warranty statement
ID: 943040 · Report as offensive
Sutaru Tsureku
Volunteer tester
Joined: 6 Apr 07
Posts: 7105
Credit: 147,663,825
RAC: 5
Germany
Message 943166 - Posted: 27 Oct 2009, 14:36:06 UTC
Last modified: 27 Oct 2009, 14:45:12 UTC


You make me worry.. ;-)

Maybe Fred W can jump into this thread as well.
He got his damaged GTX295 (2 PCB) back from the manufacturer (IIRC, XFX) with the answer 'not damaged'. But Fred knows from a RAM test that the RAM is damaged.

It's not good to know that some manufacturers will not repair/replace damaged GPUs when they can no longer do CUDA calculations.
Will they only repair/replace a card if games show ~ 10 % errors (pixel or screen failures or similar)??

I have 4 manufacturer-OCed GTX260-216 cards from EVGA.
This 10 year warranty is AFAIK only for US (maybe also Canadian) people.
We Europeans get only 2 years from EVGA (or because of the European legislation). But I bought a 2 year extension from my seller. So I have 4 years of warranty.

But how would that help, if my seller says it's not damaged..
Would I then have the problem of sending them directly to the USA?

I also thought about maybe buying an OCed Gigabyte GTX260-216.
How will they respond if the GPU gets damaged?


My 4 manufacturer OCed GTX260-216 run 24/7, fan @ AUTO - ~ 55 % RPM @ ~ 78 - 80 °C.
In summer they reached max. 84 °C.

ID: 943166 · Report as offensive
Fred W
Volunteer tester

Joined: 13 Jun 99
Posts: 2524
Credit: 11,954,210
RAC: 0
United Kingdom
Message 943217 - Posted: 27 Oct 2009, 22:40:45 UTC

Sutaru is right. My XFX GTX295 was classified as NFF by Scan but they did return it to XFX when I insisted that I had experienced faults both with S@H crunching and with the MemTestG GPU memory test utility. It has now been returned by XFX who also found no fault. I am currently underclocking the beast to 435 from 576 and am still seeing errors on S@H. Tomorrow I plan to remove CPU overclock and disconnect all unnecessary bits (e.g. DVD drive) to minimise power draw just to cover all the bases though I don't expect the Seasonic 700W PSU to have degraded and be the cause of my problems.

F.
ID: 943217 · Report as offensive
1mp0£173
Volunteer tester

Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 943229 - Posted: 27 Oct 2009, 23:59:18 UTC - in response to Message 940765.  

What is 'different' for a graphics card after running at 100% for 24 hours over running at 100% for 1 hour?

If the card can survive for 5,000 hours at peak temperature, and you run 24 hours per day instead of 1 hour per day, you'll get there 24 times faster.

5,000 hours of use at one hour per day is just shy of 14 years.

5,000 hours of use 24 hours/day is just shy of 7 months.
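
The arithmetic above can be sketched in a few lines. The 5,000-hour figure is the post's hypothetical rather than a datasheet value, the function name is made up for illustration, and the 6 hours/day case is added only because it matches the evening usage mentioned earlier in the thread.

```python
def calendar_lifetime_days(rated_hours: float, hours_per_day: float) -> float:
    """Calendar days needed to accumulate rated_hours of running time."""
    if not 0 < hours_per_day <= 24:
        raise ValueError("hours_per_day must be in (0, 24]")
    return rated_hours / hours_per_day


RATED_HOURS = 5000  # hypothetical figure from the post, not a measured value

for hours_per_day in (1, 6, 24):
    days = calendar_lifetime_days(RATED_HOURS, hours_per_day)
    print(f"{hours_per_day:>2} h/day: {days:7.1f} days = {days / 365.25:5.2f} years")
```

At 1 hour per day this comes to 5,000 days (a little under 14 years), and at 24 hours per day to roughly 208 days (a little under 7 months), matching the figures in the post.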

ID: 943229 · Report as offensive
Sutaru Tsureku
Volunteer tester
Joined: 6 Apr 07
Posts: 7105
Credit: 147,663,825
RAC: 5
Germany
Message 943235 - Posted: 28 Oct 2009, 0:40:10 UTC


@ Fred W

Thanks for jumping in.. :-)

I wish you all the best that your PC will very soon run again at 100 % full load and 24/7. :-)


700 W PSU?
Hmm.. from what I've seen, my experience and some calculation.. I guess every GPU draws 3/4 of its max. power consumption when doing 'only' CUDA calculations.

If it's ~ 300 W for a GTX295 (2 PCB), then it's ~ 225 W for CUDA calculation (maybe).
Do you have a power meter to check the live wattage consumption of your whole PC?
IIRC, your Q9450 is heavily OCed. I guess ~ 200 W at stock, now ~ 250 or 280 W OCed (if higher Vcore).
So we now have ~ 505 W.
AFAIK, IIRC, for good PSU life you should run at 'only' 50 % utilization.
You are now at maybe ~ 72 %.

I don't know if the ~ 50 % utilization figure is correct.
I burned out two PSUs after one year each and they had only ~ 40 % utilization.
They were a cheap 'no name' PSU and a 'be quiet!' (in an expensive frame) PSU.
But both were replaced under warranty. And I was lucky that the PSUs didn't damage other equipment as well.
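
The power-budget estimate above can be written out as a small sketch. All wattages are the post's guesses, not measurements; the 280 W figure is taken here as everything in the PC except the GPU, which is an assumption about what the post means, and the function and variable names are made up for illustration.

```python
def psu_utilization(component_watts: dict, psu_rated_watts: float) -> float:
    """Fraction of the PSU's rated output used by the listed components."""
    return sum(component_watts.values()) / psu_rated_watts


gpu_max_watts = 300.0  # post's guess for a GTX295 at full load
cuda_fraction = 0.75   # post's guess: CUDA draws about 3/4 of the maximum

load = {
    "GTX295 under CUDA": gpu_max_watts * cuda_fraction,  # 225 W
    "rest of system, overclocked": 280.0,                # upper end of the post's guess
}

utilization = psu_utilization(load, psu_rated_watts=700.0)
print(f"Estimated load: {sum(load.values()):.0f} W, {utilization:.0%} of a 700 W PSU")
```

This prints an estimated load of 505 W, or 72% of the 700 W rating, the same numbers as in the post. A plug-in power meter would replace the guesses with a real reading.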

ID: 943235 · Report as offensive
FalconFly
Joined: 5 Oct 99
Posts: 394
Credit: 18,053,892
RAC: 0
Germany
Message 946185 - Posted: 9 Nov 2009, 19:29:56 UTC
Last modified: 9 Nov 2009, 19:35:02 UTC

Small add-on :

Seems some stock GPU fans don't like 24/7 ops as well.
The fan on my Club3D HD4890 "Superclocked Edition" now begins to make rattling noises in the speed regime it is now usually in (~50% rpm).

That's a mere 2 weeks into GPU crunching with the otherwise fairly new card (<3 months), due to the rather frequent changes in fan speed (it has a rather sensitive fan control logic that quickly alternates speed levels depending on GPU load).

...another example of how 24/7 ops can easily hurt the video card, in my case caused by a ~$1.50 stock cooler part - the fan, which obviously isn't up to the job due to cheap design.

I'll have to look into an alternative GPU cooler, otherwise I see that Video Card going dead in less than 14 days, caused by the weakest part in the chain.
ID: 946185 · Report as offensive
gizbar
Joined: 7 Jan 01
Posts: 586
Credit: 21,087,774
RAC: 0
United Kingdom
Message 946188 - Posted: 9 Nov 2009, 19:48:37 UTC - in response to Message 943166.  
Last modified: 9 Nov 2009, 19:49:55 UTC


Hi Sutaru, I know your post is quite an old one, but here in the UK, EVGA are also offering a 10 year warranty on their GTX260's. I actually ordered one because of the warranty, and because it was clocked a lot higher than a standard GTX260, but the online supplier took too many orders for it and it went out of stock. So he let me change to the Gigabyte super-overclock version, which was clocked even higher than the EVGA, but only comes with a 3 year warranty. I'd check direct with EVGA if I were you, and the place you purchased them from. I hope you haven't purchased a 2 year warranty extension for nothing...

regards, Gizbar.


A proud GPU User Server Donor!
ID: 946188 · Report as offensive
dnolan
Joined: 30 Aug 01
Posts: 1228
Credit: 47,779,411
RAC: 32
United States
Message 946191 - Posted: 9 Nov 2009, 20:10:28 UTC - in response to Message 946185.  
Last modified: 9 Nov 2009, 20:12:09 UTC

Small add-on :

Seems some stock GPU fans don't like 24/7 ops as well.
The fan on my Club3D HD4890 "Superclocked Edition" now begins to make rattling noises in the speed regime it is now usually in (~50% rpm).

...

I'll have to look into an alternative GPU cooler, otherwise I see that Video Card going dead in less than 14 days, caused by the weakest part in the chain.


I put an Accelero S1 on one of my HD 4850s (I also added the optional fans), and it went from 87-89 [edit] with fan locked at 80% [/edit] to about 43 [edit] fan no longer controlled [/edit] running at 97% load under Collatz and Milkyway. I'd recommend that unit highly. One nice thing about it is that you can run it without any fan and it still does a great job of cooling, and if you do have a fan and it dies, you can quite easily replace it.

-Dave
ID: 946191 · Report as offensive
Crun-chi
Volunteer tester
Joined: 3 Apr 99
Posts: 174
Credit: 3,037,232
RAC: 0
Croatia
Message 946203 - Posted: 9 Nov 2009, 21:09:45 UTC
Last modified: 9 Nov 2009, 21:26:16 UTC

Can someone tell me: how would the card manufacturer know that you use your GPU for CUDA, and not for gaming? :))))
I've got a 9800, and its temperature under CUDA is 58°C.

Also, 24/7 operation is stressful for a computer's parts, and I agree with that. And yes, electrolytic capacitors are more or less the weakest parts of any computer, but I doubt they will last only a few months under 24/7 load.
Remove one side of the case to give a natural flow of air: all components will be cooler. The GPU fan runs every time you turn on your computer, so it works whether you use CUDA or not :)
Happy CUDA crunching

And one more thing: multicore processors are now obsolete when we have CUDA cards.
My quad core needs 8,657.67 seconds (per core) and claims 105.29 credits. Meanwhile, the 9800 GT needs 1,929.20 seconds and claims 149.53 credits.

So the simplest thing is to build a new computer with the lowest-priced components and a nice card like the 9800 green edition or the new GT220.
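
To compare the two on equal terms, the figures above can be turned into credits per hour. This is only a sketch: it uses the single pair of work units quoted in the post, the function name is made up for illustration, and multiplying by four assumes all four cores crunch similar work units in parallel. Claimed credit for different work units is not strictly comparable, so treat the result as indicative.

```python
def credits_per_hour(credits: float, seconds: float) -> float:
    """Claimed credits earned per hour of crunching."""
    return credits / seconds * 3600.0


cpu_core = credits_per_hour(105.29, 8657.67)  # one core of the quad
gpu = credits_per_hour(149.53, 1929.20)       # 9800 GT under CUDA
cpu_quad = 4 * cpu_core                       # assumes four equally busy cores

print(f"One CPU core : {cpu_core:6.1f} credits/hour")
print(f"Whole quad   : {cpu_quad:6.1f} credits/hour")
print(f"9800 GT      : {gpu:6.1f} credits/hour")
print(f"GPU vs. quad : {gpu / cpu_quad:.2f}x")
```

On these numbers the GPU comes out at roughly 279 credits/hour against roughly 44 per CPU core, so it outpaces the whole quad core, though by well under a factor of two.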
I am cruncher :)
I LOVE SETI BOINC :)
ID: 946203 · Report as offensive
Odan
Joined: 8 May 03
Posts: 91
Credit: 15,331,177
RAC: 0
United Kingdom
Message 946213 - Posted: 9 Nov 2009, 22:06:11 UTC - in response to Message 946203.  
Last modified: 9 Nov 2009, 22:06:56 UTC

Can someone tell me: how would the card manufacturer know that you use your GPU for CUDA, and not for gaming? :))))
I've got a 9800, and its temperature under CUDA is 58°C.


The issue really (especially if you don't tell them!) is not how they know what you have been running but will they agree there is a fault.

If you are gaming & you get a few pixel errors, no biggie: a single error using CUDA for SETI crunching can cause a computation error or maybe prevent your unit validating.

If the manufacturer tests the card for a typical gaming use, they will find "no fault" with it if there are just some pixel errors or a few artifacts. There may be enough wrong with it to never give you a valid or complete unit to report.
ID: 946213 · Report as offensive


 