Message boards : Number crunching : Noobie CUDA GPU temperature worry
Author | Message |
---|---|
Tony Habergham Joined: 17 May 99 Posts: 10 Credit: 9,453,098 RAC: 7 |
Please excuse me if this is going over old ground, but a half-hour search didn't turn up anything relevant. After passively BOINCing for years, I checked the SETI forums a few weeks ago and learned about CUDA. Persuading myself that my eighteen-month-old ATI card was on its last legs, I splashed out on eBay and yesterday acquired an 8800 GTS OC with 320 MB. I installed it last night and it seems OK: it processes WUs in 12-40 minutes, compared to 8-10 hours on the CPU. However, GPU-Z is reporting temperatures of 88-93C, which worries me a lot; the old ATI card peaked at 60C in games, and my CPU claims to be a tepid 37C these days. I've browsed the Nvidia forum and can't get a clear answer: opinions vary from 'anything over 70C is slowly cooking your memory' to 'the GPUs are fine up to 120C'. I'm thinking of using an app (any recommendations?) to throttle back the card; I understand the 'OC' means it's already factory overclocked. On the other hand, it's not actually giving any problems, even with 'utilize GPU when computer is active' enabled. Any considered advice on the long-term impact of this temperature? TonyH |
Fred W Joined: 13 Jun 99 Posts: 2524 Credit: 11,954,210 RAC: 0 |
Probably not too worrying in itself, but I assume all that heat is being dumped into the case, which can also affect other components. If you do want to play around with the settings, the easiest tool I have found is EVGA Precision (available here), which seems to work fine with any make of graphics card. I've no experience with the 8800, but you may find that the automatic fan speed does not kick in until the temps get quite high; setting the fan speed manually to 100% with Precision will bring the temp down to more comfortable levels. Alternatively, the tool will let you underclock the card. F. [edit]And - welcome to the Boards[/edit] |
FalconFly Joined: 5 Oct 99 Posts: 394 Credit: 18,053,892 RAC: 0 |
Just be aware that conventional video cards are neither designed nor specced for 24/7 data crunching and the ensuing thermal stress. For a few irregular hours of gaming each day, such temperatures may be okay and will result in a normal life span for the device. But under permanent load, critical parts like VRM modules, capacitors and, to some extent, the VRAM will have a reduced life span. Once these fail, a still-intact GPU (which can tolerate such temperatures for long periods of time) becomes worthless. The aforementioned components are the weakest links in the chain. For anything like 24/7 GPU crunching, one should have clearly above-average case cooling, specifically cooling that directly benefits the video card. Stuffed into generic, run-of-the-mill cases or given inadequate cooling for the job, these modern cards will run for a variable while, then die. |
Nick Joined: 16 Oct 09 Posts: 81 Credit: 112,909 RAC: 0 |
88-93C seems on the hot side. What does GPU-Z say the fan speed is? That may be a clue for you. My GTS 250 has now been running for 18 hrs non-stop and it's at 67-68C with a fan speed of 49-50%. My ATX tower has one hard drive and one CD drive. I had to remove the cover of the empty second CD bay to get more air into the box, because it was starving for cool air. This lowered the temp by 5C. See if you can get more fresh air into the box. |
ML1 Joined: 25 Nov 01 Posts: 20265 Credit: 7,508,002 RAC: 20 |
"Just be aware that conventional video cards are neither designed nor specced for 24/7 data crunching and the ensuing thermal stress." OK... So... What is 'different' for a graphics card after running at 100% for 24 hours versus running at 100% for 1 hour? If cooling isn't adequate, then it isn't adequate. It makes no difference whether the cooling is inadequate for 1 hour or for 24 hours; it's still inadequate just the same! The real test is whether the temperatures seen are considered 'reasonable' or not. Happy fast crunchin', Martin [edit] Aside: I've been running CUDA on a passively cooled GPU for months now, 24/7, without any issues at all. The GPU reports 60 - 68 deg C depending on the WU. [/edit] See new freedom: Mageia Linux. Take a look for yourself: Linux Format. The Future is what We all make IT (GPLv3) |
Tony Habergham Joined: 17 May 99 Posts: 10 Credit: 9,453,098 RAC: 7 |
Thanks to all who've responded; several nice pointers to go on. I do think the temps were too high, having had one 'Computation error' today, which I've never seen before. On the other hand, it's the fans which seem to fail on graphics cards these days, rather than the electronics directly. It was having blue screens on high GPU load which started me paying attention to GPU temp and running GPU-Z. (I was running passively cooled, without knowing it, for months.) I don't have my machine on 24/7 since they invented global warming (and it seems kind of silly when I have a 60% dedication to climateprediction.net), just about 6 hours straight in the evenings and 14 hours at weekends, so I can pop back for a quick surf as the mood takes me. EVGA Precision is very nice (there's a more recent release at the link given). Throttling back to the minimum 415 MHz core clock seems to give temps in the mid/high 80s, which seems a more comfortable place to me. Does anyone have any advice on tweaking the settings for SETI? The shader clock is independently settable, and I got the impression that the shaders are the beasties doing the work for CUDA/BOINC. And the memory clock: the 800 MHz DDR3 seems to be set to run at 792 as standard. I've heard faster memory is better for SETI, and it wouldn't generate heat, I'd think. I'm open to input. Better case cooling is an obvious step, but I'm sat in front of the thing for a fair proportion of the time while it's crunching (either surfing or running 15-year-old games with DOSBox), so I don't want too much noise. In any event, there's not much room left without a major restructure: the dual-width graphics card (which vents outside the case) leaves no room for the modified (quieter-fan) case cooler I also had in before the graphics upgrade.
I was quite pleased with the main setup I'd just got to: I picked up a massive dual-heatpipe CPU cooler cheaply, which was capable of keeping my Athlon 64 below 55C at 100% load, even with the CPU fan off completely. It produced CPU temps in the high 30s at minimum fan speed, with a nice airflow over the RAM sinks and chipset sink as well. Anyway, thanks to all. TonyH |
kittyman Joined: 9 Jul 00 Posts: 51468 Credit: 1,018,363,574 RAC: 1,004 |
I asked the question directly to an EVGA tech... He simply responded: they just run that hot. Flat response... I guess that's why most of the top-end cards at EVGA have a lifetime warranty if you register them. They are gonna cook sooner or later. No electronics can run that hot forever... or can they? LOL. I should know... My last known surviving Core 2 Duo is still running strong at 59C, and has been for years now. It's the last dual core I'm running, and I do so mostly for its keepsake value. I have considered taking it offline in favor of a quad, but just can't quite bring myself to do it. "Freedom is just Chaos, with better lighting." Alan Dean Foster |
FalconFly Joined: 5 Oct 99 Posts: 394 Credit: 18,053,892 RAC: 0 |
"OK... So..." Not at all, unfortunately. What 24/7 ops does to the hardware is mostly invisible to the typical owner who doesn't know or honor the hardware's limitations: the decreased lifespan of its components. Capacitors especially have a fairly well-defined lifespan, which depends almost exclusively on running hours and is very sensitive to operating temperature. As these cards are designed to operate a limited amount of time per day, their expected lifespan will normally be several years, even under merely average conditions. Put under 24/7 load, however, this lifespan is quickly reduced to a fraction of the intended period, and further decreased by high operating temperatures. You'll see the same with hard drives: only a few normal desktop drives carry a 24/7 specification. For server drives, this specification is standard and the drives are built for that purpose. 24/7 vs. "normal use" may sound like mere marketing, but it reflects important design differences based on the different specifications. You'll find this forum has plenty of reports of dead video cards. GPU failure is seldom the cause; literally burned VRMs or blown capacitors usually are. Therefore, my advice: there's far more to running a video card 24/7 than just keeping GPU temps in mind; VRMs and especially capacitors need significantly increased cooling to preserve their lifespan (that is why many modern cards have dedicated, additional temperature sensors, e.g. for VRMs and VRAM: their temps are just as critical as GPU temps, so the fan control logic has to be able to react to them as well). You can choose to ignore these requirements and may get lucky. Otherwise you will destroy your $xxx video card for lack of, say, $10 of additional cooling, or because of a plain inadequate case. |
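FalconFly's point that capacitor life depends on running hours and temperature can be illustrated with a rough sketch. It assumes the commonly quoted rule of thumb for aluminium electrolytic capacitors, that life roughly doubles for every 10C the capacitor runs below its rated temperature. The 2,000 h at 105C rating and the temperatures used are hypothetical, and the temperature in question is the capacitor's own, not the GPU reading from GPU-Z.

```python
"""Rough electrolytic-capacitor life estimate: a sketch, not a datasheet model.

Assumes the commonly quoted "10-degree rule" (life roughly doubles for every
10 C the capacitor runs below its rated temperature). The 2,000 h @ 105 C
rating below is a hypothetical example, not a figure for any real card.
"""


def life_at_temp(rated_hours: float, rated_temp_c: float, temp_c: float) -> float:
    """Scale a rated life (hours) to another capacitor operating temperature."""
    return rated_hours * 2 ** ((rated_temp_c - temp_c) / 10.0)


def calendar_years(life_hours: float, hours_per_day: float) -> float:
    """Convert powered-on hours into calendar years at a given daily duty."""
    return life_hours / (hours_per_day * 365.0)


if __name__ == "__main__":
    RATED_HOURS = 2000.0  # hypothetical datasheet rating
    RATED_TEMP_C = 105.0  # hypothetical rated temperature
    for cap_temp in (65.0, 85.0):  # hypothetical capacitor temperatures
        hours = life_at_temp(RATED_HOURS, RATED_TEMP_C, cap_temp)
        for duty in (3.0, 24.0):  # a few hours of gaming vs. 24/7 crunching
            years = calendar_years(hours, duty)
            print(f"{cap_temp:.0f} C, {duty:4.0f} h/day: {hours:7.0f} h = {years:5.1f} years")
```

With these made-up numbers, going from 3 h/day to 24/7 cuts calendar life by a factor of 8, and every extra 10C halves it again.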
John G Joined: 29 Dec 01 Posts: 68 Credit: 10,932,850 RAC: 0 |
Falcon is right!!! I had an 8800 GTX from BFG and fried it in less than 6 weeks!!! Then I bought an MSI GTX 260, which ran for 6 months on my quad core without any problems. I decided to up the ante a bit and spent $500 on a super cooler case: all aluminum, ships with 4 fans but has room for 8. MSI motherboard, i7 CPU with a GTX 295, 6GB DDR3. My video card temps are about 76C, but the card is good for up to 118C. The CPU is running at 58C. I run everything full out 24/7. It's worth the extra bucks to invest in something worthwhile. Besides, I will never buy another BFG card ever. The card was under warranty with full registration, and all they did was sideline me for weeks and weeks!!!! I never got a return number from them!!!! |
Geek@Play Joined: 31 Jul 01 Posts: 2467 Credit: 86,146,931 RAC: 0 |
I was the former owner of a BFG GPU. When it failed they would not even talk about repair or replacement, especially when I mentioned that it was running CUDA scientific applications. On the other hand, EVGA has RMA'd one GPU for me without question, and I also mentioned the CUDA apps to them. UPS has it on the way home right now. If you're going to crunch with a GPU, EVGA is the one to own. Just don't forget to register your GPU with them and you're good to go. Boinc....Boinc....Boinc....Boinc.... |
Lint trap Joined: 30 May 03 Posts: 871 Credit: 28,092,319 RAC: 0 |
"If you're going to crunch with a GPU, EVGA is the one to own. Just don't forget to register your GPU with them and you're good to go." Not sure if this applies to other/all video card makers, but EVGA's customer support told me they will not honor the warranty on anything purchased through eBay. I was inquiring about getting a replacement faceplate for a dual-slot card. But so far, no problems with either of the two 8800 GTSs I purchased on eBay. One is running CUDA 2.3 24/7 and the other, in my daughter's machine, is running WoW almost 24/7..:) Both at stock clock settings & 65% fan speed. Martin [edited] warranty statement |
Sutaru Tsureku Joined: 6 Apr 07 Posts: 7105 Credit: 147,663,825 RAC: 5 |
You make me worry.. ;-) Maybe Fred W can jump into this thread too. He got his damaged GTX295 (2 PCB) back from the manufacturer (IIRC, XFX) with the verdict 'not damaged'. But Fred knows from a RAM test that the RAM is damaged. It's not good to know that some manufacturers will not repair or replace damaged GPUs that can no longer compute CUDA correctly. Will they only repair or replace them if games show ~10% errors (pixel or screen failures or similar)?? I have 4 manufacturer-OCed GTX260-216s from EVGA. This 10-year warranty is AFAIK only for US (maybe also Canadian) people. We Europeans get only 2 years from EVGA (perhaps because of European legislation). But I bought a 2-year extension from my seller, so I have a 4-year warranty. But how would it help if my seller says it's not damaged.. Would I then have the problem of sending them directly to the USA? I'm also thinking about maybe buying an OCed Gigabyte GTX260-216. How would they respond if that GPU got damaged? My 4 manufacturer-OCed GTX260-216s run 24/7, fan @ AUTO, ~55% RPM @ ~78-80 °C. In summer they reached max. 84 °C. |
Fred W Joined: 13 Jun 99 Posts: 2524 Credit: 11,954,210 RAC: 0 |
Sutaru is right. My XFX GTX295 was classified as NFF (no fault found) by Scan, but they did return it to XFX when I insisted that I had experienced faults both with S@H crunching and with the MemTestG GPU memory test utility. XFX has now returned it, having also found no fault. I am currently underclocking the beast to 435 MHz from 576 MHz and am still seeing errors on S@H. Tomorrow I plan to remove the CPU overclock and disconnect all unnecessary bits (e.g. the DVD drive) to minimise power draw, just to cover all the bases, though I don't expect the Seasonic 700W PSU to have degraded and be the cause of my problems. F. |
1mp0£173 Joined: 3 Apr 99 Posts: 8423 Credit: 356,897 RAC: 0 |
"What is 'different' for a graphics card after running at 100% for 24 hours over running at 100% for 1 hour?" If the card can survive for 5,000 hours at peak temperature, and you run it 24 hours per day instead of 1 hour per day, you'll get there 24 times faster. 5,000 hours of use at one hour per day is just shy of 14 years; 5,000 hours of use at 24 hours/day is just shy of 7 months. |
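The duty-cycle arithmetic above can be checked in a few lines. The 5,000-hour figure is the post's own hypothetical, and a month is taken here as 365.25/12 days.

```python
"""Check the duty-cycle arithmetic: a fixed budget of hours at peak temperature
is used up 24 times faster at 24 h/day than at 1 h/day."""

HOURS_BUDGET = 5000.0  # the post's hypothetical survival figure
DAYS_PER_YEAR = 365.25
DAYS_PER_MONTH = DAYS_PER_YEAR / 12


def days_until_used_up(hours_budget: float, hours_per_day: float) -> float:
    """Calendar days needed to accumulate hours_budget powered-on hours."""
    return hours_budget / hours_per_day


if __name__ == "__main__":
    light = days_until_used_up(HOURS_BUDGET, 1.0)
    heavy = days_until_used_up(HOURS_BUDGET, 24.0)
    print(f"1 h/day:  {light / DAYS_PER_YEAR:.1f} years")
    print(f"24 h/day: {heavy / DAYS_PER_MONTH:.1f} months")
    print(f"ratio:    {light / heavy:.0f}x")
```

This prints about 13.7 years and 6.8 months, matching "just shy of 14 years" and "just shy of 7 months".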
Sutaru Tsureku Joined: 6 Apr 07 Posts: 7105 Credit: 147,663,825 RAC: 5 |
@ Fred W: Thanks for jumping in.. :-) I wish you all the best and hope your PC will soon run again at 100% full load, 24/7. :-) A 700 W PSU? Hmm.. from what I've seen, my experience and some calculation, I guess each GPU draws about 3/4 of its max. power consumption when doing 'only' CUDA calculation. If a GTX295 (2 PCB) draws ~300 W, then it's ~225 W for CUDA calculation (maybe). Do you have a power meter to see the live wattage of your whole PC? IIRC, your Q9450 is heavily OCed. I'd guess ~200 W at stock, and now ~250 or 280 W OCed (with a higher Vcore). So we now have ~505 W. AFAIK, IIRC, for a long PSU life you should stay at 'only' ~50% utilization. You may now be at ~72%. I don't know whether the ~50% guideline is correct: I burned out two PSUs after one year each, and they were at only ~40% utilization. One was a cheap 'no name' PSU and the other a 'be quiet!' (in an expensive frame) PSU. Both were replaced under warranty, and I was lucky that the PSUs didn't damage any other equipment. |
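The PSU-load estimate in the post can be reproduced as a small calculation. All inputs are Sutaru's rough guesses rather than measurements; the 280 W figure is read here as the non-GPU part of the overclocked Q9450 system, since that is how it is added to the GPU's 225 W to reach ~505 W.

```python
"""Reproduce the PSU-load estimate from the post.

All inputs are the poster's rough guesses, not measurements.
"""

GPU_MAX_W = 300.0      # guessed max draw of a GTX295 (2 PCB)
CUDA_FRACTION = 0.75   # guess: CUDA-only load draws ~3/4 of max
OTHER_W = 280.0        # guessed draw of the rest of the OCed Q9450 system
PSU_RATED_W = 700.0    # Seasonic 700 W rating


def psu_load(gpu_max_w: float, cuda_fraction: float,
             other_w: float, psu_rated_w: float) -> tuple[float, float]:
    """Return (estimated load in watts, fraction of the PSU rating)."""
    load_w = gpu_max_w * cuda_fraction + other_w
    return load_w, load_w / psu_rated_w


if __name__ == "__main__":
    watts, fraction = psu_load(GPU_MAX_W, CUDA_FRACTION, OTHER_W, PSU_RATED_W)
    print(f"~{watts:.0f} W, ~{fraction:.0%} of the PSU rating")
```

This prints "~505 W, ~72% of the PSU rating". If you check with a wall-socket power meter instead, remember that it reads AC input, which includes the PSU's own conversion losses, so the load on the PSU's outputs is somewhat lower than the meter shows.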
FalconFly Joined: 5 Oct 99 Posts: 394 Credit: 18,053,892 RAC: 0 |
Small add-on: it seems some stock GPU fans don't like 24/7 ops either. The fan on my Club3D HD4890 "Superclocked Edition" has now begun making rattling noises in the speed range it usually runs in (~50% rpm). That's a mere 2 weeks into GPU crunching with the otherwise fairly new card (<3 months), due to the rather frequent speed changes to the fan (it has a rather sensitive fan control logic that quickly alternates speed levels depending on GPU load). ...another example of how 24/7 ops can easily hurt a video card, in my case through a ~$1.50 stock cooler part: the fan, which obviously isn't up to the job due to its cheap design. I'll have to look into an alternative GPU cooler; otherwise I see that video card going dead in less than 14 days, caused by the weakest part in the chain. |
gizbar Joined: 7 Jan 01 Posts: 586 Credit: 21,087,774 RAC: 0 |
Hi Sutaru, I know your post is quite an old one, but here in the UK, EVGA is also offering a 10-year warranty on their GTX260s. I actually ordered one because of the warranty, and because it was clocked a lot higher than a standard GTX260, but the online supplier took too many orders for it and it went out of stock. So he let me switch to the Gigabyte Super Overclock version, which is clocked even higher than the EVGA but only comes with a 3-year warranty. I'd check directly with EVGA if I were you, and with the place you purchased them from. I hope you haven't purchased a 2-year warranty extension for nothing... regards, Gizbar. A proud GPU User Server Donor! |
dnolan Joined: 30 Aug 01 Posts: 1228 Credit: 47,779,411 RAC: 32 |
"Small add-on:" I put an Accelero S1 on one of my HD 4850s (I also added the optional fans), and it went from 87-89C [edit] with fan locked at 80% [/edit] to about 43C [edit] fan no longer controlled [/edit] running at 97% load under Collatz and Milkyway. I'd recommend that unit highly. One nice thing about it is that you can run it without any fan and it still does a great job of cooling, and if you do have a fan and it dies, you can quite easily replace it. -Dave |
Crun-chi Joined: 3 Apr 99 Posts: 174 Credit: 3,037,232 RAC: 0 |
Can any of you tell me how a card manufacturer could know that you used your GPU for CUDA and not for gaming? :)))) I have a 9800, and its temperature under CUDA is 58°C. Running 24/7 is also stress for computer parts, and I agree with that. And yes, electrolytic capacitors are more or less the weakest parts of any computer, but I doubt they will last only a few months under 24/7. Remove one side panel of the case and let air flow naturally: all components will be cooler. The GPU fan works every time you turn on your computer, so it runs whether you use CUDA or not :) Happy CUDA crunching. And one more thing: multicore processors are now obsolete when we have CUDA cards. My quad core needs (per core) 8,657.67 seconds and claimed 105.29 credits. Meanwhile, the 9800 GT needs 1,929.20 seconds and claims 149.53 credits. So the simplest approach is to build a new computer with the lowest-priced components and a nice card like the 9800 Green Edition or the new GT 220. I am cruncher :) I LOVE SETI BOINC :) |
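The credit comparison can be put on an hourly basis with a quick sketch. It assumes the quad core runs four tasks at once, each taking the quoted per-core time, and it uses the credits each task claimed; granted credit may differ.

```python
"""Compare credit throughput using the figures quoted in the post.

Assumes the quad core runs four tasks at once, each taking the quoted per-core
time, and uses the credits each task *claimed* (granted credit may differ).
"""

SECONDS_PER_HOUR = 3600.0


def credits_per_hour(claimed_credits: float, seconds: float) -> float:
    """Claimed credits earned per hour of run time for one task slot."""
    return claimed_credits / seconds * SECONDS_PER_HOUR


if __name__ == "__main__":
    cpu_core = credits_per_hour(105.29, 8657.67)  # one core of the quad
    cpu_quad = 4 * cpu_core                       # all four cores busy
    gpu = credits_per_hour(149.53, 1929.20)       # the 9800 GT
    print(f"one CPU core: {cpu_core:6.1f} credits/h")
    print(f"whole quad:   {cpu_quad:6.1f} credits/h")
    print(f"9800 GT:      {gpu:6.1f} credits/h")
    print(f"GPU vs quad:  {gpu / cpu_quad:.1f}x")
```

On these figures one core earns about 44 credits/h, the whole quad about 175, and the 9800 GT about 279, so the single card out-produces all four cores by roughly 1.6x. The CPU and GPU can also crunch at the same time, so the two figures add rather than compete.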
Odan Joined: 8 May 03 Posts: 91 Credit: 15,331,177 RAC: 0 |
"Can any of you tell me how a card manufacturer could know that you used your GPU for CUDA and not for gaming? :))))" The real issue (especially if you don't tell them!) is not how they know what you have been running, but whether they will agree there is a fault. If you are gaming and get a few pixel errors, no biggie; but a single error while using CUDA for SETI crunching can cause a computation error or maybe prevent your unit from validating. If the manufacturer tests the card for typical gaming use, they will find "no fault" with it if there are just some pixel errors or a few artifacts, yet there may be enough wrong with it that it never gives you a valid or complete unit to report. |
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.