CPU loading and thermal stress

Author	Message
NexusNet Joined: 20 Sep 05 Posts: 9 Credit: 149,033 RAC: 0	Message 171450 - Posted: 24 Sep 2005, 21:32:27 UTC Does anyone have information on or experience with the impact to CPU life of long-term running at 100% load? I have noticed an increase in cabinet ambient temperature since enabling SETI processing on my servers - perhaps 5°F to 8°F, although I am not equipped to factor out room temperature variances to validate those numbers. Note that the machines are Dell 6350s with ample air flow and heat sinks, so as boxes go these are intended to crunch. My question is driven by curiosity rather than immediate concern. Tx, Robert ID: 171450 ·

Dorsai Joined: 7 Sep 04 Posts: 474 Credit: 4,504,838 RAC: 0	Message 171456 - Posted: 24 Sep 2005, 21:45:15 UTC I have always assumed, without any proof, that the "life of a CPU" is measured by its performance. It will become "tediously slow, and not worth using" and get retired well before it "retires itself" due to failure. No doubt CPUs do fail, but I suspect they get replaced due to old age far more often than they get replaced due to being worn out. This does not include those that fail due to consequential damage. IMHO.... Foamy is "Lord and Master". (Oh, + some Classic WUs too.) ID: 171456 ·

The Simonator Joined: 18 Nov 04 Posts: 5700 Credit: 3,855,702 RAC: 50	Message 171462 - Posted: 24 Sep 2005, 21:59:21 UTC Last modified: 24 Sep 2005, 21:59:58 UTC The only CPU I've ever had fail was one I booted up without a heatsink (bad idea); otherwise Dorsai is right. If a CPU is designed to run at a maximum speed, IMHO the only way it would suffer damage (provided it is correctly installed) would be if it was run at over that speed. Ramble over. Life on earth is the global equivalent of not storing things in the fridge. ID: 171462 ·

John McLeod VII Volunteer developer Volunteer tester Joined: 15 Jul 99 Posts: 24806 Credit: 790,712 RAC: 0	Message 171481 - Posted: 24 Sep 2005, 23:30:59 UTC I have had a few CPUs fail - all at the same time because of a lightning strike. Prior to this, the company was too cheap to supply surge protectors. Basically, you have to kill it. It is (or at least used to be) possible to kill a CPU by overheating, but this typically happens very quickly (on the order of seconds to minutes) when a CPU cooler fails. BOINC WIKI ID: 171481 ·

Mike Gelvin Joined: 23 May 00 Posts: 92 Credit: 9,298,464 RAC: 0	Message 171583 - Posted: 25 Sep 2005, 4:28:33 UTC - in response to Message 171450. [quote]Does anyone have information on or experience with the impact to CPU life of long-term running at 100% load?[/quote] I saw a study once (don't recall where) where CPU life was shortened by thermal cycling. They implied that running them up and down in temperature was more harmful than turning them on and off (electrical shock). One place I just found that discusses this (not an official study) is: http://www.overclockers.com/tips30/ ID: 171583 ·

Hans Dorn Volunteer developer Volunteer tester Joined: 3 Apr 99 Posts: 2262 Credit: 26,448,570 RAC: 0	Message 171673 - Posted: 25 Sep 2005, 14:19:33 UTC - in response to Message 171450. Last modified: 25 Sep 2005, 14:26:41 UTC [quote]Does anyone have information on or experience with the impact to CPU life of long-term running at 100% load?[/quote] Robert, some years ago I had a Celeron-366 OC'ed to 550 MHz that failed. It was running at 100% load all the time and was getting quite hot. If you don't overclock, your CPUs should last ages. Regards, Hans ID: 171673 ·

Paul D. Buck Volunteer tester Joined: 19 Jul 00 Posts: 3898 Credit: 1,158,042 RAC: 0	Message 171709 - Posted: 25 Sep 2005, 15:30:02 UTC I personally have never had a computer's CPU die on me that I can recall after I got the system up and running. Some of my systems ran pretty much continually for me for well into a decade ... When I was on active duty we vastly preferred commercial chips over the mil-spec ones because they worked longer. It turns out that the mil-spec qualification tests were introducing latent defects. Later they started to go to a process qualification to ensure quality. Bottom line, the power cycling is worse for a system than letting it run ... ID: 171709 ·

barbarossa Joined: 4 Sep 99 Posts: 1294 Credit: 6,629,998 RAC: 3	Message 171811 - Posted: 25 Sep 2005, 19:58:21 UTC My hottest stove is an AMD XP 1900+. It sometimes reaches 70°C in summer. At the moment it is at 58°C (it's night over here). It's been running Seti (Classic and later BOINC) 24/7 since August 2003. No failures. :-)= Greybeard All about BOINC: BOINC-Wiki (by Paul D. Buck) ID: 171811 ·

SunMicrosystemsLLG Joined: 4 Jul 05 Posts: 102 Credit: 1,360,617 RAC: 0	Message 171813 - Posted: 25 Sep 2005, 20:10:30 UTC Well, the whole point of us subscribing to S@H is to perform reliability testing. Although we normally only have the nodes running Seti for a week at a time, the boxes are in near constant use on a variety of jobs. This does mean we are more likely to see early life failures rather than long term 'wear and tear', but we have yet to see any problems at all caused by heat and constant 100% CPU loading. The idle temps of the CPUs are normally ~40°C and during crunching they vary between 48-56°C, depending on the server's location in the rack. That is with passive heatsinks and system fans. Long term it might have a slight effect, but probably no more than powering your machine on and off (heat and power cycling) rather than leaving it on to crunch. ID: 171813 ·

daav Joined: 21 May 99 Posts: 39 Credit: 177,065 RAC: 0	Message 171837 - Posted: 25 Sep 2005, 21:29:26 UTC - in response to Message 171450. Last modified: 25 Sep 2005, 21:32:34 UTC [quote]Does anyone have information on or experience with the impact to CPU life of long-term running at 100% load? Tx, Robert[/quote] Hi NexusWest, No one can Guarantee a CPU's life (for any particular reason), but as a 'Systems Builder' for many years, I cannot recall ever Having or Hearing of any (that I built) ever 'Fail' because of Running at 100% for many Months or Years, provided regular Cleaning of all HSFs was in place. Heatsinks and CPU & Case Fans require regular Cleaning w/Compressed air to keep the HS Fins, Fan Blades and Fan Grill Holes free from dust, providing Max cooling to the Same! Most 'Failures' I see, as customers bring in their Retail Dell, HP and other Boxes, are due to 'Impacted Dirt' in Heatsink Fins on CPUs, Chipsets and Video Cards, and Broken Cooling Fans of all Sizes! Thus related 'High Thermal Issues' are the Biggest Direct Problems for the Life expectancy of CPUs and other 'Inside Case' Components! As stated below, a 'Sure' way of Committing 'CPU Suicide' (on Any AMD CPU) is/was to leave the HSF off the CPU and hit the Start Button. About 2-3 seconds used to be all you needed to make a 'Crispy Critter' out of it! But as Technology gets better, so do the MainBoards: if you don't plug in the HSF on the Newer M/Bs, it won't even Crank! So Yes, they'll run for 'Many Years' at 100% w/the proper Maintenance! =) -daav- ";^) ID: 171837 ·

efa Joined: 26 Mar 00 Posts: 233 Credit: 494,221 RAC: 0	Message 171868 - Posted: 25 Sep 2005, 23:07:48 UTC Last modified: 25 Sep 2005, 23:37:25 UTC I changed my MoBo, CPU and DRAM this May (2005) because the old system was burned. It ran Seti at 100% 24/7 for years (it was a Celeron 666MHz). From January on, any OS could boot and Seti ran normally, but other applications froze the machine during CPU-intensive crunching (the mouse pointer stayed frozen and ctrl-alt-del did nothing). These apps were 3D games, video re-coding (miniDV to MPEG2), and sometimes browsing a Flash site with Mozilla. I hesitated from January to May to change the board because Seti worked perfectly. But after reinstalling one of the OSes and getting the freezes again, I was sure the CPU was burned. My system had worked perfectly for years, super cooled and silenced. The fact that Seti also works well with the CPU semi-burned does not let me say for sure that Seti burned the CPU, but surely something happened. Seti uses the FPU part of the CPU, as it does a lot of FFT operations. Video recoding mostly uses the MMX and SSE instructions of the CPU, as it manipulates multimedia data (integer matrix data). So probably the broken part of my CPU was the SSE part. Anyway, now I have a CeleronD@2.66GHz and Seti runs at 100% 24/7 :-)) squish your cpu electrons --- Abit IS7 Bios 24, CeleronD2.66GHz FSBquad133MHz, 2xMatrix 8chip 16M84X-6 singleSided 512MB DDR400 CL2.5@double166MHz DualChannel128bit, PSU Premier DR-B350ATX 350W, ATI AiW8500DV 64MB, HDD Maxtor DiamondMax10 200GB 8MBcache, HDD Maxtor DiamondMaxPlus9 120GB, HDD Quantum FireballPlusAS 30GB, DVDburner LG-4163B, CDburner Plextor 16x, DVDplayer Asus16x, CDplayer Asus52x, Controller EIDE Q-TEC 310D, SoundBlasterLive5.1, 4x temperature-speed 500-2500rpm control fan, 7 fan totally in the case, hand-made case acustic isolation, Nokia middleTower Case, NEC MultiSync FE950, DLink DSL-300 ID: 171868 ·

Lampros Joined: 17 Jun 02 Posts: 279 Credit: 13,973,726 RAC: 0	Message 171878 - Posted: 25 Sep 2005, 23:34:53 UTC I've been running my AMD 1.1GHz on an Asus A7V motherboard 24/7 for about 5 years. I clean all heatsinks & fans monthly. About six months ago my CPU fan finally packed it in. It didn't squeal or seize, but slowed by 500 RPM. It was enough to shoot the CPU temp up to 70 deg C. Either something in the CPU or, most likely, the BIOS shut the computer down. I ran a status monitor program and found the temp spikes. Changed the CPU fan and am back in business. Checked benchmarks and everything is back to normal. As for the freeze up problems, what version of Boinc were you running? As I've seen on these message boards, some previous versions (4.17 ?) had problems. I had similar problems before upgrading. ID: 171878 ·

efa Joined: 26 Mar 00 Posts: 233 Credit: 494,221 RAC: 0	Message 171882 - Posted: 25 Sep 2005, 23:41:49 UTC - in response to Message 171878. [quote]Changed the cpu fan and am back in business. Checked benchmarks and everything is back to normal. As for the freeze up problems, what version of Boinc were you running?[/quote] I also keep MotherboardMonitor always on, checking fan, temp and voltage, and I have changed the fan many times, immediately when it started to slow down to about 500 rpm under its nominal speed. In May the Seti application 4.45 was about the only program that did not freeze the CPU. :-) ID: 171882 ·

Darth Dogbytes™ Volunteer tester Joined: 30 Jul 03 Posts: 7512 Credit: 2,021,148 RAC: 0	Message 171931 - Posted: 26 Sep 2005, 5:07:21 UTC - in response to Message 171837. Last modified: 26 Sep 2005, 5:08:01 UTC [quote][quote]Does anyone have information on or experience with the impact to CPU life of long-term running at 100% load? Tx, Robert[/quote] Hi NexusWest, No one can Guarantee a CPU's life (for any particular reason), but as a 'Systems Builder' for many years, I cannot recall ever Having or Hearing of any (that I built) ever 'Fail' because of Running at 100% for many Months or Years, provided regular Cleaning of all HSFs was in place. Heatsinks and CPU & Case Fans require regular Cleaning w/Compressed air to keep the HS Fins, Fan Blades and Fan Grill Holes free from dust, providing Max cooling to the Same! Most 'Failures' I see, as customers bring in their Retail Dell, HP and other Boxes, are due to 'Impacted Dirt' in Heatsink Fins on CPUs, Chipsets and Video Cards, and Broken Cooling Fans of all Sizes! Thus related 'High Thermal Issues' are the Biggest Direct Problems for the Life expectancy of CPUs and other 'Inside Case' Components! As stated below, a 'Sure' way of Committing 'CPU Suicide' (on Any AMD CPU) is/was to leave the HSF off the CPU and hit the Start Button. About 2-3 seconds used to be all you needed to make a 'Crispy Critter' out of it! But as Technology gets better, so do the MainBoards: if you don't plug in the HSF on the Newer M/Bs, it won't even Crank! So Yes, they'll run for 'Many Years' at 100% w/the proper Maintenance! =) -daav- ";^)[/quote] I wholeheartedly concur. I clean my boxes, fans (including oiling), heat sinks, etc. every 4 months. I also run the fans at max using SpeedFan V4.25. I learned my lesson long ago about housekeeping when my old 600MHz Celeron almost fried. Account frozen... ID: 171931 ·

Peter Narkauskas Joined: 3 Jul 99 Posts: 1 Credit: 1,229,803 RAC: 0	Message 171938 - Posted: 26 Sep 2005, 6:34:53 UTC - in response to Message 171931. [quote]Does anyone have information on or experience with the impact to CPU life of long-term running at 100% load? Tx, Robert[/quote] Hey Robert, I agree with the other comments. I have run my home computer/s 24/7 for the past 12 or so years, the past 6 years doing SETI on the same CPU (yeah, it is a bit old and slow!), and never had a CPU problem. I've had two disks die and initially a couple of fans die on me, but that's it. So go ahead, you should be fine. Cheers, peter birregurra...australia ID: 171938 ·

Legacy Joined: 10 Dec 99 Posts: 134 Credit: 1,778,571 RAC: 0	Message 171947 - Posted: 26 Sep 2005, 8:20:41 UTC As long as you keep the temperature of the CPU within reasonable limits, for example below 70°C, the chances of a CPU failure are extremely low. I have been crunching SETI for almost 6 years; most of my systems are on 24/7, some of them even overclocked. I have NEVER seen a CPU failure. You are more likely to encounter other hardware failures like..... 1. PSU blowing up or PSU going wonky. 2. Mainboard failure or leaking/bloated caps or blown MOSFETs from overheating. Leaking/bloated capacitors are more likely to occur than a CPU failure. And a lot of the time, leaky caps make the system become unstable. And a lot of people take that as a sign of a CPU failure. The capacitors on the mainboard are 10 times more likely to "wear out" than the CPU is. ID: 171947 ·

W-K 666 Volunteer tester Joined: 18 May 99 Posts: 19087 Credit: 40,757,560 RAC: 67	Message 171953 - Posted: 26 Sep 2005, 8:58:51 UTC - in response to Message 171947. Last modified: 26 Sep 2005, 8:59:08 UTC [quote]You are more likely to encounter other hardware failures like..... 1. PSU blowing up or PSU going wonky. 2. Mainboard failure or leaking/bloated caps or blown MOSFETs from overheating.[/quote] I totally agree with this, and a recent magazine article in the UK stated that 30% of PC failures were directly the result of a power supply failure. They also said that quite a few of the remaining failures were due to mobo power components. Of course those of us who have acquired computers with the latest Intel CPUs with 'Enhanced SpeedStep' will be happy, as they underclock and reduce the CPU voltage when heat stressed. I believe it was Sharky Extreme that did a test on CPUs by switching them on without heatsink and fan; the Intel worked, slowly, but it worked, and the AMD fried. Andy ID: 171953 ·

hazmatt87 Joined: 22 Aug 05 Posts: 19 Credit: 380,440 RAC: 0	Message 171959 - Posted: 26 Sep 2005, 9:26:21 UTC Tomshardware did a test a while back with a P3, an early P4 (Willamette, probably) and 2 Athlons (either early A64s or late XPs). They removed the heatsink from the chips while running a timedemo of some game. The poor monitoring in the AMD chipset caused both of the AMDs to literally fry; they measured temps as high as ~700C. The P3 crashed, but the chip survived and booted just fine after the HSF was put back on. The P4 continued to run, although very choppily due to the throttling, so you would still be able to save data. Anyways, a CPU will be far obsolete before it will ever "wear out." Although, in a few isolated reports, some people that had heavily overclocked and overvolted Northwood P4s would suddenly have the CPU die out. It would maybe start one day with some erroneous problems, but then rapidly progress to a completely dead CPU in 24 hours. We never figured out exactly why it happened; we just knew that everyone that had that problem was pushing really high volts through the chip, anywhere from 1.6V to 1.8V, in an effort to get good OCs. Sometimes it would be a few weeks before the chip failed, other times it would go on for a couple of months. The thing is, for every incident of SNDS (sudden Northwood death syndrome) there are 10 where the CPU is still fine after a year of high volts. ID: 171959 ·

Tetsuji Maverick Rai Volunteer tester Joined: 25 Apr 99 Posts: 518 Credit: 90,863 RAC: 0	Message 171961 - Posted: 26 Sep 2005, 9:28:11 UTC - in response to Message 171947. Last modified: 26 Sep 2005, 9:55:26 UTC [quote]Leaking/bloated capacitors are more likely to occur than a CPU failure. And a lot of the time, leaky caps make the system become unstable. And a lot of people take that as a sign of a CPU failure. The capacitors on the mainboard are 10 times more likely to "wear out" than the CPU is.[/quote] I suspect I have got damaged capacitors, but by appearance they look fine... no leakage/bloating found. Since the day before yesterday, I have had a strange problem. I have two P4 boxes (2.8G Prescott and 2.4G Northwood), and at first I thought my Prescott had been killed by heat; when it's turned on, it turns off automatically in a few seconds as if the Prescott had overheated (it's been very hot this summer). So I swapped the Prescott and the Northwood and found both worked fine. I tried the Prescott on the original mobo several times and found it always failed, while the Northwood works fine with that mobo. In the meantime I broke a lever of one of the CPU fans! This sounds like a problem with the mobo (maybe capacitors) rather than with the processor. I overclocked by 2% (Prescott) and 8% (Northwood) but CPU voltages were at default values. My Prescott has been between 60-65C, at most 68C AFAIK. And I found Sandra said the mobo was hotter than normal (over 50C). My guess is that a capacitor (or several) is partially damaged and works with the less demanding Northwood, but not with the Prescott. Now that I have swapped processors, only one box is running, and I'm waiting for a new CPU fan... Time will tell whether my Prescott has really gone or not... I hope not. Or are there any other factors? The suspicious mobo and the healthy Northwood are working very fine at 47-55C. Is Prescott a "mobo killer"? Luckiest in the world. WMD = Weapon of Mass Distraction. ID: 171961 ·

Legacy Joined: 10 Dec 99 Posts: 134 Credit: 1,778,571 RAC: 0	Message 171966 - Posted: 26 Sep 2005, 10:50:14 UTC - in response to Message 171961. [quote][quote]Leaking/bloated capacitors are more likely to occur than a CPU failure. And a lot of the time, leaky caps make the system become unstable. And a lot of people take that as a sign of a CPU failure. The capacitors on the mainboard are 10 times more likely to "wear out" than the CPU is.[/quote] I suspect I have got damaged capacitors, but by appearance they look fine... no leakage/bloating found. Since the day before yesterday, I have had a strange problem. I have two P4 boxes (2.8G Prescott and 2.4G Northwood), and at first I thought my Prescott had been killed by heat; when it's turned on, it turns off automatically in a few seconds as if the Prescott had overheated (it's been very hot this summer). So I swapped the Prescott and the Northwood and found both worked fine. I tried the Prescott on the original mobo several times and found it always failed, while the Northwood works fine with that mobo. In the meantime I broke a lever of one of the CPU fans! This sounds like a problem with the mobo (maybe capacitors) rather than with the processor. I overclocked by 2% (Prescott) and 8% (Northwood) but CPU voltages were at default values. My Prescott has been between 60-65C, at most 68C AFAIK. And I found Sandra said the mobo was hotter than normal (over 50C). My guess is that a capacitor (or several) is partially damaged and works with the less demanding Northwood, but not with the Prescott. Now that I have swapped processors, only one box is running, and I'm waiting for a new CPU fan... Time will tell whether my Prescott has really gone or not... I hope not. Or are there any other factors? The suspicious mobo and the healthy Northwood are working very fine at 47-55C. Is Prescott a "mobo killer"?[/quote] 1. It could be your PSU not pumping enough juice. 2. It could be a mainboard not delivering enough juice to the CPU, or a sign of an impending failure. 3. Prescott a mainboard killer? Well, if it is a low quality mainboard with low quality capacitors and MOSFETs, it could die when you put a Prescott into it. I wouldn't say that the Prescott killed it, but rather that it was a shabby mainboard that was not built within spec. ID: 171966 ·

©2024 University of California

SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.