CPU loading and thermal stress

NexusNet
Joined: 20 Sep 05
Posts: 9
Credit: 149,033
RAC: 0
United States
Message 171450 - Posted: 24 Sep 2005, 21:32:27 UTC

Does anyone have information on or experience with the impact to CPU life of long-term running at 100% load?

I have noticed an increase in cabinet ambient temperature since enabling SETI processing on my servers - perhaps 5F to 8F, although I am not equipped to factor out room temperature variances to validate those numbers.
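
For reference, the reported rise can be restated in Celsius, the unit used in most of the replies. This is a minimal sketch that assumes only the 5F to 8F figures quoted above; the function name is made up for illustration. A temperature difference scales by 5/9, with no 32-degree offset.

```python
def fahrenheit_delta_to_celsius(delta_f: float) -> float:
    """Convert a temperature *difference* from Fahrenheit to Celsius degrees.

    Only the 5/9 scale factor applies; the 32-degree offset used for
    absolute readings cancels out when two readings are subtracted.
    """
    return delta_f * 5.0 / 9.0


# The 5F to 8F cabinet rise reported above, expressed in Celsius degrees.
low_c = fahrenheit_delta_to_celsius(5.0)
high_c = fahrenheit_delta_to_celsius(8.0)
print(f"{low_c:.1f} to {high_c:.1f} C")  # prints: 2.8 to 4.4 C
```

So the rise in question is roughly 3 to 4 degrees Celsius of cabinet ambient temperature.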

Note that the machines are Dell 6350s with ample air flow and heat sinks, so as boxes go these are intended to crunch. My question is driven by curiosity rather than immediate concern.

Tx,

Robert
ID: 171450 · Report as offensive
Profile Dorsai
Joined: 7 Sep 04
Posts: 474
Credit: 4,504,838
RAC: 0
United Kingdom
Message 171456 - Posted: 24 Sep 2005, 21:45:15 UTC

I have always assumed, without any proof, that the "life of a CPU" is measured by its performance.

It will become "tediously slow, and not worth using" and get retired, well before it "retires itself" due to failure.

No doubt CPUs do fail, but I suspect they get replaced due to old age far more often than they get replaced due to being worn out.

This does not include those that fail due to consequential damage..

IMHO....

Foamy is "Lord and Master".
(Oh, + some Classic WUs too.)
ID: 171456 · Report as offensive
Profile The Simonator
Joined: 18 Nov 04
Posts: 5700
Credit: 3,855,702
RAC: 50
United Kingdom
Message 171462 - Posted: 24 Sep 2005, 21:59:21 UTC
Last modified: 24 Sep 2005, 21:59:58 UTC

The only CPU I've ever had fail was one I booted up without a heatsink (bad idea), otherwise Dorsai is right.

If a CPU is designed to run at a maximum speed, IMHO the only way it would suffer damage (provided it is correctly installed) would be if it was run at over that speed.

Ramble over
Life on earth is the global equivalent of not storing things in the fridge.
ID: 171462 · Report as offensive
John McLeod VII
Volunteer developer
Volunteer tester
Joined: 15 Jul 99
Posts: 24806
Credit: 790,712
RAC: 0
United States
Message 171481 - Posted: 24 Sep 2005, 23:30:59 UTC

I have had a few CPUs fail - all at the same time because of a lightning strike. Prior to this, the company was too cheap to supply surge protectors. Basically, you have to kill it. It is (or at least used to be) possible to kill a CPU by overheating, but this typically happens very quickly (order of seconds to minutes) when a CPU cooler fails.


BOINC WIKI
ID: 171481 · Report as offensive
Mike Gelvin
Joined: 23 May 00
Posts: 92
Credit: 9,298,464
RAC: 0
United States
Message 171583 - Posted: 25 Sep 2005, 4:28:33 UTC - in response to Message 171450.  

Does anyone have information on or experience with the impact to CPU life of long-term running at 100% load?


I saw a study once (don't recall where) in which CPU life was shortened by thermal cycling. They implied that running them up and down in temperature was more harmful than turning them on and off (electrical shock).

One place I just found that discusses this (not an official study) is:
http://www.overclockers.com/tips30/


ID: 171583 · Report as offensive
Hans Dorn
Volunteer developer
Volunteer tester
Joined: 3 Apr 99
Posts: 2262
Credit: 26,448,570
RAC: 0
Germany
Message 171673 - Posted: 25 Sep 2005, 14:19:33 UTC - in response to Message 171450.  
Last modified: 25 Sep 2005, 14:26:41 UTC

Does anyone have information on or experience with the impact to CPU life of long-term running at 100% load?
Robert


Some years ago I had a Celeron-366 OC'ed to 550MHz that failed.
It was running 100% load all the time and was getting quite hot.

If you don't overclock, your CPUs should last ages.


Regards Hans


ID: 171673 · Report as offensive
Profile Paul D. Buck
Volunteer tester
Joined: 19 Jul 00
Posts: 3898
Credit: 1,158,042
RAC: 0
United States
Message 171709 - Posted: 25 Sep 2005, 15:30:02 UTC

I personally have never had a computer's CPU die on me, that I can recall, after I got the system up and running. Some of my systems ran pretty much continually for me for well into a decade ...

When I was on active duty we vastly preferred commercial chips over the mil-spec ones because they worked longer. It turns out that the mil-spec qualification tests were introducing latent defects. Later they started to go to a process qualification to ensure quality.

Bottom line, the power cycling is worse for a system than letting it run ...
ID: 171709 · Report as offensive
Profile barbarossa
Joined: 4 Sep 99
Posts: 1294
Credit: 6,629,998
RAC: 3
Switzerland
Message 171811 - Posted: 25 Sep 2005, 19:58:21 UTC

My hottest stove is an AMD XP 1900+. It sometimes reaches 70C in summer. At the moment it is at 58C (it's night over here).
It's been running Seti (Classic and later BOINC) since August 2003 at 24/7.
No failures.

:-)= Greybeard
All about BOINC: BOINC-Wiki (by Paul D. Buck)

ID: 171811 · Report as offensive
Profile SunMicrosystemsLLG
Joined: 4 Jul 05
Posts: 102
Credit: 1,360,617
RAC: 0
United Kingdom
Message 171813 - Posted: 25 Sep 2005, 20:10:30 UTC

Well the whole point of us subscribing to S@H is to perform reliability testing.

Although we normally only have the nodes running Seti for a week at a time the boxes are in near constant use on a variety of jobs. This does mean we are more likely to see early life failures rather than long term 'wear and tear', but we have yet to see any problems at all caused by heat and constant 100% CPU loading.

The idle temps of the CPUs are normally ~40DegC, and during crunching they vary between 48 and 56DegC, depending on the server's location in the rack. That is with passive heatsinks and system fans.

Long term it might have a slight effect, but probably no more than powering your machine on and off (heat and power cycling) rather than leaving it on to crunch.
ID: 171813 · Report as offensive
Profile daav
Joined: 21 May 99
Posts: 39
Credit: 177,065
RAC: 0
United States
Message 171837 - Posted: 25 Sep 2005, 21:29:26 UTC - in response to Message 171450.  
Last modified: 25 Sep 2005, 21:32:34 UTC

Does anyone have information on or experience with the impact to CPU life of long-term running at 100% load?

Tx,

Robert


Hi NexusWest, no one can guarantee a CPU's life (for any particular reason), but as a systems builder for many years, I cannot recall ever having or hearing of any that I built failing because of running at 100% for many months or years, provided regular cleaning of all HSFs was in place.

Heatsinks and CPU and case fans require regular cleaning with compressed air to keep the heatsink fins, fan blades and fan grill holes free from dust, so that they provide maximum cooling!

Most failures I see, as customers bring in their retail Dell, HP and other boxes, are due to impacted dirt in heatsink fins on CPUs, chipsets and video cards, and broken cooling fans of all sizes! Thus the related high thermal issues are the biggest direct problems for the life expectancy of CPUs and other inside-case components!

As stated below, a sure way of committing 'CPU suicide' (on any AMD CPU) is/was to leave the HSF off the CPU and hit the start button. About 2-3 seconds used to be all you needed to make a 'crispy critter' out of it! But as technology gets better, so do the mainboards: if you don't plug the HSF into the newer M/Bs, it won't even crank!

So yes, they'll run for many years at 100% with the proper maintenance! =)

-daav- ";^)
ID: 171837 · Report as offensive
Profile efa
Joined: 26 Mar 00
Posts: 233
Credit: 494,221
RAC: 0
Italy
Message 171868 - Posted: 25 Sep 2005, 23:07:48 UTC
Last modified: 25 Sep 2005, 23:37:25 UTC

I changed my MoBo, CPU and DRAM this May (2005), because the CPU was burned out.
It ran Seti at 100% 24/7 for years (it was a 666MHz Celeron).
From January on, any OS could boot and Seti ran normally, but various applications froze the CPU under CPU-intensive work (the mouse pointer stayed frozen and ctrl-alt-del did nothing).
These apps were 3D games, video re-encoding (miniDV to MPEG2), and sometimes browsing certain Flash sites with Mozilla.
From January to May I hesitated to change the board, because Seti worked perfectly.
But after reinstalling one of the OSes and getting the freezes again, I was sure the CPU was burned out.
My system had worked perfectly for years, super cooled and silenced.

The fact that Seti worked fine even with the CPU half burned out means I cannot say for sure that Seti burned the CPU, but surely something happened.

Seti uses the FPU part of the CPU, as it does a lot of FFT operations.
Video re-encoding mostly uses the MMX and SSE instructions of the CPU, as it manipulates multimedia data (integer matrix data).
So the broken part of my CPU was probably the SSE part.

Anyway, now I have a CeleronD@2.66GHz and Seti runs at 100% 24/7
:-))
squish your cpu electrons
---
Abit IS7 Bios 24, CeleronD2.66GHz FSBquad133MHz, 2xMatrix 8chip 16M84X-6 singleSided 512MB DDR400 CL2.5@double166MHz DualChannel128bit, PSU Premier DR-B350ATX 350W, ATI AiW8500DV 64MB, HDD Maxtor DiamondMax10 200GB 8MBcache, HDD Maxtor DiamondMaxPlus9 120GB, HDD Quantum FireballPlusAS 30GB, DVDburner LG-4163B, CDburner Plextor 16x, DVDplayer Asus16x, CDplayer Asus52x, Controller EIDE Q-TEC 310D, SoundBlasterLive5.1, 4x temperature-speed 500-2500rpm control fan, 7 fan totally in the case, hand-made case acustic isolation, Nokia middleTower Case, NEC MultiSync FE950, DLink DSL-300

ID: 171868 · Report as offensive
Profile Lampros
Joined: 17 Jun 02
Posts: 279
Credit: 13,973,726
RAC: 0
Canada
Message 171878 - Posted: 25 Sep 2005, 23:34:53 UTC

I've been running my AMD 1.1GHz on an Asus A7V motherboard 24/7 for about 5 years. I clean all heatsinks & fans monthly. About six months ago my cpu fan finally packed it in. It didn't squeal or seize, but slowed by 500 RPM. That was enough to shoot the cpu temp up to 70 deg C. Either something in the cpu or, most likely, the bios shut the computer down. I ran a status monitor program and found the temp spikes. Changed the cpu fan and am back in business. Checked benchmarks and everything is back to normal. As for the freeze-up problems, what version of Boinc were you running? As I've seen on these message boards, some previous versions (4.17 ?) had problems. I had similar problems before upgrading.
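
The kind of check described here (a fan that has slowed by about 500 RPM, or a CPU reaching 70 deg C) can be sketched in a few lines. This is only an illustration: the `Reading` type, the `check_reading` function, the 4500 RPM nominal speed and the sample values are all assumptions, not taken from any real monitoring tool. Only the 500 RPM and 70 deg C thresholds come from the post.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Reading:
    fan_rpm: int       # measured CPU fan speed
    cpu_temp_c: float  # measured CPU temperature in degrees Celsius


def check_reading(reading: Reading, nominal_rpm: int,
                  max_rpm_drop: int = 500,
                  max_temp_c: float = 70.0) -> List[str]:
    """Return warnings for one sensor reading (empty list if all is well)."""
    warnings = []
    # Fan has slowed by at least max_rpm_drop below its nominal speed.
    if nominal_rpm - reading.fan_rpm >= max_rpm_drop:
        warnings.append("fan slow")
    # CPU temperature has reached the alarm threshold.
    if reading.cpu_temp_c >= max_temp_c:
        warnings.append("cpu hot")
    return warnings


# Made-up sample values, for illustration only.
samples = [Reading(4500, 52.0), Reading(4000, 70.0)]
for sample in samples:
    print(sample, check_reading(sample, nominal_rpm=4500))
```

Checking the fan speed as well as the temperature means a failing fan can be flagged before the temperature spike shows up.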
ID: 171878 · Report as offensive
Profile efa
Joined: 26 Mar 00
Posts: 233
Credit: 494,221
RAC: 0
Italy
Message 171882 - Posted: 25 Sep 2005, 23:41:49 UTC - in response to Message 171878.  

Changed the cpu fan and am back in business. Checked benchmarks and everything is back to normal. As for the freeze up problems, what version of Boinc were you running?

I also keep MotherboardMonitor always on, checking fan, temperature and voltage, and many times I have changed the fan immediately when it started to slow to about 500rpm under its nominal speed.
In May, the Seti application 4.45 was about the only program that did not freeze the CPU.
:-)

ID: 171882 · Report as offensive
Profile Darth Dogbytes™
Volunteer tester
Joined: 30 Jul 03
Posts: 7512
Credit: 2,021,148
RAC: 0
United States
Message 171931 - Posted: 26 Sep 2005, 5:07:21 UTC - in response to Message 171837.  
Last modified: 26 Sep 2005, 5:08:01 UTC

Does anyone have information on or experience with the impact to CPU life of long-term running at 100% load?

Tx,

Robert


Hi NexusWest, no one can guarantee a CPU's life (for any particular reason), but as a systems builder for many years, I cannot recall ever having or hearing of any that I built failing because of running at 100% for many months or years, provided regular cleaning of all HSFs was in place.

Heatsinks and CPU and case fans require regular cleaning with compressed air to keep the heatsink fins, fan blades and fan grill holes free from dust, so that they provide maximum cooling!

Most failures I see, as customers bring in their retail Dell, HP and other boxes, are due to impacted dirt in heatsink fins on CPUs, chipsets and video cards, and broken cooling fans of all sizes! Thus the related high thermal issues are the biggest direct problems for the life expectancy of CPUs and other inside-case components!

As stated below, a sure way of committing 'CPU suicide' (on any AMD CPU) is/was to leave the HSF off the CPU and hit the start button. About 2-3 seconds used to be all you needed to make a 'crispy critter' out of it! But as technology gets better, so do the mainboards: if you don't plug the HSF into the newer M/Bs, it won't even crank!

So yes, they'll run for many years at 100% with the proper maintenance! =)

-daav- ";^)

I wholeheartedly concur. I clean my boxes, fans (including oiling), heat sinks, etc. every 4 months. I also run the fans at max using SpeedFan V4.25.
I learned my lesson long ago about housekeeping when my old 600MHz Celeron almost fried.


Account frozen...
ID: 171931 · Report as offensive
Peter Narkauskas
Joined: 3 Jul 99
Posts: 1
Credit: 1,229,803
RAC: 0
Australia
Message 171938 - Posted: 26 Sep 2005, 6:34:53 UTC - in response to Message 171931.  

Does anyone have information on or experience with the impact to CPU life of long-term running at 100% load?

Tx,

Robert



Hey Robert,

I agree with the other comments. I have run my home computer(s) 24/7 for the past 12 or so years, the past 6 years doing SETI on the same cpu (yeah, it is a bit old and slow!), and never had a cpu problem. I've had two disks die and initially a couple of fans die on me, but that's it.

So go ahead, you should be fine.

Cheers,

peter
birregurra...australia
ID: 171938 · Report as offensive
Profile Legacy
Joined: 10 Dec 99
Posts: 134
Credit: 1,778,571
RAC: 0
Singapore
Message 171947 - Posted: 26 Sep 2005, 8:20:41 UTC

As long as you keep the temperature of the CPU within reasonable limits, for example below 70C, the chances of a CPU failure are extremely low. I have been crunching SETI for almost 6 years, and most of my systems are on 24/7, some of them even overclocked. I have NEVER seen a CPU failure.

You are more likely to encounter other hardware failures like.....

1. PSU blowing up or PSU going wonky.
2. Mainboard failure or leaking/bloated caps or blown MOSFETs from overheating.

Leaking/bloated capacitors are more likely to occur than a CPU failure. A lot of the time, leaky caps make the system unstable, and a lot of people take that as a sign of a CPU failure.

The capacitors on the mainboard are 10 times more likely to "wear out" than the CPU is.
ID: 171947 · Report as offensive
W-K 666 Project Donor
Volunteer tester
Joined: 18 May 99
Posts: 19087
Credit: 40,757,560
RAC: 67
United Kingdom
Message 171953 - Posted: 26 Sep 2005, 8:58:51 UTC - in response to Message 171947.  
Last modified: 26 Sep 2005, 8:59:08 UTC

You are more likely to encounter other hardware failures like.....

1. PSU blowing up or PSU going wonky.
2. Mainboard failure or leaking/bloated caps or blown MOSFETs from overheating.


I totally agree with this, and a recent magazine article in the UK stated that 30% of PC failures were directly the result of a power supply failure. They also said that quite a few of the remaining failures were due to mobo power components.

Of course those of us who have acquired computers with the latest Intel CPUs with 'Enhanced SpeedStep' will be happy, as they underclock and reduce the CPU voltage when heat stressed. I believe it was Sharky Extreme that did a test on CPUs by switching them on without a heatsink and fan: the Intel worked, slowly, but it worked; the AMD fried.

Andy
ID: 171953 · Report as offensive
hazmatt87
Joined: 22 Aug 05
Posts: 19
Credit: 380,440
RAC: 0
United States
Message 171959 - Posted: 26 Sep 2005, 9:26:21 UTC

Tomshardware did a test a while back with a P3, an early P4 (Willamette, probably) and 2 Athlons (either early A64s or late XPs). They removed the heatsink from the chips while running a timedemo of some game.

The poor thermal monitoring in the AMD chipsets caused both of the AMDs to literally fry; they measured temps as high as ~700C. The P3 crashed, but the chip survived and booted just fine after the HSF was put back on. The P4 continued to run, although very choppily due to the throttling, so you would still be able to save data.

Anyways, a CPU will be far obsolete before it will ever "wear out."

Although, in a few isolated reports, some people who had heavily overclocked and overvolted Northwood P4s would suddenly have the CPU die. It would maybe start one day with some erroneous behavior, but then rapidly progress to a completely dead CPU within 24 hours. We never figured out exactly why it happened; we just knew that everyone who had that problem was pushing really high volts through the chip, anywhere from 1.6V to 1.8V, in an effort to get good OCs. Sometimes it would be a few weeks before the chip failed, other times it would go on for a couple of months. The thing is, for every incident of SNDS (sudden Northwood death syndrome) there are 10 where the CPU is still fine after a year of high volts.
ID: 171959 · Report as offensive
Tetsuji Maverick Rai
Volunteer tester
Joined: 25 Apr 99
Posts: 518
Credit: 90,863
RAC: 0
Japan
Message 171961 - Posted: 26 Sep 2005, 9:28:11 UTC - in response to Message 171947.  
Last modified: 26 Sep 2005, 9:55:26 UTC


Leaking/bloated capacitors are more likely to occur than a CPU failure. A lot of the time, leaky caps make the system unstable, and a lot of people take that as a sign of a CPU failure.

The capacitors on the mainboard are 10 times more likely to "wear out" than the CPU is.


I suspect I have got damaged capacitors, but by appearance, they look fine...no leakage/bloating found.

Since the day before yesterday, I have had a strange problem. I have two P4 boxes (2.8G Prescott and 2.4G Northwood), and at first it seemed my Prescott had been killed by heat; when turned on, it shut itself off within a few seconds, as if the Prescott had overheated (it's been very hot this summer). So I swapped the Prescott and the Northwood and found both worked fine. I tried the Prescott on the original mobo several times and found it always failed, while the Northwood works fine with that mobo. In the meantime I broke a lever on one of the cpu fans!

This sounds like a problem with the mobo (maybe capacitors) rather than with the processor. I overclocked by 2% (Prescott) and 8% (Northwood), but cpu voltages were at default values. My Prescott has been at 60-65C, at most 68C AFAIK. And I found Sandra said the mobo was hotter than normal (over 50C). My guess is that one or more capacitors are partially damaged, and the mobo works with the less demanding Northwood, but not with the Prescott.

Now that I have swapped processors, only one box is running, and I'm waiting for a new cpu fan... Time will tell whether my Prescott has really died or not... I hope not. Or are there any other factors? The suspect mobo and the healthy Northwood are working just fine at 47-55C. Is Prescott a "mobo killer"?
Luckiest in the world. WMD = Weapon of Mass Distraction.
Click this table.
ID: 171961 · Report as offensive
Profile Legacy
Joined: 10 Dec 99
Posts: 134
Credit: 1,778,571
RAC: 0
Singapore
Message 171966 - Posted: 26 Sep 2005, 10:50:14 UTC - in response to Message 171961.  


Leaking/bloated capacitors are more likely to occur than a CPU failure. A lot of the time, leaky caps make the system unstable, and a lot of people take that as a sign of a CPU failure.

The capacitors on the mainboard are 10 times more likely to "wear out" than the CPU is.


I suspect I have got damaged capacitors, but by appearance, they look fine...no leakage/bloating found.

Since the day before yesterday, I have had a strange problem. I have two P4 boxes (2.8G Prescott and 2.4G Northwood), and at first it seemed my Prescott had been killed by heat; when turned on, it shut itself off within a few seconds, as if the Prescott had overheated (it's been very hot this summer). So I swapped the Prescott and the Northwood and found both worked fine. I tried the Prescott on the original mobo several times and found it always failed, while the Northwood works fine with that mobo. In the meantime I broke a lever on one of the cpu fans!

This sounds like a problem with the mobo (maybe capacitors) rather than with the processor. I overclocked by 2% (Prescott) and 8% (Northwood), but cpu voltages were at default values. My Prescott has been at 60-65C, at most 68C AFAIK. And I found Sandra said the mobo was hotter than normal (over 50C). My guess is that one or more capacitors are partially damaged, and the mobo works with the less demanding Northwood, but not with the Prescott.

Now that I have swapped processors, only one box is running, and I'm waiting for a new cpu fan... Time will tell whether my Prescott has really died or not... I hope not. Or are there any other factors? The suspect mobo and the healthy Northwood are working just fine at 47-55C. Is Prescott a "mobo killer"?


1. It could be your PSU not pumping enough juice.
2. It could be a mainboard not delivering enough juice to the CPU or a sign of an impending failure.
3. Prescott a mainboard killer? Well, if it is a low quality mainboard with low quality capacitors and MOSFETs, it could die when you put a Prescott into it. I wouldn't say that the Prescott killed it, but rather that it was a shabby mainboard that was not built to spec.
ID: 171966 · Report as offensive


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.