Heat - The Dreaded Enemy


Message boards : Number crunching : Heat - The Dreaded Enemy

Osiris30
Joined: 19 Aug 07
Posts: 264
Credit: 41,917,631
RAC: 0
Barbados
Message 650719 - Posted: 29 Sep 2007, 3:40:25 UTC - in response to Message 650191.

It's known as electromigration, and I have seen a couple of CPUs slowly degrade in OCability due to its effects. Sadly, my AXP2000+ @ 2200 MHz is starting to feel it.

Actually, electromigration failures are far more likely to be instantly catastrophic in their observable performance effects than not.

Gradual performance degradation failures are more likely from other mechanisms--such as threshold voltage shifts and several other things associated with the transistors and not the wiring.

The cheery confidence displayed by some overclockers that simply assuring a good die temperature is full protection from the bad effects of overclocking is misplaced. Some of the degradation mechanisms are primarily voltage dependent, with little temperature effect, while some of the other mechanisms have very strong temperature dependence. Still others respond primarily to temperature cycling.


Just a point of order. Electromigration is present in non-OC'ed CPUs as well, just to a lesser extent.

Astro
Volunteer tester
Joined: 16 Apr 02
Posts: 8026
Credit: 600,015
RAC: 0
Message 650722 - Posted: 29 Sep 2007, 3:50:16 UTC - in response to Message 650715.
Last modified: 29 Sep 2007, 3:57:03 UTC

Okay Crunchers!
[snipppppp]
CPU temp=39C (28C/82F lower temp)
MB temp=41C
[more snipping]
IROC

Are you saying the CPU temps are LOWER than the motherboard temps?? Were you running SETI on all cores (or some other project app)? If so, what kind of Zalman cooler are you using?

archae86
Joined: 31 Aug 99
Posts: 889
Credit: 1,572,794
RAC: 3
United States
Message 650743 - Posted: 29 Sep 2007, 4:20:26 UTC - in response to Message 650719.

Just a point of order. Electromigration is present in non-OC'ed CPUs as well, just to a lesser extent.

Sure, as are pretty much all the other failure mechanisms.

But electromigration for many design/process combinations has often been at so low a level that as a practical matter only initially defective structures failed from it in the field. Years after I went to work in the business, the big IC company I worked for had never seen a field EM failure (we made more than one later).

Running hotter, whether from overclocking, or just from BOINCing a machine which would otherwise be idle, makes many, in fact most, failure mechanisms worse. Running at higher voltage makes another set of mechanisms worse, with appreciable overlap between the two.
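For the thermally driven mechanisms only, a rough feel for "hotter is worse" can be had from the Arrhenius acceleration model, which is a common reliability rule of thumb and not something stated in this post. This is a minimal sketch: the 0.7 eV activation energy is purely an assumed illustrative value (real values differ by mechanism and process), and the model says nothing about the voltage-driven or temperature-cycling mechanisms.

```python
import math

BOLTZMANN_EV_PER_K = 8.617e-5  # Boltzmann constant in eV per kelvin


def arrhenius_acceleration(cool_c, hot_c, activation_energy_ev):
    """Return how many times faster a thermally activated mechanism
    proceeds at hot_c than at cool_c (both in degrees Celsius)."""
    cool_k = cool_c + 273.15
    hot_k = hot_c + 273.15
    exponent = (activation_energy_ev / BOLTZMANN_EV_PER_K) * (1.0 / cool_k - 1.0 / hot_k)
    return math.exp(exponent)


# Assumed activation energy, for illustration only.
ASSUMED_EA_EV = 0.7

# Compare the 40C and 65C die temperatures reported earlier in the thread.
factor = arrhenius_acceleration(40.0, 65.0, ASSUMED_EA_EV)
print(f"Estimated acceleration from 40C to 65C: {factor:.1f}x")
```

The result depends heavily on the assumed activation energy, so treat it as an order-of-magnitude picture rather than a lifetime prediction.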

Manufacturers tend to know when they have a real field electromigration failure problem with a part. It is one of the very, very few mechanisms for which the failure rate rises with time. This is seriously nasty. By the time you know you have a problem, the check is already in the mail for a very much larger one than what you have seen. As we generally expect decreasing failure rates with time, this stands out like a sore thumb if anyone is paying attention.
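One way to picture a falling versus a rising failure rate is the Weibull hazard function, a standard reliability model that this post does not itself mention. The sketch below uses made-up shape and scale values, not data for any real part: a shape below 1 gives the usual decreasing failure rate, and a shape above 1 gives the wear-out behaviour described for electromigration.

```python
def weibull_hazard(t_years, shape, scale_years):
    """Instantaneous failure rate h(t) = (k / lam) * (t / lam) ** (k - 1)."""
    return (shape / scale_years) * (t_years / scale_years) ** (shape - 1)


# Hypothetical parameters, chosen only to show the two trends.
EARLY_LIFE_SHAPE = 0.5   # shape < 1: failure rate falls with time
WEAR_OUT_SHAPE = 3.0     # shape > 1: failure rate rises with time
SCALE_YEARS = 10.0

for year in (1, 2, 4, 8):
    early = weibull_hazard(year, EARLY_LIFE_SHAPE, SCALE_YEARS)
    wear = weibull_hazard(year, WEAR_OUT_SHAPE, SCALE_YEARS)
    print(f"year {year}: early-life {early:.4f}/yr, wear-out {wear:.4f}/yr")
```

With a rising hazard, the failures seen so far are a small fraction of those still to come, which is the "check is already in the mail" point.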
____________

IROC
Joined: 27 Jun 99
Posts: 57
Credit: 10,104,906
RAC: 0
United States
Message 650762 - Posted: 29 Sep 2007, 5:18:17 UTC - in response to Message 650722.


Are you saying the CPU temps are LOWER than the motherboard temps?? Were you running SETI on all cores (or some other project app)? If so, what kind of Zalman cooler are you using?



The new quad is running at a lower temp than the 3.2GHz PentD that it replaced. As I mentioned at the beginning of the thread, the PentD was running about 65+C. The quad is running 40C with all cores running SETI.

Zalman CNPS9500.

IROC

____________

The Chosen
Joined: 24 Jul 00
Posts: 54
Credit: 4,587,117
RAC: 867
Germany
Message 650765 - Posted: 29 Sep 2007, 5:27:24 UTC
Last modified: 29 Sep 2007, 5:30:25 UTC

About Core 2 temps:

Not all temperature programs read the Core 2 temperature correctly.

In some programs you must add 15°C, in others you must subtract 15°C, and some are close to correct.

So keep that in mind.
____________
Boinc runs here on:
Intel i7-3770K (only 4 Cores run BOINC)
Nvidia GT-630 (Fermi)

Grant (SSSF)
Joined: 19 Aug 99
Posts: 5953
Credit: 62,443,198
RAC: 39,432
Australia
Message 650797 - Posted: 29 Sep 2007, 8:52:22 UTC - in response to Message 650765.

In some programs you must add 15°C, in others you must subtract 15°C, and some are close to correct.

That is the case with all motherboards.
____________
Grant
Darwin NT.

Grant (SSSF)
Joined: 19 Aug 99
Posts: 5953
Credit: 62,443,198
RAC: 39,432
Australia
Message 650799 - Posted: 29 Sep 2007, 8:54:08 UTC - in response to Message 650762.

The new quad is running at a lower temp than the 3.2GHz PentD that it replaced. As I mentioned at the beginning of the thread, the PentD was running about 65+C.

Not surprising.
Generally a Core 2 Duo will run at almost half the temperature & do twice the work of a P4.
____________
Grant
Darwin NT.

Careface
Joined: 6 Jun 03
Posts: 115
Credit: 11,626,751
RAC: 0
New Zealand
Message 650933 - Posted: 29 Sep 2007, 16:29:21 UTC - in response to Message 650191.


Actually, electromigration failures are far more likely to be instantly catastrophic in their observable performance effects than not.


True, but you can't deny its effects can still be noticed if it's slow enough. You seem to know your stuff, so I have a question for you. Could you explain "coil-whine" to me? I'm almost 100% certain I have a CPU suffering from it: it makes high-pitched noises (yes, the CPU, not the mobo or anything else, I've checked) whenever it's under load.

At least, I think it's called coil whine. The OC forum I frequent doesn't have a lot of posts on the topic.
____________

archae86
Joined: 31 Aug 99
Posts: 889
Credit: 1,572,794
RAC: 3
United States
Message 650947 - Posted: 29 Sep 2007, 17:20:50 UTC - in response to Message 650933.

Could you explain "coil-whine" to me? I'm almost 100% certain I have a CPU suffering from it: it makes high-pitched noises (yes, the CPU, not the mobo or anything else, I've checked) whenever it's under load.

I think people use that term somewhat generically for acoustic noise produced by mechanical vibration of parts induced by electrical variation at frequencies in the audible range.

I think it gets the name because the most common specific electrical component to do this is an inductor (aka coil). PC motherboards commonly have electrical components right next to the CPU to perform the final conversion to the actual operating voltage of the CPU, including capacitors, inductors, and active electronic components. This assembly is reputed to make a nasty whine on some models of motherboards.

Another common place to find that sort of sound is in drivers for LCD displays. I have an ancient HP palmtop which you could hear from several inches away.

I doubt your CPU itself is making the sound, though the power conversion circuitry right next to it on the motherboard would be a good candidate.

____________

michael37
Joined: 23 Jul 99
Posts: 311
Credit: 6,955,447
RAC: 0
United States
Message 651338 - Posted: 30 Sep 2007, 4:07:02 UTC

Well, I just suffered my first serious hardware fault from the dreaded enemy after 8 years of running Seti. This computer had an Athlon XP 1800+ which was a fairly quick CPU back in 2002. It ran seti non-stop from the moment I put my desktop together.

A few days ago, it would POST, but would not boot into the operating system. I took it apart, checked the CPU and found a crack on the substrate right next to the chip. I'd say it was definitely heat. I have not overclocked the chip, and I used a standard AMD cooler.

R.I.P.

Luckily, I had an Athlon XP 2500+ lying around and not used =) So, I expect my RAC to go up now :)
____________

OzzFan
Volunteer tester
Joined: 9 Apr 02
Posts: 13706
Credit: 31,728,617
RAC: 12,672
United States
Message 651432 - Posted: 30 Sep 2007, 11:40:21 UTC - in response to Message 651338.

A few days ago, it would POST, but would not boot into the operating system. I took it apart, checked the CPU and found a crack on the substrate right next to the chip. I'd say it was definitely heat. I have not overclocked the chip, and I used a standard AMD cooler.


Ouch! Sounds like poor manufacturing (perhaps a bad batch). I still have an Athlon Classic running at 1.4GHz using a ceramic package that is working well 24/7.
____________

GALIFREAN
Joined: 14 Jul 99
Posts: 148
Credit: 28,658
RAC: 0
United States
Message 651542 - Posted: 30 Sep 2007, 16:15:01 UTC

Just wanted to share my experience. Had a problem with booting, locking up, overheating etc... After hours of head banging and hair pulling, I found that the heatsink was not properly seated. I failed to notice that there is a step on it, and I had it oriented incorrectly. It appeared to be correct, and the retainers locked, but there was not 100% contact between the chip and heatsink. When I finally realized and corrected this, all the malfunctions stopped.
____________
I demand a refund. Oh wait, I didn't pay to join,
I VOLUNTEERED!

archae86
Joined: 31 Aug 99
Posts: 889
Credit: 1,572,794
RAC: 3
United States
Message 651836 - Posted: 30 Sep 2007, 20:24:28 UTC - in response to Message 651542.

but there was not 100% contact between the chip and heatsink


That would be very bad indeed, even if the gap were filled with thermal paste--even worse if not. Thanks for sharing.

____________

Sirius B
Volunteer tester
Joined: 26 Dec 00
Posts: 11973
Credit: 1,796,382
RAC: 578
Bermuda
Message 652337 - Posted: 1 Oct 2007, 15:30:07 UTC - in response to Message 650270.


System 2 has now been running continuously for 9 hours. Temps are:
33/38/38/94/23/44/54 (HD2).

Have 120mm Fan intake, 120mm exhaust, 80mm top cooling fan, 80mm side fan + 80mm intake fan on underside of psu.

No hiccups so far, in fact system running a heck of a lot smoother than yesterday with no unexpected shutdowns.

Could it be a faulty mb sensor?

Is it anything to worry about?


Just to let everyone know: the M/B is faulty. Had another system builder check it & he got 98% main board temp with HD 1 reading 261%.

Supplier sending out new board, should receive Wednesday/Thursday this week.

michael37
Joined: 23 Jul 99
Posts: 311
Credit: 6,955,447
RAC: 0
United States
Message 653812 - Posted: 4 Oct 2007, 4:54:18 UTC - in response to Message 651432.

A few days ago, it would POST, but would not boot into the operating system. I took it apart, checked the CPU and found a crack on the substrate right next to the chip. I'd say it was definitely heat. I have not overclocked the chip, and I used a standard AMD cooler.


Ouch! Sounds like poor manufacturing (perhaps a bad batch). I still have an Athlon Classic running at 1.4GHz using a ceramic package that is working well 24/7.


Maybe :) I can't complain though, this CPU had been crunching Seti between 2002 and 2007.
____________

Osiris30
Joined: 19 Aug 07
Posts: 264
Credit: 41,917,631
RAC: 0
Barbados
Message 653814 - Posted: 4 Oct 2007, 5:16:32 UTC

I have an interesting heat problem at the moment... Namely the heat of the CPUs affecting the room they are in. We have a room segmented off for training at the office and with all the PCs in there crunching 24/7 it can get exceptionally warm when they turn down the climate control system on the weekends.

Luckily winter is on the way, but I'm not sure what I'll do come the summer (although hopefully by then I will have moved said CPUs, which are P4 dual cores, into the main office and replaced them with new C2Ds that should run a LOT cooler).

I wish BOINC had the ability to monitor a temp probe and shut down when the temp got above a certain point, because it must have been 110F in the room on Sat. I've asked building maint to leave the AC a little higher on the weekend, so we shall see, else I think I'll have to turn the room off until the winter sets in properly :/

Astro
Volunteer tester
Joined: 16 Apr 02
Posts: 8026
Credit: 600,015
RAC: 0
Message 653897 - Posted: 4 Oct 2007, 9:41:44 UTC
Last modified: 4 Oct 2007, 9:43:01 UTC

Osiris30, in the BOINC Manager click "advanced", then "preferences", then click on the "processor" tab. You might be able to stop BOINC just on the weekends. Perhaps just shutting down 1/4 of them on the weekend would help, or some mix.

Mikael Bjorkbom
Joined: 12 Jan 00
Posts: 6
Credit: 2,462,163
RAC: 2,285
Finland
Message 654046 - Posted: 4 Oct 2007, 16:26:23 UTC - in response to Message 653814.


I wish BOINC had the ability to monitor a temp probe and shut down when the temp got above a certain point, because it must have been 110F in the room on Sat. I've asked building maint to leave the AC a little higher on the weekend, so we shall see, else I think I'll have to turn the room off until the winter sets in properly :/



You can actually start and stop BOINC with an application called SpeedFan and boinccmd.exe. SpeedFan can be configured to launch applications at various events, such as when the CPU temperature rises past a certain threshold. With boinccmd you can turn on or suspend crunching.

Configure SpeedFan to launch "boinccmd --set_run_mode never" when the temperature rises too high and "boinccmd --set_run_mode auto" to resume crunching when the computer has cooled off.
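For anyone without SpeedFan, the same idea can be sketched as a small script. This is only a minimal sketch under stated assumptions: the two thresholds are hypothetical values, the temperature reading has to come from whatever sensor tool your system provides (none is included here), and boinccmd is assumed to be on the PATH and allowed to talk to the local client. The only boinccmd options used are the two given above. Using separate suspend and resume thresholds keeps it from flapping on and off around a single temperature.

```python
import subprocess

# Hypothetical thresholds; pick values that suit your own CPU and cooler.
SUSPEND_AT_C = 70.0
RESUME_AT_C = 60.0


def next_run_mode(temp_c, current_mode):
    """Decide the BOINC run mode for a temperature reading, with hysteresis."""
    if temp_c >= SUSPEND_AT_C:
        return "never"   # too hot: suspend crunching
    if temp_c <= RESUME_AT_C:
        return "auto"    # cooled off: resume crunching
    return current_mode  # in between: leave things as they are


def apply_run_mode(mode):
    """Run 'boinccmd --set_run_mode <mode>' (assumes boinccmd is on the PATH)."""
    subprocess.run(["boinccmd", "--set_run_mode", mode], check=True)


def control_step(temp_c, current_mode, apply=apply_run_mode):
    """Apply a mode change only when the decision differs from the current mode."""
    new_mode = next_run_mode(temp_c, current_mode)
    if new_mode != current_mode:
        apply(new_mode)
    return new_mode
```

You would call control_step from a loop or a scheduled task, passing in the latest reading and the mode returned by the previous call.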

I use this to avoid overheating in the summer. I want to take every precaution because my fan solution is a bit special. I have tried to minimize the noise, while still having enough cooling to crunch at full speed. But in the summer the ambient temperatures can be too much and then a few minutes of idle time resolves the situation. So this solution has worked very well for me.
____________

Bill Walker
Joined: 4 Sep 99
Posts: 3459
Credit: 2,214,940
RAC: 1,030
Canada
Message 654580 - Posted: 5 Oct 2007, 15:09:49 UTC

At the risk of hijacking the thread: can somebody help a noob with information on "acceptable" CPU temperatures?

I've been running SETI and some other projects on a Toshiba Satellite A100 with a Pentium M dual core processor for over a year now. The exit air temperature seemed to be going up lately, so I downloaded the Intel TAT to look at the temps. The CPUs were running at 67 to 70C. After blowing out (but not disassembling) the air path, this dropped to 60 to 62C.

Is this cause for concern? Where can I find the manufacturer's recommended operating temps?
____________

Sirius B
Volunteer tester
Joined: 26 Dec 00
Posts: 11973
Credit: 1,796,382
RAC: 578
Bermuda
Message 654593 - Posted: 5 Oct 2007, 15:39:16 UTC - in response to Message 654580.

At the risk of hijacking the thread: can somebody help a noob with information on "acceptable" CPU temperatures?

I've been running SETI and some other projects on a Toshiba Satellite A100 with a Pentium M dual core processor for over a year now. The exit air temperature seemed to be going up lately, so I downloaded the Intel TAT to look at the temps. The CPUs were running at 67 to 70C. After blowing out (but not disassembling) the air path, this dropped to 60 to 62C.

Is this cause for concern? Where can I find the manufacturer's recommended operating temps?



As you've had it for a year, try stripping it down completely & rebuilding it, giving each part a thorough clean. Then re-apply fresh thermal paste. You should notice a significant difference.

Also make sure that you thoroughly clean each fan.

What I tend to do is thoroughly clean the case, as dust can build up in places that are not noticed at a glance.

I've just recently installed a new motherboard on my main system & the amount of dust build up in the case was unbelievable.
____________


Copyright © 2014 University of California