Heat - The Dreaded Enemy

Message boards : Number crunching : Heat - The Dreaded Enemy

Osiris30
Joined: 19 Aug 07
Posts: 264
Credit: 41,917,631
RAC: 0
Barbados
Message 650719 - Posted: 29 Sep 2007, 3:40:25 UTC - in response to Message 650191.  

It's known as electromigration, and I have seen a couple of CPUs slowly degrade in OCability due to its effects. Sadly, my AXP2000+ @ 2200 MHz is starting to feel it.

Actually, electromigration failures are far more likely to be instantly catastrophic in their observable performance effects than not.

Gradual performance degradation failures are more likely from other mechanisms--such as threshold voltage shifts and several other things associated with the transistors and not the wiring.

The cheery confidence displayed by some overclockers that simply assuring a good die temperature is full protection from OC bad effects is misplaced. Some of the degradation mechanisms are primarily voltage dependent, with little temperature effect, while some of the other mechanisms have very strong temperature dependence. Still others respond primarily to temperature cycling.


Just a point of order. Electromigration is present in non-OC'ed CPUs as well, just to a lesser extent.
ID: 650719
Astro
Volunteer tester
Joined: 16 Apr 02
Posts: 8026
Credit: 600,015
RAC: 0
Message 650722 - Posted: 29 Sep 2007, 3:50:16 UTC - in response to Message 650715.  
Last modified: 29 Sep 2007, 3:57:03 UTC

Okay Crunchers!
[snipppppp]
CPU temp=39C (28C/82F lower temp)
MB temp=41C
[more snipping]
IROC

Are you saying the cpu temps are LOWER than Motherboard temps?? Were you running Seti on all cores? (or some other project app) If so, what kind of Zalman cooler are you using?

ID: 650722
archae86
Joined: 31 Aug 99
Posts: 909
Credit: 1,582,816
RAC: 0
United States
Message 650743 - Posted: 29 Sep 2007, 4:20:26 UTC - in response to Message 650719.  

Just a point of order. Electromigration is present in non-OC'ed CPUs as well, just to a lesser extent.

Sure, as are pretty much all the other failure mechanisms.

But electromigration for many design/process combinations has often been at so low a level that as a practical matter only initially defective structures failed from it in the field. Years after I went to work in the business, the big IC company I worked for had never seen a field EM failure (we made more than one later).

Running hotter, whether from overclocking, or just from BOINCing a machine which would otherwise be idle, makes many, in fact most, failure mechanisms worse. Running at higher voltage makes another set of mechanisms worse, with appreciable overlap between the two.

Manufacturers tend to know when they have a real field electromigration failure problem with a part. It is one of the very, very few mechanisms for which the failure rate rises with time. This is seriously nasty. By the time you know you have a problem, the check is already in the mail for a very much larger one than what you have seen. As we generally expect decreasing failure rates with time, this stands out like a sore thumb if anyone is paying attention.
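The point above about running hotter can be put in rough numbers with the Arrhenius model, a standard reliability rule of thumb for thermally activated failure mechanisms. What follows is only a minimal sketch in Python: the 0.7 eV activation energy is an illustrative assumption (real values differ by mechanism and process, and none are given in this thread), and the function name is hypothetical. The 40C and 65C inputs are the die temperatures reported in this thread for the quad and the PentD.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV per kelvin


def arrhenius_acceleration(cool_c, hot_c, activation_energy_ev=0.7):
    """Ratio of failure rates at hot_c versus cool_c (both in Celsius).

    activation_energy_ev is an ASSUMED, illustrative value; it is not
    taken from the thread or from any specific process.
    """
    cool_k = cool_c + 273.15
    hot_k = hot_c + 273.15
    exponent = (activation_energy_ev / BOLTZMANN_EV_PER_K) * (
        1.0 / cool_k - 1.0 / hot_k
    )
    return math.exp(exponent)


if __name__ == "__main__":
    factor = arrhenius_acceleration(40.0, 65.0)
    print(f"Assumed-model acceleration from 40C to 65C: {factor:.1f}x")
```

This covers only the temperature-driven mechanisms. The voltage-dependent and temperature-cycling mechanisms mentioned earlier in the thread are not captured by this formula.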
ID: 650743
IROC
Joined: 27 Jun 99
Posts: 57
Credit: 11,380,977
RAC: 0
United States
Message 650762 - Posted: 29 Sep 2007, 5:18:17 UTC - in response to Message 650722.  


Are you saying the cpu temps are LOWER than Motherboard temps?? Were you running Seti on all cores? (or some other project app) If so, what kind of Zalman cooler are you using?



The new quad is running at a lower temp than the 3.2GHz PentD that it replaced. As I mentioned at the beginning of the thread, the PentD was running about 65+C. The quad is running 40C with all cores running SETI.

Zalman CNPS9500.

IROC

ID: 650762
Ralf02061973
Volunteer tester
Joined: 24 Jul 00
Posts: 54
Credit: 9,983,656
RAC: 8
Germany
Message 650765 - Posted: 29 Sep 2007, 5:27:24 UTC
Last modified: 29 Sep 2007, 5:30:25 UTC

About temps from Core 2:

Not all temperature programs read the Core 2 temperature correctly.

In some programs you must add 15°C, in others you must subtract 15°C, and some are close to correct.

So think about it.
Boinc runs here on:
Intel i7-3770K + IntelHD4000
Android-Stick-ARM-Cortex-A17
Sony-Z5C-ARM-Cortex-A53/A57
Nvidia GT-630 / Nvidia GTX-750Ti
ID: 650765
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13715
Credit: 208,696,464
RAC: 304
Australia
Message 650797 - Posted: 29 Sep 2007, 8:52:22 UTC - in response to Message 650765.  

In some programs you must add 15°C, in others you must subtract 15°C, and some are close to correct.

That is the case with all motherboards.
Grant
Darwin NT
ID: 650797
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13715
Credit: 208,696,464
RAC: 304
Australia
Message 650799 - Posted: 29 Sep 2007, 8:54:08 UTC - in response to Message 650762.  

The new quad is running at a lower temp than the 3.2GHz PentD that it replaced. As I mentioned at the beginning of the thread, the PentD was running about 65+C.

Not surprising.
Generally a Core 2 Duo will run at almost half the temperature and do twice the work of a P4.
Grant
Darwin NT
ID: 650799
Careface
Joined: 6 Jun 03
Posts: 128
Credit: 16,561,684
RAC: 0
New Zealand
Message 650933 - Posted: 29 Sep 2007, 16:29:21 UTC - in response to Message 650191.  


Actually, electromigration failures are far more likely to be instantly catastrophic in their observable performance effects than not.


True, but you can't deny its effects can still be noticed if it's slow enough. You seem to know your stuff, so I have a question for you. Could you explain "coil-whine" to me? I'm almost 100% certain I have a CPU suffering from it, making high-pitched noises (yes, the CPU, not the mobo or anything else - I've checked) whenever it's under load.

At least, I think it's called coil whine... the OC forum I frequent doesn't have a lot of posts on the topic.
ID: 650933
archae86
Joined: 31 Aug 99
Posts: 909
Credit: 1,582,816
RAC: 0
United States
Message 650947 - Posted: 29 Sep 2007, 17:20:50 UTC - in response to Message 650933.  

Could you explain "coil-whine" to me? I'm almost 100% certain I have a CPU suffering from it, making high-pitched noises (yes, the CPU, not the mobo or anything else - I've checked) whenever it's under load.

I think people use that term somewhat generically for acoustic noise produced by mechanical vibration of parts induced by electrical variation at frequencies in the audible range.

I think it gets the name because the most common specific electrical component to do this is an inductor (aka coil). PC motherboards commonly have electrical components right next to the CPU to perform the final conversion to the actual operating voltage of the CPU, including capacitors, inductors, and active electronic components. This assembly is reputed to make a nasty whine on some models of motherboards.

Another common place to find that sort of sound is in drivers for LCD displays. I have an ancient HP palmtop which you could hear from several inches away.

I doubt your CPU itself is making the sound, though the power conversion circuitry right next to it on the motherboard would be a good candidate.

ID: 650947
michael37
Joined: 23 Jul 99
Posts: 311
Credit: 6,955,447
RAC: 0
United States
Message 651338 - Posted: 30 Sep 2007, 4:07:02 UTC

Well, I just suffered my first serious hardware fault from the dreaded enemy after 8 years of running Seti. This computer had an Athlon XP 1800+ which was a fairly quick CPU back in 2002. It ran seti non-stop from the moment I put my desktop together.

A few days ago, it would post, but would not boot into the operating system. I took it apart, checked the CPU and found a crack on the substrate right next to the chip. I'd say it was definitely heat. I have not overclocked the chip, and I used a standard AMD cooler.

R.I.P.

Luckily, I had an Athlon XP 2500+ lying around and not used =) So, I expect my RAC to go up now :)

ID: 651338
OzzFan
Volunteer tester
Joined: 9 Apr 02
Posts: 15691
Credit: 84,761,841
RAC: 28
United States
Message 651432 - Posted: 30 Sep 2007, 11:40:21 UTC - in response to Message 651338.  

A few days ago, it would post, but would not boot into the operating system. I took it apart, checked the CPU and found a crack on the substrate right next to the chip. I'd say it was definitely heat. I have not overclocked the chip, and I used a standard AMD cooler.


Ouch! Sounds like poor manufacturing (perhaps a bad batch). I still have an Athlon Classic running at 1.4GHz using a ceramic package that is working well 24/7.
ID: 651432
GALIFREAN
Joined: 14 Jul 99
Posts: 148
Credit: 28,658
RAC: 0
United States
Message 651542 - Posted: 30 Sep 2007, 16:15:01 UTC

Just wanted to share my experience. Had a problem with booting, locking up, overheating etc... After hours of head banging and hair pulling, I found that the heatsink was not properly seated. I failed to notice that there is a step on it, and I had it oriented incorrectly. It appeared to be correct, and the retainers locked, but there was not 100% contact between the chip and heatsink. When I finally realized and corrected this, all the malfunctions stopped.
I demand a refund. Oh wait, I didn't pay to join,
I VOLUNTEERED!
ID: 651542
archae86
Joined: 31 Aug 99
Posts: 909
Credit: 1,582,816
RAC: 0
United States
Message 651836 - Posted: 30 Sep 2007, 20:24:28 UTC - in response to Message 651542.  

but there was not 100% contact between the chip and heatsink


That would be very bad indeed, even if the gap were filled with thermal paste--even worse if not. Thanks for sharing.

ID: 651836
Sirius B
Volunteer tester
Joined: 26 Dec 00
Posts: 24870
Credit: 3,081,182
RAC: 7
Ireland
Message 652337 - Posted: 1 Oct 2007, 15:30:07 UTC - in response to Message 650270.  


System 2 has now been running continuously for 9 hours. Temps are:
33/38/38/94/23/44/54 (HD2).

Have 120mm Fan intake, 120mm exhaust, 80mm top cooling fan, 80mm side fan + 80mm intake fan on underside of psu.

No hiccups so far, in fact system running a heck of a lot smoother than yesterday with no unexpected shutdowns.

Could it be a faulty mb sensor?

Is it anything to worry about?


Just to let everyone know: the M/B is faulty. Had another system builder check it and he got 98% main board temp with HD 1 reading 261%.

Supplier sending out new board, should receive Wednesday/Thursday this week.
ID: 652337
michael37
Joined: 23 Jul 99
Posts: 311
Credit: 6,955,447
RAC: 0
United States
Message 653812 - Posted: 4 Oct 2007, 4:54:18 UTC - in response to Message 651432.  

A few days ago, it would post, but would not boot into the operating system. I took it apart, checked the CPU and found a crack on the substrate right next to the chip. I'd say it was definitely heat. I have not overclocked the chip, and I used a standard AMD cooler.


Ouch! Sounds like poor manufacturing (perhaps a bad batch). I still have an Athlon Classic running at 1.4GHz using a ceramic package that is working well 24/7.


Maybe :) I can't complain though, this CPU had been crunching Seti between 2002 and 2007.

ID: 653812
Osiris30
Joined: 19 Aug 07
Posts: 264
Credit: 41,917,631
RAC: 0
Barbados
Message 653814 - Posted: 4 Oct 2007, 5:16:32 UTC

I have an interesting heat problem at the moment... Namely the heat of the CPUs affecting the room they are in. We have a room segmented off for training at the office and with all the PCs in there crunching 24/7 it can get exceptionally warm when they turn down the climate control system on the weekends.

Luckily winter is on the way, but I'm not sure what I'll do come the summer (although hopefully by then I will have moved said CPUs, which are P4 dual cores, into the main office and replaced them with new C2Ds that should run a LOT cooler).

I wish BOINC had the ability to monitor a temp probe and shut down when the temp got above a certain point, because it must have been 110F in the room on Sat. I've asked building maintenance to leave the AC a little higher on the weekend, so we shall see; otherwise I think I'll have to turn the room off until the winter sets in properly :/

ID: 653814
Astro
Volunteer tester
Joined: 16 Apr 02
Posts: 8026
Credit: 600,015
RAC: 0
Message 653897 - Posted: 4 Oct 2007, 9:41:44 UTC
Last modified: 4 Oct 2007, 9:43:01 UTC

Osiris30, on the BOINC Manager click "advanced", then "preferences", then click on the "processor" tab. You might be able to stop BOINC just on the weekends. Perhaps just shutting down 1/4 of them on the weekend would help, or some mix.
ID: 653897
Mikael_Bjorkbom
Joined: 12 Jan 00
Posts: 6
Credit: 5,635,899
RAC: 6
Finland
Message 654046 - Posted: 4 Oct 2007, 16:26:23 UTC - in response to Message 653814.  


I wish BOINC had the ability to monitor a temp probe and shut down when the temp got above a certain point, because it must have been 110F in the room on Sat. I've asked building maintenance to leave the AC a little higher on the weekend, so we shall see; otherwise I think I'll have to turn the room off until the winter sets in properly :/



You can actually start and stop BOINC with an application called SpeedFan and boinccmd.exe. SpeedFan can be configured to launch applications at various events, such as when the CPU temperature rises past a certain threshold. With boinccmd you can turn on or suspend crunching.

Configure SpeedFan to launch "boinccmd --set_run_mode never" when the temperature rises too high and "boinccmd --set_run_mode auto" to resume crunching when the computer has cooled off.

I use this to avoid overheating in the summer. I want to take every precaution because my fan solution is a bit special. I have tried to minimize the noise, while still having enough cooling to crunch at full speed. But in the summer the ambient temperatures can be too much and then a few minutes of idle time resolves the situation. So this solution has worked very well for me.
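For anyone without SpeedFan, the same suspend/resume idea can be sketched as a small Python script. This is only a rough sketch under stated assumptions: the 70C/60C thresholds are made-up example values, `read_temp_c` is a hypothetical callable you would have to supply for your own sensor, and `boinccmd` is assumed to be on the PATH. The only BOINC commands used are the two quoted above.

```python
import subprocess
import time

SUSPEND_AT_C = 70.0  # assumed example threshold: suspend at or above this
RESUME_AT_C = 60.0   # assumed example threshold: resume at or below this


def next_mode(temp_c, current_mode, suspend_at=SUSPEND_AT_C, resume_at=RESUME_AT_C):
    """Return the boinccmd run mode ('never' or 'auto') for this reading.

    Between the two thresholds the current mode is kept, so crunching
    does not flap on and off around a single temperature.
    """
    if temp_c >= suspend_at:
        return "never"
    if temp_c <= resume_at:
        return "auto"
    return current_mode


def set_run_mode(mode):
    """Run 'boinccmd --set_run_mode <mode>' (assumes boinccmd is on PATH)."""
    subprocess.run(["boinccmd", "--set_run_mode", mode], check=True)


def watch(read_temp_c, interval_s=30.0, apply_mode=set_run_mode):
    """Poll the hypothetical read_temp_c() and switch modes when needed."""
    mode = "auto"
    while True:
        new_mode = next_mode(read_temp_c(), mode)
        if new_mode != mode:
            apply_mode(new_mode)
            mode = new_mode
        time.sleep(interval_s)
```

The gap between the two thresholds is a design choice: it gives the machine the few minutes of idle time described above before crunching resumes, rather than toggling every time the reading crosses one value.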
ID: 654046
Bill Walker
Joined: 4 Sep 99
Posts: 3868
Credit: 2,697,267
RAC: 0
Canada
Message 654580 - Posted: 5 Oct 2007, 15:09:49 UTC

At the risk of hijacking the thread: can somebody help a noob with information on "acceptable" CPU temperatures?

I've been running SETI and some other projects on a Toshiba Satellite A100 with a Pentium M dual core processor for over a year now. The exit air temperature seemed to be going up lately, so I downloaded the Intel TAT to look at the temps. The CPUs were running at 67 to 70C. After blowing out (but not disassembling) the air path, this dropped to 60 to 62C.

Is this cause for concern? Where can I find the manufacturer's recommended operating temps?

ID: 654580
Sirius B
Volunteer tester
Joined: 26 Dec 00
Posts: 24870
Credit: 3,081,182
RAC: 7
Ireland
Message 654593 - Posted: 5 Oct 2007, 15:39:16 UTC - in response to Message 654580.  

At the risk of hijacking the thread: can somebody help a noob with information on "acceptable" CPU temperatures?

I've been running SETI and some other projects on a Toshiba Satellite A100 with a Pentium M dual core processor for over a year now. The exit air temperature seemed to be going up lately, so I downloaded the Intel TAT to look at the temps. The CPUs were running at 67 to 70C. After blowing out (but not disassembling) the air path, this dropped to 60 to 62C.

Is this cause for concern? Where can I find the manufacturer's recommended operating temps?



As you've had it for a year, try stripping it down completely and rebuilding it, giving each part a thorough clean. Then re-apply fresh thermal paste. You should notice a significant difference.

Also make sure that you thoroughly clean each fan.

What I tend to do is to thoroughly clean the case, as dust can build up in places that are not noticed at a glance.

I've just recently installed a new motherboard on my main system & the amount of dust build up in the case was unbelievable.
ID: 654593
 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.