Strange BSoD message
| Author | Message |
|---|---|
Raistmer Joined: 16 Jun 01 Posts: 6242 Credit: 106,370,077 RAC: 275
|
After receiving so much good advice, I was prepared to work on this host until I either won or it died completely... but suddenly it started behaving well %). For a few weeks now it has run at full BOINC load without any BSoD, restart, or other failure. The only thing that changed: with spring coming, the room heaters were turned down. CPU temps (according to TThrottle) are at 71C, but everything is working well. So it was definitely not CPU overheating per se (CPU temps were even lower when the BSoDs happened). Perhaps chipset or memory overheating..... |
S@NL - eFMer - efmer.com/boinc Joined: 7 Jun 99 Posts: 512 Credit: 148,746,305 RAC: 0
|
It's also in the logging.... TThrottle Control your temperatures. BoincTasks The best way to view BOINC. Anza Borrego Desert hiking. |
BilBg Joined: 27 May 07 Posts: 3720 Credit: 9,385,827 RAC: 0
|
Wow, very fast response! :) Suggestion: also add this info to the Expert tab, e.g. to the right of the [Set to default] button, such as (if the CPU supports reporting TjMax): "The default TjMax is XX°C as read from CPU register (MSR)" or (if the CPU does not report TjMax): "The default TjMax is XX°C as found in TThrottle tables" - ALF - "Find out what you don't do well ..... then don't do it!" :) |
SciManStev Joined: 20 Jun 99 Posts: 6553 Credit: 121,090,076 RAC: 0
|
Thank you Fred! That was a very quick response. Steve Warning, addicted to SETI crunching! Crunching as a member of GPU Users Group. GPUUG Website |
S@NL - eFMer - efmer.com/boinc Joined: 7 Jun 99 Posts: 512 Credit: 148,746,305 RAC: 0
|
A new version of TThrottle now reads the TJunction from the CPU. http://www.efmer.eu/boinc/download_beta.html I ruled out all CPUs before Family 6, but there is still a small chance the driver crashes on some older CPUs. You should see the line: TJunction read from CPU: 100 °C, TJunction using: 100 °C in the logging. When there is no such line, the value is either not available or not valid. |
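A minimal sketch of the selection logic described above, for readers who want to see it concretely. This is not TThrottle's code: the function names and the 60-130 °C sanity range are hypothetical. It assumes Intel's documented layout of MSR_TEMPERATURE_TARGET (0x1A2), where bits 23:16 hold TjMax in °C on Nehalem and later, and that a driver has already read the raw register value. As an extra assumption it only trusts Intel family 6 parts; a real driver must also avoid reading the register on older family 6 CPUs that lack it, which is the crash risk mentioned in the post.

```python
from typing import Optional, Tuple

# Hypothetical sketch, NOT TThrottle's implementation.
# Assumes Intel's documented MSR_TEMPERATURE_TARGET layout: bits 23:16 = TjMax in °C.
MSR_TEMPERATURE_TARGET = 0x1A2  # register address, shown for reference only


def tjmax_from_msr(raw: int) -> int:
    """Extract the TjMax field (bits 23:16) from a raw MSR_TEMPERATURE_TARGET value."""
    return (raw >> 16) & 0xFF


def choose_tjunction(vendor: str, family: int, raw_msr: Optional[int],
                     table_default: int) -> Tuple[int, Optional[str]]:
    """Return (tjunction_used, log_line); log_line is None when no CPU value is used."""
    # Assumption: only trust the register on Intel family 6 CPUs.
    if vendor == "GenuineIntel" and family == 6 and raw_msr is not None:
        tj = tjmax_from_msr(raw_msr)
        # Hypothetical sanity range to reject obviously bogus readings.
        if 60 <= tj <= 130:
            line = f"TJunction read from CPU: {tj} °C, TJunction using: {tj} °C"
            return tj, line
    # No usable CPU value: fall back to the built-in table and log nothing.
    return table_default, None


if __name__ == "__main__":
    print(choose_tjunction("GenuineIntel", 6, 0x00641400, 100))  # raw value is made up
    print(choose_tjunction("AuthenticAMD", 15, None, 70))
```

Returning no log line in the fallback case mirrors the post: if the line is missing from the logging, the CPU value was not available or not valid.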
kittyman Joined: 9 Jul 00 Posts: 50494 Credit: 1,018,363,574 RAC: 2,276
|
I dunno....... I clock 'em until they don't run anymore. Period. I have NEVER, and I repeat, NEVER....burned up a CPU. Bunch of motherboards, a gaggle of PSUs, a stick of RAM here and there, twenty or so cooling fans, and a power burn of several power strips that caused me to run a 50 amp 240v dedicated service line to the crunching den. But I have never burned up a CPU. The only CPU failure I ever suffered was the original Frozen Nehi......and that was due to condensation accumulating under the chippy in the socket and corroding the gold lands off the bottom of the chip. Even the old AMD Semprons I played with in the old days......and those little toasters ran HOT!! It's hard to kill a CPU, folks. Really hard. They just shut down if they are unhappy. I am still running the original C2D chippy that I first bought when they came out. And now it is hosting 2 GTX295s....LOL. 1.525 core volts. For years now. It is happy there. I am not gonna wake it out of happy land. Good cooling is all you need. Toss the stock cooler out the window as soon as the package arrives in the mail. Just don't hit the kitties with it on the way out. Buy yourself a decent aftermarket cooler and a little bit of almost any thermal paste, and you are good to go. Since these days the GPUs do most of the work and the CPU is just the host, even overclocking the CPU is not much of an issue......take this from me, I was the master of it. Nowadays, I just back the CPU off for platform stability and let the GPUs take over. I would never again undertake subzero cooling of a CPU....once the compressor fails on the Frozen 920...that is it. No use in doing that anymore. Some others are doing 4.1GHz on air cooling anyway. Enough from me for now...LOL. Meeow. "Learn from yesterday. Live for today. Hope for tomorrow." Albert Einstein "With cats." kittyman
|
HAL9000 Joined: 11 Sep 99 Posts: 6533 Credit: 196,805,888 RAC: 130
|
"This thread has deviated slightly from the original problem into a CPU temp measuring problem, but all the info posted here is VERY interesting and helpful; big thanks to everyone who participated!" Yes, I think we got off track trying to find out whether it was a thermal issue. Any improvement, or can you still make the BSOD happen? I'm thinking maybe a memory issue, which I had once. Have you tried running with only 1 stick in at a time and trying to cause the BSOD? SETI@home classic workunits: 93,865 CPU time: 863,447 hours |
Raistmer Joined: 16 Jun 01 Posts: 6242 Credit: 106,370,077 RAC: 275
|
This thread has deviated slightly from the original problem into a CPU temp measuring problem, but all the info posted here is VERY interesting and helpful; big thanks to everyone who participated! |
BilBg Joined: 27 May 07 Posts: 3720 Credit: 9,385,827 RAC: 0
|
"At 67°C on the LCD ..." That is almost the maximum CPU case/cover temp (if that is what the LCD shows), so don't go that high (on the LCD). (To check what the LCD shows, use SIV - System Information Viewer (.zip, no install needed) http://www.rh-software.com/ As you can see in my screenshot, SIV shows both the individual core temps and the CPU case temp. Try this (free) program; it gives very comprehensive info about everything in the computer. On my single-core CPU it uses ~20-30 sec of CPU time per day. (I run it with the "-Tray" command-line switch so it is always running; I put a shortcut to it in the Startup folder.) The .zip is 3.5 MB.) But if the TjMax of this CPU really is 101°C, you can go as high as 80-85°C core temperature (not on the LCD) and the CPU will still work. The values in the RealTemp window that are definitely true/real are the "Distance to TJ Max" values: they do not depend on whether the program (RealTemp) knows what the TjMax of this CPU really is. ("Distance to TJ Max" is the raw value that all programs read from the CPU registers.) Keep "Distance to TJ Max" above 15-20 and you are good (the bigger the distance, the better). |
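The rule of thumb above can be written directly in terms of the raw "Distance to TJ Max" reading. A minimal sketch, assuming the 15 and 20 °C thresholds from the post; the function names and the three labels (including treating 15-19 °C as "borderline") are made up for illustration.

```python
def assess_distance(distance_to_tjmax: int) -> str:
    """Classify a raw 'Distance to TJ Max' reading (°C below TjMax).

    Assumed thresholds, from the 'keep it above 15-20' rule of thumb.
    """
    if distance_to_tjmax >= 20:
        return "good"
    if distance_to_tjmax >= 15:
        return "borderline"
    return "too hot"


def core_temp(tjmax: int, distance_to_tjmax: int) -> int:
    """Only needed for display: the verdict above does not depend on TjMax."""
    return tjmax - distance_to_tjmax


if __name__ == "__main__":
    # With TjMax = 101 °C, core temps of 80-85 °C mean distances of 21-16 °C.
    for core in (80, 85):
        d = 101 - core
        print(core, d, assess_distance(d))
```

Because the check uses only the distance, a wrong TjMax guess changes the displayed temperature but not the verdict.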
BilBg Joined: 27 May 07 Posts: 3720 Credit: 9,385,827 RAC: 0
|
"I just installed RealTemp, and it reported the Tjunction at 101C. I set TThrottle and the core temps line up with RealTemp. I am getting core temps from 45C to 55C, and the LCD display is reading 51C. Unless I can find a core spec, I'll leave the back throttle at 70C, and the shutdown at 75C. The last time I did this, things got way out of whack as my core temp rose. Tomorrow, when I have more time, I will experiment and raise my core temp, and make sure it back throttles as it did with the Tjunction set at 80C." Now you are better off: you have set TThrottle to start acting when the CPU is 31°C (101-70) below the max (no matter what exactly this "max" is). The Intel CPU just tells TThrottle: a) "I am 46°C from the max": TThrottle does nothing to tame the apps; b) "I am 30°C from the max": TThrottle starts to throttle the apps. (An Intel CPU never says "My temp is XX°C"; it always says "I am YY°C from the max". From the RealTemp site: "Each core on these processors has a digital thermal sensor (DTS) that reports temperature data relative to TJMax which is the safe maximum operating core temperature for the CPU." From the CoreTemp site: "A different MSR contains the temperature data. The data is represented as a Delta in °C between current temperature and Tjunction. So the actual temperature is calculated like this 'Core Temp = Tjunction - Delta'") If you need a change, move the 70°C "Set Core" value up or down, not the TjMax. Edit: For a correct display of temps, it would be good to double-check the TjMax with CoreTemp (in case RealTemp just guesses it and does not read it from the CPU (MSR)): http://www.alcpu.com/CoreTemp/ (There are also .zip versions (32 & 64 bit) of CoreTemp that need no install; the installer contains both the 32 & 64 bit versions.) |
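The arithmetic in this post can be checked with a few lines. A minimal sketch of the Core Temp = Tjunction - Delta relation quoted from the CoreTemp site, plus a throttle test using the numbers above (TjMax 101 °C, Set Core 70 °C). The thread does not say whether TThrottle compares with > or >=; this sketch assumes throttling starts only once the computed core temp is above Set Core, and the function names are hypothetical.

```python
def core_temp(tjunction: int, delta: int) -> int:
    """Core Temp = Tjunction - Delta, where Delta is what the Intel DTS reports."""
    return tjunction - delta


def should_throttle(delta: int, tjunction: int, set_core: int) -> bool:
    """Assumed rule: throttle once the computed core temp exceeds Set Core.

    Equivalent to delta < tjunction - set_core, i.e. fewer than 31 °C of
    headroom when tjunction=101 and set_core=70.
    """
    return core_temp(tjunction, delta) > set_core


if __name__ == "__main__":
    # Case a) 46 °C from the max, case b) 30 °C from the max.
    for delta in (46, 30):
        print(delta, core_temp(101, delta), should_throttle(delta, 101, 70))
```

Case a) gives a core temp of 55 °C and no throttling; case b) gives 71 °C and throttling, matching the post.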
SciManStev Joined: 20 Jun 99 Posts: 6553 Credit: 121,090,076 RAC: 0
|
That's interesting. I don't remember the numbers from the last time I cranked up the temps, but they skewed a great deal from what TThrottle reported, causing my CPU to back throttle when my LCD reported a temp of about 60C. That's what led me to synchronize the two readings at hot temps so that I had something solid to go on. At 67C on the LCD with the Tjunction set at 80C, TThrottle was reading 67C, and back throttled as I went up in temperature. I am anxious to retest this tomorrow. Right now, everything looks like it is tracking the way I would have expected before things went wacky last time. Even then, things looked fine until I increased the temperature. |
BilBg Joined: 27 May 07 Posts: 3720 Credit: 9,385,827 RAC: 0
|
Strangely enough, on my AMD64 the internal core temp has always been shown 1-3 degrees lower than the external CPU case temp!? (A few years ago I asked the Everest people (on their forum) whether it was possible that AMD uses some kind of Peltier element inside the CPU cover (so the cover is hotter than the inside). They said no, it is just inaccuracy of the sensors.) |
SciManStev Joined: 20 Jun 99 Posts: 6553 Credit: 121,090,076 RAC: 0
|
I just installed RealTemp, and it reported the Tjunction at 101C. I set TThrottle and the core temps line up with RealTemp. I am getting core temps from 45C to 55C, and the LCD display is reading 51C. Unless I can find a core spec, I'll leave the back throttle at 70C, and the shutdown at 75C. The last time I did this, things got way out of whack as my core temp rose. Tomorrow, when I have more time, I will experiment and raise my core temp, and make sure it back throttles as it did with the Tjunction set at 80C. |
HAL9000 Joined: 11 Sep 99 Posts: 6533 Credit: 196,805,888 RAC: 130
|
I think Max Case +5°C might be too low. Take, for example, an AMD Athlon 64 X2 5400+: the CPU case seems to be 55-59°C, with core temps about 20°C higher. |
SciManStev Joined: 20 Jun 99 Posts: 6553 Credit: 121,090,076 RAC: 0
|
Good enough, I will download and install that software and use that junction temp. The trick then becomes: what is the core temp spec? |
BilBg Joined: 27 May 07 Posts: 3720 Credit: 9,385,827 RAC: 0
|
"In the later generation of processors, starting with Nehalem, the exact Tjunction Max value is available for software to read in an MSR (short for Model Specific Register)." CoreTemp and RealTemp read this Tjunction Max value from the CPU itself (so use their value). http://setiathome.berkeley.edu/forum_thread.php?id=59292 As Fred says (about TThrottle not yet using this new feature): "A) MSR containing Tjunction Max (one constant value for the given CPU)" "No, but I will try to implement it in one of the new releases." P.S. @Fred: maybe you should implement direct setting of the delta/offset to TjMax, e.g. "Keep CPU temperature 20°C under the Max allowed". This would be foolproof (for Intel CPUs), as it is the exact value you read from the CPU core. |
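To make the P.S. concrete: on Intel CPUs the per-core offset such a setting would compare against is reported in IA32_THERM_STATUS (0x19C), where, per Intel's documentation, bits 22:16 are the digital readout (°C below TjMax) and bit 31 flags the reading as valid. A minimal sketch of decoding that raw value and applying a "keep N °C under the max" rule, assuming the register has already been read (on Linux, for example, through /dev/cpu/N/msr with the msr module loaded and root rights; not shown). The function names and the keep_under parameter are hypothetical, and this is not how TThrottle works.

```python
from typing import Optional, Tuple

IA32_THERM_STATUS = 0x19C  # register address, shown for reference only


def decode_therm_status(raw: int) -> Tuple[bool, int]:
    """Return (reading_valid, degrees_below_tjmax) from a raw IA32_THERM_STATUS value."""
    valid = bool((raw >> 31) & 1)  # bit 31: Reading Valid
    delta = (raw >> 16) & 0x7F     # bits 22:16: Digital Readout
    return valid, delta


def over_limit(raw: int, keep_under: int) -> Optional[bool]:
    """True if the core is closer than keep_under °C to TjMax.

    Returns None when the reading is flagged invalid. TjMax itself is
    never needed for this decision.
    """
    valid, delta = decode_therm_status(raw)
    if not valid:
        return None
    return delta < keep_under


if __name__ == "__main__":
    sample = (1 << 31) | (25 << 16)  # made-up raw value: valid, 25 °C below TjMax
    print(decode_therm_status(sample), over_limit(sample, keep_under=20))
```

The design point is the one made in the post: the decision uses only the value the CPU reports, so a wrong TjMax cannot move the protection point.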
SciManStev Joined: 20 Jun 99 Posts: 6553 Credit: 121,090,076 RAC: 0
|
I did read everything, and I am willing to change it. The question becomes what to set the Tjunction to; the default is 100C, and I can then check what core temp I should set. I haven't seen anything in the Intel specs about what it should be. I do understand that I am measuring two different things, but I only have a spec for one of them. |
BilBg Joined: 27 May 07 Posts: 3720 Credit: 9,385,827 RAC: 0
|
Did you read my first post on the subject? http://setiathome.berkeley.edu/forum_thread.php?id=63650&nowrap=true#1094579 And the link in it ("How does it work?"): http://www.alcpu.com/CoreTemp/howitworks.html If you don't believe me, wait for Fred to confirm (or PM him): TThrottle will act to protect the CPU at exactly the same point (namely 10°C below the Tjunction Max) even with TjMax=180 and "Set Core" 170°C. There is no difference in when TThrottle's protective action starts (it will be at the same real core temp). Only the reading will be (very) wrong (e.g. TThrottle will show something like 160°C and will not act). Wanting the LCD and the programs that read CPU core temps to show the same value is like wanting your 2 thermometers in winter, one outside on the balcony and one in the room near the stove, to show the same value (would you call that a "comfortable" tweak? ;) ) Log from TThrottle on AMD Athlon(tm) 64 Processor 3500+ (on Intel it may be different!): 09 April 2011 - 01:48:24 Driver installed properly. Driver Version: 2.0 Program version: 2.10 32Bit Microsoft Windows XP Professional Service Pack 3 (build 2600) Language: User: 1026 BGR ,System: 1026 BGR nvidia: found 1 logical devices nvidia: found 1 physical devices nvidia: GeForce 6150SE nForce 430 Vendor ID: AuthenticAMD Vendor: AMD HighestIntegerValue: 00000001 - Processor Signature: 00050FF2 Misc. info: 00000800 Feature Flags1 00002001 Feature Flags2 078BFBFF Processor: AMD Athlon(tm) 64 Processor 3500+ Processor: Family: Fh, Model: 5F, Stepping: 2 Processor: Revision: DH-F2, Revision: F Processor: Socket: AM2, Type: Athlon(tm) 64 Processor: Max Die (Tjunction) Temperature: 70.0 °C The real Max Die (Tjunction) temperature is not this value This value is for calculating the temperature ONLY Max Die (Tjunction) is normally about Max Case + 5C Processor: Max Case Temperature: 70.0 °C, Max Power: 0.0 W Core Temperature: 47 °C, Raw Data: 602220 602220,602220,602220,602220,602220,602220,602220,602220,602220,602220,602220,602220,602220,602220,602220,602220,0ffffffffffffffffffffffff000000000000 This Processor has 1 cores and 1 temperature sensors. BOINC: orbit.psi.edu_oah setiathome.berkeley.edu You can help by reading www.efmer.eu/boinc/faq.html How can I help! Select the send EMail button,or copy everything in this logging window and mail it to me! boinc [~ at ~] efmer .eu - We use this information to improve this product. 09 April 2011 - 01:48:34 Number of matching Programs (Processes): 1 Cpu: ak_v8b_win_sse3_amd.exe, PID: 2600, Threads: 3 |
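The TjMax=180 / Set Core 170 argument can be checked numerically. A minimal sketch, assuming the displayed temperature is TjMax minus the delta the CPU reports and that protective action starts once it exceeds Set Core (the exact comparison is not stated in the thread). The baseline pair TjMax=100 / Set Core=90 and the function names are chosen for illustration only.

```python
def displayed_temp(tjmax: int, delta: int) -> int:
    """What the program shows: assumed TjMax minus the delta reported by the CPU."""
    return tjmax - delta


def acts(tjmax: int, set_core: int, delta: int) -> bool:
    """Assumed rule: protective action once the displayed temp exceeds Set Core."""
    return displayed_temp(tjmax, delta) > set_core


def same_trigger(pair_a, pair_b, deltas=range(0, 101)) -> bool:
    """True if two (tjmax, set_core) settings act on exactly the same deltas."""
    return all(acts(*pair_a, d) == acts(*pair_b, d) for d in deltas)


if __name__ == "__main__":
    print(same_trigger((100, 90), (180, 170)))               # same protection point
    print(displayed_temp(100, 20), displayed_temp(180, 20))  # but the readings differ
```

Shifting TjMax and Set Core by the same amount leaves the trigger unchanged; only the displayed number moves, which is exactly the "wrong reading, same protection" point made above.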
SciManStev Joined: 20 Jun 99 Posts: 6553 Credit: 121,090,076 RAC: 0
|
"The kitties say......" I haven't looked lately, but that CPU cost me $1100 when it first came out, and I really don't want to destroy it. It is very fast though! It does anything I throw at it with ease. Even on Einstein I could run it with hyperthreading on and at a higher clock speed, but the GPU usage was lower, so the overall current drain was lower. I am really beginning to think that this new Corsair Gold 1200 watt supply is going to allow me to use hyperthreading, as well as a faster clock for both the CPU and the GPUs. I can't wait to experiment! |
kittyman Joined: 9 Jul 00 Posts: 50494 Credit: 1,018,363,574 RAC: 2,276
|
The kitties say...... "Run 'em 'til they smoke." Then you have gone a bridge too far.
|
©2020 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.