Strange BSoD message

Message boards : Number crunching : Strange BSoD message
Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6242
Credit: 106,370,077
RAC: 275
Russia
Message 1107234 - Posted: 18 May 2011, 8:51:23 UTC

After receiving so much good advice, I was prepared to keep repairing this host until victory or its complete death... but suddenly it started to behave well %). It has been a few weeks already, at full BOINC load, without any BSoD, restart, or other failure.

The only condition that changed: with spring coming, the heaters in the room have cooled down. CPU temps (according to TThrottle) are at 71C, but everything is working well.

So it was definitely not CPU overheating per se (CPU temps were even lower when the BSoDs occurred). Perhaps it was chipset or memory overheating.....
ID: 1107234
Profile S@NL - eFMer - efmer.com/boinc
Volunteer tester
Joined: 7 Jun 99
Posts: 512
Credit: 148,746,305
RAC: 0
United States
Message 1095494 - Posted: 9 Apr 2011, 23:08:44 UTC - in response to Message 1095489.  


or:
"The default TjMax is XX°C
as found in TThrottle tables"

It's also in the logging....

TThrottle Control your temperatures. BoincTasks The best way to view BOINC. Anza Borrego Desert hiking.
ID: 1095494
Profile BilBg
Volunteer tester
Joined: 27 May 07
Posts: 3720
Credit: 9,385,827
RAC: 0
Bulgaria
Message 1095489 - Posted: 9 Apr 2011, 23:05:46 UTC - in response to Message 1095346.  
Last modified: 9 Apr 2011, 23:18:39 UTC


Wow, very fast response! :)

Suggestion: add this info also to Expert tab
e.g. to the right of [Set to default] button

e.g. (if CPU supports reporting of TjMax):
"The default TjMax is XX°C
as read from CPU register (MSR)"

or (if the CPU does not report TjMax):
"The default TjMax is XX°C
as found in TThrottle tables"


 


- ALF - "Find out what you don't do well ..... then don't do it!" :)
 
ID: 1095489
Profile SciManStev (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 20 Jun 99
Posts: 6553
Credit: 121,090,076
RAC: 0
United States
Message 1095381 - Posted: 9 Apr 2011, 19:43:39 UTC

Thank you Fred! That was a very quick response.

Steve
Warning, addicted to SETI crunching!
Crunching as a member of GPU Users Group.
GPUUG Website
ID: 1095381
Profile S@NL - eFMer - efmer.com/boinc
Volunteer tester
Joined: 7 Jun 99
Posts: 512
Credit: 148,746,305
RAC: 0
United States
Message 1095346 - Posted: 9 Apr 2011, 18:49:57 UTC - in response to Message 1095295.  

A new version of TThrottle now reads the TJunction from the CPU.

http://www.efmer.eu/boinc/download_beta.html
I ruled out all CPUs before Family 6, but there is still a small chance that the driver crashes on some older CPUs.

You should see the line:
TJunction read from CPU: 100 °C, TJunction using: 100 °C
in the logging.
If there is no such line, the value isn't available or isn't valid.
TThrottle Control your temperatures. BoincTasks The best way to view BOINC. Anza Borrego Desert hiking.
ID: 1095346
kittyman (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 9 Jul 00
Posts: 50494
Credit: 1,018,363,574
RAC: 2,276
United States
Message 1095295 - Posted: 9 Apr 2011, 16:20:29 UTC

I dunno.......

I clock 'em until they don't run anymore.

Period.

I have NEVER, and I repeat, NEVER....burned up a CPU.

Bunch of motherboards, a gaggle of PSUs, stick of RAM here and there, twenty or so cooling fans, and a power burn of several power strips that caused me to run a 50 amp 240v dedicated service line to the crunching den.

But I have never burned up a CPU.

The only CPU failure I ever suffered was the original Frozen Nehi......and that was due to condensation accumulating under the chippy in the socket and corroding the gold lands off of the bottom of the chip.

Even the old AMD Semprons I played with in the old days......and those little toasters ran HOT!!

It's hard to kill a CPU folks. Really hard. They just shut down if they are unhappy.

I am still running the original C2D chippy that I first bought when they came out. And now it is hosting 2 GTX295s....LOL.

1.525 core volts. For years now. It is happy there. I am not gonna wake it out of happy land.

Good cooling is all you need. Toss the stock cooler out the window as soon as the package arrives in the mail. Just don't hit the kitties with it on the way out. Buy yourself a decent aftermarket cooler, a little bit of most any thermal paste, and you are good to go.

Since, these days, the GPUs do most of the work, and the CPU is just the host, even overclocking the CPU is not much of an issue......take this from me, I was the master of it. Nowadays, I just back the CPU off for platform stability and let the GPUs take over.

I would never again undertake subzero cooling of a CPU....once the compressor fails on the Frozen 920...that is it. No use in doing that anymore. Some others are doing 4.1Ghz on air cooling anyway.

Enough from me for now...LOL.

Meeow.
"Learn from yesterday. Live for today. Hope for tomorrow." Albert Einstein
"With cats." kittyman

ID: 1095295
Profile HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6533
Credit: 196,805,888
RAC: 130
United States
Message 1095290 - Posted: 9 Apr 2011, 15:50:37 UTC - in response to Message 1095217.  

This thread has drifted slightly from the original problem to the problem of measuring CPU temperature, but all the info posted is VERY interesting and helpful. Big thanks to all who participated!

Yes, I think we got off track while trying to find out whether it was a thermal issue.

Any improvement, or can you still make the BSOD happen?

I'm thinking it may be a memory issue, which I had once. Have you tried running with only 1 stick in at a time and trying to cause the BSOD?
SETI@home classic workunits: 93,865 CPU time: 863,447 hours
Join the BP6/VP6 User Group today!
ID: 1095290
Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6242
Credit: 106,370,077
RAC: 275
Russia
Message 1095217 - Posted: 9 Apr 2011, 8:54:34 UTC

This thread has drifted slightly from the original problem to the problem of measuring CPU temperature, but all the info posted is VERY interesting and helpful. Big thanks to all who participated!
ID: 1095217
Profile BilBg
Volunteer tester
Joined: 27 May 07
Posts: 3720
Credit: 9,385,827
RAC: 0
Bulgaria
Message 1095158 - Posted: 9 Apr 2011, 4:55:07 UTC - in response to Message 1095088.  
Last modified: 9 Apr 2011, 5:24:39 UTC

At 67°C on the LCD ...

That is almost the maximum CPU case/cover temp (if LCD shows this)
- don't go so high (on LCD)

( To check what LCD shows use SIV - System Information Viewer (.zip - no install needed)
http://www.rh-software.com/

As you see on my screenshot SIV shows both: individual-cores-temps and CPU case temp.
Try this (free) program - it gives very comprehensive info about everything in the computer.
On my one-core CPU it uses ~20-30 sec. CPU time per day.
(I use it with "-Tray" command-line switch = always running; I put shortcut to it in Startup folder)
Size of the .zip is 3.5 MB
)


But if the TjMax of this CPU is really 101°C, you can go as high as 80-85°C on the CPU core temperature (not on the LCD) and the CPU will still work.

The values in the RealTemp window that are certainly true/real
are "Distance to TJ Max" - they do not depend on whether the program (RealTemp) knows what the TjMax of this CPU really is.
("Distance to TJ Max" is the raw value which all programs read from the CPU registers)

Keep "Distance to TJ Max" over 15-20 and you are good (the bigger the Distance, the better).
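
The rule of thumb above can be written as a tiny check. This is only a sketch: the function name, the 15°C default, and the sample readings are made up for illustration, and the distance values are assumed to have already been read by a tool such as RealTemp.

```python
def margin_ok(distance_to_tjmax_c: int, min_distance_c: int = 15) -> bool:
    """Return True while the core is still more than min_distance_c below TjMax."""
    return distance_to_tjmax_c > min_distance_c


# Hypothetical per-core "Distance to TJ Max" readings (not real measurements).
readings = {"core0": 46, "core1": 30, "core2": 12}

for core, distance in readings.items():
    status = "ok" if margin_ok(distance) else "too close to TjMax"
    print(f"{core}: {distance} from TjMax -> {status}")
```

Because the check uses only the distance, it gives the same answer whether or not the program knows the true TjMax.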


 


- ALF - "Find out what you don't do well ..... then don't do it!" :)
 
ID: 1095158
Profile BilBg
Volunteer tester
Avatar

Send message
Joined: 27 May 07
Posts: 3720
Credit: 9,385,827
RAC: 0
Bulgaria
Message 1095089 - Posted: 9 Apr 2011, 3:16:01 UTC - in response to Message 1095030.  
Last modified: 9 Apr 2011, 4:01:05 UTC

I just installed real temp, and it reported the Tjunction at 101C. I set TThrottle and the core temps line up with Real temp. I am getting core temps from 45C to 55C, and the LCD display is reading 51C. Unless I can find a core spec, I'll leave the back throttle at 70C, and the shutdown at 75C. The last time I did this, things got way out of whack as my core temp rose. Tomorrow, when I have more time, I will experiment and raise my core temp, and make sure it back throttles as it did with the Tjunction set at 80C.

Steve


Now you are better off - you have set TThrottle to start acting when the CPU is 31°C (101-70) from the max (no matter what exactly this "max" is)
- the Intel CPU just says to TThrottle:
a) "I am 46°C from the max" - TThrottle does nothing to tame the apps
b) "I am 30°C from the max" - TThrottle starts to throttle the apps

(An Intel CPU never says "My temp is XX°C", it always says "I am YY°C from the max"

From the RealTemp site:
"Each core on these processors has a digital thermal sensor (DTS) that reports temperature data relative to TJMax which is the safe maximum operating core temperature for the CPU."

From the CoreTemp site:
"A different MSR contains the temperature data. The data is represented as a Delta in °C between current temperature and Tjunction.
So the actual temperature is calculated like this 'Core Temp = Tjunction - Delta'"
)


If you need a change, move the 70°C "Set Core" value up or down, not the TjMax
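
The arithmetic in this post can be sketched in a few lines. This is an illustration under stated assumptions only: the function names are invented for the example, the `>=` comparison is a guess at how a throttling tool might compare the values, and none of it is TThrottle's actual code. It uses the CoreTemp formula quoted above (Core Temp = Tjunction - Delta) with TjMax 101°C and "Set Core" 70°C.

```python
def core_temp(tjmax_c: float, delta_c: float) -> float:
    """Core Temp = Tjunction - Delta, as quoted from the CoreTemp site."""
    return tjmax_c - delta_c


def should_throttle(tjmax_c: float, delta_c: float, set_core_c: float) -> bool:
    """Assumed rule: throttle once the computed core temp reaches Set Core."""
    return core_temp(tjmax_c, delta_c) >= set_core_c


TJMAX_C = 101.0    # value RealTemp reported in this thread
SET_CORE_C = 70.0  # "Set Core" value from this thread

# Deltas the CPU might report ("I am YY°C from the max").
for delta in (46, 31, 30):
    temp = core_temp(TJMAX_C, delta)
    action = "throttle" if should_throttle(TJMAX_C, delta, SET_CORE_C) else "do nothing"
    print(f"delta={delta}: core temp {temp:.0f}°C -> {action}")
```

With these numbers a delta of 46 leaves the apps alone, while a delta of 31 or less reaches the 70°C set point.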


Edit:
For correct display of temps it would be good to double-check
the TjMax with CoreTemp (in case RealTemp is just guessing and does not read it from the CPU (MSR))
http://www.alcpu.com/CoreTemp/

(There are also .zip versions (32 & 64 bit) of CoreTemp - no install needed (the installer contains both 32 & 64 bit versions))


 


- ALF - "Find out what you don't do well ..... then don't do it!" :)
 
ID: 1095089
Profile SciManStev (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 20 Jun 99
Posts: 6553
Credit: 121,090,076
RAC: 0
United States
Message 1095088 - Posted: 9 Apr 2011, 3:15:23 UTC

That's interesting. I don't remember the numbers the last time I cranked up the temps, but they skewed a great deal from what TThrottle reported, causing my CPU to back throttle when my LCD reported a temp of about 60C. That's what led me to synchronize the two readings at hot temps so that I had something solid to go on. At 67C on the LCD with the Tjunction set at 80C, TThrottle was reading 67C, and back throttled as I went up in temperature. I am anxious to retest this tomorrow. Right now, it looks like everything is tracking as I would have expected it to, before my experience of things going wacky. Even before, things looked fine until I increased temperature.

Steve
Warning, addicted to SETI crunching!
Crunching as a member of GPU Users Group.
GPUUG Website
ID: 1095088
Profile BilBg
Volunteer tester
Joined: 27 May 07
Posts: 3720
Credit: 9,385,827
RAC: 0
Bulgaria
Message 1095052 - Posted: 9 Apr 2011, 2:53:30 UTC - in response to Message 1095028.  


Strangely enough, on my AMD64 the internal core temp has always been shown 1-3 degrees lower than the external CPU case temp!?
(a few years ago I contacted the Everest people (on their forum) to ask whether it is possible that AMD uses some Peltier inside the CPU cover (so that the cover is hotter than the inside).
They said: No, it is just inaccuracy of the sensors)









 


- ALF - "Find out what you don't do well ..... then don't do it!" :)
 
ID: 1095052
Profile SciManStev (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 20 Jun 99
Posts: 6553
Credit: 121,090,076
RAC: 0
United States
Message 1095030 - Posted: 9 Apr 2011, 2:26:42 UTC
Last modified: 9 Apr 2011, 2:27:38 UTC

I just installed real temp, and it reported the Tjunction at 101C. I set TThrottle and the core temps line up with Real temp. I am getting core temps from 45C to 55C, and the LCD display is reading 51C. Unless I can find a core spec, I'll leave the back throttle at 70C, and the shutdown at 75C. The last time I did this, things got way out of whack as my core temp rose. Tomorrow, when I have more time, I will experiment and raise my core temp, and make sure it back throttles as it did with the Tjunction set at 80C.

Steve
Warning, addicted to SETI crunching!
Crunching as a member of GPU Users Group.
GPUUG Website
ID: 1095030
Profile HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6533
Credit: 196,805,888
RAC: 130
United States
Message 1095028 - Posted: 9 Apr 2011, 2:13:29 UTC - in response to Message 1095011.  


Processor: AMD Athlon(tm) 64 Processor 3500+
Processor: Family: Fh, Model: 5F, Stepping: 2
Processor: Revision: DH-F2, Revision: F
Processor: Socket: AM2, Type: Athlon(tm) 64
Processor: Max Die (Tjunction) Temperature: 70.0 °C
The real Max Die (Tjunction) temperature is not this value
This value is for calculating the temperature ONLY
Max Die (Tjunction) is normally about Max Case + 5C
Processor: Max Case Temperature: 70.0 °C, Max Power: 0.0 W


I think Max Case + 5°C might be too low. For example, on an AMD Athlon X2 64 5400+ the CPU case temperature seems to be 55-59°C, with core temps about 20°C higher.

SETI@home classic workunits: 93,865 CPU time: 863,447 hours
Join the BP6/VP6 User Group today!
ID: 1095028
Profile SciManStev (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 20 Jun 99
Posts: 6553
Credit: 121,090,076
RAC: 0
United States
Message 1095023 - Posted: 9 Apr 2011, 2:05:32 UTC

Good enough, I will download and install that software and use that junction temp. The trick then becomes what is the core temp spec?

Steve
Warning, addicted to SETI crunching!
Crunching as a member of GPU Users Group.
GPUUG Website
ID: 1095023
Profile BilBg
Volunteer tester
Joined: 27 May 07
Posts: 3720
Credit: 9,385,827
RAC: 0
Bulgaria
Message 1095021 - Posted: 9 Apr 2011, 2:02:47 UTC - in response to Message 1095016.  
Last modified: 9 Apr 2011, 2:13:51 UTC


"In the later generation of processors, starting with Nehalem, the exact Tjunction Max value is available for software to read in an MSR (short for Model Specific Register)."

CoreTemp and RealTemp read this Tjunction Max value from the CPU itself (use their value).
http://setiathome.berkeley.edu/forum_thread.php?id=59292


As Fred says (about TThrottle not yet using this new feature):
A) MSR containing Tjunction Max (one constant value for the given CPU)

No, but I will try to implement it in one of the new releases.


P.S.
@Fred - maybe you should implement direct setting of Delta/offset to TjMax
e.g. "Keep CPU temperature 20°C under the Max allowed"
- this will be foolproof (for Intel CPUs) as this is the exact value you read from the CPU core.


 


- ALF - "Find out what you don't do well ..... then don't do it!" :)
 
ID: 1095021
Profile SciManStev (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 20 Jun 99
Posts: 6553
Credit: 121,090,076
RAC: 0
United States
Message 1095016 - Posted: 9 Apr 2011, 1:49:21 UTC

I did read everything, and I am willing to change it. The question becomes what I should set the Tjunction to (the default is 100C); I can then check to see what core temp I should set. I haven't seen anything in the Intel specs as to what it should be. I do understand that I am measuring two different things, but I only have a spec for one of them.

Steve
Warning, addicted to SETI crunching!
Crunching as a member of GPU Users Group.
GPUUG Website
ID: 1095016
Profile BilBg
Volunteer tester
Joined: 27 May 07
Posts: 3720
Credit: 9,385,827
RAC: 0
Bulgaria
Message 1095011 - Posted: 9 Apr 2011, 1:33:46 UTC - in response to Message 1095003.  
Last modified: 9 Apr 2011, 1:55:45 UTC


Did you read my first post on the subject?:
http://setiathome.berkeley.edu/forum_thread.php?id=63650&nowrap=true#1094579

and the link in it ("How does it work?")?:
http://www.alcpu.com/CoreTemp/howitworks.html

If you don't believe me, wait for (or PM) Fred to confirm:
TThrottle will act to protect the CPU at the same point (namely 10°C below the Tjunction Max)
with: TjMax=180 and "Set Core" 170°C
- there is no difference in when TThrottle's protective action starts (it will be at the same real core temp)

Only the displayed reading will be (very) wrong (e.g. TThrottle will show 160°C and will not act)

Your desire for the LCD and the programs that read CPU core temps to show the same value
is like wanting (during winter) your 2 thermometers - one on the balcony outside and one in the room near the stove -
to show the same value (would you call this a "comfortable" tweak ;) )
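
A small sketch of why changing the TjMax setting does not move the real protection point. It assumes the tool computes the displayed temperature as "TjMax setting minus the delta read from the CPU"; the function names are hypothetical, and the 100/90 pair is added here only for comparison (the post itself gives 180/170).

```python
def trigger_delta(tjmax_setting_c: int, set_core_c: int) -> int:
    """Distance from the real maximum at which throttling would begin."""
    return tjmax_setting_c - set_core_c


def displayed_temp(tjmax_setting_c: int, delta_c: int) -> int:
    """Temperature a tool would show for a given delta and TjMax setting."""
    return tjmax_setting_c - delta_c


settings = [(100, 90), (180, 170)]  # (TjMax setting, "Set Core")
delta_from_cpu = 20                 # the CPU reports: "I am 20°C from the max"

for tjmax, set_core in settings:
    shown = displayed_temp(tjmax, delta_from_cpu)
    acts = delta_from_cpu <= trigger_delta(tjmax, set_core)
    print(f"TjMax={tjmax}, Set Core={set_core}: shows {shown}°C, "
          f"throttles={acts}, would act at {trigger_delta(tjmax, set_core)}°C from max")
```

Both settings start acting 10°C from the real maximum; only the number shown differs (80°C versus 160°C at the same real core temperature).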


Log from TThrottle on AMD Athlon(tm) 64 Processor 3500+ (on Intel may be different!):


09 April 2011 - 01:48:24 Driver installed properly. Driver Version: 2.0

Program version: 2.10 32Bit
Microsoft Windows XP Professional Service Pack 3 (build 2600)

Language: User: 1026 BGR ,System: 1026 BGR

nvidia: found 1 logical devices
nvidia: found 1 physical devices

nvidia: GeForce 6150SE nForce 430

Vendor ID: AuthenticAMD
Vendor: AMD
HighestIntegerValue: 00000001 - Processor Signature: 00050FF2
Misc. info: 00000800
Feature Flags1 00002001
Feature Flags2 078BFBFF

Processor: AMD Athlon(tm) 64 Processor 3500+
Processor: Family: Fh, Model: 5F, Stepping: 2
Processor: Revision: DH-F2, Revision: F
Processor: Socket: AM2, Type: Athlon(tm) 64
Processor: Max Die (Tjunction) Temperature: 70.0 °C
The real Max Die (Tjunction) temperature is not this value
This value is for calculating the temperature ONLY
Max Die (Tjunction) is normally about Max Case + 5C
Processor: Max Case Temperature: 70.0 °C, Max Power: 0.0 W

Core Temperature: 47 °C, Raw Data: 602220
602220,602220,602220,602220,602220,602220,602220,602220,602220,602220,60
2220,602220,602220,602220,602220,602220,0ffffffffffffffffffffffff0000000
00000
This Processor has 1 cores and 1 temperature sensors.

BOINC:
orbit.psi.edu_oah
setiathome.berkeley.edu

You can help by reading www.efmer.eu/boinc/faq.html How can I help!
Select the send EMail button,or copy everything in this logging window and mail it to me!
boinc [~ at ~] efmer .eu - We use this information to improve this product.

09 April 2011 - 01:48:34 Number of matching Programs (Processes): 1
Cpu: ak_v8b_win_sse3_amd.exe, PID: 2600, Threads: 3


 


- ALF - "Find out what you don't do well ..... then don't do it!" :)
 
ID: 1095011
Profile SciManStev (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 20 Jun 99
Posts: 6553
Credit: 121,090,076
RAC: 0
United States
Message 1095007 - Posted: 9 Apr 2011, 1:18:55 UTC - in response to Message 1095005.  

The kitties say......

"Run 'em 'till they smoke."
Then you have gone a bridge too far.

I haven't looked lately, but that CPU cost me $1100 when it first came out, and I really don't want to destroy it. It is very fast though! It does anything I throw at it with ease. Even on Einstein I could run it with hyperthreading on, and at higher clock speed, but the GPU usage was lower, so the overall current drain was lower. I am really beginning to think that this new Corsair Gold 1200 watt supply is going to allow me to use hyperthreading, as well as a faster clock for both CPU and GPU's. I can't wait to experiment!

Steve
Warning, addicted to SETI crunching!
Crunching as a member of GPU Users Group.
GPUUG Website
ID: 1095007
kittyman (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 9 Jul 00
Posts: 50494
Credit: 1,018,363,574
RAC: 2,276
United States
Message 1095005 - Posted: 9 Apr 2011, 1:13:24 UTC

The kitties say......

"Run 'em 'till they smoke."
Then you have gone a bridge too far.
"Learn from yesterday. Live for today. Hope for tomorrow." Albert Einstein
"With cats." kittyman

ID: 1095005
©2020 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.