SETI@Home Bluescreen Crash

Mighty
Joined: 15 May 99
Posts: 13
Credit: 29,993,222
RAC: 50
United States
Message 1531568 - Posted: 24 Jun 2014, 9:09:36 UTC

Win 7 Pro x64 on an Intel 4770K clocked to 3.5 GHz with 16 GB. I have an Nvidia GTX 780 with 3 GB, but I have it suspended in BOINC.

When I fire up SETI@Home it'll run for a short time, I think one to three minutes, and then Windows crashes to a blue screen. I've tried with and without the GPU. I haven't tried GPU only. I figure if it crashes on the CPU, then something is seriously not right.

I tried it when I first got the machine, in January. A couple of weeks ago, mid-June, I downloaded a fresh copy of BOINC and tried SETI@Home again. Same thing.

Other than BOINC/SETI@Home the machine is very stable. I'm a game player, so I'm pushing the hardware regularly, including with current first-person shooters and flying and driving sims.

I've searched the board, and can't find anyone else describing a bluescreen problem. Can anyone give me some ideas on what to do to troubleshoot this? I'd love to have SETI@Home cranking away on this machine.

Drake
ID: 1531568 · Report as offensive
Profile Jord
Volunteer tester
Joined: 9 Jun 99
Posts: 15184
Credit: 4,362,181
RAC: 3
Netherlands
Message 1531583 - Posted: 24 Jun 2014, 11:10:43 UTC - in response to Message 1531568.  

Without knowing what the blue screen actually says, it's difficult to diagnose. So please download and run BlueScreenView and tell us what it reports for the crash.
ID: 1531583 · Report as offensive
Mighty
Joined: 15 May 99
Posts: 13
Credit: 29,993,222
RAC: 50
United States
Message 1531728 - Posted: 25 Jun 2014, 1:17:53 UTC - in response to Message 1531583.  

I don't see any way to attach the reports here. I posted one from January and one from June on my website.

So, in most of the crashes from January it's Bug Check Code 0x00000124 in hal.dll. In one in January and this last one in June, same code in ntoskrnl.exe.

I've been skimming some of the Google hits on that error, and nothing is jumping out at me, so far. Any suggestions on how to narrow it down?

Thanks,

Drake
ID: 1531728 · Report as offensive
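For reference, bug check 0x00000124 is the code Windows names WHEA_UNCORRECTABLE_ERROR, a hardware error reported by the CPU, so the module shown next to it (hal.dll or ntoskrnl.exe) is most likely just where the error was raised rather than the culprit. A minimal Python sketch of looking up a code from a BlueScreenView report follows; the table is deliberately tiny and the function name `describe_bug_check` is made up for illustration.

```python
# Illustrative lookup of a few well-known Windows bug check (stop) codes.
# The table is deliberately small; it is not a complete list.
BUG_CHECKS = {
    0x0000000A: "IRQL_NOT_LESS_OR_EQUAL",
    0x0000003B: "SYSTEM_SERVICE_EXCEPTION",
    0x00000050: "PAGE_FAULT_IN_NONPAGED_AREA",
    0x0000007E: "SYSTEM_THREAD_EXCEPTION_NOT_HANDLED",
    0x00000124: "WHEA_UNCORRECTABLE_ERROR",
}


def describe_bug_check(code_text):
    """Turn a string such as '0x00000124' into 'code: NAME'."""
    code = int(code_text, 16)  # base 16 accepts the leading '0x'
    name = BUG_CHECKS.get(code, "unknown (not in this small table)")
    return f"0x{code:08X}: {name}"


print(describe_bug_check("0x00000124"))
```

Codes missing from the table are reported as unknown rather than guessed at.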
John McLeod VII
Volunteer developer
Volunteer tester
Joined: 15 Jul 99
Posts: 24806
Credit: 790,712
RAC: 0
United States
Message 1533763 - Posted: 29 Jun 2014, 21:55:29 UTC
Last modified: 29 Jun 2014, 21:56:00 UTC

It might be an overheating problem. When was the last time you chased the dust bunnies out of your computer case? Is the computer overclocked?


BOINC WIKI
ID: 1533763 · Report as offensive
Mighty
Joined: 15 May 99
Posts: 13
Credit: 29,993,222
RAC: 50
United States
Message 1533774 - Posted: 29 Jun 2014, 22:37:44 UTC - in response to Message 1533763.  

In the past, I let cleaning go to the point that the heat sink in one machine was literally 90% clogged with dust. Since then, for the last six years or so, I routinely blow out the dust in my machines monthly or bi-monthly. I have an alarm remind me. I'm pretty sure I did it at the beginning of this month. Last month at the worst.

Also, I was getting the same crashes in the first few days I owned this machine. So, that wasn't dust.

The machine is overclocked by 20%, so nothing too aggressive. I had the boutique place I bought it from do that for me. It's one of their standard up-sells.

As I said, I run current 3D games on this machine all the time, and it has been very stable. Looking at the list in BlueScreenView, in six months there are two .dmp's that are not associated with attempts to run SETI@Home.

While, to me, the games seem a fairly thorough test, maybe you can suggest another number-hungry app that I can run to test the overheating idea? Or, a CPU diagnostic that you respect? I agree that the few minute delay sounds a lot like overheating.
Drake
ID: 1533774 · Report as offensive
John McLeod VII
Volunteer developer
Volunteer tester
Joined: 15 Jul 99
Posts: 24806
Credit: 790,712
RAC: 0
United States
Message 1533787 - Posted: 29 Jun 2014, 22:56:31 UTC - in response to Message 1533774.  

Games tend to do much shorter computations than SETI does. They also do not always use all CPUs all the time. Can you get a temperature monitoring program for your machine? There are some that are free. This will tell us if it is a temp problem.

If you want to do a load test, try consume.exe from MS


BOINC WIKI
ID: 1533787 · Report as offensive
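The point about games not using all CPUs all the time can be checked with a crude all-core load. The Python sketch below starts one busy-looping worker per logical CPU for a given number of seconds. It is an assumption-laden stand-in and not a replacement for consume.exe or a dedicated stress tool, and an interpreted loop will probably heat the CPU less than optimized number-crunching code does. The names `burn` and `load_all_cores` are hypothetical. Watch a temperature monitor while it runs.

```python
import multiprocessing as mp
import sys
import time


def burn(seconds):
    """Spin on floating-point math for roughly `seconds`; return the loop count."""
    deadline = time.perf_counter() + seconds
    loops = 0
    x = 1.0001
    while time.perf_counter() < deadline:
        x = (x * x) % 1000.0 + 1.0001  # keep the value bounded
        loops += 1
    return loops


def load_all_cores(seconds):
    """Run one `burn` worker per logical CPU and return each worker's loop count."""
    workers = mp.cpu_count()
    with mp.Pool(workers) as pool:
        return pool.map(burn, [seconds] * workers)


if __name__ == "__main__":
    # The guard is required on Windows, where worker processes re-import this file.
    if len(sys.argv) > 1:
        counts = load_all_cores(float(sys.argv[1]))
        print(f"{len(counts)} workers finished")
    else:
        print("usage: python load_test.py SECONDS")
```

Saved as `load_test.py` (a made-up file name), it would be run as `python load_test.py 60` for a one-minute test.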
Profile Jord
Volunteer tester
Joined: 9 Jun 99
Posts: 15184
Credit: 4,362,181
RAC: 3
Netherlands
Message 1533792 - Posted: 29 Jun 2014, 23:08:21 UTC

From http://www.tomshardware.co.uk/answers/id-2039223/whea-uncorrectable-error-bsod-message.html:
This is a bugcheck generated directly by the CPU. Most often it is caused because of incorrect voltage being applied to the CPU.
This can happen because of incorrect settings being applied to the BIOS (often resetting the BIOS to the defaults or updating the BIOS helps). Sometimes it indicates that a power supply is outside of its correct voltage range (a power supply voltage regulator or motherboard voltage regulator is in the process of failing).

ID: 1533792 · Report as offensive
Mighty
Joined: 15 May 99
Posts: 13
Credit: 29,993,222
RAC: 50
United States
Message 1533812 - Posted: 29 Jun 2014, 23:44:45 UTC - in response to Message 1533792.  

I grabbed RealTemp and Prime95, and that triggered a Blue Screen. The temp jumped immediately to 100C and it died about 10 seconds later.

I'll look at that Tom's Hardware thread and my motherboard docs and see if I can tweak some settings up or down and try to stabilize this thing.

I appreciate the patient help.
Drake
ID: 1533812 · Report as offensive
Profile Jord
Volunteer tester
Joined: 9 Jun 99
Posts: 15184
Credit: 4,362,181
RAC: 3
Netherlands
Message 1533822 - Posted: 30 Jun 2014, 0:13:43 UTC - in response to Message 1533812.  
Last modified: 30 Jun 2014, 0:15:21 UTC

100C is never good. Sounds like either the CPU fan isn't doing its work (is it even spinning?), or that there's a problem with the thermal paste between the CPU and the heat sink.

(I'm helping you in my pauses in The Elder Scrolls Online. So if it takes me a while to answer, it's probably because trolls are slaying me. ;-))
ID: 1533822 · Report as offensive
Mighty
Joined: 15 May 99
Posts: 13
Credit: 29,993,222
RAC: 50
United States
Message 1533872 - Posted: 30 Jun 2014, 5:39:48 UTC - in response to Message 1533822.  

All the fans are spinning. This is a liquid cooled system, so I have a clear view to the radiator to see that the radiator is clear. Obviously, I can't check the thermal paste that easily.

I've searched around, and apparently 100C is where this CPU starts throttling. It's touching 40C while I'm typing this.

I sent an email to my builder to see what they say. I expect to hear back from them tomorrow.

...Later

Just played a few hours of Far Cry 3. The temp was in the 60s most of the time, and went above 70 briefly, and only occasionally.
Drake
ID: 1533872 · Report as offensive
Marc McLean
Joined: 8 Nov 02
Posts: 4
Credit: 39,872
RAC: 0
United Kingdom
Message 1560335 - Posted: 21 Aug 2014, 22:52:34 UTC - in response to Message 1533872.  

I suffered an overheating problem similar to this a while back on my system. After trying everything else, I found that my incorrect application of thermal paste was the problem; I redid it and all was good. Also, you say all fans are spinning: check that your case is pulling and pushing air to and from the system correctly. It might not be the cause, but it will help reduce temps overall.
ID: 1560335 · Report as offensive
Profile Fred J. Verster
Volunteer tester
Joined: 21 Apr 04
Posts: 3252
Credit: 31,903,643
RAC: 0
Netherlands
Message 1561023 - Posted: 23 Aug 2014, 9:45:05 UTC - in response to Message 1560335.  
Last modified: 23 Aug 2014, 9:59:22 UTC

I started again yesterday after 18 months of absence and installed the latest 64-bit BOINC on 64-bit Windows 7.
PC: i7-2600 (HT on), 16 GB DDR3 1600 MHz DRAM and 2x HD 5870 GPUs.
PSU: 1000 W.
I'd set the CPU to a higher turbo frequency, 3.9 instead of 3.8 GHz, and also upped some CPU offset voltages. This immediately caused not only overheating problems but real instability: lock-ups, blue screens and crashes.
Loading the BIOS settings for this particular CPU made the system respond normally again. The CPU, with stock cooling (HT on), still overheated immediately; choosing 70% CPU load and adding an auxiliary fan took care of that.
By the way, CoreTemp and the Intel Desktop Utilities reported 101C for all 4 cores (HT on), but the temperature changed as fast as the core load and caused no added instability.

The HD 5870 GPUs are running at ~75% of their crunching capacity and run smoothly. The whole system uses 500 W and is very stable. I also use it for browsing, forum posts, e-mail and other minor tasks, which causes no problems with BOINC on the 'all the time' setting. I use one monitor on each GPU and there is enough headroom to use them quite normally, although graphics display is not smooth, for instance with a webcam or videos.

It took less than 8 hours before I could post on these forums; enough work was done and validated.
ID: 1561023 · Report as offensive
Mighty
Joined: 15 May 99
Posts: 13
Credit: 29,993,222
RAC: 50
United States
Message 1561027 - Posted: 23 Aug 2014, 10:04:31 UTC

Sorry I haven't updated. I got an email response from the boutique place that built my machine. They want me to call in to talk to a tech. Keep meaning to do it, but I get distracted. Real Soon Now.

Drake
ID: 1561027 · Report as offensive
John McLeod VII
Volunteer developer
Volunteer tester
Joined: 15 Jul 99
Posts: 24806
Credit: 790,712
RAC: 0
United States
Message 1561339 - Posted: 24 Aug 2014, 1:43:27 UTC - in response to Message 1561023.  

100 C is too hot.

Boosting the CPU voltage can lead to instability.


BOINC WIKI
ID: 1561339 · Report as offensive
Profile BilBg
Volunteer tester
Joined: 27 May 07
Posts: 3720
Credit: 9,385,827
RAC: 0
Bulgaria
Message 1561372 - Posted: 24 Aug 2014, 3:07:23 UTC - in response to Message 1561023.  

Get TThrottle and don't let the CPU go over 80°C
http://efmer.com/b/

(In this case, set 'Use 100% CPU time' in BOINC. TThrottle will reduce CPU time on its own once the temperature you type in the box is reached.)
 


- ALF - "Find out what you don't do well ..... then don't do it!" :)
 
ID: 1561372 · Report as offensive
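The general idea of temperature-based throttling can be sketched in a few lines of Python. This is only an illustration under stated assumptions and is not TThrottle's actual algorithm: the function name `next_cpu_percent`, the 5% step, the 10% floor and the temperature readings are all made up. The 80°C limit is the one suggested above.

```python
def next_cpu_percent(current_percent, temp_c, limit_c=80.0,
                     step=5.0, min_percent=10.0, max_percent=100.0):
    """Return the CPU-time percentage to use for the next interval.

    Above the limit, back off by `step`; at or below it, recover by `step`.
    The result is clamped to the range [min_percent, max_percent].
    """
    if temp_c > limit_c:
        proposed = current_percent - step
    else:
        proposed = current_percent + step
    return max(min_percent, min(max_percent, proposed))


# Simulated readings in degrees C (made-up values, not measurements).
percent = 100.0
for reading in [70, 85, 92, 88, 79, 75]:
    percent = next_cpu_percent(percent, reading)
    print(f"{reading}C -> run at {percent:.0f}%")
```

A fixed step keeps the sketch simple. A real controller would probably scale the correction with how far the temperature is from the limit.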



 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.