Blue Screen? (i7 with nVidia NVS 2100M)



Message boards : Number crunching : Blue Screen? (i7 with nVidia NVS 2100M)

ST1100
Joined: 20 Feb 03
Posts: 7
Credit: 180,440
RAC: 0
Austria
Message 1407996 - Posted: 26 Aug 2013, 17:19:05 UTC

G'Day,

Since about mid-August my machine has been crashing with a blue screen whenever S@H attempts to access the GPU...

Some details:
OS: Win7pro_64
CPU: i7 M620, 2.67 GHz
RAM: 8 GB
GPU: nVidia NVS 2100M, driver version 9.18.13.2049 (latest; updated after the first error, but to no avail...)

BOINC version 7.0.64 (x64)
SETI@home v7 7.00 (currently cuda50 & cuda23 listed)
Einstein@home (not using GPU)

I've tried changing the power setting for the monitor to [always ON], but that didn't fix it...

For now I have suspended all GPU processing, and no faults occur then...

Any clues?

TIA

cheers!

____________

spitfire_mk_2
Joined: 14 Apr 00
Posts: 455
Credit: 12,596,756
RAC: 9,461
United States
Message 1408029 - Posted: 26 Aug 2013, 18:33:20 UTC
Last modified: 26 Aug 2013, 18:34:10 UTC

I have had crashes from time to time. Every time I looked at the minidump that Windows made for the crash, it pointed to the nVidia driver.

I use Debugging Tools for Windows to read the dump file:
http://msdn.microsoft.com/en-us/windows/hardware/gg463009.aspx

This will show how to set it up:
http://www.networkworld.com/supp/2011/041811-windows-7-crashes.html
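
Before opening anything in the debugger it helps to see which dump files exist and when they were written. The following is only a minimal sketch: it assumes the default Windows 7 minidump folder (C:\Windows\Minidump), which may need administrator rights to read, and the helper name list_minidumps is made up for this example.

```python
from datetime import datetime
from pathlib import Path


def list_minidumps(dump_dir=r"C:\Windows\Minidump"):
    """Return (file name, modification time) pairs for *.dmp files, newest first.

    dump_dir defaults to the usual Windows 7 minidump folder (an assumption;
    the location can be changed in the system's startup and recovery settings).
    """
    folder = Path(dump_dir)
    if not folder.is_dir():
        # No folder means no dumps were written (or we are not on that machine).
        return []
    dumps = sorted(folder.glob("*.dmp"),
                   key=lambda p: p.stat().st_mtime,
                   reverse=True)
    return [(p.name, datetime.fromtimestamp(p.stat().st_mtime)) for p in dumps]


if __name__ == "__main__":
    for name, when in list_minidumps():
        print(f"{when:%Y-%m-%d %H:%M}  {name}")
```

The newest file at the top of the list is usually the one worth loading into the debugger first.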
____________

MonChrMe
Joined: 9 Jun 13
Posts: 23
Credit: 113,889
RAC: 0
United Kingdom
Message 1408030 - Posted: 26 Aug 2013, 18:34:15 UTC - in response to Message 1407996.

Everyone's going to have a different answer for you; blue screens are tough to diagnose without physical access to the machine.


First thing to find out is if it's software or hardware causing the problem. Given the GPU's an M model, I assume this is a laptop we're dealing with, so swapping out the GPU won't be an option.

That leaves software diagnostics as the only available route.

First things first, let's make sure your drivers are a clean install with no version conflicts. Download the most up-to-date drivers available for your card, and reboot into Windows Safe Mode. In Safe Mode, uninstall the video card drivers. Reboot back into Safe Mode, then install the downloaded drivers.

Reboot into normal mode. Once BOINC is up, reset the SETI@home project (Projects tab in the advanced view) and see if that's fixed it.


No joy?
First, grab a freeware utility called 'Nvidia Inspector'. I believe the current version is 1.9.7.2.

Once this is downloaded and extracted, start it up. There's a button at the bottom that says 'overclocking'. Press that - you'll get a warning, OK the warning to proceed.

A new page of options will open up. Drag the settings for 'Shader Clock', 'Memory Clock', and 'GPU Clock' (if it's not greyed out) down until they're one notch from the left. Do not touch the slider marked 'voltage'.

This will temporarily (until you reboot) underclock your graphics card. If the problem is cooling (e.g. the heatsink becoming unseated, or a dead fan), it should last much longer before blue screening, or not blue screen at all. At that point you'll want to take it to a technician to reseat or replace the heatsinks.

If it crashes just as fast, then you'll have to assume the GPU is defective, which normally means replacing the entire mainboard where laptops are concerned. It is often cheaper to replace the entire machine.



Not ideal, but there's not much that can cause a blue screen: drivers, power, and hardware failure, in that order.

Ageless
Joined: 9 Jun 99
Posts: 12310
Credit: 2,606,644
RAC: 1,074
Netherlands
Message 1408083 - Posted: 26 Aug 2013, 20:02:26 UTC

When you see a Blue Screen of Death (BSOD), make sure you note the error that's displayed on it. Don't just tell us you had a BSOD; tell us which one you had!

You can install a program such as BlueScreenView, which will show what the blue screens said.

"Everyone's going to have a different answer for you;"

Yeah, that's what forums do.

"blue screens are tough to diagnose without physical access to the machine."

No, they aren't that difficult to diagnose. As long as you know the circumstances under which the BSOD happens, and what it says or which Event ID it has, you can get quite a long way in diagnosing it for someone else.
In the extreme case, when you really have tried just about everything, the dump file is useful. But then you need someone who knows how to analyze that file and bring the information back to the user.
____________
Jord

Fighting for the correct use of the apostrophe, together with Weird Al Yankovic

ST1100
Joined: 20 Feb 03
Posts: 7
Credit: 180,440
RAC: 0
Austria
Message 1408097 - Posted: 26 Aug 2013, 21:14:22 UTC

Hi,

Thanks for the hints so far; they did turn up something.

I have 6 BSOD dumps (starting on August 10th 2013, 22:33) and all of them list crashes due to the DirectX Graphics Kernel (Win OS, driver: dxgkrnl.sys, caused by address: dxgkrnl.sys+5d054)...

So it seems the nVidia hardware/driver is not the (direct) culprit (I'd updated the driver after the 3rd crash to exclude compatibility issues with BOINC).

Yes, it's a laptop, a Toshiba Tecra S11 to be precise, well nurtured, sitting safely in its port replicator, rarely moved, working absolutely flawlessly.

I started dxdiag and didn't find any problems listed there.
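
When comparing several dumps it can help to split the "caused by address" value into the module name and the hexadecimal offset inside it, so identical fault locations are easy to spot. This is just a small sketch; the function name parse_fault_address is invented for illustration and is not part of any dump-reading tool.

```python
def parse_fault_address(text):
    """Split a 'module+hexoffset' string into (module name, integer offset)."""
    module, sep, offset = text.strip().rpartition("+")
    if not sep or not module:
        raise ValueError(f"expected 'module+offset', got {text!r}")
    # The offset is printed in hexadecimal without a 0x prefix.
    return module, int(offset, 16)


module, offset = parse_fault_address("dxgkrnl.sys+5d054")
print(module, hex(offset))  # dxgkrnl.sys 0x5d054
```

If all six dumps yield the same pair, the crash is at least hitting the same code path each time. Note that the module named in a dump is where the fault was detected, which is not necessarily where the problem originated.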


____________

spitfire_mk_2
Joined: 14 Apr 00
Posts: 455
Credit: 12,596,756
RAC: 9,461
United States
Message 1408104 - Posted: 26 Aug 2013, 21:28:02 UTC - in response to Message 1408097.


I found this thread interesting: http://www.sevenforums.com/bsod-help-support/201437-bsod-windows-7-x64-nvlddmkm-sys-dxgkrnl-sys-dxgmms1-sys.html

____________

ST1100
Joined: 20 Feb 03
Posts: 7
Credit: 180,440
RAC: 0
Austria
Message 1409488 - Posted: 29 Aug 2013, 19:20:48 UTC

Well, I did a full hardware test (waste of time: hardware in perfect shape, 0 errors, hardly exceeds 80°C... GPU not overclocked, CPUs only 17% overclocked).
I removed all DirectX and nVidia software/drivers, re-installed the graphics accelerator drivers, purged the registry, installed/updated DirectX again and set the GPU usage in BOINC back to defaults...

The machine did ~2 hours of crunching yesterday evening, cuda50 already >50%, no problems yet... :-)

Thanks for the help so far.
____________

ST1100
Joined: 20 Feb 03
Posts: 7
Credit: 180,440
RAC: 0
Austria
Message 1410862 - Posted: 2 Sep 2013, 20:44:07 UTC

Well, S@H is working properly now, but since the end of August every E@H WU has been failing:

http://einstein.phys.uwm.edu/results.php?hostid=7703197&offset=20&show_names=1&state=0&appid=0

Stderr output
<core_client_version>7.0.64</core_client_version>
<![CDATA[
<message>
(unknown error) - exit code -1073741511 (0xc0000139)
</message>
]]>

So how do I fix this now?

cheers!
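
The negative exit code in the stderr output is the same 32-bit value as the hex number in parentheses, just read as a signed integer. A short sketch of the conversion follows; the lookup table holds only a few NTSTATUS names as an illustrative subset, and the function names are made up for this example.

```python
# A few well-known NTSTATUS values (illustrative subset, not a full table).
KNOWN_STATUS = {
    0xC0000005: "STATUS_ACCESS_VIOLATION",
    0xC0000135: "STATUS_DLL_NOT_FOUND",
    0xC0000139: "STATUS_ENTRYPOINT_NOT_FOUND",
}


def to_ntstatus(exit_code):
    """Reinterpret a signed 32-bit exit code as an unsigned status value."""
    return exit_code & 0xFFFFFFFF


def describe_exit_code(exit_code):
    """Return the hex form of the exit code plus its name, if it is in the table."""
    status = to_ntstatus(exit_code)
    name = KNOWN_STATUS.get(status, "unknown")
    return f"0x{status:08X} ({name})"


print(describe_exit_code(-1073741511))  # 0xC0000139 (STATUS_ENTRYPOINT_NOT_FOUND)
```

STATUS_ENTRYPOINT_NOT_FOUND generally means a program asked a DLL for a function that the installed version of that DLL does not export, which would be consistent with an application and driver version that do not match. That reading is a guess from the code alone, not a confirmed diagnosis.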
____________

spitfire_mk_2
Joined: 14 Apr 00
Posts: 455
Credit: 12,596,756
RAC: 9,461
United States
Message 1410869 - Posted: 2 Sep 2013, 21:10:22 UTC - in response to Message 1410862.


I don't know the answer, but have you tried to reattach to E@H?
____________

ST1100
Joined: 20 Feb 03
Posts: 7
Credit: 180,440
RAC: 0
Austria
Message 1411506 - Posted: 4 Sep 2013, 20:56:46 UTC

By now I've 'upgraded' the GPU driver back to v188.22. (I did try the OEM driver from 2010, but with that all E@H (and some S@H) WUs failed with a calculation error. That is astonishing, as nothing displayed indicates that E@H even uses the GPU; only S@H shows usage like [0,00462 CPUs + 1 nVidia GPU]...) I'm still watching to see whether this has fixed the computation errors...


____________

Wiggo
Joined: 24 Jan 00
Posts: 7096
Credit: 95,204,526
RAC: 73,679
Australia
Message 1411514 - Posted: 4 Sep 2013, 21:12:26 UTC - in response to Message 1411506.

What temps are you getting and when was the dust last cleaned out of it?

Cheers.

Gundolf Jahn
Joined: 19 Sep 00
Posts: 3184
Credit: 358,718
RAC: 34
Germany
Message 1411534 - Posted: 4 Sep 2013, 21:52:08 UTC - in response to Message 1411506.

"...astonishing as there is no info displayed that E@H even utilizes GPU..."

You are running (or rather, trashing ;-) these at Einstein:
Binary Radio Pulsar Search (Arecibo, GPU) v1.39 (BRP4G-cuda32-nv301)
Binary Radio Pulsar Search (Perseus Arm Survey) v1.39 (BRP5-cuda32-nv301)

That looks pretty much like CUDA ;-)

Gruß
Gundolf

ST1100
Joined: 20 Feb 03
Posts: 7
Credit: 180,440
RAC: 0
Austria
Message 1412854 - Posted: 8 Sep 2013, 10:22:41 UTC

Yah, seems those E@H WUs caused the havoc... ;-)

After experiments with various driver editions (as if I had that much free time at hand...) things seem to have stabilized... nVidia predecessor v32049 + profile [highest performance] appears functional... for now...
Will have to keep an eye on things for a while though, till the first WUs get validated...
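
A side note on the version numbers in this thread: "v32049" looks like the last five digits of the Windows-style driver version from the first post (9.18.13.2049). By a commonly used convention, those five digits give the nVidia release number, here 320.49. A minimal sketch of that conversion, assuming the convention holds for the driver in question (the function name is made up for this example):

```python
def nvidia_release_from_windows_version(version):
    """Derive an nVidia release number (e.g. '320.49') from a Windows-style
    driver version string (e.g. '9.18.13.2049').

    Assumption: the release number is encoded in the last five digits.
    """
    digits = version.replace(".", "")
    if not digits.isdigit() or len(digits) < 5:
        raise ValueError(f"unexpected driver version: {version!r}")
    last5 = digits[-5:]
    return f"{last5[:3]}.{last5[3:]}"


print(nvidia_release_from_windows_version("9.18.13.2049"))  # 320.49
```

Writing driver versions in one consistent form makes it easier to compare them with the versions named in application plan classes and in other people's reports.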

Joining E@H was only meant to be temporary; I'm still waiting for NEO@H, which IMHO concerns our more immediate vicinity :-)

cheers!
____________

ST1100
Joined: 20 Feb 03
Posts: 7
Credit: 180,440
RAC: 0
Austria
Message 1413445 - Posted: 9 Sep 2013, 21:29:35 UTC

Well, another BSOD occurred...

So effective immediately all GPU usage is suspended...
All currently loaded WUs will time out...

Sorry, I've given up.



____________

Gundolf Jahn
Joined: 19 Sep 00
Posts: 3184
Credit: 358,718
RAC: 34
Germany
Message 1413501 - Posted: 9 Sep 2013, 23:05:01 UTC - in response to Message 1413445.

"All currently loaded WUs will time out..."

Why don't you just abort and report them?

Gruß
Gundolf

Claggy
Project donor
Volunteer tester
Joined: 5 Jul 99
Posts: 4101
Credit: 33,140,428
RAC: 8,634
United Kingdom
Message 1413670 - Posted: 10 Sep 2013, 9:57:52 UTC - in response to Message 1413445.


That doesn't mean you can't complete your CPU WUs; you only had one GPU WU here.

If that goes well, just untick 'Use NVIDIA GPU' in the preferences of each project (i.e. SETI and Einstein), and you'll never receive Nvidia GPU WUs again.

Claggy


Copyright © 2014 University of California