Blue Screen? (i7 with nVidia NVS 2100M)
Author | Message |
---|---|
ST1100 Send message Joined: 20 Feb 03 Posts: 7 Credit: 180,440 RAC: 0 |
G'Day, since about mid August my machine capsizes with a blue screen when S@H attempts to access the GPU... Some details: OS: Win7pro_64; CPU: i7 M620 2,67GHz; RAM: 8GB; GPU: nVidia NVS2100M, driver version: 9.18.13.2049 (latest, updated after first error, but to no avail...); BOINC version 7.0.64 (x64); SETI@home v77.00 (currently cuda50 & cuda23 listed); Einstein@home (not using GPU). I've tried changing the energy setting for the monitor to [always ON], but no fix... Currently I've suspended all GPU processing, no fault then... Any clues? TIA cheers! |
spitfire_mk_2 Send message Joined: 14 Apr 00 Posts: 563 Credit: 27,306,885 RAC: 0 |
I have had crashes from time to time. Every time I looked at the mini dump that Windows made for the crash, it had been the nVidia driver. I use Debugging Tools for Windows to read the dump file: http://msdn.microsoft.com/en-us/windows/hardware/gg463009.aspx This will show how to set it up: http://www.networkworld.com/supp/2011/041811-windows-7-crashes.html |
MonChrMe Send message Joined: 9 Jun 13 Posts: 23 Credit: 113,889 RAC: 0 |
Everyone's going to have a different answer for you; blue screens are tough to diagnose without physical access to the machine. First thing to find out is if it's software or hardware causing the problem. Given the GPU's an M model, I assume this is a laptop we're dealing with, so swapping out the GPU won't be an option. That leaves software diagnostics as the only available route. First things first, let's make sure your drivers are a clean install with no version conflicts. Download the most up-to-date drivers available for your card, and reboot into Windows safe mode. In safe mode, uninstall the video card drivers. Reboot back into safe mode, then install the downloaded drivers. Reboot into normal mode. Once BOINC is up, reset the SETI@home project (projects tab in the advanced view) and see if that's fixed it. No joy? First, grab a freeware utility called 'Nvidia Inspector'. I believe the current version is 1.9.7.2. Once this is downloaded and extracted, start it up. There's a button at the bottom that says 'overclocking'. Press that; you'll get a warning, OK the warning to proceed. A new page of options will open up. Drag the settings for 'Shader Clock', 'Memory Clock', and 'GPU Clock' (if it's not greyed out) down until they're one notch from the left. Do not touch the slider marked 'voltage'. This will temporarily (until you reboot) underclock your graphics card. If the problem is cooling (e.g., the heatsink becoming unseated, or a dead fan), it should last much longer before blue screening, or fail to blue screen at all. At that point you'll want to take it to a technician to reseat or replace the heatsinks. If it crashes just as fast, then you'll have to assume the GPU is defective, which normally means replacing the entire mainboard where laptops are concerned. It's often cheaper to replace the entire machine. Not ideal, but there's not much that can cause a blue screen: drivers, power, and hardware failure, in that order. |
Jord Send message Joined: 9 Jun 99 Posts: 15184 Credit: 4,362,181 RAC: 3 |
When you see a Blue Screen of Death (BSOD), make sure you catch what the error is that's displayed on it. Don't tell us you had a BSOD, tell us which one you had! You can install a program such as Blue Screen View, which will show what the blue screens said. "Everyone's going to have a different answer for you;" Yeah, that's what forums do. "blue screens are tough to diagnose without physical access to the machine." No, they aren't that difficult to diagnose. As long as you know the circumstances under which the BSOD happens, and what it says, or which EVENT ID it has, you can get quite a ways in diagnosing it for someone else. As a last resort, when you really have tried about everything, the dump file is useful. But then you need someone who knows how to analyze that file and bring the information back to the user. |
ST1100 Send message Joined: 20 Feb 03 Posts: 7 Credit: 180,440 RAC: 0 |
Hi, thanx for the hints so far, they brought up something though. I've 6 BSOD dumps (starting at August 10th 2013, 2233hrs) and all list crashes due to the DirectX Graphics Kernel (Win OS, driver: dxgkrnl.sys, caused by address: dxgkrnl.sys+5d054)... So it seems the nVidia hardware/driver are not the (direct) culprit (I'd updated the driver after the 3rd crash to exclude compatibility issues with BOINC). Yes, it's a laptop, a Toshiba Tecra S11 to be precise, well nurtured, sitting safely in its port replicator, rarely moved, working absolutely flawlessly. I started dxdiag and didn't find any problems listed there. |
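For anyone comparing several dumps, the "caused by address" field quoted above is a module name plus a hexadecimal offset into that driver image. A minimal sketch of splitting it apart follows; the function name `parse_caused_by` is hypothetical, and the only input used is the string from the post above.

```python
import re

def parse_caused_by(field: str) -> tuple[str, int]:
    """Split a 'module+hexoffset' string into (module, offset).

    Hypothetical helper for illustration; it assumes the offset is
    written in hexadecimal without a 0x prefix, as in the post.
    """
    match = re.fullmatch(r"([\w.\-]+)\+([0-9a-fA-F]+)", field.strip())
    if match is None:
        raise ValueError(f"unrecognised address field: {field!r}")
    module, offset_hex = match.groups()
    return module.lower(), int(offset_hex, 16)

module, offset = parse_caused_by("dxgkrnl.sys+5d054")
print(module, hex(offset))
```

If every dump yields the same module and offset, the crash is at least happening at the same spot each time. That does not by itself say which component passed the bad request to dxgkrnl.sys.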
spitfire_mk_2 Send message Joined: 14 Apr 00 Posts: 563 Credit: 27,306,885 RAC: 0 |
Hi, I found this thread interesting: http://www.sevenforums.com/bsod-help-support/201437-bsod-windows-7-x64-nvlddmkm-sys-dxgkrnl-sys-dxgmms1-sys.html |
ST1100 Send message Joined: 20 Feb 03 Posts: 7 Credit: 180,440 RAC: 0 |
Well, did a full hardware test (waste of time, hardware in perfect shape, 0 errors, hardly exceeds 80°C... GPU not overclocked, CPUs only 17% overclocked). Removed all DirectX and nVidia software/drivers, re-installed graphic accelerator drivers, purged the registry, installed/updated DirectX again and set the GPU usage in BOINC back to defaults... The machine did ~2 hours of crunching yesterday evening, cuda50 already >50%, no problems yet... :-) Thanx for the efforts so far |
ST1100 Send message Joined: 20 Feb 03 Posts: 7 Credit: 180,440 RAC: 0 |
Well, S@H is working properly now, but since end of August every E@H WU is failing out: http://einstein.phys.uwm.edu/results.php?hostid=7703197&offset=20&show_names=1&state=0&appid=0 Stderr output <core_client_version>7.0.64</core_client_version> <![CDATA[ <message> (unknown error) - exit code -1073741511 (0xc0000139) </message> ]]> So how to fix this now? cheers! |
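The two numbers in that stderr message are the same value written two ways: the exit code is a signed 32-bit integer, and the hex form is its unsigned reading, which is a Windows NTSTATUS code. A minimal sketch of the conversion follows. The function names and the small lookup table are hypothetical and only list a few well-known codes; 0xC0000139 is STATUS_ENTRYPOINT_NOT_FOUND, which generally means a DLL was loaded but lacks a function the application expects. Whether a mismatched driver or CUDA DLL is the cause on this machine is an assumption, not something the log confirms.

```python
def to_unsigned32(exit_code: int) -> int:
    """Reinterpret a signed 32-bit exit code as an unsigned value."""
    return exit_code & 0xFFFFFFFF

# Small illustrative table (hypothetical helper data, not exhaustive).
KNOWN_NTSTATUS = {
    0xC0000005: "STATUS_ACCESS_VIOLATION",
    0xC0000135: "STATUS_DLL_NOT_FOUND",
    0xC0000139: "STATUS_ENTRYPOINT_NOT_FOUND",
}

def describe(exit_code: int) -> str:
    """Return the hex form of an exit code plus its name, if known."""
    value = to_unsigned32(exit_code)
    name = KNOWN_NTSTATUS.get(value, "unknown")
    return f"0x{value:08x} {name}"

print(describe(-1073741511))  # prints "0xc0000139 STATUS_ENTRYPOINT_NOT_FOUND"
```

The masking with 0xFFFFFFFF works because adding 2^32 to a negative 32-bit value gives its unsigned equivalent: 4294967296 - 1073741511 = 3221225785 = 0xC0000139.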
spitfire_mk_2 Send message Joined: 14 Apr 00 Posts: 563 Credit: 27,306,885 RAC: 0 |
"Well, S@H is working properly now, but since end of August every E@H WU is failing out:" I don't know the answer, but have you tried to reattach to E@H? |
ST1100 Send message Joined: 20 Feb 03 Posts: 7 Credit: 180,440 RAC: 0 |
By now I've 'upgraded' the GPU driver back to v188.22 (I did try the OEM driver from 2010, but with that all E@H (and some S@H) WUs did fail with calculation error; astonishing as there is no info displayed that E@H even utilizes GPU... only S@H indicates usage like [0,00462 CPUs + 1 nVidia GPU]...), still observing now if this has fixed the computation errors... |
Wiggo Send message Joined: 24 Jan 00 Posts: 34744 Credit: 261,360,520 RAC: 489 |
What temps are you getting and when was the dust last cleaned out of it? Cheers. |
Gundolf Jahn Send message Joined: 19 Sep 00 Posts: 3184 Credit: 446,358 RAC: 0 |
"...astonishing as there is no info displayed that E@H even utilizes GPU..." You are running (or better, trashing ;-) at Einstein: Binary Radio Pulsar Search (Arecibo, GPU) v1.39 (BRP4G-cuda32-nv301); Binary Radio Pulsar Search (Perseus Arm Survey) v1.39 (BRP5-cuda32-nv301). That looks pretty much like CUDA ;-) Gruß Gundolf |
ST1100 Send message Joined: 20 Feb 03 Posts: 7 Credit: 180,440 RAC: 0 |
Yah, seems those E@H WUs caused the havoc... ;-) After experiments with various driver editions (like I'd have that much free time at hand...) things seem to have stabilized... nVidia predecessor v32049 + profile [highest performance] appears functional... for now... Will have to keep an eye on things for a while though, till the first WUs get validated... Joining E@H was only meant to be temporary, still waiting for NEO@H, IMHO in a more immediate vicinity :-) cheers! |
ST1100 Send message Joined: 20 Feb 03 Posts: 7 Credit: 180,440 RAC: 0 |
Well, another BSOD occurred... So effective immediately all GPU usage is suspended... All currently loaded WUs will time out... Sorry, I've given up. |
Gundolf Jahn Send message Joined: 19 Sep 00 Posts: 3184 Credit: 446,358 RAC: 0 |
"All currently loaded WUs will time out..." Why don't you just abort and report them? Gruß Gundolf |
Claggy Send message Joined: 5 Jul 99 Posts: 4654 Credit: 47,537,079 RAC: 4 |
"Well, another BSOD occurred..." Doesn't mean you can't complete your CPU WUs, you only had one GPU WU here. If that goes well, just untick 'Use NVIDIA GPU' in your different project preferences (i.e. SETI and Einstein), and you'll never receive Nvidia GPU WUs again. Claggy |