Lunatics_x41zc_win32_cuda42.exe BSOD on 560ti card



Message boards : Number crunching : Lunatics_x41zc_win32_cuda42.exe BSOD on 560ti card

Michael W.F. Miles
Joined: 24 Mar 07
Posts: 186
Credit: 25,422,366
RAC: 25,306
Canada
Message 1334746 - Posted: 4 Feb 2013, 23:23:29 UTC

I do believe you are running hot. When the GPU is running hot, everything else heats up. Your Intel CPU should handle it, but the board may not. Also, how much power does the PSU supply?
I had this problem quite a bit with old drivers and an underpowered PSU. If it BSODs within seconds, then power is probably the thing to really look at.
Do you have the latest drivers installed as well?

Michael Miles

jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 4807
Credit: 71,575,443
RAC: 9,445
Australia
Message 1334787 - Posted: 5 Feb 2013, 2:25:22 UTC - in response to Message 1334703.
Last modified: 5 Feb 2013, 2:29:35 UTC

I didn't even know what that .sys file is so I googled it & came across this thread on Win7 forums:
http://www.sevenforums.com/crashes-debugging/201437-bsod-windows-7-x64-nvlddmkm-sys-dxgkrnl-sys-dxgmms1-sys.html

It's from DirectX (which is interrelated)

Along those lines I would suggest:
Uninstall driver in safe mode,
Driver sweeper,
Hard disk integrity checks,
clean latest WHQL driver install,
Update DirectX http://www.microsoft.com/en-us/download/details.aspx?id=35

& see where we go from there.

Jason

I have run 2 passes of Memtest86+ without any errors. It ran for about two and a half hours.
I have downgraded Nvidia driver to 306.97. Didn't help.
The PSU is a 650-watt Corsair HX, which should be adequate; the system draws a maximum of 330 watts.
CPU temp hovers around 80 under full load. It used to be 15 degrees hotter before I got an Akasa Nero 2 cooler.
The video card temperature never gets off the ground; the BSOD occurs about five seconds into the task, before any heating up has had time to happen.

So I guess I'm down to start shuffling the memory modules around then...?

Or getting myself a proper rig, instead of this juiced up old Dell :-)

BlueScreenView gives this info about the crash:
Technical Information:

*** STOP: 0x00000116 (0xfffffa8007653110, 0xfffffa60032e1630, 0xffffffffc000009a,
0x0000000000000004)

*** dxgkrnl.sys - Address 0xfffffa6003557ad4 base at 0xfffffa60034fc000 DateStamp
0x4d384226

____________
"It is not the strongest of the species that survives, nor the most intelligent that survives. It is the one that is the most adaptable to change."
Charles Darwin

Oddbjornik
Volunteer tester
Joined: 15 May 99
Posts: 67
Credit: 75,767,108
RAC: 59,914
Norway
Message 1334994 - Posted: 5 Feb 2013, 22:58:45 UTC - in response to Message 1334787.
Last modified: 5 Feb 2013, 22:59:20 UTC

Hmm... I did all that, and the cuda 4.2 build still crashes.

Cuda 3.2 actually also caused a crash in the middle of the night, but the computer restarted and then kept running as if nothing had happened. The crash signature is the same, as far as I can see:

Cuda 3.2 crash:

*** STOP: 0x00000116 (0xfffffa800a413010, 0xfffffa60034e8630, 0xffffffffc000009a,
0x0000000000000004)

*** dxgkrnl.sys - Address 0xfffffa600375ead4 base at 0xfffffa6003703000 DateStamp
0x4d384226


Cuda 4.2 crash:

*** STOP: 0x00000116 (0xfffffa80093ea4e0, 0xfffffa6003130adc, 0xffffffffc000009a,
0x0000000000000004)

*** dxgkrnl.sys - Address 0xfffffa6003305ad4 base at 0xfffffa60032aa000 DateStamp
0x4d384226


I would guess this means that the software does something that my hardware can't quite handle, and that the 4.2 software does it to a much higher degree than the 3.2 version. That seems to be in line with your general explanation of the development from 3.2 to 4.2.
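One way to check the "same crash signature" reading is to subtract the module base from the faulting address in each STOP screen, since the base address moves between boots while the offset inside dxgkrnl.sys should not. A minimal sketch, assuming the addresses quoted in this thread were copied correctly; the labels and the `module_offset` helper are made up for illustration and are not part of any tool:

```python
# Faulting address and module base for dxgkrnl.sys, copied from the
# STOP 0x116 screens quoted in this thread. Labels are illustrative only.
CRASHES = {
    "first report": (0xfffffa6003557ad4, 0xfffffa60034fc000),
    "cuda 3.2": (0xfffffa600375ead4, 0xfffffa6003703000),
    "cuda 4.2": (0xfffffa6003305ad4, 0xfffffa60032aa000),
}


def module_offset(address: int, base: int) -> int:
    """Return how far into the loaded module the faulting address lies."""
    return address - base


offsets = {
    label: module_offset(address, base)
    for label, (address, base) in CRASHES.items()
}

# True when every crash lands at the same offset inside the module.
same_site = len(set(offsets.values())) == 1

for label, offset in offsets.items():
    print(f"{label}: dxgkrnl.sys+{offset:#x}")
print("same offset in every crash:", same_site)
```

All three dumps work out to dxgkrnl.sys+0x5bad4, and the DateStamp is identical too, so the crashes do appear to land at the same place in the same build of the driver. That fits the "same signature" observation, but it only says where the bug check was raised, not which piece of hardware provoked it.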

And now I've gone and done a fun thing. I've bought an ASUS GeForce GTX 680 DirectCU II 4 GB that I'll replace the 560ti card with.

Unless you wish to do further research on the 560ti card for development reasons, I suggest I wait for the 680 card which should be here in a couple of days, and then take it from there. Cuda 5.0 and all...
____________

juan BFB
Volunteer tester
Joined: 16 Mar 07
Posts: 4609
Credit: 232,167,295
RAC: 331,058
Brazil
Message 1334998 - Posted: 5 Feb 2013, 23:09:23 UTC

Did you try swapping the 560Ti with a GPU from another one of your hosts, to see whether the problem goes away or moves to the other host? That could be an interesting thing to do; at least it could clearly point to the source of the problem.

Jason, do you agree with that approach?
____________

Oddbjornik
Volunteer tester
Joined: 15 May 99
Posts: 67
Credit: 75,767,108
RAC: 59,914
Norway
Message 1335003 - Posted: 5 Feb 2013, 23:20:50 UTC - in response to Message 1334998.

I don't have any other hosts with room for this card, and I don't have any other cards capable of cuda 4.2, so there is nothing that can be swapped anywhere, unfortunately.
____________

juan BFB
Volunteer tester
Joined: 16 Mar 07
Posts: 4609
Credit: 232,167,295
RAC: 331,058
Brazil
Message 1335005 - Posted: 5 Feb 2013, 23:27:38 UTC - in response to Message 1335003.
Last modified: 5 Feb 2013, 23:30:27 UTC

I don't have any other hosts with room for this card, and I don't have any other cards capable of cuda 4.2, so there is nothing that can be swapped anywhere, unfortunately.

Your computer list shows you have one host with an NVIDIA GeForce GTS 240 (970MB), driver 275.33. My idea is to swap that card with the 560Ti (you would need to check that the driver version you currently use supports the 560Ti), just for test purposes, and see whether the problem goes away on the first host and moves to the second host. (That could be done with cuda32, though I'm not sure whether it runs on the GTS 240; you say cuda32 has already given you a BSOD.) The old trial-and-error test cycle.

On the other hand, if you have already bought a new 680, why not wait for it to arrive? That makes sense too.
____________

jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 4807
Credit: 71,575,443
RAC: 9,445
Australia
Message 1335040 - Posted: 6 Feb 2013, 2:42:16 UTC - in response to Message 1334998.
Last modified: 6 Feb 2013, 2:48:21 UTC

Did you try swapping the 560Ti with a GPU from another one of your hosts, to see whether the problem goes away or moves to the other host? That could be an interesting thing to do; at least it could clearly point to the source of the problem.

Jason, do you agree with that approach?


Definitely worth a shot if/when another card becomes available. Mind you, whether the issue stays with the machine or follows the card, that doesn't completely clarify the source either (different-gen card, different power requirement, etc.), so it's difficult to write a particular piece of hardware off completely. (A 240 or 680 would use less power, & use different driver code/hardware, etc.)

Since we're down into DirectX/kernel stuff for those failures, we're down near hardware level alright (well away from the app & even the Cuda runtimes). There are not many options between 'working' & 'something broke', so hardware swappage becomes the name of the game troubleshooting/isolation-wise. (I personally don't enjoy that game ;). I'd rather something in the app was broken, as told by the faulting module & address of the BSOD, since for me that'd be a quick fix :D )

The "Weird things that can 'magically' clear weird issues" list grows:
- Carefully reseating everything (Card, RAM, PSU connectors)
- Try a different PSU
- Try a different PCIe Slot
- Check the card in a kind friend's machine
- BIOS updates
- SSD firmware flash updates
- Chipset overvolt, or undervolt
- Underclocking various items, like card, PCIe, memory (less feasible on a machine like this)

Then there's the 'Brad Approach', which I'm told involves a collection of assorted firearms, a rifle range, and ordering new computer parts. I think the idea there is to have fun with the 'trouble-shooting' process.

Jason
____________
"It is not the strongest of the species that survives, nor the most intelligent that survives. It is the one that is the most adaptable to change."
Charles Darwin

Hanford WA4LZC
Joined: 15 May 99
Posts: 38
Credit: 10,129,207
RAC: 0
United States
Message 1335050 - Posted: 6 Feb 2013, 3:09:12 UTC - in response to Message 1335040.

#####Then there's the 'Brad Approach', which I'm told involves a collection of assorted firearms, a rifle range, and ordering new computer parts. I think the idea there is to have fun with the troubleshooting process.####


I used that approach on a Carter Carburetor that refused to tune..... and it was fun..... ;-)


____________

Highlander
Joined: 5 Oct 99
Posts: 143
Credit: 30,748,474
RAC: 5,059
Germany
Message 1335128 - Posted: 6 Feb 2013, 8:56:41 UTC

I have read on a German forum that the stop 116 failure can, in rare cases, happen with HPET (High Precision Event Timer) activated in the BIOS on some boards. The solution there was to deactivate it. Perhaps you can try this?
____________

Oddbjornik
Volunteer tester
Joined: 15 May 99
Posts: 67
Credit: 75,767,108
RAC: 59,914
Norway
Message 1335274 - Posted: 6 Feb 2013, 22:32:25 UTC - in response to Message 1335128.

I'm happy to let you know that my new GTX 680 card now runs three parallel Cuda 5.0 tasks on the old Dell, without complaint.

The ASUS card, i.e. the GTX 680, runs much quieter than my old EVGA 560ti card, frankly because it has inadequate cooling. It quickly reaches its Tmax of 98C, and then downclocks itself so it stays at 98C. That is not quite what I expected, but it looks like the net effect is approximately the same amount of crunching for less electricity and less noise. Not totally happy with the 98C, though.

So the 560ti card apparently had issues that the new card doesn't have. And that, I suppose, is as far as we get on this thread.

Thank you for all your help and suggestions.
____________

jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 4807
Credit: 71,575,443
RAC: 9,445
Australia
Message 1335282 - Posted: 6 Feb 2013, 23:00:48 UTC - in response to Message 1335274.

A hah! Nice one :D & a bit more experience under our belts :)

Yeah 98C for the 680 is certainly far warmer than their 'happy-zone' though the dynamic clocks should help keep from entering faulty operation outright. I'd definitely look at ways to get that down, namely more air in there however you can, & kicking up the fan. For comparison my own 680, when clean, runs 62-65C with stock clocks at the moment (work shortages negated the value of my regular OC), fan typically running auto around 65-70%.

If there is limited airflow, the 560ti would have been quite sensitive & less tolerant of that, so chances are it may not be defective/broken, just not as sophisticated at dealing with the borderline operation.

Jason
____________
"It is not the strongest of the species that survives, nor the most intelligent that survives. It is the one that is the most adaptable to change."
Charles Darwin

juan BFB
Volunteer tester
Joined: 16 Mar 07
Posts: 4609
Credit: 232,167,295
RAC: 331,058
Brazil
Message 1335290 - Posted: 6 Feb 2013, 23:20:37 UTC
Last modified: 6 Feb 2013, 23:21:28 UTC

You could use a program like EVGA Precision to keep the fan running fast so your GPU will run cooler. I don't have a 680, but my 670 runs in the low 70s C and the 690 runs at around 75C.

You could try the 560Ti in another host and see what happens.
____________

arkayn
Volunteer tester
Joined: 14 May 99
Posts: 3542
Credit: 46,071,190
RAC: 30,359
United States
Message 1335323 - Posted: 7 Feb 2013, 1:09:30 UTC - in response to Message 1335290.

You could use a program like EVGA Precision to keep the fan running fast so your GPU will run cooler. I don't have a 680, but my 670 runs in the low 70s C and the 690 runs at around 75C.

You could try the 560Ti in another host and see what happens.


My 670 runs around 62C and I have it overclocked at the moment. The 650Ti runs around 65C and the 660 runs at 63C.

All of them I keep around 70% fan speed and I crank up the voltage to the maximum allowed.
____________

juan BFB
Volunteer tester
Joined: 16 Mar 07
Posts: 4609
Credit: 232,167,295
RAC: 331,058
Brazil
Message 1335331 - Posted: 7 Feb 2013, 1:19:32 UTC
Last modified: 7 Feb 2013, 1:19:57 UTC

I have no AC and am in the middle of a tropical summer, so my temps are higher than normal, but >90C on a 670 is certainly a bad idea.
____________

Oddbjornik
Volunteer tester
Joined: 15 May 99
Posts: 67
Credit: 75,767,108
RAC: 59,914
Norway
Message 1335436 - Posted: 7 Feb 2013, 13:10:31 UTC - in response to Message 1335331.

I drilled two 80mm holes in the bottom of the cabinet, more or less corresponding to the fans on the GPU card. I also lifted the cabinet about 20mm from the table to provide easy airflow.

After these minor changes, the GPU temperature fell to a stable 71-73C, and it runs at full speed constantly (no more underclocking). I'm actually a little surprised that the effect was so big.
____________

juan BFB
Volunteer tester
Joined: 16 Mar 07
Posts: 4609
Credit: 232,167,295
RAC: 331,058
Brazil
Message 1335448 - Posted: 7 Feb 2013, 14:34:21 UTC
Last modified: 7 Feb 2013, 14:34:32 UTC

Don't be so surprised; heat is the enemy of the GPU. Most of us who don't use water cooling use a lot of fans to keep the cards running cooler, especially since x41zc: its optimizations produce a great improvement in speed, but generate more heat. The gain in performance is surely worth the additional heat, though.

Now you could try running different numbers of WUs at a time to find the best performance.
____________


Copyright © 2014 University of California