Lunatics_x41zc_win32_cuda42.exe BSOD on 560ti card

Michael W.F. Miles
Joined: 24 Mar 07
Posts: 268
Credit: 34,410,870
RAC: 0
Canada
Message 1334746 - Posted: 4 Feb 2013, 23:23:29 UTC

I do believe you are running hot. When the GPU runs hot, everything else heats up. Your Intel CPU should handle it, but the board may not. Also, how much power can the PSU deliver?
I had this problem quite a bit with old drivers and an underpowered PSU. If it BSODs within seconds, then power is probably the first thing to look at.
Do you have the latest drivers installed as well?

Michael Miles
ID: 1334746
jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1334787 - Posted: 5 Feb 2013, 2:25:22 UTC - in response to Message 1334703.  
Last modified: 5 Feb 2013, 2:29:35 UTC

I didn't even know what that .sys file is so I googled it & came across this thread on Win7 forums:
http://www.sevenforums.com/crashes-debugging/201437-bsod-windows-7-x64-nvlddmkm-sys-dxgkrnl-sys-dxgmms1-sys.html

It's from DirectX (which is interrelated)

Along those lines I would suggest:
Uninstall driver in safe mode,
Driver sweeper,
Hard disk integrity checks,
clean latest WHQL driver install,
Update DirectX http://www.microsoft.com/en-us/download/details.aspx?id=35

& see where we go from there.

JAson

I have run 2 passes of Memtest86+ without any errors. It ran for about two and a half hours.
I have downgraded Nvidia driver to 306.97. Didn't help.
The PSU is a Corsair HX 650 watt. Should be adequate. The system draws a maximum of 330 watts.
CPU temp hovers around 80C under full load. It used to be 15 degrees hotter before I got an Akasa Nero 2 cooler.
The video card temperature never gets off the ground; the BSOD occurs about five seconds into the task, before any heating up has had time to happen.

So I guess I'm down to start shuffling the memory modules around then...?

Or getting myself a proper rig, instead of this juiced up old Dell :-)

BlueScreenView gives this info about the crash:
Technical Information:

*** STOP: 0x00000116 (0xfffffa8007653110, 0xfffffa60032e1630, 0xffffffffc000009a, 0x0000000000000004)

*** dxgkrnl.sys - Address 0xfffffa6003557ad4 base at 0xfffffa60034fc000 DateStamp 0x4d384226

"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 1334787
Oddbjornik
Volunteer tester
Joined: 15 May 99
Posts: 220
Credit: 349,610,548
RAC: 1,728
Norway
Message 1334994 - Posted: 5 Feb 2013, 22:58:45 UTC - in response to Message 1334787.  
Last modified: 5 Feb 2013, 22:59:20 UTC

Hmm... I did all that, and the cuda 4.2 build still crashes.

Cuda 3.2 actually also caused a crash in the middle of the night, but the computer restarted and then kept running as if nothing had happened. The crash signature is the same, as far as I can see:

Cuda 3.2 crash:

*** STOP: 0x00000116 (0xfffffa800a413010, 0xfffffa60034e8630, 0xffffffffc000009a, 0x0000000000000004)

*** dxgkrnl.sys - Address 0xfffffa600375ead4 base at 0xfffffa6003703000 DateStamp 0x4d384226


Cuda 4.2 crash:

*** STOP: 0x00000116 (0xfffffa80093ea4e0, 0xfffffa6003130adc, 0xffffffffc000009a, 0x0000000000000004)

*** dxgkrnl.sys - Address 0xfffffa6003305ad4 base at 0xfffffa60032aa000 DateStamp 0x4d384226
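
A quick way to check whether the crash signature really is the same is to subtract the module base from the faulting address in each report, since the driver loads at a different base after every reboot. The sketch below is only an illustration in Python using the addresses posted in this thread; the names CRASHES and module_offset are made up for the example. As far as I know, bug check 0x116 is Windows' video TDR (timeout detection and recovery) failure, and the low 32 bits of the third parameter, 0xC000009A, correspond to STATUS_INSUFFICIENT_RESOURCES, but treat both readings as assumptions rather than a diagnosis.

```python
# (faulting address, module base) for dxgkrnl.sys, copied from the
# BlueScreenView reports quoted in this thread.
CRASHES = {
    "first report": ("0xfffffa6003557ad4", "0xfffffa60034fc000"),
    "cuda 3.2": ("0xfffffa600375ead4", "0xfffffa6003703000"),
    "cuda 4.2": ("0xfffffa6003305ad4", "0xfffffa60032aa000"),
}


def module_offset(address, base):
    """Return how far into the module the faulting address lies."""
    return int(address, 16) - int(base, 16)


# The offset is independent of where the driver happened to be loaded.
offsets = {name: module_offset(addr, base) for name, (addr, base) in CRASHES.items()}

for name, offset in offsets.items():
    print(f"{name}: dxgkrnl.sys+{offset:#x}")

# Third bug check parameter, identical in all three reports.
# Masking to 32 bits leaves the NTSTATUS-style error code.
status = 0xFFFFFFFFC000009A & 0xFFFFFFFF
print(f"error code: {status:#010x}")
```

If all three offsets come out equal, the fault is being raised from the same spot inside dxgkrnl.sys each time, which fits the idea that 3.2 and 4.2 hit the same underlying problem at different rates.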


I would guess this means that the software does something that my hardware can't quite handle, and that the 4.2 software does it to a much higher degree than the 3.2 version. That seems to be in line with your general explanation of the development from 3.2 to 4.2.

And now I've gone and done a fun thing. I've bought an ASUS GeForce GTX 680 DirectCU II 4 GB that I'll replace the 560ti card with.

Unless you wish to do further research on the 560ti card for development reasons, I suggest I wait for the 680 card which should be here in a couple of days, and then take it from there. Cuda 5.0 and all...
ID: 1334994
juan BFP
Volunteer tester
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 1334998 - Posted: 5 Feb 2013, 23:09:23 UTC

Did you try swapping the 560Ti with a GPU from one of your other hosts, to see whether the problem goes away or moves to the other host? That could be an interesting thing to do; at the least, it could clearly point to the source of the problem.

Do you agree with that approach, Jason?
ID: 1334998
Oddbjornik
Volunteer tester
Joined: 15 May 99
Posts: 220
Credit: 349,610,548
RAC: 1,728
Norway
Message 1335003 - Posted: 5 Feb 2013, 23:20:50 UTC - in response to Message 1334998.  

I don't have any other hosts with room for this card, and I don't have any other cards capable of cuda 4.2, so there is nothing that can be swapped anywhere, unfortunately.
ID: 1335003
juan BFP
Volunteer tester
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 1335005 - Posted: 5 Feb 2013, 23:27:38 UTC - in response to Message 1335003.  
Last modified: 5 Feb 2013, 23:30:27 UTC

I don't have any other hosts with room for this card, and I don't have any other cards capable of cuda 4.2, so there is nothing that can be swapped anywhere, unfortunately.

Your computer list shows you have one host with an NVIDIA GeForce GTS 240 (970MB), driver 275.33. My idea is to swap that card with the 560Ti (you need to check whether the driver version you actually use runs the 560Ti), just as a test, and see whether the problem goes away on the first host and moves to the second host. (It could be done with cuda32; I'm just not sure whether it runs on the GTS 240. You say cuda32 already gave you a BSOD.) The old trial-and-error test cycle.

On the other hand, if you have already bought a new 680, why not wait for it to arrive? That makes sense too.
ID: 1335005
jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1335040 - Posted: 6 Feb 2013, 2:42:16 UTC - in response to Message 1334998.  
Last modified: 6 Feb 2013, 2:48:21 UTC

Did you try swapping the 560Ti with a GPU from one of your other hosts, to see whether the problem goes away or moves to the other host? That could be an interesting thing to do; at the least, it could clearly point to the source of the problem.

Do you agree with that approach, Jason?


Definitely worth a shot if/when another card becomes available. Mind you, whether the issue stays with the machine or follows the card, that doesn't completely pin down the source either (different generation card, different power requirements, etc.), so it's difficult to write a particular piece of hardware off completely. (A 240 or 680 would use less power, & use different driver code/hardware, etc.)

Since these failures are down in DirectX/kernel territory, we're near hardware level (well away from the app & even the Cuda runtimes). There are not many options between 'working' & 'something broke', so hardware swappage becomes the name of the game troubleshooting/isolation-wise. (I personally don't enjoy that game ;) I'd rather something in the app was broken, as shown by the faulting module & address of the BSOD, since for me that'd be a quick fix :D )

The "Weird things that can 'magically' clear weird issues" list grows:
- Carefully reseating everything (Card, RAM, PSU connectors)
- Try a different PSU
- Try a different PCIe Slot
- Check the card in a kind friend's machine
- BIOS updates
- SSD firmware flash updates
- Chipset overvolt, or undervolt
- Underclocking various items, like card, PCIe, memory (less feasible on a machine like this)

Then there's the 'Brad Approach', which I'm told involves a collection of assorted firearms, a rifle range, and ordering new computer parts. I think the idea there is to have fun with the 'trouble-shooting' process.

Jason
ID: 1335040
Hanford WA4LZC
Joined: 15 May 99
Posts: 38
Credit: 10,129,207
RAC: 0
United States
Message 1335050 - Posted: 6 Feb 2013, 3:09:12 UTC - in response to Message 1335040.  

#####Then there's the 'Brad Approach', which I'm told involves a collection of assorted firearms, a rifle range, and ordering new computer parts. I think the idea there is to have fun with the troubleshooting process.####


I used that approach on a Carter Carburetor that refused to tune..... and it was fun..... ;-)


ID: 1335050
Highlander
Joined: 5 Oct 99
Posts: 167
Credit: 37,987,668
RAC: 16
Germany
Message 1335128 - Posted: 6 Feb 2013, 8:56:41 UTC

I read on a German forum that the STOP 0x116 failure can, in rare cases, occur on some boards when HPET (High Precision Event Timer) is enabled in the BIOS. The solution there was to disable it. Perhaps you could try that?
- Performance is not a simple linear function of the number of CPUs you throw at the problem. -
ID: 1335128
Oddbjornik
Volunteer tester
Joined: 15 May 99
Posts: 220
Credit: 349,610,548
RAC: 1,728
Norway
Message 1335274 - Posted: 6 Feb 2013, 22:32:25 UTC - in response to Message 1335128.  

I'm happy to let you know that my new GTX 680 card now runs three parallel Cuda 5.0 tasks on the old Dell, without complaint.

The ASUS card, i.e. the GTX 680, runs much quieter than my old EVGA 560ti card, frankly because it has inadequate cooling. It quickly reaches its Tmax of 98C, and then downclocks itself so it stays at 98C. That is not quite what I expected, but it looks like the net effect is approximately the same amount of crunching for less electricity and less noise. Not totally happy with the 98C, though.

So the 560ti card apparently had issues that the new card doesn't have. And that, I suppose, is as far as we get on this thread.

Thank you for all your help and suggestions.
ID: 1335274
jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1335282 - Posted: 6 Feb 2013, 23:00:48 UTC - in response to Message 1335274.  

A hah! Nice one :D & a bit more experience under our belts :)

Yeah, 98C for the 680 is certainly far warmer than its 'happy zone', though the dynamic clocks should help keep it from entering faulty operation outright. I'd definitely look at ways to get that down, namely more air in there however you can, & kicking up the fan. For comparison my own 680, when clean, runs 62-65C with stock clocks at the moment (work shortages negated the value of my regular OC), fan typically running auto around 65-70%.

If there is limited airflow, the 560ti would have been quite sensitive & less tolerant of that, so chances are it may not be defective/broken, just not as sophisticated at dealing with the borderline operation.

JAson
ID: 1335282
juan BFP
Volunteer tester
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 1335290 - Posted: 6 Feb 2013, 23:20:37 UTC
Last modified: 6 Feb 2013, 23:21:28 UTC

You could use a program like EVGA Precision to keep the fan running fast so your GPU runs cooler. I don't have a 680, but my 670 runs in the low 70s C and the 690 runs at around 75C.

You could try the 560Ti in another host and see what happens.
ID: 1335290
arkayn
Volunteer tester
Posts: 4438
Credit: 55,006,323
RAC: 0
United States
Message 1335323 - Posted: 7 Feb 2013, 1:09:30 UTC - in response to Message 1335290.  

You could use a program like EVGA Precision to keep the fan running fast so your GPU runs cooler. I don't have a 680, but my 670 runs in the low 70s C and the 690 runs at around 75C.

You could try the 560Ti in another host and see what happens.


My 670 runs around 62C and I have it overclocked at the moment. The 650Ti runs around 65C and the 660 runs at 63C.

All of them I keep around 70% fan speed and I crank up the voltage to the maximum allowed.

ID: 1335323
juan BFP
Volunteer tester
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 1335331 - Posted: 7 Feb 2013, 1:19:32 UTC
Last modified: 7 Feb 2013, 1:19:57 UTC

I have no AC and it's the middle of a tropical summer, so my temps are higher than normal, but >90C on a 670 is certainly a bad idea.
ID: 1335331
Oddbjornik
Volunteer tester
Joined: 15 May 99
Posts: 220
Credit: 349,610,548
RAC: 1,728
Norway
Message 1335436 - Posted: 7 Feb 2013, 13:10:31 UTC - in response to Message 1335331.  

I drilled two 80mm holes in the bottom of the cabinet, more or less corresponding to the fans on the GPU card. I also lifted the cabinet about 20mm from the table to provide easy airflow.

After these minor changes, the GPU temperature fell to a stable 71-73C, and it runs at full speed constantly (no more underclocking). I'm actually a little surprised that the effect was so big.
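
For anyone who wants to watch for the same throttling behaviour, here is a rough Python sketch that reads temperature, SM clock and fan speed from nvidia-smi and flags a hot card. It is an assumption that nvidia-smi is on the PATH and that the driver reports all three fields for the card in question (some GeForce cards return "[N/A]" for a field, which this simple parser does not handle). The helper names and the 90C limit are made up for the example; the limit is only loosely based on the ">90C is a bad idea" comment above.

```python
import subprocess

# Fields requested from nvidia-smi, in the order parse_sample expects them.
QUERY = "temperature.gpu,clocks.sm,fan.speed"


def parse_sample(line):
    """Parse one CSV line such as '98, 705, 60' into a dict of ints."""
    temp, clock, fan = (int(field.strip()) for field in line.split(","))
    return {"temp_c": temp, "sm_clock_mhz": clock, "fan_pct": fan}


def is_too_hot(sample, limit_c=90):
    """True when the reported temperature is at or above the chosen limit."""
    return sample["temp_c"] >= limit_c


def read_gpu(index=0):
    """Query one GPU through nvidia-smi and return a parsed sample."""
    result = subprocess.run(
        [
            "nvidia-smi",
            f"--id={index}",
            f"--query-gpu={QUERY}",
            "--format=csv,noheader,nounits",
        ],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_sample(result.stdout.strip().splitlines()[0])
```

Calling read_gpu() every few seconds while a task runs, and comparing the SM clock at high and low temperatures, should show whether the card is still downclocking. The parser can be tried without a GPU by feeding it a hand-written line, for example is_too_hot(parse_sample("98, 705, 60")), where the clock and fan numbers are placeholders rather than measurements.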
ID: 1335436
juan BFP
Volunteer tester
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 1335448 - Posted: 7 Feb 2013, 14:34:21 UTC
Last modified: 7 Feb 2013, 14:34:32 UTC

Don't be so surprised: heat is the enemy of the GPU. Most of us who don't use water cooling use a lot of fans to keep the cards running cooler, especially since x41zc. Its optimizations produce a great improvement in speed, but generate more heat. The gain in performance is certainly worth the additional heat.

Now you could try running different numbers of WUs at a time to find the best performance.
ID: 1335448
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.