Lunatics_x41zc_win32_cuda42.exe BSOD on 560ti card

Oddbjornik Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 15 May 99
Posts: 220
Credit: 349,610,548
RAC: 1,728
Norway
Message 1334256 - Posted: 3 Feb 2013, 13:35:48 UTC

I have a 560ti 448 card. It runs nicely with the x41g lunatics application from the latest installer package.

Now I thought I'd try to upgrade to the x41zc version for cuda 4.2, as recommended for fermi-class cards.

But this upgrade caused repeated, immediate bluescreens.

The tasks (two parallel tasks) start, run for a couple of seconds, and then the screen goes blank. Check task result id 2820934848 for the task log. This task starts with version 3.2, then I upgrade to 4.2 and it restarts twice, and finally I revert to 3.2 and the task runs to completion. Nothing in the log that tells me anything...

Driver version 310.90, Windows Vista.
ID: 1334256 · Report as offensive
Claggy
Volunteer tester

Joined: 5 Jul 99
Posts: 4654
Credit: 47,537,079
RAC: 4
United Kingdom
Message 1334262 - Posted: 3 Feb 2013, 14:00:23 UTC - in response to Message 1334256.  

Have you tried the Cuda32 version of x41zc?

Claggy
ID: 1334262 · Report as offensive
Oddbjornik Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 15 May 99
Posts: 220
Credit: 349,610,548
RAC: 1,728
Norway
Message 1334266 - Posted: 3 Feb 2013, 14:09:21 UTC - in response to Message 1334262.  

No, I haven't, but I'm starting to suspect that the problem might be that I restart the already started cuda tasks with new versions of the program and dlls.

I will let the existing tasks finish, then try the Cuda32 version of x41zc on freshly downloaded tasks, and if that works, the Cuda42 version also on freshly downloaded tasks.
ID: 1334266 · Report as offensive
juan BFP Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 1334277 - Posted: 3 Feb 2013, 14:53:11 UTC

Try Claggy's suggestion first. I have 8 hosts running x41zc cuda5. On just one of them (Jason and I are still working to find out why) the only build that works is cuda32; on the other 7 everything works fine (I have 3 with exactly the same MB/GPU, etc. as the one that fails, so for now we think it is not the hardware itself). In my case it is not a BSOD, just a driver crash, but I use a different GPU than you do.

At least in my case the cuda tasks could be started with any of the versions of the program with no problem (apart from the host that can only run cuda32 tasks), so you don't need to wait before trying. Just remember to copy all the files correctly and run the aimerge cmd.


Another thing you could check is the temperature of the GPU. x41zc performs far better than x41g, so it generates more heat, and anything could happen if your GPU overheats.
ID: 1334277 · Report as offensive
jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1334278 - Posted: 3 Feb 2013, 14:57:00 UTC
Last modified: 3 Feb 2013, 14:57:55 UTC

On the chance that the Cuda 3.2 build appears to process fine under the same conditions, while 4.2 & 5.0 are unstable on the machine, there are a few possibilities to look over:

The usual power & thermal issues should be looked at first & eliminated as possibilities. When it works, the 4.2 app on that card would be some non-zero amount faster, and it'll be placing some possibly marginal piece of hardware, firmware or software under different stress loadings (even though, at the application end, the code is identical).

With Fermi upwards, the newer Cuda revisions use more motherboard resources, namely DMA engines and a lot of helper-threading, which can expose weaknesses in BIOS settings or other system configuration that didn't matter before. Things to check here would be system RAM timing & voltage specs. In more than a few reported cases, for example, enthusiast grade RAM is picked up by motherboard auto settings as being capable of a command rate of 1 cycle. In most cases that's unrealistic for the motherboard or memory controller even if the RAM itself can handle it.

Another RAM-related example: correct RAM timings may be in effect for the system RAM & controller, and the RAM was automatically given increased voltage as spec dictates, but the memory controller didn't receive its additional voltage (for the Core i7 example, the controller voltage should be around 75%-80% of the RAM voltage, but the defaults are often on the low side, around 1.05-1.15 volts, too low for 1.6V spec RAM, leading to excess current sinking into the CPU. In the case of 1.6V RAM it should be 1.2-1.28V).
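
As a rough illustration of the arithmetic in the Core i7 example above, the sketch below computes the 75%-80% window for a given RAM voltage. The names `controller_voltage_window` and `is_in_window` are made up for this illustration, and the ratios are simply the ones quoted in the post, not a vendor specification.

```python
def controller_voltage_window(ram_volts, low_ratio=0.75, high_ratio=0.80):
    """Return the (low, high) controller voltage implied by the 75%-80% rule of thumb."""
    # Round to millivolts so float noise does not leak into the result.
    return (round(ram_volts * low_ratio, 3), round(ram_volts * high_ratio, 3))


def is_in_window(controller_volts, ram_volts):
    """True if the controller voltage falls inside the rule-of-thumb window."""
    low, high = controller_voltage_window(ram_volts)
    return low <= controller_volts <= high


if __name__ == "__main__":
    low, high = controller_voltage_window(1.6)
    print(f"1.6 V RAM -> controller window {low}-{high} V")
    # A default from the low end of the range mentioned in the post.
    print("1.10 V inside the window?", is_in_window(1.10, 1.6))
```

Whether a given board actually follows this rule of thumb is a separate question; the sketch only makes the numbers in the post easy to reproduce for other RAM voltages.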

Jason
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 1334278 · Report as offensive
Oddbjornik Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 15 May 99
Posts: 220
Credit: 349,610,548
RAC: 1,728
Norway
Message 1334380 - Posted: 3 Feb 2013, 19:35:39 UTC

The cuda 3.2 build of x41zc runs without problems.

I'm not much into memory timings and voltages, and I don't know how to adjust any of those settings on this system.

I'll give the cuda 4.2 build another chance. If it still fails, I'd be happy to run any logging/checking/whatever might be needed to find out what the problem is.
ID: 1334380 · Report as offensive
juan BFP Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 1334382 - Posted: 3 Feb 2013, 19:39:57 UTC - in response to Message 1334380.  
Last modified: 3 Feb 2013, 19:44:59 UTC

The cuda 3.2 build of x41zc runs without problems.

I'm not much into memory timings and voltages, and I don't know how to adjust any of those settings on this system.

I'll give the cuda 4.2 build another chance. If it still fails, I'd be happy to run any logging/checking/whatever might be needed to find out what the problem is.

If 3.2 runs and the others don't, that points to the memory timings/voltages Jason mentioned. The easy way to check them is with the CPU-Z program; if you could post the screens, Jason could give you a hand (he is in Australia, so he is probably sleeping now). BTW, what is your MB?
ID: 1334382 · Report as offensive
Oddbjornik Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 15 May 99
Posts: 220
Credit: 349,610,548
RAC: 1,728
Norway
Message 1334383 - Posted: 3 Feb 2013, 19:44:39 UTC - in response to Message 1334380.  

...and the cuda 4.2 build bluescreened within seconds.
ID: 1334383 · Report as offensive
juan BFP Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 1334385 - Posted: 3 Feb 2013, 19:50:07 UTC - in response to Message 1334383.  
Last modified: 3 Feb 2013, 19:50:27 UTC

...and the cuda 4.2 build bluescreened within seconds.

Post your memory timings and voltages, and don't forget the MB model.
ID: 1334385 · Report as offensive
Oddbjornik Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 15 May 99
Posts: 220
Credit: 349,610,548
RAC: 1,728
Norway
Message 1334400 - Posted: 3 Feb 2013, 20:12:49 UTC - in response to Message 1334385.  

Screendumps from CPU-Z:



ID: 1334400 · Report as offensive
juan BFP Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 1334406 - Posted: 3 Feb 2013, 20:33:07 UTC - in response to Message 1334400.  

Jason could give you better help, but until then: look at your second screen, where the Command Rate is at 1T. Few MBs work with that (mine at least does not), and Jason advises changing it to 2T (it is changed in the BIOS settings). Try it, you have nothing to lose, or you could wait for Jason's return.
ID: 1334406 · Report as offensive
Oddbjornik Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 15 May 99
Posts: 220
Credit: 349,610,548
RAC: 1,728
Norway
Message 1334409 - Posted: 3 Feb 2013, 20:51:19 UTC - in response to Message 1334406.  

There are no memory timing settings in this BIOS, as far as I can see. It's just an old Dell computer...
ID: 1334409 · Report as offensive
kittyman Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 9 Jul 00
Posts: 51468
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1334410 - Posted: 3 Feb 2013, 21:01:07 UTC - in response to Message 1334409.  

There are no memory timing settings in this BIOS, as far as I can see. It's just an old Dell computer...

Dell probably has the BIOS all locked down, so there is little tinkering you are able to do with it.
"Freedom is just Chaos, with better lighting." Alan Dean Foster

ID: 1334410 · Report as offensive
juan BFP Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 1334411 - Posted: 3 Feb 2013, 21:07:33 UTC

My suggestion: keep running cuda32 until Jason gives you a clue. He is our "master guru"; if there is anyone who can help you, it is him.
ID: 1334411 · Report as offensive
hbomber
Volunteer tester

Joined: 2 May 01
Posts: 437
Credit: 50,852,854
RAC: 0
Bulgaria
Message 1334448 - Posted: 3 Feb 2013, 22:12:38 UTC
Last modified: 3 Feb 2013, 22:37:33 UTC

These are just two very slow-running 4 GB sticks (default specs are 1600/9, as can be seen from the model number and the rightmost SPD column). 1T is just fine. tRFC is tight for 4 GB sticks; 110-130 are usual values. I would start from 130 and try to tighten it.
The memory controller is not even strained; it runs in dual-channel mode only.
Before making any assumptions and messing with timings, the memory must be tested. Of course there could be a faulty module, but the settings should be the least concern. Also, checking the crash dumps will reveal which module is faulting (a driver, generally called a kernel module, not a memory module).
For s.1366 systems, the delta between memory controller voltage and memory module voltage must be within 0.5 V. For 1.65 V DRAM voltage, VTT voltage (which is the memory controller voltage) must be at least 1.15 V, which is the default. I had several 1366 boards and did a hell of a lot of things to them back when they were fancy, and never saw a mobo that gives less than 1.15 V to the memory controller. Instead, they tend to increase it silently to ensure stability. They were an ASRock X58 Extreme6, a Foxconn Bloodrage GTI, and now I still use a Gigabyte X58A-UD7 (the mother of motherboards for the X58 platform), running an i7-920 at 4.2 GHz 24/7 on air.
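
The delta rule in the last paragraph can be written down in a few lines. This is only a sketch of the arithmetic in the post: `min_vtt` and `delta_ok` are hypothetical names, and the 0.5 V limit is the figure quoted above, not something checked here against a datasheet.

```python
MAX_DELTA_VOLTS = 0.5  # limit quoted in the post for s.1366 (assumed, not verified)


def min_vtt(dram_volts, max_delta=MAX_DELTA_VOLTS):
    """Lowest VTT that keeps the DRAM-to-VTT delta within max_delta."""
    return round(dram_volts - max_delta, 3)


def delta_ok(dram_volts, vtt_volts, max_delta=MAX_DELTA_VOLTS):
    """True if the DRAM voltage exceeds VTT by no more than max_delta."""
    # Round to millivolts so float noise does not flip the comparison.
    return round(dram_volts - vtt_volts, 3) <= max_delta


if __name__ == "__main__":
    print("Minimum VTT for 1.65 V DRAM:", min_vtt(1.65))
    print("1.15 V VTT acceptable?", delta_ok(1.65, 1.15))
```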
ID: 1334448 · Report as offensive
jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1334471 - Posted: 4 Feb 2013, 0:45:55 UTC
Last modified: 4 Feb 2013, 0:58:24 UTC

Yep, definitely Memtest86+ for starters. With the RAM running that slow, 1T can be fine, as hbomber describes.

Aside from the memory integrity itself, my main concern with settings that conservative is that in the later Cuda versions, as mentioned, there is much more use of the DMA engines (on the card) and system memory.

As there is often limited or no control available in BIOS for these machines, some conservative auto setting is probably normal, but given the shown specs, it looks more like some kind of 'fail-safe' mode for the memory.

I've used this kind of machine in the past at work. Reseating / swapping the memory modules out & resetting the BIOS worked wonders in that particular case.

I'd also have a closer look at the PSU & cooling, just in case there's something obvious there.

Jason
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 1334471 · Report as offensive
Oddbjornik Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 15 May 99
Posts: 220
Credit: 349,610,548
RAC: 1,728
Norway
Message 1334703 - Posted: 4 Feb 2013, 20:45:01 UTC - in response to Message 1334471.  
Last modified: 4 Feb 2013, 20:47:41 UTC

I have run 2 passes of Memtest86+ without any errors. It ran for about two and a half hours.
I have downgraded Nvidia driver to 306.97. Didn't help.
The PSU is a Corsair HX 650 watt. Should be adequate. The system draws a maximum of 330 watt.
CPU temp hovers around 80 under full load. It used to be 15 degrees hotter before I got an Akasa Nero 2 cooler.
The video card temperature never gets off the ground; the BSOD occurs about five seconds into the task, before any heating up has had time to happen.

So I guess I'm down to start shuffling the memory modules around then...?

Or getting myself a proper rig, instead of this juiced up old Dell :-)

BlueScreenView gives this info about the crash:
Technical Information:

*** STOP: 0x00000116 (0xfffffa8007653110, 0xfffffa60032e1630, 0xffffffffc000009a, 0x0000000000000004)

*** dxgkrnl.sys - Address 0xfffffa6003557ad4 base at 0xfffffa60034fc000 DateStamp 0x4d384226
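
For anyone reading along, here is a small sketch that picks the BlueScreenView text above apart. It assumes the documented layout of bug check 0x116 (VIDEO_TDR_FAILURE), where the third parameter holds the NTSTATUS of the last failed operation. The lookup tables are deliberately tiny and cover only the values in this crash, and `parse_stop` is a made-up helper, not part of any tool.

```python
import re

# Lookup tables covering only the values seen in this thread.
BUGCHECK_NAMES = {0x116: "VIDEO_TDR_FAILURE"}
NTSTATUS_NAMES = {0xC000009A: "STATUS_INSUFFICIENT_RESOURCES"}

STOP_TEXT = """
*** STOP: 0x00000116 (0xfffffa8007653110, 0xfffffa60032e1630, 0xffffffffc000009a,
0x0000000000000004)
*** dxgkrnl.sys - Address 0xfffffa6003557ad4 base at 0xfffffa60034fc000 DateStamp
0x4d384226
"""


def parse_stop(text):
    """Return (bug check code, parameter list, module name, offset into module)."""
    flat = " ".join(text.split())  # undo the line wrapping
    stop = re.search(r"STOP: (0x[0-9a-fA-F]+) \(([^)]*)\)", flat)
    code = int(stop.group(1), 16)
    params = [int(p.strip(), 16) for p in stop.group(2).split(",")]
    mod = re.search(
        r"\*\*\* (\S+) - Address (0x[0-9a-fA-F]+) base at (0x[0-9a-fA-F]+)", flat
    )
    offset = int(mod.group(2), 16) - int(mod.group(3), 16)
    return code, params, mod.group(1), offset


if __name__ == "__main__":
    code, params, module, offset = parse_stop(STOP_TEXT)
    # The status is stored sign-extended to 64 bits; keep the low 32 bits.
    status = params[2] & 0xFFFFFFFF
    print(f"Bug check {code:#x}: {BUGCHECK_NAMES.get(code, 'unknown')}")
    print(f"Last failed operation: {status:#x} {NTSTATUS_NAMES.get(status, 'unknown')}")
    print(f"Reported in {module} at offset {offset:#x}")
```

A TDR failure means the display driver stopped responding and could not be recovered in time, which fits a crash a few seconds into a GPU task. On its own it does not say whether the driver, the card, or something else in the system is at fault.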

ID: 1334703 · Report as offensive
Mike Davis
Volunteer tester

Joined: 17 May 99
Posts: 240
Credit: 5,402,361
RAC: 0
Isle of Man
Message 1334710 - Posted: 4 Feb 2013, 21:30:35 UTC

Out of interest, is that CPU stock clocked or have you got it overclocked? I know it's below max, so not terrible temperature-wise, but I had a Nero (probably 1st revision, it was a while back) on a 965 and that only used to hit 70C max.
ID: 1334710 · Report as offensive
Oddbjornik Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 15 May 99
Posts: 220
Credit: 349,610,548
RAC: 1,728
Norway
Message 1334724 - Posted: 4 Feb 2013, 21:56:31 UTC - in response to Message 1334710.  

There's no overclocking.
The cooler fan runs slow and quiet, so I guess I could lower the temp by cranking up the fan speed.
But I like it quiet, and 80C seems to work ok.
ID: 1334724 · Report as offensive
Tom*

Joined: 12 Aug 11
Posts: 127
Credit: 20,769,223
RAC: 9
United States
Message 1334739 - Posted: 4 Feb 2013, 23:08:58 UTC - in response to Message 1334724.  
Last modified: 4 Feb 2013, 23:17:05 UTC

Some 560 Tis came slightly overclocked from the factory.

Even though they have no time to heat up, try slowing your 560 down.

Bill
ID: 1334739 · Report as offensive


 