Lunatics_x41zc_win32_cuda42.exe BSOD on 560ti card
Author | Message |
---|---|
Oddbjornik Joined: 15 May 99 Posts: 220 Credit: 349,610,548 RAC: 1,728 |
I have a 560 Ti 448 card. It runs nicely with the Lunatics x41g application from the latest installer package. Now I thought I'd try upgrading to the x41zc version for Cuda 4.2, as recommended for Fermi-class cards. But this upgrade caused repeated, immediate bluescreens. The tasks (two parallel tasks) start, run for a couple of seconds, and then the screen goes blank. Check task result id 2820934848 for the task log. This task started with version 3.2; then I upgraded to 4.2 and it restarted twice, and finally I reverted to 3.2 and the task ran to completion. There's nothing in the log that tells me anything... Driver version 310.90, Windows Vista. |
Claggy Joined: 5 Jul 99 Posts: 4654 Credit: 47,537,079 RAC: 4 |
Have you tried the Cuda32 version of x41zc? Claggy |
Oddbjornik Joined: 15 May 99 Posts: 220 Credit: 349,610,548 RAC: 1,728 |
No, I haven't, but I'm starting to suspect that the problem might be that I restarted already-started CUDA tasks with new versions of the program and DLLs. I will let the existing tasks finish, then try the Cuda32 version of x41zc on freshly downloaded tasks and, if that works, the Cuda42 version, also on freshly downloaded tasks. |
juan BFP Joined: 16 Mar 07 Posts: 9786 Credit: 572,710,851 RAC: 3,799 |
Try Claggy's suggestion first. I have 8 hosts running x41zc Cuda5. On just one of them (why, I'm still working with Jason to find out) the only build that works is Cuda32, while on the other 7 everything works fine. (I have 3 with exactly the same MB/GPU, etc., as the one that fails, so for now we think it's not the hardware itself.) In my case it's not a BSOD, just a driver crash, but I use a different GPU than you do. At least in my case, a CUDA task could be started with any of the versions of the program with no problem (besides the host that can only run Cuda32 tasks), so you don't need to wait to try; just remember to copy all the files correctly and run the aimerge cmd. Another thing you could check is the GPU temperature: x41zc is far superior in performance to x41g, so it generates more heat, and anything could happen if your GPU overheats. |
jason_gee Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
If the Cuda 3.2 build appears to process fine under the same conditions while 4.2 & 5.0 are unstable on the machine, there are a few possibilities to look over. The usual power & thermal issues should be looked at first & eliminated as possibilities. While, when working, the 4.2 app would be some non-zero amount faster on that card, it'll be placing some possibly marginal piece of hardware, firmware or software under different stress loadings (even though, at the application end, the code is identical). With Fermi upwards, the newer Cuda revisions use more motherboard resources, namely DMA engines and a lot of helper-threading, which can expose weaknesses in BIOS settings or other system configuration that didn't matter before. Things to check here would be system RAM timing & voltage specs. In more than a few reported cases, for example, enthusiast-grade RAM is picked up by motherboard auto settings as being capable of a command rate of 1 cycle. In most cases that's unrealistic for the motherboard or memory controller, even if the RAM itself can handle it. Another RAM-related example: correct RAM timings may be in effect for the system RAM & controller, and the RAM was automatically given increased voltage as the spec dictates, but the memory controller didn't receive its additional voltage. (For the Core i7 example, the controller voltage should be around 75%-80% of the RAM voltage, but the defaults are often on the low side, around 1.05-1.15 volts, which is too low for 1.6V spec RAM and leads to excess current sinking into the CPU. In the case of 1.6V RAM it should be 1.2-1.28V.) Jason "Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions. |
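Jason's rule of thumb for the Core i7 case (memory controller voltage at roughly 75%-80% of the RAM voltage) is plain arithmetic, which a few lines of Python can make concrete. This is a minimal sketch, assuming that ratio range is the only criterion; the function names `controller_voltage_range` and `is_too_low` are hypothetical, and the 1.05 V and 1.15 V inputs are simply the low-side defaults quoted in the post.

```python
def controller_voltage_range(dram_v, low_ratio=0.75, high_ratio=0.80):
    """Return (min, max) memory-controller voltage suggested by the
    75%-80%-of-RAM-voltage rule of thumb (hypothetical helper)."""
    # Rounding hides float noise such as 1.2000000000000002.
    return round(dram_v * low_ratio, 3), round(dram_v * high_ratio, 3)


def is_too_low(controller_v, dram_v):
    """True if the controller voltage sits below the rule-of-thumb range."""
    low, _ = controller_voltage_range(dram_v)
    return controller_v < low


if __name__ == "__main__":
    low, high = controller_voltage_range(1.6)
    print(f"1.6 V RAM -> controller {low:.2f}-{high:.2f} V")  # 1.20-1.28 V
    # Check the low-side motherboard defaults mentioned in the post.
    for default_v in (1.05, 1.15):
        print(default_v, "too low" if is_too_low(default_v, 1.6) else "ok")
```

For 1.6 V RAM this reproduces the 1.2-1.28 V range from the post, and both quoted defaults fall below it.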
Oddbjornik Joined: 15 May 99 Posts: 220 Credit: 349,610,548 RAC: 1,728 |
The cuda 3.2 build of x41zc runs without problems. I'm not much into memory timings and voltages, and I don't know how to adjust any of those settings on this system. I'll give the cuda 4.2 build another chance. If it still fails, I'd be happy to run any logging/checking/whatever might be needed to find out what the problem is. |
juan BFP Joined: 16 Mar 07 Posts: 9786 Credit: 572,710,851 RAC: 3,799 |
Quoting Oddbjornik: "The cuda 3.2 build of x41zc runs without problems." If 3.2 runs and the others don't, that points to the memory timings/voltages Jason mentioned. The easy way to look at those is with the CPU-Z program; if you could post a screenshot, Jason could give you a hand (he is in Australia, so he's probably sleeping now). BTW, what is your MB? |
Oddbjornik Joined: 15 May 99 Posts: 220 Credit: 349,610,548 RAC: 1,728 |
...and the cuda 4.2 build bluescreened within seconds. |
juan BFP Joined: 16 Mar 07 Posts: 9786 Credit: 572,710,851 RAC: 3,799 |
Quoting Oddbjornik: "...and the cuda 4.2 build bluescreened within seconds." Post your memory timings and voltages, and don't forget the MB model. |
Oddbjornik Joined: 15 May 99 Posts: 220 Credit: 349,610,548 RAC: 1,728 |
Screendumps from CPU-Z: |
juan BFP Joined: 16 Mar 07 Posts: 9786 Credit: 572,710,851 RAC: 3,799 |
Jason could give you better help, but until then, look at your 2 screenshots: the Command Rate is at 1T. Few MBs work with that (mine doesn't, at least), and Jason's advice is to change it to 2T (this is changed in the BIOS settings). Try it, you have nothing to lose, or you could wait for Jason's return. |
Oddbjornik Joined: 15 May 99 Posts: 220 Credit: 349,610,548 RAC: 1,728 |
There are no memory timing settings in this BIOS, as far as I can see. It's just an old Dell computer... |
kittyman Joined: 9 Jul 00 Posts: 51468 Credit: 1,018,363,574 RAC: 1,004 |
Quoting Oddbjornik: "There are no memory timing settings in this BIOS, as far as I can see. It's just an old Dell computer..." Dell probably has the BIOS all locked down, so there is little tinkering you are able to do with it. "Freedom is just Chaos, with better lighting." Alan Dean Foster |
juan BFP Joined: 16 Mar 07 Posts: 9786 Credit: 572,710,851 RAC: 3,799 |
My suggestion: keep running Cuda32 until Jason gives you a clue. He is our "master guru"; if there's anyone who could help you, it's him. |
hbomber Joined: 2 May 01 Posts: 437 Credit: 50,852,854 RAC: 0 |
These are just two very slow-running 4 GB sticks (default specs are 1600/9, as can be seen from the model number and the rightmost SPD column). 1T is just fine. tRFC is tight for 4 GB sticks; 110-130 are usual values. I would start from 130 and try to tighten it. The memory controller is not even strained; it runs in dual-channel mode only. Before making any assumptions and messing with timings, the memory must be tested. Of course, there could be a faulty module, but the settings should be the least concern. Also, checking the crash dumps will reveal which module is faulting (a driver, generally called a kernel module, not a memory module). For s.1366 systems, the delta between memory controller voltage and memory module voltage must be within 0.5 V. For 1.65 V DRAM voltage, VTT voltage (which is the memory controller voltage) must be at least 1.15 V, which is the default. I had several 1366 boards and did a hell of a lot of things to them back when they were fancy, and never saw a MoBo that gives less than 1.15 V to the memory controller. Instead, they tend to increase it silently to ensure stability. They were an ASRock X58 Extreme6, a Foxconn Bloodrage GTI, and the Gigabyte X58A-UD7 (the mother of motherboards for the X58 platform) that I still use now, running an i7-920 at 4.2 GHz 24/7 on air. |
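hbomber's s.1366 rule (DRAM voltage minus memory controller/VTT voltage within 0.5 V) can be checked the same way. A minimal Python sketch, assuming the 0.5 V limit from the post is the whole rule; `min_vtt` and `delta_ok` are hypothetical names, and a small float tolerance is added so that 1.65 V vs 1.15 V counts as exactly on the limit.

```python
MAX_DELTA_V = 0.5  # s.1366 DRAM-vs-VTT limit quoted in the post
FLOAT_TOL = 1e-9   # absorbs binary floating-point rounding


def min_vtt(dram_v, max_delta=MAX_DELTA_V):
    """Lowest VTT (memory controller) voltage allowed by the delta rule."""
    return round(dram_v - max_delta, 3)


def delta_ok(dram_v, vtt_v, max_delta=MAX_DELTA_V):
    """True if DRAM and VTT voltages are within max_delta of each other."""
    return abs(dram_v - vtt_v) <= max_delta + FLOAT_TOL


if __name__ == "__main__":
    print("1.65 V DRAM -> VTT at least", min_vtt(1.65), "V")
    for vtt in (1.05, 1.15, 1.25):
        print(vtt, "ok" if delta_ok(1.65, vtt) else "delta too large")
```

For 1.65 V DRAM this gives the 1.15 V minimum mentioned above. For 1.6 V RAM it would allow VTT down to 1.1 V, a looser floor than the 75%-80% ratio Jason quoted earlier in the thread.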
jason_gee Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
Yep, definitely Memtest86+ for starters. With the RAM running that slow, 1T can be fine, as hbomber describes. Aside from the memory integrity itself, my main concern with settings that conservative is that, as mentioned, the later Cuda versions make much more use of the DMA engines (on the card) and system memory. As there is often limited or no control available in the BIOS for these machines, some conservative auto setting is probably normal, but given the shown specs, it looks more like some kind of 'fail-safe' mode for the memory. I've used this kind of machine at work in the past; reseating/swapping the memory modules & resetting the BIOS worked wonders in that particular case. I'd also have a closer look at the PSU & cooling, just in case there's something obvious there. Jason |
Oddbjornik Joined: 15 May 99 Posts: 220 Credit: 349,610,548 RAC: 1,728 |
I have run 2 passes of Memtest86+ without any errors; it ran for about two and a half hours. I have downgraded the Nvidia driver to 306.97. Didn't help. The PSU is a Corsair HX 650 W, which should be adequate; the system draws a maximum of 330 W. CPU temp hovers around 80C under full load; it used to be 15 degrees hotter before I got an Akasa Nero 2 cooler. The video card temperature never gets off the ground: the BSOD occurs about five seconds into the task, before any heating up has had time to happen. So I guess I'm down to shuffling the memory modules around then...? Or getting myself a proper rig instead of this juiced-up old Dell :-) BlueScreenView gives this info about the crash: Technical Information: `*** STOP: 0x00000116 (0xfffffa8007653110, 0xfffffa60032e1630, 0xffffffffc000009a, 0x0000000000000004) *** dxgkrnl.sys - Address 0xfffffa6003557ad4 base at 0xfffffa60034fc000 DateStamp 0x4d384226` |
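The BlueScreenView output above can be decoded a little further by hand: the stop code, its four parameters, and the faulting address expressed as an offset into dxgkrnl.sys (address minus module base). A minimal Python sketch, assuming the output is pasted as a single string in exactly the format shown; the regexes and function names are hypothetical, and the VIDEO_TDR_FAILURE name for stop code 0x116 comes from Microsoft's bug check reference, not from the thread.

```python
import re

# Text copied from the BlueScreenView output in the post above.
DUMP = ("*** STOP: 0x00000116 (0xfffffa8007653110, 0xfffffa60032e1630, "
        "0xffffffffc000009a, 0x0000000000000004) "
        "*** dxgkrnl.sys - Address 0xfffffa6003557ad4 base at "
        "0xfffffa60034fc000 DateStamp 0x4d384226")

# Only the one bug check seen here; name per Microsoft's bug check reference.
BUGCHECK_NAMES = {0x116: "VIDEO_TDR_FAILURE"}


def parse_stop(text):
    """Return (stop code, [parameters]) from a 'STOP: code (p1, ...)' line."""
    m = re.search(r"STOP:\s*(0x[0-9a-fA-F]+)\s*\(([^)]*)\)", text)
    if m is None:
        raise ValueError("no STOP line found")
    code = int(m.group(1), 16)
    params = [int(p.strip(), 16) for p in m.group(2).split(",")]
    return code, params


def fault_offset(text):
    """Return (module, address - base) from 'X.sys - Address A base at B'."""
    m = re.search(r"(\S+\.sys) - Address (0x[0-9a-fA-F]+) base at "
                  r"(0x[0-9a-fA-F]+)", text)
    if m is None:
        raise ValueError("no module/address line found")
    return m.group(1), int(m.group(2), 16) - int(m.group(3), 16)


if __name__ == "__main__":
    code, params = parse_stop(DUMP)
    print(f"STOP 0x{code:X} ({BUGCHECK_NAMES.get(code, 'unknown')})")
    # Parameter 3 is shown sign-extended to 64 bits; keep the low 32 bits.
    print("parameter 3:", hex(params[2] & 0xFFFFFFFF))
    module, off = fault_offset(DUMP)
    print(f"{module}+0x{off:X}")
```

Run as-is, it prints `STOP 0x116 (VIDEO_TDR_FAILURE)`, the third parameter, and `dxgkrnl.sys+0x5BAD4`, a module+offset form that is handy when comparing crashes across several dumps.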
Mike Davis Joined: 17 May 99 Posts: 240 Credit: 5,402,361 RAC: 0 |
Out of interest, is that CPU stock-clocked, or have you got it overclocked? I know it's below max, so not terrible temperature-wise, but I had a Nero (probably the 1st revision; it was a while back) on a 965, and that only used to hit 70C max. |
Oddbjornik Joined: 15 May 99 Posts: 220 Credit: 349,610,548 RAC: 1,728 |
There's no overclocking. The cooler fan runs slow and quiet, so I guess I could lower the temp by cranking up the fan speed. But I like it quiet, and 80C seems to work OK. |
Tom* Joined: 12 Aug 11 Posts: 127 Credit: 20,769,223 RAC: 9 |
Some 560 Tis came slightly overclocked from the factory. Even though they have no time to heat up, try slowing your 560 down. Bill |
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.