Message boards :
Number crunching :
Lunatics_x41zc_win32_cuda42.exe BSOD on 560ti card
Joined: 24 Mar 07 Posts: 268 Credit: 34,410,870 RAC: 0

I do believe you are running hot. When the GPU is running hot, everything else heats up too. Your Intel CPU should handle it, but the board may not. Also, how much power can the PSU deliver? I had this problem quite a bit with old drivers and an underpowered PSU. If it BSODs within seconds, then power is probably the thing to really look at. Have you installed the latest drivers as well? Michael Miles
Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0

I didn't even know what that .sys file was, so I googled it and came across this thread on the Win7 forums: http://www.sevenforums.com/crashes-debugging/201437-bsod-windows-7-x64-nvlddmkm-sys-dxgkrnl-sys-dxgmms1-sys.html It's from DirectX (which is interrelated).

Along those lines I would suggest:
- Uninstall the driver in safe mode
- Driver Sweeper
- Hard disk integrity checks
- Clean install of the latest WHQL driver
- Update DirectX: http://www.microsoft.com/en-us/download/details.aspx?id=35

and see where we go from there.

Jason

I have run 2 passes of Memtest86+ without any errors. It ran for about two and a half hours.

"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
Oddbjornik Joined: 15 May 99 Posts: 220 Credit: 349,610,548 RAC: 1,728

Hmm... I did all that, and the CUDA 4.2 build still crashes. The CUDA 3.2 build actually also caused a crash in the middle of the night, but the computer restarted and then kept running as if nothing had happened. As far as I can see, the crash signature is the same:

CUDA 3.2 crash:
*** STOP: 0x00000116 (0xfffffa800a413010, 0xfffffa60034e8630, 0xffffffffc000009a, 0x0000000000000004)
*** dxgkrnl.sys - Address 0xfffffa600375ead4 base at 0xfffffa6003703000 DateStamp 0x4d384226

CUDA 4.2 crash:
*** STOP: 0x00000116 (0xfffffa80093ea4e0, 0xfffffa6003130adc, 0xffffffffc000009a, 0x0000000000000004)
*** dxgkrnl.sys - Address 0xfffffa6003305ad4 base at 0xfffffa60032aa000 DateStamp 0x4d384226

I would guess this means that the software does something my hardware can't quite handle, and that the 4.2 build does it to a much greater degree than the 3.2 version. That seems to be in line with your general explanation of the development from 3.2 to 4.2.

And now I've gone and done a fun thing: I've bought an ASUS GeForce GTX 680 DirectCU II 4 GB to replace the 560ti card. Unless you wish to do further research on the 560ti card for development reasons, I suggest I wait for the 680 card, which should be here in a couple of days, and then take it from there. CUDA 5.0 and all...
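The claim that "the crash signature is the same" can be checked mechanically. The minimal sketch below assumes the two STOP lines are copied verbatim from the post; the regex and the `parse_stop_line` helper are hypothetical illustrations, not part of any debugging tool. It subtracts the dxgkrnl.sys base address from the faulting address, and both crashes land at the same module offset, 0x5bad4, with identical bug check code, third and fourth parameters, and DateStamp. For reference, Microsoft documents bug check 0x116 as VIDEO_TDR_FAILURE, and the low 32 bits of the third parameter, 0xC000009A, correspond to the NTSTATUS value STATUS_INSUFFICIENT_RESOURCES.

```python
import re

# The two STOP lines as posted (assumed to be copied verbatim).
CRASHES = {
    "CUDA 3.2": ("*** STOP: 0x00000116 (0xfffffa800a413010, 0xfffffa60034e8630, "
                 "0xffffffffc000009a, 0x0000000000000004) *** dxgkrnl.sys - "
                 "Address 0xfffffa600375ead4 base at 0xfffffa6003703000 DateStamp 0x4d384226"),
    "CUDA 4.2": ("*** STOP: 0x00000116 (0xfffffa80093ea4e0, 0xfffffa6003130adc, "
                 "0xffffffffc000009a, 0x0000000000000004) *** dxgkrnl.sys - "
                 "Address 0xfffffa6003305ad4 base at 0xfffffa60032aa000 DateStamp 0x4d384226"),
}

# Hypothetical pattern: bug check code, four parameters, module, address, base, datestamp.
STOP_RE = re.compile(
    r"STOP: (0x[0-9a-f]+) \((0x[0-9a-f]+), (0x[0-9a-f]+), (0x[0-9a-f]+), (0x[0-9a-f]+)\)"
    r".*?\*\*\* (\S+) - Address (0x[0-9a-f]+) base at (0x[0-9a-f]+) DateStamp (0x[0-9a-f]+)",
    re.IGNORECASE,
)


def parse_stop_line(line):
    """Split a STOP line into its fields and compute the offset inside the module."""
    m = STOP_RE.search(line)
    if m is None:
        raise ValueError("not a recognised STOP line")
    code, p1, p2, p3, p4, module, addr, base, stamp = m.groups()
    return {
        "bugcheck": int(code, 16),
        "params": [int(p, 16) for p in (p1, p2, p3, p4)],
        "module": module,
        # Absolute addresses differ between boots; the offset from the module base does not.
        "offset": int(addr, 16) - int(base, 16),
        "datestamp": int(stamp, 16),
    }


parsed = {name: parse_stop_line(line) for name, line in CRASHES.items()}

for name, info in parsed.items():
    status = info["params"][2] & 0xFFFFFFFF  # NTSTATUS, sign-extended to 64 bits in the dump
    print(f"{name}: bugcheck {info['bugcheck']:#x}, status {status:#x}, "
          f"{info['module']}+{info['offset']:#x}")

a, b = parsed["CUDA 3.2"], parsed["CUDA 4.2"]
# The first two parameters are pointers that change from boot to boot, so only the
# stable fields are compared.
same = (all(a[k] == b[k] for k in ("bugcheck", "module", "offset", "datestamp"))
        and a["params"][2:] == b["params"][2:])
print("Same signature:", same)
```

A matching module offset only says that both builds trip the same code path in the DirectX kernel; as the thread goes on to discuss, it does not identify which piece of hardware is at fault.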
juan BFP Joined: 16 Mar 07 Posts: 9786 Credit: 572,710,851 RAC: 3,799

Did you try swapping the 560Ti with a GPU from one of your other hosts, to see whether the problem goes away or moves to the other host? That could be an interesting thing to do; at least it could clearly point to the source of the problem. Jason, do you agree with that approach?
Oddbjornik Joined: 15 May 99 Posts: 220 Credit: 349,610,548 RAC: 1,728

I don't have any other hosts with room for this card, and I don't have any other cards capable of CUDA 4.2, so unfortunately there is nothing that can be swapped anywhere.
juan BFP Joined: 16 Mar 07 Posts: 9786 Credit: 572,710,851 RAC: 3,799

> I don't have any other hosts with room for this card, and I don't have any other cards capable of CUDA 4.2, so unfortunately there is nothing that can be swapped anywhere.

Your computer list shows you have one host with an NVIDIA GeForce GTS 240 (970MB), driver 275.33. My idea is to swap that card with the 560Ti, just for test purposes (you'd need to check whether the driver version you actually use supports the 560Ti), and see whether the problem goes away on the first host and moves to the second one. That could be done with the CUDA 3.2 build, though I'm not sure it runs on the GTS 240, and you said CUDA 3.2 already gave you a BSOD. The old trial-and-error test cycle.

On the other hand, if you have already bought a new 680, why not wait for it to arrive? That makes sense too.
Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0

> Did you try swapping the 560Ti with a GPU from one of your other hosts, to see whether the problem goes away or moves to the other host? That could be an interesting thing to do; at least it could clearly point to the source of the problem.

Definitely worth a shot if/when another card becomes available. Mind you, this doesn't completely clarify the source of the issue whether it stays with the machine or follows the card (different-generation card, different power requirements, etc.), so it's difficult to write off a particular piece of hardware completely. (A 240 or 680 would use less power, and use different driver code/hardware, etc.)

Since we're down into DirectX/kernel territory for those failures, we're near hardware level (well away from the app and even the CUDA runtimes). There are not many options between 'working' and 'something broke', so hardware swapping becomes the name of the game for troubleshooting and isolation. (I personally don't enjoy that game ;) I'd rather something in the app were broken, as identified by the faulting module and address of the BSOD, since for me that'd be a quick fix :D )

The list of 'weird things that can magically clear weird issues' grows:
- Carefully reseating everything (card, RAM, PSU connectors)
- Try a different PSU
- Try a different PCIe slot
- Check the card in a kind friend's machine
- BIOS updates
- SSD firmware flash updates
- Chipset overvolt or undervolt
- Underclocking various items, such as the card, PCIe, and memory (less feasible on a machine like this)

Then there's the 'Brad Approach', which I'm told involves a collection of assorted firearms, a rifle range, and ordering new computer parts. I think the idea there is to have fun with the 'trouble-shooting' process.

Jason
Hanford WA4LZC Joined: 15 May 99 Posts: 38 Credit: 10,129,207 RAC: 0

> Then there's the 'Brad Approach', which I'm told involves a collection of assorted firearms, a rifle range, and ordering new computer parts. I think the idea there is to have fun with the troubleshooting process.

I used that approach on a Carter carburetor that refused to tune... and it was fun. ;-)
Highlander Joined: 5 Oct 99 Posts: 167 Credit: 37,987,668 RAC: 16

I have read on a German forum that the STOP 0x116 failure can, in rare cases, occur on some boards when HPET (High Precision Event Timer) is enabled in the BIOS. The solution there was to disable it. Perhaps you could try this?

- Performance is not a simple linear function of the number of CPUs you throw at the problem. -
Oddbjornik Joined: 15 May 99 Posts: 220 Credit: 349,610,548 RAC: 1,728

I'm happy to let you know that my new GTX 680 card now runs three parallel CUDA 5.0 tasks on the old Dell without complaint.

The ASUS card, i.e. the GTX 680, runs much quieter than my old EVGA 560ti card, frankly because it has inadequate cooling. It quickly reaches its Tmax of 98C and then downclocks itself to stay at 98C. That is not quite what I expected, but the net effect looks to be approximately the same amount of crunching for less electricity and less noise. I'm not totally happy with the 98C, though.

So the 560ti card apparently had issues that the new card doesn't have. And that, I suppose, is as far as we get on this thread. Thank you for all your help and suggestions.
Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0

Aha! Nice one :D and a bit more experience under our belts :)

Yeah, 98C for the 680 is certainly far warmer than its 'happy zone', though the dynamic clocks should help keep it from entering faulty operation outright. I'd definitely look at ways to get that down, namely getting more air in there however you can and kicking up the fan. For comparison, my own 680, when clean, runs at 62-65C with stock clocks at the moment (work shortages negated the value of my regular OC), with the fan typically on auto at around 65-70%.

If airflow is limited, the 560ti would have been quite sensitive to that and less tolerant of it, so chances are it may not be defective or broken, just not as sophisticated at dealing with borderline operation.

Jason
juan BFP Joined: 16 Mar 07 Posts: 9786 Credit: 572,710,851 RAC: 3,799

You could use a program like EVGA Precision to keep the fan running fast so your GPU runs cooler. I don't have the 680, but my 670 runs in the low 70s C and the 690 runs at around 75C.

You could try the 560Ti in another host and see what happens.
Joined: 14 May 99 Posts: 4438 Credit: 55,006,323 RAC: 0

> You could use a program like EVGA Precision to keep the fan running fast so your GPU runs cooler. I don't have the 680, but my 670 runs in the low 70s C and the 690 runs at around 75C.

My 670 runs at around 62C, and I have it overclocked at the moment. The 650Ti runs at around 65C and the 660 at 63C. I keep all of them at around 70% fan speed, and I crank up the voltage to the maximum allowed.
juan BFP Joined: 16 Mar 07 Posts: 9786 Credit: 572,710,851 RAC: 3,799

I have no AC and it's the middle of a tropical summer, so my temps are higher than normal, but running a 670 above 90C is certainly a bad idea.
Oddbjornik Joined: 15 May 99 Posts: 220 Credit: 349,610,548 RAC: 1,728

I drilled two 80mm holes in the bottom of the cabinet, roughly lined up with the fans on the GPU card. I also raised the cabinet about 20mm off the table to allow easy airflow. After these minor changes, the GPU temperature fell to a stable 71-73C, and the card now runs at full speed constantly (no more underclocking). I'm actually a little surprised that the effect was so big.
juan BFP Joined: 16 Mar 07 Posts: 9786 Credit: 572,710,851 RAC: 3,799

Don't be so surprised: heat is the enemy of the GPU. Most of us who don't use water cooling use a lot of fans to keep our cards running cooler. This is especially true since x41zc, whose optimizations give a great improvement in speed but also generate more heat. The gain in performance is certainly worth the additional heat, though.

Now you could try running different numbers of WUs at a time to find the best performance.
©2025 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.