Some puzzle...
Raistmer Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121
About my unstable NV host again... Now it has entered a period of increased instability again: blue screens or reboots almost immediately after login, or even before login (BOINC runs as a service). This happens under both installed OSes, Win2003 Server x64 and Win7 x64. I booted Win7 into safe mode and moved BOINC's data folder, which let me reboot into Win2003 Server without a BSoD. Then I tested the GPU with MSI Afterburner. The burn-in test (similar to FurMark) ran for ~10 minutes: GPU temperature rose above 70°C, GPU load was 98% or more, one CPU core was fully busy... and there were no BSoDs or restarts. But when I restored the BOINC setup (configured to run 1 CPU core + GPU), a BSoD happened almost immediately. So the puzzle is: how does the system load from FurMark/MSI Afterburner differ so radically from the BOINC load? IMHO the power draw from the PSU should be even higher with the burn-in test... Unfortunately, I can't measure power directly, but the GPU temperature was lower with the CUDA app. SETI apps news We're not gonna fight them. We're gonna transcend them.
jason_gee Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0
Assuming the card (or anything else) isn't broken: if you have applied Windows updates since June/July this year, then you have a fairly major technology mismatch (as far as CUDA is concerned) between Windows and an old driver. There are substantial changes to texture/font cache management, most of which would be resolved by using the newest WHQL driver [clean install advanced option] and the x41zc public beta application. These synchronisation issues aren't 'correctable' with the old setup, as they are deemed critical security issues (hence the BSOD), and they are a function of the evolving landscape of GPGPU technology. Happy new year, Jason "Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
kittyman Joined: 9 Jul 00 Posts: 51468 Credit: 1,018,363,574 RAC: 1,004
The latest 310.70 NV driver has been getting good comments. You might try a clean install of that on the Win 7 OS, as Jason suggested. I updated my Win 7 rig (my daily driver) to it a few days ago, and it seems to be working very well. I am still running the x41z app, but will be updating that soon as well. "Freedom is just Chaos, with better lighting." Alan Dean Foster
Raistmer Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121
In general, auto-update is disabled there on both OSes, but I can't be sure when it was last updated manually. Because Win7 x64 isn't the "production" OS there, I can experiment with it a little. Until now it looked like a purely hardware issue (leaving the CPU completely idle usually decreased the frequency of BSoDs). But since it handles burn-in GPU tests quite OK...
Claggy Joined: 5 Jul 99 Posts: 4654 Credit: 47,537,079 RAC: 4
Quote: "The latest 310.70 NV driver has been getting good comments."
Running legacy GPUs on the 304.xx and later drivers introduces quite a slowdown in the x41 Cuda32 and Cuda42 apps, while the Cuda5 app is even slower; the Cuda22 and Cuda23 apps don't seem to be affected (at least on my 9800GTX+ Win Vista x64 host; I haven't managed to get anyone with a GTX2** GPU to do similar benches yet). That's why my 9800GTX+ runs the 301.42 Cuda42 drivers. My last posted bench: Message 1284483 Claggy
Raistmer Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121
Since I'm going to update the driver for Win7, I can do tests on a GTX260.
TRuEQ & TuVaLu Joined: 4 Oct 99 Posts: 505 Credit: 69,523,653 RAC: 10
Does the GTX260 computer work with the stock app without problems? You might try driver 306.97.
Raistmer Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121
Completely disabling CPU crunching again makes the host much more stable. It looks like a hardware problem after all. I need some CPU burn-in tests to check.
kittyman Joined: 9 Jul 00 Posts: 51468 Credit: 1,018,363,574 RAC: 1,004
Quote: "The latest 310.70 NV driver has been getting good comments."
I believe Jason recommends the Cuda 2.3 app for 200-series cards.
zoom3+1=4 Joined: 30 Nov 03 Posts: 65745 Credit: 55,293,173 RAC: 49
The CPU as an I/O device for the GPU, sigh. I'd tried 310.70, the beta version, and got a BSOD; that might have been a problem with my hardware before I thoroughly cleaned out the PC. I'm running 306.97 x64 on Win 7 Pro x64 with x41zc, and I see regular slowdowns from around 10 minutes to about 18-20 minutes per WU crunched; temps fall from the low-to-mid 70s to the mid 60s when this happens. I'm also using BOINC 6.10.58 x64 and BoincTasks 1.44 x64. I don't know if the author sees this as important or not, but it should be looked into. All this happens on an EVGA GTX590 Classified (model #1598, in fact). I run from 7pm to 7am in the winter and 8pm to 8am the rest of the time, with the fan at 100% (not a mere 95%) using Precision X 3.04. I do have clean 12V power going to the PCIe bus, as I have an EVGA Power Booster x1 PCIe card in place, so I have plenty of power going to the GTX590. I get driver crashes once a day when doing SETI, but the card just picks up and keeps going. This isn't a complaint... The T1 Trust, PRR T1 Class 4-4-4-4 #5550, 1 of America's First HST's
Raistmer Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121
Quote: "I get driver crashes once a day when doing Seti, but the card just picks up and just keeps going."
Did you try increasing the watchdog timer value via the Windows registry? A driver restart can be caused simply by a too-lengthy kernel call (or sequence of kernel calls). If so, increasing the timer value will either solve the problem or make driver restarts less frequent.
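A rough way to picture this is to treat each GPU kernel launch as one block of time and count how many outlast the watchdog timeout (Microsoft documents a default TdrDelay of 2 seconds on Vista and later). The sketch below is a simplified model, not how the driver actually detects hangs: it assumes a reset happens whenever a single launch runs longer than the timeout, it follows Jason's later note that TDR only applies to a GPU with a display attached, and the durations and helper names (`tdr_trips`, `max_periods_per_launch`) are hypothetical.

```python
import math


def tdr_trips(launch_durations_s, timeout_s, has_display=True):
    """Count launches that would trip the watchdog in this simplified model.

    Assumption: a reset happens whenever a single launch runs longer than
    timeout_s, and only on a GPU that drives a display.
    """
    if not has_display:
        return 0
    return sum(1 for d in launch_durations_s if d > timeout_s)


def max_periods_per_launch(seconds_per_period, budget_s):
    """Largest whole number of work periods that fits into one launch budget.

    Always returns at least 1, since a launch cannot contain zero work.
    """
    return max(1, math.floor(budget_s / seconds_per_period))


if __name__ == "__main__":
    # Hypothetical per-launch durations in seconds, not measurements.
    durations = [0.4, 1.1, 2.5, 0.9, 3.2]
    print(tdr_trips(durations, timeout_s=2.0))   # 2
    print(tdr_trips(durations, timeout_s=10.0))  # 0
    print(max_periods_per_launch(0.125, budget_s=2.5))  # 20
```

Raising the timeout lowers the count, which matches the "less frequent restarts" outcome described above; the second helper shows the application-side alternative of packing less work into each launch so every launch stays short.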
zoom3+1=4 Joined: 30 Nov 03 Posts: 65745 Credit: 55,293,173 RAC: 49
Quote: "I get driver crashes once a day when doing Seti, but the card just picks up and just keeps going."
If this is the DCI value of 7, then what do you suggest, Raistmer? Would 60 be alright? HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\GraphicsDrivers\DCI\ This is the only 'timeout' that I see in this area.
jason_gee Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0
Two things apply there:

1) TDR only applies to GPUs with an active display connected. So if the issue is TDR-related at all, it should only show on particular GPUs with a monitor connected. x41zc has built-in settings for controlling process priority and the pf* parameters, either globally or for individual GPUs. To use them, you create an mbcuda.cfg text file in the project directory and reference it in app_info.xml, as per the provided example mbcuda.cfg. As the default settings are conservative (for pre-Fermi: belownormal, pfblockspersm=1, pf=100), I doubt this is an issue unless there is something particularly unusual about the system, but a stripped-down example that reduces the settings while retaining abovenormal process priority would look like this for global control:

[mbcuda]
processpriority = abovenormal
pfblockspersm = 1
pfperiodsperlaunch = 20

or like this for a specific GPU (Cuda 3.2 build or higher required), with slot and bus determined from the stderr device listing:

[mbcuda]
processpriority = abovenormal
pfblockspersm = 1
pfperiodsperlaunch = 100

[bus1slot0]
processpriority = abovenormal
pfblockspersm = 1
pfperiodsperlaunch = 10

2) The TDR period is pretty long by default on XP, like 10 seconds or something. As this hasn't been reported by others to manifest with default settings on newer OSes, which have a much shorter TDR timeout, I would recommend investigating all hardware and BIOS settings in detail, and applying the reduced settings from 1) while diagnosing.

HTH Jason
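The per-GPU example mixes a global [mbcuda] section with a [bus1slot0] section. One plausible reading, assumed here rather than taken from the x41zc source, is that keys in the per-GPU section override the global ones for that GPU only. This sketch models that assumption with Python's standard-library configparser; `settings_for_gpu` is a hypothetical helper, not part of x41zc.

```python
import configparser

# The per-GPU example from the post above, as mbcuda.cfg text.
SAMPLE_CFG = """
[mbcuda]
processpriority = abovenormal
pfblockspersm = 1
pfperiodsperlaunch = 100

[bus1slot0]
processpriority = abovenormal
pfblockspersm = 1
pfperiodsperlaunch = 10
"""


def settings_for_gpu(cfg_text, bus, slot):
    """Merge global [mbcuda] keys with an optional [bus<N>slot<M>] override.

    Assumption: per-GPU keys take precedence over global ones.
    Values are returned as strings, exactly as written in the file.
    """
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)
    merged = dict(parser["mbcuda"]) if parser.has_section("mbcuda") else {}
    gpu_section = f"bus{bus}slot{slot}"
    if parser.has_section(gpu_section):
        merged.update(parser[gpu_section])
    return merged


if __name__ == "__main__":
    print(settings_for_gpu(SAMPLE_CFG, bus=1, slot=0)["pfperiodsperlaunch"])  # 10
    print(settings_for_gpu(SAMPLE_CFG, bus=2, slot=0)["pfperiodsperlaunch"])  # 100
```

Under this reading, the GPU on bus 1, slot 0 gets the shorter launches (10 periods) while every other GPU keeps the global value.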
zoom3+1=4 Joined: 30 Nov 03 Posts: 65745 Credit: 55,293,173 RAC: 49
In Windows 7 x64 the Timeout is set at 7; I set it to 60. I also did the following:
"What I did was add the following DWORDs to the registry (using "regedit"):
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\GraphicsDrivers\ [added "TdrLevel=0" and "TdrDelay=10"]
&&
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\GraphicsDrivers\Timeout [changed "Timeout" value to 0x60]"
I put both newly created 64-bit DWORDs in the same folder as the DCI Timeout, and afterwards I rebooted the PC into Windows 7 Pro x64. On a 32-bit Windows OS, one would use 32-bit DWORDs by default. Whether this is the right place or not, I don't know.
http://stackoverflow.com/questions/10272513/cuda-nvidia-driver-crash-while-running
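For anyone who prefers a .reg file to hand-editing in regedit, the sketch below builds one for the two TDR values quoted above. It is an illustration under stated assumptions, not a recommendation: `tdr_reg_text` is a hypothetical helper, it writes REG_DWORD values (the type Microsoft's TDR registry documentation lists) under the GraphicsDrivers key itself rather than the DCI subkey, and TdrLevel 0 turns detection off entirely (3, "recover", is the documented default). Note that `dword:` values in .reg files are hexadecimal, so 10 seconds is written `0000000a`, and a value entered as 0x60 is 96 in decimal.

```python
# Hypothetical helper: builds the text of a .reg file that sets TdrDelay and
# TdrLevel under the GraphicsDrivers key named in Microsoft's TDR documentation.
GRAPHICS_DRIVERS_KEY = r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\GraphicsDrivers"


def tdr_reg_text(tdr_delay_s, tdr_level):
    """Return .reg file text; dword values in .reg files are written in hex."""
    if not 0 <= tdr_level <= 3:
        raise ValueError("TdrLevel is documented as 0 (off) through 3 (recover)")
    if tdr_delay_s < 0:
        raise ValueError("TdrDelay must be non-negative")
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        f"[{GRAPHICS_DRIVERS_KEY}]",
        f'"TdrDelay"=dword:{tdr_delay_s:08x}',
        f'"TdrLevel"=dword:{tdr_level:08x}',
        "",
    ]
    # regedit-exported files use CRLF line endings.
    return "\r\n".join(lines)


if __name__ == "__main__":
    # Values from the quoted post: TdrDelay=10 seconds, TdrLevel=0.
    print(tdr_reg_text(10, 0))
```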
jason_gee Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0
Quote: "Completely disabling CPU again makes host much more stable. Looks like it's hardware problem after all. Need some burn-in CPU tests to check."
As a side note on the original thread issue: I had a stark reminder today on my i5 with a GTX560 Ti, in the form of a BSOD, that it needed a cleanout and a fresh application of heatsink goo. As it uses the stock heatsink, which IMO is too small, any kind of paste tends to dry out over a few months, so it needs a good going-over. Combined with several months of dust bunnies, that was enough to cause its only issues.
kittyman Joined: 9 Jul 00 Posts: 51468 Credit: 1,018,363,574 RAC: 1,004
Quote: "Completely disabling CPU again makes host much more stable. Looks like it's hardware problem after all. Need some burn-in CPU tests to check."
It's winter here now, and the crunchers heat my house... But during the summer months, any time I get a rig that starts to act up in any way, the first thing I do is shut it down and clean the kitty fur out of the heat sinks. Many times, that is all that is wrong.
Raistmer Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121
Quote: "In Windows 7 x64 the Timeout is set at 7, I set it at 60, I also did the following:"
Here is what AMD recommends for disabling the watchdog timer under Vista: "Under Windows Vista, to prevent long programs from causing a dialog to be displayed"
But, as Jason stated, try to tune the app first. This measure is just to check whether a too-long kernel call is behind the problem or not.
zoom3+1=4 Joined: 30 Nov 03 Posts: 65745 Credit: 55,293,173 RAC: 49
Quote: "In Windows 7 x64 the Timeout is set at 7, I set it at 60, I also did the following:"
I went with what I found, and I haven't had a single video driver crash since, so I won't disable anything else; I'm happy with where the PC is set now.
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.