| Author |
Message |
|
|
|
I'm running an EVGA NVIDIA GTS 250 (512 MB, driver 191.07).
Sometimes, if I launch a game on my crunch box while SETI is running, the graphics card blanks-n-tanks, I get a message that the driver has recovered, and the current CUDA WU terminates with a "Computational Error."
All that is fine. What isn't fine is from then on out, at least until I reboot, CUDA jobs that used to run in 20 minutes will now run slow-mo taking over 2 hours.
Has anyone seen this, and do you know of any way other than rebooting the box to get the NVIDIA card to re-initialize properly? A full exit of BOINC and its services does not resolve it; this is buried someplace. I've tried the obvious things inside the NVIDIA applet (and EVGA Precision) and in Windows proper.
BOINC messages:
10/21/2009 9:08:53 PM NVIDIA GPU has become unusable; disabling tasks
10/21/2009 9:08:55 PM NVIDIA GPU has become usable; enabling tasks
Windows Event Viewer:
Display driver nvlddmkm stopped responding and has successfully recovered.
Thanks in advance & cheers.
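The two BOINC messages quoted in this post can be paired up by timestamp to see how long the GPU was reported as unusable. A minimal Python sketch, assuming the date layout shown in those two lines (other installs may format dates differently); `gpu_outages` is a hypothetical helper name, not part of BOINC:

```python
from datetime import datetime

# Timestamp layout as it appears in the quoted messages (an assumption;
# it depends on the Windows locale of the machine running BOINC).
STAMP = "%m/%d/%Y %I:%M:%S %p"

def gpu_outages(lines):
    """Return (lost_at, restored_at) datetime pairs from BOINC message lines."""
    outages = []
    lost_at = None
    for line in lines:
        parts = line.split(" ", 3)  # date, time, AM/PM, message text
        if len(parts) < 4:
            continue
        try:
            when = datetime.strptime(" ".join(parts[:3]), STAMP)
        except ValueError:
            continue  # not a timestamped message line
        text = parts[3]
        if "GPU has become unusable" in text:
            lost_at = when
        elif "GPU has become usable" in text and lost_at is not None:
            outages.append((lost_at, when))
            lost_at = None
    return outages

log = [
    "10/21/2009 9:08:53 PM NVIDIA GPU has become unusable; disabling tasks",
    "10/21/2009 9:08:55 PM NVIDIA GPU has become usable; enabling tasks",
]
for lost, restored in gpu_outages(log):
    print(lost, "->", restored, (restored - lost).total_seconds(), "s")
```

For the two lines above, the gap between "unusable" and "usable" is two seconds, so BOINC sees the card again almost immediately even though later tasks run slowly.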
|
|
|
|
|
What isn't fine is from then on out, at least until I reboot, CUDA jobs that used to run in 20 minutes will now run slow-mo taking over 2 hours.
That's because all tasks are running in CPU-fallback mode until you reboot and thus reinitialise your graphics device.
As far as I know, there's no other way than rebooting.
Regards,
Gundolf
____________
Computers aren't everything in life. (Just kidding)
SETI@home classic workunits 3,758
SETI@home classic CPU time 66,520 hours |
|
|
|
|
|
A full power cycle of the hardware, i.e. a reboot or power down/power up, is the only way to reinitialize stuck hardware.
____________
Jord
- BOINC FAQ Service
- BOINC User Wiki
Real is just a matter of perception. |
|
|
Claggy Volunteer tester
Joined: 5 Jul 99 Posts: 3363 Credit: 25,949,683 RAC: 1,140

|
|
You didn't say which BOINC version you're running, but I suspect it's 6.10.14,
as I've had 'NVIDIA GPU has become unusable; disabling tasks' as well;
mine was on Collatz Conjecture with a bit of IE browsing and downloading.
It went away when I upgraded to 6.10.15. The relevant change is:
Rom 19 October 2009
- client: Use is_remote_desktop() instead of the various GPU functions to determine when the client software has been switched into Remote Desktop mode and shuts down GPU apps. This will prevent app crashes.
Claggy |
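The changelog entry above names is_remote_desktop(), but BOINC's own implementation is not shown in this thread. As a rough illustration of the kind of check involved, Windows reports a remote session through GetSystemMetrics with the SM_REMOTESESSION index. A minimal Python sketch under that assumption; `is_remote_session` is a hypothetical name, and this is not BOINC's code:

```python
import sys

SM_REMOTESESSION = 0x1000  # index accepted by the Win32 GetSystemMetrics call

def is_remote_session():
    """True when the current Windows session is a Remote Desktop session."""
    if sys.platform != "win32":
        return False  # assumption: treat non-Windows hosts as local sessions
    import ctypes
    return ctypes.windll.user32.GetSystemMetrics(SM_REMOTESESSION) != 0

print("remote session:", is_remote_session())
```

A client could poll a check like this and suspend GPU work while it returns true, which matches the behaviour the changelog describes.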
|
|
|
|
|
Thanks all & Claggy.
Yes, I was on 6.10.14 and just updated to 6.10.15! Thanks, good find.
One other item I stumbled on last night while trying to solve this: I'm using a Gigabyte (P55-UD2) motherboard, and it comes with its own clocking utility (EasyTune6) to ease configuration of the Award BIOS settings. Although not all of its settings are part of the BIOS, it has a tab called "Graphics," and sure enough the values it showed were the slow ones that only appear after a GPU crash and recovery. Although I could not set the values to the OC levels, I *COULD* raise them to the original out-of-the-box levels. So after a crash, GPU WUs take 22 minutes instead of 20 (which is much better than 2 hours).
Typically, those non-BIOS settings don't survive a reboot (but do, of course, survive a GPU crash and driver recovery). I might be able to auto-load a profile file; I'll worry about that later. I thought I'd share this now, since these settings really don't belong where they were, and somehow inserted themselves AHEAD of everything NVIDIA ships (albeit only in the scenario of a GPU crash).
|
|
|
|
|
What isn't fine is from then on out, at least until I reboot, CUDA jobs that used to run in 20 minutes will now run slow-mo taking over 2 hours.
It's because the card is running in 2D mode.
I always get this error with this setup:
GPU0: GTX295
-> SLI
GPU2: GTX295
-> PhysX
GPU1: GTX260
But with this setup there is no more crashing:
GPU0: GTX295
-> PhysX
GPU2: GTX295
-> Extend monitor
GPU1: GTX260
____________
|
|
|
|
|
You didn't say which BOINC version you're running, but I suspect it's 6.10.14,
as I've had 'NVIDIA GPU has become unusable; disabling tasks' as well;
mine was on Collatz Conjecture with a bit of IE browsing and downloading.
It went away when I upgraded to 6.10.15. The relevant change is:
Rom 19 October 2009
- client: Use is_remote_desktop() instead of the various GPU functions to determine when the client software has been switched into Remote Desktop mode and shuts down GPU apps. This will prevent app crashes.
Claggy
Looks like I'll have to upgrade. I had the exact same crash (trashed 3 GPUGrid units) yesterday while gaming. Currently using 6.6.36
____________
|
|
|
|
|
Looks like I'll have to upgrade. I had the exact same crash (trashed 3 GPUGrid units) yesterday while gaming. Currently using 6.6.36
Lucky you, looks like 6.10.16 just got released. |
|
|
|
|
|
I've upgraded. I was gaming with BOINC completely shut down. Still had the video driver crash.
Display driver nvlddmkm stopped responding and has successfully recovered. (Event ID 4101)
This has happened with both the current drivers and the previous release. I never suffered a video crash with SETI CUDA. The problems started a few days into GPUGrid (I was gaming and crunching GPUGrid at the same time), so I'm wondering if a file somewhere has been corrupted.
____________
|
|
|
|
|
|
It's happened to me before: the GPU clocks drop from 550/1375/900 MHz (core/shaders/memory) to 383/767/301 MHz and stay there, which is why WUs take considerably longer. Unfortunately, the only solution I know of is to restart. The drivers do this when the card gets too hot and/or you overclock too far.
You can use GPU-Z to check whether the clocks go down (under the Sensors tab).
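To put numbers on that drop, the clocks read from GPU-Z can be compared against the normal ones by hand or with a few lines of code. A minimal Python sketch using the figures from this post; `clock_ratios`, `looks_downclocked`, and the 0.9 threshold are hypothetical choices for illustration, not anything GPU-Z or the driver provides:

```python
DOMAINS = ("core", "shader", "memory")

def clock_ratios(normal, observed):
    """Return observed/normal per clock domain for (core, shader, memory) in MHz."""
    return {name: obs / base for name, base, obs in zip(DOMAINS, normal, observed)}

def looks_downclocked(normal, observed, threshold=0.9):
    # threshold is an arbitrary cut-off chosen for this sketch
    return any(r < threshold for r in clock_ratios(normal, observed).values())

normal = (550, 1375, 900)      # clocks before the driver recovery, from the post
after_crash = (383, 767, 301)  # clocks after the driver recovery, from the post

ratios = clock_ratios(normal, after_crash)
for name in DOMAINS:
    print(f"{name}: {ratios[name]:.0%} of normal")
print("downclocked:", looks_downclocked(normal, after_crash))
```

With these figures the memory clock falls furthest, to roughly a third of its normal value, while the core stays at about 70%.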
____________
|
|
|