Driver Crash/Comp Error

Questions and Answers : GPU applications : Driver Crash/Comp Error

Carl Johnson[SETI.USA]
Volunteer tester
Joined: 18 Feb 05
Posts: 33
Credit: 5,269,022
RAC: 0
United States
Message 849852 - Posted: 5 Jan 2009, 22:19:48 UTC

This is a new eVGA mobo and 9800GT, new as in yesterday. As soon as I start seti it crashes the display driver and the task gets labeled as computation error. What could the source of this be?

I have the newest drivers since cuda is working. The mobo and gpu are new, the rest of the system is a carry-over. I tried each dimm separately and they both gave this same error.

BOINC version 6.4.5
nVidia driver version 7.15.11.7813 (180.48)
Vista 64 home premium

Maik
Joined: 15 May 99
Posts: 163
Credit: 9,208,555
RAC: 0
Germany
Message 849891 - Posted: 5 Jan 2009, 23:42:29 UTC

Carl Johnson[SETI.USA]
Volunteer tester
Joined: 18 Feb 05
Posts: 33
Credit: 5,269,022
RAC: 0
United States
Message 849905 - Posted: 6 Jan 2009, 0:07:18 UTC - in response to Message 849891.  
Last modified: 6 Jan 2009, 0:13:53 UTC

Read the snow one. If I get snow and reboot and then go into the applying updates screen, the snow goes away, thus this is a windows issue.

I have Vista 64, so the XP crashes are not the same problem. I read this one as well, and none of these say that SETI runs for 14-16 seconds, crashes, recovers, and repeats until all WUs are compiled.

I'm not getting CUDA errors; I'm getting driver crashes and WU errors, although this only happens with SETI. I don't know what that spike and triple talk is about. I tried to roll back the driver, but now BOINC cannot find a CUDA device.

Carl Johnson[SETI.USA]
Volunteer tester
Joined: 18 Feb 05
Posts: 33
Credit: 5,269,022
RAC: 0
United States
Message 849909 - Posted: 6 Jan 2009, 0:12:46 UTC

OK. Here is one of the wu's
http://setiathome.berkeley.edu/result.php?resultid=1113296562

What does this mean in line 1166, unknown error?

Byron S Goodgame
Volunteer tester
Joined: 16 Jan 06
Posts: 1145
Credit: 3,936,993
RAC: 0
United States
Message 849916 - Posted: 6 Jan 2009, 0:35:11 UTC - in response to Message 849909.  
Last modified: 6 Jan 2009, 0:38:00 UTC

It's a common error when the CUDA app tries to process a task with a Very Low Angle Range, or VLAR. It's suggested that those tasks be aborted, since it's not likely you'll be able to complete them. It's a known bug in the CUDA app that they're working on.

There's a batch file you can create to spot them before they run. There's info regarding it in this message

Carl Johnson[SETI.USA]
Volunteer tester
Joined: 18 Feb 05
Posts: 33
Credit: 5,269,022
RAC: 0
United States
Message 849946 - Posted: 6 Jan 2009, 1:57:28 UTC
Last modified: 6 Jan 2009, 2:06:40 UTC

I read about the low angle and wasn't sure if this was one of them. It mentions an angle, but I wouldn't know whether it was low or not. Is this an ongoing problem with the beam, or is this just a bad 'batch'?

Byron S Goodgame
Volunteer tester
Joined: 16 Jan 06
Posts: 1145
Credit: 3,936,993
RAC: 0
United States
Message 849960 - Posted: 6 Jan 2009, 2:45:07 UTC - in response to Message 849946.  
Last modified: 6 Jan 2009, 2:59:02 UTC

The angle ranges in question are 0.1 and lower. There isn't anything wrong with the angle range itself. It's the CUDA app's ability to process them that's the problem. If you were using the stock or opti CPU app, it would be able to complete them. Once they've worked out the bugs, you'll be able to do that angle range with CUDA as well. Aborting those tasks is probably your best option at the moment if you continue to use CUDA.

Edit: there's also an overflow -9 issue with CUDA and one method to keep those results from the server here

It's probably worth mentioning that if your plan is to set up the computer and then not monitor it, you probably won't want to run CUDA. In its present state it is not ready for that setup.
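
As a rough illustration of the kind of check such a script performs, here is a minimal Python sketch. It is not the batch file mentioned earlier in the thread. It assumes the workunit header stores the angle range in a `<true_angle_range>` element, and the names `true_angle_range`, `is_vlar`, and `VLAR_THRESHOLD` are made up for this example. Only the 0.1 cutoff comes from the post above.

```python
import re

# Cutoff taken from the thread: angle ranges of 0.1 and lower are the problem.
VLAR_THRESHOLD = 0.1

# Assumption: the workunit header carries the angle range in a
# <true_angle_range> element. Adjust the tag name if your files differ.
_ANGLE_RE = re.compile(
    r"<true_angle_range>\s*([0-9.eE+-]+)\s*</true_angle_range>"
)


def true_angle_range(header_text):
    """Return the angle range found in the header text, or None if absent."""
    match = _ANGLE_RE.search(header_text)
    return float(match.group(1)) if match else None


def is_vlar(header_text, threshold=VLAR_THRESHOLD):
    """True when the task's angle range is at or below the threshold."""
    angle = true_angle_range(header_text)
    return angle is not None and angle <= threshold


# Hypothetical header fragment, for illustration only.
sample = "<true_angle_range>0.0087</true_angle_range>"
print(is_vlar(sample))  # prints True
```

In practice you would read each workunit file in the project folder and pass its text to `is_vlar`; which files to read and where they live depends on your BOINC installation, so that part is left out here.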

Robert P. Herbst
Volunteer tester
Joined: 10 Jun 03
Posts: 45
Credit: 64,523,408
RAC: 142
United States
Message 850069 - Posted: 6 Jan 2009, 11:02:06 UTC - in response to Message 849905.  

Hi Carl;
I have 6 computers here, all running Windows Vista Ultimate 64 bit. I have a wide range of graphics cards and I have had this problem with all but the non-CUDA card, the 7600 series. There are constant errors that shut my computers down and daily snow storms on the computers, even with the 9600GT. The problems are not Windows related, because if you shut the SETI graphics down, the problems go away. You can let it run in the background, and the daily errors don't seem to happen.
There is something you need to watch for with these new CUDA cards and all the GeForce cards. Pull the card out of the machine and check the capacitors. Look carefully at the F277 1500 6.3V caps. You might also check the F777 470 1.6V.
These caps are scored with a "K" on their tops and they rupture through the score mark. Failure of the card seems to go in stages, probably as the caps rupture. First you get the constant error codes, then the error codes become snow storms, and shortly thereafter the card fails completely. I have one sitting on my desk right now, and the people at NVIDIA have failed to respond to my several requests for an RMA. Two of the three F277 caps and one of the three F777 caps have popped their tops. This seems to be a common problem with these cards, but more so with CUDA and SETI running together. Try turning the SETI graphics off and letting it run in the background. It may prolong the life of your graphics card.

BigWaveSurfer
Joined: 29 Nov 01
Posts: 186
Credit: 36,311,381
RAC: 141
United States
Message 850097 - Posted: 6 Jan 2009, 12:41:48 UTC - in response to Message 850069.  

That really made my morning! Arg! I will have to pull the side and look at the card later today. Well at least EVGA has a lifetime warranty, yeah!

Byron S Goodgame
Volunteer tester
Joined: 16 Jan 06
Posts: 1145
Credit: 3,936,993
RAC: 0
United States
Message 850554 - Posted: 7 Jan 2009, 20:32:24 UTC - in response to Message 850097.  
Last modified: 7 Jan 2009, 20:35:54 UTC

So, after checking the card for problems, what was the result? Did you experience the same problem with your EVGA card? I have a very low end EVGA card and haven't had a similar experience with card failure. With the new mod CUDA app, and a batch file I use to spot VLAR tasks, it all seems to run pretty well.

Should also mention I don't OC the card or control the fan speed. It's all set at default for the testing I've been doing.

BigWaveSurfer
Joined: 29 Nov 01
Posts: 186
Credit: 36,311,381
RAC: 141
United States
Message 850641 - Posted: 8 Jan 2009, 0:44:20 UTC - in response to Message 850554.  

Well, the card has a cover plate over it, so I cannot see any of the parts listed. I am not sure if I want to try to remove it. I did not look to see how simple it is, but I do NOT want to void the warranty.

Byron S Goodgame
Volunteer tester
Joined: 16 Jan 06
Posts: 1145
Credit: 3,936,993
RAC: 0
United States
Message 850645 - Posted: 8 Jan 2009, 0:53:10 UTC - in response to Message 850641.  
Last modified: 8 Jan 2009, 1:00:40 UTC

In your case I think it has more to do with the limitations of the CUDA app than with the card, judging from the Very Low Angle Range problems you've had. Everyone has those.

If the card starts failing or showing similar bad visual effects like snow or rebooting when doing things other than CUDA, even then I wouldn't go into the card unless you are confident you know what you're doing. My guess is though it would void the warranty as I would assume you are expected to RMA it, not take it apart.

Carl Johnson[SETI.USA]
Volunteer tester
Joined: 18 Feb 05
Posts: 33
Credit: 5,269,022
RAC: 0
United States
Message 850653 - Posted: 8 Jan 2009, 1:20:23 UTC

OK, well I got it stable and it seems to run for a while now.

This question has more to do with software. I run GPU-Z, and the clock idles at 300 and I got it clocked up to about 750 when it's needed. But why, when CUDA is running, does the clock ramp up and down at seemingly random intervals?

Jord
Volunteer tester
Joined: 9 Jun 99
Posts: 15184
Credit: 4,362,181
RAC: 3
Netherlands
Message 850657 - Posted: 8 Jan 2009, 1:39:02 UTC - in response to Message 850653.  

But why, when CUDA is running, does the clock ramp up and down at seemingly random intervals?

To make sure your GPU doesn't overheat. The VBIOS, working with the Nvidia drivers, will throttle the clock speed/voltage according to how hot the GPU gets. You'll probably see it go down while the GPU is under full load.
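
One way to test that explanation against logged readings is sketched below in Python. This is only an illustration: the sample readings are invented, `clock_drops` and `looks_thermal` are hypothetical helper names, and `hot_c` is a threshold you would have to choose for your own card. The 750 figure is the full-speed clock quoted above.

```python
def clock_drops(samples, full_mhz=750):
    """Return the temperature at each point where the clock falls from full speed.

    samples: list of (temperature_c, core_clock_mhz) readings in time order.
    """
    drops = []
    for (_, prev_clock), (temp, clock) in zip(samples, samples[1:]):
        if prev_clock >= full_mhz and clock < full_mhz:
            drops.append(temp)
    return drops


def looks_thermal(samples, hot_c, full_mhz=750):
    """True when there is at least one drop and every drop happened at or above hot_c."""
    drops = clock_drops(samples, full_mhz)
    return bool(drops) and all(temp >= hot_c for temp in drops)


# Invented readings, for illustration only: (temperature in C, core clock in MHz).
readings = [(55, 300), (62, 750), (64, 750), (63, 300), (66, 750)]
print(clock_drops(readings))              # prints [63]
print(looks_thermal(readings, hot_c=90))  # prints False
```

If the drops turn up at temperatures well below what the card reaches elsewhere, heat alone is a less convincing explanation and the clock changes may simply follow load.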

Carl Johnson[SETI.USA]
Volunteer tester
Joined: 18 Feb 05
Posts: 33
Credit: 5,269,022
RAC: 0
United States
Message 850668 - Posted: 8 Jan 2009, 1:58:39 UTC - in response to Message 850657.  

I couldn't be sure. I can run 3 tasks, two with my dual core and one on the cuda. I restarted the and now I'm only running two and the clock on the gpu hasn't dropped back to idle speed.

I'm not sure if it was temperature slowing the card. If it 'throttles', I would figure it would use more than an on/off approach and actually scale the clock speed down, but that might be asking for too much programming. This card runs at 80°C when I play COD4 and it doesn't kick out, so I couldn't imagine that CUDA could be any more demanding than a game like that.

©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.