Driver Crash/Comp Error
Author | Message |
---|---|
Carl Johnson[SETI.USA] Joined: 18 Feb 05 Posts: 33 Credit: 5,269,022 RAC: 0 |
This is a new eVGA mobo and 9800GT, new as of yesterday. As soon as I start SETI it crashes the display driver and the task gets labeled as a computation error. What could the source of this be? I have the newest drivers, since CUDA is working. The mobo and GPU are new; the rest of the system is a carry-over. I tried each DIMM separately and both gave the same error. BOINC version 6.4.5, nVidia driver version 7.15.11.7813 (180.48), Vista 64 Home Premium |
Maik Joined: 15 May 99 Posts: 163 Credit: 9,208,555 RAC: 0 |
|
Carl Johnson[SETI.USA] Joined: 18 Feb 05 Posts: 33 Credit: 5,269,022 RAC: 0 |
Read the snow one. If I get snow and reboot and then go into the applying updates screen, the snow goes away, thus this is a Windows issue. I have Vista 64, so XP crashes are not the same problem. Read this one as well, and none of these say that SETI runs for 14-16 seconds, crashes, recovers, and repeats until all WUs are compiled. I'm not getting CUDA errors, I'm getting driver crashes and WU errors, although this only happens with SETI. I don't know what that spike and triple talk is about. I tried to roll back the driver but now BOINC cannot find a CUDA device. |
Carl Johnson[SETI.USA] Joined: 18 Feb 05 Posts: 33 Credit: 5,269,022 RAC: 0 |
OK. Here is one of the WUs: http://setiathome.berkeley.edu/result.php?resultid=1113296562 What does this mean in line 1166, unknown error? |
Joined: 16 Jan 06 Posts: 1145 Credit: 3,936,993 RAC: 0 |
It's an error common when CUDA is trying to process a task with a Very Low Angle Range, or VLAR. It's suggested that those tasks be aborted, since it's not likely you'll be able to complete them. It's a known bug in the CUDA app that they're working on. There's a batch file you can create to spot them before they run. There's info regarding it in this message |
Carl Johnson[SETI.USA] Joined: 18 Feb 05 Posts: 33 Credit: 5,269,022 RAC: 0 |
I read about the low angle and wasn't sure if this was one of them. It mentions an angle but I wouldn't know if it was low or not. Is this an ongoing problem with the beam or is this just a bad 'batch?' |
Joined: 16 Jan 06 Posts: 1145 Credit: 3,936,993 RAC: 0 |
"I read about the low angle and wasn't sure if this was one of them. It mentions an angle but I wouldn't know it was low or not low. Is this an ongoing problem with the beam or is this just a bad 'batch?'" The angle ranges in question are 0.1 and lower. There isn't anything wrong with the angle range. It's the CUDA app's ability to process them that's the problem. If you were using the stock or optimized CPU app it would be able to complete them. Once they've worked out the bugs you'll be able to do that angle range with CUDA as well. Aborting those tasks is probably your best option at the moment if you continue to use CUDA. Edit: there's also an overflow -9 issue with CUDA and one method to keep those results from the server here. It's probably worth mentioning: if your plan is to set the computer up and then not monitor it, you probably won't want to run CUDA. In its present state it is not ready for that setup. |
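For anyone who would rather script the check than eyeball each task, here is a rough Python sketch of the idea behind the batch file mentioned above. It is not that batch file. It assumes each downloaded workunit file carries its angle range in a `<true_angle_range>` tag near the top of the file; the function names, the header size, and the directory you point it at are all hypothetical, and only the 0.1 cutoff comes from this thread.

```python
import re
from pathlib import Path

# Cutoff quoted in this thread: angle ranges of 0.1 and lower are the problem ones.
VLAR_THRESHOLD = 0.1

# Assumption: the workunit header stores the value in this tag.
ANGLE_TAG = re.compile(r"<true_angle_range>\s*([0-9.eE+-]+)\s*</true_angle_range>")


def angle_range(path, header_bytes=65536):
    """Return the angle range found in the first part of a file, or None."""
    with open(path, "rb") as fh:
        head = fh.read(header_bytes).decode("ascii", errors="ignore")
    match = ANGLE_TAG.search(head)
    if match is None:
        return None
    try:
        return float(match.group(1))
    except ValueError:
        # Tag was present but did not hold a usable number.
        return None


def find_vlar(directory, threshold=VLAR_THRESHOLD):
    """List (file name, angle range) for every file at or below the threshold."""
    hits = []
    for path in sorted(Path(directory).iterdir()):
        if not path.is_file():
            continue
        value = angle_range(path)
        if value is not None and value <= threshold:
            hits.append((path.name, value))
    return hits
```

Pointing `find_vlar` at the project's data folder would print candidates to abort by hand. The sketch only reads files; it does not abort anything or talk to BOINC.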
Robert P. Herbst Joined: 10 Jun 03 Posts: 45 Credit: 64,523,408 RAC: 142 |
Hi Carl; I have 6 computers here, all running Windows Vista Ultimate 64 bit. I have a wide range of graphics cards and I have had this problem with all but the NON-CUDA card, the 7600 series. There are constant errors that shut my computers down and daily snow storms on the computers, even with the 9600GT. The problems are not Windows related, because if you shut the SETI graphics down, the problems go away. You can let it run in the background, and the daily errors don't seem to happen. There is something you need to watch for with these new CUDA cards and all the GeForce cards. Pull the card out of the machine and check the capacitors. Look carefully at the F277 1500 6.3V caps. You might also check the F777 470 1.6V. These caps are scored with a "K" on their tops and they rupture through the score mark. Failure of the card seems to go in stages, probably as the caps rupture. First you get the constant error codes, then the error codes become snow storms, and shortly thereafter the card fails completely. I have one sitting on my desk right now and the people at NVIDIA have failed to respond to my several requests for an RMA. Two of the three F277 caps and one of the three F777 caps have popped their tops. This seems to be a common problem with these cards, but more so with CUDA and SETI running together. Try turning the SETI graphics off and let it run in the background. It may prolong the life of your graphics card. Please Visit Mount Perry, Florida Home to Florida's Only Snow Capped Mountain www.mountperryfl.com |
Joined: 29 Nov 01 Posts: 186 Credit: 36,311,381 RAC: 141 |
Hi Carl; That really made my morning! Arg! I will have to pull the side and look at the card later today. Well at least EVGA has a lifetime warranty, yeah! |
Joined: 16 Jan 06 Posts: 1145 Credit: 3,936,993 RAC: 0 |
"That really made my morning! Arg! I will have to pull the side and look at the card later today. Well at least EVGA has a lifetime warranty, yeah!" So, after checking the card for problems, what was the result? Did you experience the same problem with your EVGA card? I have a very low end EVGA card and haven't had a similar experience with card failure. With the new mod CUDA app, and a batch file I use to spot VLAR tasks, it all seems to run pretty well. Should also mention I don't OC the card or control the fan speed. It's all set at default for the testing I've been doing. |
Joined: 29 Nov 01 Posts: 186 Credit: 36,311,381 RAC: 141 |
"That really made my morning! Arg! I will have to pull the side and look at the card later today. Well at least EVGA has a lifetime warranty, yeah!" Well, the card has a cover plate over it, so I cannot see any of the parts listed. I am not sure if I want to try to remove it. I did not look to see how simple it is, but I do NOT want to void the warranty. |
Joined: 16 Jan 06 Posts: 1145 Credit: 3,936,993 RAC: 0 |
In your case I think it has more to do with the inabilities of the CUDA app than the card just from the Very Low Angle Range problems you've had. Everyone has those. If the card starts failing or showing similar bad visual effects like snow or rebooting when doing things other than CUDA, even then I wouldn't go into the card unless you are confident you know what you're doing. My guess is though it would void the warranty as I would assume you are expected to RMA it, not take it apart. |
Carl Johnson[SETI.USA] Joined: 18 Feb 05 Posts: 33 Credit: 5,269,022 RAC: 0 |
OK, well I got it stable and it seems to run for a while now. This question has more to do with software. I run GPU-Z and the clock idles at 300, and I got it clocked up to about 750 when it's needed. But why, when CUDA is running, does the clock ramp up and down at seemingly random intervals? |
Joined: 9 Jun 99 Posts: 15184 Credit: 4,362,181 RAC: 3 |
"But why, when CUDA is running, does the clock ramp up and down at seemingly random intervals?" To make sure your GPU doesn't overheat. The VBIOS, working with the Nvidia drivers, will throttle the clock speed/voltage according to how hot the GPU gets. You'll probably see it go down while the GPU is under full load. |
Carl Johnson[SETI.USA] Joined: 18 Feb 05 Posts: 33 Credit: 5,269,022 RAC: 0 |
I couldn't be sure. I can run 3 tasks, two on my dual core and one on the CUDA card. I restarted the and now I'm only running two and the clock on the GPU hasn't dropped back to idle speed. I'm not sure if it was temp slowing the card. If it 'throttles' I would figure it would use more than an on-or-off approach and actually slow the clock speed, but that might be asking for too much programming. This card runs at 80°C when I play CoD4 and it doesn't kick out, so I couldn't imagine that CUDA could be any more demanding than a game like that. |
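One way to test whether the clock drops really follow temperature is to log both and compare. The Python sketch below assumes you already have time-ordered (temperature in °C, core clock in MHz) samples, for example copied from a GPU-Z sensor log. The function names, the 100 MHz drop size, and the 80°C "hot" cutoff are illustrative assumptions, not values from the driver or VBIOS.

```python
def clock_drops(samples, min_drop_mhz=100):
    """Find points where the clock fell by at least min_drop_mhz.

    samples: list of (temp_c, clock_mhz) tuples in time order.
    Returns a list of (index, temp_before, clock_before, clock_after).
    """
    drops = []
    for i in range(1, len(samples)):
        temp_before, clock_before = samples[i - 1]
        _, clock_after = samples[i]
        if clock_before - clock_after >= min_drop_mhz:
            drops.append((i, temp_before, clock_before, clock_after))
    return drops


def hot_drop_fraction(samples, hot_c=80, min_drop_mhz=100):
    """Fraction of clock drops that happened right after a hot reading.

    Returns None if there were no drops at all.
    """
    drops = clock_drops(samples, min_drop_mhz)
    if not drops:
        return None
    hot = sum(1 for _, temp_before, _, _ in drops if temp_before >= hot_c)
    return hot / len(drops)


# Made-up samples using the 300 and 750 MHz clocks mentioned in this thread.
samples = [(60, 750), (82, 750), (83, 300), (70, 750), (65, 300)]
print(hot_drop_fraction(samples))
```

A fraction near 1 would be consistent with thermal throttling; a low fraction suggests the drops have some other cause, such as the driver dropping to idle clocks between tasks.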
©2025 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.