NVIDIA driver crashing
Author | Message |
---|---|
David S Joined: 4 Oct 99 Posts: 18352 Credit: 27,761,924 RAC: 12 |
Last night, I was physically sitting in front of my i7 main cruncher (which I only do about once a week), playing Solitaire while waiting for the washing machine to finish. Suddenly, the display went blank. After a few seconds, it recovered and a balloon said the driver had crashed and recovered. This happened several times before the washer finished and I went upstairs. Sometime during the night, the computer froze. I tried to get in with TeamViewer, and it had been offline since around 2-3 a.m. I got home just now and had to hold the power button to shut it down. After restarting it, everything appeared to be normal (except BOINC took longer than normal to get going). Then the display blanked a couple more times. I have now suspended BOINC and it hasn't happened since. Details: GT440 running driver 353.30, Win 7 64-bit. Questions: Has this happened to anyone else? Is there a newer driver that is known to be okay for Seti and Einstein? Could this be a sign that the GPU itself is dying? Any other suggestions? David Sitting on my butt while others boldly go, Waiting for a message from a small furry creature from Alpha Centauri. |
betreger Joined: 29 Jun 99 Posts: 11361 Credit: 29,581,041 RAC: 66 |
I used to get this with my GT430 on Einstein when I was running 2 tasks at a time and OCing about 10%; I was pushing it as hard as I dared. Since then it has been relegated to Seti, running only 1 task (but still OCing), and I have not had that happen. |
David S Joined: 4 Oct 99 Posts: 18352 Credit: 27,761,924 RAC: 12 |
Okay, thanks. Update: I suspended the GPU and allowed the CPU to go back to work, and nothing has happened. I suppose one option would be to go to the Einstein site and disable GPU work, then abort all the GPU tasks I have and see if Seti and Beta work okay. I don't want to do that if there's an easier way. |
Zalster Joined: 27 May 99 Posts: 5517 Credit: 528,817,460 RAC: 242 |
I've seen that when bumping up the memory clock on the cards via Nvidia Inspector. A couple of questions about your GPU computing: Are you overclocking the cards? Do you use NI or GPU-Z to increase any of the settings over factory? How many work units per card are you crunching? What about the ratio of CPU to GPU for each work unit? |
KLiK Joined: 31 Mar 14 Posts: 1304 Credit: 22,994,597 RAC: 60 |
|
Ulrich Metzner Joined: 3 Jul 02 Posts: 1256 Credit: 13,565,513 RAC: 13 |
Have you checked the fans and blown out the dust bunnies? Aloha, Uli |
Darth Beaver Joined: 20 Aug 99 Posts: 6728 Credit: 21,443,075 RAC: 3 |
It happens to me too, Doc. If it's doing too much work for SETI and I then try to watch a video or play a game, or if I've been mining and then stop mining and watch a video, it seems that if you ask it to do too much, it will crash and recover at a lower resolution. This has happened to me while playing games. Until it actually errors out a unit, don't worry; so far it hasn't done that, and Seti just keeps on keeping on with no probs. |
David S Joined: 4 Oct 99 Posts: 18352 Credit: 27,761,924 RAC: 12 |
Update: Okay, first of all, it is definitely a problem when it blanks the screen every minute or so while I'm sitting in front of it, and even more of a problem when the whole computer freezes after a few hours. It did that again last night, and since it was shut down anyway, I just now took the time to open it up and blow out the dust. There was surprisingly little of it in there. As soon as I started it up again, it started crashing again. I suspended the GPU and it quit. I updated the driver, but it didn't help. It was weird that it was crashing, because I thought I had suspended all the individual GPU tasks. Finally, I noticed a Seti Beta OpenCL task. It would pop up as running, the screen would blank, and when it recovered the task wasn't running. I tracked it down and suspended it, and the crashing stopped. Then I unsuspended one Seti task. No crash. I suspended that and unsuspended one Einstein. No crash. I suspended that and unsuspended one Beta cuda50. No crash. Once more, I suspended that and unsuspended the OpenCL. Crash. I suspended that again and unsuspended every GPU task I had except the OpenCLs. No crashing. This just leaves the question of why it suddenly started crashing when I had not made any changes to the system in months (even the Windows updates are a few weeks old). I will post this info in Beta too. |
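The one-task-at-a-time isolation above can be scripted with BOINC's command-line tool, `boinccmd`, which suspends or resumes a single task with `--task <project URL> <task name> suspend|resume`. A minimal sketch, assuming hypothetical project URLs and task names (copy the real ones from BOINC Manager): it only prints the command sequence, so you can run each step by hand and watch the screen in between.

```python
# Sketch: print the boinccmd commands for testing GPU tasks one at a time.
# The project URLs and task names used below are hypothetical placeholders;
# copy the real ones from BOINC Manager before using the output.


def isolation_plan(tasks):
    """tasks: list of (project_url, task_name) pairs.

    Returns boinccmd command lines that first suspend every task, then
    resume each one alone and suspend it again, so a display crash can be
    pinned on a single app.
    """
    # Step 1: suspend everything so the GPU starts idle.
    cmds = [f"boinccmd --task {url} {name} suspend" for url, name in tasks]
    # Step 2: run one task at a time, then put it back to sleep.
    for url, name in tasks:
        cmds.append(f"boinccmd --task {url} {name} resume")
        cmds.append("# pause here and watch the display for a few minutes")
        cmds.append(f"boinccmd --task {url} {name} suspend")
    return cmds


if __name__ == "__main__":
    # Hypothetical tasks, one per app type tried in the post.
    example = [
        ("http://seti.example.invalid/", "seti_mb_task"),
        ("http://einstein.example.invalid/", "einstein_gpu_task"),
        ("http://beta.example.invalid/", "beta_cuda50_task"),
        ("http://beta.example.invalid/", "beta_opencl_task"),
    ]
    for line in isolation_plan(example):
        print(line)
```

Printing rather than executing the commands is deliberate: the test that matters is whether the display blanks, which only a person at the screen can judge.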
Darth Beaver Joined: 20 Aug 99 Posts: 6728 Credit: 21,443,075 RAC: 3 |
David, have a look at the event logs and see if they give you more of an idea. Look at my post "Computer shut down WMI". I'm using a cmdline on the OpenCL APs, and it crashed 3 times in the last day or two. The computer would restart and be OK until I started Seti. I've made a change, so hopefully it's fixed now, but I will have to see if it crashes again in the next 24 hrs. |
Zalster Joined: 27 May 99 Posts: 5517 Credit: 528,817,460 RAC: 242 |
Ah, now it makes much more sense. I thought you were just talking about Seti Main work units. Now that you've said it was an OpenCL task on Beta, it makes much more sense. I guessed, and saw I was correct, that it's the OpenCL_Nvidia_sah work units, aka VLARs. Those really should be crunched on the newer Maxwell or Kepler cards. Even my machines were hard pressed with them. They use one full CPU core apiece. They also shouldn't be run alongside work units from any other project. I think there was a discussion about those on Beta some time back, if I remember correctly. I would recommend not crunching any more of those on your GPU. Zalster |
David S Joined: 4 Oct 99 Posts: 18352 Credit: 27,761,924 RAC: 12 |
Update: Forgot to mention, I have made no adjustments to the way the card itself operates since I bought the computer several years ago. I run Seti and Einstein two at a time, but I have never set Beta to do that. Is there a way I can set Beta to run CUDA but not OpenCL? I don't see a preference for it (like Einstein has). Would I have to go with optimized apps? I've never done that on Beta (reasoning that the point of the Beta project is to test the next generation of Main apps, so let it do its own thing). |
TimeLord04 Joined: 9 Mar 06 Posts: 21140 Credit: 33,933,039 RAC: 23 |
No optimized apps are allowed at Beta... However, you can run an app_config.xml file that may suit your needs to run only CUDA units. Zalster, or someone else, will have to help you configure that for your system. The app_config.xml I have was created for me by Joe Segur. I've modified it a couple of times, but I could NOT have created it on my own from scratch. TL TimeLord04 Have TARDIS, will travel... Come along K-9! Join Calm Chaos |
Zalster Joined: 27 May 99 Posts: 5517 Credit: 528,817,460 RAC: 242 |
Edit: True, TL, you could just use an app_config.xml. |
Zalster Joined: 27 May 99 Posts: 5517 Credit: 528,817,460 RAC: 242 |
<app_config>
  <app_version>
    <app_name>setiathome_v7</app_name>
    <plan_class>cuda42</plan_class>
    <avg_ncpus>1</avg_ncpus>
    <ngpus>1</ngpus>
  </app_version>
</app_config>
I think this should work. Of course, when V8 comes out, it will have to be modified for them. I hope this will prevent you from getting those OpenCL tasks, but I'm not 100% sure, so if anyone else has input I'd appreciate it. If it turns out that it doesn't, then we can talk about using an <exclude> in a cc_config.xml. This app_config is set up for cuda42. Does your GPU run cuda32? If so, we can easily modify the above for you. Zalster Edit: I changed it for cuda42 only. |
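Hand-editing XML makes it easy to drop a closing tag. A minimal sketch using Python's standard `xml.etree.ElementTree` that checks Zalster's config above is well-formed and lists what each `<app_version>` declares. It says nothing about whether BOINC will actually skip OpenCL work; that part is still the open question in the post.

```python
import xml.etree.ElementTree as ET

# Zalster's app_config.xml from the post, embedded as a string for this sketch.
CONFIG = """
<app_config>
  <app_version>
    <app_name>setiathome_v7</app_name>
    <plan_class>cuda42</plan_class>
    <avg_ncpus>1</avg_ncpus>
    <ngpus>1</ngpus>
  </app_version>
</app_config>
"""


def list_app_versions(xml_text):
    """Return (app_name, plan_class, avg_ncpus, ngpus) for each <app_version>.

    ET.fromstring raises xml.etree.ElementTree.ParseError if the text is not
    well-formed, e.g. after a closing tag is lost while hand-editing.
    """
    root = ET.fromstring(xml_text.strip())
    rows = []
    for av in root.findall("app_version"):
        rows.append((
            av.findtext("app_name"),
            av.findtext("plan_class"),
            float(av.findtext("avg_ncpus", "0")),
            float(av.findtext("ngpus", "0")),
        ))
    return rows


if __name__ == "__main__":
    for row in list_app_versions(CONFIG):
        print(row)  # ('setiathome_v7', 'cuda42', 1.0, 1.0)
```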
Claggy Joined: 5 Jul 99 Posts: 4654 Credit: 47,537,079 RAC: 4 |
As I said at Beta, update your apps with the latest Lunatics installer. The rev 2737 OpenCL AP app that you're running has a problem with bool2 being reserved in the latest Nvidia driver. The latest Lunatics installer has rev 2887 of the Nvidia OpenCL AP app, which has been fixed for this problem, so download and install the latest Lunatics installer. http://setiathome.berkeley.edu/result.php?resultid=4536904643 Stderr output Claggy |
David S Joined: 4 Oct 99 Posts: 18352 Credit: 27,761,924 RAC: 12 |
Okay, I'll do that for Main, but how will that fix my Beta problem? Zalster, I'll try that. Thanks, guys. |
David S Joined: 4 Oct 99 Posts: 18352 Credit: 27,761,924 RAC: 12 |
What if I want it to be able to run any CUDA version? |
Zalster Joined: 27 May 99 Posts: 5517 Credit: 528,817,460 RAC: 242 |
Sorry for the delay, David. To change it to any CUDA version, edit the app_config.xml in Notepad, copy the section from <app_version> to </app_version>, and paste it before </app_config>. Then change cuda42 in the pasted copy to whichever CUDA plan class you want, for example cuda50 or cuda32. Save, and then have BOINC reread the config file. |
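The copy, paste and rename edit described above can also be done with a short script. A minimal sketch, assuming the single cuda42 entry from earlier in the thread as the starting file; `add_plan_classes` is a hypothetical helper name, and it returns the new XML as a string instead of writing the file, so you can inspect it before saving it over app_config.xml.

```python
import copy
import xml.etree.ElementTree as ET

# Starting point: the single-entry cuda42 config from earlier in the thread.
CONFIG = """
<app_config>
  <app_version>
    <app_name>setiathome_v7</app_name>
    <plan_class>cuda42</plan_class>
    <avg_ncpus>1</avg_ncpus>
    <ngpus>1</ngpus>
  </app_version>
</app_config>
"""


def add_plan_classes(xml_text, plan_classes):
    """Copy the first <app_version> once per new plan class, change only its
    <plan_class>, and append the copy just before </app_config>."""
    root = ET.fromstring(xml_text.strip())
    template = root.find("app_version")
    present = {av.findtext("plan_class") for av in root.findall("app_version")}
    for pc in plan_classes:
        if pc in present:
            continue  # already listed, don't duplicate it
        clone = copy.deepcopy(template)
        clone.find("plan_class").text = pc
        root.append(clone)
        present.add(pc)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(add_plan_classes(CONFIG, ["cuda32", "cuda50"]))
```

The output is valid XML, though the indentation of the copied blocks may look uneven; on Python 3.9+ you could call `ET.indent(root)` before serialising to tidy it.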
David S Joined: 4 Oct 99 Posts: 18352 Credit: 27,761,924 RAC: 12 |
No problem. I was reading (and almost comprehending) the discussion of the current Lunatics. I have the app_config with cuda32, 42, and 50 in the Beta folder, but I'm still afraid to resume those suspended OpenCL tasks. Should I just abort them and see what happens at the next server contact?
---------------
As a separate but related matter: if I currently have no APs from Main, can I install Lunatics without finishing off the MB tasks in progress? The app_config in my Main folder is just this:
<app_config>
  <app>
    <name>setiathome_v7</name>
    <gpu_versions>
      <gpu_usage>0.5</gpu_usage>
      <cpu_usage>.04</cpu_usage>
    </gpu_versions>
  </app>
</app_config>
Do I just need to duplicate the body of that with astropulse instead of setiathome? EVERY occurrence of astropulse in my current app_info.xml has the count set to .5. Also, I seem to remember having a little bit of customization going (there was a thread a while back where I tried to customize it more and it promptly started crashing). Is that controlled from app_info? |
TimeLord04 Joined: 9 Mar 06 Posts: 21140 Credit: 33,933,039 RAC: 23 |
David, here is my app_config.xml file, originally created by Joe Segur:
<app_config>
  <app>
    <name>astropulse_v7</name>
    <max_concurrent>2</max_concurrent>
    <gpu_versions>
      <gpu_usage>.5</gpu_usage>
      <cpu_usage>.5</cpu_usage>
    </gpu_versions>
  </app>
  <app>
    <name>setiathome_v7</name>
    <max_concurrent>2</max_concurrent>
    <gpu_versions>
      <gpu_usage>0.50</gpu_usage>
      <cpu_usage>0.04</cpu_usage>
    </gpu_versions>
  </app>
</app_config>
-------------------------------------------------------
With my app_config.xml (used both on Prometheus with the GTX 750 Ti SC and on Exeter with the GTX 760), both of my machines crunch two units at a time: either 1 MB and 1 AP, or 2 AP, or 2 MB. |
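The "two at a time" behaviour follows from `<gpu_usage>`: each task is counted as that fraction of a GPU, so with 0.5 two tasks fit on one card, and because both apps use 0.5, any pair fits. A small sketch of that arithmetic using TimeLord04's values; it is a simplified model that ignores `<max_concurrent>` and everything else the real BOINC client considers when scheduling.

```python
from itertools import combinations_with_replacement

# gpu_usage / cpu_usage values from TimeLord04's app_config.xml.
APPS = {
    "astropulse_v7": {"gpu": 0.5, "cpu": 0.5},
    "setiathome_v7": {"gpu": 0.5, "cpu": 0.04},
}


def mixes_that_fit(apps, n_tasks, gpu_capacity=1.0):
    """Return every mix of n_tasks GPU tasks whose summed gpu_usage fits on
    one GPU, paired with the total cpu_usage budgeted for that mix.

    Simplified model: only gpu_usage and cpu_usage are considered.
    """
    fits = []
    for mix in combinations_with_replacement(sorted(apps), n_tasks):
        gpu = sum(apps[a]["gpu"] for a in mix)
        cpu = sum(apps[a]["cpu"] for a in mix)
        if gpu <= gpu_capacity + 1e-9:  # small tolerance for float rounding
            fits.append((mix, round(cpu, 2)))
    return fits


if __name__ == "__main__":
    for mix, cpu in mixes_that_fit(APPS, 2):
        print(" + ".join(mix), "-> CPU budgeted:", cpu)
```

With these values the sketch finds exactly three pairings (2 AP, 1 AP + 1 MB, 2 MB) and no three-task mix, which matches what TimeLord04 describes.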
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.