NVIDIA driver crashing

Message boards : Number crunching : NVIDIA driver crashing

David S
Volunteer tester
Joined: 4 Oct 99
Posts: 18352
Credit: 27,761,924
RAC: 12
United States
Message 1744415 - Posted: 24 Nov 2015, 1:43:51 UTC

Last night, I was physically sitting in front of my i7 main cruncher (which I only do about once a week), playing Solitaire while waiting for the washing machine to finish. Suddenly, the display went blank. After a few seconds, it recovered and a balloon said the driver had crashed and recovered. This happened several times before the washer finished and I went upstairs.

Sometime during the night, the computer froze. I tried to get in with TeamViewer and found it had been offline since around 2-3 a.m.

I got home just now and had to hold the power button to shut it down. After restarting it, everything appeared to be normal (except that BOINC took longer than usual to get going). Then the display blanked a couple more times. I have now suspended BOINC and it hasn't happened since.

Details: GT440 running driver 353.30, Win 7 64 bit.

Questions: has this happened to anyone else? Is there a newer driver that is known to be okay for Seti and Einstein? Could this be a sign that the GPU itself is dying? Any other suggestions?
David
Sitting on my butt while others boldly go,
Waiting for a message from a small furry creature from Alpha Centauri.

Profile betreger Project Donor
Joined: 29 Jun 99
Posts: 11361
Credit: 29,581,041
RAC: 66
United States
Message 1744416 - Posted: 24 Nov 2015, 1:50:31 UTC - in response to Message 1744415.  

I used to get this with my GT430 on Einstein when I was running 2 tasks at a time and overclocking about 10%; I was pushing it as hard as I dared. Since then it has been relegated to Seti, running only 1 task at a time (still overclocked), and I have not had that happen.
David S
Volunteer tester
Joined: 4 Oct 99
Posts: 18352
Credit: 27,761,924
RAC: 12
United States
Message 1744417 - Posted: 24 Nov 2015, 1:58:54 UTC - in response to Message 1744416.  

Okay, thanks.

Update: I suspended the GPU and allowed the CPU to go back to work, and nothing has happened. I suppose one option would be to go to the Einstein site and disable GPU work, then abort all the GPU tasks I have and see if Seti and Beta work okay. I don't want to do that if there's an easier way.
David
Profile Zalster Special Project $250 donor
Volunteer tester
Joined: 27 May 99
Posts: 5517
Credit: 528,817,460
RAC: 242
United States
Message 1744423 - Posted: 24 Nov 2015, 2:21:30 UTC - in response to Message 1744417.  

I've seen that when bumping up the memory clock via Nvidia Inspector on the cards.

A couple of questions about your GPU computing. Are you overclocking the cards? Do you use NI or GPU-Z to raise any of the settings above factory values? How many work units per card are you crunching? What about the ratio of CPU to GPU for each work unit?
KLiK
Volunteer tester

Joined: 31 Mar 14
Posts: 1304
Credit: 22,994,597
RAC: 60
Croatia
Message 1744485 - Posted: 24 Nov 2015, 10:33:03 UTC

happens to me also...just keep crunching! :/


non-profit org. Play4Life in Zagreb, Croatia, EU
Ulrich Metzner
Volunteer tester
Joined: 3 Jul 02
Posts: 1256
Credit: 13,565,513
RAC: 13
Germany
Message 1744490 - Posted: 24 Nov 2015, 11:18:40 UTC

Have you checked the fans and blown out the dust bunnies?
Aloha, Uli

Darth Beaver Crowdfunding Project Donor* Special Project $75 donor Special Project $250 donor
Joined: 20 Aug 99
Posts: 6728
Credit: 21,443,075
RAC: 3
Australia
Message 1744598 - Posted: 24 Nov 2015, 22:40:45 UTC

It happens to me too, Doc. If it's doing too much work at SETI and I then try watching a video or playing a game, or if I've been mining and then stop mining and watch a video, it seems that if you ask it to do too much it will crash and recover at a lower resolution. This has happened to me while playing games.

Until it actually errors out a unit, don't worry. So far it hasn't done that; Seti just keeps on keeping on with no problems.
David S
Volunteer tester
Joined: 4 Oct 99
Posts: 18352
Credit: 27,761,924
RAC: 12
United States
Message 1745963 - Posted: 1 Dec 2015, 1:24:00 UTC

Update:

Okay, first of all, it is definitely a problem when it blanks the screen every minute or so while I'm sitting in front of it, and even more of a problem when the whole computer freezes after a few hours.

It did that again last night and I just now took the time while it was shut down anyway to open it up and blow out the dust. There was surprisingly little of it in there.

As soon as I started it up again, it started crashing again. I suspended the GPU and it quit. I updated the driver, but it didn't help. It was weird that it was still crashing, because I thought I had suspended all the individual GPU tasks. Finally, I noticed a Seti Beta OpenCL task. It would pop up as running, the screen would blank, and when it recovered the task wasn't running. I tracked it down and suspended it, and the crashing stopped.

Then I unsuspended one Seti task. No crash. I suspended that and unsuspended one Einstein task. No crash. I suspended that and unsuspended one Beta Cuda50 task. No crash. Once more, I suspended that and unsuspended the OpenCL task. Crash. I suspended that again and unsuspended every GPU task I had except the OpenCL ones. No crashing.
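The one-task-at-a-time test above is a simple isolation procedure. A minimal Python sketch of it, where `isolate_culprits`, `safe_to_resume` and the `crashes` callback are hypothetical stand-ins for "suspend everything else, resume only this task, and watch whether the screen blanks"; nothing here talks to BOINC:

```python
from typing import Callable, Iterable, List


def isolate_culprits(task_types: Iterable[str],
                     crashes: Callable[[str], bool]) -> List[str]:
    """Resume one task type at a time and collect the ones that crash.

    `crashes` is a hypothetical callback standing in for the manual test:
    with every other GPU task suspended, resume only `task` and report
    whether the display driver reset.
    """
    culprits = []
    for task in task_types:
        if crashes(task):
            culprits.append(task)
    return culprits


def safe_to_resume(task_types: Iterable[str], culprits: List[str]) -> List[str]:
    """The final step: everything except the culprits can be unsuspended."""
    return [task for task in task_types if task not in culprits]


if __name__ == "__main__":
    # Outcomes as reported in the post above.
    observed = {"Seti": False, "Einstein": False,
                "Beta Cuda50": False, "Beta OpenCL": True}
    bad = isolate_culprits(observed, lambda t: observed[t])
    print(bad)                            # ['Beta OpenCL']
    print(safe_to_resume(observed, bad))  # ['Seti', 'Einstein', 'Beta Cuda50']
```

This only catches a task type that crashes on its own; a crash that needs two particular tasks running together would need pairs tested as well.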

This just leaves the question of why it suddenly started crashing when I had not made any changes to the system in months (even the Windows updates are a few weeks old).

I will post this info in Beta too.
David
Darth Beaver Crowdfunding Project Donor* Special Project $75 donor Special Project $250 donor
Joined: 20 Aug 99
Posts: 6728
Credit: 21,443,075
RAC: 3
Australia
Message 1745966 - Posted: 1 Dec 2015, 1:46:24 UTC - in response to Message 1745963.  

David, have a look at the event logs and see if they give you more of an idea.

Look at my post "Computer shut down WMI". I'm using a cmdline on the OpenCL APs, and it crashed 3 times in the last day or two. The computer would restart and be OK until I started Seti.

I've made a change so hopefully it's fixed now but I will have to see if it crashes again in the next 24 hrs .
Profile Zalster Special Project $250 donor
Volunteer tester
Joined: 27 May 99
Posts: 5517
Credit: 528,817,460
RAC: 242
United States
Message 1745972 - Posted: 1 Dec 2015, 2:18:52 UTC - in response to Message 1745963.  
Last modified: 1 Dec 2015, 2:20:48 UTC

Ah, now it makes much more sense.

I thought you were just talking about Seti Main work units.

Now that you've said it was an OpenCL task on Beta, it makes much more sense.

I guessed, and checked that I was correct: it's the OpenCL_Nvidia_sah work units, aka VLARs.

Those really should be crunched on the newer Maxwell or Kepler cards. Even my machines were hard pressed by them. They use a full CPU core apiece.

They also shouldn't be run alongside any other work units from any other project.

I think there was a discussion about those on Beta some time back, if I remember correctly.

I would recommend not crunching any more of those on your GPU.

Zalster
David S
Volunteer tester
Joined: 4 Oct 99
Posts: 18352
Credit: 27,761,924
RAC: 12
United States
Message 1745977 - Posted: 1 Dec 2015, 3:17:30 UTC - in response to Message 1745972.  

Forgot to mention, I have made no adjustments to the way the card itself operates since I bought the computer several years ago.

I operate Seti and Einstein two at a time, but I have never set Beta to do that.

Is there a way I can set Beta to run CUDA but not OpenCL? I don't see a preference for it (like Einstein has). Would I have to go with optimized apps? I've never done that on Beta (reasoning that the point of the Beta project is to test the next generation of Main apps, so I let it do its own thing).
David
Profile TimeLord04
Volunteer tester
Joined: 9 Mar 06
Posts: 21140
Credit: 33,933,039
RAC: 23
United States
Message 1745980 - Posted: 1 Dec 2015, 3:27:21 UTC - in response to Message 1745977.  

No optimized apps are allowed at Beta. However, you can use an app_config.xml file that may suit your needs to run only CUDA units. Zalster, or someone else, will have to help you configure that for your system.

The app_config.xml I have was created for me by Joe Segur. I've modified it a couple of times; but, I could NOT have created it on my own from scratch.


TL
TimeLord04
Have TARDIS, will travel...
Come along K-9!
Join Calm Chaos
Profile Zalster Special Project $250 donor
Volunteer tester
Joined: 27 May 99
Posts: 5517
Credit: 528,817,460
RAC: 242
United States
Message 1745981 - Posted: 1 Dec 2015, 3:34:02 UTC - in response to Message 1745977.  
Last modified: 1 Dec 2015, 3:35:13 UTC

Edit..

True TL,

you could just use an app_config.xml.
Profile Zalster Special Project $250 donor
Volunteer tester
Joined: 27 May 99
Posts: 5517
Credit: 528,817,460
RAC: 242
United States
Message 1745989 - Posted: 1 Dec 2015, 3:42:12 UTC - in response to Message 1745981.  
Last modified: 1 Dec 2015, 3:46:48 UTC

<app_config>
   <app_version>
      <app_name>setiathome_v7</app_name>
      <plan_class>cuda42</plan_class>
      <avg_ncpus>1</avg_ncpus>
      <ngpus>1</ngpus>
   </app_version>
</app_config>

I think this should work. Of course, when V8 comes out it will have to be modified for those.

I hope this will prevent you from getting those OpenCL tasks. I'm not 100% sure, so if anyone else has input I'd appreciate it.

If it turns out that it doesn't, then we can talk about using an <exclude_gpu> in a cc_config.xml.

This app_config is set up for cuda42. Does your GPU run cuda32? If so, we can easily modify the above for you.

Zalster

Edit..

I changed it for cuda 42 only
Claggy
Volunteer tester

Joined: 5 Jul 99
Posts: 4654
Credit: 47,537,079
RAC: 4
United Kingdom
Message 1746075 - Posted: 1 Dec 2015, 10:10:08 UTC - in response to Message 1744415.  
Last modified: 1 Dec 2015, 10:14:01 UTC

As I said at Beta, update your apps with the latest Lunatics installer. The rev 2737 OpenCL AP app that you're running has a problem with bool2 being reserved in the latest Nvidia driver. The latest Lunatics installer has rev 2887 of the Nvidia OpenCL AP app, which has been fixed for this problem, so download and install the latest Lunatics installer.

http://setiathome.berkeley.edu/result.php?resultid=4536904643

Stderr output

<core_client_version>7.2.42</core_client_version>
<![CDATA[
<message>
too many boinc_temporary_exit()s
</message>
<stderr_txt>
^
<scratch space>:2:1: note: expanded from here
__RESERVED_bool2
^
<kernel>:1031:7: error: incomplete result type 'bool2' (aka 'struct __RESERVED_bool2') in function definition
bool2 gtp(float4 a, float4 cc)
^
cl_kernel.h:56:1: note: forward declaration of 'struct __RESERVED_bool2'
__NVCL_RESERVED(bool2)
^
cl_kernel.h:54:43: note: expanded from macro '__NVCL_RESERVED'
#define __NVCL_RESERVED(x) typedef struct __RESERVED_##x x;
^
<scratch space>:2:1: note: expanded from here
__RESERVED_bool2
^
<kernel>:1041:9: error: variable has incomplete type '__attribute__((address_space(16776963))) bool2' (aka '__attribute__((address_space(16776963))) struct __RESERVED_bool2')
bool2 ret;
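To check other failed tasks for the same cause, a hedged sketch: paste a task's Stderr output into a string and look for "error:" lines that mention a `__RESERVED_` type, which is how the driver reports the reserved bool2 in the output above. The function name is hypothetical, and the check only knows the message pattern visible in this one log.

```python
import re
from typing import Set

# In the output above, the driver renames reserved types to __RESERVED_<name>.
RESERVED_TYPE = re.compile(r"__RESERVED_(\w+)")


def reserved_type_errors(stderr_text: str) -> Set[str]:
    """Return the reserved type names that appear on compiler error lines."""
    found: Set[str] = set()
    for line in stderr_text.splitlines():
        # Only compiler errors count; the bare "note"/"expanded from" lines are skipped.
        if "error:" in line:
            found.update(RESERVED_TYPE.findall(line))
    return found


if __name__ == "__main__":
    sample = (
        "<kernel>:1031:7: error: incomplete result type 'bool2' "
        "(aka 'struct __RESERVED_bool2') in function definition\n"
        "bool2 gtp(float4 a, float4 cc)\n"
    )
    print(reserved_type_errors(sample))  # {'bool2'}
```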


Claggy
David S
Volunteer tester
Joined: 4 Oct 99
Posts: 18352
Credit: 27,761,924
RAC: 12
United States
Message 1746120 - Posted: 1 Dec 2015, 15:54:06 UTC - in response to Message 1746075.  

Okay, I'll do that for Main, but how will that fix my Beta problem?

Zalster, I'll try that.

Thanks, guys.
David
David S
Volunteer tester
Joined: 4 Oct 99
Posts: 18352
Credit: 27,761,924
RAC: 12
United States
Message 1746637 - Posted: 3 Dec 2015, 23:17:25 UTC - in response to Message 1745989.  

What if I want it to be able to run any cuda?
David
Profile Zalster Special Project $250 donor
Volunteer tester
Joined: 27 May 99
Posts: 5517
Credit: 528,817,460
RAC: 242
United States
Message 1746689 - Posted: 4 Dec 2015, 1:43:38 UTC - in response to Message 1746637.  

Sorry for the delay David.

To change it to any CUDA, edit the app_config.xml in Notepad, copy the section from <app_version> to </app_version> and paste it before </app_config>, then change cuda42 in the copy to whichever CUDA version you want (for example cuda50 or cuda32), save, and have BOINC reread the config file.
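The copy-and-paste edit described above can also be scripted. A minimal Python sketch that writes one <app_version> block per CUDA plan class; `build_app_config` is a hypothetical helper, the values mirror Zalster's cuda42 example, and, as he says, it is not certain that listing only CUDA plan classes keeps OpenCL tasks away.

```python
import xml.etree.ElementTree as ET
from typing import Iterable


def build_app_config(app_name: str, plan_classes: Iterable[str],
                     avg_ncpus: float = 1, ngpus: float = 1) -> str:
    """Build app_config.xml text with one <app_version> block per plan class."""
    root = ET.Element("app_config")
    for plan in plan_classes:
        version = ET.SubElement(root, "app_version")
        ET.SubElement(version, "app_name").text = app_name
        ET.SubElement(version, "plan_class").text = plan
        ET.SubElement(version, "avg_ncpus").text = str(avg_ncpus)
        ET.SubElement(version, "ngpus").text = str(ngpus)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    xml_text = build_app_config("setiathome_v7", ["cuda32", "cuda42", "cuda50"])
    # Save this as app_config.xml in the Beta project folder, then have
    # BOINC reread the config files.
    print(xml_text)
```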
David S
Volunteer tester
Joined: 4 Oct 99
Posts: 18352
Credit: 27,761,924
RAC: 12
United States
Message 1746706 - Posted: 4 Dec 2015, 2:51:54 UTC - in response to Message 1746689.  

No problem. I was reading (and almost comprehending) the discussion of the current Lunatics.

I have the app_config with cuda32, cuda42, and cuda50 in the Beta folder, but I'm still afraid to resume those suspended OpenCL tasks. Should I just abort them and see what happens on the next server contact?

---------------

As a separate but related matter, if I currently have no APs from Main, can I install Lunatics without finishing off the MB tasks in progress?

The app_config in my Main folder is just this:

<app_config>
   <app>
      <name>setiathome_v7</name>
      <gpu_versions>
         <gpu_usage>0.5</gpu_usage>
         <cpu_usage>.04</cpu_usage>
      </gpu_versions>
   </app>
</app_config>


Do I just need to duplicate the body of that with astropulse instead of setiathome? EVERY occurrence of astropulse in my current app_info.xml has the count set to .5 .
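If the answer is yes, the duplication can be done mechanically. A hedged Python sketch that clones the setiathome_v7 <app> block under a new name; `add_app_like` is a hypothetical helper, and astropulse_v7 is the app name used in TimeLord04's file in this thread. Note that the clone keeps cpu_usage .04, whereas that file gives astropulse_v7 a cpu_usage of .5, so you may want to adjust it.

```python
import copy
import xml.etree.ElementTree as ET

# David's Main-folder app_config, as posted above.
MAIN_CONFIG = """<app_config>
   <app>
      <name>setiathome_v7</name>
      <gpu_versions>
         <gpu_usage>0.5</gpu_usage>
         <cpu_usage>.04</cpu_usage>
      </gpu_versions>
   </app>
</app_config>"""


def add_app_like(config_xml: str, template_name: str, new_name: str) -> str:
    """Copy the <app> named template_name, rename the copy, and append it."""
    root = ET.fromstring(config_xml)
    for app in root.findall("app"):
        if app.findtext("name") == template_name:
            clone = copy.deepcopy(app)
            clone.find("name").text = new_name
            root.append(clone)
            return ET.tostring(root, encoding="unicode")
    raise ValueError(f"no <app> named {template_name!r}")


if __name__ == "__main__":
    print(add_app_like(MAIN_CONFIG, "setiathome_v7", "astropulse_v7"))
```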

Also, I seem to remember having just a little bit of customization going (there was a thread a while back where I tried to customize it more and it promptly started crashing). Is that controlled from app_info?
David
Profile TimeLord04
Volunteer tester
Joined: 9 Mar 06
Posts: 21140
Credit: 33,933,039
RAC: 23
United States
Message 1746713 - Posted: 4 Dec 2015, 3:02:28 UTC - in response to Message 1746706.  

David,

Here is my app_config.xml file, originally created by Joe Segur:


<app_config>
   <app>
      <name>astropulse_v7</name>
      <max_concurrent>2</max_concurrent>
      <gpu_versions>
         <gpu_usage>.5</gpu_usage>
         <cpu_usage>.5</cpu_usage>
      </gpu_versions>
   </app>
   <app>
      <name>setiathome_v7</name>
      <max_concurrent>2</max_concurrent>
      <gpu_versions>
         <gpu_usage>0.50</gpu_usage>
         <cpu_usage>0.04</cpu_usage>
      </gpu_versions>
   </app>
</app_config>

-------------------------------------------------------

With my app_config.xml (used both on Prometheus with the GTX 750 Ti SC and on Exeter with the GTX 760), both of my machines crunch two units at a time: either 1 MB and 1 AP, 2 APs, or 2 MBs.
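The "two at a time" follows from gpu_usage: each task claims that fraction of a GPU, so 0.5 lets two tasks share one card, and max_concurrent caps each app at 2. A simplified Python sketch of that arithmetic for a single GPU, under the assumption that a mix of tasks fits when its gpu_usage values add up to at most 1 (BOINC's real scheduler also weighs CPU availability); the function names are hypothetical.

```python
import math
import xml.etree.ElementTree as ET
from typing import Dict

# TimeLord04's app_config, as posted above.
TL_CONFIG = """<app_config>
  <app>
    <name>astropulse_v7</name>
    <max_concurrent>2</max_concurrent>
    <gpu_versions><gpu_usage>.5</gpu_usage><cpu_usage>.5</cpu_usage></gpu_versions>
  </app>
  <app>
    <name>setiathome_v7</name>
    <max_concurrent>2</max_concurrent>
    <gpu_versions><gpu_usage>0.50</gpu_usage><cpu_usage>0.04</cpu_usage></gpu_versions>
  </app>
</app_config>"""


def gpu_usage_by_app(config_xml: str) -> Dict[str, float]:
    """Map each app name to the fraction of a GPU one of its tasks claims."""
    root = ET.fromstring(config_xml)
    return {app.findtext("name"): float(app.findtext("gpu_versions/gpu_usage"))
            for app in root.findall("app")}


def tasks_per_gpu(config_xml: str) -> Dict[str, int]:
    """Most tasks of each app one GPU can hold, capped by max_concurrent."""
    root = ET.fromstring(config_xml)
    limits = {}
    for app in root.findall("app"):
        usage = float(app.findtext("gpu_versions/gpu_usage"))
        limit = math.floor(1 / usage + 1e-9)  # small epsilon guards float rounding
        cap = app.findtext("max_concurrent")
        if cap is not None:
            limit = min(limit, int(cap))
        limits[app.findtext("name")] = limit
    return limits


def fits_on_one_gpu(config_xml: str, mix: Dict[str, int]) -> bool:
    """True if the summed gpu_usage of the task mix is at most one GPU."""
    usage = gpu_usage_by_app(config_xml)
    total = sum(usage[name] * count for name, count in mix.items())
    return total <= 1 + 1e-9


if __name__ == "__main__":
    print(tasks_per_gpu(TL_CONFIG))  # {'astropulse_v7': 2, 'setiathome_v7': 2}
    print(fits_on_one_gpu(TL_CONFIG, {"astropulse_v7": 1, "setiathome_v7": 1}))  # True
    print(fits_on_one_gpu(TL_CONFIG, {"setiathome_v7": 3}))  # False
```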


TL


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.