NVIDIA driver crashing

David S
Volunteer tester
Joined: 4 Oct 99
Posts: 18352
Credit: 27,761,924
RAC: 12
United States
Message 1748287 - Posted: 11 Dec 2015, 3:00:21 UTC

Well, actually, while I wasn't paying attention, it's done nearly 100 tasks in every configuration it's capable of, including a lot of opencls and some v8s. Looks like Claggy was right and updating my Lunatics apps for Main fixed my Beta problem.
David
Sitting on my butt while others boldly go,
Waiting for a message from a small furry creature from Alpha Centauri.

ID: 1748287

David S
Volunteer tester
Joined: 4 Oct 99
Posts: 18352
Credit: 27,761,924
RAC: 12
United States
Message 1748267 - Posted: 11 Dec 2015, 0:22:09 UTC

Status update:

For some reason, it was refusing to ask Beta for work even after I suspended Main and Einstein. I disabled the app_config and then went back to the site and discovered that I had somehow unchecked CPU and all three kinds of GPU. I checked them all and temporarily bumped up the resource share, and it started asking. That's when Beta started saying no tasks available. I posted about it there and Eric did something. Then I got some CPUs and some cuda42s (odd, since it'd had 50s before). Anyway, at some point, it also downloaded some opencls for both MB and AP. Last I looked, it was running an AP opencl_nvidia_100, about half done with no problem.

I'll let you know when it gets to an MB opencl.
David
Sitting on my butt while others boldly go,
Waiting for a message from a small furry creature from Alpha Centauri.

ID: 1748267

David S
Volunteer tester
Joined: 4 Oct 99
Posts: 18352
Credit: 27,761,924
RAC: 12
United States
Message 1747334 - Posted: 6 Dec 2015, 23:46:42 UTC - in response to Message 1747003.

Okay, it's been running with the new Lunatics for a few hours and everything seems fine so far. It's currently running a pair of MBs on the GPU, still saying 0.04 CPUs even with the app_config saying .5. Running 7 on the CPU too, just like it should.

The test will be if it downloads a new opencl from Beta. Just have to wait and see.

The Units that were already DL'd and being worked on BEFORE Lunatics was installed will ALL still say .04 CPU; however, IF you had BOINC Reread the Config Files before Resuming tasks, then they are actually being crunched at .5... Especially if you are seeing that 2 GPU Units are crunching at the same time; then your new app_config.xml is working, and Lunatics, also, is working. Once you receive new Units in your cache, you will see the appropriate numbers. (Which I hope are .5 CPU and .5 GPU to yield 2 Units crunching at a time on the GPU...)


TL

It has caught up and is now crunching 2 at .2 CPUs and .5 GPUs, and 7 on the CPU. Just as I want.

On the Beta front, it finished the cudas it had and hasn't downloaded anything else. The Beta app_config reads

<app_config>
  <app_version>
    <app_name>setiathome_v7</app_name>
    <plan_class>cuda42</plan_class>
    <avg_ncpus>1</avg_ncpus>
    <ngpus>1</ngpus>
  </app_version>
  <app_version>
    <app_name>setiathome_v7</app_name>
    <plan_class>cuda32</plan_class>
    <avg_ncpus>1</avg_ncpus>
    <ngpus>1</ngpus>
  </app_version>
  <app_version>
    <app_name>setiathome_v7</app_name>
    <plan_class>cuda50</plan_class>
    <avg_ncpus>1</avg_ncpus>
    <ngpus>1</ngpus>
  </app_version>
</app_config>

Is that causing it not to ask for CPU tasks?
David
Sitting on my butt while others boldly go,
Waiting for a message from a small furry creature from Alpha Centauri.

ID: 1747334

TimeLord04
Volunteer tester
Joined: 9 Mar 06
Posts: 20362
Credit: 33,933,039
RAC: 23
United States
Message 1747003 - Posted: 5 Dec 2015, 7:25:11 UTC - in response to Message 1746984.

Okay, it's been running with the new Lunatics for a few hours and everything seems fine so far. It's currently running a pair of MBs on the GPU, still saying 0.04 CPUs even with the app_config saying .5. Running 7 on the CPU too, just like it should.

The test will be if it downloads a new opencl from Beta. Just have to wait and see.

The Units that were already DL'd and being worked on BEFORE Lunatics was installed will ALL still say .04 CPU; however, IF you had BOINC Reread the Config Files before Resuming tasks, then they are actually being crunched at .5... Especially if you are seeing that 2 GPU Units are crunching at the same time; then your new app_config.xml is working, and Lunatics, also, is working. Once you receive new Units in your cache, you will see the appropriate numbers. (Which I hope are .5 CPU and .5 GPU to yield 2 Units crunching at a time on the GPU...)


TL
TimeLord04
Have TARDIS, will travel...
Come along K-9!
Join Calm Chaos
ID: 1747003

David S
Volunteer tester
Joined: 4 Oct 99
Posts: 18352
Credit: 27,761,924
RAC: 12
United States
Message 1746984 - Posted: 5 Dec 2015, 3:30:42 UTC

Okay, it's been running with the new Lunatics for a few hours and everything seems fine so far. It's currently running a pair of MBs on the GPU, still saying 0.04 CPUs even with the app_config saying .5. Running 7 on the CPU too, just like it should.

The test will be if it downloads a new opencl from Beta. Just have to wait and see.
David
Sitting on my butt while others boldly go,
Waiting for a message from a small furry creature from Alpha Centauri.

ID: 1746984

Zalster
Volunteer tester
Joined: 27 May 99
Posts: 5445
Credit: 528,817,460
RAC: 242
United States
Message 1746929 - Posted: 5 Dec 2015, 0:07:47 UTC - in response to Message 1746926.

There shouldn't be any problem with the current work in progress.
ID: 1746929

David S
Volunteer tester
Joined: 4 Oct 99
Posts: 18352
Credit: 27,761,924
RAC: 12
United States
Message 1746926 - Posted: 4 Dec 2015, 23:58:26 UTC - in response to Message 1746902.

You might need to increase the CPU ratio to 0.5 from the 0.2 I have listed. That way, with 2 GPU tasks running, it will keep 1 core for the GPU and allow 7 on the CPU.

Otherwise it might try to run 8 CPU tasks.

If you have any problems, let us know.

I'll give it a try.

Asking again, will tasks in progress have a problem with having the new Lunatics installed? There are currently 7 running on the CPU and 2 on the GPU, all MBs. I have no APs on board.

Thanks for all your help.
David
Sitting on my butt while others boldly go,
Waiting for a message from a small furry creature from Alpha Centauri.

ID: 1746926

Zalster
Volunteer tester
Joined: 27 May 99
Posts: 5445
Credit: 528,817,460
RAC: 242
United States
Message 1746902 - Posted: 4 Dec 2015, 21:44:25 UTC - in response to Message 1746898.

You might need to increase the CPU ratio to 0.5 from the 0.2 I have listed. That way, with 2 GPU tasks running, it will keep 1 core for the GPU and allow 7 on the CPU.

Otherwise it might try to run 8 CPU tasks.

If you have any problems, let us know.
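
As a rough illustration of that suggestion, the Main app_config.xml with the CPU share raised to 0.5 might look like the sketch below. It assumes the two app names used elsewhere in this thread (setiathome_v7 and astropulse_v7) and an 8-thread CPU. The arithmetic in the comments follows the reasoning given above and is not a guarantee of how every BOINC client version schedules work.

```xml
<app_config>
  <app>
    <name>setiathome_v7</name>
    <gpu_versions>
      <!-- 0.5 GPU per task: two tasks share one GPU -->
      <gpu_usage>0.5</gpu_usage>
      <!-- 0.5 CPU per task: 2 x 0.5 = 1 core budgeted for GPU work,
           leaving 7 of 8 for CPU tasks (0.2 would budget only 0.4) -->
      <cpu_usage>0.5</cpu_usage>
    </gpu_versions>
  </app>
  <app>
    <name>astropulse_v7</name>
    <gpu_versions>
      <gpu_usage>0.5</gpu_usage>
      <cpu_usage>0.5</cpu_usage>
    </gpu_versions>
  </app>
</app_config>
```

After saving the file, have BOINC reread the config files so the new values are picked up.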
ID: 1746902

David S
Volunteer tester
Joined: 4 Oct 99
Posts: 18352
Credit: 27,761,924
RAC: 12
United States
Message 1746898 - Posted: 4 Dec 2015, 21:20:33 UTC - in response to Message 1746726.

David,


Regarding Beta site

Couple of things. What CUDA are you currently running on SETI?

I only ask as I don't know what the 440 usually runs.

The app_config will be different in CUDA for the 440 and the 630.

As far as the OpenCL tasks go, I would just abort them. I saw the discussion going on as to what might be causing the problem. I would just not crunch them for now.

This entire discussion has been about my i7 with the 440. I do not run Beta on the 630.

The 440 runs cuda42 on Main. I don't remember anymore if that was what the server concluded was best before I started doing Lunatics, or if I did a bit of informal observation and picked that myself. On Beta, I just let it do what it wants, and it appears the server has settled on 50 being the best. <shrug>

________


As far as main, which computer is this app_config going into?


<app_config>
  <app>
    <name>astropulse_v7</name>
    <gpu_versions>
      <gpu_usage>0.5</gpu_usage>
      <cpu_usage>0.20</cpu_usage>
    </gpu_versions>
  </app>
  <app>
    <name>setiathome_v7</name>
    <gpu_versions>
      <gpu_usage>0.5</gpu_usage>
      <cpu_usage>0.20</cpu_usage>
    </gpu_versions>
  </app>
</app_config>

Without running any CPU tasks, this should allow 2 work units at a time on the GPU in each machine.

If you want to run CPU tasks on that 8-core machine, let me know and I will modify this to add a <project_max_concurrent> so the CPU can crunch as well.

I would not recommend any CPU tasks on your dual core.

Like I said above, this thread is entirely about the i7/440. (The box with the 630 does not crunch on the CPU because it has another function I consider more important.) I currently have the i7 crunching on 7 cores to keep 1 free for the GPU. This works fine for me and has for quite a while.

I downloaded the new Lunatics. I just want to know what I need to adjust while/after I install it to stay at the status quo, that being 7 cores of CPU and 2 at a time of anything from Main or Einstein on the GPU. (Actually, once I get the Beta problem straightened out, I wouldn't mind letting it do 2 at a time also, but first things first.)
David
Sitting on my butt while others boldly go,
Waiting for a message from a small furry creature from Alpha Centauri.

ID: 1746898

Zalster
Volunteer tester
Joined: 27 May 99
Posts: 5445
Credit: 528,817,460
RAC: 242
United States
Message 1746726 - Posted: 4 Dec 2015, 3:32:27 UTC - in response to Message 1746724.
Last modified: 4 Dec 2015, 3:46:33 UTC

David,


Regarding Beta site

Couple of things. What CUDA are you currently running on SETI?

I only ask as I don't know what the 440 usually runs.

The app_config will be different in CUDA for the 440 and the 630.

As far as the OpenCL tasks go, I would just abort them. I saw the discussion going on as to what might be causing the problem. I would just not crunch them for now.

________


As far as main, which computer is this app_config going into?


<app_config>
  <app>
    <name>astropulse_v7</name>
    <gpu_versions>
      <gpu_usage>0.5</gpu_usage>
      <cpu_usage>0.20</cpu_usage>
    </gpu_versions>
  </app>
  <app>
    <name>setiathome_v7</name>
    <gpu_versions>
      <gpu_usage>0.5</gpu_usage>
      <cpu_usage>0.20</cpu_usage>
    </gpu_versions>
  </app>
</app_config>

Without running any CPU tasks, this should allow 2 work units at a time on the GPU in each machine.

If you want to run CPU tasks on that 8-core machine, let me know and I will modify this to add a <project_max_concurrent> so the CPU can crunch as well.

I would not recommend any CPU tasks on your dual core.
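
A minimal sketch of what such a <project_max_concurrent> variant might look like. The limit of 9 is an assumed value (7 CPU tasks plus 2 GPU tasks on the 8-core machine), chosen only for illustration, and the element needs a BOINC client recent enough to support it.

```xml
<app_config>
  <!-- Assumed value: caps the total number of running tasks for this project -->
  <project_max_concurrent>9</project_max_concurrent>
  <app>
    <name>astropulse_v7</name>
    <gpu_versions>
      <gpu_usage>0.5</gpu_usage>
      <cpu_usage>0.20</cpu_usage>
    </gpu_versions>
  </app>
  <app>
    <name>setiathome_v7</name>
    <gpu_versions>
      <gpu_usage>0.5</gpu_usage>
      <cpu_usage>0.20</cpu_usage>
    </gpu_versions>
  </app>
</app_config>
```

The cap applies to the project as a whole. It does not by itself decide how the running tasks are split between CPU and GPU.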
ID: 1746726

TimeLord04
Volunteer tester
Joined: 9 Mar 06
Posts: 20362
Credit: 33,933,039
RAC: 23
United States
Message 1746724 - Posted: 4 Dec 2015, 3:12:15 UTC - in response to Message 1746720.

Doesn't that affect what gets done on your CPU? That's the other thing I just remembered. Somewhere, I have a setting that limits the CPU to 7 tasks at a time to keep 1 core free for the GPU. I suppose that's a Boinc thing, though, not project specific...?


I don't use CPU to crunch; so, no, no effect on my CPU; only 1 core of my CPU feeds the GPU on each system.


TL
TimeLord04
Have TARDIS, will travel...
Come along K-9!
Join Calm Chaos
ID: 1746724

David S
Volunteer tester
Joined: 4 Oct 99
Posts: 18352
Credit: 27,761,924
RAC: 12
United States
Message 1746720 - Posted: 4 Dec 2015, 3:07:18 UTC - in response to Message 1746713.
Last modified: 4 Dec 2015, 3:08:46 UTC


---------------

As a separate but related matter, if I currently have no APs from Main, can I install Lunatics without finishing off the MB tasks in progress?

The app_config in my Main folder is just this:

<app_config>
   <app>
      <name>setiathome_v7</name>
      <gpu_versions>
          <gpu_usage>0.5</gpu_usage>
          <cpu_usage>.04</cpu_usage>
      </gpu_versions>
    </app>
</app_config>


Do I just need to duplicate the body of that with astropulse instead of setiathome? EVERY occurrence of astropulse in my current app_info.xml has the count set to .5 .

Also, I seem to remember having just a little bit of customization going (there was a thread a while back where I tried to customize it more and it promptly started crashing). Is that controlled from app_info?

David,

Here is my app_config.xml file originally created by Joe Segur:


<app_config>
  <app>
    <name>astropulse_v7</name>
    <max_concurrent>2</max_concurrent>
    <gpu_versions>
      <gpu_usage>.5</gpu_usage>
      <cpu_usage>.5</cpu_usage>
    </gpu_versions>
  </app>
  <app>
    <name>setiathome_v7</name>
    <max_concurrent>2</max_concurrent>
    <gpu_versions>
      <gpu_usage>0.50</gpu_usage>
      <cpu_usage>0.04</cpu_usage>
    </gpu_versions>
  </app>
</app_config>

-------------------------------------------------------

With my app_config.xml, (used both on Prometheus with the GTX-750 TI SC, and Exeter with the GTX-760), both of my machines will crunch two units at a time. Either 1 MB and 1 AP, or 2 AP, or 2 MB...


TL

Doesn't that affect what gets done on your CPU? That's the other thing I just remembered. Somewhere, I have a setting that limits the CPU to 7 tasks at a time to keep 1 core free for the GPU. I suppose that's a Boinc thing, though, not project specific...?
David
Sitting on my butt while others boldly go,
Waiting for a message from a small furry creature from Alpha Centauri.

ID: 1746720

TimeLord04
Volunteer tester
Joined: 9 Mar 06
Posts: 20362
Credit: 33,933,039
RAC: 23
United States
Message 1746713 - Posted: 4 Dec 2015, 3:02:28 UTC - in response to Message 1746706.

Sorry for the delay David.

To change it to any CUDA, edit the app_config.xml in Notepad: copy the section from <app_version> to </app_version>, paste it before </app_config>, then change the number in cuda42 to whichever CUDA you want (for example, cuda50 or cuda32). Save, then have BOINC reread the config file.

No problem. I was reading (and almost comprehending) the discussion of the current Lunatics.

I have the app_config with cuda32, 42, and 50 in the Beta folder, but I'm still afraid to resume those suspended opencls. Should I just abort them and see what happens with the next server contact?

---------------

As a separate but related matter, if I currently have no APs from Main, can I install Lunatics without finishing off the MB tasks in progress?

The app_config in my Main folder is just this:

<app_config>
   <app>
      <name>setiathome_v7</name>
      <gpu_versions>
          <gpu_usage>0.5</gpu_usage>
          <cpu_usage>.04</cpu_usage>
      </gpu_versions>
    </app>
</app_config>


Do I just need to duplicate the body of that with astropulse instead of setiathome? EVERY occurrence of astropulse in my current app_info.xml has the count set to .5 .

Also, I seem to remember having just a little bit of customization going (there was a thread a while back where I tried to customize it more and it promptly started crashing). Is that controlled from app_info?

David,

Here is my app_config.xml file originally created by Joe Segur:


<app_config>
  <app>
    <name>astropulse_v7</name>
    <max_concurrent>2</max_concurrent>
    <gpu_versions>
      <gpu_usage>.5</gpu_usage>
      <cpu_usage>.5</cpu_usage>
    </gpu_versions>
  </app>
  <app>
    <name>setiathome_v7</name>
    <max_concurrent>2</max_concurrent>
    <gpu_versions>
      <gpu_usage>0.50</gpu_usage>
      <cpu_usage>0.04</cpu_usage>
    </gpu_versions>
  </app>
</app_config>

-------------------------------------------------------

With my app_config.xml, (used both on Prometheus with the GTX-750 TI SC, and Exeter with the GTX-760), both of my machines will crunch two units at a time. Either 1 MB and 1 AP, or 2 AP, or 2 MB...


TL
TimeLord04
Have TARDIS, will travel...
Come along K-9!
Join Calm Chaos
ID: 1746713

David S
Volunteer tester
Joined: 4 Oct 99
Posts: 18352
Credit: 27,761,924
RAC: 12
United States
Message 1746706 - Posted: 4 Dec 2015, 2:51:54 UTC - in response to Message 1746689.

Sorry for the delay David.

To change it to any CUDA, edit the app_config.xml in Notepad: copy the section from <app_version> to </app_version>, paste it before </app_config>, then change the number in cuda42 to whichever CUDA you want (for example, cuda50 or cuda32). Save, then have BOINC reread the config file.

No problem. I was reading (and almost comprehending) the discussion of the current Lunatics.

I have the app_config with cuda32, 42, and 50 in the Beta folder, but I'm still afraid to resume those suspended opencls. Should I just abort them and see what happens with the next server contact?

---------------

As a separate but related matter, if I currently have no APs from Main, can I install Lunatics without finishing off the MB tasks in progress?

The app_config in my Main folder is just this:

<app_config>
   <app>
      <name>setiathome_v7</name>
      <gpu_versions>
          <gpu_usage>0.5</gpu_usage>
          <cpu_usage>.04</cpu_usage>
      </gpu_versions>
    </app>
</app_config>


Do I just need to duplicate the body of that with astropulse instead of setiathome? EVERY occurrence of astropulse in my current app_info.xml has the count set to .5 .

Also, I seem to remember having just a little bit of customization going (there was a thread a while back where I tried to customize it more and it promptly started crashing). Is that controlled from app_info?
David
Sitting on my butt while others boldly go,
Waiting for a message from a small furry creature from Alpha Centauri.

ID: 1746706

Zalster
Volunteer tester
Joined: 27 May 99
Posts: 5445
Credit: 528,817,460
RAC: 242
United States
Message 1746689 - Posted: 4 Dec 2015, 1:43:38 UTC - in response to Message 1746637.

Sorry for the delay David.

To change it to any CUDA, edit the app_config.xml in Notepad: copy the section from <app_version> to </app_version>, paste it before </app_config>, then change the number in cuda42 to whichever CUDA you want (for example, cuda50 or cuda32). Save, then have BOINC reread the config file.
ID: 1746689

David S
Volunteer tester
Joined: 4 Oct 99
Posts: 18352
Credit: 27,761,924
RAC: 12
United States
Message 1746637 - Posted: 3 Dec 2015, 23:17:25 UTC - in response to Message 1745989.

<app_config>
  <app_version>
    <app_name>setiathome_v7</app_name>
    <plan_class>cuda42</plan_class>
    <avg_ncpus>1</avg_ncpus>
    <ngpus>1</ngpus>
  </app_version>
</app_config>

I think this should work. Of course, when V8 comes out, it will have to be modified for that.

I hope this will prevent you from getting those OpenCL tasks. I'm not 100% sure, so if anyone else has input I'd appreciate it.

If it turns out that it doesn't, then we can talk about using an <exclude> in a cc_config.xml.

This app_config is set up for cuda42. Does your GPU run cuda32? If so, we can easily modify the above for you.

Zalster

Edit..

I changed it for cuda 42 only

What if I want it to be able to run any cuda?
David
Sitting on my butt while others boldly go,
Waiting for a message from a small furry creature from Alpha Centauri.

ID: 1746637

David S
Volunteer tester
Joined: 4 Oct 99
Posts: 18352
Credit: 27,761,924
RAC: 12
United States
Message 1746120 - Posted: 1 Dec 2015, 15:54:06 UTC - in response to Message 1746075.

As I said at Beta, update your apps with the latest Lunatics installer. The rev 2737 OpenCL AP app that you're running has a problem with bool2 being reserved in the latest NVIDIA driver.
The latest Lunatics installer has rev 2887 of the NVIDIA OpenCL AP app, which has been fixed for this problem. Download and install the latest Lunatics installer.

Claggy

Okay, I'll do that for Main, but how will that fix my Beta problem?

Zalster, I'll try that.

Thanks, guys.
David
Sitting on my butt while others boldly go,
Waiting for a message from a small furry creature from Alpha Centauri.

ID: 1746120

Claggy
Volunteer tester
Joined: 5 Jul 99
Posts: 4654
Credit: 47,537,079
RAC: 4
United Kingdom
Message 1746075 - Posted: 1 Dec 2015, 10:10:08 UTC - in response to Message 1744415.
Last modified: 1 Dec 2015, 10:14:01 UTC

As I said at Beta, update your apps with the latest Lunatics installer. The rev 2737 OpenCL AP app that you're running has a problem with bool2 being reserved in the latest NVIDIA driver.
The latest Lunatics installer has rev 2887 of the NVIDIA OpenCL AP app, which has been fixed for this problem. Download and install the latest Lunatics installer.

http://setiathome.berkeley.edu/result.php?resultid=4536904643

Stderr output

<core_client_version>7.2.42</core_client_version>
<![CDATA[
<message>
too many boinc_temporary_exit()s
</message>
<stderr_txt>
^
<scratch space>:2:1: note: expanded from here
__RESERVED_bool2
^
<kernel>:1031:7: error: incomplete result type 'bool2' (aka 'struct __RESERVED_bool2') in function definition
bool2 gtp(float4 a, float4 cc)
^
cl_kernel.h:56:1: note: forward declaration of 'struct __RESERVED_bool2'
__NVCL_RESERVED(bool2)
^
cl_kernel.h:54:43: note: expanded from macro '__NVCL_RESERVED'
#define __NVCL_RESERVED(x) typedef struct __RESERVED_##x x;
^
<scratch space>:2:1: note: expanded from here
__RESERVED_bool2
^
<kernel>:1041:9: error: variable has incomplete type '__attribute__((address_space(16776963))) bool2' (aka '__attribute__((address_space(16776963))) struct __RESERVED_bool2')
bool2 ret;


Claggy
ID: 1746075

Zalster
Volunteer tester
Joined: 27 May 99
Posts: 5445
Credit: 528,817,460
RAC: 242
United States
Message 1745989 - Posted: 1 Dec 2015, 3:42:12 UTC - in response to Message 1745981.
Last modified: 1 Dec 2015, 3:46:48 UTC

<app_config>
  <app_version>
    <app_name>setiathome_v7</app_name>
    <plan_class>cuda42</plan_class>
    <avg_ncpus>1</avg_ncpus>
    <ngpus>1</ngpus>
  </app_version>
</app_config>

I think this should work. Of course, when V8 comes out, it will have to be modified for that.

I hope this will prevent you from getting those OpenCL tasks. I'm not 100% sure, so if anyone else has input I'd appreciate it.

If it turns out that it doesn't, then we can talk about using an <exclude> in a cc_config.xml.

This app_config is set up for cuda42. Does your GPU run cuda32? If so, we can easily modify the above for you.

Zalster

Edit..

I changed it for cuda 42 only
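
For reference, the cc_config.xml fallback mentioned above would presumably use BOINC's <exclude_gpu> option. A minimal sketch, in which PROJECT_URL is a placeholder for the project's URL exactly as BOINC lists it, and the app name is an assumption taken from the configs in this thread:

```xml
<cc_config>
  <options>
    <exclude_gpu>
      <!-- Placeholder: replace with the project's URL as shown in BOINC -->
      <url>PROJECT_URL</url>
      <!-- Optional: restrict the exclusion to one GPU type and one app -->
      <type>NVIDIA</type>
      <app>astropulse_v7</app>
    </exclude_gpu>
  </options>
</cc_config>
```

The exclusion is made per application rather than per plan class, so it would keep every GPU version of the named app off the card, not only the OpenCL ones. The file goes in the BOINC data directory, and a client restart may be needed before the exclusion takes effect.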
ID: 1745989

Zalster
Volunteer tester
Joined: 27 May 99
Posts: 5445
Credit: 528,817,460
RAC: 242
United States
Message 1745981 - Posted: 1 Dec 2015, 3:34:02 UTC - in response to Message 1745977.
Last modified: 1 Dec 2015, 3:35:13 UTC

Edit..

True TL,

You could just do an app_config.xml.
ID: 1745981