Advice on system optimization needed.

Message boards : Number crunching : Advice on system optimization needed.

juan BFP Crowdfunding Project Donor* Special Project $75 donor Special Project $250 donor
Volunteer tester
Joined: 16 Mar 07
Posts: 8135
Credit: 497,896,153
RAC: 385,762
Panama
Message 2007041 - Posted: 12 Aug 2019, 21:55:58 UTC - in response to Message 2007039.  
Last modified: 12 Aug 2019, 21:56:17 UTC


Still too long; on my 2070 a WU is crunched in about 1 minute.

Lower the CPU usage to 80% just to see if it changes.

And can you post your app_config.html file?



Where is the app_config.html file? I couldn't find it. I do have an app_config.h and app_config.cpp. Neither one looks like it has the info we might be looking for.


Sorry, my mistake: it's app_config.xml, and it goes in the same directory as app_info.xml. This is mine:

<app_config>
 <app_version>
    <app_name>setiathome_v8</app_name>
    <plan_class>cuda90</plan_class>
    <avg_ncpus>1.0</avg_ncpus>
    <ngpus>1.0</ngpus>
    <cmdline>-pfb 32</cmdline>
 </app_version>
 <app_version>
    <app_name>astropulse_v7</app_name>
    <plan_class>opencl_nvidia_100</plan_class>
    <avg_ncpus>1.0</avg_ncpus>
    <ngpus>1.0</ngpus>
    <cmdline>-use_sleep -unroll 15 -sbs 256 -ffa_block 12288 -ffa_block_fetch 6144</cmdline>
 </app_version>
</app_config>


I agree with Keith: your card is too hot. Set the fans to 100% and redo the test.
ID: 2007041

Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 9893
Credit: 933,883,718
RAC: 1,491,389
United States
Message 2007042 - Posted: 12 Aug 2019, 21:56:40 UTC

Copy this script and run it in a terminal before you start BOINC. It turns your fans up to 100% and also recovers some of the memory clock that Nvidia's driver takes away when it detects a compute load.

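#!/bin/bash

# Enable persistence mode (needs root)
/usr/bin/nvidia-smi -pm 1

# Let non-root users change application clocks (needs root)
/usr/bin/nvidia-smi -acp UNRESTRICTED

# PowerMizer mode 1 = Prefer Maximum Performance
/usr/bin/nvidia-settings -a "[gpu:0]/GPUPowerMizerMode=1"

# Take manual control of the fans and run both fan controllers at 100%
/usr/bin/nvidia-settings -a "[gpu:0]/GPUFanControlState=1"
/usr/bin/nvidia-settings -a "[fan:0]/GPUTargetFanSpeed=100"
/usr/bin/nvidia-settings -a "[fan:1]/GPUTargetFanSpeed=100"

# Memory transfer rate +800 and graphics clock +60 offsets on performance level [4]
/usr/bin/nvidia-settings -a "[gpu:0]/GPUMemoryTransferRateOffset[4]=800" -a "[gpu:0]/GPUGraphicsClockOffset[4]=60"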


I am assuming you have already enabled Coolbits on the system, which unlocks clock and fan control.

If not, you need to apply the Coolbits tweak below and then reboot.

sudo nvidia-xconfig --thermal-configuration-check --cool-bits=28 --enable-all-gpus
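
To check whether Coolbits is already set before running the fan script, you can read the value back out of the X configuration. A minimal sketch, assuming nvidia-xconfig wrote its usual `Option "Coolbits" "28"` line into /etc/X11/xorg.conf; the function names are made up for illustration. The bit meanings used (4 = manual fan control, 8 = clock offsets, 16 = overvoltage) follow NVIDIA's Linux driver README, which is why 28 (4 + 8 + 16) unlocks both the fan and clock settings the script uses.

```shell
#!/bin/bash
# Sketch: confirm Coolbits is set before running the fan/clock script.
# Assumes an nvidia-xconfig style line in the X config, e.g.:
#   Option         "Coolbits" "28"

# Print the Coolbits value from the given file (default /etc/X11/xorg.conf);
# prints nothing if no Coolbits option is found.
coolbits_value() {
    local conf=${1:-/etc/X11/xorg.conf}
    grep -i -m1 '"coolbits"' "$conf" 2>/dev/null | grep -o '[0-9]\+' | tail -n1
}

# Decode the bits this thread relies on (per NVIDIA's driver README):
# 4 = manual fan control, 8 = clock offsets, 16 = overvoltage.
coolbits_report() {
    local value=$1
    if [[ -z $value ]]; then
        echo "Coolbits not set"
        return 1
    fi
    if (( value & 4 ));  then echo "fan control: enabled";   else echo "fan control: disabled"; fi
    if (( value & 8 ));  then echo "clock offsets: enabled"; else echo "clock offsets: disabled"; fi
    if (( value & 16 )); then echo "overvoltage: enabled";   else echo "overvoltage: disabled"; fi
}
```

Run it as `coolbits_report "$(coolbits_value)"`. If fan control or clock offsets show as disabled, the matching nvidia-settings lines in the script will most likely fail.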

Seti@Home classic workunits:20,676 CPU time:74,226 hours
ID: 2007042

Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 9893
Credit: 933,883,718
RAC: 1,491,389
United States
Message 2007043 - Posted: 12 Aug 2019, 21:59:05 UTC - in response to Message 2007041.  
Last modified: 12 Aug 2019, 22:00:12 UTC


If he adds -nobs to the app_info, he really doesn't need to write an app_config. First, get the temps under control. Then try the -nobs command-line parameter. Finally, get the memory clocks back to normal.
ID: 2007043

Zalster Special Project $250 donor
Volunteer tester
Joined: 27 May 99
Posts: 5200
Credit: 433,121,197
RAC: 140,948
United States
Message 2007044 - Posted: 12 Aug 2019, 21:59:09 UTC - in response to Message 2007039.  

You can use Juan's app_config, or you can modify your app_info as I did for yours below.

Where is the app_config.html file? I couldn't find it. I do have an app_config.h and app_config.cpp. Neither one looks like it has the info we might be looking for.

I have app_info.xml. Looks more useful. Here it is.

<app_info>
  <app>
    <name>setiathome_v8</name>
  </app>
  <file_info>
    <name>setiathome_x41p_V0.98b1_x86_64-pc-linux-gnu_cuda90</name>
    <executable/>
  </file_info>
  <app_version>
    <app_name>setiathome_v8</app_name>
    <platform>x86_64-pc-linux-gnu</platform>
    <version_num>801</version_num>
    <plan_class>cuda90</plan_class>
    <cmdline>-nobs</cmdline>
    <coproc>
      <type>NVIDIA</type>
      <count>1</count>
    </coproc>
    <avg_ncpus>1</avg_ncpus>
    <max_ncpus>1</max_ncpus>
    <file_ref>
      <file_name>setiathome_x41p_V0.98b1_x86_64-pc-linux-gnu_cuda90</file_name>
      <main_program/>
    </file_ref>
  </app_version>
  <app>
    <name>astropulse_v7</name>
  </app>
  <file_info>
    <name>astropulse_7.08_x86_64-pc-linux-gnu__opencl_nvidia_100</name>
    <executable/>
  </file_info>
  <file_info>
    <name>AstroPulse_Kernels_r2751.cl</name>
  </file_info>
  <file_info>
    <name>ap_cmdline_7.08_x86_64-pc-linux-gnu__opencl_nvidia_100.txt</name>
  </file_info>
  <app_version>
    <app_name>astropulse_v7</app_name>
    <platform>x86_64-pc-linux-gnu</platform>
    <version_num>708</version_num>
    <plan_class>opencl_nvidia_100</plan_class>
    <coproc>
      <type>NVIDIA</type>
      <count>1</count>
    </coproc>
    <avg_ncpus>1</avg_ncpus>
    <max_ncpus>1</max_ncpus>
    <file_ref>
      <file_name>astropulse_7.08_x86_64-pc-linux-gnu__opencl_nvidia_100</file_name>
      <main_program/>
    </file_ref>
    <file_ref>
      <file_name>AstroPulse_Kernels_r2751.cl</file_name>
    </file_ref>
    <file_ref>
      <file_name>ap_cmdline_7.08_x86_64-pc-linux-gnu__opencl_nvidia_100.txt</file_name>
      <open_name>ap_cmdline.txt</open_name>
    </file_ref>
  </app_version>
  <app>
    <name>setiathome_v8</name>
  </app>
  <file_info>
    <name>MBv8_8.22r3711_sse41_intel_x86_64-pc-linux-gnu</name>
    <executable/>
  </file_info>
  <app_version>
    <app_name>setiathome_v8</app_name>
    <platform>x86_64-pc-linux-gnu</platform>
    <version_num>800</version_num>
    <file_ref>
      <file_name>MBv8_8.22r3711_sse41_intel_x86_64-pc-linux-gnu</file_name>
      <main_program/>
    </file_ref>
  </app_version>
  <app>
    <name>astropulse_v7</name>
  </app>
  <file_info>
    <name>ap_7.05r2728_sse3_linux64</name>
    <executable/>
  </file_info>
  <app_version>
    <app_name>astropulse_v7</app_name>
    <version_num>704</version_num>
    <platform>x86_64-pc-linux-gnu</platform>
    <plan_class></plan_class>
    <file_ref>
      <file_name>ap_7.05r2728_sse3_linux64</file_name>
      <main_program/>
    </file_ref>
  </app_version>
</app_info>


Eric

ID: 2007044

Eric Claussen
Joined: 31 Jan 00
Posts: 22
Credit: 2,318,690
RAC: 1,126
United States
Message 2007050 - Posted: 12 Aug 2019, 22:58:54 UTC

I'll work on getting the temps down. Got the fans running at 100%, and that knocked them down some. The computer is a Dell Precision T7610. It has lots of fans and ducting; when it gets really warm out, it sounds like a jet engine. Everything is clean. When it cools off this evening I'll know for sure.

Thanks a ton for everyone's help. It's already MUCH faster than it was.

Eric
ID: 2007050

Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 9893
Credit: 933,883,718
RAC: 1,491,389
United States
Message 2007054 - Posted: 12 Aug 2019, 23:07:37 UTC - in response to Message 2007050.  

Your more recent results are MUCH better. There's still a way to go, though, before you reach what a 2080 can do. Try the CUDA101 app and the -nobs parameter. The fans certainly helped.
ID: 2007054

Eric Claussen
Joined: 31 Jan 00
Posts: 22
Credit: 2,318,690
RAC: 1,126
United States
Message 2007061 - Posted: 12 Aug 2019, 23:24:42 UTC - in response to Message 2007050.  

One quick question. After changing the config file, do I have to restart BOINC, or can I just select "Read config files" from the menu?

Eric
ID: 2007061

Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 9893
Credit: 933,883,718
RAC: 1,491,389
United States
Message 2007062 - Posted: 12 Aug 2019, 23:29:00 UTC - in response to Message 2007061.  

If you are referring to the standard cc_config.xml or the app_config.xml file, then yes, do a re-read of config files in the Manager. The only file that needs a shutdown and restart to read is the app_info.xml file.
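
The rule above can be captured in a small lookup, handy if you script your edits. This is a sketch: `reload_action` is a hypothetical helper, and the comments mention `boinccmd --read_cc_config`, the command-line equivalent of the Manager's Read config files (recent clients should re-read app_config.xml files along with cc_config.xml).

```shell
#!/bin/bash
# Sketch: how BOINC picks up an edited config file (per the reply above).
# Prints "reread", "restart" or "unknown" for a given file path.
reload_action() {
    case "${1##*/}" in
        cc_config.xml|app_config.xml)
            # Read while BOINC is running: Manager > Options > Read config files,
            # or from a terminal: boinccmd --read_cc_config
            echo "reread" ;;
        app_info.xml)
            # Only read at start-up: shut BOINC down completely, then start it again.
            echo "restart" ;;
        *)
            echo "unknown" ;;
    esac
}
```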
ID: 2007062

Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 9893
Credit: 933,883,718
RAC: 1,491,389
United States
Message 2007073 - Posted: 13 Aug 2019, 0:36:06 UTC - in response to Message 2007061.  

I see 200 abandoned error tasks now. A goof or typo while editing app_info.xml?
ID: 2007073

Eric Claussen
Joined: 31 Jan 00
Posts: 22
Credit: 2,318,690
RAC: 1,126
United States
Message 2007076 - Posted: 13 Aug 2019, 1:08:21 UTC - in response to Message 2007073.  

I see 200 abandoned error task now. A goof or typo while editing app_info.xml?


Haha, how did you guess? I was trying to switch to CUDA101 and missed something. Even after going back to the old config file, it wouldn't use the GPU.

Eric
ID: 2007076

Eric Claussen
Joined: 31 Jan 00
Posts: 22
Credit: 2,318,690
RAC: 1,126
United States
Message 2007077 - Posted: 13 Aug 2019, 1:11:47 UTC

Pretty consistently 51 seconds as long as I don't use the computer. Temps are much better. I moved the PC out into the open. It's set up in my garage, so there's not much to be done about ambient temps.

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 430.40       Driver Version: 430.40       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce RTX 2080    Off  | 00000000:03:00.0  On |                  N/A |
|100%   73C    P2   215W / 225W |   2364MiB /  7981MiB |     96%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1554      G   /usr/lib/xorg/Xorg                           396MiB |
|    0      2105      G   cinnamon                                      92MiB |
|    0      2739      G   ...uest-channel-token=15644256298604367101   519MiB |
|    0     13260      C   ...x41p_V0.98b1_x86_64-pc-linux-gnu_cuda90  1343MiB |
+-----------------------------------------------------------------------------+
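
Instead of pasting the full nvidia-smi table, the same readings can be pulled in a compact form with nvidia-smi's query mode and checked against a temperature limit. A sketch: the `--query-gpu` fields and `--format=csv,noheader,nounits` are standard nvidia-smi options, while the `check_gpu_temps` function and its 80°C default limit are assumptions picked for illustration, not a recommendation from the thread.

```shell
#!/bin/bash
# Sketch: warn when a GPU runs hotter than a chosen limit.
# Reads lines like "0, 73, 100, 215.33" (index, temp in C, fan %, power in W),
# which is what this nvidia-smi query prints:
#   nvidia-smi --query-gpu=index,temperature.gpu,fan.speed,power.draw \
#              --format=csv,noheader,nounits | check_gpu_temps 80
# Returns 1 if any GPU is over the limit.
check_gpu_temps() {
    local limit=${1:-80}   # assumed default limit, in degrees C
    local status=0 idx temp fan power
    # IFS=", " splits on the comma and swallows the space after it.
    while IFS=', ' read -r idx temp fan power; do
        [[ -z $idx ]] && continue
        if (( temp > limit )); then
            echo "GPU $idx: ${temp}C exceeds ${limit}C (fan ${fan}%, ${power} W)"
            status=1
        else
            echo "GPU $idx: ${temp}C ok"
        fi
    done
    return "$status"
}
```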
ID: 2007077

Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 9893
Credit: 933,883,718
RAC: 1,491,389
United States
Message 2007081 - Posted: 13 Aug 2019, 1:36:41 UTC - in response to Message 2007076.  

I find the easiest and least error-prone way of editing app_info is to use the Find and Replace function of the Text Editor. As long as you copy only the text you are trying to replace and don't grab any white space before or after it, it works every time. Another good "sanity check" is to open any XML file in a browser: it will flag syntax errors caused by a missing, dropped, or extra tag delimiter.
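
The same sanity check can be done from a terminal before restarting BOINC. A sketch, assuming bash: if `xmllint` (part of libxml2) is installed, the hypothetical `check_xml` function uses `xmllint --noout`, which reports any well-formedness error; otherwise it falls back to a crude count of `<` and `>` characters. The fallback is not a real XML parser, but it does catch a dropped delimiter such as `<app_config` missing its closing `>`.

```shell
#!/bin/bash
# Sketch: sanity-check an edited BOINC XML file before restarting the client.
check_xml() {
    local file=$1
    if command -v xmllint >/dev/null 2>&1; then
        # Full well-formedness check; xmllint prints the offending line on error.
        xmllint --noout "$file"
        return
    fi
    # Crude fallback (not a real parser): every tag needs one '<' and one '>'.
    local lt gt
    lt=$(tr -cd '<' < "$file" | wc -c)
    gt=$(tr -cd '>' < "$file" | wc -c)
    if (( lt != gt )); then
        echo "$file: $lt '<' vs $gt '>' (missing or extra tag delimiter?)" >&2
        return 1
    fi
    return 0
}
```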
ID: 2007081

Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 9893
Credit: 933,883,718
RAC: 1,491,389
United States
Message 2007087 - Posted: 13 Aug 2019, 1:45:05 UTC
Last modified: 13 Aug 2019, 1:45:25 UTC

Well, 73°C is ten degrees better than 83°C. And you could still get more out of the card if you added the -nobs parameter to the command line statement in either an app_config.xml or the app_info.xml.
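
For the app_config.xml route, here is a sketch that writes a minimal file passing -nobs to the cuda90 SETI app. Assumptions: you pass in the path of the setiathome project directory yourself (the one holding app_info.xml; its exact location depends on how BOINC was installed); the app name and cuda90 plan class are taken from the app_info.xml earlier in the thread, and avg_ncpus/ngpus of 1.0 follow juan's file. `write_app_config` is a made-up helper; because it overwrites app_config.xml, it keeps a .bak copy of any existing one.

```shell
#!/bin/bash
# Sketch: write an app_config.xml that passes -nobs to the cuda90 SETI app.
# Usage (path is a placeholder): write_app_config /path/to/projects/setiathome.berkeley.edu
write_app_config() {
    local project_dir=$1
    local target="$project_dir/app_config.xml"
    [[ -d $project_dir ]] || { echo "no such directory: $project_dir" >&2; return 1; }
    # Keep a copy of any existing file before overwriting it.
    [[ -f $target ]] && cp -- "$target" "$target.bak"
    cat > "$target" <<'EOF'
<app_config>
  <app_version>
    <app_name>setiathome_v8</app_name>
    <plan_class>cuda90</plan_class>
    <avg_ncpus>1.0</avg_ncpus>
    <ngpus>1.0</ngpus>
    <cmdline>-nobs</cmdline>
  </app_version>
</app_config>
EOF
}
```

Afterwards, use Read config files in the Manager; app_config.xml doesn't need a restart.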

Also, the 2080 would benefit from the CUDA101 application instead of the CUDA90 application. Open the project directory, open app_info in the Text Editor, and bring up the Find and Replace function from the menu. Right-click the CUDA90 application and pull up its Properties. The Basic tab will come up with the name of the app already highlighted; right-click the name, copy it, and paste it into the Find field. Then go back to the project folder and find the CUDA101 application. Same thing: Properties, copy the name, and paste it into the Replace field in the Editor. Replace all. Save. Done deal and no errors.
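
The Find and Replace step can also be scripted. A sketch, assuming bash 4.3 or later: the hypothetical `swap_app_name` function replaces one exact file name with another everywhere in app_info.xml, keeps a .bak copy, and refuses to run if the old name isn't found. The CUDA101 executable's name isn't given in the thread, so NEW_NAME in the example is a placeholder to fill in from the file's Properties as described above. Because the whole file name is matched literally, the `<plan_class>cuda90</plan_class>` line is left alone.

```shell
#!/bin/bash
# Sketch: replace one exact executable name with another in app_info.xml,
# the scripted equivalent of Find and Replace > Replace all.
swap_app_name() {
    local old=$1 new=$2 file=$3
    local content
    if ! grep -qF -- "$old" "$file"; then
        echo "'$old' not found in $file" >&2
        return 1
    fi
    cp -- "$file" "$file.bak"   # backup before editing
    content=$(<"$file")
    # Quoted pattern and replacement are used literally (no regex or glob matching).
    printf '%s\n' "${content//"$old"/"$new"}" > "$file"
}

# Example (NEW_NAME is a placeholder for the real CUDA101 executable name):
# swap_app_name setiathome_x41p_V0.98b1_x86_64-pc-linux-gnu_cuda90 NEW_NAME app_info.xml
```

Shut BOINC down before running it and start it again afterwards, since app_info.xml is only read at start-up.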
ID: 2007087

Eric Claussen
Joined: 31 Jan 00
Posts: 22
Credit: 2,318,690
RAC: 1,126
United States
Message 2007098 - Posted: 13 Aug 2019, 2:31:02 UTC - in response to Message 2007087.  
Last modified: 13 Aug 2019, 2:36:49 UTC

I turned on -nobs. There is a difference for sure. It knocks 10 seconds off the run time and increases the average power draw of the GPU.

I'm going to give CUDA101 a shot again. I have replaced the instances using the tool. Do I also need to change the <plan_class>cuda90</plan_class> line to cuda101?

Also, I am still using 0.1 CPU per GPU. Is this adequate?

Eric
ID: 2007098

Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 9893
Credit: 933,883,718
RAC: 1,491,389
United States
Message 2007119 - Posted: 13 Aug 2019, 3:42:32 UTC - in response to Message 2007098.  

Do I also need to change the <plan_class>cuda90</plan_class> line to cuda101?

Absolutely NOT

That would dump all your work. Just leave the <plan_class>cuda90</plan_class> alone.

Also, I am still using 0.1 CPU per GPU. Is this adequate?

No, not in my opinion. I would set CPU usage to 1.0.
ID: 2007119

Eric Claussen
Joined: 31 Jan 00
Posts: 22
Credit: 2,318,690
RAC: 1,126
United States
Message 2007127 - Posted: 13 Aug 2019, 4:00:27 UTC - in response to Message 2007119.  

That is what tripped me up the first time: I changed the plan class as well. OK, I'm going to save this file and see how it does.

Thanks again for all your help.

Eric

ID: 2007127

Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 9893
Credit: 933,883,718
RAC: 1,491,389
United States
Message 2007135 - Posted: 13 Aug 2019, 5:33:58 UTC - in response to Message 2007127.  

Looking a lot better, and more in line with what to expect from a 2080. I think you have it figured out now and just need to let it run for the RAC to stabilize.
ID: 2007135

Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 11659
Credit: 174,313,579
RAC: 118,320
Australia
Message 2007138 - Posted: 13 Aug 2019, 7:44:57 UTC - in response to Message 2007033.  
Last modified: 13 Aug 2019, 7:47:01 UTC

Already sorted.
Grant
Darwin NT
ID: 2007138

Eric Claussen
Joined: 31 Jan 00
Posts: 22
Credit: 2,318,690
RAC: 1,126
United States
Message 2007174 - Posted: 13 Aug 2019, 19:08:24 UTC

Thanks again for all the help.

Eric
ID: 2007174

TBar
Volunteer tester
Joined: 22 May 99
Posts: 4868
Credit: 593,109,518
RAC: 1,380,936
United States
Message 2007175 - Posted: 13 Aug 2019, 19:11:55 UTC
Last modified: 13 Aug 2019, 19:14:07 UTC

For anyone else out there...
There is a README in the All-In-One that covers most of what is in this thread. It Helps if you Read the Manual.
As for assigning One CPU per GPU, that Will Not Work on some machines; the Default setting Will Work.
All you are doing by assigning a full CPU to a GPU is telling BOINC to Not Start a task unless there is One full CPU available. That BOINC setting has absolutely No Control over how much CPU the App uses. The App's -nobs setting controls how much CPU the App uses, and -nobs doesn't require a Full CPU to start a task.
ID: 2007175


 
©2019 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.