How can I squeeze as much out of my 3 GTX 1080s as possible?

Anthony Chapman
Volunteer tester
Joined: 16 Apr 08
Posts: 34
Credit: 10,262,744
RAC: 0
United States
Message 1835129 - Posted: 10 Dec 2016, 4:30:55 UTC

I am currently running the stock app. I tried running Lunatics v0.45 - Beta6, but it only gave about a 10-second improvement per work unit. I am running 1 task per GPU and have 5 of 12 threads free.


12/9/2016 9:24:31 PM | | cc_config.xml not found - using defaults
12/9/2016 9:24:31 PM | | Starting BOINC client version 7.6.22 for windows_x86_64
12/9/2016 9:24:31 PM | | log flags: file_xfer, sched_ops, task
12/9/2016 9:24:31 PM | | Libraries: libcurl/7.45.0 OpenSSL/1.0.2d zlib/1.2.8
12/9/2016 9:24:31 PM | | Data directory: C:\ProgramData\BOINC
12/9/2016 9:24:31 PM | | Running under account tony
12/9/2016 9:24:32 PM | | CUDA: NVIDIA GPU 0: GeForce GTX 1080 (driver version 376.19, CUDA version 8.0, compute capability 6.1, 4096MB, 3557MB available, 8876 GFLOPS peak)
12/9/2016 9:24:32 PM | | CUDA: NVIDIA GPU 1: GeForce GTX 1080 (driver version 376.19, CUDA version 8.0, compute capability 6.1, 4096MB, 3557MB available, 8876 GFLOPS peak)
12/9/2016 9:24:32 PM | | CUDA: NVIDIA GPU 2: GeForce GTX 1080 (driver version 376.19, CUDA version 8.0, compute capability 6.1, 4096MB, 3557MB available, 8876 GFLOPS peak)
12/9/2016 9:24:32 PM | | OpenCL: NVIDIA GPU 0: GeForce GTX 1080 (driver version 376.19, device version OpenCL 1.2 CUDA, 8192MB, 3557MB available, 8876 GFLOPS peak)
12/9/2016 9:24:32 PM | | OpenCL: NVIDIA GPU 1: GeForce GTX 1080 (driver version 376.19, device version OpenCL 1.2 CUDA, 8192MB, 3557MB available, 8876 GFLOPS peak)
12/9/2016 9:24:32 PM | | OpenCL: NVIDIA GPU 2: GeForce GTX 1080 (driver version 376.19, device version OpenCL 1.2 CUDA, 8192MB, 3557MB available, 8876 GFLOPS peak)
12/9/2016 9:24:32 PM | | Host name: Prometheus
12/9/2016 9:24:32 PM | | Processor: 12 GenuineIntel Intel(R) Core(TM) i7-5820K CPU @ 3.30GHz [Family 6 Model 63 Stepping 2]
12/9/2016 9:24:32 PM | | Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss htt tm pni ssse3 fma cx16 sse4_1 sse4_2 movebe popcnt aes f16c rdrandsyscall nx lm avx avx2 vmx tm2 dca pbe fsgsbase bmi1 smep bmi2
12/9/2016 9:24:32 PM | | OS: Microsoft Windows 10: Professional x64 Edition, (10.00.10586.00)
12/9/2016 9:24:32 PM | | Memory: 15.90 GB physical, 18.77 GB virtual
12/9/2016 9:24:32 PM | | Disk: 223.08 GB total, 192.23 GB free
Member of ATI GPU USERS
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13746
Credit: 208,696,464
RAC: 304
Australia
Message 1835130 - Posted: 10 Dec 2016, 4:44:57 UTC - in response to Message 1835129.  

In the project folder you'll find a configuration file.
mb_cmdline_win_x86_SSE3_OpenCL_NV_SoG.txt

If you try
-tt 1500 -hp -period_iterations_num 3 -high_perf -high_prec_timer -sbs 1024 -spike_fft_thresh 4096 -tune 1 64 1 4 -oclfft_tune_gr 256 -oclfft_tune_lr 16 -oclfft_tune_wg 256 -oclfft_tune_ls 512 -oclfft_tune_bn 64 -oclfft_tune_cw 64
you should get a boost in performance.

Raising -period_iterations_num from 3 to 10 or higher will reduce screen, keyboard and mouse lag. For a dedicated cruncher, 1 is generally the best option.
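
For a dedicated cruncher, the edited mb_cmdline_win_x86_SSE3_OpenCL_NV_SoG.txt would then be just that single line of switches, something like this (a sketch of the line above with -period_iterations_num dropped to 1):

-tt 1500 -hp -period_iterations_num 1 -high_perf -high_prec_timer -sbs 1024 -spike_fft_thresh 4096 -tune 1 64 1 4 -oclfft_tune_gr 256 -oclfft_tune_lr 16 -oclfft_tune_wg 256 -oclfft_tune_ls 512 -oclfft_tune_bn 64 -oclfft_tune_cw 64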

Also in the project folder I run an app_config.xml file

<app_config>
    <app>
        <name>setiathome_v8</name>
        <gpu_versions>
            <gpu_usage>1.00</gpu_usage>
            <cpu_usage>1.00</cpu_usage>
        </gpu_versions>
    </app>
</app_config>


Just to reserve a CPU core per GPU WU being crunched.
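To apply it, app_config.xml goes in the same SETI@home project folder as the mb_cmdline file, and BOINC picks up changes via Options > Read config files in BOINC Manager (Advanced view), or after a client restart.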
Grant
Darwin NT
Cruncher-American

Joined: 25 Mar 02
Posts: 1513
Credit: 370,893,186
RAC: 340
United States
Message 1835144 - Posted: 10 Dec 2016, 7:56:40 UTC - in response to Message 1835130.  

Grant: Questions. 1) does this exact command line apply to GTX 980s, too? 2) What if I am running 2 or 3 WUs/GPU? 3) what about using -use_sleep to free up some CPU for more threads for CPU WUs?
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13746
Credit: 208,696,464
RAC: 304
Australia
Message 1835146 - Posted: 10 Dec 2016, 8:17:37 UTC - in response to Message 1835144.  
Last modified: 10 Dec 2016, 8:20:03 UTC

Grant: Questions. 1) does this exact command line apply to GTX 980s, too?

I'm running it on my GTX 750Ti & GTX 1070, so it'd be suitable for the GTX 980.
Others should be able to help out with some settings that can take greater advantage of your hardware.

2) What if I am running 2 or 3 WUs/GPU?

If you've got the CPU cores to spare, no problem; otherwise, don't reserve any CPU cores for the GPU WUs. The current application may no longer require a reserved core, but when I was doing most of my fiddling it gave a significant boost in throughput.
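
As an illustration (not a file quoted from this thread), running 2 WUs per GPU while still reserving a core for each only needs the gpu_usage value changed; 0.50 tells BOINC to run two tasks per card:

<app_config>
    <app>
        <name>setiathome_v8</name>
        <gpu_versions>
            <gpu_usage>0.50</gpu_usage>
            <cpu_usage>1.00</cpu_usage>
        </gpu_versions>
    </app>
</app_config>

With 3 cards that means 6 GPU WUs running and 6 CPU threads reserved to feed them.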

3) what about using -use_sleep to free up some CPU for more threads for CPU WUs?

I haven't tried the -use_sleep function. I'd rather have the performance than give the system a short rest.
I get way more work from the GPU than I do from several of my CPU cores, so I'm happy to sacrifice a core per GPU WU, running one at a time. The current SoG application may perform perfectly well running multiple WUs and making use of the -use_sleep function, and running more than 1 WU at a time may give more work per hour with your hardware. With mine, though, running more than 1 WU only gave a very slight boost (and when you end up with 1 Arecibo and 1 Guppi WU on the same GPU, the Arecibo WU will take forever to crunch), so I stuck with 1 WU at a time. Once I found something that was doing the job very nicely, I decided there was no need to fiddle any further.
Grant
Darwin NT
Brent Norman
Volunteer tester

Joined: 1 Dec 99
Posts: 2786
Credit: 685,657,289
RAC: 835
Canada
Message 1835178 - Posted: 10 Dec 2016, 13:41:07 UTC
Last modified: 10 Dec 2016, 13:45:12 UTC

To answer the thread title - FEED THEM LOTS !!!

I would rather use -hp than -use_sleep. It looks like that computer has 8 cores, so 2 tasks x 3 GPUs = 6 cores, or 5 if you consider that some tasks don't use much at all.

So what I would do is run 2x3 GPU tasks at high priority and 3 CPU tasks at low priority. That should keep your CPU at 95-100% while keeping the 3 monsters fed.

EDIT: You are better off starving your CPU work than the GPUs - one card will do MUCH more work than the entire CPU.
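
A rough worked example of that split, using the 12 threads the startup log reports (an illustration, not a setting quoted in this thread): 3 GPUs x 2 tasks = 6 GPU WUs, each reserving one thread via app_config, and the 'Use at most 75% of the CPUs' computing preference gives BOINC a budget of 12 x 0.75 = 9 threads, so 6 feed the GPUs and 3 are left for CPU tasks.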
Anthony Chapman
Volunteer tester
Joined: 16 Apr 08
Posts: 34
Credit: 10,262,744
RAC: 0
United States
Message 1835208 - Posted: 10 Dec 2016, 16:16:39 UTC

I have made the command-line changes suggested by Grant (SSSF) and given each card 2 tasks at a time, as suggested by Brent Norman. I will let you all know how it goes.

One more question: how important are PCI Express lanes for crunching? I know my CPU only has 28.
Member of ATI GPU USERS
Zalster
Volunteer tester
Joined: 27 May 99
Posts: 5517
Credit: 528,817,460
RAC: 242
United States
Message 1835226 - Posted: 10 Dec 2016, 17:19:54 UTC - in response to Message 1835208.  

Not that important here at SETI, as the apps are specialized. If this were Einstein or another project, then the PCIe lanes would become more of an issue.
