How can I squeeze as much out of my 3 GTX 1080s as possible?

Anthony Chapman
Volunteer tester
Joined: 16 Apr 08
Posts: 34
Credit: 10,262,744
RAC: 0
United States
Message 1835129 - Posted: 10 Dec 2016, 4:30:55 UTC

I am currently running the stock app. I tried running Lunatics v0.45 - Beta6, but it only gave about a 10-second improvement per work unit. I am running 1 task per GPU and have 5 of 12 threads free.


12/9/2016 9:24:31 PM | | cc_config.xml not found - using defaults
12/9/2016 9:24:31 PM | | Starting BOINC client version 7.6.22 for windows_x86_64
12/9/2016 9:24:31 PM | | log flags: file_xfer, sched_ops, task
12/9/2016 9:24:31 PM | | Libraries: libcurl/7.45.0 OpenSSL/1.0.2d zlib/1.2.8
12/9/2016 9:24:31 PM | | Data directory: C:\ProgramData\BOINC
12/9/2016 9:24:31 PM | | Running under account tony
12/9/2016 9:24:32 PM | | CUDA: NVIDIA GPU 0: GeForce GTX 1080 (driver version 376.19, CUDA version 8.0, compute capability 6.1, 4096MB, 3557MB available, 8876 GFLOPS peak)
12/9/2016 9:24:32 PM | | CUDA: NVIDIA GPU 1: GeForce GTX 1080 (driver version 376.19, CUDA version 8.0, compute capability 6.1, 4096MB, 3557MB available, 8876 GFLOPS peak)
12/9/2016 9:24:32 PM | | CUDA: NVIDIA GPU 2: GeForce GTX 1080 (driver version 376.19, CUDA version 8.0, compute capability 6.1, 4096MB, 3557MB available, 8876 GFLOPS peak)
12/9/2016 9:24:32 PM | | OpenCL: NVIDIA GPU 0: GeForce GTX 1080 (driver version 376.19, device version OpenCL 1.2 CUDA, 8192MB, 3557MB available, 8876 GFLOPS peak)
12/9/2016 9:24:32 PM | | OpenCL: NVIDIA GPU 1: GeForce GTX 1080 (driver version 376.19, device version OpenCL 1.2 CUDA, 8192MB, 3557MB available, 8876 GFLOPS peak)
12/9/2016 9:24:32 PM | | OpenCL: NVIDIA GPU 2: GeForce GTX 1080 (driver version 376.19, device version OpenCL 1.2 CUDA, 8192MB, 3557MB available, 8876 GFLOPS peak)
12/9/2016 9:24:32 PM | | Host name: Prometheus
12/9/2016 9:24:32 PM | | Processor: 12 GenuineIntel Intel(R) Core(TM) i7-5820K CPU @ 3.30GHz [Family 6 Model 63 Stepping 2]
12/9/2016 9:24:32 PM | | Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss htt tm pni ssse3 fma cx16 sse4_1 sse4_2 movebe popcnt aes f16c rdrandsyscall nx lm avx avx2 vmx tm2 dca pbe fsgsbase bmi1 smep bmi2
12/9/2016 9:24:32 PM | | OS: Microsoft Windows 10: Professional x64 Edition, (10.00.10586.00)
12/9/2016 9:24:32 PM | | Memory: 15.90 GB physical, 18.77 GB virtual
12/9/2016 9:24:32 PM | | Disk: 223.08 GB total, 192.23 GB free
Member of ATI GPU USERS
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13746
Credit: 208,696,464
RAC: 304
Australia
Message 1835130 - Posted: 10 Dec 2016, 4:44:57 UTC - in response to Message 1835129.  

In the project folder you'll find a configuration file.
mb_cmdline_win_x86_SSE3_OpenCL_NV_SoG.txt

If you try
-tt 1500 -hp -period_iterations_num 3 -high_perf -high_prec_timer -sbs 1024 -spike_fft_thresh 4096 -tune 1 64 1 4 -oclfft_tune_gr 256 -oclfft_tune_lr 16 -oclfft_tune_wg 256 -oclfft_tune_ls 512 -oclfft_tune_bn 64 -oclfft_tune_cw 64
you should get a boost in performance.

Raising -period_iterations_num from 3 to 10 or higher will reduce screen, keyboard and mouse lag. For a dedicated cruncher, 1 is generally the best option.
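
For a dedicated cruncher, the edited mb_cmdline_win_x86_SSE3_OpenCL_NV_SoG.txt would then be just that single line of switches, something like this (a sketch of the line above with -period_iterations_num dropped to 1):

-tt 1500 -hp -period_iterations_num 1 -high_perf -high_prec_timer -sbs 1024 -spike_fft_thresh 4096 -tune 1 64 1 4 -oclfft_tune_gr 256 -oclfft_tune_lr 16 -oclfft_tune_wg 256 -oclfft_tune_ls 512 -oclfft_tune_bn 64 -oclfft_tune_cw 64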

Also in the project folder I run an app_config.xml file

<app_config>
    <app>
        <name>setiathome_v8</name>
        <gpu_versions>
            <gpu_usage>1.00</gpu_usage>
            <cpu_usage>1.00</cpu_usage>
        </gpu_versions>
    </app>
</app_config>


Just to reserve a CPU core per GPU WU being crunched.
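To apply it, app_config.xml goes in the same SETI@home project folder as the mb_cmdline file, and BOINC picks up changes via Options > Read config files in BOINC Manager (Advanced view), or after a client restart.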
Grant
Darwin NT
Cruncher-American

Joined: 25 Mar 02
Posts: 1513
Credit: 370,893,186
RAC: 340
United States
Message 1835144 - Posted: 10 Dec 2016, 7:56:40 UTC - in response to Message 1835130.  

Grant: Questions. 1) does this exact command line apply to GTX 980s, too? 2) What if I am running 2 or 3 WUs/GPU? 3) what about using -use_sleep to free up some CPU for more threads for CPU WUs?
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13746
Credit: 208,696,464
RAC: 304
Australia
Message 1835146 - Posted: 10 Dec 2016, 8:17:37 UTC - in response to Message 1835144.  
Last modified: 10 Dec 2016, 8:20:03 UTC

Grant: Questions. 1) does this exact command line apply to GTX 980s, too?

I'm running it on my GTX 750Ti & GTX 1070, so it'd be suitable for the GTX 980.
Others should be able to help out with some settings that can take greater advantage of your hardware.

2) What if I am running 2 or 3 WUs/GPU?

If you've got the CPU cores to spare, no problem; otherwise, don't reserve any CPU cores for the GPU WUs. The current application may no longer require a reserved core, but when I was doing most of my fiddling it gave a significant boost in throughput.
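
As an illustration (not a file quoted from this thread), running 2 WUs per GPU while still reserving a core for each only needs the gpu_usage value changed; 0.50 tells BOINC to run two tasks per card:

<app_config>
    <app>
        <name>setiathome_v8</name>
        <gpu_versions>
            <gpu_usage>0.50</gpu_usage>
            <cpu_usage>1.00</cpu_usage>
        </gpu_versions>
    </app>
</app_config>

With 3 cards that means 6 GPU WUs running and 6 CPU threads reserved to feed them.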

3) what about using -use_sleep to free up some CPU for more threads for CPU WUs?

I haven't tried the -use_sleep function. I'd rather have the performance than give the system a short rest.
I get way more work from the GPU than I do from several of my CPU cores, so I'm happy to sacrifice a core per GPU WU, running one at a time. The current SoG application may perform perfectly well running multiple WUs and making use of the -use_sleep function, and running more than 1 WU at a time may give more work per hour with your hardware. With mine, though, running more than 1 WU only gave a very slight boost (and when you end up with 1 Arecibo and 1 Guppi WU on the same GPU, the Arecibo WU will take forever to crunch), so I stuck with 1 WU at a time. Once I found something that was doing the job very nicely, I decided there was no need to fiddle any further.
Grant
Darwin NT
Brent Norman
Volunteer tester

Joined: 1 Dec 99
Posts: 2786
Credit: 685,657,289
RAC: 835
Canada
Message 1835178 - Posted: 10 Dec 2016, 13:41:07 UTC
Last modified: 10 Dec 2016, 13:45:12 UTC

To answer the thread title - FEED THEM LOTS !!!

I would rather use -hp than -use_sleep. It looks like that computer has 8 cores, so 2 tasks x 3 GPUs = 6 cores, or 5 if you consider that some tasks don't use much at all.

So what I would do is run 2x3 GPU tasks at high priority and 3 CPU tasks at low priority. That should keep your CPU at 95-100% while keeping the 3 monsters fed.

EDIT: You are better off starving your CPU work than the GPUs - one card will do MUCH more work than the entire CPU.
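
A rough worked example of that split, using the 12 threads the startup log reports (an illustration, not a setting quoted in this thread): 3 GPUs x 2 tasks = 6 GPU WUs, each reserving one thread via app_config, and the 'Use at most 75% of the CPUs' computing preference gives BOINC a budget of 12 x 0.75 = 9 threads, so 6 feed the GPUs and 3 are left for CPU tasks.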
Anthony Chapman
Volunteer tester
Joined: 16 Apr 08
Posts: 34
Credit: 10,262,744
RAC: 0
United States
Message 1835208 - Posted: 10 Dec 2016, 16:16:39 UTC

I have made the command-line changes suggested by Grant (SSSF) and given each card 2 tasks at a time, as suggested by Brent Norman. I will let you all know how it goes.

One more question: how important are PCI Express lanes for crunching? I know my CPU only has 28.
Member of ATI GPU USERS
Zalster
Volunteer tester
Joined: 27 May 99
Posts: 5517
Credit: 528,817,460
RAC: 242
United States
Message 1835226 - Posted: 10 Dec 2016, 17:19:54 UTC - in response to Message 1835208.  

Not that important here at SETI, as the apps are specialized. If this were Einstein or another project, then the PCIe lanes would become more of an issue.
