Getting the most bang for your buck from a GTX 1060

Jim1348

Message 1871250 - Posted: 5 Jun 2017, 14:18:41 UTC - in response to Message 1871247.  
Last modified: 5 Jun 2017, 14:19:26 UTC

I tried -sbs 1024 -hp -period_iterations_num 1 -tt 1500 -high_perf -high_prec_timer on my GTX 1060 (6 GB), and got the same performance as using Lunatics 0.45_beta6 (Windows 7 64-bit).

But it was a short test. I was running 8.20 setiathome_v8 (opencl_nvidia_SoG) work units. Should I see an improvement?


If you used the same command line, no.

Let me be a little more specific. I had Lunatics already installed, and was getting a certain average time for those work units (though they vary, as you know). Then I added the -sbs 1024 -hp -period_iterations_num 1 -tt 1500 -high_perf -high_prec_timer command line, while leaving Lunatics installed, and I didn't see any obvious change. So my impression is that Lunatics by itself gets the job done, though I may be missing something.
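For context, under an anonymous-platform (Lunatics) install these switches normally live in the <cmdline> element of app_info.xml in the project directory (the Lunatics installer also writes them to an mb_cmdline*.txt file). A sketch of the relevant fragment; the surrounding elements are abbreviated and exact names in a real install may differ:

```xml
<app_version>
    <app_name>setiathome_v8</app_name>
    <plan_class>opencl_nvidia_SoG</plan_class>
    <!-- the tuning switches discussed in this thread go here -->
    <cmdline>-sbs 1024 -hp -period_iterations_num 1 -tt 1500 -high_perf -high_prec_timer</cmdline>
</app_version>
```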
Tom M
Volunteer tester
Message 1871256 - Posted: 5 Jun 2017, 15:05:50 UTC - in response to Message 1871250.  

If you used the same command line, no.

Let me be a little more specific. I had Lunatics already installed, and was getting a certain average time for those work units (though they vary, as you know). Then I added the -sbs 1024 -hp -period_iterations_num 1 -tt 1500 -high_perf -high_prec_timer command line, while leaving Lunatics installed, and I didn't see any obvious change. So my impression is that Lunatics by itself gets the job done, though I may be missing something.


I am not sure, but your GFLOPS numbers are lower than what I am getting (yours 292.36 GFLOPS vs. my 313.47 GFLOPS, or Giggo's), so you might keep your "default" settings until you have a couple of weeks in and have some kind of stable RAC (not going up anymore), and then try the ones we have been talking about.

All too often we make generalizations/statements that hold true for most CPU/GPU combinations but may have exceptions.

Tom
A proud member of the OFA (Old Farts Association).
Brent Norman
Volunteer tester
Message 1871257 - Posted: 5 Jun 2017, 15:09:00 UTC - in response to Message 1871250.  

From NO command line to using one, you should see a difference.
Did the options used appear in the stderr output of the new tasks?
Jim1348

Message 1871265 - Posted: 5 Jun 2017, 15:25:41 UTC - in response to Message 1871257.  

From NO command line to using one, you should see a difference.
Did the options used appear in the stderr output of the new tasks?

I think so.

<core_client_version>7.6.33</core_client_version>
<![CDATA[
<stderr_txt>
Maximum single buffer size set to:1024MB
Number of period iterations for PulseFind set to:1
Target kernel sequence time set to 1500ms
High-performance path selected. If GUI lags occur consider to remove -high_perf option from tuning line
System timer will be set in high resolution mode
Priority of worker thread raised successfully
Priority of process adjusted successfully, high priority class used
OpenCL platform detected: Intel(R) Corporation
OpenCL platform detected: NVIDIA Corporation
BOINC assigns device 0
Info: BOINC provided OpenCL device ID used

Build features: SETI8 Non-graphics OpenCL USE_OPENCL_NV OCL_ZERO_COPY SIGNALS_ON_GPU OCL_CHIRP3 FFTW USE_SSE3 x86
CPUID: Intel(R) Core(TM) i7-4771 CPU @ 3.50GHz

Maybe I should just let it run for a couple of days each way to see if there is any difference.
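Since work units vary so much, a quick sanity check on "a couple of days each way" is to compare the average run time of a batch of tasks under each setting against the task-to-task scatter. A minimal sketch in Python (standard library only; the run times below are invented for illustration — substitute real task times from the website's task pages):

```python
import math
import statistics

def compare_runtimes(times_a, times_b):
    """Compare two samples of per-task run times (seconds).

    Returns the change in mean run time, a rough two-sample standard
    error, and whether the change exceeds ~2 standard errors (i.e. is
    unlikely to be just normal work-unit-to-work-unit variation).
    """
    mean_a = statistics.mean(times_a)
    mean_b = statistics.mean(times_b)
    se = math.sqrt(statistics.variance(times_a) / len(times_a)
                   + statistics.variance(times_b) / len(times_b))
    diff = mean_b - mean_a
    return diff, se, abs(diff) > 2 * se

# Made-up run times for illustration only.
baseline = [612, 598, 640, 605, 621, 633, 610, 626]  # old settings
tuned    = [601, 589, 615, 596, 604, 611, 599, 607]  # new command line
diff, se, sig = compare_runtimes(baseline, tuned)
print(f"mean change: {diff:+.1f} s, std err: {se:.1f} s, real change: {sig}")
```

If the mean change is within roughly two standard errors, the test was too short to distinguish the settings.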
Jim1348

Message 1871266 - Posted: 5 Jun 2017, 15:28:57 UTC - in response to Message 1871256.  

I am not sure, but your GFLOPS numbers are lower than what I am getting (yours 292.36 GFLOPS vs. my 313.47 GFLOPS, or Giggo's), so you might keep your "default" settings until you have a couple of weeks in and have some kind of stable RAC (not going up anymore), and then try the ones we have been talking about.

That is rather small. I don't overclock, which could account for it.
Mike
Volunteer tester
Message 1871271 - Posted: 5 Jun 2017, 15:48:14 UTC - in response to Message 1871250.  
Last modified: 5 Jun 2017, 15:48:42 UTC

I tried -sbs 1024 -hp -period_iterations_num 1 -tt 1500 -high_perf -high_prec_timer on my GTX 1060 (6 GB), and got the same performance as using Lunatics 0.45_beta6 (Windows 7 64-bit).

But it was a short test. I was running 8.20 setiathome_v8 (opencl_nvidia_SoG) work units. Should I see an improvement?


If you used the same command line, no.

Let me be a little more specific. I had Lunatics already installed, and was getting a certain average time for those work units (though they vary, as you know). Then I added the -sbs 1024 -hp -period_iterations_num 1 -tt 1500 -high_perf -high_prec_timer command line, while leaving Lunatics installed, and I didn't see any obvious change. So my impression is that Lunatics by itself gets the job done, though I may be missing something.


To be clearer:
All SoG apps are Lunatics apps; the one in the beta 6 installer is just a little bit older.
A few fixes have been included to reduce the number of inconclusives.
But indeed r_3557 is slightly faster.


With each crime and every kindness we birth our future.
Jim1348

Message 1871277 - Posted: 5 Jun 2017, 16:24:58 UTC - in response to Message 1871271.  

To be clearer:
All SoG apps are Lunatics apps; the one in the beta 6 installer is just a little bit older.
A few fixes have been included to reduce the number of inconclusives.
But indeed r_3557 is slightly faster.

Actually, right after installing Lunatics I updated to r_3584, though I don't think that is supposed to make a speed difference. So at least all these tests are consistent in that respect.
I will just let it run. Thanks for the insights. I have not done much SETI recently, and am catching up.
Tom M
Volunteer tester
Message 1873234 - Posted: 16 Jun 2017, 0:59:42 UTC

I have started getting "white lines" in my GPU-Z graph for my 1060. The only thing I did was replace the W3565 (4 cores w/HT, 3.20 GHz) with an X5680 (6 cores w/HT, 3.33 GHz).

I have tinkered with taking out -high_prec_timer and -high_perf, switching the "period" from 1 to 4, and putting some back in. Right now it's:

-sbs 1024 -spike_fft_thresh 4096 -tune 1 64 1 4 -tt 1500 -period_iterations_num 4 -oclfft_tune_gr 256 -oclfft_tune_lr 16 -oclfft_tune_wg 256 -oclfft_tune_ls 512 -oclfft_tune_bn 64 -oclfft_tune_cw 64 -hp -high_perf
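For readers skimming the thread, a rough gloss of those switches; the descriptions are inferred from the stderr output quoted earlier in the thread and are a best-effort summary, not official documentation:

```
-sbs 1024                  # maximum single buffer size, in MB
-spike_fft_thresh 4096     # FFT-size threshold for spike finding
-tune 1 64 1 4             # GPU-specific kernel tuning parameters
-tt 1500                   # target kernel sequence time, in ms
-period_iterations_num 4   # split PulseFind into 4 kernel launches
                           #   (higher = smoother display, lower = faster)
-oclfft_tune_*             # OpenCL FFT tuning knobs (workgroup sizes etc.)
-hp                        # raise process and worker-thread priority
-high_perf                 # select the high-performance code path
```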

The top of the graph has been smoother since I dropped the period to 4, and it has stopped sometimes cratering to under 80%. But I am still getting that occasional "white line".

I can't tell that the processing speed has dropped (or increased ;)

Any ideas?

Tom
A proud member of the OFA (Old Farts Association).
Grant (SSSF)
Volunteer tester

Message 1873263 - Posted: 16 Jun 2017, 4:16:23 UTC - in response to Message 1873234.  
Last modified: 16 Jun 2017, 4:16:59 UTC

I have started getting "white lines" in my Gpu-Z for my 1060.

It's not an issue.

What matters is how long it takes to process each WU. Check your processing times with the old settings compared to the new ones, that's all that really matters.
-period_iterations_num 1 gives me the best throughput. What the GPU-Z graph does or doesn't look like isn't a concern if my cards are producing the maximum amount of work possible.
Grant
Darwin NT
Tom M
Volunteer tester
Message 1874027 - Posted: 19 Jun 2017, 16:53:16 UTC

1) How much more production do you (usually) get when you add -hp and/or -high_prec_timer and/or -high_perf?

2) How much does -period_iterations_num 1, as compared to -period_iterations_num 4, affect production?

I find that even though in theory one of my computers is "dedicated" to BOINC/SETI/whatever, the lag in keyboard/screen response is so bad that it frustrates me. So I am trying to see how much I am losing.

Thank you.

Tom
A proud member of the OFA (Old Farts Association).
Keith Myers
Volunteer tester
Message 1874039 - Posted: 19 Jun 2017, 18:41:43 UTC - in response to Message 1873234.  

In my experience, "white lines" in the GPU usage graphs are caused by data starvation, not by the GPU working too hard and throttling. The CPU core(s) feeding the GPU task are not able to keep the card fed fast enough. I see this mainly with Einstein tasks, but also with AP tasks, even with a whole CPU core dedicated to the task and it running by itself.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
Tom M
Volunteer tester
Message 1878326 - Posted: 15 Jul 2017, 3:10:25 UTC - in response to Message 1874039.  

In my experience, "white lines" in the GPU usage graphs are caused by data starvation, not by the GPU working too hard and throttling. The CPU core(s) feeding the GPU task are not able to keep the card fed fast enough. I see this mainly with Einstein tasks, but also with AP tasks, even with a whole CPU core dedicated to the task and it running by itself.


Keith,
Is there any evidence that devoting two CPU cores to one GPU (and task) reduces the wait for the GPU task?

I tried it, and the GPU load started bouncing around over a bigger range than with a single CPU core. So it might actually have slowed the GPU down.

Tom
A proud member of the OFA (Old Farts Association).
Grant (SSSF)
Volunteer tester

Message 1878331 - Posted: 15 Jul 2017, 3:48:28 UTC - in response to Message 1878326.  

Is there any evidence that devoting two CPU cores to one GPU (and task) reduces the wait for the GPU task?

For OpenCL applications on Nvidia hardware 1 CPU core per WU is required for best performance.
Grant
Darwin NT
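For a stock (non-Lunatics) setup, the usual way to reserve that core is an app_config.xml in the project directory. A sketch, assuming the app name from this thread — check the names your own client reports:

```xml
<app_config>
    <app>
        <name>setiathome_v8</name>
        <gpu_versions>
            <!-- one task per GPU, with a full CPU core reserved to feed it -->
            <gpu_usage>1.0</gpu_usage>
            <cpu_usage>1.0</cpu_usage>
        </gpu_versions>
    </app>
</app_config>
```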
Keith Myers
Volunteer tester
Message 1878334 - Posted: 15 Jul 2017, 4:13:11 UTC - in response to Message 1878331.  

As Grant stated, the SoG app needs a full CPU core, and I don't think it is written to handle more than one. The only other thing I can think of is reducing PCIe bus contention from other apps or tasks. With Einstein we have proved that tasks run faster at PCIe x16 bus speeds than at slower x8, x4, or x1 speeds. This doesn't seem to be the case with SETI work, though, or the reduction is marginal at slower bus speeds. As previously commented, the "white lines" in the GPU usage graphs are meaningless in the larger scope. Task completion times should be the benchmark for judging system tuning.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
Grant (SSSF)
Volunteer tester

Message 1878335 - Posted: 15 Jul 2017, 4:33:32 UTC - in response to Message 1878334.  

Task completion times should be the benchmark to judge system tuning.

Yep.
It doesn't matter whether the GPU usage is 50% or 100%, the memory controller load 50% or 100%, or the power load 50% or 100%.
It's the end result that counts.
If increasing any of those indicators results in more work per hour, good. If it results in less work per hour, it's bad. Don't get caught up in CPU or GPU loads, or the portion of time spent in kernel mode or user mode.
Task completion times are the benchmark for judging system tuning.
Grant
Darwin NT
rob smith
Volunteer moderator
Volunteer tester
Message 1878343 - Posted: 15 Jul 2017, 7:15:30 UTC

Currently none of the SETI applications are capable of using more than one CPU at a time, and that includes the SoG applications for GPUs.
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
Tom M
Volunteer tester
Message 1878355 - Posted: 15 Jul 2017, 10:30:36 UTC

Thank you for your clear direction in the previous messages.

I have another question, about the -tt 1500 setting. Is there any experience, experimentation, or guesswork about its upper and lower limits, the optimum setting, and how it affects throughput?

Thank you.
Tom
A proud member of the OFA (Old Farts Association).
Kissagogo27
Message 1878356 - Posted: 15 Jul 2017, 10:42:52 UTC

All the explanations are here: http://lunatics.kwsn.info/index.php/topic,1808.msg60931.html#msg60931
Zalster
Volunteer tester
Message 1878370 - Posted: 15 Jul 2017, 13:24:52 UTC - in response to Message 1878355.  

Thank you for your clear direction in the previous messages.

I have another question, about the -tt 1500 setting. Is there any experience, experimentation, or guesswork about its upper and lower limits, the optimum setting, and how it affects throughput?

Thank you.
Tom


-tt 1500 is the upper limit. It has to be used with -period_iterations_num. That value is also meant for top-of-the-line GPUs like the 980 Ti, 1070, 1080, 1080 Ti, and Titans.
Tom M
Volunteer tester
Message 1878487 - Posted: 16 Jul 2017, 1:50:41 UTC - in response to Message 1870935.  

Tom, how are you dedicating that core for the GPU? Are all 4 cores being used? How do you monitor your CPU usage?


Very good questions. I apologize for never answering them.

On the now-departed i5, I was running 1 core per GPU card/task, with the other three doing SETI CPU processing.

I monitored the CPU usage by eyeballing Task Manager. I didn't fire up Resource Monitor because I hadn't thought of it.

Tom
A proud member of the OFA (Old Farts Association).