Getting the most bang for your buck from a GTX 1060

Jim1348

Message 1871250 - Posted: 5 Jun 2017, 14:18:41 UTC - in response to Message 1871247.  
Last modified: 5 Jun 2017, 14:19:26 UTC

I tried -sbs 1024 -hp -period_iterations_num 1 -tt 1500 -high_perf -high_prec_timer on my GTX 1060 (6 GB), and got the same performance as using Lunatics 0.45_beta6 (Windows 7 64-bit).

But it was a short test. I was running 8.20 setiathome_v8 (opencl_nvidia_SoG) work units. Should I see an improvement?


If you used the same command line, no.

Let me be a little more specific. I had Lunatics already installed, and was getting a certain average time for those work units (though they vary, as you know). Then I added the -sbs 1024 -hp -period_iterations_num 1 -tt 1500 -high_perf -high_prec_timer command line, while leaving Lunatics installed, and I didn't see any obvious change. So my impression is that Lunatics by itself gets the job done, though I may be missing something.
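For context, under an anonymous-platform (Lunatics) install these switches normally live in the <cmdline> element of app_info.xml in the project directory (the Lunatics installer also writes them to an mb_cmdline*.txt file). A sketch of the relevant fragment; the surrounding elements are abbreviated and exact names in a real install may differ:

```xml
<app_version>
    <app_name>setiathome_v8</app_name>
    <plan_class>opencl_nvidia_SoG</plan_class>
    <!-- the tuning switches discussed in this thread go here -->
    <cmdline>-sbs 1024 -hp -period_iterations_num 1 -tt 1500 -high_perf -high_prec_timer</cmdline>
</app_version>
```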
Tom M
Volunteer tester
Message 1871256 - Posted: 5 Jun 2017, 15:05:50 UTC - in response to Message 1871250.  

If you used the same command line, no.

Let me be a little more specific. I had Lunatics already installed, and was getting a certain average time for those work units (though they vary, as you know). Then I added the -sbs 1024 -hp -period_iterations_num 1 -tt 1500 -high_perf -high_prec_timer command line, while leaving Lunatics installed, and I didn't see any obvious change. So my impression is that Lunatics by itself gets the job done, though I may be missing something.


I am not sure, but your GFLOPS numbers are lower than what I am getting (yours 292.36 GFLOPS vs. my 313.47 GFLOPS, or Giggo's), so you might keep your "default" settings until you have a couple of weeks in and have some kind of stable RAC (not going up anymore), and then try the ones we have been talking about.

All too often we make generalizations/statements that hold true for most CPU/GPU combinations but may have exceptions.

Tom
A proud member of the OFA (Old Farts Association).
Brent Norman
Volunteer tester
Message 1871257 - Posted: 5 Jun 2017, 15:09:00 UTC - in response to Message 1871250.  

From NO command line to using one, you should see a difference.
Did the options used appear in the stderr output of the new tasks?
Jim1348

Message 1871265 - Posted: 5 Jun 2017, 15:25:41 UTC - in response to Message 1871257.  

From NO command line to using one, you should see a difference.
Did the options used appear in the stderr output of the new tasks?

I think so.

<core_client_version>7.6.33</core_client_version>
<![CDATA[
<stderr_txt>
Maximum single buffer size set to:1024MB
Number of period iterations for PulseFind set to:1
Target kernel sequence time set to 1500ms
High-performance path selected. If GUI lags occur consider to remove -high_perf option from tuning line
System timer will be set in high resolution mode
Priority of worker thread raised successfully
Priority of process adjusted successfully, high priority class used
OpenCL platform detected: Intel(R) Corporation
OpenCL platform detected: NVIDIA Corporation
BOINC assigns device 0
Info: BOINC provided OpenCL device ID used

Build features: SETI8 Non-graphics OpenCL USE_OPENCL_NV OCL_ZERO_COPY SIGNALS_ON_GPU OCL_CHIRP3 FFTW USE_SSE3 x86
CPUID: Intel(R) Core(TM) i7-4771 CPU @ 3.50GHz

Maybe I should just let it run for a couple of days each way to see if there is any difference.
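Since work units vary so much, a quick sanity check on "a couple of days each way" is to compare the average run time of a batch of tasks under each setting against the task-to-task scatter. A minimal sketch in Python (standard library only; the run times below are invented for illustration — substitute real task times from the website's task pages):

```python
import math
import statistics

def compare_runtimes(times_a, times_b):
    """Compare two samples of per-task run times (seconds).

    Returns the change in mean run time, a rough two-sample standard
    error, and whether the change exceeds ~2 standard errors (i.e. is
    unlikely to be just normal work-unit-to-work-unit variation).
    """
    mean_a = statistics.mean(times_a)
    mean_b = statistics.mean(times_b)
    se = math.sqrt(statistics.variance(times_a) / len(times_a)
                   + statistics.variance(times_b) / len(times_b))
    diff = mean_b - mean_a
    return diff, se, abs(diff) > 2 * se

# Made-up run times for illustration only.
baseline = [612, 598, 640, 605, 621, 633, 610, 626]  # old settings
tuned    = [601, 589, 615, 596, 604, 611, 599, 607]  # new command line
diff, se, sig = compare_runtimes(baseline, tuned)
print(f"mean change: {diff:+.1f} s, std err: {se:.1f} s, real change: {sig}")
```

If the mean change is within roughly two standard errors, the test was too short to distinguish the settings.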
Jim1348

Message 1871266 - Posted: 5 Jun 2017, 15:28:57 UTC - in response to Message 1871256.  

I am not sure, but your GFLOPS numbers are lower than what I am getting (yours 292.36 GFLOPS vs. my 313.47 GFLOPS, or Giggo's), so you might keep your "default" settings until you have a couple of weeks in and have some kind of stable RAC (not going up anymore), and then try the ones we have been talking about.

That is rather small. I don't overclock, which could account for it.
Mike
Volunteer tester
Message 1871271 - Posted: 5 Jun 2017, 15:48:14 UTC - in response to Message 1871250.  
Last modified: 5 Jun 2017, 15:48:42 UTC

I tried -sbs 1024 -hp -period_iterations_num 1 -tt 1500 -high_perf -high_prec_timer on my GTX 1060 (6 GB), and got the same performance as using Lunatics 0.45_beta6 (Windows 7 64-bit).

But it was a short test. I was running 8.20 setiathome_v8 (opencl_nvidia_SoG) work units. Should I see an improvement?


If you used the same command line, no.

Let me be a little more specific. I had Lunatics already installed, and was getting a certain average time for those work units (though they vary, as you know). Then I added the -sbs 1024 -hp -period_iterations_num 1 -tt 1500 -high_perf -high_prec_timer command line, while leaving Lunatics installed, and I didn't see any obvious change. So my impression is that Lunatics by itself gets the job done, though I may be missing something.


To be clearer:
All SoG apps are Lunatics apps; the one in the beta 6 installer is just a little bit older.
A few fixes have been included to reduce the number of inconclusives.
But indeed r_3557 is slightly faster.


With each crime and every kindness we birth our future.
Jim1348

Message 1871277 - Posted: 5 Jun 2017, 16:24:58 UTC - in response to Message 1871271.  

To be clearer:
All SoG apps are Lunatics apps; the one in the beta 6 installer is just a little bit older.
A few fixes have been included to reduce the number of inconclusives.
But indeed r_3557 is slightly faster.

Actually, right after installing Lunatics I updated to r_3584, though I don't think that is supposed to make a speed difference. So at least all these tests are consistent in that respect.
I will just let it run. Thanks for the insights. I have not done much SETI recently, and am catching up.
Tom M
Volunteer tester
Message 1873234 - Posted: 16 Jun 2017, 0:59:42 UTC

I have started getting "white lines" in my GPU-Z graph for my 1060. The only thing I did was replace the W3565 (4 cores w/HT, 3.20 GHz) with an X5680 (6 cores w/HT, 3.33 GHz).

I have tinkered with taking out -high_prec_timer and -high_perf, switching the "period" from 1 to 4, and putting some back in. Right now it's:

-sbs 1024 -spike_fft_thresh 4096 -tune 1 64 1 4 -tt 1500 -period_iterations_num 4 -oclfft_tune_gr 256 -oclfft_tune_lr 16 -oclfft_tune_wg 256 -oclfft_tune_ls 512 -oclfft_tune_bn 64 -oclfft_tune_cw 64 -hp -high_perf
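For readers skimming the thread, a rough gloss of those switches; the descriptions are inferred from the stderr output quoted earlier in the thread and are a best-effort summary, not official documentation:

```
-sbs 1024                  # maximum single buffer size, in MB
-spike_fft_thresh 4096     # FFT-size threshold for spike finding
-tune 1 64 1 4             # GPU-specific kernel tuning parameters
-tt 1500                   # target kernel sequence time, in ms
-period_iterations_num 4   # split PulseFind into 4 kernel launches
                           #   (higher = smoother display, lower = faster)
-oclfft_tune_*             # OpenCL FFT tuning knobs (workgroup sizes etc.)
-hp                        # raise process and worker-thread priority
-high_perf                 # select the high-performance code path
```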

The top of the graph has been smoother since I dropped the period to 4, and it has stopped sometimes cratering to under 80%. But I am still getting that occasional "white line".

I can't tell that the processing speed has dropped (or increased ;)

Any ideas?

Tom
A proud member of the OFA (Old Farts Association).
Grant (SSSF)
Volunteer tester

Message 1873263 - Posted: 16 Jun 2017, 4:16:23 UTC - in response to Message 1873234.  
Last modified: 16 Jun 2017, 4:16:59 UTC

I have started getting "white lines" in my Gpu-Z for my 1060.

It's not an issue.

What matters is how long it takes to process each WU. Check your processing times with the old settings compared to the new ones, that's all that really matters.
-period_iterations_num 1 gives me the best throughput. What the GPU-Z graph does or doesn't look like isn't a concern if my cards are producing the maximum amount of work possible.
Grant
Darwin NT
Tom M
Volunteer tester
Message 1874027 - Posted: 19 Jun 2017, 16:53:16 UTC

1) How much more production do you (usually) get when you add -hp and/or -high_prec_timer and/or -high_perf?

2) How much does -period_iterations_num 1, as compared to -period_iterations_num 4, affect production?

I find that even though in theory one of my computers is "dedicated" to BOINC/SETI/whatever, the lag in keyboard/screen response is so bad that it frustrates me. So I am trying to see how much I am losing.

Thank you.

Tom
A proud member of the OFA (Old Farts Association).
Keith Myers
Volunteer tester
Message 1874039 - Posted: 19 Jun 2017, 18:41:43 UTC - in response to Message 1873234.  

In my experience, "white lines" in the GPU usage graphs are caused by data starvation, not by the GPU working too hard and throttling. The CPU core(s) feeding the GPU task are not able to keep the card fed fast enough. I see this mainly with Einstein tasks, but also with AP tasks, even with a whole CPU core dedicated to the task and it running by itself.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
Tom M
Volunteer tester
Message 1878326 - Posted: 15 Jul 2017, 3:10:25 UTC - in response to Message 1874039.  

In my experience, "white lines" in the GPU usage graphs are caused by data starvation, not by the GPU working too hard and throttling. The CPU core(s) feeding the GPU task are not able to keep the card fed fast enough. I see this mainly with Einstein tasks, but also with AP tasks, even with a whole CPU core dedicated to the task and it running by itself.


Keith,
Is there any evidence that devoting two CPU cores to one GPU (and task) reduces the wait for the GPU task?

I tried it, and the GPU load started bouncing around over a bigger range than with a single CPU core. So it might actually have slowed the GPU down.

Tom
A proud member of the OFA (Old Farts Association).
Grant (SSSF)
Volunteer tester

Message 1878331 - Posted: 15 Jul 2017, 3:48:28 UTC - in response to Message 1878326.  

Is there any evidence that devoting two CPU cores to one GPU (and task) reduces the wait for the GPU task?

For OpenCL applications on Nvidia hardware 1 CPU core per WU is required for best performance.
Grant
Darwin NT
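For a stock (non-Lunatics) setup, the usual way to reserve that core is an app_config.xml in the project directory. A sketch, assuming the app name from this thread — check the names your own client reports:

```xml
<app_config>
    <app>
        <name>setiathome_v8</name>
        <gpu_versions>
            <!-- one task per GPU, with a full CPU core reserved to feed it -->
            <gpu_usage>1.0</gpu_usage>
            <cpu_usage>1.0</cpu_usage>
        </gpu_versions>
    </app>
</app_config>
```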
Keith Myers
Volunteer tester
Message 1878334 - Posted: 15 Jul 2017, 4:13:11 UTC - in response to Message 1878331.  

As Grant stated, the SoG app needs a full CPU core, and I don't think it is written to handle more than one. The only other thing I can think of is reducing PCIe bus contention from other apps or tasks. With Einstein we have proved that tasks run faster at PCIe x16 bus speeds than at slower x8, x4, or x1 speeds. This doesn't seem to be the case with SETI work, though, or the reduction is marginal at slower bus speeds. As previously commented, the "white lines" in the GPU usage graphs are meaningless in the larger scope. Task completion times should be the benchmark for judging system tuning.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
Grant (SSSF)
Volunteer tester

Message 1878335 - Posted: 15 Jul 2017, 4:33:32 UTC - in response to Message 1878334.  

Task completion times should be the benchmark to judge system tuning.

Yep.
It doesn't matter whether the GPU usage is 50% or 100%, the memory controller load 50% or 100%, or the power load 50% or 100%.
It's the end result that counts.
If increasing any of those indicators results in more work per hour, good. If it results in less work per hour, it's bad. Don't get caught up in CPU or GPU loads, or the portion of time spent in kernel mode or user mode.
Task completion times are the benchmark for judging system tuning.
Grant
Darwin NT
rob smith
Volunteer moderator
Volunteer tester
Message 1878343 - Posted: 15 Jul 2017, 7:15:30 UTC

Currently none of the SETI applications are capable of using more than one CPU at a time, and that includes the SoG applications for GPUs.
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
Tom M
Volunteer tester
Message 1878355 - Posted: 15 Jul 2017, 10:30:36 UTC

Thank you for your clear direction in the previous messages.

I have another question, about the -tt 1500 setting. Is there any experience, experimentation, or guesswork about its upper and lower limits, the optimum setting, and how it affects throughput?

Thank you.
Tom
A proud member of the OFA (Old Farts Association).
Kissagogo27
Message 1878356 - Posted: 15 Jul 2017, 10:42:52 UTC

All the explanations are here: http://lunatics.kwsn.info/index.php/topic,1808.msg60931.html#msg60931
Zalster
Volunteer tester
Message 1878370 - Posted: 15 Jul 2017, 13:24:52 UTC - in response to Message 1878355.  

Thank you for your clear direction in the previous messages.

I have another question, about the -tt 1500 setting. Is there any experience, experimentation, or guesswork about its upper and lower limits, the optimum setting, and how it affects throughput?

Thank you.
Tom


-tt 1500 is the upper limit. It has to be used with -period_iterations_num. That value is also meant for top-of-the-line GPUs like the 980 Ti, 1070, 1080, 1080 Ti, and Titans.
Tom M
Volunteer tester
Message 1878487 - Posted: 16 Jul 2017, 1:50:41 UTC - in response to Message 1870935.  

Tom, how are you dedicating that core for the GPU? Are all 4 cores being used? How do you monitor your CPU usage?


Very good questions. I apologize for never answering them.

On the now-departed i5, I was running 1 core per GPU card/task, with the other three doing SETI CPU processing.

I monitored the CPU usage by eyeballing Task Manager. I didn't fire up Resource Monitor because I hadn't thought of it.

Tom
A proud member of the OFA (Old Farts Association).