Getting the most bang for your buck from a GTX 1060

Message boards : Number crunching : Getting the most bang for your buck from a GTX 1060

Profile Tom M
Volunteer tester

Joined: 28 Nov 02
Posts: 5124
Credit: 276,046,078
RAC: 462
Message 1870570 - Posted: 1 Jun 2017, 18:56:23 UTC

I will stipulate that I have a 3 GB card (it is listed as faster than the 6 GB card) and this model is supposed to be very overclockable.

1) What are your favorite SETI MB*sog.txt or Lunatics version parameters for this GPU card?

2) How many tasks are you running? (I am getting a massive slowdown when I try two tasks.)

3) Have you tried to overclock this card? What settings are you using that you think are reliable/stable?

Thank you,
Tom
A proud member of the OFA (Old Farts Association).
ID: 1870570 · Report as offensive
Profile Darrell
Volunteer tester
Joined: 14 Mar 03
Posts: 267
Credit: 1,418,681
RAC: 0
United States
Message 1870577 - Posted: 1 Jun 2017, 19:37:33 UTC - in response to Message 1870570.  

Hi Tom, I'll leave items one and two to others that use NVidia cards.

Concerning item three, overclocking, I have a few comments.

1) It doesn't really pay to ask others for their overclock settings, because the effects of overclocking are very system dependent.
2) Be very careful when overclocking your card, as the SETI apps are sensitive to it and you may start returning invalid results. This doesn't mean you can't do it; you just have to be vigilant and watch your results closely for any increase in invalid results.
3) Overclocking without producing invalid results is getting the most out of your GPU, so go ahead and do it carefully.
4) I like to use a program called HWiNFO64 to monitor my system (frequencies, temperatures, fans, voltages, HDDs, GPUs) and would recommend using it or something like it.
5) The link below is to a YouTube video on overclocking Pascal-based GPUs. Remember that almost all overclocking videos aim for the fastest settings for gaming, and using those settings will most likely lead to invalid SETI results. The procedure is the same, though, for finding the fastest settings that still produce valid SETI results.
6) Have fun playing with your card. I know I have with my RX480.

https://www.youtube.com/watch?v=SKWbKCKsWVQ
ID: 1870577 · Report as offensive
Profile Tom M
Volunteer tester

Joined: 28 Nov 02
Posts: 5124
Credit: 276,046,078
RAC: 462
Message 1870581 - Posted: 1 Jun 2017, 19:56:34 UTC - in response to Message 1870577.  

5) The link below is to a YouTube video on overclocking Pascal-based GPUs. Remember that almost all overclocking videos aim for the fastest settings for gaming, and using those settings will most likely lead to invalid SETI results. The procedure is the same, though, for finding the fastest settings that still produce valid SETI results.
6) Have fun playing with your card. I know I have with my RX480.

https://www.youtube.com/watch?v=SKWbKCKsWVQ


I hear you on the overclocking causing "invalid" results. I used the overclocking numbers from a review article, and darned if the computation errors didn't start cropping up, so I took the settings back to "stock" for the time being.

I want to have my cake and eat it too. I want it smokin' fast and reliable :)

Oh, well. Thank you!

Tom
A proud member of the OFA (Old Farts Association).
ID: 1870581 · Report as offensive
Profile betreger Project Donor
Joined: 29 Jun 99
Posts: 11358
Credit: 29,581,041
RAC: 66
United States
Message 1870583 - Posted: 1 Jun 2017, 20:03:47 UTC

I have a 3 GB factory-overclocked card that I use over at Einstein. It was producing about 1% invalid results; slowing it down a bit got rid of them.
ID: 1870583 · Report as offensive
Profile Tom M
Volunteer tester

Joined: 28 Nov 02
Posts: 5124
Credit: 276,046,078
RAC: 462
Message 1870590 - Posted: 1 Jun 2017, 20:50:37 UTC - in response to Message 1870570.  

I will stipulate that I have a 3 GB card (it is listed as faster than the 6 GB card) and this model is supposed to be very overclockable.

1) What are your favorite SETI MB*sog.txt or Lunatics version parameters for this GPU card?


When I searched for "1060" in the messages here at Seti@home, I got this command list. The author of it called it his "lean" command.

-sbs 1024 -hp -period_iterations_num 1 -tt 1500 -high_perf -high_prec_timer
A proud member of the OFA (Old Farts Association).
ID: 1870590 · Report as offensive
Profile Wiggo
Joined: 24 Jan 00
Posts: 34744
Credit: 261,360,520
RAC: 489
Australia
Message 1870592 - Posted: 1 Jun 2017, 20:52:29 UTC
Last modified: 1 Jun 2017, 20:53:39 UTC

I use " -sbs 1024 -period_iterations_num 10 -spike_fft_thresh 4096 -tune 1 64 1 4 -oclfft_tune_gr 256 -oclfft_tune_lr 16 -oclfft_tune_wg 256 -oclfft_tune_ls 512 -oclfft_tune_bn 64 -oclfft_tune_cw 64" running a single task.

Cheers.
ID: 1870592 · Report as offensive
Profile Tom M
Volunteer tester

Joined: 28 Nov 02
Posts: 5124
Credit: 276,046,078
RAC: 462
Message 1870593 - Posted: 1 Jun 2017, 20:52:34 UTC

How many SETI tasks are you processing per hour on your GTX 1060?
A proud member of the OFA (Old Farts Association).
ID: 1870593 · Report as offensive
Profile Tom M
Volunteer tester

Joined: 28 Nov 02
Posts: 5124
Credit: 276,046,078
RAC: 462
Message 1870595 - Posted: 1 Jun 2017, 20:59:03 UTC - in response to Message 1870592.  

Wiggo "Socalist"

Is there a thorough explanation of these SOG commands, including what happens when you increase/decrease the parameters? I mean, more than is explained in the text/readme files.

Otherwise, it sounds like a really interesting new thread :)

Thank you for your example of a good working SOG command list.

Tom
A proud member of the OFA (Old Farts Association).
ID: 1870595 · Report as offensive
Profile Wiggo
Joined: 24 Jan 00
Posts: 34744
Credit: 261,360,520
RAC: 489
Australia
Message 1870597 - Posted: 1 Jun 2017, 21:05:51 UTC

I just cheated and asked an expert for a no lag cmdline that I could use on my workstations. ;-)

Mike would be the best person I know to explain it all, but I'm happy that it just works. :-)

Cheers.
ID: 1870597 · Report as offensive
Profile Zalster Special Project $250 donor
Volunteer tester
Joined: 27 May 99
Posts: 5517
Credit: 528,817,460
RAC: 242
United States
Message 1870608 - Posted: 1 Jun 2017, 22:09:04 UTC - in response to Message 1870590.  

I will stipulate that I have a 3 GB card (it is listed as faster than the 6 GB card) and this model is supposed to be very overclockable.

1) What are your favorite SETI MB*sog.txt or Lunatics version parameters for this GPU card?


When I searched for "1060" in the messages here at Seti@home, I got this command list. The author of it called it his "lean" command.

-sbs 1024 -hp -period_iterations_num 1 -tt 1500 -high_perf -high_prec_timer


While it is lean, it's also aggressive. That's what I use on my 1070s and 1080s. Best to go with something more like Wiggo's.
ID: 1870608 · Report as offensive
Profile Tom M
Volunteer tester

Joined: 28 Nov 02
Posts: 5124
Credit: 276,046,078
RAC: 462
Message 1870619 - Posted: 1 Jun 2017, 23:52:25 UTC - in response to Message 1870608.  

When I searched for "1060" in the messages here at Seti@home, I got this command list. The author of it called it his "lean" command.

-sbs 1024 -hp -period_iterations_num 1 -tt 1500 -high_perf -high_prec_timer

While it is lean, it's also aggressive. That's what I use on my 1070s and 1080s. Best to go with something more like Wiggo's.


Since the GTX 1060 was bought specifically to "pump up" the RAC of my (mostly) dedicated BOINC PC, at least the SETI part of it, "aggressive" (presumably meaning there is lots of screen lag) is good. It looks like 22,000 is a possible goal, but since my CPU doesn't do AVX, it's going to be a 1060 GPU and SSE3 all the way.....

I would be "perfectly" happy with exact instructions on how to get the most production even if I almost have to re-boot the box to get its attention. Except I draw the line at having to really learn Linux.... :) And I know that BOINC/Seti tuning is more an art plus lots of testing, than a step-by-step SOP.

Thank you,
Tom
A proud member of the OFA (Old Farts Association).
ID: 1870619 · Report as offensive
Profile Wiggo
Joined: 24 Jan 00
Posts: 34744
Credit: 261,360,520
RAC: 489
Australia
Message 1870623 - Posted: 2 Jun 2017, 0:27:10 UTC
Last modified: 2 Jun 2017, 0:37:12 UTC

If that's the case Tom then try this.

-tt 1500 -sbs 1024 -period_iterations_num 4 -spike_fft_thresh 4096 -tune 1 64 1 4 -oclfft_tune_gr 256 -oclfft_tune_lr 16 -oclfft_tune_wg 256 -oclfft_tune_ls 512 -oclfft_tune_bn 64 -oclfft_tune_cw 64

You could also experiment with the -hp, -high_perf and -high_prec_timer options, but they could cause a lot of lag. The -period_iterations_num value can be adjusted as well.
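For anyone wanting to see exactly how two of the command lines in this thread differ before swapping one for the other, here is a minimal Python sketch. The helper names (parse_cmdline, compare) are made up for illustration and are not part of any SETI app; the sketch also assumes every option value is an integer, which holds for the command lines quoted here.

```python
def parse_cmdline(cmdline):
    """Split an app command line into {flag: [integer values]}."""
    options = {}
    current = None
    for token in cmdline.split():
        # "-sbs" starts a new flag; "1024" (or "-5") is a value for the last flag.
        if token.startswith("-") and not token[1:].isdigit():
            current = token
            options[current] = []
        elif current is not None:
            options[current].append(int(token))
    return options


def compare(a, b):
    """Return flags only in a, flags only in b, and flags whose values differ."""
    only_a = sorted(set(a) - set(b))
    only_b = sorted(set(b) - set(a))
    changed = {f: (a[f], b[f]) for f in a if f in b and a[f] != b[f]}
    return only_a, only_b, changed


# The "lean" command line and the second Wiggo command line, copied from the thread.
lean = parse_cmdline(
    "-sbs 1024 -hp -period_iterations_num 1 -tt 1500 -high_perf -high_prec_timer"
)
tuned = parse_cmdline(
    "-tt 1500 -sbs 1024 -period_iterations_num 4 -spike_fft_thresh 4096 "
    "-tune 1 64 1 4 -oclfft_tune_gr 256 -oclfft_tune_lr 16 -oclfft_tune_wg 256 "
    "-oclfft_tune_ls 512 -oclfft_tune_bn 64 -oclfft_tune_cw 64"
)

only_lean, only_tuned, changed = compare(lean, tuned)
print("only in lean: ", only_lean)
print("only in tuned:", only_tuned)
print("changed:      ", changed)
```

The comparison only shows which options differ; it says nothing about what each option does, which still has to come from the readme files.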

Cheers.
ID: 1870623 · Report as offensive
Profile Darrell
Volunteer tester
Joined: 14 Mar 03
Posts: 267
Credit: 1,418,681
RAC: 0
United States
Message 1870653 - Posted: 2 Jun 2017, 2:43:59 UTC

From my experience with the AMD driver over the last two months (I have no idea if the same is true for the NVIDIA driver), the command line option -high_perf and the ratio between -period_iterations_num and -tt would often cause the video driver to reset. This doesn't trash the task; it just sends it into la-la land, forcing me to do a system restart. So while experimenting with these options and/or the overclocking settings (you don't have to increase in large steps; try small increments of about 5 MHz), it is a good idea to have BOINC fetch tasks one at a time. Once you reach a stable maximum, go back to your original cache settings.
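The "small increments" approach can be written down as a simple loop. This is only a sketch of the bookkeeping: is_stable is a hypothetical callback standing in for "run real tasks at this offset for a while and check that no invalid results or driver resets show up", and the 200 MHz ceiling is an arbitrary assumption, not a recommendation.

```python
def find_stable_offset(is_stable, step_mhz=5, max_offset_mhz=200):
    """Raise the clock offset in small steps and return the last offset that passed."""
    last_good = 0
    offset = step_mhz
    while offset <= max_offset_mhz:
        if not is_stable(offset):
            break  # first failure: stop and keep the previous offset
        last_good = offset
        offset += step_mhz
    return last_good


# Stand-in for real testing: pretend this card starts misbehaving above +60 MHz.
def pretend_card(offset_mhz):
    return offset_mhz <= 60


print(find_stable_offset(pretend_card))  # 60
```

In practice each call to the check means hours of crunching and watching the validation results, so the loop is run by hand over several days rather than by a script.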

As for the longer times when running two tasks at a time, I don't know how the -sbs setting affects the CUDA app, but the NVIDIA OpenCL app appears to produce the same result information as on an AMD card, and here I know how increasing -sbs affects a task. Watch for the following lines in the result file for an OpenCL task:

OpenCL-kernels filename : MultiBeam_Kernels_r3584.cl
ar=0.446582 NumCfft=191633 NumGauss=1059283078 NumPulse=226323398120 NumTriplet=452717731126
Currently allocated 1573 MB for GPU buffers

At first I thought those buffers were on the CPU, but after communicating with Raistmer, I learned they are in the GPU's OpenCL memory. Both NVIDIA and AMD limit the amount of OpenCL memory on their cards, so even though my GPU has 8192 MB of memory, it only has 3072 MB of OpenCL memory. For Raistmer's SETI stock app (r3584.cl), a -sbs 1476 setting produces the 1573 MB of buffers, which is fine for one task at a time. However, 1573 * 2 (tasks) = 3146 MB, which exceeds the OpenCL limit of 3072 MB, so the GPU has to do a lot of swapping, causing increased runtimes. Raistmer's Lunatics SETI app (r3557.cl) allows you to use an even higher -sbs setting.
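The arithmetic above can be checked with a few lines of Python. This is a rough sketch that follows this post's reading of the limit (total buffers across all tasks must stay under the reported OpenCL figure); the function name is made up for illustration.

```python
def fits_in_opencl_memory(buffer_mb_per_task, tasks, opencl_limit_mb):
    """Return (fits, needed_mb) for running `tasks` tasks at once."""
    needed_mb = buffer_mb_per_task * tasks
    return needed_mb <= opencl_limit_mb, needed_mb


# Numbers from the post: 1573 MB of buffers per task, 3072 MB OpenCL limit.
print(fits_in_opencl_memory(1573, 1, 3072))  # (True, 1573)
print(fits_in_opencl_memory(1573, 2, 3072))  # (False, 3146)

# Largest per-task buffer that would still let two tasks fit under the limit.
print(3072 // 2)  # 1536
```

So with these numbers, two tasks only fit if each one allocates no more than 1536 MB, which means lowering -sbs.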
ID: 1870653 · Report as offensive
Profile Zalster Special Project $250 donor
Volunteer tester
Joined: 27 May 99
Posts: 5517
Credit: 528,817,460
RAC: 242
United States
Message 1870655 - Posted: 2 Jun 2017, 2:54:25 UTC - in response to Message 1870653.  

Darrell, where did you find the Memory allocation for OpenCl for your cards? Did you have to run a script?
ID: 1870655 · Report as offensive
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13161
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1870659 - Posted: 2 Jun 2017, 3:22:35 UTC - in response to Message 1870655.  

I'm interested too. I did an experiment with -sbs 3072 and noticed that the tasks still did not allocate what I thought they should have. I tried it on a 1070 with 8 GB of memory, which SHOULD have had enough to run two tasks per card as usual, but that is not what I observed. So if there is some hard OpenCL limit on graphics memory allocation, that would explain what I observed.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1870659 · Report as offensive
Profile Darrell
Volunteer tester
Joined: 14 Mar 03
Posts: 267
Credit: 1,418,681
RAC: 0
United States
Message 1870668 - Posted: 2 Jun 2017, 4:02:38 UTC
Last modified: 2 Jun 2017, 4:27:14 UTC

The easiest way to find the amount of OpenCL memory is to look at a task result file on the website. In the result, Raistmer's program lists the OpenCL capabilities of your card. Look for the following line:

Max memory allocation: 3221225472 <-- this is the amount of OpenCL memory your card has. Value is in bytes and to convert to MB:

3221225472/1024 = 3145728KB

3145728/1024 = 3072MB of OpenCl memory.
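The same two divisions as a small Python function, in case you want to plug in the value from your own result file. The function name is just for illustration.

```python
def bytes_to_mb(n_bytes):
    """Convert bytes to MB using 1024 bytes per KB and 1024 KB per MB."""
    kb = n_bytes // 1024
    mb = kb // 1024
    return mb


print(bytes_to_mb(3221225472))  # 3072
```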

In my previous post I said I was using a -sbs setting of 1476. This is incorrect; it is 1472.

From my testing and reading, nothing says that using -sbs X will result in Y MB of memory buffers being allocated. All you can do is increase the -sbs setting in increments of 64 and watch the buffer size grow. For one task at a time with the stock r3584.cl, the max -sbs is 1472; another increase of 64 results in error messages and the app falls back to the default of 256. What the max is in the Lunatics r3557.cl I do not know, but it is much higher.

When I run two tasks at a time on my RX480, I have to drop the -sbs setting down to 1208 to avoid exceeding the amount of OpenCl memory.
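If you are stepping -sbs up and down and checking results each time, pulling the two relevant numbers out of the result text by hand gets tedious. Below is a minimal sketch that reads them from pasted text. It assumes the two lines look exactly like the ones quoted in this thread; the real result output may differ in spacing or wording, and the function name is made up.

```python
import re


def read_opencl_memory(result_text):
    """Return (OpenCL limit in MB, allocated buffer MB), or None where not found."""
    limit = re.search(r"Max memory allocation:\s*(\d+)", result_text)
    used = re.search(r"Currently allocated\s+(\d+)\s+MB for GPU buffers", result_text)
    limit_mb = int(limit.group(1)) // (1024 * 1024) if limit else None
    used_mb = int(used.group(1)) if used else None
    return limit_mb, used_mb


# Sample text assembled from the lines quoted earlier in the thread.
sample = """Max memory allocation: 3221225472
OpenCL-kernels filename : MultiBeam_Kernels_r3584.cl
Currently allocated 1573 MB for GPU buffers
"""

limit_mb, used_mb = read_opencl_memory(sample)
print(limit_mb, used_mb)                      # 3072 1573
print("tasks that fit:", limit_mb // used_mb)  # tasks that fit: 1
```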
ID: 1870668 · Report as offensive
Profile Zalster Special Project $250 donor
Volunteer tester
Joined: 27 May 99
Posts: 5517
Credit: 528,817,460
RAC: 242
United States
Message 1870672 - Posted: 2 Jun 2017, 4:32:06 UTC - in response to Message 1870668.  
Last modified: 2 Jun 2017, 4:32:47 UTC

The easiest way to find the amount of OpenCL memory is to look at a task result file on the website. In the result, Raistmer's program lists the OpenCL capabilities of your card. Look for the following line:

Max memory allocation: 3221225472 <-- this is the amount of OpenCL memory your card has. Value is in bytes and to convert to MB:

3221225472/1024 = 3145728KB

3145728/1024 = 3072MB of OpenCl memory.



If you are converting from bytes to MB, shouldn't it be

3221225472/1000 = 3221225.472

Then

3221225.472/1024 = 3145.728

Where does the value of 1024 come from? Sorry for so many questions, just trying to understand these calculations.

Zalster
ID: 1870672 · Report as offensive
Profile Zalster Special Project $250 donor
Volunteer tester
Joined: 27 May 99
Posts: 5517
Credit: 528,817,460
RAC: 242
United States
Message 1870674 - Posted: 2 Jun 2017, 4:39:10 UTC
Last modified: 2 Jun 2017, 4:39:58 UTC

From my testing and reading, nothing says that using -sbs X will result in Y MB of memory buffers being allocated. All you can do is increase the -sbs setting in increments of 64 and watch the buffer size grow. For one task at a time with the stock r3584.cl, the max -sbs is 1472; another increase of 64 results in error messages and the app falls back to the default of 256. What the max is in the Lunatics r3557.cl I do not know, but it is much higher.


When we originally did this testing of the -sbs values, it was based on increments of 256.

What we found was that even if you put in 1024 as the value, additional memory was allocated to the work (which we assumed was for OS-related operations and support). So an additional buffer value of 256 needed to be considered when running more than one work unit per GPU and subtracted from the overall memory of the card. However, we also found that there was a limit to how many work units could be run on a card, even one with many GB of RAM, and we couldn't figure out why. What you are saying about OpenCL memory being only a portion of the total memory appears to be true from what I have read: NVIDIA limits it to 25% of the total GPU RAM (AMD and Intel are higher, somewhere around 50-75%). That is why I was trying to figure out your calculations.
ID: 1870674 · Report as offensive
Profile Zalster Special Project $250 donor
Volunteer tester
Joined: 27 May 99
Posts: 5517
Credit: 528,817,460
RAC: 242
United States
Message 1870683 - Posted: 2 Jun 2017, 16:02:38 UTC - in response to Message 1870674.  

OK, during the downtime we had a discussion over on the BOINC website, and I learned that using 1024 was the correct way to calculate the values: 1024 is the factor used for memory units, as opposed to the 1000 of everyday math, so it makes more sense now. I can't go back and correct my previous post due to the time that has passed, but for anyone reading along, know that 1024 is the number to use.
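To see why the factor matters, here are the three ways of dividing the same byte count side by side. This is just the arithmetic from the posts above, written as a short Python snippet.

```python
n_bytes = 3221225472

decimal_mb = n_bytes / 1000 / 1000  # powers of ten, as in everyday math
mixed = n_bytes / 1000 / 1024       # the mixed calculation from the earlier post
binary_mb = n_bytes / 1024 / 1024   # powers of two, as used for memory sizes

print(decimal_mb)  # about 3221.2
print(mixed)       # about 3145.7
print(binary_mb)   # 3072.0
```

Only the 1024-based figure comes out as a round number, which matches the 3072 MB quoted for the card.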
ID: 1870683 · Report as offensive
Profile Darrell
Volunteer tester
Joined: 14 Mar 03
Posts: 267
Credit: 1,418,681
RAC: 0
United States
Message 1870688 - Posted: 2 Jun 2017, 16:18:06 UTC - in response to Message 1870672.  
Last modified: 2 Jun 2017, 16:28:44 UTC



If you are converting from bytes to MB, shouldn't it be

3221225472/1000 = 3221225.472

Then

3221225.472/1024 = 3145.728

Where does the value of 1024 come from? Sorry for so many questions, just trying to understand these calculations.

Zalster


I studied it in my computer courses at Everest College, but here is a link showing the powers of two:

https://www.computerhope.com/issues/chspace.htm
ID: 1870688 · Report as offensive
 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.