Best tuning for 1080ti and Process Lasso use.

kittyman
Volunteer tester
Joined: 9 Jul 00
Posts: 51468
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1921058 - Posted: 25 Feb 2018, 17:41:43 UTC
Last modified: 25 Feb 2018, 18:00:07 UTC

Well, I have 2 identical 1080ti's. And I do not understand what all of those settings mean and do.
So if I should change some of them back, or if you have any better settings for me to try, by all means please let me know. I am pretty good with the hardware, but not mathematics.

Meow?

EDIT....
I set wg back to 256.
"Freedom is just Chaos, with better lighting." Alan Dean Foster

Keith Myers
Volunteer tester
Joined: 29 Apr 01
Posts: 13161
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1921064 - Posted: 25 Feb 2018, 18:23:45 UTC

All the values Marco posted are valid except for the bn memory bank value, I think. I will have to do some research into how the memory architecture of the 1080Ti is laid out. Maybe Marco already did that and the 1080Ti actually has that many memory banks. It does have 3GB more than any other Nvidia card.

This tuning would be valid.

<cmdline>-sbs 2048 -period_iterations_num 1 -tt 1500 -high_perf -spike_fft_thresh 4096 -tune 1 64 1 4 -oclfft_tune_gr 256 -oclfft_tune_lr 16 -oclfft_tune_wg 512 -oclfft_tune_ls 512 -oclfft_tune_bn 64 -pref_wg_num_per_cu 6 -oclfft_tune_cw 256 -high_prec_timer</cmdline>

Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)

Marco Vandebergh ( SETI orphan )
Joined: 27 Aug 10
Posts: 39
Credit: 12,630,994
RAC: 9
Netherlands
Message 1921065 - Posted: 25 Feb 2018, 18:33:52 UTC - in response to Message 1921052.  
Last modified: 25 Feb 2018, 18:56:58 UTC

What is the difference between 1080Ti chip spec ( Max work group size: 1024 ) and the workgroup size mentioned in SOG app?
It may clear things up.

When I enter a WG size of 1024, work units don't start, so that is indeed useless.

And -pref_wg_num_per_cu 6 does increase performance and GPU load; test it out. Values higher than 6 are slower, and so are values lower than 6.

That is on my machine: 1 WU per card, Windows 10 x64 with all updates.



My stderr output:

<core_client_version>7.8.3</core_client_version>
<![CDATA[
<stderr_txt>
Maximum single buffer size set to:2048MB
SpikeFind FFT size threshold override set to:4096
TUNE: kernel 1 now has workgroup size of (64,1,4)
oclFFT global radix override set to:256
oclFFT local radix override set to:16
oclFFT max WG size override set to:512
oclFFT max local FFT size override set to:512
oclFFT number of local memory banks set to:256
oclFFT minimal memory coalesce width set to:256
Preferred workgroups number per compute unit set to 6.
Number of period iterations for PulseFind set to:1
System timer will be set in high resolution mode
High-performance path selected. If GUI lags occur consider to remove -high_perf option from tuning line
Target kernel sequence time set to 1500ms
Priority of worker thread raised successfully
Priority of process adjusted successfully, high priority class used
OpenCL platform detected: NVIDIA Corporation
BOINC assigns device 0
Info: BOINC provided OpenCL device ID used
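
The echo lines in this output can be checked mechanically against the command line. Below is a minimal Python sketch, assuming the "set to" wording shown in this excerpt stays the same between app versions; the helper name `parse_overrides` is made up for illustration and is not part of the SoG app or BOINC.

```python
import re

# A few echo lines copied from the stderr output above.
STDERR_EXCERPT = """\
Maximum single buffer size set to:2048MB
SpikeFind FFT size threshold override set to:4096
oclFFT global radix override set to:256
oclFFT local radix override set to:16
oclFFT max WG size override set to:512
oclFFT max local FFT size override set to:512
oclFFT number of local memory banks set to:256
oclFFT minimal memory coalesce width set to:256
Preferred workgroups number per compute unit set to 6.
Number of period iterations for PulseFind set to:1
"""

# Matches "<description> set to:<number>" as well as "<description> set to <number>".
SET_TO = re.compile(r"^(?P<name>.+?) set to[: ]\s*(?P<value>\d+)", re.MULTILINE)


def parse_overrides(stderr_text):
    """Return {description: integer value} for every 'set to' echo line."""
    return {m["name"]: int(m["value"]) for m in SET_TO.finditer(stderr_text)}


overrides = parse_overrides(STDERR_EXCERPT)
for name, value in overrides.items():
    print(f"{name}: {value}")
```

Run on the excerpt, this reports the local memory banks value as 256, so the app at least accepted and echoed that setting; whether it is a sensible value for the card is a separate question.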


Post edit:

If I were Kittyman, I would run 1 WU per card with the settings I use and let it crunch for a while, to see if it's better or not.

If there are things to improve, I'm all ears. It's all a learning process.

Keith Myers
Message 1921077 - Posted: 25 Feb 2018, 19:22:32 UTC - in response to Message 1921065.  

Well your stderr.txt output doesn't complain about any invalid values. But I can't remember if the SoG app has any internal error reporting functions. So hard to say whether the memory bank = 256 is valid or not.

I also can't find any information on how the memory banks are allocated on the card. All I can find is that it has eleven 32-bit memory controllers. I can't deduce the number of banks from that information alone.

Keith Myers
Message 1921078 - Posted: 25 Feb 2018, 19:25:16 UTC

> When I enter a WG size of 1024, work units don't start, so that is indeed useless.

Good to know. From the docs it said up to 2048 in 32kb increments was viable.

kittyman
Message 1921079 - Posted: 25 Feb 2018, 19:26:02 UTC - in response to Message 1921064.  

> All the values Marco posted are valid except for the bn memory bank value, I think. I will have to do some research into how the memory architecture of the 1080Ti is laid out. Maybe Marco already did that and the 1080Ti actually has that many memory banks. It does have 3GB more than any other Nvidia card.
>
> This tuning would be valid.
>
> <cmdline>-sbs 2048 -period_iterations_num 1 -tt 1500 -high_perf -spike_fft_thresh 4096 -tune 1 64 1 4 -oclfft_tune_gr 256 -oclfft_tune_lr 16 -oclfft_tune_wg 512 -oclfft_tune_ls 512 -oclfft_tune_bn 64 -pref_wg_num_per_cu 6 -oclfft_tune_cw 256 -high_prec_timer</cmdline>

And this is what I have.
-v 0 -tt 1500 -period_iterations_num 1 -high_perf -high_prec_timer -sbs 2048 -spike_fft_thresh 4096 -tune 1 64 1 4 -oclfft_tune_gr 256 -oclfft_tune_lr 16 -oclfft_tune_wg 256 -oclfft_tune_ls 512 -oclfft_tune_bn 256 -oclfft_tune_cw 256

I don't think anybody ever mentioned that wg per cpu bit to me before.
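
For anyone comparing the two lines by eye, here is a small Python sketch that lists only the options that differ. The helper names `parse_cmdline` and `diff_cmdlines` are made up for illustration, and the parser assumes every option starts with `-` followed by a letter and that all values are plain numbers, which holds for the two lines in this post.

```python
def parse_cmdline(cmdline):
    """Split an app command line into {flag: [values]}."""
    options = {}
    flag = None
    for token in cmdline.split():
        # A flag starts with '-' followed by a letter; anything else is a value.
        if token.startswith("-") and token[1:2].isalpha():
            flag = token
            options[flag] = []
        elif flag is not None:
            options[flag].append(token)
    return options


def diff_cmdlines(a, b):
    """Return {flag: (values_in_a, values_in_b)} for differing flags; None means absent."""
    pa, pb = parse_cmdline(a), parse_cmdline(b)
    return {
        flag: (pa.get(flag), pb.get(flag))
        for flag in sorted(set(pa) | set(pb))
        if pa.get(flag) != pb.get(flag)
    }


suggested = (
    "-sbs 2048 -period_iterations_num 1 -tt 1500 -high_perf -spike_fft_thresh 4096 "
    "-tune 1 64 1 4 -oclfft_tune_gr 256 -oclfft_tune_lr 16 -oclfft_tune_wg 512 "
    "-oclfft_tune_ls 512 -oclfft_tune_bn 64 -pref_wg_num_per_cu 6 "
    "-oclfft_tune_cw 256 -high_prec_timer"
)
current = (
    "-v 0 -tt 1500 -period_iterations_num 1 -high_perf -high_prec_timer -sbs 2048 "
    "-spike_fft_thresh 4096 -tune 1 64 1 4 -oclfft_tune_gr 256 -oclfft_tune_lr 16 "
    "-oclfft_tune_wg 256 -oclfft_tune_ls 512 -oclfft_tune_bn 256 -oclfft_tune_cw 256"
)

differences = diff_cmdlines(suggested, current)
for flag, (in_suggested, in_current) in differences.items():
    print(f"{flag}: suggested={in_suggested} current={in_current}")
```

The order of options does not matter to this comparison, only which flags are present and what values follow them.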
"Freedom is just Chaos, with better lighting." Alan Dean Foster

Keith Myers
Message 1921080 - Posted: 25 Feb 2018, 19:34:50 UTC

> I don't think anybody ever mentioned that wg per cpu bit to me before.

It's work groups per CU (compute unit). In other words, per SM (streaming multiprocessor), of which the 1080Ti has 28.

The -pref_wg_num_per_cu 6 parameter seems to be useful only when used with the -period_iterations_num 1 parameter and high -sbs N values. It's explained in the SoG doc file.
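
As a rough worked example in Python, assuming the option simply asks for that many work groups on each compute unit and that the total scales linearly with the number of compute units (an assumption for illustration, not something confirmed by the docs quoted in this thread):

```python
compute_units = 28       # SM count for the 1080Ti, as stated above
pref_wg_num_per_cu = 6   # value from the tuning line

# Assumption: the preferred total is the per-CU figure times the number of CUs.
preferred_work_groups = compute_units * pref_wg_num_per_cu
print(preferred_work_groups)  # 168
```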

Keith Myers
Message 1921085 - Posted: 25 Feb 2018, 20:03:00 UTC

I just tried the Marco tuning. The 1080Ti liked it by about 4-6 seconds. The 1070s got penalized by about 20 seconds. So I have returned to the suggested tuning, except that I am now experimenting with the memory coalesce value at 128. I saw it help in the past on Arecibo tasks, so now I am testing to see if it works on BLC tasks.

Keith Myers
Message 1921113 - Posted: 25 Feb 2018, 21:14:09 UTC

As I expected, memory coalesce really did nothing much for me. Now testing wg size of 512 by itself.

Keith Myers
Message 1921136 - Posted: 25 Feb 2018, 22:07:47 UTC

I keep coming back to the default tunings suggested in the SoG docs file as the ones that work best for my mixed-card system, plus the well-known and established high performance tuning parameters.

I wondered if the docs were still relevant since the Nvidia card choices that Raistmer has in his farm are an old GTX 570 and a laptop GTX 940M. I assume he tested the higher performance suggested tunings with beta testers with relevant cards. But the card choices current at the time of the published docs are very far away from what is currently available.

It is good to continue testing of possible parameters with the current crop of Nvidia cards to see if any more performance can be extracted out of the application.

Al
Joined: 3 Apr 99
Posts: 1682
Credit: 477,343,364
RAC: 482
United States
Message 1921172 - Posted: 25 Feb 2018, 23:45:04 UTC

Keith, do you think the tuning parameters Mark is using would work with my newly relaunched Atom rig? I am of course going to dedicate all the CPU horsepower to feeding the GPU. Thoughts? This stuff boggles my mind a bit when I try to dig into it and understand it, hence asking those who seem to get it.

Keith Myers
Message 1921183 - Posted: 26 Feb 2018, 0:09:37 UTC - in response to Message 1921172.  

Hey Al, that would work to start out. Mark was messing around with some of the parameters that Marco showed for his 1080Ti. I think you might get into trouble with the CPU not having enough horsepower to run the GPU and also take care of the desktop and user input. I have had no experience at all with Intel CPUs for 25 years. This is the last tuning that Mark posted.

-v 0 -tt 1500 -period_iterations_num 1 -high_perf -high_prec_timer -sbs 2048 -spike_fft_thresh 4096 -tune 1 64 1 4 -oclfft_tune_gr 256 -oclfft_tune_lr 16 -oclfft_tune_wg 256 -oclfft_tune_ls 512 -oclfft_tune_bn 256 -oclfft_tune_cw 256

You can drop the -v 0 at the beginning since verbose level 0 is the default anyway. He must have used higher verbosity levels in the past, which is why it is still there.

Al
Message 1921188 - Posted: 26 Feb 2018, 0:14:58 UTC - in response to Message 1921183.  
Last modified: 26 Feb 2018, 0:18:48 UTC

Thanks Keith. I just opened up the gates for tasks to that lil guy. Once the downloads settle down, I will add that line to it and see what happens. This should be interesting to say the least: one of the lowest powered Intel CPUs handling one of the most powerful GPUs available today. Crazy, I know, but that's what makes it fun! :-)

*Edit* Well, after a couple of tries, it got about 20 tasks. When it started running, it said .04 CPU and 1 GPU. Will the new config file parameters correct that to one CPU core per GPU task?

juan BFP
Volunteer tester
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 1921190 - Posted: 26 Feb 2018, 0:20:36 UTC - in response to Message 1921188.  

I would just add: keep an eye on the CPU temperature. Atoms normally do not have very sophisticated cooling systems, since they usually run cool.

kittyman
Message 1921208 - Posted: 26 Feb 2018, 0:50:37 UTC - in response to Message 1921183.  

> Hey Al, that would work to start out. Mark was messing around with some of the parameters that Marco showed for his 1080Ti. I think you might get into trouble with the CPU not having enough horsepower to run the GPU and also take care of the desktop and user input. I have had no experience at all with Intel CPUs for 25 years. This is the last tuning that Mark posted.
>
> -v 0 -tt 1500 -period_iterations_num 1 -high_perf -high_prec_timer -sbs 2048 -spike_fft_thresh 4096 -tune 1 64 1 4 -oclfft_tune_gr 256 -oclfft_tune_lr 16 -oclfft_tune_wg 256 -oclfft_tune_ls 512 -oclfft_tune_bn 256 -oclfft_tune_cw 256
>
> You can drop the -v 0 at the beginning since verbose level 0 is the default anyway. He must have used higher verbosity levels in the past, which is why it is still there.

I thought I read in the SOG notes that 1 was the default.
And I am currently running on the command line that Marco posted. It did seem a few seconds faster, but I don't have any benchmark program set up to actually test that.
"Freedom is just Chaos, with better lighting." Alan Dean Foster

Al
Message 1921230 - Posted: 26 Feb 2018, 1:45:05 UTC

Huh, I just went to add those commands and did a search for the app_config.xml because I couldn't seem to find it, and for good reason. It wasn't there. Shouldn't that have been created when the Lunatics installer was run?

juan BFP
Message 1921234 - Posted: 26 Feb 2018, 1:54:23 UTC - in response to Message 1921230.  

> Huh, I just went to add those commands and did a search for the app_config.xml because I couldn't seem to find it, and for good reason. It wasn't there. Shouldn't that have been created when the Lunatics installer was run?

No, you need to create it yourself using a text editor.

Keith Myers
Message 1921240 - Posted: 26 Feb 2018, 2:04:17 UTC
Last modified: 26 Feb 2018, 2:09:01 UTC

App_config is user created. Neither the project nor Lunatics creates one. The .04 CPU and 1 GPU is just the default CPU/GPU resource figure. It is only there for the scheduler to determine scheduling needs. It has no bearing on how the application uses the CPU. The SoG app uses one full CPU core to support each GPU task. Normally, we add that 1 CPU, 1 GPU setting to the app_config file to make sure the scheduler doesn't overcommit our CPU resources.
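
For anyone who would rather generate the file than type it, here is a hedged Python sketch that builds an app_config.xml with the 1 CPU, 1 GPU reservation and a command line. The element names follow the general BOINC app_config.xml layout; the app name `setiathome_v8` and plan class `opencl_nvidia_SoG` are assumptions and should be checked against what your own client reports, and `build_app_config` is a made-up helper name. The command line is the one Mark posted earlier in the thread.

```python
import xml.etree.ElementTree as ET

APP_NAME = "setiathome_v8"        # assumption: check the app name your client reports
PLAN_CLASS = "opencl_nvidia_SoG"  # assumption: check the plan class of your SoG app
CMDLINE = (
    "-v 0 -tt 1500 -period_iterations_num 1 -high_perf -high_prec_timer -sbs 2048 "
    "-spike_fft_thresh 4096 -tune 1 64 1 4 -oclfft_tune_gr 256 -oclfft_tune_lr 16 "
    "-oclfft_tune_wg 256 -oclfft_tune_ls 512 -oclfft_tune_bn 256 -oclfft_tune_cw 256"
)


def build_app_config(app_name, plan_class, cmdline, cpu_usage=1.0, gpu_usage=1.0):
    """Build an <app_config> tree reserving cpu_usage CPUs and gpu_usage GPUs per task."""
    root = ET.Element("app_config")

    # Resource reservation used by the scheduler for GPU tasks of this app.
    app = ET.SubElement(root, "app")
    ET.SubElement(app, "name").text = app_name
    gpu_versions = ET.SubElement(app, "gpu_versions")
    ET.SubElement(gpu_versions, "gpu_usage").text = str(gpu_usage)
    ET.SubElement(gpu_versions, "cpu_usage").text = str(cpu_usage)

    # Command line passed to the matching app version.
    app_version = ET.SubElement(root, "app_version")
    ET.SubElement(app_version, "app_name").text = app_name
    ET.SubElement(app_version, "plan_class").text = plan_class
    ET.SubElement(app_version, "cmdline").text = cmdline
    return root


root = build_app_config(APP_NAME, PLAN_CLASS, CMDLINE)
if hasattr(ET, "indent"):  # pretty-printing helper, available from Python 3.9
    ET.indent(root)
xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)
```

The printed text would then be saved as app_config.xml in the project's folder under the BOINC data directory. The sketch only prints it, so nothing on disk is touched until you decide the names are right for your setup.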

For @kittyman. I compared my stderr.txt gpu output to one of mine with no verbosity statement and the output is identical. So that would indicate that verbosity 0 is the default. I would have to reread the docs to make sure.

[Edit]
-v N :sets level of verbosity of app. N - integer number.  Default corresponds to -v 1. 
    -v 0 disables almost all output.
    Levels from 2 to 5 reserved for increasing verbosity, higher levels reserved for specific usage.
    -v 2 enables all signals output.
    -v 6 enables delays printing where sleep loops used.
    -v 7 enables oclFFT config printing for oclFFT fine tune.
    -v 8 prints kernel launch configuration for PulseFind algorithm

I guess default is 1. But I don't see any difference in stderr.txt output in our tasks. If verbose = 0 produces almost no output, I don't really see what is dropped.

kittyman
Message 1921242 - Posted: 26 Feb 2018, 2:07:29 UTC - in response to Message 1921240.  
Last modified: 26 Feb 2018, 2:12:15 UTC

The following is from the SOG notes.............

-v N :sets level of verbosity of app. N - integer number. Default corresponds to -v 1.
-v 0 disables almost all output.
Levels from 2 to 5 reserved for increasing verbosity, higher levels reserved for specific usage.
-v 2 enables all signals output.
-v 6 enables delays printing where sleep loops used.
-v 7 enables oclFFT config printing for oclFFT fine tune.
-v 8 prints kernel launch configuration for PulseFind algorithm

That does not mean it is 100% accurate, or that 0 and 1 are not very close to the same thing.
But that is where I read it.

And, since I am running an older version of Boinc, which does not support an app_config.xml file, my command line is in the mb_cmdline_win_x86_SSE3_OpenCL_NV_SoG.txt file, which I would think you might have as well. It works very well and can be changed on the fly without rebooting Boinc. I do not know if the app_config.xml file works on the fly or not.
"Freedom is just Chaos, with better lighting." Alan Dean Foster

Keith Myers
Message 1921243 - Posted: 26 Feb 2018, 2:15:25 UTC

I don't think it is recognizing the parameter. If it was, it should echo back the setting in the output. I know that you reread the app_config file since it picked up the new Preferred workgroups number per compute unit set to 6 setting.