Best tuning for 1080ti and Process Lasso use.
Author | Message |
---|---|
kittyman Send message Joined: 9 Jul 00 Posts: 51468 Credit: 1,018,363,574 RAC: 1,004 |
Well, I have 2 identical 1080Tis, and I do not understand what all of those settings mean and do. So if I should change some of them back, or if you have any better settings for me to try, by all means please let me know. I am pretty good with the hardware, but not mathematics. Meow? EDIT.... I set wg back to 256. "Freedom is just Chaos, with better lighting." Alan Dean Foster |
Keith Myers Send message Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
All the values Marco posted are valid except, I think, for the bn memory bank value. I will have to do some research into how the memory architecture of the 1080Ti is laid out. Maybe Marco already did that and the 1080Ti actually has that many memory banks. It does have 3GB more than any other Nvidia card. This tuning would be valid: <cmdline>-sbs 2048 -period_iterations_num 1 -tt 1500 -high_perf -spike_fft_thresh 4096 -tune 1 64 1 4 -oclfft_tune_gr 256 -oclfft_tune_lr 16 -oclfft_tune_wg 512 -oclfft_tune_ls 512 -oclfft_tune_bn 64 -pref_wg_num_per_cu 6 -oclfft_tune_cw 256 -high_prec_timer</cmdline> Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
Marco Vandebergh ( SETI orphan ) Send message Joined: 27 Aug 10 Posts: 39 Credit: 12,630,994 RAC: 9 |
What is the difference between the 1080Ti chip spec (Max work group size: 1024) and the workgroup size mentioned in the SoG app? It may clear things up. When entering a WG size of 1024, workunits don't start, so that is useless indeed. And -pref_wg_num_per_cu 6 does increase performance and GPU load, test it out. Values higher or lower than 6 are both slower. On my machine that is, 1 WU on card, Windows 10 x64 with all updates. My stderr output: <core_client_version>7.8.3</core_client_version> <![CDATA[ <stderr_txt> Maximum single buffer size set to:2048MB SpikeFind FFT size threshold override set to:4096 TUNE: kernel 1 now has workgroup size of (64,1,4) oclFFT global radix override set to:256 oclFFT local radix override set to:16 oclFFT max WG size override set to:512 oclFFT max local FFT size override set to:512 oclFFT number of local memory banks set to:256 oclFFT minimal memory coalesce width set to:256 Preferred workgroups number per compute unit set to 6. Number of period iterations for PulseFind set to:1 System timer will be set in high resolution mode High-performance path selected. If GUI lags occur consider to remove -high_perf option from tuning line Target kernel sequence time set to 1500ms Priority of worker thread raised successfully Priority of process adjusted successfully, high priority class used OpenCL platform detected: NVIDIA Corporation BOINC assigns device 0 Info: BOINC provided OpenCL device ID used Post edit: If I were Kittyman, I would run 1 WU per card with the settings I use and let it fold for a while, to see if it's better or not. If there are things to improve I'm all ears, it's all a learning process. |
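A quick way to check which tuning values the app actually accepted is to pull the "set to" echoes out of the stderr text. The following is only a rough sketch in Python: the function name `echoed_settings` and the regular expression are my own, and the excerpt is copied from the stderr output quoted above with one message per line. It assumes every accepted setting is echoed in the "... set to:N" or "... set to N" form seen in that output.

```python
import re

# Excerpt of the stderr text quoted in the post above, one message per line.
STDERR_EXCERPT = """\
Maximum single buffer size set to:2048MB
SpikeFind FFT size threshold override set to:4096
oclFFT global radix override set to:256
oclFFT local radix override set to:16
oclFFT max WG size override set to:512
oclFFT max local FFT size override set to:512
oclFFT number of local memory banks set to:256
oclFFT minimal memory coalesce width set to:256
Preferred workgroups number per compute unit set to 6.
Number of period iterations for PulseFind set to:1
Target kernel sequence time set to 1500ms
"""

# Matches "<description> set to:<number>" as well as "<description> set to <number>".
ECHO_RE = re.compile(r"^(?P<name>.+?) set to:?\s*(?P<value>\d+)")


def echoed_settings(stderr_text):
    """Return {description: integer value} for every 'set to' echo found."""
    settings = {}
    for line in stderr_text.splitlines():
        match = ECHO_RE.match(line.strip())
        if match:
            settings[match.group("name")] = int(match.group("value"))
    return settings


if __name__ == "__main__":
    for name, value in echoed_settings(STDERR_EXCERPT).items():
        print(f"{value:>6}  {name}")
```

If a parameter from the command line has no matching echo, that is a hint (not proof) that the app did not recognize it.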
Keith Myers Send message Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
Well, your stderr.txt output doesn't complain about any invalid values. But I can't remember whether the SoG app has any internal error reporting functions, so it is hard to say whether memory banks = 256 is valid or not. I also can't find any information on how the memory banks are allocated on the card. All I can find is that it has eleven 32-bit memory controllers. I can't deduce the number of banks from that information alone. |
Keith Myers Send message Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
"When entering 1024 WG size workunits don't start, so that is useless indeed." Good to know. The docs said that up to 2048 in 32kb increments was viable. |
kittyman Send message Joined: 9 Jul 00 Posts: 51468 Credit: 1,018,363,574 RAC: 1,004 |
"All the values Marco posted are valid except for the bn memory bank value I think. I will have to do some research in how the memory architecture of the 1080Ti is laid out. Maybe Marco already did that and the 1080Ti actually has that many memory banks. It does have 3GB more than any other Nvidia card." And this is what I have: -v 0 -tt 1500 -period_iterations_num 1 -high_perf -high_prec_timer -sbs 2048 -spike_fft_thresh 4096 -tune 1 64 1 4 -oclfft_tune_gr 256 -oclfft_tune_lr 16 -oclfft_tune_wg 256 -oclfft_tune_ls 512 -oclfft_tune_bn 256 -oclfft_tune_cw 256 I don't think anybody ever mentioned that wg per cpu bit to me before. |
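Since the tuning lines in this thread list the same flags in different orders, it is easy to miss what actually differs. As a rough sketch only (the helper names `parse_tuning` and `diff_tunings` are made up for illustration), this Python snippet splits two of the command lines posted above into flags and values and reports the differences. It assumes a flag is any token that starts with "-" followed by a letter.

```python
# Tuning line suggested earlier in the thread as valid.
KEITH = ("-sbs 2048 -period_iterations_num 1 -tt 1500 -high_perf "
         "-spike_fft_thresh 4096 -tune 1 64 1 4 -oclfft_tune_gr 256 "
         "-oclfft_tune_lr 16 -oclfft_tune_wg 512 -oclfft_tune_ls 512 "
         "-oclfft_tune_bn 64 -pref_wg_num_per_cu 6 -oclfft_tune_cw 256 "
         "-high_prec_timer")

# Tuning line kittyman reports running.
KITTYMAN = ("-v 0 -tt 1500 -period_iterations_num 1 -high_perf "
            "-high_prec_timer -sbs 2048 -spike_fft_thresh 4096 "
            "-tune 1 64 1 4 -oclfft_tune_gr 256 -oclfft_tune_lr 16 "
            "-oclfft_tune_wg 256 -oclfft_tune_ls 512 -oclfft_tune_bn 256 "
            "-oclfft_tune_cw 256")


def parse_tuning(cmdline):
    """Split a tuning line into {flag: [values]}."""
    options = {}
    current = None
    for token in cmdline.split():
        # A flag is "-" followed by a letter; anything else is a value.
        if token.startswith("-") and token[1:2].isalpha():
            current = token
            options[current] = []
        elif current is not None:
            options[current].append(token)
    return options


def diff_tunings(a, b):
    """Return {flag: (values_in_a, values_in_b)}; None means the flag is absent."""
    pa, pb = parse_tuning(a), parse_tuning(b)
    return {
        flag: (pa.get(flag), pb.get(flag))
        for flag in sorted(set(pa) | set(pb))
        if pa.get(flag) != pb.get(flag)
    }


if __name__ == "__main__":
    for flag, (left, right) in diff_tunings(KEITH, KITTYMAN).items():
        print(flag, left, right)
```

For these two lines the differences come down to -oclfft_tune_bn, -oclfft_tune_wg, -pref_wg_num_per_cu and -v.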
Keith Myers Send message Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
"I don't think anybody ever mentioned that wg per cpu bit to me before." It's work groups per CU (compute unit), not per CPU. On Nvidia cards a compute unit corresponds to an SM (Streaming Multiprocessor), and the 1080Ti has 28 of them. The -pref_wg_num_per_cu 6 parameter seems to be useful only when used with the -period_iterations_num 1 parameter and high -sbs N values. It's explained in the SoG doc file. |
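To put a number on that: if the preferred workgroup count is simply the CU count times the -pref_wg_num_per_cu value, the arithmetic for the 1080Ti looks like the sketch below. That multiplication is my assumption about what the parameter implies; how the SoG app uses the value internally is not stated in this thread. The function name is made up for illustration.

```python
CU_COUNT_1080TI = 28   # compute units (SMs) on the 1080Ti, as stated above
PREF_WG_PER_CU = 6     # value passed to -pref_wg_num_per_cu


def preferred_workgroups(cu_count, wg_per_cu):
    """Assumed total: one batch of wg_per_cu workgroups for each compute unit."""
    return cu_count * wg_per_cu


if __name__ == "__main__":
    print(preferred_workgroups(CU_COUNT_1080TI, PREF_WG_PER_CU))  # 168
```

A card with fewer compute units would get a proportionally smaller total under the same assumption, which may be one reason a value tuned on one card does not carry over to another.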
Keith Myers Send message Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
I just tried the Marco tuning. The 1080Ti liked it by about 4-6 seconds. The 1070s got penalized by about 20 seconds. So I have returned to the suggested tuning, except that I am now experimenting with the memory coalesce value at 128. I saw it helped in the past on Arecibo tasks, so now I am testing to see if it works on BLC tasks. |
Keith Myers Send message Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
As I expected, memory coalesce really did nothing much for me. Now testing wg size of 512 by itself. |
Keith Myers Send message Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
I keep coming back to the default tunings suggested in the SoG docs file as working best for my mixed card system, plus the added, well known and established high performance tuning parameters. I wondered if the docs were still relevant, since the Nvidia cards that Raistmer has in his farm are an old GTX 570 and a laptop GTX 940M. I assume he tested the higher performance suggested tunings with beta testers who had relevant cards. But the cards current at the time the docs were published are very far from what is available now. It is good to keep testing possible parameters with the current crop of Nvidia cards to see if any more performance can be extracted from the application. |
Al Send message Joined: 3 Apr 99 Posts: 1682 Credit: 477,343,364 RAC: 482 |
Keith, do you think the tuning parameters Mark is using would work with my newly relaunched Atom rig? I am going to of course dedicate all the CPU horsepower towards feeding the GPU. Thoughts? This stuff boggles my mind a bit when I try to dig into it and understand it, hence asking those who seem to get it. |
Keith Myers Send message Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
Hey Al, that would work to start out. Mark was messing around with some of the parameters that Marco showed for his 1080Ti. I think you might get in trouble with the cpu not having enough horsepower to run the gpu and also take care of the desktop and user input. I have had no experience at all with Intel cpus for 25 years. This is the last tuning that Mark posted: -v 0 -tt 1500 -period_iterations_num 1 -high_perf -high_prec_timer -sbs 2048 -spike_fft_thresh 4096 -tune 1 64 1 4 -oclfft_tune_gr 256 -oclfft_tune_lr 16 -oclfft_tune_wg 256 -oclfft_tune_ls 512 -oclfft_tune_bn 256 -oclfft_tune_cw 256 You can drop the -v 0 at the beginning since verbose level 0 is the default anyway. He must have used higher verbose levels in the past, which is why it is still there. |
Al Send message Joined: 3 Apr 99 Posts: 1682 Credit: 477,343,364 RAC: 482 |
Thanks Keith. I just opened up the gates for tasks to that lil guy. Once the downloads settle down, I will add that line to it and see what happens. This should be interesting to say the least, one of the lowest powered Intel CPUs handling one of the most powerful GPUs available today. Crazy, I know, but that's what makes it fun! :-) *Edit* Well, after a couple of tries, it got about 20 tasks. When it started running, it said .04 CPU and 1 GPU. Will the new config file params correct that to be one CPU core for 1 GPU task? |
juan BFP Send message Joined: 16 Mar 07 Posts: 9786 Credit: 572,710,851 RAC: 3,799 |
I would just add: keep an eye on the CPU temp. Atoms normally do not have very sophisticated cooling systems, since they run cool. |
kittyman Send message Joined: 9 Jul 00 Posts: 51468 Credit: 1,018,363,574 RAC: 1,004 |
"Hey Al, that would work to start out. Mark was messing around with some of the parameters that Marco showed for his 1080Ti. I think you might get in trouble with the cpu not having enough horsepower to run the gpu and also take care of the desktop and user input experience. I have no experience at all with Intel cpu for 25 years. This is the last tuning that Mark posted." I thought I read in the SOG notes that 1 was the default. And I am currently running on the command line that Marco posted. It did seem a few seconds faster, but I don't have any benchmark program set up to actually test that. |
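Short of a proper benchmark setup, one rough way to judge "a few seconds faster" is to note the elapsed times of several finished tasks under each tuning and compare the averages against the spread. The sketch below is only that: the function name is made up, and the numbers in the lists are hypothetical placeholders, not measurements from anyone in this thread. Tasks should be of the same type (for example all BLC or all Arecibo), otherwise the comparison means little.

```python
from statistics import mean, stdev


def compare_runtimes(baseline_secs, candidate_secs):
    """Return (mean difference in seconds, rough noise estimate).

    A negative difference means the candidate tuning was faster on average.
    The noise estimate is the larger of the two sample standard deviations.
    """
    diff = mean(candidate_secs) - mean(baseline_secs)
    noise = max(stdev(baseline_secs), stdev(candidate_secs))
    return diff, noise


if __name__ == "__main__":
    # Hypothetical elapsed times in seconds; replace with your own task times.
    baseline = [252.0, 249.0, 255.0, 251.0]
    candidate = [246.0, 244.0, 249.0, 245.0]
    diff, noise = compare_runtimes(baseline, candidate)
    print(f"mean difference: {diff:+.2f} s, noise: {noise:.2f} s")
```

If the mean difference is not clearly larger than the noise figure, more tasks are needed before calling one tuning better than the other.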
Al Send message Joined: 3 Apr 99 Posts: 1682 Credit: 477,343,364 RAC: 482 |
Huh, just went to add those commands, and did a search for the app_config.xml, because I couldn't seem to find it, and for good reason. It wasn't there. Shouldn't that have been created when the Lunatics installer was run? |
juan BFP Send message Joined: 16 Mar 07 Posts: 9786 Credit: 572,710,851 RAC: 3,799 |
"Huh, just went to add those commands, and did a search for the app_config.xml, because I couldn't seem to find it, and for good reason. It wasn't there. Shouldn't that have been created when the Lunatics installer was ran?" No, you need to create it yourself using a text editor. |
Keith Myers Send message Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
App_config is user created. Neither the project nor Lunatics creates one. The .04 CPU and 1 GPU is just the default cpu/gpu resource figure. It is only there for the scheduler to determine scheduling needs and has no bearing on how the application uses the cpu. The SoG app uses one full cpu core to support each gpu task. Normally, we add that 1 cpu, 1 gpu setting to the app_config file to make sure the scheduler doesn't overcommit our cpu resources. For @kittyman: I compared your stderr.txt gpu output to one of mine with no verbosity statement and the output is identical. So that would indicate that verbosity 0 is the default. I would have to reread the docs to make sure. [Edit] From the docs: "-v N :sets level of verbosity of app. N - integer number. Default corresponds to -v 1. -v 0 disables almost all output. Levels from 2 to 5 reserved for increasing verbosity, higher levels reserved for specific usage. -v 2 enables all signals output. -v 6 enables delays printing where sleep loops used. -v 7 enables oclFFT config printing for oclFFT fine tune. -v 8 prints kernel launch configuration for PulseFind algorithm" I guess the default is 1. But I don't see any difference in stderr.txt output in our tasks. If verbose = 0 produces almost no output, I don't really see what is dropped. |
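For anyone who would rather generate the file than type it by hand, here is a rough Python sketch that builds an app_config.xml with the 1 cpu / 1 gpu setting and a tuning line. The element layout follows the BOINC app_config.xml format as I understand it. The app name `setiathome_v8` and the plan class `opencl_nvidia_SoG` are assumptions on my part, so check them against your own client_state.xml or app_info.xml before using anything like this. The function name is made up.

```python
import xml.etree.ElementTree as ET

# Last tuning line posted by Mark in this thread, without the optional -v 0.
CMDLINE = ("-tt 1500 -period_iterations_num 1 -high_perf -high_prec_timer "
           "-sbs 2048 -spike_fft_thresh 4096 -tune 1 64 1 4 "
           "-oclfft_tune_gr 256 -oclfft_tune_lr 16 -oclfft_tune_wg 256 "
           "-oclfft_tune_ls 512 -oclfft_tune_bn 256 -oclfft_tune_cw 256")


def build_app_config(app_name, plan_class, cmdline, cpu_usage=1.0, gpu_usage=1.0):
    """Return app_config.xml text reserving cpu_usage cores per GPU task."""
    root = ET.Element("app_config")

    # <app> block: tells the scheduler how much CPU/GPU each GPU task needs.
    app = ET.SubElement(root, "app")
    ET.SubElement(app, "name").text = app_name
    gpu_versions = ET.SubElement(app, "gpu_versions")
    ET.SubElement(gpu_versions, "gpu_usage").text = str(gpu_usage)
    ET.SubElement(gpu_versions, "cpu_usage").text = str(cpu_usage)

    # <app_version> block: passes the tuning line to the application.
    version = ET.SubElement(root, "app_version")
    ET.SubElement(version, "app_name").text = app_name
    ET.SubElement(version, "plan_class").text = plan_class
    ET.SubElement(version, "cmdline").text = cmdline

    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Both names below are assumptions; verify them on your own host.
    print(build_app_config("setiathome_v8", "opencl_nvidia_SoG", CMDLINE))
```

The output would go into a file named app_config.xml in the project's folder under the BOINC data directory, after which the client has to reread its config files.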
kittyman Send message Joined: 9 Jul 00 Posts: 51468 Credit: 1,018,363,574 RAC: 1,004 |
The following is from the SOG notes: "-v N :sets level of verbosity of app. N - integer number. Default corresponds to -v 1. -v 0 disables almost all output. Levels from 2 to 5 reserved for increasing verbosity, higher levels reserved for specific usage. -v 2 enables all signals output. -v 6 enables delays printing where sleep loops used. -v 7 enables oclFFT config printing for oclFFT fine tune. -v 8 prints kernel launch configuration for PulseFind algorithm" That does not mean it is 100% accurate, or that 0 and 1 are not very close to the same thing. But that is where I read it. And, since I am running an older version of Boinc, which does not support an app_config.xml file, my command line is in the mb_cmdline_win_x86_SSE3_OpenCL_NV_SoG.txt file, which I would think you might have as well. It works very well and can be changed on the fly without restarting Boinc. I do not know if the app_config.xml file works on the fly or not. |
Keith Myers Send message Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
I don't think it is recognizing the parameter. If it was, it should echo back the setting in the output. I know that you reread the app_config file since it picked up the new Preferred workgroups number per compute unit set to 6 setting. |