Vega Frontier Edition - MB Options Tuning
Author | Message |
---|---|
RueiKe Joined: 14 Feb 16 Posts: 492 Credit: 378,512,430 RAC: 785 |
I have decided to take some time on my new system to study the effects of MB command line options on compute performance. I will be starting with app revision r3584 and will use a newer long version guppi WU. WU: blc04_2bit_blc04_guppi_57898_17662_DIAG_KIC8462852_OFF_0020.12892.818.17.26.125.vlar Initial Command Options: MB8_win_x86_SSE2_OpenCL_ATi_HD5_r3584.exe -v 1 -instances_per_device 1 -sbs 1024 -period_iterations_num 1 -tt 500 -spike_fft_thresh 4096 -tune 1 64 1 4 -oclfft_tune_gr 256 -oclfft_tune_lr 16 -oclfft_tune_wg 256 -oclfft_tune_ls 512 -oclfft_tune_bn 64 -oclfft_tune_cw 64 -hp -high_perf -no_use_sleep -no_defaults_scaling The results obtained are likely to be relevant for only the VegaFE or similar GPU. Any suggestions on the approach taken are welcome. GitHub: Ricks-Lab Instagram: ricks_labs |
RueiKe Joined: 14 Feb 16 Posts: 492 Credit: 378,512,430 RAC: 785 |
Here are my first DOE results exploring the effects of tt and sbs. It also includes a test of the effect of -no_use_sleep and -no_defaults_scaling options. GitHub: Ricks-Lab Instagram: ricks_labs |
Mike Joined: 17 Feb 01 Posts: 34258 Credit: 79,922,639 RAC: 80 |
You need to check your results better, Rick. "TUNE:incorrect tune params: 1 (128,1,4)": that would be a work group size of 512 (128 x 1 x 4), and your GPU only supports 256. One needs to understand the params first to make such tests. oclfft_tune_bn 128 would mean 128 memory banks, whilst the GPU only has 64. With each crime and every kindness we birth our future. |
RueiKe Joined: 14 Feb 16 Posts: 492 Credit: 378,512,430 RAC: 785 |
Mike wrote: "You need to check your results better Rick." Thanks for the feedback. This one was only a sensitivity analysis, using + and - for each nominal. It was what I could do without understanding the parameters. I think it is useful in that it does show that -tune is not optimal. Ideally, a DOE designed with a full understanding of the parameters would be great. If you could help me understand how to better plan the DOE, or point me to where I could get additional details, I am willing to dedicate system time to fully explore all parameters. I would especially like to explore -tune. GitHub: Ricks-Lab Instagram: ricks_labs |
Mike Joined: 17 Feb 01 Posts: 34258 Credit: 79,922,639 RAC: 80 |
There is no such thing as optimal. I provide the best settings in the read me, which of course can be optimized, especially with kernel target time tuning. But you need to understand that each GPU/host combination reacts a little differently to tuning params. You will also get different results with each different task you test. Just test the same task on different days and you will get slightly different results. It took me months to understand oclfft tuning; you can just try to fine tune some params. Each test will be different with a different type of task, i.e. guppi or Arecibo, and angle range. The tune switch is rather easy. -tune 1 64 1 4 means kernel 1 will be split into chunks of 64 x 1 x 4 = WG size 256. So a big number of combinations is possible: 1 1 16, 1 2 16, 1 3 16, 1 4 16, ... 128 2 1, or whatever; it must not be bigger than 256 for AMD. Maybe your card acts better with big chunks, like 1 128 1 2 or 1 1 2 128 or alike. Needless to say, I have tested them all and provide the best config already. You won't find anything new in this case. Also I have to admit that your times are very good already, you just still use the wrong -tt value. With each crime and every kindness we birth our future. |
Brent Norman Joined: 1 Dec 99 Posts: 2786 Credit: 685,657,289 RAC: 835 |
Mike wrote: "Also I have to admit that your times are very good already" I had to go take a look at my times. My 1060s and 980 are in that time frame, but they are 'sauced' up. I know it's not apples to apples, but my cards are a whole lot cheaper, and so is the OS :D |
Shaggie76 Joined: 9 Oct 09 Posts: 282 Credit: 271,858,118 RAC: 196 |
Fascinating analysis! Thank you! |
RueiKe Joined: 14 Feb 16 Posts: 492 Credit: 378,512,430 RAC: 785 |
Brent Norman wrote (replying to Mike's "Also I have to admit that your times are very good already"): "I had to go take a look at my times. My 1060s and 980 are in that time frame, but they are 'sauced' up. I know it's not apples to apples, but my cards are a whole lot cheaper, and so is the OS :D" Yep, this build is certainly not cost effective for SETI. It is actually my main workstation and just does SETI/LHC part time. GitHub: Ricks-Lab Instagram: ricks_labs |
RueiKe Joined: 14 Feb 16 Posts: 492 Credit: 378,512,430 RAC: 785 |
Shaggie76 wrote: "Fascinating analysis! Thank you!" Thanks for the feedback! Still lots of work to do. Each DOE takes many hours to run. I am running a 3 parameter interaction DOE now and it is looking interesting. I should be able to post those results in my morning. GitHub: Ricks-Lab Instagram: ricks_labs |
RueiKe Joined: 14 Feb 16 Posts: 492 Credit: 378,512,430 RAC: 785 |
Here are the results for period_iterations_num vs pref_wg_size vs pref_wg_num_per_cu. GitHub: Ricks-Lab Instagram: ricks_labs |
RueiKe Joined: 14 Feb 16 Posts: 492 Credit: 378,512,430 RAC: 785 |
Here is my first look at the "-tune" parameter. I need some help understanding what is going on, but I suspect that tuning kernel=1 is the only setting that has any effect. The runs that tune other kernels leave kernel=1 unspecified, and they are probably the best results because the current tune parameters for kernel=1 are not optimized for my GPU. I plan a follow-up DOE focusing on more cells with higher Mz values for kernel=1. Let me know of any recommendations. GitHub: Ricks-Lab Instagram: ricks_labs |
Raistmer Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
As I recall, a few -tune lines can be provided, one for each kernel. But I am not sure more than 1 is implemented for MB. SETI apps news We're not gonna fight them. We're gonna transcend them. |
RueiKe Joined: 14 Feb 16 Posts: 492 Credit: 378,512,430 RAC: 785 |
Raistmer wrote: "As I recall few -tune lines can be provided one for each kernel. But not sure more than 1 implemented for MB." Thanks for the feedback. It looks like the application is accepting multiple kernel settings, but with no effect for those above 1. GitHub: Ricks-Lab Instagram: ricks_labs |
RueiKe Joined: 14 Feb 16 Posts: 492 Credit: 378,512,430 RAC: 785 |
Here is my next level look into "-tune" parameters. I focused on large Mz values based on the results of the first DOE. Here is a sample of the command line arguments used in the BenchCfg file: MB8_win_x86_SSE2_OpenCL_ATi_HD5_r3584.exe -v 1 -instances_per_device 1 -sbs 2048 -period_iterations_num 1 -tt 500 -spike_fft_thresh 4096 -oclfft_tune_gr 256 -oclfft_tune_lr 16 -oclfft_tune_wg 256 -oclfft_tune_ls 512 -oclfft_tune_bn 64 -oclfft_tune_cw 64 -hp -high_perf -no_defaults_scaling -pref_wg_size 256 -pref_wg_num_per_cu 4 -tune 1 1 1 256 GitHub: Ricks-Lab Instagram: ricks_labs |
RueiKe Joined: 14 Feb 16 Posts: 492 Credit: 378,512,430 RAC: 785 |
Here are the results of a verification run on optimization results so far using 3 conditions: (1) Original Command Line Options: MB8_win_x86_SSE2_OpenCL_ATi_HD5_r3584.exe -v 1 -instances_per_device 1 -sbs 1024 -period_iterations_num 1 -tt 500 -no_defaults_scaling -spike_fft_thresh 4096 -tune 1 64 1 4 -oclfft_tune_gr 256 -oclfft_tune_lr 16 -oclfft_tune_wg 256 -oclfft_tune_ls 512 -oclfft_tune_bn 64 -oclfft_tune_cw 64 -hp -high_perf -no_use_sleep (2) Optimized Command Line Options: MB8_win_x86_SSE2_OpenCL_ATi_HD5_r3584.exe -v 1 -instances_per_device 1 -sbs 2048 -period_iterations_num 1 -tt 500 -spike_fft_thresh 4096 -oclfft_tune_gr 256 -oclfft_tune_lr 16 -oclfft_tune_wg 256 -oclfft_tune_ls 512 -oclfft_tune_bn 64 -oclfft_tune_cw 64 -hp -high_perf -no_defaults_scaling -pref_wg_size 256 -pref_wg_num_per_cu 4 -tune 1 4 1 64 (3) Optimized without -tune Command Line Options: MB8_win_x86_SSE2_OpenCL_ATi_HD5_r3584.exe -v 1 -instances_per_device 1 -sbs 2048 -period_iterations_num 1 -tt 500 -spike_fft_thresh 4096 -oclfft_tune_gr 256 -oclfft_tune_lr 16 -oclfft_tune_wg 256 -oclfft_tune_ls 512 -oclfft_tune_bn 64 -oclfft_tune_cw 64 -hp -high_perf -pref_wg_size 256 -pref_wg_num_per_cu 4 Seems like the optimization on a guppi unit resulted in a degradation for Arecibo WUs. I will redo the optimization DOEs using both types of WUs. Let me know of any other recommendations. GitHub: Ricks-Lab Instagram: ricks_labs |
Raistmer Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
As the processing chain depends on AR (angle range), it's recommended to use the PG* set of tasks for benchmarking. Maybe with the additional inclusion of a GUPPI VLAR. SETI apps news We're not gonna fight them. We're gonna transcend them. |
RueiKe Joined: 14 Feb 16 Posts: 492 Credit: 378,512,430 RAC: 785 |
Thanks for the recommendation. Is that the set of 4 WUs beginning with PG that was downloaded with MB_Bench? I will give it a try after my current effort. Currently, I am using the most degraded Arecibo WU with the most improved guppi to find the best condition. After this, I plan to analyze the SOG version of r3584. Does SOG change the strategy for optimization? GitHub: Ricks-Lab Instagram: ricks_labs |
RueiKe Joined: 14 Feb 16 Posts: 492 Credit: 378,512,430 RAC: 785 |
Here is the first combined Arecibo/Green Bank DOE. SBS vs TT shows no significant difference in optimal conditions. GitHub: Ricks-Lab Instagram: ricks_labs |
Mike Joined: 17 Feb 01 Posts: 34258 Credit: 79,922,639 RAC: 80 |
RueiKe wrote: "After this, I plan to analyze the SOG version of r3584. Does SOG change the strategy for optimization?" No, in my tests SoG was always slower on AMD GPUs, but you have got a much faster GPU, so it is worth a try. With each crime and every kindness we birth our future. |
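
For readers following Mike's explanation of -tune earlier in the thread, here is a minimal Python sketch of the work-group arithmetic. It assumes only what the thread states: `-tune <kernel> <x> <y> <z>` splits the kernel into chunks of x * y * z work items, and that product must not be bigger than 256 on AMD. The function names and the `MAX_WG_SIZE` constant are hypothetical helpers for illustration; they are not part of the SETI MB application, and the limit may differ on other GPUs.

```python
MAX_WG_SIZE = 256  # AMD limit quoted in this thread; an assumption for any other GPU


def wg_size(x, y, z):
    """Work-group size implied by the three -tune dimensions."""
    return x * y * z


def is_valid_tune(x, y, z, limit=MAX_WG_SIZE):
    """True if every dimension is at least 1 and the product fits the limit."""
    return min(x, y, z) >= 1 and wg_size(x, y, z) <= limit


def full_size_shapes(limit=MAX_WG_SIZE):
    """All (x, y, z) shapes whose product is exactly `limit`."""
    divisors = [d for d in range(1, limit + 1) if limit % d == 0]
    return [
        (x, y, limit // (x * y))
        for x in divisors
        for y in divisors
        if limit % (x * y) == 0
    ]


def tune_option(kernel, x, y, z):
    """Format a -tune option in the form used in this thread."""
    if not is_valid_tune(x, y, z):
        raise ValueError(f"invalid work-group shape {x} x {y} x {z} (size {wg_size(x, y, z)})")
    return f"-tune {kernel} {x} {y} {z}"


if __name__ == "__main__":
    print(wg_size(64, 1, 4))          # size of the shape from the initial command line
    print(is_valid_tune(128, 1, 4))   # the shape rejected in the sensitivity run
    print(len(full_size_shapes()))    # how many shapes use the full work-group size
    print(tune_option(1, 4, 1, 64))   # option string for the shape in the "optimized" run
```

With these definitions, the rejected 128 x 1 x 4 setting works out to 512 and fails the check, while 64 x 1 x 4, 4 x 1 x 64, and 1 x 1 x 256 all come to exactly 256. Listing the candidate shapes up front like this is one way to build a DOE that only contains settings the application will accept.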