Vega Frontier Edition - MB Options Tuning

Profile HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 1895499 - Posted: 15 Oct 2017, 14:46:15 UTC - in response to Message 1895486.  

After this, I plan to analyze the SOG version of r3584. Does SOG change the strategy for optimization?


No, in my tests SoG was always slower on AMD GPUs, but you have a much faster GPU, so it is worth a try.

I found SoG to be faster than the HD app on my R9 390X.
Perhaps GPUs with a memory interface >256-bit take better advantage of the SoG app than of the HD app? If that is a factor, I would guess it is more about total memory bandwidth.
There is always the possibility something else in my system is having an effect too.
SETI@home classic workunits: 93,865 CPU time: 863,447 hours
Join the BP6/VP6 User Group: http://tinyurl.com/8y46zvu
ID: 1895499 · Report as offensive
Profile RueiKe Special Project $250 donor
Volunteer tester
Joined: 14 Feb 16
Posts: 492
Credit: 378,512,430
RAC: 785
Taiwan
Message 1895582 - Posted: 15 Oct 2017, 23:10:41 UTC - in response to Message 1895499.  

I will definitely give it a try. Almost done with the non-SOG version.
GitHub: Ricks-Lab
Instagram: ricks_labs
ID: 1895582 · Report as offensive
Profile RueiKe Special Project $250 donor
Volunteer tester
Joined: 14 Feb 16
Posts: 492
Credit: 378,512,430
RAC: 785
Taiwan
Message 1895584 - Posted: 15 Oct 2017, 23:20:40 UTC

Here is the first combined Arecibo/Green Bank DOE. PIN, PWS, and PWN show no significant difference in optimal conditions, but I found a second, different optimal condition for tuning.


GitHub: Ricks-Lab
Instagram: ricks_labs
ID: 1895584 · Report as offensive
Profile RueiKe Special Project $250 donor
Volunteer tester
Joined: 14 Feb 16
Posts: 492
Credit: 378,512,430
RAC: 785
Taiwan
Message 1895620 - Posted: 16 Oct 2017, 4:28:09 UTC

Here is the updated Tune DOE. In this case I used the 2 different PIN-PWS-PWG values identified in the previous DOE. Note that the previous graphic highlighted the wrong second condition: it should have been 1-64-32 instead of 1-64-16. The new optimized tune parameters are highlighted in blue. These should improve guppi without degrading Arecibo.


GitHub: Ricks-Lab
Instagram: ricks_labs
ID: 1895620 · Report as offensive
Profile RueiKe Special Project $250 donor
Volunteer tester
Joined: 14 Feb 16
Posts: 492
Credit: 378,512,430
RAC: 785
Taiwan
Message 1895624 - Posted: 16 Oct 2017, 4:55:49 UTC

Here are the results of a verification run of the Arecibo/Green Bank optimization compared to the original:

Original Command Line Options:
MB8_win_x86_SSE2_OpenCL_ATi_HD5_r3584.exe -v 1 -instances_per_device 1 -sbs 1024 -period_iterations_num 1 -tt 500 -no_defaults_scaling -spike_fft_thresh 4096 -tune 1 64 1 4 -oclfft_tune_gr 256 -oclfft_tune_lr 16 -oclfft_tune_wg 256 -oclfft_tune_ls 512 -oclfft_tune_bn 64 -oclfft_tune_cw 64 -hp -high_perf -no_use_sleep

Optimized Command Line Options:
MB8_win_x86_SSE2_OpenCL_ATi_HD5_r3584.exe -v 1 -instances_per_device 1 -sbs 2048 -period_iterations_num 1 -tt 500 -spike_fft_thresh 4096 -oclfft_tune_gr 256 -oclfft_tune_lr 16 -oclfft_tune_wg 256 -oclfft_tune_ls 512 -oclfft_tune_bn 64 -oclfft_tune_cw 64 -hp -high_perf -no_defaults_scaling -pref_wg_size 64 -pref_wg_num_per_cu 32 -tune 1 4 4 16
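
Since the two command lines are long, a small script can make the differences easier to see. The following is only a sketch: it assumes every option is a "-name" token followed by zero or more values, and the helper names (parse_options, diff_options) are made up for illustration. It says nothing about what each option does inside the app.

```python
import shlex

# The two command lines exactly as posted above.
ORIGINAL = (
    "MB8_win_x86_SSE2_OpenCL_ATi_HD5_r3584.exe -v 1 -instances_per_device 1 "
    "-sbs 1024 -period_iterations_num 1 -tt 500 -no_defaults_scaling "
    "-spike_fft_thresh 4096 -tune 1 64 1 4 -oclfft_tune_gr 256 "
    "-oclfft_tune_lr 16 -oclfft_tune_wg 256 -oclfft_tune_ls 512 "
    "-oclfft_tune_bn 64 -oclfft_tune_cw 64 -hp -high_perf -no_use_sleep"
)
OPTIMIZED = (
    "MB8_win_x86_SSE2_OpenCL_ATi_HD5_r3584.exe -v 1 -instances_per_device 1 "
    "-sbs 2048 -period_iterations_num 1 -tt 500 -spike_fft_thresh 4096 "
    "-oclfft_tune_gr 256 -oclfft_tune_lr 16 -oclfft_tune_wg 256 "
    "-oclfft_tune_ls 512 -oclfft_tune_bn 64 -oclfft_tune_cw 64 -hp "
    "-high_perf -no_defaults_scaling -pref_wg_size 64 "
    "-pref_wg_num_per_cu 32 -tune 1 4 4 16"
)


def parse_options(cmdline):
    """Map each -flag to the list of values following it (assumed layout)."""
    options = {}
    current = None
    for token in shlex.split(cmdline)[1:]:  # skip the executable name
        if token.startswith("-") and token[1:2].isalpha():
            current = token
            options[current] = []
        elif current is not None:
            options[current].append(token)
    return options


def diff_options(before, after):
    """Return {flag: (old, new)} for flags that differ; None means absent."""
    changed = {}
    for flag in sorted(set(before) | set(after)):
        old, new = before.get(flag), after.get(flag)
        if old != new:
            changed[flag] = (old, new)
    return changed


changes = diff_options(parse_options(ORIGINAL), parse_options(OPTIMIZED))
for flag, (old, new) in changes.items():
    print(flag, old, "->", new)
```

For the two lines above, this reports changed values for -sbs and -tune, the addition of -pref_wg_size and -pref_wg_num_per_cu, and the removal of -no_use_sleep. Everything else is the same, just in a different order.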



I have implemented these (with sbs reduced to 1028) on my Hexa-Nano (Fiji based system): 8091204
GitHub: Ricks-Lab
Instagram: ricks_labs
ID: 1895624 · Report as offensive
Profile Karsten Vinding
Volunteer tester

Joined: 18 May 99
Posts: 239
Credit: 25,201,931
RAC: 11
Denmark
Message 1895712 - Posted: 16 Oct 2017, 19:18:35 UTC - in response to Message 1895624.  

I really appreciate you taking the time to run all of these tests.

I don't know if your results are the absolute optimum settings, but I'll try running them on my RX480.

There are so many different options that it's hard to make heads or tails of optimizing things.
ID: 1895712 · Report as offensive
Profile RueiKe Special Project $250 donor
Volunteer tester
Joined: 14 Feb 16
Posts: 492
Credit: 378,512,430
RAC: 785
Taiwan
Message 1895813 - Posted: 17 Oct 2017, 5:35:45 UTC - in response to Message 1895712.  

I am not sure whether these Vega-optimized settings would be optimal for other GPU configurations. I am hoping they will at least be valid for 64CU AMD GPUs. I am working to validate that on a Fiji GPU. I am also testing them with the SoG version of the app. If I can get my hands on a 36CU GPU like the RX480, I will definitely give it a try.

GitHub: Ricks-Lab
Instagram: ricks_labs
ID: 1895813 · Report as offensive
Profile RueiKe Special Project $250 donor
Volunteer tester
Joined: 14 Feb 16
Posts: 492
Credit: 378,512,430
RAC: 785
Taiwan
Message 1895839 - Posted: 17 Oct 2017, 11:32:28 UTC

Here are the results of original vs optimized command line options for non-SoG and SoG versions of r3584:

GitHub: Ricks-Lab
Instagram: ricks_labs
ID: 1895839 · Report as offensive
Profile Karsten Vinding
Volunteer tester

Joined: 18 May 99
Posts: 239
Credit: 25,201,931
RAC: 11
Denmark
Message 1895841 - Posted: 17 Oct 2017, 11:46:33 UTC - in response to Message 1895839.  

I don't have any solid measurements, but the WUs do seem to be crunching faster on my RX480 with these settings, even though it's a less powerful processor with fewer CUs.

I'll have to find a way to measure the differences :)
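
One simple way to put a number on it would be to note the elapsed times of a batch of tasks before and after the change and compare the averages. This is a minimal sketch, not a proper DOE: the function name percent_faster is made up, and the example numbers are placeholders, not measurements from anyone's host.

```python
from statistics import mean


def percent_faster(before_seconds, after_seconds):
    """Percent reduction in mean elapsed time from 'before' to 'after'."""
    mean_before = mean(before_seconds)
    mean_after = mean(after_seconds)
    return (mean_before - mean_after) / mean_before * 100.0


# Placeholder values for illustration only, NOT measurements from this thread.
example_before = [100.0, 110.0, 90.0]
example_after = [80.0, 85.0, 75.0]
print(f"{percent_faster(example_before, example_after):.1f}% faster")
```

Arecibo and guppi tasks are treated separately in the results above, so it probably makes sense to compare like with like rather than mixing both task types in one average.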
ID: 1895841 · Report as offensive
Profile RueiKe Special Project $250 donor
Volunteer tester
Joined: 14 Feb 16
Posts: 492
Credit: 378,512,430
RAC: 785
Taiwan
Message 1895912 - Posted: 18 Oct 2017, 9:32:38 UTC

I have repeated the same tuning process for r3584_SoG and confirmed that the optimized parameters are the same as I found for noSoG, so the previous optimization verification run for r3584_SoG is final. The next step is to do the same for a Fiji-based GPU, but I need some time to catch up on other stuff first.
GitHub: Ricks-Lab
Instagram: ricks_labs
ID: 1895912 · Report as offensive
PappaLitto
Volunteer tester

Joined: 27 Jul 15
Posts: 11
Credit: 1,579,218
RAC: 8
United States
Message 1901270 - Posted: 16 Nov 2017, 2:00:11 UTC

Hey Rick,

What happened to your YouTube channel? I loved it and hope it returns.
ID: 1901270 · Report as offensive
Profile Karsten Vinding
Volunteer tester

Joined: 18 May 99
Posts: 239
Credit: 25,201,931
RAC: 11
Denmark
Message 1901328 - Posted: 16 Nov 2017, 11:28:10 UTC - in response to Message 1901270.  
Last modified: 16 Nov 2017, 11:40:40 UTC

Rick writes this in the "About" section of his YouTube channel:

"I have been the target of a doxing attack and harassment, so I have suspended the channel until I can get things figured out. In the meantime, I will posting lab updates on Instagram: rpc_labs"



I find it very sad, as I also enjoyed watching his channel.
ID: 1901328 · Report as offensive
Profile RueiKe Special Project $250 donor
Volunteer tester
Joined: 14 Feb 16
Posts: 492
Credit: 378,512,430
RAC: 785
Taiwan
Message 1901349 - Posted: 16 Nov 2017, 14:58:30 UTC - in response to Message 1901328.  

I hope to be back soon.
ID: 1901349 · Report as offensive
Marco Vandebergh ( SETI orphan )

Joined: 27 Aug 10
Posts: 39
Credit: 12,630,994
RAC: 9
Netherlands
Message 1902054 - Posted: 20 Nov 2017, 14:00:12 UTC

Sorry for not being on topic, but where does one find such specific information for the GPU?
For my 1080Ti, how does one know work group size and memory banks and so on?

I'm eager to learn :)
ID: 1902054 · Report as offensive
Profile Mike Special Project $75 donor
Volunteer tester
Joined: 17 Feb 01
Posts: 34255
Credit: 79,922,639
RAC: 80
Germany
Message 1902056 - Posted: 20 Nov 2017, 14:21:26 UTC

Some of this you can read in your task list.

OpenCL Platform Name: NVIDIA CUDA
Number of devices: 1
Max compute units: 28
Max work group size: 1024
Max clock frequency: 1683Mhz
Max memory allocation: 2952790016
Cache type: Read/Write
Cache line size: 128
Cache size: 458752
Global memory size: 11811160064
Constant buffer size: 65536
Max number of constant args: 9
Local memory type: Scratchpad
Local memory size: 49152
Queue properties:
Out-of-Order: Yes
Name: GeForce GTX 1080 Ti
Vendor: NVIDIA Corporation
Driver version: 388.31
Version: OpenCL 1.2 CUDA

Some useful info is in the readme file ReadMe_MultiBeam_OpenCL_NV_SoG.txt, located in your project's folder.
The rest is lots of reading and testing.
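
If you want to pull those numbers out of the text rather than read them by eye, a few lines of Python will do. This is just a sketch that assumes the "label: value" layout shown above; the sample text is copied from the listing, and the function name parse_device_info is made up.

```python
# A few lines copied from the device listing above.
DEVICE_DUMP = """\
OpenCL Platform Name: NVIDIA CUDA
Max compute units: 28
Max work group size: 1024
Cache line size: 128
Local memory size: 49152
Name: GeForce GTX 1080 Ti
"""


def parse_device_info(text):
    """Turn 'label: value' lines into a dict, skipping lines without a value."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and value.strip():
            info[key.strip()] = value.strip()
    return info


info = parse_device_info(DEVICE_DUMP)
print(info["Max compute units"], info["Max work group size"])
```

Values stay as strings here; convert with int() where you need to do arithmetic on them.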


With each crime and every kindness we birth our future.
ID: 1902056 · Report as offensive
Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1902230 - Posted: 21 Nov 2017, 11:47:55 UTC - in response to Message 1895912.  
Last modified: 21 Nov 2017, 11:49:28 UTC

I have repeated the same tuning process for r3584_SoG and confirmed that the optimized parameters are the same as I found for noSoG, so the previous optimization verification run for r3584_SoG is final. The next step is to do the same for a Fiji-based GPU, but I need some time to catch up on other stuff first.

It would be great to collect all the test results you gathered and put them in one place for future reference.
A forum thread tends to either get diluted or sink and disappear. Search could miss it, and such a tremendous amount of testing work (and the resulting data presentation) would be lost...

As a case study, it can also provide some generalizations for tuning on other hardware.
SETI apps news
We're not gonna fight them. We're gonna transcend them.
ID: 1902230 · Report as offensive