Vega Frontier Edition - MB Options Tuning
Author | Message |
---|---|
HAL9000 Send message Joined: 11 Sep 99 Posts: 6534 Credit: 196,805,888 RAC: 57 |
After this, I plan to analyze the SOG version of r3584. Does SOG change the strategy for optimization? I found SoG to be faster than the HD app on my R9 390X. Perhaps GPUs with a memory interface wider than 256-bit take better advantage of the SoG app than the HD app? If it is related, I would guess total memory bandwidth matters more. There is always the possibility that something else in my system is having an effect too. SETI@home classic workunits: 93,865 CPU time: 863,447 hours Join the BP6/VP6 User Group: http://tinyurl.com/8y46zvu |
RueiKe Send message Joined: 14 Feb 16 Posts: 492 Credit: 378,512,430 RAC: 785 |
After this, I plan to analyze the SOG version of r3584. Does SOG change the strategy for optimization? I will definitely give it a try. Almost done with the non-SOG version. GitHub: Ricks-Lab Instagram: ricks_labs |
RueiKe Send message Joined: 14 Feb 16 Posts: 492 Credit: 378,512,430 RAC: 785 |
Here is the first combined Arecibo/Green Bank DOE. PIN, PWS, and PWN show no significant difference in optimal conditions, but I found a second, different optimal condition for tuning. GitHub: Ricks-Lab Instagram: ricks_labs |
RueiKe Send message Joined: 14 Feb 16 Posts: 492 Credit: 378,512,430 RAC: 785 |
Here is the updated Tune DOE. In this case I used the 2 different PIN-PWS-PWG values identified in the previous DOE. However, the previous graphic highlighted the wrong second condition: it should have been 1-64-32 instead of 1-64-16. The new optimized tune parameters are highlighted in blue. These should improve guppi without degrading Arecibo. GitHub: Ricks-Lab Instagram: ricks_labs |
RueiKe Send message Joined: 14 Feb 16 Posts: 492 Credit: 378,512,430 RAC: 785 |
Here are the results of a verification run on Arecibo/GreenBank optimization compared to original: Original Command Line Options: MB8_win_x86_SSE2_OpenCL_ATi_HD5_r3584.exe -v 1 -instances_per_device 1 -sbs 1024 -period_iterations_num 1 -tt 500 -no_defaults_scaling -spike_fft_thresh 4096 -tune 1 64 1 4 -oclfft_tune_gr 256 -oclfft_tune_lr 16 -oclfft_tune_wg 256 -oclfft_tune_ls 512 -oclfft_tune_bn 64 -oclfft_tune_cw 64 -hp -high_perf -no_use_sleep Optimized Command Line Options: MB8_win_x86_SSE2_OpenCL_ATi_HD5_r3584.exe -v 1 -instances_per_device 1 -sbs 2048 -period_iterations_num 1 -tt 500 -spike_fft_thresh 4096 -oclfft_tune_gr 256 -oclfft_tune_lr 16 -oclfft_tune_wg 256 -oclfft_tune_ls 512 -oclfft_tune_bn 64 -oclfft_tune_cw 64 -hp -high_perf -no_defaults_scaling -pref_wg_size 64 -pref_wg_num_per_cu 32 -tune 1 4 4 16 I have implemented these (with sbs reduced to 1028) on my Hexa-Nano (Fiji based system): 8091204 GitHub: Ricks-Lab Instagram: ricks_labs |
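To make the difference between the two command lines above easier to see, here is a minimal Python sketch that parses both flag strings and reports which options changed. The flag strings are copied from the post (without the executable name); the helper names `parse_options` and `diff_options` are hypothetical and are not part of the SETI@home app or BOINC.

```python
import shlex

# Flag strings copied verbatim from the post above (executable name omitted).
ORIGINAL = (
    "-v 1 -instances_per_device 1 -sbs 1024 -period_iterations_num 1 -tt 500 "
    "-no_defaults_scaling -spike_fft_thresh 4096 -tune 1 64 1 4 "
    "-oclfft_tune_gr 256 -oclfft_tune_lr 16 -oclfft_tune_wg 256 "
    "-oclfft_tune_ls 512 -oclfft_tune_bn 64 -oclfft_tune_cw 64 "
    "-hp -high_perf -no_use_sleep"
)
OPTIMIZED = (
    "-v 1 -instances_per_device 1 -sbs 2048 -period_iterations_num 1 -tt 500 "
    "-spike_fft_thresh 4096 -oclfft_tune_gr 256 -oclfft_tune_lr 16 "
    "-oclfft_tune_wg 256 -oclfft_tune_ls 512 -oclfft_tune_bn 64 "
    "-oclfft_tune_cw 64 -hp -high_perf -no_defaults_scaling "
    "-pref_wg_size 64 -pref_wg_num_per_cu 32 -tune 1 4 4 16"
)


def parse_options(cmdline):
    """Map each -flag to the list of values that follow it (hypothetical helper)."""
    opts = {}
    current = None
    for token in shlex.split(cmdline):
        if token.startswith("-"):
            current = token
            opts[current] = []
        elif current is not None:
            opts[current].append(token)
    return opts


def diff_options(before, after):
    """Return {flag: (old_values, new_values)} for flags that differ.

    None means the flag is absent on that side.
    """
    changes = {}
    for flag in sorted(set(before) | set(after)):
        old, new = before.get(flag), after.get(flag)
        if old != new:
            changes[flag] = (old, new)
    return changes


changes = diff_options(parse_options(ORIGINAL), parse_options(OPTIMIZED))
for flag, (old, new) in changes.items():
    print(flag, old, "->", new)
```

Run against the two strings above, this reports that only `-sbs` and `-tune` changed value, `-pref_wg_size` and `-pref_wg_num_per_cu` were added, and `-no_use_sleep` was dropped; the order in which flags appear is ignored.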
Karsten Vinding Send message Joined: 18 May 99 Posts: 239 Credit: 25,201,931 RAC: 11 |
I really appreciate you taking the time to run all of these tests. I don't know if your results are the absolute optimum settings, but I'll try running them on my RX480. There are so many different options that it's hard to make heads or tails of optimizing things. |
RueiKe Send message Joined: 14 Feb 16 Posts: 492 Credit: 378,512,430 RAC: 785 |
I am not sure if these Vega optimized settings would be optimal for other GPU configurations. I am hoping they will at least be valid for 64CU AMD GPUs. I am working to validate that on a Fiji GPU. I am also testing them with SoG version of the app. If I can get my hands on a 32CU GPU like the RX480, I will definitely give it a try. I really appreciate you taking the time to run all of these tests. GitHub: Ricks-Lab Instagram: ricks_labs |
RueiKe Send message Joined: 14 Feb 16 Posts: 492 Credit: 378,512,430 RAC: 785 |
Here are the results of original vs optimized command line options for non-SoG and SoG versions of r3584: GitHub: Ricks-Lab Instagram: ricks_labs |
Karsten Vinding Send message Joined: 18 May 99 Posts: 239 Credit: 25,201,931 RAC: 11 |
I don't have any solid measurements, but the WUs do seem to be crunching faster on my RX480 with these settings, even though it's a less powerful processor with fewer CUs. I'll have to find a way to measure the differences :) |
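One simple way to measure the difference is to collect elapsed times for a batch of tasks before and after changing the settings and compare the means. The sketch below assumes you have copied the run times (in seconds) out of your task list by hand; the function name `percent_faster` is hypothetical and the numbers are placeholders, not measurements from any host in this thread.

```python
from statistics import mean


def percent_faster(baseline_times, tuned_times):
    """Percent reduction in mean elapsed time; positive means the tuned run is faster."""
    base = mean(baseline_times)
    tuned = mean(tuned_times)
    return 100.0 * (base - tuned) / base


# Placeholder values for illustration only, NOT real measurements.
baseline = [600.0, 620.0, 580.0]
tuned = [540.0, 560.0, 520.0]

print(f"{percent_faster(baseline, tuned):.1f}% faster")
```

Since the tuning earlier in the thread treats Arecibo and guppi tasks separately, it is probably worth comparing like with like, keeping one pair of lists per task type rather than mixing them.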
RueiKe Send message Joined: 14 Feb 16 Posts: 492 Credit: 378,512,430 RAC: 785 |
I have repeated the same tuning process for r3584_SoG and confirmed that the optimized parameters are the same as I found for noSoG, so the previous optimization verification run for r3584_SoG is final. The next step is to do the same for a Fiji-based GPU, but I need some time to catch up on other stuff first. GitHub: Ricks-Lab Instagram: ricks_labs |
PappaLitto Send message Joined: 27 Jul 15 Posts: 11 Credit: 1,579,218 RAC: 8 |
Hey Rick, what happened to your YouTube channel? I loved it and hope it returns. |
Karsten Vinding Send message Joined: 18 May 99 Posts: 239 Credit: 25,201,931 RAC: 11 |
Rick writes this in the "About" section of his YouTube channel: "I have been the target of a doxing attack and harassment, so I have suspended the channel until I can get things figured out. In the meantime, I will be posting lab updates on Instagram: rpc_labs" I find it very sad, as I also enjoyed watching his channel. |
RueiKe Send message Joined: 14 Feb 16 Posts: 492 Credit: 378,512,430 RAC: 785 |
I hope to be back soon. |
Marco Vandebergh ( SETI orphan ) Send message Joined: 27 Aug 10 Posts: 39 Credit: 12,630,994 RAC: 9 |
Sorry for not being on topic, but where does one find such specific information for the GPU? For my 1080Ti, how does one know work group size and memory banks and so on? I'm eager to learn :) |
Mike Send message Joined: 17 Feb 01 Posts: 34255 Credit: 79,922,639 RAC: 80 |
Some of this you can read in the output of your tasks in your task list. OpenCL Platform Name: NVIDIA CUDA Number of devices: 1 Max compute units: 28 Max work group size: 1024 Max clock frequency: 1683Mhz Max memory allocation: 2952790016 Cache type: Read/Write Cache line size: 128 Cache size: 458752 Global memory size: 11811160064 Constant buffer size: 65536 Max number of constant args: 9 Local memory type: Scratchpad Local memory size: 49152 Queue properties: Out-of-Order: Yes Name: GeForce GTX 1080 Ti Vendor: NVIDIA Corporation Driver version: 388.31 Version: OpenCL 1.2 CUDA More useful info is in the readme file ReadMe_MultiBeam_OpenCL_NV_SoG.txt, located in your projects folder. The rest is lots of reading and testing. With each crime and every kindness we birth our future. |
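The device report above is easier to work with once it is parsed into key/value pairs. The following Python sketch does that for a few of the lines quoted in the post and converts the memory sizes, which OpenCL reports in bytes, into binary units. It assumes the report has one `Key: value` pair per line, as it would in the raw task output; the helper name `parse_device_info` is hypothetical.

```python
# A subset of the lines quoted in the post above, assumed to be one per line
# in the raw task output.
DEVICE_INFO = """\
Max compute units: 28
Max work group size: 1024
Max memory allocation: 2952790016
Cache line size: 128
Global memory size: 11811160064
Local memory size: 49152
"""


def parse_device_info(text):
    """Turn 'Key: value' lines into a dict of strings (hypothetical helper)."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip lines without a colon
            info[key.strip()] = value.strip()
    return info


info = parse_device_info(DEVICE_INFO)

# OpenCL reports these sizes in bytes; convert to binary units for readability.
global_gib = int(info["Global memory size"]) / 2**30
max_alloc_gib = int(info["Max memory allocation"]) / 2**30
local_kib = int(info["Local memory size"]) / 1024

print(f"compute units:      {info['Max compute units']}")
print(f"max work group:     {info['Max work group size']}")
print(f"global memory:      {global_gib:.2f} GiB")
print(f"max single alloc:   {max_alloc_gib:.2f} GiB")
print(f"local memory:       {local_kib:.0f} KiB")
```

For the GTX 1080 Ti values quoted above, this works out to 11 GiB of global memory, a 2.75 GiB maximum single allocation, and 48 KiB of local memory.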
Raistmer Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
I have repeated the same tuning process for r3584_SoG and confirmed that the optimized parameters are the same as I found for noSoG, so the previous optimization verification run for r3584_SoG is final. Next step is to do the same for Fiji based GPU. But I need a time to catch up on other stuff first. It would be great to collect all the testing results you have gathered and put them in one place for future reference. A forum thread tends either to get diluted or to sink and disappear. A search could miss it, and such a tremendous amount of testing work (and the resulting data presentation) would be lost... As a case study, it can also provide some generalizations for tuning on other hardware. SETI apps news We're not gonna fight them. We're gonna transcend them. |