Optimizing SBS and Period Iterations for the Fury X
Author | Message |
---|---|
RueiKe Send message Joined: 14 Feb 16 Posts: 492 Credit: 378,512,430 RAC: 785 |
Raistmer's detailed post of MB options over at Lunatics motivated me to look deeper into the impact on overall performance of the SBS and Period Iterations parameters. http://lunatics.kwsn.info/index.php/topic,1808.0.html I have completed a DOE (design of experiments) which explores the effect of these two factors on overall processing times of Arecibo and GUPPI VLAR WUs. I found that I could get substantially faster processing times for the GUPPI VLARs using a much lower Period Iterations value. The tables below show results for 2 sample WUs, Arecibo on the left and GUPPI VLAR on the right. Top tables show total and CPU processing times and bottom tables show percent improvement using 256/60 (SBS/Period Iterations) as the baseline. Dark red indicates a verified error, and light red is assumed to cause the same error. For valid results, darker shading indicates better performance. https://goo.gl/photos/ToZxyJgxrxDFNGSa7 I have been running for the last half day with the following optimization for these two parameters: -sbs 1024 -period_iterations_num 5 I have verified good results for the Fury X and the Nano. I suspect Hawaii-based cards will have a similar optimal Period Iterations value, but this needs to be verified. GitHub: Ricks-Lab Instagram: ricks_labs |
Chris Adamek Send message Joined: 15 May 99 Posts: 251 Credit: 434,772,072 RAC: 236 |
I've gotten similar results on Tahiti and Pitcairn based AMD GPUs. Good to see the higher end cards responding well too. Chris |
RueiKe Send message Joined: 14 Feb 16 Posts: 492 Credit: 378,512,430 RAC: 785 |
After about 15 hours, I found the system locked with 3 GPU tasks at 99.99% complete. I rebooted and lowered SBS to 512. RAC still climbing! |
BilBg Send message Joined: 27 May 07 Posts: 3720 Credit: 9,385,827 RAC: 0 |
You used some very long link for your image which doesn't work, here is your image: - ALF - "Find out what you don't do well ..... then don't do it!" :) |
RueiKe Send message Joined: 14 Feb 16 Posts: 492 Credit: 378,512,430 RAC: 785 |
You used some very long link for your image which doesn't work, here is your image: Thanks for posting a link that all can see! It seems the link I used only works for me. Google doesn't support embedded images on message boards, so I just got it from the HTML while viewing the image. Any better recommendation for future posts? GitHub: Ricks-Lab Instagram: ricks_labs |
BilBg Send message Joined: 27 May 07 Posts: 3720 Credit: 9,385,827 RAC: 0 |
Any better recommendation for future posts? I just click on your link: https://goo.gl/photos/ToZxyJgxrxDFNGSa7 ... then on Thumbnail, then Right-Click on image -> Open image in new tab - ALF - "Find out what you don't do well ..... then don't do it!" :) |
RueiKe Send message Joined: 14 Feb 16 Posts: 492 Credit: 378,512,430 RAC: 785 |
After running several days, I am getting about 6-7 invalid results per day for Arecibo tasks. No invalids for GUPPI tasks. This is the case for both of my systems. Has anyone found any additional tuning to resolve this? GitHub: Ricks-Lab Instagram: ricks_labs |
Raistmer Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
One of tasks from 3-GPU host: http://setiathome.berkeley.edu/workunit.php?wuid=2178535079 CUDA50 overflowed: Spike count: 30 Autocorr count: 0 Pulse count: 0 Triplet count: 0 Gaussian count: 0 ATI gave: Spike count: 0 Autocorr count: 0 Pulse count: 14 Triplet count: 0 Gaussian count: 0 both matched each other CPU stock gave: Spike count: 0 Autocorr count: 0 Pulse count: 0 Triplet count: 0 Gaussian count: 0 Well... I'll try to catch this task and repeat it offline on my own hardware. EDIT: unfortunately, the task file is already deleted (even while the results are still listed on the web front page). So try to pre-copy all downloaded tasks to an archive location, to be able to present one that turns out to be invalid for an offline check. SETI apps news We're not gonna fight them. We're gonna transcend them. |
Raistmer Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
But that 3-GPU host currently supplies AP invalids too: http://setiathome.berkeley.edu/workunit.php?wuid=2174519575 GPU gave: single pulses: 0 repetitive pulses: 2 percent blanked: 2.41 Rep. pulse: num_std_devs=6.956 peak_power=2552.563 dm=2992 peak_bin=256 scale=4 ffa_scale=0 period=470.8912 Rep. pulse: num_std_devs=7.07 peak_power=2643.914 dm=-5008 peak_bin=1616 scale=4 ffa_scale=0 period=452.8656 CPU SSE3 gave: single pulses: 3 repetitive pulses: 1 percent blanked: 2.39 Rep. pulse: num_std_devs=6.851 peak_power=4333 dm=2672 peak_bin=3808 scale=4 ffa_scale=0 period=267.6526 Single pulse: peak_power=38.01 dm=-5314 fft_num=11173888 peak_bin=11180568 scale=2 Single pulse: peak_power=365.6 dm=6345 fft_num=7831552 peak_bin=7832832 scale=8 Single pulse: peak_power=218.8 dm=10564 fft_num=16302080 peak_bin=16316928 scale=7 I would say the results are too different to be just some precision issue. If the GPU is OCed, try to reduce the freq. Check for dust. Check that there is enough cooling for such a 3-GPU host. In short, it looks like a hardware issue for now. EDIT: or an incompatible driver. Please check if others who use the same driver with similar hardware receive valid results. 6-7 invalids (even 1 invalid) per day is an absolutely unacceptable error rate to just leave as is. SETI apps news We're not gonna fight them. We're gonna transcend them. |
RueiKe Send message Joined: 14 Feb 16 Posts: 492 Credit: 378,512,430 RAC: 785 |
But that 3-GPU host currently supplies AP invalids too: Others have indicated it is a driver issue and need to drop back to a much older driver. I really can't go back to the older driver due to other issues. Next time I get some AP WU, I will do some tests with bench. Or, are there sample AP WU with results on the Lunatics site? It would be better to have a known case. Also, I noticed other Fiji owners getting good results with no optimization arguments, so I changed my command options, but no new WU yet... GitHub: Ricks-Lab Instagram: ricks_labs |
RueiKe Send message Joined: 14 Feb 16 Posts: 492 Credit: 378,512,430 RAC: 785 |
One of tasks from 3-GPU host: I like the idea of copying all of the work units out to catch the problematic one. I will attempt this when I get back to my computer. Unfortunately, I won't be back to my computer for another 3 days... Also, when I find a problematic WU, can I just copy the output from a valid task and paste to a text file so that MB bench can find it? GitHub: Ricks-Lab Instagram: ricks_labs |
Mike Send message Joined: 17 Feb 01 Posts: 34259 Credit: 79,922,639 RAC: 80 |
Of course we have test samples for AP too but Lunatics is currently not accessible. Running without app args will be much slower on your cards. The fact that others are doing so doesn't mean it's better ??? Example from one of my benches. WU : ap_06ap11aa_B3_P0_00374_20151123_10357.wu AP6_win_x86_SSE2_OpenCL_ATI_r2346.exe : Elapsed 1623.393 secs CPU 889.518 secs AP7_win_x86_SSE2_OpenCL_ATI_r2742.exe -unroll 24 -oclFFT_plan 256 16 256 -tune 1 64 4 1 -tune 2 64 4 1 -ffa_block 2830 -ffa_block_fetch 2830 : Elapsed 1082.992 secs, speedup: 33.29% ratio: 1.50x CPU 126.876 secs, speedup: 85.74% ratio: 7.01x With each crime and every kindness we birth our future. |
RueiKe Send message Joined: 14 Feb 16 Posts: 492 Credit: 378,512,430 RAC: 785 |
Of course we have test samples for AP too but Lunatics is currently not accessible. Well, they are getting all valid results on Fiji, so I thought I would give it a try. I will check on the AP reference WUs next week. I hope to fix the MB issue first, since I don't have any AP WUs now. GitHub: Ricks-Lab Instagram: ricks_labs |
Raistmer Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
-oclFFT_plan first candidate if broken config suspected. Try to remove all -oclFFT_plan options from both lines to see if the invalids disappear. There are no established rules for which configs will work fine everywhere or why some of them fail. So care is needed with these. Example: https://docs.google.com/spreadsheets/d/1bywjOlnPhTcpzk7UFl4T4ZPb2T0uS19l-ILQBQR3OS4/edit?usp=sharing SETI apps news We're not gonna fight them. We're gonna transcend them. |
Mike Send message Joined: 17 Feb 01 Posts: 34259 Credit: 79,922,639 RAC: 80 |
-oclFFT_plan first candidate if broken config suspected. I gave him these app args and they did work fine with older drivers. With each crime and every kindness we birth our future. |
Raistmer Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
-oclFFT_plan first candidate if broken config suspected. Yes, a quick check against my table also shows they are in the "green zone"... but that could change with the next device family or driver iteration. So, amongst all options this one is the most fragile IMO. SETI apps news We're not gonna fight them. We're gonna transcend them. |
Mike Send message Joined: 17 Feb 01 Posts: 34259 Credit: 79,922,639 RAC: 80 |
-oclFFT_plan first candidate if broken config suspected. Of course they are highly optimized. I don't want to go into details here, so as not to worry Rick too much. Since CPU affinity doesn't work as it should on AP we had to compromise here. With each crime and every kindness we birth our future. |
RueiKe Send message Joined: 14 Feb 16 Posts: 492 Credit: 378,512,430 RAC: 785 |
Of course they are highly optimized. I have changed the command line to this for all 3 systems: -cpu_lock_fixed_cpu 6 -instances_per_device 1 -sbs 768 -hp I started to receive AP tasks again on 15-Jun and results so far are: Valid = 15 invalid = 2 Inconclusive = 19 Pending = 44 In Progress = 11 Still not great but much better than before. I wonder if the invalid rate is related to the sporadic MB Invalids that I have been getting. I have lowered the memory clock OC from 530MHz to 525MHz and now to 520MHz. GitHub: Ricks-Lab Instagram: ricks_labs |
Rasputin42 Send message Joined: 25 Jul 08 Posts: 412 Credit: 5,834,661 RAC: 0 |
-instances_per_device 1 Is that actually necessary? Does it do anything compared to not being there? |
Grant (SSSF) Send message Joined: 19 Aug 99 Posts: 13751 Credit: 208,696,464 RAC: 304 |
I have lowered the memory clock OC from 530MHz to 525MHz and now to 520MHz. While trying out different settings it would be worth removing any overclocks IMHO. Your card might be right on the edge, and a particular group of settings might result in a significant speedup, and an increase in load, that results in errors. Not because of the settings themselves, but because the load has pushed the overclocked hardware over the edge. It would be a shame if certain settings were written off as resulting in errors, when it was the hardware that couldn't cope with the increased load, not the settings themselves. Grant Darwin NT |
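
Raistmer's suggestion in the thread, to pre-copy every downloaded task to an archive so that an invalid one can later be re-run offline, could be scripted roughly as below. This is only a sketch under stated assumptions: the function name `archive_new_files`, the `pattern` argument, and the example paths are made up for illustration and are not part of BOINC or the Lunatics tools. It copies any matching file that is not yet in the archive, so it can be run repeatedly (for example from a scheduled job).

```python
import shutil
from pathlib import Path


def archive_new_files(project_dir, archive_dir, pattern="*"):
    """Copy files matching `pattern` from project_dir into archive_dir.

    Hypothetical helper: files already present in the archive (same name)
    are skipped, so repeated runs only pick up newly downloaded files.
    Returns the sorted names of the files copied in this pass.
    """
    src = Path(project_dir)
    dst = Path(archive_dir)
    dst.mkdir(parents=True, exist_ok=True)

    copied = []
    for path in sorted(src.glob(pattern)):
        target = dst / path.name
        if path.is_file() and not target.exists():
            shutil.copy2(path, target)  # copy2 also preserves timestamps
            copied.append(path.name)
    return copied


# Example call (both paths are assumptions, adjust to your own install):
# archive_new_files(r"C:\ProgramData\BOINC\projects\setiathome.berkeley.edu",
#                   r"D:\wu_archive", pattern="*.wu")
```

The `*.wu` pattern in the example matches the AP task name quoted in the thread; MB task files may be named differently, so the pattern would need adjusting. The archive grows without limit, so old copies need to be cleared out by hand once their results have validated.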
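
For anyone comparing bench runs, the speedup and ratio figures in Mike's AP example follow from simple arithmetic. A minimal sketch, assuming "speedup" means the fraction of the baseline time saved and "ratio" means baseline time divided by tuned time (the function names are made up here and are not part of any bench tool):

```python
def speedup_percent(baseline_secs, tuned_secs):
    """Percent of the baseline time saved by the tuned run."""
    return (baseline_secs - tuned_secs) / baseline_secs * 100.0


def time_ratio(baseline_secs, tuned_secs):
    """How many times faster the tuned run is than the baseline."""
    return baseline_secs / tuned_secs


# Elapsed and CPU times (seconds) from the AP bench quoted in the thread:
# r2346 with no args as the baseline, r2742 with tuned args.
runs = {
    "Elapsed": (1623.393, 1082.992),
    "CPU": (889.518, 126.876),
}

for label, (base, tuned) in runs.items():
    print(f"{label}: speedup {speedup_percent(base, tuned):.2f}% "
          f"ratio {time_ratio(base, tuned):.2f}x")
```

With the numbers from the thread this reproduces the 33.29% / 1.50x (elapsed) and 85.74% / 7.01x (CPU) figures, which suggests the same definitions could be applied to the SBS / Period Iterations tables as well.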