Optimizing SBS and Period Iterations for the Fury X

RueiKe Volunteer tester Send message Joined: 14 Feb 16 Posts: 492 Credit: 378,512,430 RAC: 785	Message 1794288 - Posted: 7 Jun 2016, 23:18:06 UTC Raistmer's detailed post on MB options over at Lunatics motivated me to look deeper into how the SBS and Period Iterations parameters affect overall performance. http://lunatics.kwsn.info/index.php/topic,1808.0.html I have completed a DOE (design of experiments) that explores the effect of these two factors on the overall processing times of Arecibo and GUPPI VLAR WUs. I found that I could get substantially faster processing times for the GUPPI VLARs by using a much lower Period Iterations value. The tables below show results for 2 sample WUs, Arecibo on the left and GUPPI VLAR on the right. The top tables show total and CPU processing times, and the bottom tables show percent improvement using 256/60 as the baseline. Dark red indicates a verified error, and light red is assumed to cause the same error. For valid results, darker shading indicates better performance. https://goo.gl/photos/ToZxyJgxrxDFNGSa7 For the last half day I have been running with the following settings for these two parameters: -sbs 1024 -period_iterations_num 5 I have verified good results for the Fury X and the Nano. I suspect Hawaii-based cards will have a similar optimal Period Iterations value, but this needs to be verified. GitHub: Ricks-Lab Instagram: ricks_labs ID: 1794288 ·
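
The percent-improvement tables described in the post above could be computed along the following lines. This is only a sketch: the actual timings live in the linked image and are not reproduced here, so the numbers below are made-up placeholders, and reading "256/60" as SBS 256 with 60 period iterations is an assumption. The function names are hypothetical.

```python
def percent_improvement(baseline_secs, candidate_secs):
    """Percent reduction in run time relative to the baseline (positive = faster)."""
    return 100.0 * (baseline_secs - candidate_secs) / baseline_secs


def improvement_table(timings, baseline_key):
    """Map each (sbs, period_iterations) setting to its improvement over the baseline."""
    base = timings[baseline_key]
    return {key: round(percent_improvement(base, secs), 2)
            for key, secs in timings.items()}


# PLACEHOLDER timings in seconds, NOT measured values from the thread.
example_timings = {
    (256, 60): 100.0,   # assumed baseline: -sbs 256 -period_iterations_num 60
    (512, 20): 90.0,
    (1024, 5): 80.0,
}

table = improvement_table(example_timings, baseline_key=(256, 60))
for (sbs, period_iters), pct in sorted(table.items()):
    print(f"-sbs {sbs} -period_iterations_num {period_iters}: {pct:+.2f}%")
```

Keeping the settings as dictionary keys makes it easy to add further grid points or to mark error cases separately before ranking the valid ones.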

Chris Adamek Volunteer tester Send message Joined: 15 May 99 Posts: 251 Credit: 434,772,072 RAC: 236	Message 1794317 - Posted: 8 Jun 2016, 1:46:13 UTC - in response to Message 1794288. I've gotten similar results on Tahiti- and Pitcairn-based AMD GPUs. Good to see the higher-end cards responding well too. Chris ID: 1794317 ·

RueiKe Volunteer tester Send message Joined: 14 Feb 16 Posts: 492 Credit: 378,512,430 RAC: 785	Message 1794345 - Posted: 8 Jun 2016, 5:09:34 UTC After about 15 hours, I found the system locked with 3 GPU tasks at 99.99% complete. I rebooted and lowered SBS to 512. RAC still climbing! ID: 1794345 ·

BilBg Volunteer tester Send message Joined: 27 May 07 Posts: 3720 Credit: 9,385,827 RAC: 0	Message 1794401 - Posted: 8 Jun 2016, 9:40:24 UTC - in response to Message 1794288. You used a very long link for your image, which doesn't work; here is your image: - ALF - "Find out what you don't do well ..... then don't do it!" :) ID: 1794401 ·

RueiKe Volunteer tester Send message Joined: 14 Feb 16 Posts: 492 Credit: 378,512,430 RAC: 785	Message 1794406 - Posted: 8 Jun 2016, 9:55:33 UTC - in response to Message 1794401. You used a very long link for your image, which doesn't work; here is your image: Thanks for posting a link that all can see! It seems the link I used only works for me. Google doesn't support embedded images on message boards, so I just took the link from the HTML while viewing the image. Any better recommendation for future posts? GitHub: Ricks-Lab Instagram: ricks_labs ID: 1794406 ·

BilBg Volunteer tester Send message Joined: 27 May 07 Posts: 3720 Credit: 9,385,827 RAC: 0	Message 1794418 - Posted: 8 Jun 2016, 11:43:51 UTC - in response to Message 1794406. Any better recommendation for future posts? I just click on your link: https://goo.gl/photos/ToZxyJgxrxDFNGSa7 ... then on Thumbnail, then Right-Click on image -> Open image in new tab - ALF - "Find out what you don't do well ..... then don't do it!" :) ID: 1794418 ·

RueiKe Volunteer tester Send message Joined: 14 Feb 16 Posts: 492 Credit: 378,512,430 RAC: 785	Message 1795283 - Posted: 11 Jun 2016, 3:04:44 UTC After running several days, I am getting about 6-7 invalid results per day for Arecibo tasks. No invalids for GUPPI tasks. This is the case for both of my systems. Has anyone found any additional tuning to resolve this? GitHub: Ricks-Lab Instagram: ricks_labs ID: 1795283 ·

Raistmer Volunteer developer Volunteer tester Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121	Message 1795323 - Posted: 11 Jun 2016, 8:55:57 UTC Last modified: 11 Jun 2016, 9:00:21 UTC One of the tasks from the 3-GPU host: http://setiathome.berkeley.edu/workunit.php?wuid=2178535079 CUDA50 overflowed: Spike count: 30 Autocorr count: 0 Pulse count: 0 Triplet count: 0 Gaussian count: 0 ATI gave: Spike count: 0 Autocorr count: 0 Pulse count: 14 Triplet count: 0 Gaussian count: 0 both matched each other CPU stock gave: Spike count: 0 Autocorr count: 0 Pulse count: 0 Triplet count: 0 Gaussian count: 0 Well... I'll try to catch this task and repeat it offline on my own hardware. EDIT: unfortunately, the task file is already deleted (even though the results are still listed on the web front page). So try to pre-copy all downloaded tasks to an archive location, so that you can present any one that turns out to be invalid for an offline check. SETI apps news We're not gonna fight them. We're gonna transcend them. ID: 1795323 ·
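
The suggestion above to pre-copy downloaded tasks could be automated with a small script run periodically. This is a minimal sketch under stated assumptions: the function name, the directory arguments, and the `*.wu` filename pattern are all hypothetical, and real task files in a BOINC project directory may be named differently, so the patterns would need adjusting.

```python
import shutil
from pathlib import Path


def archive_new_tasks(project_dir, archive_dir, patterns=("*.wu",)):
    """Copy task files not yet present in archive_dir; return the names copied.

    The glob patterns are an assumption; adjust them to match how task
    files are actually named in your project directory.
    """
    project_dir = Path(project_dir)
    archive_dir = Path(archive_dir)
    archive_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for pattern in patterns:
        for src in sorted(project_dir.glob(pattern)):
            dst = archive_dir / src.name
            # Skip directories and files archived on an earlier run.
            if src.is_file() and not dst.exists():
                shutil.copy2(src, dst)  # copy2 also preserves timestamps
                copied.append(dst.name)
    return copied


# Example call (both paths are hypothetical):
# archive_new_tasks("/path/to/boinc/projects/setiathome.berkeley.edu",
#                   "/path/to/wu_archive")
```

Because already-archived files are skipped, the script can be run repeatedly (for example from a scheduler) without recopying anything.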

Raistmer Volunteer developer Volunteer tester Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121	Message 1795325 - Posted: 11 Jun 2016, 9:05:26 UTC - in response to Message 1795283. Last modified: 11 Jun 2016, 9:08:15 UTC But that 3-GPU host currently supplies AP invalids too: http://setiathome.berkeley.edu/workunit.php?wuid=2174519575 GPU gave: single pulses: 0 repetitive pulses: 2 percent blanked: 2.41 Rep. pulse: num_std_devs=6.956 peak_power=2552.563 dm=2992 peak_bin=256 scale=4 ffa_scale=0 period=470.8912 Rep. pulse: num_std_devs=7.07 peak_power=2643.914 dm=-5008 peak_bin=1616 scale=4 ffa_scale=0 period=452.8656 CPU SSE3 gave: single pulses: 3 repetitive pulses: 1 percent blanked: 2.39 Rep. pulse: num_std_devs=6.851 peak_power=4333 dm=2672 peak_bin=3808 scale=4 ffa_scale=0 period=267.6526 Single pulse: peak_power=38.01 dm=-5314 fft_num=11173888 peak_bin=11180568 scale=2 Single pulse: peak_power=365.6 dm=6345 fft_num=7831552 peak_bin=7832832 scale=8 Single pulse: peak_power=218.8 dm=10564 fft_num=16302080 peak_bin=16316928 scale=7 I would say the results are too different to be just a precision issue. If the GPU is overclocked, try reducing the frequency. Check for dust. Check that there is enough cooling for such a 3-GPU host. In short, it looks like a hardware issue for now. EDIT: or an incompatible driver. Please check whether others who use the same driver with similar hardware receive valid results. 6-7 invalids per day (even 1 invalid) is an absolutely unacceptable error rate to just leave as is. SETI apps news We're not gonna fight them. We're gonna transcend them. ID: 1795325 ·
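
To make the mismatch in the post above concrete, the two pulse lists can be parsed and checked for any signal found by both apps. The sketch below uses the pulse lines quoted in the post; the regular expression, the function names, and the choice of (kind, dm, peak_bin) as a matching key are illustrative assumptions and are not how the project's validator compares results.

```python
import re

# One pulse entry: a label followed by whitespace-separated key=value fields.
PULSE_RE = re.compile(r"(Rep\. pulse|Single pulse):\s*((?:\w+=-?[\d.]+\s*)+)")


def parse_pulses(report):
    """Return a list of (kind, {field: value}) tuples found in a result summary."""
    pulses = []
    for kind, fields in PULSE_RE.findall(report):
        values = {}
        for field in fields.split():
            name, number = field.split("=")
            values[name] = float(number)
        pulses.append((kind, values))
    return pulses


def signature(pulse):
    """Crude identity for a pulse (an assumption, not the validator's rule)."""
    kind, values = pulse
    return (kind, values["dm"], values["peak_bin"])


GPU_REPORT = (
    "Rep. pulse: num_std_devs=6.956 peak_power=2552.563 dm=2992 peak_bin=256 "
    "scale=4 ffa_scale=0 period=470.8912 "
    "Rep. pulse: num_std_devs=7.07 peak_power=2643.914 dm=-5008 peak_bin=1616 "
    "scale=4 ffa_scale=0 period=452.8656"
)
CPU_REPORT = (
    "Rep. pulse: num_std_devs=6.851 peak_power=4333 dm=2672 peak_bin=3808 "
    "scale=4 ffa_scale=0 period=267.6526 "
    "Single pulse: peak_power=38.01 dm=-5314 fft_num=11173888 peak_bin=11180568 scale=2 "
    "Single pulse: peak_power=365.6 dm=6345 fft_num=7831552 peak_bin=7832832 scale=8 "
    "Single pulse: peak_power=218.8 dm=10564 fft_num=16302080 peak_bin=16316928 scale=7"
)

gpu = parse_pulses(GPU_REPORT)
cpu = parse_pulses(CPU_REPORT)
common = {signature(p) for p in gpu} & {signature(p) for p in cpu}
print(f"GPU pulses: {len(gpu)}, CPU pulses: {len(cpu)}, shared: {len(common)}")
```

With the values from the post, no pulse appears in both lists, which is consistent with the remark that the difference is too large to be a precision effect.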

RueiKe Volunteer tester Send message Joined: 14 Feb 16 Posts: 492 Credit: 378,512,430 RAC: 785	Message 1795349 - Posted: 11 Jun 2016, 12:31:56 UTC - in response to Message 1795325. But that 3-GPU host currently supplies AP invalids too: http://setiathome.berkeley.edu/workunit.php?wuid=2174519575 GPU gave: single pulses: 0 repetitive pulses: 2 percent blanked: 2.41 Rep. pulse: num_std_devs=6.956 peak_power=2552.563 dm=2992 peak_bin=256 scale=4 ffa_scale=0 period=470.8912 Rep. pulse: num_std_devs=7.07 peak_power=2643.914 dm=-5008 peak_bin=1616 scale=4 ffa_scale=0 period=452.8656 CPU SSE3 gave: single pulses: 3 repetitive pulses: 1 percent blanked: 2.39 Rep. pulse: num_std_devs=6.851 peak_power=4333 dm=2672 peak_bin=3808 scale=4 ffa_scale=0 period=267.6526 Single pulse: peak_power=38.01 dm=-5314 fft_num=11173888 peak_bin=11180568 scale=2 Single pulse: peak_power=365.6 dm=6345 fft_num=7831552 peak_bin=7832832 scale=8 Single pulse: peak_power=218.8 dm=10564 fft_num=16302080 peak_bin=16316928 scale=7 I would say the results are too different to be just a precision issue. If the GPU is overclocked, try reducing the frequency. Check for dust. Check that there is enough cooling for such a 3-GPU host. In short, it looks like a hardware issue for now. EDIT: or an incompatible driver. Please check whether others who use the same driver with similar hardware receive valid results. 6-7 invalids per day (even 1 invalid) is an absolutely unacceptable error rate to just leave as is. Others have indicated that it is a driver issue and that I would need to drop back to a much older driver. I really can't go back to the older driver due to other issues. Next time I get some AP WUs, I will do some tests with bench. Or are there sample AP WUs with results on the Lunatics site? It would be better to have a known case. Also, I noticed other Fiji owners getting good results with no optimization arguments, so I changed my command options, but no new WUs yet... GitHub: Ricks-Lab Instagram: ricks_labs ID: 1795349 ·

RueiKe Volunteer tester Send message Joined: 14 Feb 16 Posts: 492 Credit: 378,512,430 RAC: 785	Message 1795352 - Posted: 11 Jun 2016, 12:38:54 UTC - in response to Message 1795323. One of the tasks from the 3-GPU host: http://setiathome.berkeley.edu/workunit.php?wuid=2178535079 CUDA50 overflowed: Spike count: 30 Autocorr count: 0 Pulse count: 0 Triplet count: 0 Gaussian count: 0 ATI gave: Spike count: 0 Autocorr count: 0 Pulse count: 14 Triplet count: 0 Gaussian count: 0 both matched each other CPU stock gave: Spike count: 0 Autocorr count: 0 Pulse count: 0 Triplet count: 0 Gaussian count: 0 Well... I'll try to catch this task and repeat it offline on my own hardware. EDIT: unfortunately, the task file is already deleted (even though the results are still listed on the web front page). So try to pre-copy all downloaded tasks to an archive location, so that you can present any one that turns out to be invalid for an offline check. I like the idea of copying all of the work units out to catch the problematic one. I will attempt this when I get back to my computer. Unfortunately, I won't be back at my computer for another 3 days... Also, when I find a problematic WU, can I just copy the output from a valid task and paste it into a text file so that MB bench can find it? GitHub: Ricks-Lab Instagram: ricks_labs ID: 1795352 ·

Mike Volunteer tester Send message Joined: 17 Feb 01 Posts: 34259 Credit: 79,922,639 RAC: 80	Message 1795355 - Posted: 11 Jun 2016, 12:53:08 UTC Last modified: 11 Jun 2016, 12:57:12 UTC Of course we have test samples for AP too, but Lunatics is currently not accessible. Running without app args will be much slower on your cards. The fact that others are doing so doesn't mean it's better. Example from one of my benches. WU : ap_06ap11aa_B3_P0_00374_20151123_10357.wu AP6_win_x86_SSE2_OpenCL_ATI_r2346.exe : Elapsed 1623.393 secs CPU 889.518 secs AP7_win_x86_SSE2_OpenCL_ATI_r2742.exe -unroll 24 -oclFFT_plan 256 16 256 -tune 1 64 4 1 -tune 2 64 4 1 -ffa_block 2830 -ffa_block_fetch 2830 : Elapsed 1082.992 secs, speedup: 33.29% ratio: 1.50x CPU 126.876 secs, speedup: 85.74% ratio: 7.01x With each crime and every kindness we birth our future. ID: 1795355 ·
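
The speedup and ratio figures in the bench output above follow from the elapsed and CPU times by simple arithmetic. The sketch below reproduces them from the four timings quoted in the post; the function names are hypothetical, and the formulas are inferred from the printed numbers rather than taken from the bench tool itself.

```python
def speedup_percent(old_secs, new_secs):
    """Percent of the old run time that was saved."""
    return 100.0 * (old_secs - new_secs) / old_secs


def speed_ratio(old_secs, new_secs):
    """How many times faster the new run is than the old one."""
    return old_secs / new_secs


# Timings quoted in the post: AP6 r2346 (no args) vs AP7 r2742 (with args).
elapsed_old, elapsed_new = 1623.393, 1082.992
cpu_old, cpu_new = 889.518, 126.876

print(f"Elapsed: speedup {speedup_percent(elapsed_old, elapsed_new):.2f}% "
      f"ratio {speed_ratio(elapsed_old, elapsed_new):.2f}x")
print(f"CPU:     speedup {speedup_percent(cpu_old, cpu_new):.2f}% "
      f"ratio {speed_ratio(cpu_old, cpu_new):.2f}x")
```

Note that the large CPU-time reduction is a separate effect from the elapsed-time gain: the tuned run finishes about 1.5x sooner while using roughly a seventh of the CPU time.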

RueiKe Volunteer tester Send message Joined: 14 Feb 16 Posts: 492 Credit: 378,512,430 RAC: 785	Message 1795357 - Posted: 11 Jun 2016, 13:05:11 UTC - in response to Message 1795355. Of course we have test samples for AP too, but Lunatics is currently not accessible. Running without app args will be much slower on your cards. The fact that others are doing so doesn't mean it's better. Example from one of my benches. WU : ap_06ap11aa_B3_P0_00374_20151123_10357.wu AP6_win_x86_SSE2_OpenCL_ATI_r2346.exe : Elapsed 1623.393 secs CPU 889.518 secs AP7_win_x86_SSE2_OpenCL_ATI_r2742.exe -unroll 24 -oclFFT_plan 256 16 256 -tune 1 64 4 1 -tune 2 64 4 1 -ffa_block 2830 -ffa_block_fetch 2830 : Elapsed 1082.992 secs, speedup: 33.29% ratio: 1.50x CPU 126.876 secs, speedup: 85.74% ratio: 7.01x Well, they are getting all valid results on Fiji, so I thought I would give it a try. I will check on the AP reference WUs next week. I hope to fix the MB issue first, since I don't have any AP WUs now. GitHub: Ricks-Lab Instagram: ricks_labs ID: 1795357 ·

Raistmer Volunteer developer Volunteer tester Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121	Message 1795371 - Posted: 11 Jun 2016, 14:34:39 UTC - in response to Message 1795357. -oclFFT_plan is the first candidate if a broken config is suspected. Try removing all -oclFFT_plan options from both lines to see if the invalids disappear. There are no established rules for which configs will work fine everywhere or why some of them fail. So care is needed with these. Example: https://docs.google.com/spreadsheets/d/1bywjOlnPhTcpzk7UFl4T4ZPb2T0uS19l-ILQBQR3OS4/edit?usp=sharing SETI apps news We're not gonna fight them. We're gonna transcend them. ID: 1795371 ·
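
Removing every -oclFFT_plan option from an argument line, as suggested above, is easy to get wrong by hand because the option carries values of its own. The sketch below is a hypothetical helper that assumes the option is always followed by exactly three values, as in the "-oclFFT_plan 256 16 256" example quoted earlier in the thread; it does not handle quoted arguments.

```python
def drop_option(cmdline, option, n_values):
    """Return cmdline with every occurrence of option and its n_values arguments removed."""
    tokens = cmdline.split()
    kept = []
    i = 0
    while i < len(tokens):
        if tokens[i] == option:
            i += 1 + n_values  # skip the option itself and its values
        else:
            kept.append(tokens[i])
            i += 1
    return " ".join(kept)


# Argument line quoted earlier in the thread.
args = ("-unroll 24 -oclFFT_plan 256 16 256 -tune 1 64 4 1 -tune 2 64 4 1 "
        "-ffa_block 2830 -ffa_block_fetch 2830")

# Assumption: -oclFFT_plan takes exactly three values.
print(drop_option(args, "-oclFFT_plan", 3))
```

All other options are left untouched, so the comparison isolates the effect of the FFT plan setting alone.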

Mike Volunteer tester Send message Joined: 17 Feb 01 Posts: 34259 Credit: 79,922,639 RAC: 80	Message 1795378 - Posted: 11 Jun 2016, 15:05:20 UTC - in response to Message 1795371. -oclFFT_plan is the first candidate if a broken config is suspected. Try removing all -oclFFT_plan options from both lines to see if the invalids disappear. There are no established rules for which configs will work fine everywhere or why some of them fail. So care is needed with these. Example: https://docs.google.com/spreadsheets/d/1bywjOlnPhTcpzk7UFl4T4ZPb2T0uS19l-ILQBQR3OS4/edit?usp=sharing I gave him these app args, and they worked fine with older drivers. With each crime and every kindness we birth our future. ID: 1795378 ·

Raistmer Volunteer developer Volunteer tester Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121	Message 1795380 - Posted: 11 Jun 2016, 15:08:18 UTC - in response to Message 1795378. -oclFFT_plan is the first candidate if a broken config is suspected. Try removing all -oclFFT_plan options from both lines to see if the invalids disappear. There are no established rules for which configs will work fine everywhere or why some of them fail. So care is needed with these. Example: https://docs.google.com/spreadsheets/d/1bywjOlnPhTcpzk7UFl4T4ZPb2T0uS19l-ILQBQR3OS4/edit?usp=sharing I gave him these app args, and they worked fine with older drivers. Yes, a quick check against my table also shows they are in the "green zone"... but that could change with the next device family or driver iteration. So, among all the options, this one is the most fragile IMO. SETI apps news We're not gonna fight them. We're gonna transcend them. ID: 1795380 ·

Mike Volunteer tester Send message Joined: 17 Feb 01 Posts: 34259 Credit: 79,922,639 RAC: 80	Message 1795398 - Posted: 11 Jun 2016, 16:08:27 UTC - in response to Message 1795380. -oclFFT_plan is the first candidate if a broken config is suspected. Try removing all -oclFFT_plan options from both lines to see if the invalids disappear. There are no established rules for which configs will work fine everywhere or why some of them fail. So care is needed with these. Example: https://docs.google.com/spreadsheets/d/1bywjOlnPhTcpzk7UFl4T4ZPb2T0uS19l-ILQBQR3OS4/edit?usp=sharing I gave him these app args, and they worked fine with older drivers. Yes, a quick check against my table also shows they are in the "green zone"... but that could change with the next device family or driver iteration. So, among all the options, this one is the most fragile IMO. Of course they are highly optimized. I don't want to go into details here, so as not to worry Rick too much. Since CPU affinity doesn't work as it should on AP, we had to compromise here. With each crime and every kindness we birth our future. ID: 1795398 ·

RueiKe Volunteer tester Send message Joined: 14 Feb 16 Posts: 492 Credit: 378,512,430 RAC: 785	Message 1796575 - Posted: 16 Jun 2016, 12:18:59 UTC - in response to Message 1795398. Of course they are highly optimized. I don't want to go into details here, so as not to worry Rick too much. Since CPU affinity doesn't work as it should on AP, we had to compromise here. I have changed the command line to this for all 3 systems: -cpu_lock_fixed_cpu 6 -instances_per_device 1 -sbs 768 -hp I started to receive AP tasks again on 15-Jun, and the results so far are: Valid = 15 Invalid = 2 Inconclusive = 19 Pending = 44 In Progress = 11 Still not great, but much better than before. I wonder if the invalid rate is related to the sporadic MB invalids that I have been getting. I have lowered the memory clock OC from 530MHz to 525MHz and now to 520MHz. GitHub: Ricks-Lab Instagram: ricks_labs ID: 1796575 ·
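
One way to put a number on "still not great" is to compute the invalid rate over the results that have already been validated, ignoring inconclusive, pending, and in-progress tasks whose outcome is not yet known. The sketch below uses the tallies from the post above; the function name is hypothetical, and counting only decided results is an assumption about what a meaningful rate is.

```python
def invalid_rate_percent(valid, invalid):
    """Percent of decided results that were invalid, or None if none are decided."""
    decided = valid + invalid
    if decided == 0:
        return None
    return 100.0 * invalid / decided


# Tallies quoted in the post; undecided categories are listed only for context.
tallies = {"valid": 15, "invalid": 2, "inconclusive": 19, "pending": 44, "in_progress": 11}

rate = invalid_rate_percent(tallies["valid"], tallies["invalid"])
print(f"Invalid rate among decided results: {rate:.2f}%")
```

With only 17 decided results the estimate is rough, and the 19 inconclusive tasks could move it noticeably in either direction once they resolve.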

Rasputin42 Volunteer tester Send message Joined: 25 Jul 08 Posts: 412 Credit: 5,834,661 RAC: 0	Message 1796577 - Posted: 16 Jun 2016, 12:33:42 UTC -instances_per_device 1 Is that actually necessary? Does it do anything compared to not being there? ID: 1796577 ·

Grant (SSSF) Volunteer tester Send message Joined: 19 Aug 99 Posts: 13751 Credit: 208,696,464 RAC: 304	Message 1796724 - Posted: 16 Jun 2016, 22:56:30 UTC - in response to Message 1796575. I have lowered the memory clock OC from 530MHz to 525MHz and now to 520MHz. While trying out different settings, it would be worth removing any overclocks IMHO. Your card might be right on the edge, and a particular group of settings might produce a significant speedup, and with it an increase in load, that results in errors. Not because of the settings themselves, but because the load has pushed the overclocked hardware over the edge. It would be a shame if certain settings were written off as causing errors when it was the hardware that couldn't cope with the increased load, not the settings themselves. Grant Darwin NT ID: 1796724 ·
