Optimizing SBS and Period Iterations for the Fury X

Message boards : Number crunching : Optimizing SBS and Period Iterations for the Fury X
RueiKe (Special Project $250 donor)
Volunteer tester
Joined: 14 Feb 16
Posts: 492
Credit: 378,512,430
RAC: 785
Taiwan
Message 1794288 - Posted: 7 Jun 2016, 23:18:06 UTC

Raistmer's detailed post of MB options over at Lunatics motivated me to look deeper into the impact on overall performance of the SBS and Period Iterations parameters.
http://lunatics.kwsn.info/index.php/topic,1808.0.html

I have completed a design-of-experiments (DOE) study exploring the effect of these two factors on the overall processing times of Arecibo and GUPPI VLAR WUs. I found that I could get substantially faster processing times for the GUPPI VLARs using a much lower period-iterations value. The tables below show results for 2 sample WUs, Arecibo on the left and GUPPI VLAR on the right. The top tables show total and CPU processing times, and the bottom tables show percent improvement relative to the 256/60 (SBS/period iterations) baseline. Dark red indicates a verified error, and light red marks settings assumed to cause the same error. For valid results, darker shading indicates better performance.
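For anyone who wants to repeat this DOE on other cards: a two-factor sweep like this is a full-factorial grid over SBS and period-iterations levels, with each cell's time converted to percent improvement over the baseline cell. A minimal Python sketch, assuming 256/60 means -sbs 256 with -period_iterations_num 60; the level lists and timings below are hypothetical placeholders, not the measured values from the tables.

```python
from itertools import product

# Hypothetical factor levels; the levels actually used in the DOE tables may differ.
SBS_LEVELS = [256, 512, 768, 1024]
PERIOD_ITER_LEVELS = [5, 10, 30, 60]
BASELINE = (256, 60)  # assumed to mean -sbs 256 / -period_iterations_num 60


def command_line(sbs: int, period_iters: int) -> str:
    """Build the MB app arguments for one DOE cell."""
    return f"-sbs {sbs} -period_iterations_num {period_iters}"


def percent_improvement(baseline_secs: float, secs: float) -> float:
    """Positive when the run is faster than the baseline run."""
    return (baseline_secs - secs) / baseline_secs * 100.0


def improvement_table(times: dict[tuple[int, int], float]) -> dict[tuple[int, int], float]:
    """Convert elapsed times per (sbs, period_iters) cell into % improvement vs BASELINE."""
    base = times[BASELINE]
    return {cell: round(percent_improvement(base, t), 2) for cell, t in times.items()}


if __name__ == "__main__":
    # Print the argument string for every cell of the full-factorial grid.
    for sbs, period_iters in product(SBS_LEVELS, PERIOD_ITER_LEVELS):
        print(command_line(sbs, period_iters))
    # Placeholder timings in seconds, for illustration only.
    demo = {(256, 60): 400.0, (1024, 5): 300.0}
    print(improvement_table(demo))
```

Each printed argument string would be run against the same test WU, and the resulting elapsed times fed into improvement_table.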

https://goo.gl/photos/ToZxyJgxrxDFNGSa7


I have been running for the last half day with the following settings for these two parameters: -sbs 1024 -period_iterations_num 5
I have verified good results on both the Fury X and the Nano. I suspect Hawaii-based cards will have a similar optimal period-iterations value, but this needs to be verified.
GitHub: Ricks-Lab
Instagram: ricks_labs
Chris Adamek
Volunteer tester

Joined: 15 May 99
Posts: 251
Credit: 434,772,072
RAC: 236
United States
Message 1794317 - Posted: 8 Jun 2016, 1:46:13 UTC - in response to Message 1794288.  

I've gotten similar results on Tahiti- and Pitcairn-based AMD GPUs. Good to see the higher-end cards responding well too.

Chris
RueiKe
Message 1794345 - Posted: 8 Jun 2016, 5:09:34 UTC

After about 15 hours, I found the system locked with 3 GPU tasks at 99.99% complete. I rebooted and lowered SBS to 512. RAC still climbing!
BilBg
Volunteer tester
Joined: 27 May 07
Posts: 3720
Credit: 9,385,827
RAC: 0
Bulgaria
Message 1794401 - Posted: 8 Jun 2016, 9:40:24 UTC - in response to Message 1794288.  

The very long link you used for your image doesn't work; here is your image:


 


- ALF - "Find out what you don't do well ..... then don't do it!" :)
 
RueiKe
Message 1794406 - Posted: 8 Jun 2016, 9:55:33 UTC - in response to Message 1794401.  

Thanks for posting a link that everyone can see! It seems the link I used only works for me. Google doesn't support embedding images on message boards, so I just took the link from the HTML while viewing the image. Any better recommendations for future posts?
BilBg
Message 1794418 - Posted: 8 Jun 2016, 11:43:51 UTC - in response to Message 1794406.  

I just clicked your link:
https://goo.gl/photos/ToZxyJgxrxDFNGSa7

... then the thumbnail, then right-clicked the image and chose "Open image in new tab".
 


RueiKe
Message 1795283 - Posted: 11 Jun 2016, 3:04:44 UTC

After running several days, I am getting about 6-7 invalid results per day for Arecibo tasks. No invalids for GUPPI tasks. This is the case for both of my systems. Has anyone found any additional tuning to resolve this?
Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1795323 - Posted: 11 Jun 2016, 8:55:57 UTC
Last modified: 11 Jun 2016, 9:00:21 UTC

One of the tasks from the 3-GPU host:
http://setiathome.berkeley.edu/workunit.php?wuid=2178535079

CUDA50 overflowed:
Spike count: 30
Autocorr count: 0
Pulse count: 0
Triplet count: 0
Gaussian count: 0

ATI gave:
Spike count: 0
Autocorr count: 0
Pulse count: 14
Triplet count: 0
Gaussian count: 0

Both of the other results matched each other; CPU stock gave:
Spike count: 0
Autocorr count: 0
Pulse count: 0
Triplet count: 0
Gaussian count: 0

Well... I'll try to catch this task and rerun it offline on my own hardware.

EDIT: unfortunately, the task file has already been deleted (even though the results are still listed on the web front page).

So try copying all downloaded tasks to an archive location in advance, so that you can present any one that turns out to be invalid for an offline check.
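One way to do this copying automatically is a small script run every few minutes (for example from a scheduled task) that copies any new file from the project directory into an archive folder. A minimal Python sketch; the project path is an assumed default Windows BOINC data directory, the archive path is hypothetical, and the list of skipped file extensions is an assumption meant to leave out app binaries and config files.

```python
import shutil
from pathlib import Path

# ASSUMPTIONS: assumed default Windows BOINC project directory and a hypothetical archive folder.
PROJECT_DIR = Path(r"C:\ProgramData\BOINC\projects\setiathome.berkeley.edu")
ARCHIVE_DIR = Path(r"D:\seti_task_archive")
# Assumed extensions of app binaries, kernels and config files, which are not tasks.
SKIP_SUFFIXES = {".exe", ".dll", ".cl", ".txt", ".bin", ".xml"}


def archive_new_tasks(project_dir: Path, archive_dir: Path) -> list[Path]:
    """Copy every non-skipped file not yet in archive_dir; return the new copies."""
    archive_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for src in project_dir.iterdir():
        if not src.is_file() or src.suffix.lower() in SKIP_SUFFIXES:
            continue
        dst = archive_dir / src.name
        if not dst.exists():
            shutil.copy2(src, dst)  # copy2 keeps timestamps, handy for matching tasks later
            copied.append(dst)
    return copied


if __name__ == "__main__":
    # One pass per run; schedule it so tasks are captured before BOINC deletes them.
    if PROJECT_DIR.is_dir():
        for path in archive_new_tasks(PROJECT_DIR, ARCHIVE_DIR):
            print("archived", path.name)
```

Files already in the archive are skipped, so repeated runs only copy tasks that arrived since the previous pass.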
SETI apps news
We're not gonna fight them. We're gonna transcend them.
Raistmer
Message 1795325 - Posted: 11 Jun 2016, 9:05:26 UTC - in response to Message 1795283.  
Last modified: 11 Jun 2016, 9:08:15 UTC

But that 3-GPU host is currently returning AP invalids too:

http://setiathome.berkeley.edu/workunit.php?wuid=2174519575

GPU gave:
single pulses: 0
repetitive pulses: 2

percent blanked: 2.41
Rep. pulse: num_std_devs=6.956 peak_power=2552.563 dm=2992 peak_bin=256 scale=4 ffa_scale=0 period=470.8912
Rep. pulse: num_std_devs=7.07 peak_power=2643.914 dm=-5008 peak_bin=1616 scale=4 ffa_scale=0 period=452.8656

CPU SSE3 gave:
single pulses: 3
repetitive pulses: 1

percent blanked: 2.39
Rep. pulse: num_std_devs=6.851 peak_power=4333 dm=2672 peak_bin=3808 scale=4 ffa_scale=0 period=267.6526
Single pulse: peak_power=38.01 dm=-5314 fft_num=11173888 peak_bin=11180568 scale=2
Single pulse: peak_power=365.6 dm=6345 fft_num=7831552 peak_bin=7832832 scale=8
Single pulse: peak_power=218.8 dm=10564 fft_num=16302080 peak_bin=16316928 scale=7

I would say the results are too different to be just a precision issue.
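Eyeballing the two listings is enough here, but the comparison can be made systematic by parsing the signal lines into fields. A minimal Python sketch using the lines above; it only counts signal types and checks for dispersion measures (dm) reported by both results, and is not the validator's actual matching rule, which is not described in this thread.

```python
import re
from collections import Counter

# Signal lines copied from the AP result listings above.
GPU_RESULT = """\
Rep. pulse: num_std_devs=6.956 peak_power=2552.563 dm=2992 peak_bin=256 scale=4 ffa_scale=0 period=470.8912
Rep. pulse: num_std_devs=7.07 peak_power=2643.914 dm=-5008 peak_bin=1616 scale=4 ffa_scale=0 period=452.8656"""

CPU_RESULT = """\
Rep. pulse: num_std_devs=6.851 peak_power=4333 dm=2672 peak_bin=3808 scale=4 ffa_scale=0 period=267.6526
Single pulse: peak_power=38.01 dm=-5314 fft_num=11173888 peak_bin=11180568 scale=2
Single pulse: peak_power=365.6 dm=6345 fft_num=7831552 peak_bin=7832832 scale=8
Single pulse: peak_power=218.8 dm=10564 fft_num=16302080 peak_bin=16316928 scale=7"""

# Matches "<kind>: key=value key=value ..."
SIGNAL_RE = re.compile(r"^(Rep\. pulse|Single pulse):\s*(.+)$")


def parse_signals(text: str) -> list[tuple[str, dict[str, float]]]:
    """Turn each signal line into (kind, {field: value})."""
    signals = []
    for line in text.splitlines():
        m = SIGNAL_RE.match(line.strip())
        if m:
            fields = dict(pair.split("=", 1) for pair in m.group(2).split())
            signals.append((m.group(1), {k: float(v) for k, v in fields.items()}))
    return signals


def kind_counts(signals) -> Counter:
    """How many signals of each kind a result reports."""
    return Counter(kind for kind, _ in signals)


def shared_dms(a, b) -> set[float]:
    """Dispersion measures reported by both results (empty means no common signal)."""
    return {s["dm"] for _, s in a} & {s["dm"] for _, s in b}


gpu, cpu = parse_signals(GPU_RESULT), parse_signals(CPU_RESULT)
print(kind_counts(gpu), kind_counts(cpu), shared_dms(gpu, cpu))
```

For these two results the signal counts differ and no dm value appears in both, which is consistent with the judgment that this is more than a precision difference.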

If the GPUs are overclocked, try reducing the frequency.
Check for dust, and check that there is enough cooling for a 3-GPU host like this.
In short, it looks like a hardware issue for now.

EDIT: or an incompatible driver.
Please check whether others using the same driver with similar hardware receive valid results. 6-7 invalids per day (even 1 invalid) is an unacceptably high error rate to just leave as is.
RueiKe
Message 1795349 - Posted: 11 Jun 2016, 12:31:56 UTC - in response to Message 1795325.  

Others have indicated it is a driver issue and that I would need to drop back to a much older driver. I really can't go back to the older driver because of other issues. The next time I get some AP WUs, I will run some tests with the bench. Or, are there sample AP WUs with results on the Lunatics site? It would be better to have a known case. Also, I noticed other Fiji owners getting good results with no optimization arguments, so I changed my command-line options, but I haven't received any new WUs yet...
RueiKe
Message 1795352 - Posted: 11 Jun 2016, 12:38:54 UTC - in response to Message 1795323.  

I like the idea of copying all of the work units out to catch the problematic one. I will attempt this when I get back to my computer. Unfortunately, I won't be back to my computer for another 3 days...

Also, when I find a problematic WU, can I just copy the output from a valid task and paste it into a text file so that the MB bench can find it?
Mike (Special Project $75 donor)
Volunteer tester
Joined: 17 Feb 01
Posts: 34259
Credit: 79,922,639
RAC: 80
Germany
Message 1795355 - Posted: 11 Jun 2016, 12:53:08 UTC
Last modified: 11 Jun 2016, 12:57:12 UTC

Of course we have test samples for AP too, but Lunatics is currently not accessible.

Running without app args will be much slower on your cards.
The fact that others do so doesn't mean it's better.

Here is an example from one of my benches:

WU : ap_06ap11aa_B3_P0_00374_20151123_10357.wu
AP6_win_x86_SSE2_OpenCL_ATI_r2346.exe :
Elapsed 1623.393 secs
CPU 889.518 secs

AP7_win_x86_SSE2_OpenCL_ATI_r2742.exe -unroll 24 -oclFFT_plan 256 16 256 -tune 1 64 4 1 -tune 2 64 4 1 -ffa_block 2830 -ffa_block_fetch 2830 :
Elapsed 1082.992 secs, speedup: 33.29% ratio: 1.50x
CPU 126.876 secs, speedup: 85.74% ratio: 7.01x
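The speedup and ratio figures in this bench output correspond to the percent of time saved relative to the AP6 run and the old/new time ratio; recomputing them that way from the listed times reproduces the printed values. A small Python sketch (the helper name is hypothetical, not part of the bench tool):

```python
def speedup_and_ratio(old_secs: float, new_secs: float) -> tuple[float, float]:
    """Percent of time saved vs. the old run, and the old/new time ratio."""
    speedup = 100.0 * (old_secs - new_secs) / old_secs
    ratio = old_secs / new_secs
    return round(speedup, 2), round(ratio, 2)


# AP6 r2346 (no args) vs AP7 r2742 (tuned args), times from the bench above.
print(speedup_and_ratio(1623.393, 1082.992))  # elapsed: (33.29, 1.5)
print(speedup_and_ratio(889.518, 126.876))    # CPU: (85.74, 7.01)
```

The CPU-time line shows the larger gain: the tuned AP7 run needs about a seventh of the CPU time.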



With each crime and every kindness we birth our future.
RueiKe
Message 1795357 - Posted: 11 Jun 2016, 13:05:11 UTC - in response to Message 1795355.  

Well, they are getting all valid results on Fiji, so I thought I would give it a try. I will check the AP reference WUs next week. I hope to fix the MB issue first, since I don't have any AP WUs right now.
Raistmer
Message 1795371 - Posted: 11 Jun 2016, 14:34:39 UTC - in response to Message 1795357.  

-oclFFT_plan is the first candidate if a broken config is suspected.
Try removing all -oclFFT_plan options from both lines to see if the invalids disappear.
There are no established rules for which configs will work everywhere or why some of them fail, so take care with this one.
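Removing the option by hand is easy, but since -oclFFT_plan takes three values (as in Mike's "-oclFFT_plan 256 16 256"), it is worth making sure no orphaned numbers are left behind. A minimal Python sketch of a hypothetical helper that drops a flag together with a fixed number of following values, applied to the AP arguments from Mike's bench:

```python
def drop_option(cmdline: str, flag: str, n_values: int) -> str:
    """Remove every occurrence of `flag` plus the n_values tokens that follow it."""
    tokens = cmdline.split()
    kept = []
    i = 0
    while i < len(tokens):
        if tokens[i] == flag:
            i += 1 + n_values  # skip the flag and its arguments
        else:
            kept.append(tokens[i])
            i += 1
    return " ".join(kept)


ap_args = ("-unroll 24 -oclFFT_plan 256 16 256 -tune 1 64 4 1 -tune 2 64 4 1 "
           "-ffa_block 2830 -ffa_block_fetch 2830")
print(drop_option(ap_args, "-oclFFT_plan", 3))
# -unroll 24 -tune 1 64 4 1 -tune 2 64 4 1 -ffa_block 2830 -ffa_block_fetch 2830
```

The same call can be applied to each command line in use; the value count of 3 is taken from the example above.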

Example: https://docs.google.com/spreadsheets/d/1bywjOlnPhTcpzk7UFl4T4ZPb2T0uS19l-ILQBQR3OS4/edit?usp=sharing
Mike
Message 1795378 - Posted: 11 Jun 2016, 15:05:20 UTC - in response to Message 1795371.  

I gave him these app args, and they worked fine with older drivers.


Raistmer
Message 1795380 - Posted: 11 Jun 2016, 15:08:18 UTC - in response to Message 1795378.  

Yes, a quick check against my table also shows they are in the "green zone"... but that could change with the next device family or driver iteration. So, of all the options, this one is the most fragile IMO.
Mike
Message 1795398 - Posted: 11 Jun 2016, 16:08:27 UTC - in response to Message 1795380.  

Of course they are highly optimized.
I don't want to go into details here, so as not to worry Rick too much.
Since CPU affinity doesn't work as it should on AP, we had to make a compromise here.


RueiKe
Message 1796575 - Posted: 16 Jun 2016, 12:18:59 UTC - in response to Message 1795398.  

I have changed the command line to this for all 3 systems:
-cpu_lock_fixed_cpu 6 -instances_per_device 1 -sbs 768 -hp

I started receiving AP tasks again on 15 Jun, and the results so far are:
Valid = 15
Invalid = 2
Inconclusive = 19
Pending = 44
In Progress = 11
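To put these counts in perspective: only Valid and Invalid are final verdicts, Inconclusive tasks are waiting for an additional result, and Pending/In Progress tasks have not been compared yet. A quick Python calculation on the numbers above (17 decided tasks is a small sample, so treat the rate as rough):

```python
# Counts as reported above (AP tasks received from 15 Jun onward).
counts = {"Valid": 15, "Invalid": 2, "Inconclusive": 19, "Pending": 44, "In Progress": 11}


def invalid_rate(c: dict[str, int]) -> float:
    """Percent invalid among tasks with a final verdict (valid or invalid)."""
    decided = c["Valid"] + c["Invalid"]
    return round(100.0 * c["Invalid"] / decided, 1)


def inconclusive_share(c: dict[str, int]) -> float:
    """Percent inconclusive among tasks that have already been compared."""
    compared = c["Valid"] + c["Invalid"] + c["Inconclusive"]
    return round(100.0 * c["Inconclusive"] / compared, 1)


print(invalid_rate(counts), inconclusive_share(counts))  # 11.8 52.8
```

Roughly one in nine decided tasks is invalid, and more than half of the compared tasks are still inconclusive, so the final picture depends on how those inconclusives resolve.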

Still not great, but much better than before. I wonder if the invalid rate is related to the sporadic MB invalids I have been getting. I have lowered the memory clock OC from 530 MHz to 525 MHz and now to 520 MHz.
Rasputin42
Volunteer tester
Joined: 25 Jul 08
Posts: 412
Credit: 5,834,661
RAC: 0
United States
Message 1796577 - Posted: 16 Jun 2016, 12:33:42 UTC

Regarding -instances_per_device 1:
Is that actually necessary? Does it do anything compared to leaving it out?
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13751
Credit: 208,696,464
RAC: 304
Australia
Message 1796724 - Posted: 16 Jun 2016, 22:56:30 UTC - in response to Message 1796575.  

While trying out different settings, it would be worth removing any overclocks, IMHO.

Your card might be right on the edge, and a particular group of settings might produce a significant speed-up (and increase in load) that results in errors. Not because of the settings themselves, but because the load has pushed the overclocked hardware over the edge.
It would be a shame if certain settings were written off as causing errors when it was the hardware that couldn't cope with the increased load, not the settings themselves.
Grant
Darwin NT


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.