Documentation of multibeam app internal programming?

Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1980477 - Posted: 15 Feb 2019, 10:53:44 UTC - in response to Message 1980370.  

One other thing, looking at the code I'm guessing it chops up the PoT down to a minimum of 32 time slices divided in half?

Thank you.


Maybe; it depends on the number of iterations in the PulseFind loop for the particular WU.
The useful time length depends on how fast the telescope moved while recording the data, so different ARs have different PulseFind configs. So-called VHARs don't do PoT analysis at all.
SETI apps news
We're not gonna fight them. We're gonna transcend them.
Ben

Joined: 15 Jun 99
Posts: 54
Credit: 60,003,756
RAC: 150
United States
Message 1981040 - Posted: 18 Feb 2019, 20:04:33 UTC

Is there a maximum size for the PoT that is handed to the pulse finder routine? I would like to store it in local or private memory but I don't know how big it can get. Thank you.
Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1981052 - Posted: 18 Feb 2019, 22:01:07 UTC - in response to Message 1981040.  

It will not fit.
The data matrix consists of 1M points.
The minimum number of FFT bands is 8 (for VLARs), so that's 1024*1024/8 points per single PoT array, with 8 such arrays per icfft cycle.
Each point is 4 bytes (float).

If you look into the CL file (maybe the second one), you'll find different variants of the PulseFind kernels I tried, some of them based on local memory.
That is not feasible for all sizes.
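
As a rough check of the arithmetic above, here is a minimal sketch (not taken from the app's sources). The point count, band count, and float size come from the post; the 64 KiB local-memory budget is only an assumed figure for illustration, and the real limit should be queried from the device (for example via CL_DEVICE_LOCAL_MEM_SIZE in OpenCL).

```python
# Figures taken from the post above.
DATA_POINTS = 1024 * 1024   # 1M points in the data matrix
MIN_FFT_BANDS = 8           # smallest number of FFT bands (VLAR case)
BYTES_PER_POINT = 4         # each point is a 32-bit float

# ASSUMPTION: illustrative local-memory budget, not a value from the thread.
ASSUMED_LOCAL_MEM_BYTES = 64 * 1024

# Longest single PoT array: all points split across the fewest bands.
pot_points = DATA_POINTS // MIN_FFT_BANDS
pot_bytes = pot_points * BYTES_PER_POINT
fits_in_local = pot_bytes <= ASSUMED_LOCAL_MEM_BYTES

print(f"points per PoT array: {pot_points}")
print(f"bytes per PoT array:  {pot_bytes} ({pot_bytes // 1024} KiB)")
print(f"fits in assumed local memory: {fits_in_local}")
```

In the worst case a single PoT array is 131072 floats, or 512 KiB, which is well beyond a budget of that order; shorter PoT arrays (more FFT bands) are the ones that could fit.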
Ben

Joined: 15 Jun 99
Posts: 54
Credit: 60,003,756
RAC: 150
United States
Message 1981147 - Posted: 19 Feb 2019, 18:58:52 UTC

Besides optimizing the GPU kernels, I would also like to trace a corruption bug that seems to affect only Radeon GPUs using the new AMD ROCm OpenCL drivers. Is there any way to "turn off" individual kernels and have the work done on the CPU instead? That would help to narrow down the source of the problem. Thank you.
Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1981880 - Posted: 23 Feb 2019, 19:18:49 UTC - in response to Message 1981147.  

Well, it's not so easy, because the data is stored on the GPU.
For a start, you could catch a repeatable case for offline testing.
Then process it with both the SoG and the non-SoG versions: do both experience problems?
Next, there is the -v 4 switch, which provides additional info about each found signal.

And for heavy debugging there are many array dump points wrapped in #if 0/#endif directives, left over from initial development.
For PoT searches it's possible to emulate signal reporting from the GPU to make the CPU re-process the data (look at how the very first icfft cycle is processed: it always goes to the CPU).
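
Once an array has been dumped from both a suspect run and a reference run, the two dumps can be compared element by element to find where they first diverge. The sketch below is only an illustration of that idea: the dump format used by the app is not described in this thread, so it assumes the values have already been loaded into two Python lists of floats, and the function name and tolerances are hypothetical.

```python
import math

def first_mismatch(gpu_values, cpu_values, rel_tol=1e-5, abs_tol=1e-6):
    """Return the index of the first element that differs, or None.

    ASSUMPTION: both dumps are already loaded as equal-length lists of
    floats; the tolerances are arbitrary illustrative defaults.
    """
    if len(gpu_values) != len(cpu_values):
        raise ValueError("dumps have different lengths")
    for i, (g, c) in enumerate(zip(gpu_values, cpu_values)):
        # A NaN on one side only is always a mismatch.
        if math.isnan(g) != math.isnan(c):
            return i
        # NaN on both sides counts as equal; otherwise compare with tolerance.
        if not math.isnan(g) and not math.isclose(
            g, c, rel_tol=rel_tol, abs_tol=abs_tol
        ):
            return i
    return None

# Tiny made-up example: the second element (index 1) is corrupted.
reference = [1.0, 2.0, 3.0, 4.0]
suspect = [1.0, 2.5, 3.0, 4.0]
print(first_mismatch(suspect, reference))
```

Comparing with a tolerance rather than exact equality matters here, since different devices and drivers can legitimately differ in the last bits of a float result.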
Ben

Joined: 15 Jun 99
Posts: 54
Credit: 60,003,756
RAC: 150
United States
Message 1982939 - Posted: 1 Mar 2019, 19:28:27 UTC - in response to Message 1981880.  

The current version of the seti app for AMD/ATI seems to be 8.22. The one I have been working with from svn identifies itself as 8.18. How do I download the source for the newest version? Thank you.

On a different note, has anyone tried turning the big matrix sideways before processing it? It seems like it should run faster that way.
Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1983182 - Posted: 3 Mar 2019, 18:05:37 UTC - in response to Message 1982939.  
Last modified: 3 Mar 2019, 18:08:25 UTC

1) SVN contains fresh code (actually, more recent than the code released as v8.22). The version info assigned at release may not be properly reflected in the sources themselves.
What actually matters is the SVN revision number: it's used in OpenCL binary caching, so one can easily tell which SVN revision a particular binary was built from.
Just look at the names of the cache files.

2) Do you mean transpose?
Well, at least the initial AMD GPUs had no real local memory, and NV has coalescing requirements, so better performance (and here is the big difference from CPU memory reading) is achieved if each thread/work item reads an adjacent element in a row, not in a column. For PoT this corresponds to the non-transposed matrix (so the CPU does a transpose, the GPU doesn't).
EDIT: that way each work item/thread reads a column, but the workgroup in lock-step reads rows. On the CPU a single thread should read a row, not a column.
Modern GPUs could have a different memory organization, so feel free to experiment with transpose. Sometimes the memory organization can be changed just by swapping (x,y)<->(y,x) for 2D kernels; no actual transpose is required.
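
The access pattern described above can be illustrated with plain index arithmetic. This is a minimal sketch, not code from the app: it assumes a row-major matrix stored as a flat array, and the function names and the tiny sizes are hypothetical. It lists the flat addresses that neighbouring work items touch at one lock-step iteration for the two layouts.

```python
def lockstep_addresses(step, n_items, row_len, column_per_item=True):
    """Flat addresses read by work items 0..n_items-1 at one lock-step.

    ASSUMPTION: the matrix is row-major, so element (row, col) lives at
    row * row_len + col.
    """
    if column_per_item:
        # Item i walks down column i: at this step it reads (step, i),
        # so the group as a whole reads one contiguous piece of a row.
        return [step * row_len + i for i in range(n_items)]
    # Item i walks along row i: at this step it reads (i, step),
    # so neighbouring items are a whole row apart in memory.
    return [i * row_len + step for i in range(n_items)]

def stride(addresses):
    """Common distance between neighbouring addresses, or None if uneven."""
    diffs = {b - a for a, b in zip(addresses, addresses[1:])}
    return diffs.pop() if len(diffs) == 1 else None

ROW_LEN = 8  # hypothetical tiny matrix width, for illustration only
per_column = lockstep_addresses(3, 4, ROW_LEN, column_per_item=True)
per_row = lockstep_addresses(3, 4, ROW_LEN, column_per_item=False)
print(per_column, "stride", stride(per_column))
print(per_row, "stride", stride(per_row))
```

With one column per work item, the group reads adjacent addresses (stride 1), which is the pattern coalescing hardware favours. With one row per work item, the stride equals the row length. Swapping the roles of the two indices in a 2D kernel switches between these two patterns without moving any data.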
Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1986741 - Posted: 23 Mar 2019, 14:16:38 UTC

Any success with profiling?
Please keep us all updated; it would be very interesting if you could make some adaptations for the new generations of GPU cards...


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.