Documentation of multibeam app internal programming?
Author | Message |
---|---|
Raistmer Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
"One other thing, looking at the code I'm guessing it chops up the PoT down to a minimum of 32 time slices divided in half?" Maybe; it depends on the number of iterations in the PulseFind loop for the particular WU. The useful time length depends on how fast the telescope moved while recording the data, so different ARs have different PulseFind configs. So-called VHARs don't do PoT analysis at all. SETI apps news We're not gonna fight them. We're gonna transcend them. |
Ben Send message Joined: 15 Jun 99 Posts: 54 Credit: 60,003,756 RAC: 150 |
Is there a maximum size for the PoT that is handed to the pulse finder routine? I would like to store it in local or private memory but I don't know how big it can get. Thank you. |
Raistmer Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
"Is there a maximum size for the PoT that is handed to the pulse finder routine? I would like to store it in local or private memory but I don't know how big it can get. Thank you." It will not fit. The data matrix consists of 1M points. The minimum number of FFT bands is 8 (for VLARs), so that is 1024*1024/8 points per single PoT array, with 8 such arrays per icfft cycle. Each point is 4 bytes (float), so a single PoT array can reach 512 KiB. If you look into the CL file (maybe the second one), you'll find different variants of the PulseFind kernels I tried, some of them based on local memory. That is not feasible for all sizes. |
Ben Send message Joined: 15 Jun 99 Posts: 54 Credit: 60,003,756 RAC: 150 |
Besides optimizing the GPU kernels, I would also like to trace a corruption bug that seems to affect only Radeon GPUs using the new AMD ROCm OpenCL drivers. Is there any way to "turn off" individual kernels and have the work done on the CPU instead? That would help to narrow down the source of the problem. Thank you. |
Raistmer Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
"Besides optimizing the GPU kernels, I would also like to trace a corruption bug that seems to affect only Radeon GPUs using the new AMD ROCm OpenCL drivers. Is there any way to 'turn off' individual kernels and have the work done on the CPU instead? That would help to narrow down the source of the problem. Thank you." Well, it's not so easy, because the data is stored on the GPU. For a start, you could catch a repeatable case for offline testing. Then process it with both the SoG and the non-SoG versions: do both experience the problem? Next, there is the -v 4 switch, which provides additional info about each found signal. And for heavy debugging there are many array dump points wrapped in #if 0/#endif directives, left over from initial development. For PoT searches it's possible to emulate signal reporting from the GPU to make the CPU re-process the data (look at how the very first icfft cycle is processed: it always goes to the CPU). |
Ben Send message Joined: 15 Jun 99 Posts: 54 Credit: 60,003,756 RAC: 150 |
The current version of the SETI app for AMD/ATI seems to be 8.22. The one I have been working with from SVN identifies itself as 8.18. How do I download the source for the newest version? Thank you. On a different note, has anyone tried turning the big matrix sideways before processing it? It seems like it should run faster that way. |
Raistmer Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
"The current version of the SETI app for AMD/ATI seems to be 8.22. The one I have been working with from SVN identifies itself as 8.18. How do I download the source for the newest version? Thank you." 1) SVN contains fresh code (actually, more recent than the code released in v8.22). The version info assigned at release may not be properly reflected in the sources themselves. What actually matters is the SVN revision number: it's used in OpenCL binary caching, so one can easily tell which SVN revision a particular binary was built from. Just look at the names of the cache files. 2) Do you mean transpose? Well, at least the initial AMD GPUs had no real local memory, and NV has coalescing requirements, so better performance (and here is a big difference from CPU memory reading) is achieved if each thread/work item reads an adjacent element in a row, not in a column. In PoT this corresponds to the non-transposed matrix (so the CPU does a transpose, the GPU doesn't). EDIT: that way each work item/thread reads a column, but the workgroup in lock-step reads rows. On the CPU a single thread should read a row, not a column. Modern GPUs could have a different memory organization, so feel free to experiment with transpose. Sometimes the memory organization can be changed just by reversing (x,y)<->(y,x) for 2D kernels; no actual transpose is required. |
Raistmer Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
Any success with profiling? Please keep us all updated; it would be very interesting if you could make some adaptations for new generations of GPU cards... |
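The sizing argument in the thread (a 1M-point matrix, at least 8 FFT bands, 4-byte floats) can be checked with a few lines of arithmetic. The sketch below is only an illustration: the function names are hypothetical and not taken from the app's sources, and the local-memory budget is something the caller has to supply (on a real device it could be queried with `clGetDeviceInfo` and `CL_DEVICE_LOCAL_MEM_SIZE`).

```cpp
#include <cassert>
#include <cstddef>

// Figures quoted in the thread: a 1M-point data matrix of 4-byte floats.
constexpr std::size_t kMatrixPoints = 1024u * 1024u;
constexpr std::size_t kBytesPerPoint = 4;

// Points in one PoT array when the matrix is split into `fft_bands` arrays.
// (Hypothetical helper name, for illustration only.)
constexpr std::size_t pot_points(std::size_t fft_bands) {
    return kMatrixPoints / fft_bands;
}

// Bytes occupied by one PoT array.
constexpr std::size_t pot_bytes(std::size_t fft_bands) {
    return pot_points(fft_bands) * kBytesPerPoint;
}

// True if one whole PoT array fits into the given local-memory budget.
// The budget is an assumption supplied by the caller, not a value from the app.
bool pot_fits_in_local(std::size_t fft_bands, std::size_t local_mem_bytes) {
    assert(fft_bands > 0);
    return pot_bytes(fft_bands) <= local_mem_bytes;
}
```

With 8 bands, `pot_bytes(8)` is 524288 bytes (512 KiB), which is well beyond a budget of, say, 64 KiB. By the same arithmetic, a split into more bands gives shorter arrays, which would be consistent with the remark that local-memory kernels are feasible for some sizes but not all.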
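The point about reversing (x,y)<->(y,x) instead of transposing can be illustrated by looking at how far apart in memory two neighbouring work items read. The sketch below is a host-side model, assuming a row-major matrix; it is not code from the app, and all names are made up for the example.

```cpp
#include <cstddef>

// Linear offset of element (row, col) in a row-major matrix of the given width.
constexpr std::size_t offset(std::size_t row, std::size_t col, std::size_t width) {
    return row * width + col;
}

// Distance (in elements) between what work items `gid` and `gid + 1` read
// when the work-item id selects the column: neighbours walk along a row.
constexpr std::size_t stride_id_as_col(std::size_t row, std::size_t gid,
                                       std::size_t width) {
    return offset(row, gid + 1, width) - offset(row, gid, width);
}

// Same distance when the work-item id selects the row instead:
// neighbours walk down a column.
constexpr std::size_t stride_id_as_row(std::size_t gid, std::size_t col,
                                       std::size_t width) {
    return offset(gid + 1, col, width) - offset(gid, col, width);
}
```

When the id selects the column, neighbouring work items are one element apart, which is the pattern that coalescing favours. When the id selects the row, they are a full row width apart. Swapping which index the id feeds changes the access pattern without moving any data, which is why it can stand in for an actual transpose.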
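For the corruption hunt, one possible way to use the array dump points mentioned in the thread is to compare a dump from the GPU path against the same array from a CPU reference run. The helper below is a hypothetical sketch of such a comparison; it assumes both arrays are already loaded into memory as floats and does not reflect the app's actual dump format or any function in its sources.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: returns the index of the first element where the two
// arrays disagree by more than `tol`, comparing only their common prefix.
// Returns the compared length if no mismatch is found.
std::size_t first_mismatch(const std::vector<float>& cpu,
                           const std::vector<float>& gpu, float tol) {
    const std::size_t n = cpu.size() < gpu.size() ? cpu.size() : gpu.size();
    for (std::size_t i = 0; i < n; ++i) {
        // NaN compares false against everything, so test for it explicitly:
        // a NaN on one side only counts as a mismatch.
        if (std::isnan(cpu[i]) != std::isnan(gpu[i])) return i;
        if (std::fabs(cpu[i] - gpu[i]) > tol) return i;
    }
    return n;
}
```

Running this at successive dump points would show the first stage at which the GPU data diverges from the reference, which narrows down the kernel to inspect. The tolerance has to be chosen by the caller, since GPU and CPU floating-point results are not expected to match bit for bit.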