Folding Algorithm Characteristics

Author	Message
Eddie Cottongim Volunteer tester Send message Joined: 15 Sep 01 Posts: 26 Credit: 6,149,499 RAC: 6	Message 770166 - Posted: 19 Jun 2008, 0:18:10 UTC Last modified: 19 Jun 2008, 0:21:07 UTC Hi all, I've been studying seti@home's pulse folding algorithms. I have a few questions (with a specific application in mind, but I think others may learn from this discussion as well). First, I'll state what I believe to have happened up to this point so you can see if I'm way off: Baseline Smoothing (once per WU), produces an 8 meg array consisting of ~1 million complex sampled points. Dechirping (varies but thousands of times per WU), reads the previous array, outputs another 8 meg array of ~1 million complex sampled points FFT: For each chirp rate, analyze at a variety of FFT sizes/bandwidths. FFT sizes from n=8 to 128k. FFT sizes actually used vary (8 to max at chirp rate 0, with less of the large sizes as chirprate goes to extremes). For each FFT size n, read the dechirped array, perform the FFT across the array (no overlap). Take the complex magnitude of the time values, giving us real data, and do a spike search. Each of these passes at FFT size m will be stored as a "Power over time" (PoT) array. 4 meg, 1 million real values. This should be interpreted as a 2d array, with height being the number of frequency divisions (FFT size) and the width being the number of time divisions (number of samples(1048576)/FFT size). Gaussian fitting is done across the table at a constant frequency. (ie pick a row and walk across the columns in this table) Again, loop through frequencies. Loop through time in chunks to perform the triplet search and pulse folding. The size of the chunks is based on the slew rate of the telescope and/or some fields out of the WU header. It could be the entire PoT time series at zero slew rate(as long as 131072 entries for FFT length=8 which is probably the most common case), or it could be much smaller. Ok, here's where I could use some help: I'd like to understand what is really typical for chunk size (called AdvanceBy in the code) across the population of WU's and within WUs. I am interested in getting the folding to work fast in a situation where there is very limited fast local memory - lets say 1024 entries (4 kbytes). Thats probably enough for the WU I'm looking at with pulse_pot_length=256, but I have no idea what the population out there looks like. If it often goes above 1024, I'd have a problem as folding would quickly become dependent on a slower memory bus. I don't think there is much way to modify the folding to hit memory less. It would pretty much require a complete pass for every pass of folding until the folds were small enough to fit in cache. A minor additional question: What purpose does GetFixedPoT serve? Why does the PoT time series need to be a fixed length? To simplify the Gaussian fit? I hope this makes sense. Thanks for your help. ID: 770166 ·

Josef W. Segur Volunteer developer Volunteer tester Send message Joined: 30 Oct 99 Posts: 4504 Credit: 1,414,761 RAC: 0	Message 770505 - Posted: 19 Jun 2008, 21:00:55 UTC - in response to Message 770166. Loop through time in chunks to perform the triplet search and pulse folding. The size of the chunks is based on the slew rate of the telescope and/or some fields out of the WU header. It could be the entire PoT time series at zero slew rate(as long as 131072 entries for FFT length=8 which is probably the most common case), or it could be much smaller. <pulse_max>40960 limits the PoT size for Pulse (and IIRC Triplet) finding. I'd like to understand what is really typical for chunk size (called AdvanceBy in the code) across the population of WU's and within WUs. I am interested in getting the folding to work fast in a situation where there is very limited fast local memory - lets say 1024 entries (4 kbytes). Thats probably enough for the WU I'm looking at with pulse_pot_length=256, but I have no idea what the population out there looks like. If it often goes above 1024, I'd have a problem as folding would quickly become dependent on a slower memory bus. I don't think there is much way to modify the folding to hit memory less. It would pretty much require a complete pass for every pass of folding until the folds were small enough to fit in cache. The telescope usually moves in one of 3 ways: * It is being used to observe a specific astronomical object, so is driven to counteract Earth's motion and gives slew rates near zero. * It is doing drift scanning, so gets its motion across the sky strictly from Earth's motion. This gives angle ranges in the 0.35 to 0.45 degree range, slew rates 0.0033 to 0.0042 degrees per second. * It is doing basketweave scanning (azimuth at 0 or 180, elevation moving up and down), giving high rates of motion with angle range typically 1.5 to 3 degrees. The first case gives Pulse PoT lengths from 32768 at FFT length 32 to 128 at FFT length 8192 (<pulse_fft_max>8192). Because the chirp/fft pairs at which Pulse finding is done are chosen based on the relationship of chirp rate to how much data is in one beam width, there's about twice as many pairs at FFT length 8192 as at FFT length 4096, and so on down to FFT length 32: FFTLen: :NumCfft 32: :77 64: :155 128: :311 256: :623 512: :1245 1024: :2489 2048: :4979 4096: :9957 8192: :19913 The same kind of pattern holds for the other scan modes, except that the minimum PoT length is in the 16 to 32 range, the maximum PoT length occurs at FFT length 8 and can be calculated by dividing 131072 by the number of beam widths in the WU (angle range/0.05). A minor additional question: What purpose does GetFixedPoT serve? Why does the PoT time series need to be a fixed length? To simplify the Gaussian fit? I think that's right. Gaussian fitting was in version 1 of classic S@H and basically hasn't changed since. It still takes a significant fraction of the crunch time (for the drift scan WUs for which it's used). Joe ID: 770505 ·

Eddie Cottongim Volunteer tester Send message Joined: 15 Sep 01 Posts: 26 Credit: 6,149,499 RAC: 6	Message 772794 - Posted: 24 Jun 2008, 7:35:34 UTC Thanks for your reply. That helps a lot. One interesting tidbit about the folding is the credit calculation: analysis_state.FLOP_counter+=(PulsePotLen0.1818181818182+400.0)PulsePotLen; I don't understand quite how the period seaches operate but I assume this line is a fair approximation of how many operations are done. At 40,960 that comes out to around 320 megaflops per chunk, quite intensive! ID: 772794 ·

©2024 University of California

SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.