Folding Algorithm Characteristics

Message boards : Number crunching : Folding Algorithm Characteristics
Message board moderation

To post messages, you must log in.

AuthorMessage
Profile Eddie Cottongim Project Donor
Volunteer tester

Send message
Joined: 15 Sep 01
Posts: 26
Credit: 6,149,499
RAC: 6
United States
Message 770166 - Posted: 19 Jun 2008, 0:18:10 UTC
Last modified: 19 Jun 2008, 0:21:07 UTC

Hi all,

I've been studying seti@home's pulse folding algorithms. I have a few questions (with a specific application in mind, but I think others may learn from this discussion as well).

First, I'll state what I believe to have happened up to this point so you can see if I'm way off:

Baseline Smoothing (once per WU), produces an 8 meg array consisting of ~1 million complex sampled points.

Dechirping (varies but thousands of times per WU), reads the previous array, outputs another 8 meg array of ~1 million complex sampled points

FFT: For each chirp rate, analyze at a variety of FFT sizes/bandwidths. FFT sizes from n=8 to 128k. FFT sizes actually used vary (8 to max at chirp rate 0, with less of the large sizes as chirprate goes to extremes). For each FFT size n, read the dechirped array, perform the FFT across the array (no overlap). Take the complex magnitude of the time values, giving us real data, and do a spike search.

Each of these passes at FFT size m will be stored as a "Power over time" (PoT) array. 4 meg, 1 million real values. This should be interpreted as a 2d array, with height being the number of frequency divisions (FFT size) and the width being the number of time divisions (number of samples(1048576)/FFT size).

Gaussian fitting is done across the table at a constant frequency. (ie pick a row and walk across the columns in this table)

Again, loop through frequencies.
Loop through time in chunks to perform the triplet search and pulse folding. The size of the chunks is based on the slew rate of the telescope and/or some fields out of the WU header. It could be the entire PoT time series at zero slew rate(as long as 131072 entries for FFT length=8 which is probably the most common case), or it could be much smaller.

Ok, here's where I could use some help:
I'd like to understand what is really typical for chunk size (called AdvanceBy in the code) across the population of WU's and within WUs.

I am interested in getting the folding to work fast in a situation where there is very limited fast local memory - lets say 1024 entries (4 kbytes). Thats probably enough for the WU I'm looking at with pulse_pot_length=256, but I have no idea what the population out there looks like. If it often goes above 1024, I'd have a problem as folding would quickly become dependent on a slower memory bus. I don't think there is much way to modify the folding to hit memory less. It would pretty much require a complete pass for every pass of folding until the folds were small enough to fit in cache.

A minor additional question: What purpose does GetFixedPoT serve? Why does the PoT time series need to be a fixed length? To simplify the Gaussian fit?


I hope this makes sense. Thanks for your help.
ID: 770166 · Report as offensive
Josef W. Segur
Volunteer developer
Volunteer tester

Send message
Joined: 30 Oct 99
Posts: 4504
Credit: 1,414,761
RAC: 0
United States
Message 770505 - Posted: 19 Jun 2008, 21:00:55 UTC - in response to Message 770166.  

Loop through time in chunks to perform the triplet search and pulse folding. The size of the chunks is based on the slew rate of the telescope and/or some fields out of the WU header. It could be the entire PoT time series at zero slew rate(as long as 131072 entries for FFT length=8 which is probably the most common case), or it could be much smaller.

<pulse_max>40960 limits the PoT size for Pulse (and IIRC Triplet) finding.
I'd like to understand what is really typical for chunk size (called AdvanceBy in the code) across the population of WU's and within WUs.

I am interested in getting the folding to work fast in a situation where there is very limited fast local memory - lets say 1024 entries (4 kbytes). Thats probably enough for the WU I'm looking at with pulse_pot_length=256, but I have no idea what the population out there looks like. If it often goes above 1024, I'd have a problem as folding would quickly become dependent on a slower memory bus. I don't think there is much way to modify the folding to hit memory less. It would pretty much require a complete pass for every pass of folding until the folds were small enough to fit in cache.

The telescope usually moves in one of 3 ways:
* It is being used to observe a specific astronomical object, so is driven to counteract Earth's motion and gives slew rates near zero.
* It is doing drift scanning, so gets its motion across the sky strictly from Earth's motion. This gives angle ranges in the 0.35 to 0.45 degree range, slew rates 0.0033 to 0.0042 degrees per second.
* It is doing basketweave scanning (azimuth at 0 or 180, elevation moving up and down), giving high rates of motion with angle range typically 1.5 to 3 degrees.

The first case gives Pulse PoT lengths from 32768 at FFT length 32 to 128 at FFT length 8192 (<pulse_fft_max>8192). Because the chirp/fft pairs at which Pulse finding is done are chosen based on the relationship of chirp rate to how much data is in one beam width, there's about twice as many pairs at FFT length 8192 as at FFT length 4096, and so on down to FFT length 32:

FFTLen: :NumCfft
32: :77
64: :155
128: :311
256: :623
512: :1245
1024: :2489
2048: :4979
4096: :9957
8192: :19913

The same kind of pattern holds for the other scan modes, except that the minimum PoT length is in the 16 to 32 range, the maximum PoT length occurs at FFT length 8 and can be calculated by dividing 131072 by the number of beam widths in the WU (angle range/0.05).
A minor additional question: What purpose does GetFixedPoT serve? Why does the PoT time series need to be a fixed length? To simplify the Gaussian fit?

I think that's right. Gaussian fitting was in version 1 of classic S@H and basically hasn't changed since. It still takes a significant fraction of the crunch time (for the drift scan WUs for which it's used).
                                                                Joe
ID: 770505 · Report as offensive
Profile Eddie Cottongim Project Donor
Volunteer tester

Send message
Joined: 15 Sep 01
Posts: 26
Credit: 6,149,499
RAC: 6
United States
Message 772794 - Posted: 24 Jun 2008, 7:35:34 UTC

Thanks for your reply. That helps a lot.

One interesting tidbit about the folding is the credit calculation:

analysis_state.FLOP_counter+=(PulsePotLen*0.1818181818182+400.0)*PulsePotLen;

I don't understand quite how the period seaches operate but I assume this line is a fair approximation of how many operations are done.

At 40,960 that comes out to around 320 megaflops per chunk, quite intensive!
ID: 772794 · Report as offensive

Message boards : Number crunching : Folding Algorithm Characteristics


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.