Joined: 13 Feb 99
For the couple of months I've been working (somewhat sporadically) on improving the algorithm for finding multiplets: groups of signals at roughly the same sky position and frequency, but possibly spread out over time. This is really the essence of SETI@home: we've covered much of the sky several times, and we have all the results databased, so we are able look for "persistent" signals (beacons that are on for years) with great sensitivity.
For a given sky position and frequency window, we may have hundreds or thousands of signals; that's because we're listening to noise. The challence is to find subsets of these groups of signals (perhaps 5 or 10) that are most likely not to be noise. This involves a number of criteria: how many signals there are, how closely packed they are in frequency and sky position, how powerful they are, and how consistent are their parameters such as chirp rate, and for some signal types, period or delay.
Our current multiplet-finding algorithm has two phases:
1) "Observation exclusion": we only want to count one signal per telescope pointing; otherwise we'd essentially include the same signal multiple times and inflate the score of the multiplet.
2) "Pruning": given a set of signals, remove those that aren't consistent in terms of chirp rate and frequency. For pulses, triplets and autocorrs, additionally remove those that aren't consistent in terms of period and delay.
In my last blog entry I described some formulas we were trying for observation exclusion. It turns out these didn't work. I found that the best approach is simple: don't use two signals separated by less than 10 minutes. I made this change, and the current online results reflect this.
When I looked that the results - which consisted almost entirely of 2-signal multiplets - I realized that the multiplet-finding algorithm is fundamentally flawed, and I suspect that the multiplets it finds are far from optimal.
There are two basic problems:
1) Observation exclusion currently keeps the highest-power signal in each 10-minute period. This will generally yield a set of signals that aren't close in position or frequency, and that aren't consistent in chirp rate. We're doing things in the wrong order: we need to decide on the "center of gravity" of the multiplet first, and then do exclusion.
2) Pruning is being done in a dumb way: given a set of signals, we find the two that are closest in chirp, and discard those outside a threshold from these two. This may throw away signals that would yield a much better multiplet.
I scratched my head for quite a while, and eventually came up with an approach that I think will work well. The gist of it:
- Do pruning first, THEN observation exclusion.
- Do both of them in a better way, i.e. one that produces high-scoring multiplets.
Let's start with chirp pruning. We're going to discard signals outside some chirp band. What band should we use? The one with the most and best signals; i.e. the one for which the sum of the powers of signals in that chirp band is greatest. This can be computed efficienly by making a histogram of chirp rates, weighted by power.
In the case of chirp pruning we do this separately for each .1-day interval. Period and delay pruning are similar except we use the entire time span.
Now we have a consistent set. Let's do observation exclusion in a way that favors keeping "close" signals.
- Compute the medians of barycentric frequency, RA, and dec.
- Give each signal a score that reflects both its power and its deviation from these medians.
- Do exclusion based on this score.
That's it. Finding the highest-score multiplet is NP-hard (I think) but this collection of heuristics should get fairly close.
Performance: this may be somewhat slower than the current algorithm. Probably not an issue. If it is, we can change the outer logic to look at signal sets per half-overlapping frequency band, rather than per signal.
Joined: 25 Nov 01
Just a wild idea for a check for how the processing handles the inevitable interference:
Do you have the luxury of enough data and enough processing to do multiple runs where you work with only:
1. Data acquired during known (guaranteed?) non-RFI periods (eg overnight local time with no aircraft and no radar?);
2. ... vs data acquired in the direction of the same stars during known RFI periods...
See new freedom: Mageia Linux
Take a look for yourself: Linux Format
The Future is what We all make IT (GPLv3)
Joined: 29 Jun 99
I just crunch and donate a little bit, you're doing heavy lifting, I am in awe and have deep gratitude.
©2020 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.