The Arecibo telescope picks up many types of RFI, each with its own time/frequency characteristics. One of the main sources is aviation radar, which consists of groups of powerful broadband pulses. We developed a system called radar blanking for immediately detecting this at the telescope, and replacing those periods with noise. So the back end is not involved.
But there are many other sources of RFI that we need to identify and remove in the back end. Most of the RFI sources are terrestrial, so their frequencies don't drift because of reference frame acceleration. The same is true of RFI from geostationary satellites such as communication satellites. However, there's some RFI from non-stationary satellites, and its frequency drifts.
No RFI removal algorithm is perfect. They all make errors:
We don't have a really good way of estimating either of these, but we can get a rough idea of the false negative rate based on circumstantial evidence. We know the statistics of noise. In particular, in noise the number of signals falls off exponentially with increasing score (i.e. power, in the case of spikes). However, many RFI sources don't have this property; e.g., they have lots of signals at high power.
Graphing the number of signals as a function of score bears this out; a log plot falls off less than linearly. If we make the same graph after RFI removal, the curve should be closer to linear. This is in fact what we see with our current algorithms.
The green line (signals after RFI removal) is linear over a wider range than the red line (all signals).
To estimate the false positive rate, we could inject synthetic ET signals of various types and powers, run them through the back end, and see which of them are rejected as RFI, and also which are detected by the scoring algorithm. We haven't done this yet, but we should.
Much RFI consists of narrow-band signals: TV and radio, cell phones, etc. These signals carry information, so they're modulated in some way, which spreads out their frequency range. They're present over most of SETI@home's 17-year history, and the telescope picks them up most of the time, regardless of where it's pointing.
This is called zone RFI because it occurs in particular frequency zones. Our challenge is to identify these zones. We do this by looking for frequency ranges that repeatedly have more than their share of signals. This is done separately for each signal type and FFT length. The algorithm is as follows:
The above discussion applies to spikes and Gaussians, for which frequency is the main degree of freedom. For pulses and triplets we use a similar algorithm but with period in the place of frequency. We identify a set of period ranges where we see lots of signals during many time periods, and treat these signals as RFI. Similarly, for autocorrs we use a similar algorithm but with delay in the place of frequency.
In each case the zone-finding algorithm generates a set of bitmap files, one for each signal type and FFT length, with one bit per zone bin. The RFI removal program (see below) maps these files into memory. For each signal, it computes its zone bin, then examines the corresponding bit to see whether the signal is in an RFI zone.
This is the basis of multi-beam RFI detection. The idea is to identify pairs of signals that
What do we mean by "about the same frequency and time"? We can think of a signal as occupying a rectangle in frequency/time space, corresponding to the FFT bin(s) in which it was detected. To account for roundoff and noise we expand this rectangle: 5X in frequency and 11X in time. If S is a signal, let rect(S) denote this expanded rectangle. Then we use the following criterion: two signals S and T are considered similar if the center of rect(S) is contained in rect(T), or vice versa.
How can we compute this efficiently? For this purpose I used a wonderful data structure called an R-tree. I'd like to say I already knew about R-trees, but in fact I discovered them using a web search.
An R-tree stores a set of points or rectangles in any number of dimensions. It then lets you do various types of queries: given another rectangle, you can find which rectangles it overlaps, or which it's contained in, and so on. All this can be done efficiently: the cost of adding or removing an item is log(N) where N is the size of the set, and the cost of doing a query is Mlog(N), where M is the number of items returned.
Many implementations of R-trees are available. I tried a few, and chose the one that is part of the Boost geometry library. It was both fast and general.
The multi-beam algorithm works as follows. We scan the signals of a given type in time order, maintaining a "sliding window" whose duration is limited to that of the longest possible rectangle. We maintain two R-trees:
Next: Finding persistent signals.
©2018 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.