RFI removal

The Arecibo telescope picks up many types of RFI, each with its own time/frequency characteristics. One of the main sources is aviation radar, which consists of groups of powerful broadband pulses. We developed a system called radar blanking for immediately detecting this at the telescope, and replacing those periods with noise. So the back end is not involved.

But there are many other sources of RFI that we need to identify and remove in the back end. Most of the RFI sources are terrestrial, so their frequencies don't drift because of reference frame acceleration. The same is true of RFI from geostationary satellites such as communication satellites. However, there's some RFI from non-stationary satellites, and its frequency drifts.

Evaluating RFI removal

No RFI removal algorithm is perfect. They all make errors:

As discussed earlier, we'd like the false negative rate to be low - say one in 100,000 signals - so that the top-scoring multiplets aren't all obvious RFI. But we don't want to be so aggressive that we toss out ET signals, so the false positive rate should also be low - perhaps 1%.

We don't have a really good way of estimating either of these, but we can get a rough idea of the false negative rate based on circumstantial evidence. We know the statistics of noise. In particular, in noise the number of signals falls off exponentially with increasing score (i.e. power, in the case of spikes). However, many RFI sources don't have this property; e.g., they have lots of signals at high power.

Graphing the number of signals as a function of score bears this out; a log plot falls off less than linearly. If we make the same graph after RFI removal, the curve should be closer to linear. This is in fact what we see with our current algorithms.

The green line (signals after RFI removal) is linear over a wider range than the red line (all signals).

To estimate the false positive rate, we could inject synthetic ET signals of various types and powers, run them through the back end, and see which of them are rejected as RFI, and also which are detected by the scoring algorithm. We haven't done this yet, but we should.

Zone RFI

Much RFI consists of narrow-band signals: TV and radio, cell phones, etc. These signals carry information, so they're modulated in some way, which spreads out their frequency range. They're present over most of SETI@home's 17-year history, and the telescope picks them up most of the time, regardless of where it's pointing.

This is called zone RFI because it occurs in particular frequency zones. Our challenge is to identify these zones. We do this by looking for frequency ranges that repeatedly have more than their share of signals. This is done separately for each signal type and FFT length. The algorithm is as follows:

The above steps mark some number of zone bins as local RFI in each time window. Then, for each zone bin, count the number of windows in which it's marked as local RFI. Look at the distribution of these counts. Find the value N for which 5% of the bins are N or greater. Mark those bins as "global RFI". (For Gaussians, which are less prone to RFI, we use 2% rather than 5%).

The above discussion applies to spikes and Gaussians, for which frequency is the main degree of freedom. For pulses and triplets we use a similar algorithm but with period in the place of frequency. We identify a set of period ranges where we see lots of signals during many time periods, and treat these signals as RFI. Similarly, for autocorrs we use a similar algorithm but with delay in the place of frequency.

In each case the zone-finding algorithm generates a set of bitmap files, one for each signal type and FFT length, with one bit per zone bin. The RFI removal program (see below) maps these files into memory. For each signal, it computes its zone bin, then examines the corresponding bit to see whether the signal is in an RFI zone.

See the code for this.

Multi-beam RFI

When a signal is coming from a point in space, most of its energy goes into only one of the 7 beams in the Arecibo telescope. However, terrestrial RFI signals often are picked up in multiple beams simultaneously.

This is the basis of multi-beam RFI detection. The idea is to identify pairs of signals that

and to mark both signals as RFI.

What do we mean by "about the same frequency and time"? We can think of a signal as occupying a rectangle in frequency/time space, corresponding to the FFT bin(s) in which it was detected. To account for roundoff and noise we expand this rectangle: 5X in frequency and 11X in time. If S is a signal, let rect(S) denote this expanded rectangle. Then we use the following criterion: two signals S and T are considered similar if the center of rect(S) is contained in rect(T), or vice versa.

How can we compute this efficiently? For this purpose I used a wonderful data structure called an R-tree. I'd like to say I already knew about R-trees, but in fact I discovered them using a web search.

An R-tree stores a set of points or rectangles in any number of dimensions. It then lets you do various types of queries: given another rectangle, you can find which rectangles it overlaps, or which it's contained in, and so on. All this can be done efficiently: the cost of adding or removing an item is log(N) where N is the size of the set, and the cost of doing a query is Mlog(N), where M is the number of items returned.

Many implementations of R-trees are available. I tried a few, and chose the one that is part of the Boost geometry library. It was both fast and general.

The multi-beam algorithm works as follows. We scan the signals of a given type in time order, maintaining a "sliding window" whose duration is limited to that of the longest possible rectangle. We maintain two R-trees:

When adding a signal to the window, we check whether its center is contained in an existing rectangle, in which case we mark both signals as RFI. Then we check whether its rectangle contains any of the existing centers, and do the same thing.

See the code for RFI removal, including multi-beam.

Low chirp rate RFI

Terrestrial RFI isn't chirped, because the sender's reference frame is the same as the receiver. Celestial signals are very unlikely to have near-zero chirp rate. So we remove signals with longer FFT lengths (i.e. precise frequency measurement) and very low chirp rates.

Next: Finding persistent signals.




 
©2018 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.