RFI removal

The Arecibo telescope picks up many types of RFI, each with its own time/frequency characteristics. One of the main sources is aviation radar, which consists of groups of powerful broadband pulses. We developed a system called radar blanking that detects these pulses at the telescope as they arrive and replaces the affected periods with noise, so the back end doesn't have to deal with them.

But there are many other sources of RFI that we need to identify and remove in the back end. Most RFI sources are terrestrial, so their frequencies don't drift due to reference-frame acceleration. The same is true of RFI from geostationary satellites, such as communication satellites. However, some RFI comes from non-geostationary satellites, and its frequency does drift.

Zone RFI

Much RFI consists of narrow-band signals: TV and radio, cell phones, etc. These signals carry information, so they're modulated in some way, which spreads them over a range of frequencies. They've been present throughout most of SETI@home's 17-year history, and the telescope picks them up most of the time, regardless of where it's pointing.

This is called zone RFI because it occurs in particular frequency zones. Our challenge is to identify these zones. We do this by looking for frequency ranges that repeatedly have more than their share of signals. This is done separately for each signal type and FFT length. The algorithm is as follows:

• Divide the signals into time "windows". Each window is at least 0.1 days long, and is extended if needed to contain at least the average number of signals per 0.05 days.
• For each window, divide the signals into "zone bins" of 2X the FFT frequency resolution (this factor compensates for the imprecision of de-chirping). Count the number of signals in each bin.
• If a zone bin's count is large enough that the chance of it occurring in noise is less than 1e-7, mark that bin as "local RFI" and zero out its count.
• Merge adjacent zone bins in pairs, and repeat the previous step.
• Repeat this until the merging factor exceeds the FFT length.
The above steps mark some number of zone bins as local RFI in each time window. Then, for each zone bin, count the number of windows in which it's marked as local RFI, and look at the distribution of these counts. Find the value N such that 5% of the bins have a count of N or greater, and mark those bins as "global RFI". (For Gaussians, which are less prone to RFI, we use 2% rather than 5%.)
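The per-window and cross-window steps can be sketched in Python as follows. This is a minimal sketch, not the SETI@home back-end code, and it rests on several assumptions. In noise, the count in each zone bin is taken to follow a Poisson distribution whose mean is the window's average count per bin. Flagging a merged bin flags every base zone bin inside it. Signals have already been converted to integer zone-bin indices for one signal type and FFT length. The function names (make_windows, poisson_threshold, local_rfi, global_rfi) are hypothetical.

```python
import math
from collections import Counter

def make_windows(times, min_len=0.1, rate_span=0.05):
    """Split sorted signal times (days) into [start, end) index ranges.

    Each window is at least min_len days long and is extended until it holds
    at least the average number of signals per rate_span days. The final
    window may fall short if the data runs out.
    """
    if not times:
        return []
    span = times[-1] - times[0]
    min_count = len(times) * rate_span / span if span > 0 else len(times)
    windows, start = [], 0
    while start < len(times):
        end, t_end = start, times[start] + min_len
        while end < len(times) and (times[end] < t_end or end - start < min_count):
            end += 1
        windows.append((start, end))
        start = end
    return windows

def poisson_threshold(mean, p=1e-7):
    """Smallest count k with P(X >= k) < p for X ~ Poisson(mean) (assumed noise model)."""
    if mean <= 0:
        return 1
    k, cdf = 0, 0.0  # invariant: cdf == P(X <= k - 1)
    while 1.0 - cdf >= p:
        # add P(X == k), computed in log space to avoid underflow for large means
        cdf += math.exp(-mean + k * math.log(mean) - math.lgamma(k + 1))
        k += 1
    return k

def local_rfi(bin_counts, fft_len, p=1e-7):
    """Base zone bins flagged as local RFI in one window.

    bin_counts[i] is the number of signals in base zone bin i
    (width: 2X the FFT frequency resolution).
    """
    n_base = len(bin_counts)
    counts = list(bin_counts)
    flagged = set()
    factor = 1  # base bins per merged bin
    while factor <= fft_len:
        threshold = poisson_threshold(sum(counts) / len(counts), p)
        for i, c in enumerate(counts):
            if c >= threshold:
                # flag every base bin covered by merged bin i, then zero it out
                flagged.update(range(i * factor, min((i + 1) * factor, n_base)))
                counts[i] = 0
        # merge adjacent bins in pairs (an odd trailing bin stays on its own)
        counts = [sum(counts[j:j + 2]) for j in range(0, len(counts), 2)]
        factor *= 2
    return flagged

def global_rfi(flag_sets, n_bins, fraction=0.05):
    """Bins flagged in at least N windows, where `fraction` of the bins reach N.

    Use fraction=0.02 for Gaussians. Ties at N can push the marked share
    slightly above `fraction`.
    """
    per_bin = Counter(b for flags in flag_sets for b in flags)
    ranked = sorted((per_bin[b] for b in range(n_bins)), reverse=True)
    k = max(1, math.ceil(fraction * n_bins))
    n = max(1, ranked[k - 1])  # assumption: never mark bins with no local flags
    return {b for b in range(n_bins) if per_bin[b] >= n}
```

In this sketch, zeroing a flagged bin also removes its signals from the mean used at the next merge level, so a strong RFI zone doesn't raise the noise estimate for its neighbors.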

The above discussion applies to spikes and Gaussians, for which frequency is the main degree of freedom. For pulses and triplets we use a similar algorithm, with period in place of frequency: we identify a set of period ranges where we see lots of signals in many time windows, and treat the signals in those ranges as RFI. For autocorrs, the algorithm uses delay in place of frequency.

In each case the zone-finding algorithm generates a set of bitmap files, one for each signal type and FFT length, with one bit per zone bin. The RFI removal program (see below) maps these files into memory. For each signal, it computes its zone bin, then examines the corresponding bit to see whether the signal is in an RFI zone.
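The per-signal lookup can be sketched as below. The on-disk layout is an assumption: bits are packed least-significant-bit first within each byte, and bin 0 starts at a known origin value. The names build_bitmap and ZoneBitmap are hypothetical. Because an `mmap.mmap` object supports `len()` and integer indexing just like `bytes`, the same class could be given a memory-mapped bitmap file instead of an in-memory buffer.

```python
import math

def build_bitmap(rfi_bins, n_bins):
    """Pack a set of RFI zone-bin indices into bytes, one bit per bin (LSB first)."""
    data = bytearray((n_bins + 7) // 8)
    for b in rfi_bins:
        data[b >> 3] |= 1 << (b & 7)
    return bytes(data)

class ZoneBitmap:
    """Check whether a signal's frequency (or period, or delay) lies in an RFI zone.

    `data` may be bytes or an mmap.mmap of a bitmap file; both support
    len() and integer indexing.
    """
    def __init__(self, data, bin_width, origin=0.0):
        self.data = data
        self.bin_width = bin_width  # zone-bin width for this signal type / FFT length
        self.origin = origin        # value at the lower edge of bin 0 (assumed)

    def zone_bin(self, value):
        return math.floor((value - self.origin) / self.bin_width)

    def is_rfi(self, value):
        b = self.zone_bin(value)
        if b < 0 or b >= 8 * len(self.data):
            return False  # outside the bitmap: treat as not RFI
        return bool((self.data[b >> 3] >> (b & 7)) & 1)
```

For example, `ZoneBitmap(build_bitmap({3, 10}, 16), bin_width=2.0).is_rfi(7.0)` maps 7.0 to zone bin 3, which is set, so the call returns True.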

Multi-beam RFI

When a signal comes from a point in space, most of its energy goes into only one of the 7 beams of the Arecibo telescope. Terrestrial RFI signals, however, are often picked up in multiple beams simultaneously.

This is the basis of multi-beam RFI detection. The idea is to identify pairs of signals that

• have the same type
• have about the same frequency and time
• were detected in different beams
and to mark both signals as RFI.

What do we mean by "about the same frequency and time"? We can think of a signal as occupying a rectangle in frequency/time space, corresponding to the FFT bin(s) in which it was detected. To account for roundoff and noise we expand this rectangle: 5X in frequency and 11X in time. If S is a signal, let rect(S) denote this expanded rectangle. Then we use the following criterion: two signals S and T are considered similar if the center of rect(S) is contained in rect(T), or vice versa.
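The pairing test can be written out as follows. This sketch assumes that the expansion keeps the rectangle centered on the signal, so the center of rect(S) is simply the signal's own frequency and time. The Signal fields and function names are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Signal:
    kind: str    # signal type, e.g. "spike"
    beam: int    # which of the 7 beams detected it
    freq: float  # center frequency of its FFT bin(s)
    time: float  # center time of its FFT bin(s)
    df: float    # frequency extent of its FFT bin(s)
    dt: float    # time extent of its FFT bin(s)

def rect(s, f_scale=5.0, t_scale=11.0):
    """Expanded rectangle (f_lo, f_hi, t_lo, t_hi), kept centered on the signal."""
    hf, ht = f_scale * s.df / 2, t_scale * s.dt / 2
    return (s.freq - hf, s.freq + hf, s.time - ht, s.time + ht)

def contains(r, f, t):
    return r[0] <= f <= r[1] and r[2] <= t <= r[3]

def similar(s, t):
    """The center of rect(S) lies in rect(T), or vice versa."""
    return contains(rect(t), s.freq, s.time) or contains(rect(s), t.freq, t.time)

def multibeam_pair(s, t):
    """Same type, about the same frequency and time, different beams."""
    return s.kind == t.kind and s.beam != t.beam and similar(s, t)
```

Because the time expansion (11X) is larger than the frequency expansion (5X), two detections can be several FFT lengths apart in time but must be within a few bins in frequency to pair up.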

The multi-beam algorithm works as follows. We scan the signals of a given type in time order, maintaining a "sliding window" whose duration is limited to that of the longest possible rectangle. We maintain two lists (stored as R-trees):

• The rectangles of the signals in the window.
• The centers of the rectangles of these signals.
When a signal is added to the window, we check whether its center lies in the rectangle of an existing signal; if so, we mark both signals as RFI. Then we check whether its own rectangle contains any of the existing centers, and mark those pairs the same way.
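The sliding-window scan can be sketched as below. One substitution should be noted: this sketch uses a linear scan over a deque instead of the two R-trees described above. That is simpler, but each query costs time proportional to the window size, whereas an R-tree answers containment queries much faster when the window is large. The similarity test is passed in as a function, and the names (multibeam_rfi, Sig) are hypothetical. Signals are assumed to be of one type, sorted by time.

```python
from collections import deque, namedtuple
from typing import Callable, Sequence, Set

# Hypothetical minimal record; any object with .time and .beam works.
Sig = namedtuple("Sig", "time beam freq")

def multibeam_rfi(signals: Sequence, similar: Callable, max_duration: float) -> Set[int]:
    """Indices of signals that pair with a similar signal from a different beam.

    signals: one signal type, sorted by .time.
    similar: symmetric test, e.g. the expanded-rectangle criterion.
    max_duration: duration of the longest possible expanded rectangle.
    """
    window = deque()  # indices of signals still inside the sliding window
    flagged = set()
    for i, s in enumerate(signals):
        # drop signals too old to match s or anything after it
        while window and signals[window[0]].time < s.time - max_duration:
            window.popleft()
        # linear scan stands in for the rectangle and center R-tree queries
        for j in window:
            t = signals[j]
            if t.beam != s.beam and similar(s, t):
                flagged.update((i, j))
        window.append(i)
    return flagged
```

With a toy test such as `lambda a, b: abs(a.freq - b.freq) <= 1 and abs(a.time - b.time) <= 2` and `max_duration=2.0`, two signals in beams 0 and 1 at nearly the same frequency and one time unit apart are both flagged. Signals in the same beam are never paired.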

Low chirp rate RFI

Terrestrial RFI isn't chirped, because the sender is in the same reference frame as the receiver. Celestial signals, on the other hand, are very unlikely to have a near-zero chirp rate. So we remove signals that have long FFT lengths (i.e., precise frequency measurements) and very low chirp rates.
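As a filter, the rule above amounts to the sketch below. The text doesn't give the actual cutoffs, so `min_fft_len` and `max_abs_chirp` are deliberately left as required parameters rather than guessed defaults. The Detection type and function name are hypothetical.

```python
from typing import List, NamedTuple

class Detection(NamedTuple):
    fft_len: int       # FFT length the signal was found at
    chirp_rate: float  # de-chirping rate (Hz/s)

def remove_low_chirp(signals: List[Detection], min_fft_len: int,
                     max_abs_chirp: float) -> List[Detection]:
    """Drop signals with long FFTs (fine frequency resolution) and near-zero chirp."""
    return [s for s in signals
            if not (s.fft_len >= min_fft_len and abs(s.chirp_rate) <= max_abs_chirp)]
```

A signal has to meet both conditions to be removed. A near-zero chirp at a short FFT length is kept, because its frequency isn't measured precisely enough to count as evidence of RFI.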

Drifting RFI

This algorithm removes narrow-band RFI whose frequency varies over time. There are several articles about it in the Nebula blog.

Evaluating RFI removal

No RFI removal algorithm is perfect. They all make errors:

• False positives: non-RFI signals (possibly ET) that are removed.
• False negatives: RFI signals that are not removed.
As discussed earlier, we'd like the false negative rate to be low (say, one in 100,000 signals) so that the top-scoring multiplets aren't all obvious RFI. But we don't want to be so aggressive that we toss out ET signals, so the false positive rate should also be low, perhaps 1%.

We don't have a really good way of estimating either of these, but we can get a rough idea of the false negative rate based on circumstantial evidence. We know the statistics of noise. In particular, in noise the number of signals falls off exponentially with increasing score (i.e. power, in the case of spikes). However, many RFI sources don't have this property; e.g., they have lots of signals at high power.

Graphing the number of signals as a function of score bears this out: on a log scale, the curve falls off more slowly than a straight line. If we make the same graph after RFI removal, the curve should be closer to linear, and this is in fact what we see with our current algorithms.

The green line (signals after RFI removal) is linear over a wider range than the red line (all signals).
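The linearity check behind this graph can be sketched numerically. The approach is to bin the scores, take the log of each bin count, fit a straight line, and report the largest deviation from it. Pure exponential falloff gives a deviation near zero, and an excess of high-score signals gives a large one. The bin width, the minimum count per bin, and the function names are assumptions made for illustration.

```python
import math

def log_histogram(scores, bin_width=1.0, min_count=50):
    """(bin start, log count) for score bins holding at least min_count signals."""
    counts = {}
    for s in scores:
        b = math.floor(s / bin_width)
        counts[b] = counts.get(b, 0) + 1
    return [(b * bin_width, math.log(c))
            for b, c in sorted(counts.items()) if c >= min_count]

def line_fit(points):
    """Least-squares slope and intercept of log count versus score."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    slope = sum((x - mx) * (y - my) for x, y in points) / sxx
    return slope, my - slope * mx

def max_deviation(points):
    """Largest distance of a point from the fitted line; near 0 means exponential falloff."""
    slope, icept = line_fit(points)
    return max(abs(y - (slope * x + icept)) for x, y in points)
```

As a synthetic check, take 100,000 scores placed at the quantiles of a unit exponential distribution. These fit a line with slope close to -1 and deviate from it only slightly. Adding a few thousand scores spread evenly across the high end, as a stand-in for RFI, flattens the tail and makes the deviation much larger.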

To estimate the false positive rate, we could inject synthetic ET signals of various types and powers, run them through the back end, and see which of them are rejected as RFI, and also which are detected by the scoring algorithm. We haven't done this yet, but we should.

Next: Finding persistent signals.