The Arecibo telescope picks up many types of RFI, each with its own time/frequency characteristics. One of the main sources is aviation radar, which consists of groups of powerful broadband pulses. We developed a system called radar blanking for immediately detecting this at the telescope, and replacing those periods with noise. So the back end is not involved.

But there are many other sources of RFI that we need to identify and remove in the back end. Most of the RFI sources are terrestrial, so their frequencies don't drift because of reference frame acceleration. The same is true of RFI from geostationary satellites such as communication satellites. However, there's some RFI from non-stationary satellites, and its frequency drifts.

Much RFI consists of narrow-band signals: TV and radio, cell phones, etc. These signals carry information, so they're modulated in some way, which spreads out their frequency range. They're present over most of SETI@home's 17-year history, and the telescope picks them up most of the time, regardless of where it's pointing.

This is called **zone RFI** because it occurs in particular frequency zones.
Our challenge is to identify these zones.
We do this by looking for frequency ranges that repeatedly have
more than their share of signals.
This is done separately for each signal type and FFT length.
The algorithm is as follows:

- Divide the signals into time "windows". Each window is at least .1 days long, and is extended if needed to contain at least the average number of signals per .05 days.
- For each window, divide the signals into "zone bins" of 2X the FFT frequency resolution (this factor compensates for the imprecision of de-chirping). Count the number of signals in each bin.
- If a zone bin's count is large enough that the chance of it occurring in noise is less than 1e-7, mark that bin as "local RFI"; zero out those bins.
- Merge zone bins in groups of two, and repeat the previous step.
- Repeat this until the merging factor exceeds the FFT length.

The above discussion applies to spikes and Gaussians, for which frequency is the main degree of freedom. For pulses and triplets we use a similar algorithm but with period in the place of frequency. We identify a set of period ranges where we see lots of signals during many time periods, and treat these signals as RFI. Similarly, for autocorrs we use a similar algorithm but with delay in the place of frequency.

In each case the zone-finding algorithm generates a set of bitmap files, one for each signal type and FFT length, with one bit per zone bin. The RFI removal program (see below) maps these files into memory. For each signal, it computes its zone bin, then examines the corresponding bit to see whether the signal is in an RFI zone.

This is the basis of **multi-beam RFI detection**.
The idea is to identify pairs of signals that

- have the same type
- have about the same frequency and time
- were detected in different beams

What do we mean by "about the same frequency and time"?
We can think of a signal as occupying a rectangle in frequency/time space,
corresponding to the FFT bin(s) in which it was detected.
To account for roundoff and noise we expand this rectangle:
5X in frequency and 11X in time.
If S is a signal, let rect(S) denote this expanded rectangle.
Then we use the following criterion:
**two signals S and T are considered similar if the center
of rect(S) is contained in rect(T), or vice versa**.

The multi-beam algorithm works as follows. We scan the signals of a given type in time order, maintaining a "sliding window" whose duration is limited to that of the longest possible rectangle. We maintain two lists (stored as R-trees):

- The rectangles of the signals in the window.
- The centers of the rectangles of these signals.

See the code for RFI removal, including multi-beam.

This algorithm removes narrow-band RFI whose frequency varies over time. There are several articles about it in the Nebula blog, e.g.: this one.

No RFI removal algorithm is perfect. They all make errors:

- False positives: non-RFI signals (possibly ET) that are removed.
- False negatives: RFI signals that are not removed.

We don't have a really good way of estimating either of these, but we can get a rough idea of the false negative rate based on circumstantial evidence. We know the statistics of noise. In particular, in noise the number of signals falls off exponentially with increasing score (i.e. power, in the case of spikes). However, many RFI sources don't have this property; e.g., they have lots of signals at high power.

Graphing the number of signals as a function of score bears this out; a log plot falls off less than linearly. If we make the same graph after RFI removal, the curve should be closer to linear. This is in fact what we see with our current algorithms.

The green line (signals after RFI removal) is linear over a wider range than the red line (all signals).

To estimate the false positive rate, we could inject synthetic ET signals of various types and powers, run them through the back end, and see which of them are rejected as RFI, and also which are detected by the scoring algorithm. We haven't done this yet, but we should.

Next: Finding persistent signals.

©2018 University of California

SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.