## Finding persistent signals

Now we come to the part where we actually find ET! Although SETI@home is able to detect short, one-time ET transmissions, it's best at detecting beacons that remain on for many years. So we search through the entire set of non-RFI signals, looking for groups of signals that are close in sky position and frequency, but possibly spread out over time.

### Multiplets

These groups of signals are called "multiplets". We look for two kinds of multiplets, which differ in the size of the frequency window:

• Barycentric: for this type of multiplet, we assume that ET is sending a directional beacon, and that they modulate its frequency to cancel out chirp due to the transmitter's acceleration in that direction. In other words, a barycentric signal would be received at a fixed frequency in an inertial reference frame. ET might transmit a signal like this if they were trying to communicate with us, or if they were transmitting to their own deep-space probe. Once we compensate for our own acceleration, such a signal would be detected at more or less a fixed frequency, so we can use a small frequency window when looking for barycentric multiplets.
• Non-barycentric: for these, we assume that the transmitted frequency is not adjusted for transmitter acceleration, either because it's omnidirectional or because it's not intended for receivers in other reference frames. We need to use a wider frequency window, corresponding to the likely limits of the transmitter's radial velocity.

For each multiplet we compute the probability that it would have occurred in noise. The lower the probability, the more interesting the multiplet. The multiplet's "score" is the log of the probability (this gives better numerical resolution for extremely small probabilities). When we refer to "high-scoring" multiplets we actually mean those with large negative values.
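
To see why working with the log helps, here is a small C++ sketch. Purely for illustration, it assumes that a probability is a product of independent factors; the function names are hypothetical, and this is not how SETI@home actually computes multiplet probabilities.

```cpp
#include <cmath>
#include <vector>

// Illustration only: pretends a multiplet's noise probability is the product
// of independent per-factor probabilities (a hypothetical model).
double probability_direct(const std::vector<double>& factors) {
    double p = 1.0;
    for (double f : factors) p *= f;  // underflows to 0 for very small products
    return p;
}

// The same quantity in log space: a sum of logs stays finite and well resolved.
double log_probability(const std::vector<double>& factors) {
    double score = 0.0;
    for (double f : factors) score += std::log10(f);
    return score;  // more negative = less likely in noise = more interesting
}
```

With 400 factors of 10^-3, `probability_direct()` underflows to exactly 0 in double precision, so two such multiplets could not be ranked, while `log_probability()` returns about -1200.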

Originally, we found multiplets separately for each signal type. However, doing this throws away sensitivity, because an ET signal could manifest as both spikes and gaussians, depending on the slew rate of the telescope. The same is true of pulses and triplets. So, starting in March 2018, we look for mixed-type multiplets: spikes/gaussians, and pulses/triplets. Autocorrs are handled separately.

### Pixels

SETI@home uses a system called Healpix that divides the celestial sphere into small, equal-area, roughly rectangular regions called "pixels". Healpix supports different resolutions.

We use a resolution at which each pixel is about as large as a telescope beam. This resolution divides the sphere into $2^{26}$ (about 67 million) pixels, of which about 16 million are visible from the Arecibo telescope.
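
As a quick arithmetic check of the pixel size, the following C++ sketch (the function name is hypothetical) computes the mean area of one pixel when the whole sphere, 4π steradians, is split into $2^{26}$ equal-area pixels.

```cpp
#include <cmath>

// Mean area, in square arcminutes, of one pixel when the whole sphere
// (4*pi steradians) is divided into 2^log2_pixels equal-area pixels.
double mean_pixel_area_arcmin2(int log2_pixels) {
    const double pi = std::acos(-1.0);
    const double arcmin_per_rad = 180.0 / pi * 60.0;
    const double pixels = std::ldexp(1.0, log2_pixels);  // 2^log2_pixels
    return 4.0 * pi / pixels * arcmin_per_rad * arcmin_per_rad;
}
```

For `log2_pixels = 26` this gives about 2.2 square arcminutes, i.e. a pixel roughly 1.5 arcminutes on a side; the 16 million visible pixels are about a quarter of the sphere.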

After finding the multiplets in a given pixel, we compute a score for the pixel itself. This score is based on the number of multiplets and their scores, and on the presence of stars, and especially Sun-like stars, in the pixel. Pixel scoring lets us detect ET transmissions that manifest as several multiplets; e.g. a signal with several frequencies.

### Algorithms

I'll now describe the algorithms for finding the multiplets in a given pixel, and for scoring the pixel.

#### Assembling signals

For each signal type, we make an in-memory list of the signals within a beam width of the pixel. Because the area of a pixel is a bit less than the beam size of the telescope, we include all the signals in a disc that is centered on this pixel and also covers parts of the 8 adjacent pixels. These lists contain from 100 to 100K signals.
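
A minimal C++ sketch of the disc selection, assuming each signal carries a right ascension and declination in degrees. The struct and function names are hypothetical, and this linear scan is only an illustration, not `candidate_set_t::assemble()`; a real implementation would more likely use a spatial index or a Healpix disc query.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

struct SkySignal {      // hypothetical record: sky position in degrees
    double ra_deg;
    double dec_deg;
};

// Great-circle separation in degrees (haversine formula).
double angular_sep_deg(double ra1, double dec1, double ra2, double dec2) {
    const double d2r = std::acos(-1.0) / 180.0;
    const double sdec = std::sin((dec2 - dec1) * d2r / 2.0);
    const double sra = std::sin((ra2 - ra1) * d2r / 2.0);
    double h = sdec * sdec + std::cos(dec1 * d2r) * std::cos(dec2 * d2r) * sra * sra;
    h = std::min(1.0, h);  // guard against rounding just above 1
    return 2.0 * std::asin(std::sqrt(h)) / d2r;
}

// Collect the signals inside a disc of the given radius around a pixel center.
std::vector<SkySignal> signals_in_disc(const std::vector<SkySignal>& all,
                                       double center_ra, double center_dec,
                                       double radius_deg) {
    std::vector<SkySignal> out;
    for (const SkySignal& s : all)
        if (angular_sep_deg(center_ra, center_dec, s.ra_deg, s.dec_deg) <= radius_deg)
            out.push_back(s);
    return out;
}
```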

#### Scanning signals

We sort the list of signals by their barycentric frequency. We then do the following steps twice, with different frequency range parameters, to find the barycentric and non-barycentric multiplets: the frequency range is 10 Hz for barycentric multiplets and 50 kHz for non-barycentric ones.

We scan the frequency-sorted list in order, maintaining a "sliding window" of signals whose frequency span is limited by the frequency range parameter. When appending a new signal would make the window span more than this range, and the window holds more than one signal, we try to form a multiplet from the signals in the window. If this succeeds, we trim signals from the window up to and including the highest-frequency signal in the multiplet, then try to form a multiplet from the remaining signals, and so on. This ensures that multiplets are disjoint, i.e. no two multiplets contain the same signal.
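
The scan can be sketched in C++ as below. This is a hedged reconstruction from the description above, not the code of `signal_disc_t::find_multiplets()`: `try_form` is a hypothetical stand-in for the chirp pruning, observation selection and scoring that decide whether a multiplet exists, and two details are assumptions: dropping low-frequency signals that no longer fit in the window, and making one final attempt on the last window.

```cpp
#include <cstddef>
#include <deque>
#include <functional>
#include <vector>

struct Signal {
    double freq_hz;  // barycentric frequency; other fields omitted
};
using Multiplet = std::vector<Signal>;

// Hypothetical callback standing in for the real multiplet former: on success
// it fills `out` and sets `last` to the window index of the multiplet's
// highest-frequency signal.
using TryForm = std::function<bool(const std::deque<Signal>& window,
                                   Multiplet& out, std::size_t& last)>;

// `sorted` must be ordered by increasing freq_hz.
std::vector<Multiplet> scan_signals(const std::vector<Signal>& sorted,
                                    double range_hz, const TryForm& try_form) {
    std::vector<Multiplet> found;
    std::deque<Signal> window;

    // Form multiplets from the window until that fails or one signal is left.
    auto drain = [&]() {
        Multiplet m;
        std::size_t last = 0;
        while (window.size() > 1 && try_form(window, m, last)) {
            found.push_back(m);
            // Trim up to and including the multiplet's highest-frequency
            // signal, so no signal can appear in two multiplets.
            window.erase(window.begin(),
                         window.begin() + static_cast<std::ptrdiff_t>(last + 1));
            m.clear();
        }
    };

    for (const Signal& s : sorted) {
        if (!window.empty() && s.freq_hz - window.front().freq_hz > range_hz) {
            drain();
            // Assumption: signals too far below `s` are dropped before appending.
            while (!window.empty() && s.freq_hz - window.front().freq_hz > range_hz)
                window.pop_front();
        }
        window.push_back(s);
    }
    drain();  // assumption: the final window gets one last attempt
    return found;
}
```

With a toy `try_form` that accepts the whole window, signals at 0, 1, 2, 100, 101 and 300 Hz and a 10 Hz range yield two multiplets, {0, 1, 2} and {100, 101}; the lone 300 Hz signal forms none.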

#### Chirp pruning

If a group of signals comes from the same ET source, then over short periods of time (say, an hour):

• They'd have roughly the same chirp rate.
• Their frequencies, when adjusted by this chirp rate, would be similar. For example, if the chirp rate is 1 Hz/sec, a signal one minute later should have a frequency about 60 Hz greater.

"Chirp pruning" takes a set of signals and returns a subset that is consistent in these ways, choosing the subset that is likely to produce the highest-scoring multiplet. It works as follows:

• Divide the set of signals into 0.1-day segments.
• For each segment, find the 20 Hz/sec-wide chirp-rate range for which the sum of the powers of the signals in that range is greatest; discard the signals outside this range.
• Find the median chirp rate of the remaining signals.
• Adjust the frequencies of the signals by this chirp rate.
• Find the 10 kHz-wide frequency range for which the sum of the powers of the signals in that range is greatest; discard the signals outside this range.
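
The steps above can be sketched in C++ as follows. This is a hedged illustration, not `signal_disc_t::prune_chirp()`. The `ChirpSignal` struct and the function names are hypothetical, and several details are assumptions: all steps run separately within each 0.1-day segment, frequencies are referred back to the segment's earliest signal time t0 (so the adjusted frequency is f - c·(t - t0) for median chirp rate c), and the median of an even-sized set is taken as the upper middle value.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <map>
#include <vector>

struct ChirpSignal {      // hypothetical record
    double time_s;        // observation time, seconds
    double freq_hz;       // barycentric frequency
    double chirp_hz_s;    // chirp rate
    double power;
};

// Keep the signals whose key lies in the width-`width` interval with the
// greatest total power (two-pointer scan over key-sorted signals).
template <typename Key>
std::vector<ChirpSignal> best_window(std::vector<ChirpSignal> v, double width, Key key) {
    if (v.empty()) return v;
    std::sort(v.begin(), v.end(),
              [&](const ChirpSignal& a, const ChirpSignal& b) { return key(a) < key(b); });
    double sum = 0.0, best = -1.0;
    std::size_t lo = 0, best_lo = 0, best_hi = 0;
    for (std::size_t hi = 0; hi < v.size(); ++hi) {
        sum += v[hi].power;
        while (key(v[hi]) - key(v[lo]) > width) sum -= v[lo++].power;
        if (sum > best) { best = sum; best_lo = lo; best_hi = hi; }
    }
    return std::vector<ChirpSignal>(v.begin() + static_cast<std::ptrdiff_t>(best_lo),
                                    v.begin() + static_cast<std::ptrdiff_t>(best_hi + 1));
}

// Returns the surviving signals, carrying chirp-adjusted frequencies.
std::vector<ChirpSignal> prune_chirp_sketch(const std::vector<ChirpSignal>& sigs) {
    const double segment_s = 0.1 * 86400.0;  // 0.1 day
    std::map<long long, std::vector<ChirpSignal>> segments;
    for (const ChirpSignal& s : sigs)
        segments[static_cast<long long>(std::floor(s.time_s / segment_s))].push_back(s);

    std::vector<ChirpSignal> kept;
    for (auto& entry : segments) {
        // 1. Keep the 20 Hz/s chirp-rate range with the most power.
        std::vector<ChirpSignal> seg = best_window(entry.second, 20.0,
            [](const ChirpSignal& s) { return s.chirp_hz_s; });
        // 2. Median chirp rate of the survivors.
        std::vector<double> chirps;
        for (const ChirpSignal& s : seg) chirps.push_back(s.chirp_hz_s);
        const auto mid = chirps.begin() + static_cast<std::ptrdiff_t>(chirps.size() / 2);
        std::nth_element(chirps.begin(), mid, chirps.end());
        const double median = *mid;
        // 3. Refer each frequency back to the segment's earliest time.
        double t0 = seg.front().time_s;
        for (const ChirpSignal& s : seg) t0 = std::min(t0, s.time_s);
        for (ChirpSignal& s : seg) s.freq_hz -= median * (s.time_s - t0);
        // 4. Keep the 10 kHz frequency range with the most power.
        seg = best_window(seg, 10e3, [](const ChirpSignal& s) { return s.freq_hz; });
        kept.insert(kept.end(), seg.begin(), seg.end());
    }
    return kept;
}
```

`best_window` uses a two-pointer scan, so each range search costs one sort plus a single linear pass over the sorted signals.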

#### Observation selection

An "observation" is a contiguous time interval during which a beam sees a given pixel. An ET signal could produce many signals during a single observation. For example, a narrow-band signal may be detected as a spike in several consecutive FFTs. But these spikes are all really the same signal, so to calculate multiplet scores properly we must include only one of them in the multiplet.

In practice, during scoring we don't have the detailed pointing information we'd need to identify observations. Instead, we consider signals separated by less than 10 minutes to be in the same observation.

If there are multiple signals in an observation, we want to keep the one that will contribute most to a high multiplet score. The score of a multiplet (see below) reflects its compactness in frequency and sky position and the power of its component signals. So we compute the median frequency and sky position of the signals in the multiplet, and compute a signal "score" based on proximity to these medians and on signal power. Then, from each observation, we keep the signal with the highest score and discard the others.
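
One way to implement this, sketched in C++ under stated assumptions. The `ObsSignal` struct, the weights `w_freq` and `w_pos`, and the linear form of the signal score are hypothetical; the text says only that the score reflects proximity to the medians and signal power. Signals are chained into one observation when consecutive signals are less than 10 minutes (600 s) apart, and RA wraparound at 0/360 degrees is ignored. This is not `signal_disc_t::observation_selection()` itself.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

struct ObsSignal {        // hypothetical record
    double time_s;
    double freq_hz;
    double ra_deg;
    double dec_deg;
    double power;
};

double median_of(std::vector<double> v) {  // upper median; v must be non-empty
    const auto mid = v.begin() + static_cast<std::ptrdiff_t>(v.size() / 2);
    std::nth_element(v.begin(), mid, v.end());
    return *mid;
}

// Keep one signal per observation, chaining time-sorted signals whose gaps
// are under `max_gap_s`. The signal score is a hypothetical linear mix:
// power minus weighted distances from the median frequency and position.
std::vector<ObsSignal> select_observations(std::vector<ObsSignal> sigs, double max_gap_s,
                                           double w_freq, double w_pos) {
    if (sigs.empty()) return sigs;
    std::vector<double> f, ra, dec;
    for (const ObsSignal& s : sigs) {
        f.push_back(s.freq_hz);
        ra.push_back(s.ra_deg);
        dec.push_back(s.dec_deg);
    }
    const double mf = median_of(f), mra = median_of(ra), mdec = median_of(dec);
    const double d2r = std::acos(-1.0) / 180.0;
    auto score = [&](const ObsSignal& s) {
        // Small-angle sky offset in degrees (adequate within one beam).
        const double dx = (s.ra_deg - mra) * std::cos(mdec * d2r);
        const double dy = s.dec_deg - mdec;
        return s.power - w_freq * std::fabs(s.freq_hz - mf) - w_pos * std::hypot(dx, dy);
    };

    std::sort(sigs.begin(), sigs.end(),
              [](const ObsSignal& a, const ObsSignal& b) { return a.time_s < b.time_s; });
    std::vector<ObsSignal> kept{sigs.front()};
    for (std::size_t i = 1; i < sigs.size(); ++i) {
        if (sigs[i].time_s - sigs[i - 1].time_s < max_gap_s) {
            // Same observation: keep whichever signal scores higher.
            if (score(sigs[i]) > score(kept.back())) kept.back() = sigs[i];
        } else {
            kept.push_back(sigs[i]);  // the gap starts a new observation
        }
    }
    return kept;
}
```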

#### Scoring multiplets

We then compute a score for each multiplet. This score is based on several factors, including:

• The scores of the component signals.
• The "tightness" of the signals, as measured by the standard deviation of sky position, chirp rate, and barycentric frequency.
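
The tightness measures can be sketched in C++ as below. The `Tightness` struct and function names are hypothetical, the use of population (rather than sample) standard deviation and the way the two sky axes are combined are assumptions, and how `signal_disc_t::score_multiplet()` turns these spreads into a probability is not reproduced here.

```cpp
#include <cmath>
#include <vector>

// Population standard deviation of a set of values.
double std_dev(const std::vector<double>& x) {
    if (x.empty()) return 0.0;
    double mean = 0.0;
    for (double v : x) mean += v;
    mean /= static_cast<double>(x.size());
    double var = 0.0;
    for (double v : x) var += (v - mean) * (v - mean);
    return std::sqrt(var / static_cast<double>(x.size()));
}

struct Tightness {         // hypothetical summary of a multiplet's spread
    double freq_hz;
    double chirp_hz_s;
    double sky_deg;
};

// Spread of a multiplet's signals in frequency, chirp rate and sky position.
// Sky spread combines the RA and Dec spreads (small-angle approximation,
// ignoring RA wraparound).
Tightness tightness(const std::vector<double>& freq, const std::vector<double>& chirp,
                    const std::vector<double>& ra, const std::vector<double>& dec) {
    const double d2r = std::acos(-1.0) / 180.0;
    double mean_dec = 0.0;
    for (double d : dec) mean_dec += d;
    if (!dec.empty()) mean_dec /= static_cast<double>(dec.size());
    // Scale RA by cos(mean Dec) so both sky axes are in degrees on the sky.
    std::vector<double> x;
    for (double r : ra) x.push_back(r * std::cos(mean_dec * d2r));
    return {std_dev(freq), std_dev(chirp), std::hypot(std_dev(x), std_dev(dec))};
}
```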

#### Scoring pixels

Once we've found all the multiplets (barycentric and non-barycentric) for a pixel, we compute the score of the pixel. This score is based on several factors, including:

• The number and scores of the multiplets.
• Stars: the presence of stars in the pixel increases its score, and the presence of Sun-like stars increases it more.

#### Normalizing multiplet scores

The distribution of multiplet scores varies between signal types. To allow meaningful ranking of multiplets of different types, we "normalize" the scores. This is done after large numbers of pixels have been processed.

For each signal type, we find the set of pixels having a multiplet of that type, and for each such pixel we find its top-scoring multiplet. We then compute the median, over pixels, of these top scores. The differences between these medians determine normalization factors that are added to multiplet scores.
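
A hedged C++ sketch of this normalization. Several details are assumptions: the nested-map input layout, aligning every type's median to that of a chosen reference type (the text says only that the differences between medians determine the factors), and averaging the two middle values for an even count. Because scores are log probabilities, the "top" multiplet is taken to be the one with the lowest score. This is not the code in multiplet_stats.php.

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Median of a non-empty vector (mean of the two middle values if even).
double median(std::vector<double> v) {
    std::sort(v.begin(), v.end());
    const std::size_t n = v.size();
    return n % 2 ? v[n / 2] : 0.5 * (v[n / 2 - 1] + v[n / 2]);
}

using ScoresByPixel = std::map<long long, std::vector<double>>;  // pixel -> multiplet scores

// Returns an additive offset per type that moves the type's median top score
// onto the reference type's median. Throws std::out_of_range if ref_type has
// no multiplets.
std::map<std::string, double> normalization_offsets(
    const std::map<std::string, ScoresByPixel>& scores, const std::string& ref_type) {
    std::map<std::string, double> medians;
    for (const auto& type_entry : scores) {
        std::vector<double> tops;  // best (lowest) score in each pixel
        for (const auto& pixel_entry : type_entry.second) {
            const std::vector<double>& s = pixel_entry.second;
            if (!s.empty()) tops.push_back(*std::min_element(s.begin(), s.end()));
        }
        if (!tops.empty()) medians[type_entry.first] = median(tops);
    }
    std::map<std::string, double> offsets;
    for (const auto& m : medians) offsets[m.first] = medians.at(ref_type) - m.second;
    return offsets;
}
```

For example, if the per-pixel top scores have a median of -20 for spike/gaussian multiplets and -6 for pulse/triplet multiplets, the pulse/triplet scores would be shifted by -14 so the two types' typical top scores line up.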

### Code

If you know C++ pretty well, you might be interested in looking at the source code for the above algorithms. In fact, I encourage you to read the code carefully and let me know if you find any bugs or problems.

Here are some starting points; in each case, search the source for the named function or file:

• Assembling signals: candidate_set_t::assemble()
• Scanning signals: signal_disc_t::find_multiplets()
• Chirp pruning: signal_disc_t::prune_chirp()
• Observation selection: signal_disc_t::observation_selection()
• Multiplet scoring: signal_disc_t::score_multiplet()
• Pixel scoring: meta_candidate_score(). Note: in the code, "meta-candidate" is synonymous with "pixel".
• Multiplet score normalization: multiplet_stats.php
