Finding persistent signals

Now we come to the part where we actually find ET! Although SETI@home is able to detect short, one-time ET transmissions, it's best at detecting beacons that remain on for many years. So we search through the entire set of non-RFI signals, looking for groups of signals that are close in sky position and frequency, but possibly spread out over time.

These groups of signals are called "multiplets". We look for two kinds of multiplets, barycentric and non-barycentric, which differ in the size of the frequency window.

For each multiplet we compute the probability that it would have occurred in noise. The lower the probability, the more interesting the multiplet. The multiplet's "score" is the log of the probability (this gives better numerical resolution for extremely small probabilities). When we refer to "high-scoring" multiplets we actually mean those with large negative values.
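To see why a log score helps, here is a minimal sketch. It assumes, purely for illustration, that a probability is built up as a product of independent factors and that the natural log is used; the text above does not specify either. The function name `log_prob_score` is hypothetical.

```python
import math

def log_prob_score(factors):
    """Sum of natural logs of the factors, i.e. the log of their product.

    Illustrative only: the independent-factor model and the choice of
    natural log are assumptions, not taken from the Nebula code.
    """
    return sum(math.log(p) for p in factors)

# Ten factors of 1e-60 each: the true product is 1e-600, which is far
# below the smallest positive double, so the direct product underflows.
factors = [1e-60] * 10
naive = math.prod(factors)        # underflows to 0.0
score = log_prob_score(factors)   # finite, large negative number

# A second, even less likely case still gets a distinct score.
rarer = log_prob_score([1e-60] * 11)
```

Working in log space keeps two extremely unlikely multiplets distinguishable, where their raw probabilities would both round to zero.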

Currently, we find multiplets separately for each signal type; a given multiplet has only one type of signal.


SETI@home uses a system called Healpix that divides the celestial sphere into small equal-area regions called "pixels". Healpix supports a range of resolutions.

We use a resolution at which each pixel is about as large as a telescope beam. This resolution divides the sphere into 2^26 (about 67 million) pixels, of which about 16 million are visible from the Arecibo telescope.

After finding the multiplets in a given pixel, we compute a score for the pixel itself. This score is based on the number of multiplets and their scores, and on the presence of stars, and especially Sun-like stars, in the pixel. Pixel scoring lets us detect ET transmissions that are made up of several signal types, or that are spread across multiple frequencies.


I'll now describe the algorithms for finding the multiplets in a given pixel, and for scoring the pixel. The code for this is spread out across a lot of files; I'll give pointers to the important functions.

Code: run_task()

Assembling signals

For each signal type, we assemble an in-memory list of the signals within a beam width of the pixel. Because the area of a pixel is roughly the beam size of the telescope, this means assembling all the signals in a disc that is centered at this pixel and contains parts of the 8 adjacent pixels. With Nebula, assembling these signals is extremely fast because we have an indexed binary file of signals ordered by pixel.

Code: candidate_set_t::assemble(), signal_disc_t::assemble()
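The disc selection can be illustrated with a brute-force filter on angular separation. This is only a sketch of the geometry, not the indexed lookup that Nebula uses; the dictionary fields `ra` and `dec`, the function names, and the 0.05 degree radius are hypothetical placeholders.

```python
import math

def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees between two sky positions given in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    # Haversine form, which stays accurate for the small angles involved here.
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))

def signals_in_disc(signals, center_ra, center_dec, radius_deg):
    """Return the signals whose position lies within radius_deg of the disc center."""
    return [s for s in signals
            if angular_separation_deg(s["ra"], s["dec"], center_ra, center_dec) <= radius_deg]

# Hypothetical data: two signals close to the center, one well outside.
example = [
    {"ra": 10.00, "dec": 20.0},
    {"ra": 10.03, "dec": 20.0},
    {"ra": 10.50, "dec": 20.0},
]
nearby = signals_in_disc(example, 10.0, 20.0, 0.05)  # placeholder radius
```

A linear scan like this would be far too slow over the full signal set, which is why an index ordered by pixel matters.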

Finding multiplets

We sort the list of signals by their barycentric frequency. We need to worry about performance: the signal objects belong to older classes with expensive constructors, so sorting the list directly would be slow. Instead we sort a list of pointers to the objects, then copy the objects just once at the end.

Code: signal_disc_t::multiplets()
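The sort-by-pointer idea can be sketched as sorting lightweight indices and reordering the objects once at the end. The `Signal` class and its fields are hypothetical. Python lists already hold references, so the cost concern is specific to the original code; the sketch only mirrors the structure of the technique.

```python
from dataclasses import dataclass

@dataclass
class Signal:
    barycentric_freq: float  # hypothetical field name
    power: float

def sorted_by_frequency(signals):
    """Return the signals ordered by barycentric frequency.

    Only integer indices are moved around during the sort; the signal
    objects themselves are touched once, when the final list is built.
    """
    order = sorted(range(len(signals)), key=lambda i: signals[i].barycentric_freq)
    return [signals[i] for i in order]

unsorted = [Signal(1420.3, 5.0), Signal(1419.8, 7.5), Signal(1420.1, 6.2)]
ordered = sorted_by_frequency(unsorted)
```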

Then we do the following steps, with different frequency window parameters, to find the barycentric and non-barycentric multiplets.

Code: signal_disc_t::find_multiplets()

Pick one signal per observation

An observation is a contiguous time interval during which the telescope beam includes a particular point in the sky. The SETI@home client often finds the same signal repeatedly during an observation; for example, a narrow-band signal may be detected as a spike in several consecutive FFTs. But these spikes are all really the same signal, and we want to include it only once in the multiplet.

We approximate observations by a fixed time interval: 107 seconds (the workunit duration) for Gaussians, and 12.5 seconds (the longest FFT duration) for other types.

We want to prune the signal list so that we keep only one (the highest-scoring one) per observation period. This is done as follows:
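A minimal sketch of this kind of pruning follows. It is an illustration under stated assumptions, not the Nebula code: signals are grouped by walking them in time order and starting a new group whenever a signal falls at least one interval after the start of the current group, larger `score` is assumed to mean better, and the `Signal` fields are hypothetical. Losing signals are marked with an `ignore` flag rather than removed.

```python
from dataclasses import dataclass

GAUSSIAN_INTERVAL = 107.0  # seconds, workunit duration (from the text)
OTHER_INTERVAL = 12.5      # seconds, longest FFT duration (from the text)

@dataclass
class Signal:
    time: float          # detection time in seconds (hypothetical field)
    score: float         # assumed: larger is better
    ignore: bool = False

def keep_best_per_observation(signals, interval):
    """Flag all but the best signal in each approximate observation.

    Returns the surviving signals in their original order.
    """
    group_start = None
    best = None
    for s in sorted(signals, key=lambda sig: sig.time):
        if group_start is None or s.time - group_start >= interval:
            # This signal opens a new observation.
            group_start = s.time
            best = s
            s.ignore = False
        elif s.score > best.score:
            # Better than the current best in this observation: swap.
            best.ignore = True
            s.ignore = False
            best = s
        else:
            s.ignore = True
    return [s for s in signals if not s.ignore]

spikes = [Signal(0.0, 1.0), Signal(5.0, 3.0), Signal(10.0, 2.0), Signal(100.0, 0.5)]
kept = keep_best_per_observation(spikes, OTHER_INTERVAL)
```

Here the first three spikes fall within one 12.5 second interval, so only the strongest of them survives, together with the isolated spike at 100 seconds.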

Finding candidate multiplets

We compute a set of "candidate multiplets" as follows:

Each candidate multiplet is described by the first and last indices of its signals in the frequency-sorted list; it consists of the signals in that range for which the "ignore" flag is not set. This is efficient; it eliminates possible O(N^2) behavior.
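One way to produce such index ranges in a single pass is a two-pointer sliding window over the sorted frequencies. The sketch below is an assumed scheme for illustration, not the procedure in `find_multiplets()`; the function name, the `min_signals` threshold, and the rule of emitting only windows that reach further than the previous one are all hypothetical.

```python
def candidate_ranges(freqs, ignore, window, min_signals=2):
    """Return inclusive (first, last) index pairs over a frequency-sorted list.

    freqs must be sorted ascending; ignore[i] is True for pruned signals.
    Each index enters and leaves the window once, so the scan is linear.
    """
    ranges = []
    n = len(freqs)
    end = 0          # exclusive end of the current window
    live = 0         # count of non-ignored signals in freqs[first:end]
    emitted_end = 0  # exclusive end of the last range emitted
    for first in range(n):
        # Extend the window while signals stay within the frequency window.
        while end < n and freqs[end] - freqs[first] <= window:
            if not ignore[end]:
                live += 1
            end += 1
        # Emit only windows that start on a live signal and reach further
        # than any range already emitted, which skips nested duplicates.
        if not ignore[first] and live >= min_signals and end > emitted_end:
            ranges.append((first, end - 1))
            emitted_end = end
        # Drop the signal at `first` before the window start moves on.
        if not ignore[first]:
            live -= 1
    return ranges

freqs = [100.0, 100.1, 100.2, 105.0, 105.05, 110.0]
no_ignore = [False] * len(freqs)
found = candidate_ranges(freqs, no_ignore, window=0.5)
```

Even with nested windows skipped, closely spaced signals can still yield chains of partly overlapping ranges.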

We then compute a score for each candidate multiplet. This score includes many factors:

Code: signal_disc_t::score_multiplet()

Pruning the set of multiplets

Before Nebula, that was the end of multiplet-finding; all the candidate multiplets were output, and used for pixel scoring. When I looked closely, I realized that in most cases the multiplets were highly overlapped; there were sequences of multiplets like

This wasn't right: these multiplets weren't really separate, and they inflated pixel scores.

I discussed this with Eric and Jeff. Our first idea was to merge overlapping multiplets into bigger multiplets. But this led to giant multiplets with a wide frequency range and hence worse scores.

So I came up with a different approach: output the candidate multiplets in order of decreasing score, discarding multiplets that overlap ones already output. This is implemented as follows:

This approach seems to work well.
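The greedy selection described above can be sketched as follows. The assumptions are mine: each candidate is a hypothetical `(first, last, score)` tuple over the frequency-sorted list, the score is a log probability so the best candidates are the most negative, and two candidates overlap when their index ranges share any signal.

```python
def prune_overlapping(candidates, n_signals):
    """Greedily keep the best candidates whose index ranges do not overlap.

    candidates: iterable of (first, last, score) with inclusive indices.
    The score is assumed to be a log probability, so the sort puts the
    most negative (best) candidates first.
    """
    used = [False] * n_signals  # signals already claimed by a kept multiplet
    kept = []
    for first, last, score in sorted(candidates, key=lambda c: c[2]):
        if any(used[first:last + 1]):
            continue  # overlaps a better multiplet that was already output
        for i in range(first, last + 1):
            used[i] = True
        kept.append((first, last, score))
    return kept

chain = [(0, 1, -10.0), (1, 2, -30.0), (2, 3, -20.0), (5, 6, -5.0)]
survivors = prune_overlapping(chain, n_signals=7)
```

In this example the chain of three overlapping candidates collapses to its best member, while the isolated candidate is kept unchanged.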

Scoring the pixel

Once we've found all the multiplets (barycentric and non-barycentric) for a pixel, we compute the score of the pixel. This includes many factors:

Code: distill_meta_candidate(), meta_candidate_score(). Note: in the code, "meta-candidate" is synonymous with "pixel".


©2018 University of California
SETI@home and AstroPulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.