Nebula manual

Introduction

Nebula is a software system for post-processing of data from radio SETI instruments. Its input is a collection of "detections" generated by a front-end system. Nebula has been used with two front-end systems.

SERENDIP: the front end computes FFTs of a fixed length; detections are FFT bins with power above a threshold. The data band is typically several hundred MHz wide. This was done at Arecibo (with the ALFA 7-beam receiver) and FAST (with a 19-beam receiver).
SETI@home: there are five types of detections, with a range of FFT lengths. The data is a 2.5 MHz band. It was collected at Arecibo with the ALFA receiver.

Nebula is designed to be extensible to other radio SETI sky surveys, with different telescope and receiver parameters (beam width, number of beams, frequency range, and so on).

Nebula consists of two related parts:

A pipeline of programs that, given a set of detections:
- optionally add "birdie" detections designed to simulate actual ET signals of various powers, locations, and orbital parameters;
- remove RFI;
- find "multiplets": groups of detections that are possible artifacts of an ET signal.
The pipeline programs are mostly written in C++, with some Python and shell scripts.
A set of web pages that provide summaries and visualizations of the pipeline results. These are written in PHP. For SETI@home, these pages are here.

Nebula is developed and maintained by David Anderson. Let me know if you have any problems or suggestions.

Sky surveys and targeted searches

Nebula was designed for SETI@home, which is a commensal sky survey. The observations covered most of the visible sky, but non-uniformly: some parts were observed many more times and for longer periods than others. The birdie and multiplet-related features of Nebula are relevant only to sky surveys.

Nebula's RFI removal features can potentially be used for targeted searches as well. Two of its RFI algorithms (zone and drifting) work in all situations. One of them (multibeam) works best if there is data from different sky positions in short ranges of time, either because there is a multibeam receiver or because the pointing moves back and forth over the target.

Computing resources

Nebula runs on Linux. The programs use CPUs, not GPUs.

Typically you'll use a group of computers, one of which is a storage server (with plenty of RAID storage) that the compute nodes mount via NFS over 1 or 10 Gbit Ethernet.

The RFI removal program is multi-threaded. It will run faster on a host with lots of CPU cores. It's somewhat I/O intensive.

Note: RFI removal was developed for SETI@home, which has a narrow frequency band (2.5 MHz). When we tried it with Arecibo/SERENDIP data (several hundred MHz), there were performance problems, which we eventually solved by splitting the data into multiple bands, each a few tens of MHz wide.
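
As a rough illustration of the band-splitting idea in the note above, the sketch below divides a pipe-separated detection file into one file per frequency band using awk. It is not the procedure that was used for the Arecibo/SERENDIP data: the function name split_by_band, the position of the frequency field, the assumption that frequencies are in MHz, and the output file naming are all hypothetical.

```shell
# Sketch only: split a pipe-separated detection file into per-band files.
# Usage: split_by_band FILE FREQ_FIELD BAND_WIDTH_MHZ OUT_PREFIX
#   FREQ_FIELD is the 1-based index of the frequency field (assumed to be in MHz).
split_by_band() {
    awk -F'|' -v col="$2" -v width="$3" -v prefix="$4" '
        {
            # Band number: frequency / width, truncated (floor for positive values).
            band = int($col / width)
            out = prefix "_band" band
            print > out
        }
    ' "$1"
}
```

For example, if field 5 held the frequency (an assumption; check your own data), "split_by_band unload/spike_unload 5 40 data/spike" would write the detections from 1320 MHz up to, but not including, 1360 MHz to data/spike_band33.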

The multiplet finding can optionally be parallelized across a Linux cluster, using HTCondor to manage the jobs. For SETI@home, we used a cluster of about 1000 nodes, and it took about 10 hours.

To use Nebula's web interfaces, at least one of the nodes must run Apache and be visible via HTTP from the outside Internet.

Installing Nebula

The following instructions are a starting point, but I'm sure you'll run into roadblocks; if so please contact me.

Install dependencies

Install the GSL library.
Install the Healpix library.
Install the fitsio library.
Install Boost.
Install rapidjson.

Download and build Nebula

The Nebula source code is on SourceForge. To get it, run

git clone https://git.code.sf.net/p/seti-science/code seti-science-code

on a Linux system. To compile the C++ programs, run "make" in nebula/.

Download additional files

star_unload: a database of stars.

lband_sky_float_nside2048_eq.qpix: a map of background noise levels, for the Healpix resolution used by SETI@home.

Input data files

The input to Nebula is a set of files in database dump format (pipe-separated fields). For SETI@home this includes a number of files.
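
A quick sanity check on a dump file is to count its records and the number of pipe-separated fields per record. The following is a minimal sketch using awk; the function name summarize_dump is made up for this example, and it assumes one record per line with no "|" characters inside field values.

```shell
# Sketch only: summarize a pipe-separated dump file.
# Prints the record count, then how many records have each field count.
summarize_dump() {
    awk -F'|' '
        { count[NF]++ }
        END {
            printf "%d records\n", NR
            for (nf in count) printf "%d fields: %d records\n", nf, count[nf]
        }
    ' "$1"
}
```

Run it as "summarize_dump unload/spike_unload" from a data directory. If the output lists more than one field count, some lines may be truncated or malformed.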

For SERENDIP it includes only "spike_unload" (the set of detections) and "star_unload" (a database of stars). The original SERENDIP data is in FITS files. To extract data from these files and convert it to DB dump format, run

s6_get_hits [options]
--dir x
     recursively scan subdirectories of x for FITS files (.fits suffix)
     (default: current dir)
--outfile x
     write results to x (default: s6_hits.txt)
--dir_exclude x
     exclude directories whose names include x
--file_require x
     read only files whose names include x

Nebula data directories

A Nebula "data directory" holds an input data set and derived files. Each data directory must contain:

a symbolic link "nebula" to the directory containing the Nebula programs;
a subdirectory "unload" containing the DB unload files (copy star_unload here);
a subdirectory "data", where intermediate files are stored (copy lband_sky_float_nside2048_eq.qpix here);
for SERENDIP data, an empty file "s6".

All Nebula commands expect to be run in a data directory. You can have multiple data directories on the same computer, corresponding to different experiments.
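
The layout above can be created with a few shell commands. The function below is a sketch, not part of Nebula: its name, its arguments, and the idea of a single download directory holding star_unload and the .qpix file are assumptions made for this example.

```shell
# Sketch only: create a Nebula data directory with the required layout.
# Usage: setup_data_dir DATA_DIR NEBULA_DIR DOWNLOAD_DIR [s6]
#   NEBULA_DIR   directory containing the Nebula programs
#   DOWNLOAD_DIR directory holding star_unload and the .qpix sky map
#   s6           pass this literal word for SERENDIP data
setup_data_dir() {
    sdd_data=$1; sdd_nebula=$2; sdd_download=$3; sdd_kind=${4:-}
    mkdir -p "$sdd_data/unload" "$sdd_data/data" || return 1
    # Symbolic link "nebula" pointing to the Nebula programs.
    ln -sfn "$sdd_nebula" "$sdd_data/nebula" || return 1
    # The star database goes in unload/, the sky map in data/.
    cp "$sdd_download/star_unload" "$sdd_data/unload/" || return 1
    cp "$sdd_download/lband_sky_float_nside2048_eq.qpix" "$sdd_data/data/" || return 1
    # SERENDIP data sets are marked by an empty file named "s6".
    if [ "$sdd_kind" = "s6" ]; then
        : > "$sdd_data/s6" || return 1
    fi
}
```

For example, "setup_data_dir ~/nebula_data/run1 ~/seti-science-code/nebula ~/nebula_downloads s6" (all three paths are placeholders). Omit the last argument for SETI@home data. You still need to put your detection files (for SERENDIP, spike_unload) in unload/.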

Data source parameters

Nebula is designed to work with data sources from different observatories (Arecibo, GBT, FAST), different receivers, and different recorders (SETI@home, SERENDIP). These data sources have different parameters (beam width, bandwidth, FFT length, and so on).

The set of parameters is defined in nebula/data_params.h. The values are set at runtime in Nebula programs (see data_params.cpp), so we don't need different versions of the executables for different experiments. If you want to add a new data source you'll need to add some code to data_params.cpp.

The Nebula pipeline

The major stages of the Nebula pipeline are

birdie generation
RFI removal
multiplet finding and scoring

The pipeline consists of many steps, each of which can take a long time. Nebula uses "make" to keep track of which steps have been done. The Makefile "nebula/makefile_pipeline" describes the sequence of steps. The successful completion of each step produces a "done" file; for example, RFI removal produces "remove_rfi_done". Each step is represented by a make "rule" that specifies

what the prerequisites are (usually the "done" file from the previous step);
the command that runs the step.

To run the pipeline, run

make -f nebula/makefile_pipeline

from a data directory.

You may need to run the pipeline starting from a certain point. For example, suppose you modified the RFI removal program. In that case, do

rm remove_rfi_done
make -f nebula/makefile_pipeline

That will run the pipeline starting from RFI removal.
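
The "done" file mechanism can be tried out with a toy makefile that has nothing to do with real data. The sketch below is not nebula/makefile_pipeline: the three rules only echo a message and touch their "done" file, and apart from remove_rfi_done the file names are invented for this example. It assumes GNU make is installed.

```shell
# Toy demonstration of the "done" file pattern, run in a scratch directory.
demo_dir=$(mktemp -d)

# Each rule has the form:  done-file: prerequisite ; command
# ($@ expands to the rule's target, i.e. the "done" file.)
cat > "$demo_dir/makefile_demo" <<'EOF'
all: find_multiplets_done
birdie_done: ; echo "birdie generation" && touch $@
remove_rfi_done: birdie_done ; echo "RFI removal" && touch $@
find_multiplets_done: remove_rfi_done ; echo "multiplet finding" && touch $@
EOF

(
    cd "$demo_dir" || exit 1
    make -f makefile_demo       # first run: all three steps execute
    make -f makefile_demo       # second run: every "done" file exists, nothing runs
    rm remove_rfi_done
    make -f makefile_demo       # re-runs RFI removal, then multiplet finding
)
```

Birdie generation is not repeated in the last run because birdie_done still exists; only the removed step and the steps that depend on it run again.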

There are two special-purpose makefiles:

"makefile_basic": do a run without birdies.
"makefile_birdies": do a run with birdies, and put results in score_birdie/ rather than score/.

Files

A list of the files created by Nebula is here.

Using Nebula's web features

Nebula's web features are based on code that's part of BOINC. To start, download BOINC:

git clone https://github.com/BOINC/boinc.git

but don't build it.

You may need to install some packages like PHP and MySQL.

Then create a BOINC project: go to boinc/tools and type

./make_project --web_only nebula

This will create a directory ~/projects/nebula and put some files there. It will also create a file 'nebula.http.conf'. As root, copy this file to your Apache config directory, typically '/etc/apache2/sites-enabled/', then restart Apache.

Create a symbolic link 'seti_science/nebula/boinc' pointing to '~/projects/nebula/html/user'.

Create a symbolic link '~/projects/nebula/html/user/nebula' pointing to 'seti_science/nebula'.
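
The two links can be created as sketched below. The function name and its arguments are invented for this example; pass absolute paths, since a relative link target is resolved relative to the directory containing the link.

```shell
# Sketch only: create the two symbolic links described above.
# Usage: link_web_dirs NEBULA_SRC_DIR BOINC_USER_DIR   (both absolute paths)
link_web_dirs() {
    lwd_nebula=$1      # e.g. /home/you/seti_science/nebula
    lwd_user=$2        # e.g. /home/you/projects/nebula/html/user
    # NEBULA_SRC_DIR/boinc -> BOINC_USER_DIR
    ln -sfn "$lwd_user" "$lwd_nebula/boinc" || return 1
    # BOINC_USER_DIR/nebula -> NEBULA_SRC_DIR
    ln -sfn "$lwd_nebula" "$lwd_user/nebula" || return 1
}
```

For example, "link_web_dirs "$HOME/seti_science/nebula" "$HOME/projects/nebula/html/user"", assuming the source tree sits in your home directory.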

©2026 University of California

SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.