We've been working almost entirely on refining the drifting RFI algorithm. This - I hope - is the last big piece before we do our final Nebula run and finish the paper.
The goal, as with all RFI algorithms, is to:
- Remove all the RFI. We've bookmarked a lot of examples of drifting RFI. There's a spectrum: some examples are clearly RFI, others are less obvious. When we change the algorithm, we look at these examples to make sure we're still removing at least the obvious ones. We also look at the top-ranking spike/gaussian multiplets. It's OK if a small fraction of these are RFI; we can skip over them in the manual inspection process. But if most of them are RFI we need to change the algorithm.
- Remove only RFI. We don't want the algorithm to remove an ET signal. To check for this, we see what fraction of birdie spikes are flagged as drifting RFI. This is inevitably nonzero - some birdie detections happen to lie in regions of drifting RFI - but it shouldn't exceed a few percent. We also monitor the fraction of all spikes flagged as drifting RFI: 10% is plausible, but 20% is probably too high.
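The second check is simple bookkeeping, and a minimal Python sketch may make it concrete. The function names and the 5% birdie limit are placeholders chosen for illustration (the text says only "a few percent"), and the flags below are synthetic, not Nebula output.

```python
def flagged_fraction(flags):
    """Return the fraction of True entries in flags (0.0 if flags is empty)."""
    flags = list(flags)
    return sum(flags) / len(flags) if flags else 0.0


def check_removal_rates(birdie_flags, spike_flags,
                        birdie_limit=0.05, spike_limit=0.20):
    """Compare the flagged fractions against the limits.

    birdie_limit is an assumed stand-in for "a few percent";
    spike_limit is the 20% level described as probably too high.
    """
    birdie_frac = flagged_fraction(birdie_flags)
    spike_frac = flagged_fraction(spike_flags)
    return {
        "birdie_fraction": birdie_frac,
        "spike_fraction": spike_frac,
        "birdie_ok": birdie_frac <= birdie_limit,
        "spike_ok": spike_frac < spike_limit,
    }


# Synthetic example: 3 of 100 birdie spikes and 12 of 100 spikes flagged.
report = check_removal_rates([True] * 3 + [False] * 97,
                             [True] * 12 + [False] * 88)
```

If either `birdie_ok` or `spike_ok` comes back false after an algorithm change, that is a sign the change is removing too much.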
Recall that the drifting algorithm takes a "vertex" detection D and forms two fans of triangles in time/frequency space, emanating from D in the positive and negative time directions. We count the number of detections in each triangle that are "far" from D (a couple of beam widths or more) and compute a probability for each triangle; the more detections in the triangle, the lower the probability.
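To make the triangle idea concrete, here is a rough Python sketch of scoring one triangle. It is not the Nebula code, and several things in it are assumptions: each triangle is described by a range of drift rates and a time extent, "far" means at least two beam widths away on the sky, and the probability is a Poisson tail (the chance of seeing at least this many detections given an assumed background `density` per Hz-second). The names `Detection`, `in_triangle`, `is_far`, and `triangle_probability` are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    time: float  # seconds
    freq: float  # Hz
    x: float     # sky position in beam widths (hypothetical units)
    y: float


def poisson_tail(n, mu, max_terms=10_000):
    """P(N >= n) for N ~ Poisson(mu), summed term by term so that
    very small tail values keep their precision."""
    if n <= 0:
        return 1.0
    if mu <= 0:
        return 0.0
    total = 0.0
    for k in range(n, n + max_terms):
        term = math.exp(k * math.log(mu) - mu - math.lgamma(k + 1))
        total += term
        if k > mu and term <= total * 1e-17:
            break
    return min(total, 1.0)


def in_triangle(vertex, d, rate_lo, rate_hi, max_dt, direction):
    """True if d lies in the triangle covering drift rates
    [rate_lo, rate_hi) Hz/s and extending max_dt seconds from the
    vertex; direction is +1 (later times) or -1 (earlier times)."""
    dt = d.time - vertex.time
    if not 0 < dt * direction <= max_dt:
        return False
    return rate_lo <= (d.freq - vertex.freq) / dt < rate_hi


def is_far(vertex, d, min_beams=2.0):
    """True if d is at least min_beams beam widths from the vertex."""
    return math.hypot(d.x - vertex.x, d.y - vertex.y) >= min_beams


def triangle_probability(vertex, detections, rate_lo, rate_hi,
                         max_dt, direction, density):
    """Assumed model: the count of far detections in the triangle is
    Poisson with mean density * area; more detections, lower probability."""
    n = sum(1 for d in detections
            if in_triangle(vertex, d, rate_lo, rate_hi, max_dt, direction)
            and is_far(vertex, d))
    area = 0.5 * (rate_hi - rate_lo) * max_dt ** 2  # Hz * s
    return poisson_tail(n, density * area)


# Synthetic example: ten far detections drifting at 1 Hz/s after the vertex.
vertex = Detection(0.0, 0.0, 0.0, 0.0)
drifting = [Detection(float(t), 1.0 * t, 5.0, 0.0) for t in range(1, 11)]
p_forward = triangle_probability(vertex, drifting, 0.5, 1.5, 100.0, +1, 1e-4)
p_backward = triangle_probability(vertex, drifting, 0.5, 1.5, 100.0, -1, 1e-4)
```

In this synthetic case the forward triangle contains all ten detections and gets a very small probability, while the backward triangle is empty and gets a probability of 1.0.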
Recent changes:
- Previously, we looked at the product of the probabilities of opposing triangles; if this was below a threshold, we flagged both of them as RFI. This didn't work well in some cases; we tried various alternatives. Our current approach uses two thresholds. If a triangle's probability is below 1e-8, we flag it as RFI. If a triangle and an opposing triangle are both below 1e-4, we flag them both.
- Previously, when we flagged a triangle as RFI, we flagged all the detections in the triangle, including its vertex. But there are cases (including lots of birdies) where a triangle has lots of close, non-RFI detections that happen to be far from the vertex detection; these were erroneously getting flagged. So we changed to a "vertex-only" scheme where - if any triangle is flagged as RFI - we flag only the vertex detection, not the detections in the triangles.
- The vertex-only policy introduced a wrinkle: before we apply the drifting algorithm, we group detections into "clusters" of nearly identical signals. From each cluster, one detection is picked as the "master". Only the master detections are used in the drifting algorithm; including the non-masters would skew the statistics. Previously, we flagged non-master detections lying in flagged triangles. With the vertex-only policy, we must do things differently: when we flag a vertex detection, we also flag the other detections in its cluster. This required adding a data structure to keep track of clusters.
- Previously, "clusters" meant detections within 1 second and 1 Hz. This wasn't appropriate for short or long FFT lengths. We changed it to use rectangles based on FFT bin sizes.
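The changes above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the actual implementation: we assume the forward and backward fans are indexed so that `forward_probs[i]` and `backward_probs[i]` are opposing triangles, the cluster bookkeeping is a plain dictionary from master ID to member IDs, and the `bins` multiplier on the cluster rectangle is hypothetical. Only the two thresholds (1e-8 and 1e-4) come from the text.

```python
SINGLE_THRESHOLD = 1e-8  # one triangle on its own
PAIR_THRESHOLD = 1e-4    # a triangle and its opposing triangle together


def vertex_is_rfi(forward_probs, backward_probs):
    """Apply the two-threshold rule to a vertex's triangle fans.

    Assumes forward_probs[i] and backward_probs[i] are opposing triangles.
    """
    for p_fwd, p_bwd in zip(forward_probs, backward_probs):
        if min(p_fwd, p_bwd) < SINGLE_THRESHOLD:
            return True   # a single triangle is improbable enough
        if max(p_fwd, p_bwd) < PAIR_THRESHOLD:
            return True   # both opposing triangles are fairly improbable
    return False


def cluster_rectangle(fft_len, sample_rate, bins=1.0):
    """Half-widths (seconds, Hz) of a cluster rectangle scaled to the FFT
    bin size: one bin spans fft_len / sample_rate seconds and
    sample_rate / fft_len Hz. The bins multiplier is hypothetical."""
    return bins * fft_len / sample_rate, bins * sample_rate / fft_len


def flag_detections(master_is_rfi, cluster_members):
    """Vertex-only flagging: flag each RFI master and the rest of its
    cluster, but nothing else that lies in its triangles.

    master_is_rfi: dict of master ID -> bool
    cluster_members: dict of master ID -> list of non-master IDs
    """
    flagged = set()
    for master, is_rfi in master_is_rfi.items():
        if is_rfi:
            flagged.add(master)
            flagged.update(cluster_members.get(master, ()))
    return flagged


# Synthetic example: master "a" is flagged, so its cluster is flagged too.
flagged = flag_detections({"a": vertex_is_rfi([1e-9], [0.5]),
                           "b": vertex_is_rfi([1e-5], [0.5])},
                          {"a": ["a1", "a2"], "b": ["b1"]})
```

Keeping the cluster map separate from the triangle test means non-master detections never enter the statistics but still inherit their master's flag.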
With these changes, the algorithm is working pretty well. It removes only 2.9% of birdie spikes and 11.8% of all spikes, it removes the obvious examples, and few of the top multiplets are RFI. So maybe we're done with drifting.
Other news: we're talking with astronomers at FAST about using Nebula as part of a SETI sky survey there. Needless to say, that would be very exciting!