A lot has happened since my last missive. The S@h front end is now hibernated - that's a good thing. The COVID-19 epidemic is now in progress. That's a bad thing for everyone - those who get sick and die, their families and friends, those who lose their jobs and their savings, those who suddenly have no human contact, and so on.
So far I've been lucky. I'm not sick, I have my nuclear family, and I have things like Nebula to keep me busy. If there's one silver lining to all this, it's that we now have free time in which to pursue projects and interests that have been on hold. Hopefully we (i.e. Eric and I) will speed up progress on Nebula, and realize this dream that all of us have been pursuing for 20 years.
OK - back to Nebula. We did two non-earthshaking things.
Normalizing detection scores
The first involves detection scores. NOTE: "detection" is the new term for what we used to call "signal": namely spikes, Gaussians, pulses, triplets, and autocorrs. "Signal" was a poor choice. Anyway: each detection has a score, which is intended to reflect the probability of it occurring in noise. For spikes this is just the power. For other types it also includes a goodness-of-fit factor. Multiplet scores include a term which is the sum of the detection scores.
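In rough pseudocode, the scoring scheme above might look like this. This is only a sketch: the field names and the assumption that the goodness-of-fit factor is multiplicative are mine, not from the actual Nebula code.

```python
def detection_score(det):
    # Spikes are scored by power alone; other detection types
    # (Gaussians, pulses, triplets, autocorrs) also fold in a
    # goodness-of-fit factor (assumed multiplicative here).
    if det["type"] == "spike":
        return det["power"]
    return det["power"] * det["fit"]

def multiplet_score(detections, other_terms=0.0):
    # A multiplet's score includes a term that is the sum of the
    # scores of its constituent detections.
    return other_terms + sum(detection_score(d) for d in detections)
```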
Eric observed that different detection types have different score distributions - in particular, the average score of Gaussians is much lower than that of spikes. This means that multiplets containing Gaussians are ranked lower than they should be.
To address this, we now add a normalizing factor to detection scores, such that the 30 millionth highest-scoring clean (post-RFI-removal) detection of each type now has the same score (I'm a little wooly on why this is the right thing, but Eric says it is). To implement this, I extended the RFI-removal program to generate high-resolution histograms of the scores of clean detections.
No more blocks
Second, I changed the way we choose pixels to score. Recall that there are about 16M pixels in the Arecibo sky. Finding and scoring multiplets is done separately for each pixel. This can take anywhere from a few seconds to an hour of computer time, depending on how many detections are in and around the pixel. To speed things up, we do this in parallel, using lots of jobs running at the same time on the Atlas cluster generously provided by Bruce Allen of Einstein@home. The way I originally implemented things, each of these jobs processed a "block" of 64 consecutively-numbered pixels.
In our current debugging phase, we don't score the entire sky in each Nebula run; that would take too long. Instead, we score the pixels containing birdies, plus enough non-birdie pixels to see how the birdie multiplets rank relative to non-birdie multiplets; maybe 100K or so pixels.
The 64-pixel block scheme doesn't work well with this; it forces us to score the non-birdie pixels that happen to lie close to birdies. If we have 1000 birdies then we have to score those 1000 blocks, or 64K total pixels. 63K of those are non-birdie, but they're not random. And when we have 4000 birdies (as we currently do), that means scoring at least 256K pixels.
I solved this by getting rid of the "block" concept. Each job now scores 64 arbitrary pixels. What we do now is make a list of the birdie pixels followed by the remaining pixels randomly ordered. Then, if we want to score 100K pixels, we just take the first 100K elements of this list and divide them into jobs of 64 pixels each.
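The new pixel-selection scheme could be sketched like this, assuming pixels are identified by integers. Function names and parameters are hypothetical; the real implementation presumably works on much larger arrays.

```python
import random

def make_scoring_jobs(birdie_pixels, all_pixels, n_to_score,
                      job_size=64, seed=0):
    # Build the ordered list: birdie pixels first, then all remaining
    # pixels in random order.
    birdies = set(birdie_pixels)
    rest = [p for p in all_pixels if p not in birdies]
    random.Random(seed).shuffle(rest)
    ordered = list(birdie_pixels) + rest

    # Take the first n_to_score pixels and chop them into jobs of
    # `job_size` arbitrary (not consecutively-numbered) pixels each.
    chosen = ordered[:n_to_score]
    return [chosen[i:i + job_size]
            for i in range(0, len(chosen), job_size)]
```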
The scoring run currently online includes both of these changes.
©2020 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. Astropulse is funded in part by the NSF through grant AST-0307956.