Joined: 13 Feb 99
There, I said it: after 25 years working on SETI@home, and 5 years working on Nebula, we're finally on the home stretch. The finish line (producing a final candidate list, and writing a paper) is in view, maybe a few months off. It's been a slog, and I'm eager to have it behind me.
A full-sky Nebula run
Over the last several years (as chronicled here) we've been developing and refining algorithms for detecting RFI, generating birdies, finding signal candidates (multiplets) and scoring these candidates. For these purposes it wasn't necessary to look for multiplets in all 15 million pixels; I generally looked in only 256K per cycle, sometimes 1M. Finding multiplets uses lots of computing and produces lots of files (2 per pixel). I didn't want to overstay my welcome at the Atlas computing cluster.
But a few weeks ago we decided that these algorithms had reached the point of being good enough, and I did, for the first time, a Nebula run that scored all 15 million pixels. This went faster than I thought it would. I processed batches of 1.25M pixels at a time, and each batch took only a few hours, running on about 1000 cluster nodes.
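To make the batching arithmetic concrete: 15 million pixels in batches of 1.25M works out to 12 batches. Here is a minimal Python sketch of that split. The function name and the half-open index ranges are assumptions for illustration only, not how Nebula actually partitions pixels, and the pixel count is the round figure quoted above.

```python
def pixel_batches(total_pixels, batch_size):
    """Yield (start, end) half-open pixel index ranges that cover total_pixels."""
    for start in range(0, total_pixels, batch_size):
        yield start, min(start + batch_size, total_pixels)

# Round figures from the post; the real pixel count is only "about 15 million".
batches = list(pixel_batches(15_000_000, 1_250_000))
print(len(batches))  # 12
```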
It turned out that the "multiplet uniqueness" step - ensuring that multiplets in adjacent pixels are disjoint - would take pretty long, like a week. So I did a minor rewrite of this program to increase its efficiency; now it's down to about a day.
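For readers wondering what "disjoint" means here, the following is a rough Python sketch of one way to enforce it, not the actual Nebula program. It assumes each multiplet is a set of detection IDs with a score, that higher scores are better, and that a greedy pass is acceptable. For simplicity it checks against every multiplet kept so far rather than only those in adjacent pixels.

```python
def make_disjoint(multiplets):
    """Greedy sketch: keep the best-scoring multiplets that share no detections.

    multiplets: list of (score, pixel, frozenset_of_detection_ids).
    Returns the kept multiplets, best first. The tuple layout is hypothetical.
    """
    used = set()   # detection IDs already claimed by a kept multiplet
    kept = []
    for score, pixel, dets in sorted(multiplets, key=lambda m: m[0], reverse=True):
        if used.isdisjoint(dets):
            kept.append((score, pixel, dets))
            used.update(dets)
    return kept

example = [
    (0.9, 100, frozenset({1, 2, 3})),
    (0.7, 101, frozenset({3, 4})),   # shares detection 3 with pixel 100
    (0.5, 102, frozenset({5, 6})),
]
kept = make_disjoint(example)
print([pixel for _, pixel, _ in kept])  # [100, 102]
```

Tracking claimed detections in a single set keeps each check proportional to the size of the multiplet, instead of comparing every pair of multiplets.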
Also, copying the 30M or so files from Atlas (Germany) to Centurion (Berkeley) took a couple of days. It seemed to confuse rsync; I had to do a couple of tries to get everything copied.
Keeping birdies separate
Until now, we've processed and scored birdie detections and real detections together. I decided to change things so that we do two separate Nebula runs: a) using only real (non-birdie) detections, and scoring all pixels; b) using real and birdie detections, and scoring only birdie pixels. There are two reasons for this:
It took me a while to figure out a clean way to keep the two runs separate, in terms of files. I was already keeping all score-related files in a subdirectory, score/. What I settled on is:
This meant changing the scripts and PHP pages that run on Centurion to look for birdie-related data in score_birdie/, rather than in specially-named files in score/. This wasn't hard to do, and actually makes things a bit simpler.
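The lookup change amounts to choosing a directory by run type instead of by file name. The real scripts are shell and PHP. This Python sketch only illustrates the idea, and the function name and the example root path are made up.

```python
from pathlib import Path

def score_dir(run_root, birdie):
    """Return the score directory for a run: score_birdie/ for the birdie run, score/ otherwise."""
    return Path(run_root) / ("score_birdie" if birdie else "score")

# Hypothetical root path, for illustration only.
print(score_dir("/nebula/run", birdie=True).name)   # score_birdie
print(score_dir("/nebula/run", birdie=False).name)  # score
```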
Human rating of multiplets
The final output of Nebula is a bunch of score-ranked lists of multiplets. We already know that some fraction of these will be RFI of the sort that's apparent to a human observer, but hard to identify algorithmically. That's OK. The goal of our RFI algorithms is not to remove 100% of RFI; doing so would probably remove ET signals too. The goal is to remove enough RFI that a good fraction (say, at least half) of the top-ranking multiplets are not obvious RFI.
The final (post-Nebula) stages of SETI@home are
So I extended our existing "bookmark" system to let you rate multiplets. When you bookmark a multiplet you can now give it a 0-10 rating, as well as a comment. The web page for each multiplet shows the ratings that have been reported so far. This mechanism is intended for use by our group (me, Eric, Dan, Jeff) but any SETI@home user can browse and rate multiplets. Feel free to do so!
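As an illustration of how several people's ratings of the same multiplet might be combined, here is a small Python sketch that averages them per multiplet. The tuple layout, the IDs, and the choice of a plain mean are assumptions; the post doesn't say how the ratings are stored or summarized.

```python
from collections import defaultdict
from statistics import mean

def mean_ratings(bookmarks):
    """Average the 0-10 ratings per multiplet.

    bookmarks: iterable of (multiplet_id, rating, comment); layout is hypothetical.
    """
    by_id = defaultdict(list)
    for multiplet_id, rating, _comment in bookmarks:
        if not 0 <= rating <= 10:
            raise ValueError(f"rating out of range: {rating}")
        by_id[multiplet_id].append(rating)
    return {multiplet_id: mean(ratings) for multiplet_id, ratings in by_id.items()}

# Made-up bookmarks, for illustration only.
summary = mean_ratings([
    ("m1", 8, "narrowband, drifting"),
    ("m1", 6, ""),
    ("m2", 2, "looks like RFI"),
])
```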
Joined: 25 Nov 01
For a yes or no, this sounds very much closer to finding out so!
... So I extended our existing "bookmark" system to let you rate multiplets...
What weblink please? ;-)
Thanks for a good summary update. And congrats on the good progress :-)
Joined: 10 Jul 19
Firstly, many thanks for the latest update. It's good to know that the Centurion server is continuing to provide sterling service to the latter stages of S@H. And yes, it is fully understood that you didn't want to overstay the welcome at the Atlas cluster in Germany.
The final output of Nebula is a bunch of score-ranked lists of multiplets. It is intended to manually examine the top few thousand multiplets, in all the various categories and score variants, and remove the ones that are obvious RFI.
Then to make a list of the multiplets that remain; re-observe those spots in the sky (hopefully using FAST), analyze the resulting data (probably using the SETI@home client running on a cluster) and see if we find detections consistent with the multiplets. If we do, maybe that's ET.
The finish line (producing a final candidate list, and writing a paper) is in view, maybe a few months off. It's been a slog, and I'm eager to have it behind me.
So at that stage, it seems that your involvement at S@H will finally be over, and ours too it appears. So do we take it then that S@H will never run again using distributed computing as before? Many of us got the impression after your last "end of experiment" report, that it might continue in some form or another.
©2021 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.