The home stretch

Message boards : Nebula : The home stretch

David Anderson
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 13 Feb 99
Posts: 164
Credit: 502,653
RAC: 0
Message 2079759 - Posted: 12 Jul 2021, 2:09:09 UTC
Last modified: 12 Jul 2021, 2:12:30 UTC

There, I said it: after 25 years working on SETI@home, and 5 years working on Nebula, we're finally on the home stretch. The finish line (producing a final candidate list, and writing a paper) is in view, maybe a few months off. It's been a slog, and I'm eager to have it behind me.

A full-sky Nebula run


Over the last several years (as chronicled here) we've been developing and refining algorithms for detecting RFI, generating birdies, finding signal candidates (multiplets), and scoring these candidates. For these purposes it wasn't necessary to look for multiplets in all 15 million pixels; I generally looked at only 256K pixels per cycle, sometimes 1M. Finding multiplets takes a lot of computing and produces a lot of files (2 per pixel), and I didn't want to overstay my welcome at the Atlas computing cluster.

But a few weeks ago we decided that these algorithms were good enough, and for the first time I did a Nebula run that scored all 15 million pixels. This went faster than I expected: I processed the pixels in batches of 1.25M, and each batch took only a few hours on about 1000 cluster nodes.
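
The batch arithmetic can be checked with a short Python sketch. It assumes pixels are numbered 0 to N-1 and split into contiguous ranges; the post doesn't say how batches were actually formed, so `pixel_batches` and the constant names are hypothetical.

```python
# Constants from the post; splitting into contiguous ranges is an assumption.
N_PIXELS = 15_000_000      # all pixels in the full-sky run
BATCH_SIZE = 1_250_000     # pixels per batch
FILES_PER_PIXEL = 2        # multiplet-finding output files per pixel

def pixel_batches(n_pixels, batch_size):
    """Split pixels 0..n_pixels-1 into half-open (start, end) ranges."""
    return [(start, min(start + batch_size, n_pixels))
            for start in range(0, n_pixels, batch_size)]

batches = pixel_batches(N_PIXELS, BATCH_SIZE)
print(len(batches))                # 12
print(batches[0], batches[-1])     # (0, 1250000) (13750000, 15000000)
print(N_PIXELS * FILES_PER_PIXEL)  # 30000000
```

So the full run is 12 batches, and at 2 files per pixel it produces about 30M files.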

It turned out that the "multiplet uniqueness" step (ensuring that multiplets in adjacent pixels are disjoint) would take a long time, about a week. So I did a minor rewrite of this program to make it more efficient; now it takes about a day.
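
The post doesn't describe how the uniqueness step works or what the rewrite changed. As an illustration only, here is a minimal Python sketch of one way to make multiplets disjoint: a greedy pass that keeps multiplets in descending score order and drops any that reuses a detection already claimed by a kept multiplet. The `Multiplet` record and resolving overlaps in favor of the higher score are assumptions, not Nebula's actual rule.

```python
from typing import NamedTuple

class Multiplet(NamedTuple):
    mid: str                # hypothetical multiplet ID
    score: float
    detections: frozenset   # IDs of the detections in this multiplet

def make_disjoint(multiplets):
    """Greedy pass: highest score first; skip any multiplet that overlaps a kept one."""
    claimed = set()   # detection IDs used by multiplets kept so far
    kept = []
    for m in sorted(multiplets, key=lambda m: m.score, reverse=True):
        if claimed.isdisjoint(m.detections):
            kept.append(m)
            claimed.update(m.detections)
    return kept

# Multiplets in two adjacent pixels share detection 7; the higher-scoring one wins.
ms = [
    Multiplet("pix100_a", 9.5, frozenset({1, 2, 7})),
    Multiplet("pix101_a", 8.1, frozenset({7, 8})),
    Multiplet("pix101_b", 6.0, frozenset({9, 10})),
]
print([m.mid for m in make_disjoint(ms)])  # ['pix100_a', 'pix101_b']
```

Checking each multiplet against one set of claimed detection IDs costs time roughly proportional to the total number of detections, instead of comparing every pair of neighboring multiplets. Whether a change of this kind is what cut the runtime from a week to a day isn't stated.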

Also, copying the 30M or so files from Atlas (Germany) to Centurion (Berkeley) took a couple of days. It seemed to confuse rsync; it took a couple of tries to get everything copied.

Keeping birdies separate


Until now, we've processed and scored birdie detections and real detections together. I decided to change things so that we do two separate Nebula runs: a) using only real (non-birdie) detections, and scoring all pixels; b) using real and birdie detections, and scoring only birdie pixels. There are two reasons for this:

  • A birdie could mask (i.e. prevent us from detecting) an actual ET signal in the same pixel. The odds of this are small (with 3,000 birdies, only 0.02% of pixels have a birdie), but not zero.
  • We may want to change the number or parameters of our birdies, perhaps repeatedly. It would be good to be able to do this without rescoring all pixels (and transferring all the files).
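
The split into two runs can be sketched in Python as follows. It assumes each detection carries its pixel and a flag saying whether it is a birdie; `Detection` and `plan_runs` are hypothetical names, not part of Nebula. The last line checks the 0.02% figure, assuming at most one birdie per pixel.

```python
from typing import NamedTuple

class Detection(NamedTuple):
    pixel: int
    is_birdie: bool   # True for a synthetic (birdie) signal

def plan_runs(detections, all_pixels):
    """Return {run name: (detections to use, pixels to score)}."""
    real = [d for d in detections if not d.is_birdie]
    birdie_pixels = {d.pixel for d in detections if d.is_birdie}
    return {
        # (a) real detections only; score every pixel
        "nonbirdie": (real, set(all_pixels)),
        # (b) real and birdie detections; score only pixels containing a birdie
        "birdie": (list(detections), birdie_pixels),
    }

dets = [Detection(1, False), Detection(2, True), Detection(2, False), Detection(3, False)]
runs = plan_runs(dets, range(5))
print(len(runs["nonbirdie"][0]), sorted(runs["nonbirdie"][1]))  # 3 [0, 1, 2, 3, 4]
print(len(runs["birdie"][0]), sorted(runs["birdie"][1]))        # 4 [2]

# 3,000 birdies over 15M pixels
print(3_000 / 15_000_000)  # 0.0002, i.e. 0.02%
```

Because run (a) never sees birdie detections, changing the birdies only means redoing run (b), which scores a tiny fraction of the pixels.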

It took me a while to figure out a clean way to keep the files from the two runs separate. I was already keeping all score-related files in a subdirectory, score/. What I settled on is:

  • On Atlas, everything stays the same. When we do a run, either birdie or non-birdie, the results go in score/.
  • On Centurion, we have a new directory score_birdie/. After a birdie run, we copy score/ on Atlas to score_birdie/ on Centurion.
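
A minimal Python sketch of this convention follows. The root paths and the `atlas.example` host are placeholders, and sending non-birdie results to score/ on Centurion is an inference from the description rather than something stated. The rsync command is only built, not run.

```python
from pathlib import PurePosixPath

# Placeholder roots; the real directory layout isn't given in the post.
ATLAS_ROOT = PurePosixPath("/atlas/nebula")
CENTURION_ROOT = PurePosixPath("/centurion/nebula")

def copy_paths(birdie_run: bool):
    """Return (source on Atlas, destination on Centurion) for a finished run."""
    # On Atlas, every run (birdie or not) writes its results to score/.
    src = ATLAS_ROOT / "score"
    # On Centurion, birdie runs go to score_birdie/; sending non-birdie runs
    # to score/ is an assumption.
    dst = CENTURION_ROOT / ("score_birdie" if birdie_run else "score")
    return src, dst

def rsync_command(birdie_run: bool, host: str = "atlas.example"):
    """Build (but don't run) the copy command. The host name is a placeholder."""
    src, dst = copy_paths(birdie_run)
    # -a copies recursively and preserves attributes; the trailing slash on the
    # source copies the contents of score/ into dst rather than score/ itself.
    return ["rsync", "-a", f"{host}:{src}/", f"{dst}/"]

print(rsync_command(birdie_run=True))
# ['rsync', '-a', 'atlas.example:/atlas/nebula/score/', '/centurion/nebula/score_birdie/']
```

Keeping the Atlas side fixed means only the copy destination differs between run types, so Centurion-side code can find birdie data by directory instead of by special file names.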

This meant changing the scripts and PHP pages that run on Centurion to look for birdie-related data in score_birdie/, rather than in specially-named files in score/. This wasn't hard to do, and actually makes things a bit simpler.

Human rating of multiplets


The final output of Nebula is a bunch of score-ranked lists of multiplets. We already know that some fraction of these will be RFI of the sort that's apparent to a human observer but hard to identify algorithmically. That's OK. The goal of our RFI algorithms is not to remove 100% of RFI; doing so would probably remove ET signals too. The goal is to remove enough RFI that a good fraction (say, at least half) of the top-ranking multiplets are not obvious RFI.
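
The "at least half" goal can be expressed as a simple check. This Python sketch assumes a ranked list of multiplet IDs and a set of IDs that a human has flagged as obvious RFI; `clean_fraction`, `good_enough`, and the cutoff `k` are hypothetical, and the 0.5 threshold is the post's "say, at least half".

```python
def clean_fraction(ranked_ids, rfi_ids, k):
    """Fraction of the top-k ranked multiplets not flagged as obvious RFI."""
    top = ranked_ids[:k]
    if not top:
        return 0.0
    return sum(1 for m in top if m not in rfi_ids) / len(top)

def good_enough(ranked_ids, rfi_ids, k, threshold=0.5):
    """True if at least `threshold` of the top k are not obvious RFI."""
    return clean_fraction(ranked_ids, rfi_ids, k) >= threshold

ranked = ["m1", "m2", "m3", "m4", "m5", "m6"]   # best score first
flagged = {"m2", "m3", "m5"}                    # human-flagged obvious RFI
print(clean_fraction(ranked, flagged, 4))  # 0.5
print(good_enough(ranked, flagged, 4))     # True
```
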

The final (post-Nebula) stages of SETI@home are:

  • Manually examine the top few thousand multiplets, in all the various categories and score variants, and remove the ones that are obvious RFI. At least initially we'll do this as a group, on Zoom, to make sure we agree on what constitutes obvious RFI.
  • Make a list of the multiplets that remain; re-observe those spots in the sky (hopefully using FAST), analyze the resulting data (probably using the SETI@home client running on a cluster) and see if we find detections consistent with the multiplets. If we do, maybe that's ET.
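
The two steps can be sketched in Python as a filter over the ranked lists. It assumes each category's list holds (multiplet ID, pixel) pairs in score order and that the human review yields a set of RFI-flagged IDs; the category names, `top_n`, and the use of a pixel number as the sky position are illustrative assumptions.

```python
def reobservation_targets(ranked_by_category, rfi_ids, top_n):
    """Keep the top_n per category, drop flagged RFI, return survivors and pixels."""
    survivors = {}   # multiplet ID -> pixel
    for ranked in ranked_by_category.values():
        for mid, pixel in ranked[:top_n]:
            if mid not in rfi_ids:
                survivors[mid] = pixel
    # Several multiplets may point to the same spot; observe each spot once.
    return survivors, sorted(set(survivors.values()))

lists = {
    "spike": [("s1", 42), ("s2", 7), ("s3", 99)],
    "gaussian": [("g1", 42), ("g2", 13)],
}
survivors, pixels = reobservation_targets(lists, rfi_ids={"s2"}, top_n=2)
print(sorted(survivors))  # ['g1', 'g2', 's1']
print(pixels)             # [13, 42]
```
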

So I extended our existing "bookmark" system to let you rate multiplets. When you bookmark a multiplet you can now give it a 0-10 rating, as well as a comment. The web page for each multiplet shows the ratings that have been reported so far. This mechanism is intended for use by our group (me, Eric, Dan, Jeff), but any SETI@home user can browse and rate multiplets. Feel free to do so!
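
A bookmark with a rating could be modeled as below. This is a hypothetical Python sketch: the field names, the range check, and the per-multiplet grouping are assumptions, since the post doesn't describe the actual schema or the code behind the web pages.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class Bookmark:
    user: str
    multiplet_id: str
    rating: int        # 0-10 scale from the post
    comment: str = ""

    def __post_init__(self):
        # Reject ratings outside the 0-10 scale.
        if not 0 <= self.rating <= 10:
            raise ValueError(f"rating must be 0-10, got {self.rating}")

def ratings_by_multiplet(bookmarks):
    """Group bookmarks by multiplet: ID -> list of (user, rating, comment)."""
    out = defaultdict(list)
    for b in bookmarks:
        out[b.multiplet_id].append((b.user, b.rating, b.comment))
    return dict(out)

bms = [
    Bookmark("user1", "m123", 8, "narrowband, not obviously RFI"),
    Bookmark("user2", "m123", 6),
    Bookmark("user3", "m456", 0, "looks like RFI"),
]
summary = ratings_by_multiplet(bms)
ratings = [r for _, r, _ in summary["m123"]]
print(ratings, sum(ratings) / len(ratings))  # [8, 6] 7.0
```

Grouping by multiplet ID matches what a per-multiplet page needs: every rating reported so far, with its comment.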

ML1
Volunteer moderator
Volunteer tester
Joined: 25 Nov 01
Posts: 13695
Credit: 7,508,002
RAC: 20
United Kingdom
Message 2079791 - Posted: 12 Jul 2021, 12:55:16 UTC - in response to Message 2079759.  
Last modified: 12 Jul 2021, 12:55:32 UTC

Eureka?!

Whether the answer is yes or no, this sounds much closer to finding out!

... So I extended our existing "bookmark" system to let you rate multiplets...

What's the web link, please? ;-)



Thanks for a good summary update. And congrats on the good progress :-)

Keep searchin'!
Martin
See new freedom: Mageia Linux
Take a look for yourself: Linux Format
The Future is what We all make IT (GPLv3)

The Phoenix
Joined: 10 Jul 19
Posts: 20
Credit: 21,835
RAC: 0
Message 2080807 - Posted: 25 Jul 2021, 7:03:13 UTC - in response to Message 2079759.  

Firstly, many thanks for the latest update. It's good to know that the Centurion server is continuing to provide sterling service in the latter stages of S@H. And yes, it is fully understood that you didn't want to overstay your welcome at the Atlas cluster in Germany.

The final output of Nebula is a bunch of score-ranked lists of multiplets. It is intended to manually examine the top few thousand multiplets, in all the various categories and score variants, and remove the ones that are obvious RFI.

Then to make a list of the multiplets that remain; re-observe those spots in the sky (hopefully using FAST), analyze the resulting data (probably using the SETI@home client running on a cluster) and see if we find detections consistent with the multiplets. If we do, maybe that's ET.

The finish line (producing a final candidate list, and writing a paper) is in view, maybe a few months off. It's been a slog, and I'm eager to have it behind me.

So at that stage, it seems that your involvement in S@H will finally be over, and ours too, it appears. Should we take it, then, that S@H will never again run using distributed computing as before? Many of us got the impression from your last "end of experiment" report that it might continue in some form or another.
 
©2021 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.