## External design review

Message boards : Nebula : External design review

David Anderson
Volunteer moderator
Project developer
Project scientist

Joined: 13 Feb 99
Posts: 139
Credit: 502,653
RAC: 0
Message 2050024 - Posted: 22 May 2020, 5:36:20 UTC
Last modified: 26 May 2020, 6:57:54 UTC

SETI@home is a chain of algorithms, most of them pretty complex, most of them developed by trial and error. Every chain has a weakest link, and the weakness of our weakest link could determine whether we find ET. Although Nebula's "birdie" framework has been helpful in evaluating the algorithms, it's certainly not infallible.

The algorithms have accumulated over the years, and many people have contributed to them. The recent ideas have mostly come from a small group: Eric and me, with input from Dan and Jeff. Like everyone else, we make mistakes and have blind spots. What if we're missing something obvious, and one of our links is very weak?

To prevent this, I decided that we need an "external design review" - to bring in some outside experts, explain our algorithms to them, and get their feedback. We did this last week (over Zoom, like everything else these days) with experts from Harvard, UC Berkeley, and Australia. I thought it went extremely well. We focused on the multiplet scoring function. The experts understood what we're currently doing, and didn't find any glaring problems with it, but each of them had some ideas for improvements.

Two key ideas emerged, which I've implemented in the last few days.

### 1) Normalize score factors

As discussed elsewhere, multiplet scores have several factors. The original idea was that each factor was a probability: it measured a particular property of the multiplet, and estimated the probability that a multiplet with that value of the property would occur in noise. Then - assuming that the properties are independent - we multiply the factors to get an overall probability. (Actually, we work in log space, so we add the factors.)
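To make the log-space step concrete, here is a minimal Python sketch, assuming each factor is expressed as a probability in (0, 1] and that the factors are independent. The helper name `combined_log_score` is hypothetical, not Nebula's code. Besides turning the product into a sum, working with logs avoids floating-point underflow when several very small probabilities are multiplied.

```python
import math

def combined_log_score(probabilities):
    """Hypothetical helper: log of the joint probability of independent factors.

    Summing the logs equals taking the log of the product, but it does not
    underflow when the individual probabilities are tiny.
    """
    if any(not (0.0 < p <= 1.0) for p in probabilities):
        raise ValueError("each factor must be a probability in (0, 1]")
    return sum(math.log(p) for p in probabilities)


factors = [1e-200, 1e-200]          # two properties very unlikely in noise
print(math.prod(factors))           # 0.0: the direct product underflows
print(combined_log_score(factors))  # about -921.03: the log-space sum is fine
```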

For various reasons, the factors have different variances. The ND factor varies (in log space) by 10 or 100, but the time factor only varies by 1 or so. So if we just add the factors, the time factor doesn't make much difference. A multiplet could have a great time factor but still get a bad score.
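A rough way to quantify this, assuming the factors are independent and treating each factor's typical spread as its standard deviation $\sigma_i$ (an illustrative simplification, not how Nebula measures it): the variance of the summed score is the sum of the factor variances, so each factor's share of the score's spread is

$$
S = \sum_i f_i, \qquad \operatorname{Var}(S) = \sum_i \sigma_i^2, \qquad \text{share}(f_t) = \frac{\sigma_t^2}{\sum_i \sigma_i^2}.
$$

With $\sigma_{\mathrm{ND}} = 10$, $\sigma_t = 1$, and the third factor ignored, the time factor accounts for $1/101 \approx 1\%$ of the variance; with $\sigma_{\mathrm{ND}} = 100$ it drops to $1/10001 \approx 0.01\%$.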

I tried to solve this problem by optimizing the score factor weights to favor birdies, but this was unsuccessful.

So instead (on the suggestion of one of the experts) I decided to use a simple solution: normalize the factors so they all have the same variance - let the data tell us how to scale the factors. Actually I used a slightly more robust approach: scale the factors so that the 25% and 75% quantiles are the same across factors.

Note: we do this normalization separately for each multiplet "category", i.e. combination of detection type and baryness. The factor ranges are quite different across categories. For example, for bary spike/gaussian multiplets, the quantiles for ND factor are -44 and -15, while for non-bary spike/gaussian multiplets they're -148 and -65.
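Here is a minimal Python sketch of per-category quantile scaling, offered as an illustration rather than Nebula's implementation. It assumes each multiplet is a dict with a `category` key (for example a (detection type, bary) pair) plus one log-space value per factor; the helper names and the factor names used in the usage example are hypothetical. Within each category it maps every factor so that its 25% quantile becomes 0 and its 75% quantile becomes 1. The post doesn't say which target values Nebula uses; this affine map is just one way to make the quantiles match across factors. Because the map is increasing, it never changes the ordering of multiplets on a single factor.

```python
import math
from collections import defaultdict

def quantile(values, q):
    """Linear-interpolation quantile of a non-empty list, 0 <= q <= 1
    (the same interpolation as NumPy's default numpy.quantile)."""
    s = sorted(values)
    pos = q * (len(s) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)

def normalize_by_category(multiplets, factor_names):
    """Return copies of the multiplets with each factor rescaled, per category,
    so that its 25% and 75% quantiles map to 0 and 1. Input order is kept."""
    # Gather the values of each factor separately for each category.
    values = defaultdict(list)
    for m in multiplets:
        for name in factor_names:
            values[(m["category"], name)].append(m[name])

    # Quartiles (offset and interquartile spread) per (category, factor).
    params = {}
    for key, vals in values.items():
        q25, q75 = quantile(vals, 0.25), quantile(vals, 0.75)
        if q75 == q25:
            raise ValueError(f"zero interquartile range for {key!r}")
        params[key] = (q25, q75 - q25)

    # Apply the affine map; the input dicts are left unchanged.
    result = []
    for m in multiplets:
        scaled = dict(m)
        for name in factor_names:
            q25, spread = params[(m["category"], name)]
            scaled[name] = (m[name] - q25) / spread
        result.append(scaled)
    return result
```

For example, with toy ND values `(0, 10, 20, 30, 40)` in one category and `(-400, -300, -200, -100, 0)` in another, both categories come out as `[-0.5, 0.0, 0.5, 1.0, 1.5]`, even though their raw ranges differ by a factor of ten.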

### 2) Score variants

We designed the score factors to measure properties that we expect to distinguish ET from noise. But what if - perhaps for a particular category - one of the properties isn't doing this? Perhaps its value is effectively random. Then - especially now that we're normalizing the factors - this could cause an ET multiplet to get a mediocre score and be missed. Again, I hoped that weight optimization would discover and fix these situations, but it didn't work.

So (also at the suggestion of one of the experts) I'm trying a simpler approach: look at all seven non-empty combinations of the three score factors. I call these "score variants". The web interface now lets you look at the top multiplet lists for any category, and any score variant.
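Enumerating the variants is straightforward. This minimal Python sketch assumes a variant's score is the sum of the chosen normalized factors; the factor names are hypothetical (the post only names the ND and time factors), and the helpers are illustrative, not Nebula's code.

```python
from itertools import combinations

# Hypothetical factor names; the post only names the ND and time factors.
FACTOR_NAMES = ("nd", "time", "third")

def score_variants(factor_names=FACTOR_NAMES):
    """All non-empty subsets of the factors: 2**3 - 1 = 7 for three factors."""
    return [
        subset
        for size in range(1, len(factor_names) + 1)
        for subset in combinations(factor_names, size)
    ]

def variant_score(multiplet, variant):
    """Score of one multiplet under one variant: sum of the chosen factors."""
    return sum(multiplet[name] for name in variant)

for v in score_variants():
    print(v)  # ('nd',), ('time',), ..., ('nd', 'time', 'third')
```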

Score variants give us a tool for understanding the score factors. It's possible we'll find that for some categories it's better to omit one or two of the factors. For the spike/gaussian categories we can see which score variant finds the most birdies, and use that one. For the other categories we don't have birdies, but we can look at the top-scoring multiplets for each variant and see - by our intuition - how much they look like ET. Or we could make a combined list of the multiplets that score highly (say in the top 501) for any of the variants.
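The combined-list idea can be sketched as the union of the per-variant top-N lists. This is a minimal Python sketch, not Nebula's code: `top_n_union` and the dict-of-dicts layout are hypothetical, N defaults to the 501 mentioned above, and it assumes a higher summed score ranks higher (use `heapq.nsmallest` instead if the convention is the opposite).

```python
import heapq
from itertools import combinations

def top_n_union(multiplets, factor_names, n=501):
    """IDs of multiplets in the top n under at least one score variant.

    `multiplets` maps a multiplet ID to a dict of normalized factor values.
    """
    variants = [
        subset
        for size in range(1, len(factor_names) + 1)
        for subset in combinations(factor_names, size)
    ]
    selected = set()
    for variant in variants:
        # Summed score of every multiplet under this variant.
        scores = {mid: sum(f[name] for name in variant) for mid, f in multiplets.items()}
        selected.update(heapq.nlargest(n, scores, key=scores.get))
    return selected

toy = {
    "a": {"nd": 5.0, "time": 0.0},
    "b": {"nd": 0.0, "time": 5.0},
    "c": {"nd": 1.0, "time": 1.0},
}
print(sorted(top_n_union(toy, ("nd", "time"), n=1)))  # ['a', 'b']
```

Here "a" wins on the ND-only variant and "b" on the time-only variant, so both make the combined list even though neither would top every variant.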

Gone with the wind
Volunteer tester

Joined: 19 Nov 00
Posts: 41704
Credit: 42,645,437
RAC: 190
Message 2050028 - Posted: 22 May 2020, 7:46:30 UTC

Glad to hear that work is still continuing on all this. Any idea when Nebula might actually kick off at Hanover?
"none so blind as those who will not see" (John Heywood 1546)

Don't drink water, that stuff rusts pipes!

You are making Proof out of Logic, by just being dubious! (Bluestar to me)
ML1
Volunteer moderator
Volunteer tester

Joined: 25 Nov 01
Posts: 10494
Credit: 7,508,002
RAC: 91
Message 2050068 - Posted: 22 May 2020, 17:43:42 UTC
Last modified: 22 May 2020, 17:44:16 UTC

Sounds good.

Good to prove that normalization is good, geometric means are bad! (Except for very special cases or for just glossing over the real results...)

Silly simplistic question (something already done?):

Could a blind look at a matrix comparing all (normalised) parameters for noise vs pulses show what is most significant?

Aside: I still wonder if artifacts lurk in the 1-bit source data collection.

Good luck,

Keep searchin',
Martin
See new freedom: Mageia Linux
Take a look for yourself: Linux Format
The Future is what We all make IT (GPLv3)
Jon Golding

Joined: 20 Apr 00
Posts: 98
Credit: 841,861
RAC: 1
Message 2050170 - Posted: 23 May 2020, 21:03:51 UTC - in response to Message 2050024.
Last modified: 23 May 2020, 21:05:44 UTC

Are similar design review meetings planned to look at the other functions (RFI removal, etc.)?
Great to know that the external experts broadly agree with your solutions to date and are able to bring in further refinements.
Getting closer to ET with each step.


©2020 University of California

SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.