## Multiplet scoring: back to the drawing board

Message boards : Nebula : Multiplet scoring: back to the drawing board
David Anderson
Volunteer moderator
Project developer
Project scientist

Joined: 13 Feb 99
Posts: 139
Credit: 502,653
RAC: 0
Message 2046800 - Posted: 24 Apr 2020, 20:16:17 UTC

My last post discussed optimizing scoring - changing the weights of the 3 terms in the multiplet score function to make birdie multiplets (which are surrogates for ET) score higher relative to non-birdie multiplets.
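As a rough sketch, assuming the multiplet score is a plain linear combination of the three terms (the factor names below are illustrative; only signal_factor is named in this post):

```python
# Minimal sketch of a 3-term weighted multiplet score, assuming a
# plain linear combination. Factor and function names are
# illustrative, not Nebula's actual code.
def multiplet_score(factors, weights):
    """factors: the 3 per-multiplet score terms;
    weights: the 3 weights being optimized."""
    return sum(w * f for w, f in zip(weights, factors))
```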

My first attempt at this - to my surprise - produced negative weights. An example is shown here:

Example 1

Let me explain this page. First it shows the parameters of the optimization. "bary = True" means we're looking at barycentric multiplets. "min_weight = -1" means the weights can be negative.

The "Before" section shows the number of multiplets and their median scores (we use only the 10000 top-scoring non-birdie multiplets). "Found" says how many birdie multiplets had scores ranking among the top 1000 non-birdie multiplets.
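The "Found" statistic can be sketched as follows, assuming a birdie counts as found when its score reaches the 1000th-best non-birdie score (the function name is illustrative):

```python
# Sketch of the "Found" statistic: how many birdie multiplets score
# at or above the top_n-th best non-birdie score (top_n = 1000 in
# the post). An illustrative sketch, not Nebula's actual code.
def count_found(birdie_scores, nonbirdie_scores, top_n=1000):
    threshold = sorted(nonbirdie_scores, reverse=True)[top_n - 1]
    return sum(1 for s in birdie_scores if s >= threshold)
```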

In the graph, each point is a multiplet. The X coordinate is its original score, and the Y coordinate is its score after optimizing the weights. Project the points onto the X axis to see the relationship of birdie to non-birdie with original scores; project onto the Y axis for weighted scores.

Here we're looking at the barycentric case. With optimized score, ALL 1369 of the birdie multiplets are found. That's good, right? Well, actually not. The optimized weight of signal_factor is -1.03. Apparently non-birdies have more of this than birdies, so giving it a negative weight penalizes non-birdies. We're "finding" the birdie multiplets by making good non-birdies look bad.

So allowing negative weights gives us nonsense results. We can disallow negative weights by adding a penalty term to the objective function: namely, if a weight is negative we add a multiple of its square. This gives us (in the barycentric case):

Example 2

This is a big improvement: the number of found birdies went from 339 to 776. The general slope of the graph is positive, but I don't like the fact that the non-birdie scores have been sort of flattened out.
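The penalty term described above might look like this, generalized to the min_weight parameter the later examples use (the coefficient value is an arbitrary illustrative choice, not from the post):

```python
# Sketch of the negative-weight penalty: for each weight below
# min_weight, add a multiple of the squared shortfall to the
# objective. penalty_coeff is an arbitrary illustrative value.
def weight_penalty(weights, min_weight=0.0, penalty_coeff=100.0):
    return penalty_coeff * sum(
        (min_weight - w) ** 2 for w in weights if w < min_weight
    )
```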

What if we make the minimum weight something bigger, like .3?

Example 3

The #found is down to 463.

The objective function uses the median of the top 1000 non-birdie multiplets. From the point of view of "finding" birdies, what matters is the 1000th-best score, so what if we use that instead? Adding this change to Example 2, we get

Example 4

which, curiously, is not an improvement.
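For reference, the two candidate statistics can be sketched like this (simple illustrations, not Nebula's actual code):

```python
# Two candidate "non-birdie level" statistics: the median of the
# top 1000 non-birdie scores, vs. the 1000th-best score itself,
# which is what actually gates "finding" a birdie.
def median_of_top(scores, top_n=1000):
    top = sorted(scores, reverse=True)[:top_n]
    return top[len(top) // 2]  # lower median, for illustration

def nth_best(scores, top_n=1000):
    return sorted(scores, reverse=True)[top_n - 1]
```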

One more idea - since we're really trying to maximize the number of found birdies, why not subtract some multiple (say 10) of this from the objective function?

Example 5

That finds more birdies, but in spite of the penalty one of the weights is negative, and that shows in the negative slope of the non-birdies. Let's increase the min weight to .2:

Example 6

Hmm. Now we're back down again.
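To summarize, the objective as described so far might be sketched as (names and exact form are illustrative assumptions, not Nebula's actual code):

```python
# Sketch of the combined objective to maximize: separation between
# the birdie and non-birdie score levels, plus found_coeff (10 in
# the post) per found birdie, minus whatever negative-weight
# penalty is in force. All names here are illustrative.
def objective(birdie_median, nonbirdie_level, n_found, penalty,
              found_coeff=10.0):
    return birdie_median - nonbirdie_level + found_coeff * n_found - penalty
```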

Wait a minute, something's wrong here! The optimization seems to want to push the weights to either plus or minus infinity.

Well, d'oh!! For a given score factor, if the average value for birdies is greater (worse) than for non-birdies, then of course the optimization will want to give it infinitely negative weight. This is indeed the case for signal factor; see the histograms.

So this approach - at least as I've formulated it - is garbage. Maybe there's a better approach to improving the score function, but after thinking about it for a few minutes, nothing occurs to me. So I'm going to put this project on the back burner for now.

In case you're curious, here are the analogous examples for non-barycentric multiplets:

Example 1
Example 2
Example 3
Example 4
Example 5
Example 6

ID: 2046800 ·
Grumpy Swede
Volunteer tester

Joined: 1 Nov 08
Posts: 8158
Credit: 49,849,242
RAC: 294
Message 2046805 - Posted: 24 Apr 2020, 20:46:34 UTC - in response to Message 2046800.

So this approach - at least as I've formulated it - is garbage. Maybe there's a better approach to improving the score function, but after thinking about it for a few minutes, nothing occurs to me. So I'm going to put this project on the back burner for now.

Hmm, I don't understand much of what you're trying to achieve here, but are you saying that you're now going to put the entire Nebula project on the back burner?

No more data gathering for SETI, and no Nebula?

Where does that leave the entire 20+ years of the SETI project? What's the plan?
ID: 2046805 ·
Raistmer
Volunteer developer
Volunteer tester

Joined: 16 Jun 01
Posts: 6242
Credit: 106,370,077
RAC: 549
Message 2046829 - Posted: 24 Apr 2020, 22:11:39 UTC - in response to Message 2046800.

Thanks for sharing; hopefully a few days' pause and something new will come to mind.
SETI apps news
We're not gonna fight them. We're gonna transcend them.
ID: 2046829 ·
Tom M
Volunteer tester

Joined: 28 Nov 02
Posts: 4928
Credit: 276,046,078
RAC: 2,093
Message 2046929 - Posted: 25 Apr 2020, 15:23:42 UTC - in response to Message 2046800.

So this approach - at least as I've formulated it - is garbage. Maybe there's a better approach to improving the score function, but after thinking about it for a few minutes, nothing occurs to me. So I'm going to put this project on the back burner for now.

So what about the other research you were doing on this data, something about quantifying the sensitivity? Is that enough of a change of pace that you can move forward on it while you let the S@H data analysis marinate?

Tom
"I owe", "I owe", "Its off to work I go" (from a bumper sticker on a smallish Mercedes Benz)
(on the back of a Semi Tractor) "If you can read this bumper sticker, I've LOST MY TRAILER!"
ID: 2046929 ·
ML1
Volunteer moderator
Volunteer tester

Joined: 25 Nov 01
Posts: 10494
Credit: 7,508,002
RAC: 91
Message 2046990 - Posted: 25 Apr 2020, 21:04:47 UTC - in response to Message 2046805.

So this approach - at least as I've formulated it - is garbage. Maybe there's a better approach to improving the score function, but after thinking about it for a few minutes, nothing occurs to me. So I'm going to put this project on the back burner for now.

Hmm, I don't understand much of what you're trying to achieve here, but are you saying that you're now going to put the entire Nebula project on the back burner?...

No...

This one thread is just for the one part of Nebula that is "optimizing scoring - changing the weights of the 3 terms in the multiplet score function".

This is all part of real science and the development of Nebula.

Keep searchin',
Martin
See new freedom: Mageia Linux
Take a look for yourself: Linux Format
The Future is what We all make IT (GPLv3)
ID: 2046990 ·
ML1
Volunteer moderator
Volunteer tester

Joined: 25 Nov 01
Posts: 10494
Credit: 7,508,002
RAC: 91
Message 2046991 - Posted: 25 Apr 2020, 21:05:48 UTC

Anyone got any good ideas for scoring those examples?

Keep searchin',
Martin
See new freedom: Mageia Linux
Take a look for yourself: Linux Format
The Future is what We all make IT (GPLv3)
ID: 2046991 ·
David Anderson
Volunteer moderator
Project developer
Project scientist

Joined: 13 Feb 99
Posts: 139
Credit: 502,653
RAC: 0
Message 2047044 - Posted: 26 Apr 2020, 3:51:06 UTC - in response to Message 2046805.

The idea of optimizing the weights of the factors in the multiplet scoring function is on the back burner.
Nothing else is on the back burner for now.
-- D
ID: 2047044 ·
Falken
Volunteer tester

Joined: 18 May 99
Posts: 20
Credit: 1,457,137
RAC: 20
Message 2048746 - Posted: 10 May 2020, 10:39:23 UTC - in response to Message 2046800.

All the example links are 404
ID: 2048746 ·
ML1
Volunteer moderator
Volunteer tester

Joined: 25 Nov 01
Posts: 10494
Credit: 7,508,002
RAC: 91
Message 2048759 - Posted: 10 May 2020, 14:36:11 UTC - in response to Message 2048746.

All the example links are 404

New data for new examples? Or has the system been developed further?

Hopefully a good sign of further development...

Keep searchin'!
Martin
See new freedom: Mageia Linux
Take a look for yourself: Linux Format
The Future is what We all make IT (GPLv3)
ID: 2048759 ·
