Multiplet scoring: back to the drawing board

Author	Message
David Anderson Volunteer moderator Project administrator Project developer Project scientist Joined: 13 Feb 99 Posts: 173 Credit: 502,653 RAC: 0	Message 2046800 - Posted: 24 Apr 2020, 20:16:17 UTC Last modified: 26 Apr 2020, 3:52:20 UTC
My last post discussed optimizing scoring - changing the weights of the 3 terms in the multiplet score function to make birdie multiplets (which are surrogates for ET) score higher relative to non-birdie multiplets. My first attempt at this - to my surprise - produced negative weights. An example is shown here: Example 1
Let me explain this page. First it shows the parameters of the optimization. "bary = True" means we're looking at barycentric multiplets. "min_weight = -1" means the weights can be negative. The "Before" section shows the number of multiplets and their median scores (we use only the 10000 top-scoring non-birdie multiplets). "Found" says how many birdie multiplets had scores ranking among the top 1000 non-birdie multiplets.
In the graph, each point is a multiplet. The X coordinate is its original score, and the Y coordinate is its score after optimizing the weights. Project the points onto the X axis to see the relationship of birdie to non-birdie with original scores; project onto the Y axis for weighted scores.
Here we're looking at the barycentric case. With optimized score, ALL 1369 of the birdie multiplets are found. That's good, right? Well, actually not. The optimized weight of signal_factor is -1.03. Apparently non-birdies have more of this than birdies, so giving it a negative weight penalizes non-birdies. We're "finding" the birdie multiplets by making good non-birdies look bad.
So allowing negative weights gives us nonsense results. We can disallow negative weights by adding a penalty term to the objective function: namely, if a weight is negative we add a multiple of its square.
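A minimal sketch of such a penalty term, assuming the objective is being minimized and the weights are passed as a plain list. The function name, the `multiplier` value, and the `min_weight` parameter are illustrative assumptions, not taken from the Nebula code.

```python
def weight_penalty(weights, min_weight=0.0, multiplier=100.0):
    """Quadratic penalty for weights that fall below min_weight.

    Returns 0 when every weight is at or above min_weight; otherwise
    returns multiplier times the sum of squared shortfalls.
    With min_weight = 0 this is "a multiple of the square" of each
    negative weight. All names and defaults here are hypothetical.
    """
    return multiplier * sum(
        (w - min_weight) ** 2 for w in weights if w < min_weight
    )


# The penalty would be added to whatever objective is being minimized:
#   total = objective(weights) + weight_penalty(weights)
print(weight_penalty([1.0, 0.5, 2.0]))                    # no weight is negative
print(weight_penalty([-1.0, 0.5, 2.0], multiplier=10.0))  # one negative weight
```

Because the penalty is finite, it only discourages negative weights; an objective that improves fast enough as a weight decreases can still outweigh it.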
With the penalty in place, the barycentric case gives: Example 2
This is a big improvement: the number of found birdies went from 339 to 776. The general slope of the graph is positive, but I don't like the fact that the non-birdie scores have been sort of flattened out. What if we make the minimum weight something bigger, like .3? Example 3
The #found is down to 463.
The objective function uses the median of the top 1000 non-birdie multiplets. From the point of view of "finding" birdies, what matters is the 1000th-best score, so what if we use that instead? Adding this to Example 2 we get Example 4 which, curiously, is not an improvement.
One more idea - since we're really trying to maximize the number of found birdies, why not subtract some multiple (say 10) of this from the objective function? Example 5
That finds more birdies, but in spite of the penalty one of the weights is negative, and that shows in the negative slope of the non-birdies. Let's increase the min weight to .2: Example 6
Hmm. Now we're back down again.
Wait a minute, something's wrong here! The optimization seems to want to push the weights to either plus or minus infinity. Well, d'oh!! For a given score factor, if the average value for birdies is greater (worse) than for non-birdies, then of course the optimization will want to give it infinitely negative weight. This is indeed the case for signal factor; see the histograms.
So this approach - at least as I've formulated it - is garbage. Maybe there's a better approach to improving the score function, but after thinking about it for a few minutes, nothing occurs to me. So I'm going to put this project on the back burner for now.
In case you're curious, here are the analogous examples for non-barycentric multiplets: Example 1 Example 2 Example 3 Example 4 Example 5 Example 6
ID: 2046800
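For reference, the "Found" count used in the post above (the number of birdie multiplets whose scores rank among the top 1000 non-birdie multiplets) could be computed roughly as follows. This is only a sketch: it assumes a higher score is better, that ties count as found, and that scores arrive as plain lists; the function and parameter names are hypothetical.

```python
def count_found(birdie_scores, non_birdie_scores, top_n=1000):
    """Count birdies scoring at least as high as the top_n-th best non-birdie.

    Assumes higher scores are better. If there are fewer than top_n
    non-birdies, the lowest non-birdie score is used as the threshold.
    """
    ranked = sorted(non_birdie_scores, reverse=True)
    if not ranked:
        # With no non-birdies to compete against, every birdie is "found".
        return len(birdie_scores)
    threshold = ranked[min(top_n, len(ranked)) - 1]
    return sum(1 for s in birdie_scores if s >= threshold)


# Toy data with top_n=3: the 3rd-best non-birdie score (8) is the threshold.
print(count_found([9.5, 7.5, 3.0], [10, 9, 8, 7, 6], top_n=3))
```

Note that this count depends only on the threshold score, which is why the post considers using the 1000th-best non-birdie score rather than the median in the objective.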

Raistmer Volunteer developer Volunteer tester Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121	Message 2046829 - Posted: 24 Apr 2020, 22:11:39 UTC - in response to Message 2046800. Thanks for sharing. Hopefully after a few days' pause something new will come to mind. SETI apps news We're not gonna fight them. We're gonna transcend them. ID: 2046829

Tom M Volunteer tester Joined: 28 Nov 02 Posts: 5124 Credit: 276,046,078 RAC: 462	Message 2046929 - Posted: 25 Apr 2020, 15:23:42 UTC - in response to Message 2046800.
"So this approach - at least as I've formulated it - is garbage. Maybe there's a better approach to improving the score function, but after thinking about it for a few minutes, nothing occurs to me. So I'm going to put this project on the back burner for now."
So the other research you were doing on this data, something about qualifying the sensitivity? Is that enough of a change of pace that you can move forward on it while you marinate the S@H data analysis? Tom
A proud member of the OFA (Old Farts Association). ID: 2046929

ML1 Volunteer moderator Volunteer tester Joined: 25 Nov 01 Posts: 20265 Credit: 7,508,002 RAC: 20	Message 2046990 - Posted: 25 Apr 2020, 21:04:47 UTC - in response to Message 2046805.
"So this approach - at least as I've formulated it - is garbage. Maybe there's a better approach to improving the score function, but after thinking about it for a few minutes, nothing occurs to me. So I'm going to put this project on the back burner for now."
"Hmm, I don't understand much of what you're trying to achieve here, but do you say that you're now going to put the entire Nebula project on the back burner?..."
No... This one thread is just for the one part of Nebula that is "optimizing scoring - changing the weights of the 3 terms in the multiplet score function". This is all part of real science and the development of Nebula. Keep searchin', Martin
See new freedom: Mageia Linux Take a look for yourself: Linux Format The Future is what We all make IT (GPLv3) ID: 2046990

ML1 Volunteer moderator Volunteer tester Joined: 25 Nov 01 Posts: 20265 Credit: 7,508,002 RAC: 20	Message 2046991 - Posted: 25 Apr 2020, 21:05:48 UTC Anyone got any good ideas for scoring those examples? Keep searchin', Martin See new freedom: Mageia Linux Take a look for yourself: Linux Format The Future is what We all make IT (GPLv3) ID: 2046991

David Anderson Volunteer moderator Project administrator Project developer Project scientist Joined: 13 Feb 99 Posts: 173 Credit: 502,653 RAC: 0	Message 2047044 - Posted: 26 Apr 2020, 3:51:06 UTC - in response to Message 2046805. The idea of optimizing the weights of the factors in the multiplet scoring function is on the back burner. Nothing else is on the back burner for now. -- D ID: 2047044

Falken Volunteer tester Joined: 18 May 99 Posts: 21 Credit: 1,457,137 RAC: 4	Message 2048746 - Posted: 10 May 2020, 10:39:23 UTC - in response to Message 2046800. All the example links are 404 ID: 2048746

ML1 Volunteer moderator Volunteer tester Joined: 25 Nov 01 Posts: 20265 Credit: 7,508,002 RAC: 20	Message 2048759 - Posted: 10 May 2020, 14:36:11 UTC - in response to Message 2048746. Last modified: 10 May 2020, 14:36:35 UTC
"All the example links are 404"
New data for new examples? Or has the system been developed further? Hopefully a good sign of further development... Keep searchin'! Martin
See new freedom: Mageia Linux Take a look for yourself: Linux Format The Future is what We all make IT (GPLv3) ID: 2048759

©2024 University of California

SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.