Estimating sensitivity

Message boards : Nebula : Estimating sensitivity

David Anderson
Volunteer moderator
Project administrator
Project developer
Project scientist

Joined: 13 Feb 99
Posts: 157
Credit: 502,653
RAC: 0
Message 2036357 - Posted: 6 Mar 2020, 22:18:51 UTC
Last modified: 7 Mar 2020, 4:33:48 UTC

The S@h hibernation - long overdue IMHO - will free up some of Eric's time and hopefully let us finish this project before the sun goes red giant.

Recent changes - described in my previous blog entries - have focused on finding and scoring multiplets. I think we're done with big changes in those areas. Now we move on to the last stage: deciding what scientific conclusions we want to make, and figuring out how to make them.

Finding ET would be a scientific conclusion. Failing that, we want to make (and quantitatively support) a statement about the "sensitivity" of our search - a statement of the form "if there were a radio beacon in frequency range X, in part Y of the sky, with power at least Z, our search would have detected it with probability P". It's hard to prove such a statement, but the "birdie" mechanism we've created in Nebula gives us a tool for doing so.

This is complicated by the fact that S@h is looking for many kinds of signals, and we're more sensitive to some than others. We’re looking for signals in an abstract "space" with several dimensions:

  • Frequency variation (up to 250 Hz for barycentric signals, up to 200 kHz for non-bary).
  • Pulsed or not, and pulse parameters (period and duty cycle).
  • Observation time for that sky position (i.e. pixel).

Those are the main ones. Others:

  • Range of chirp rate (due to parameters of the sender’s planet).
  • Amount of observing time with slew rate in the Gaussian range.
  • Intrinsic bandwidth of the signal.

It’s likely that our sensitivity varies widely in different areas of this space. Our goal is to estimate the sensitivity in these areas. Doing this serves several purposes:

  • Assuming we don’t find ET, sensitivity estimates are our main scientific result.
  • It can guide our algorithm development: e.g. if our sensitivity to signals with big frequency variation is poor, we could try to improve the non-bary multiplet-finding algorithm.
  • It can guide future radio SETI sky surveys by suggesting optimal scanning parameters.

The basic method for estimating sensitivity in an area A of the search space is:

  • Generate a bunch of birdies in A, with a range of powers.
  • See which of these birdies get “detected”, i.e. produce a multiplet whose score ranks in the top 1000 of non-birdie multiplets. (We assume that we'll manually examine the top 1000 multiplets of each type, and that we'll be able to decide if one of them is ET).
  • Find the power P for which most of the birdies of power P or greater are detected.
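The "detected" test in the second step can be sketched as follows. This is only an illustration, not Nebula code: the function names are made up, and it assumes that a higher multiplet score is better and that the non-birdie scores come from a birdie-free run, sorted in ascending order.

```python
from bisect import bisect_right


def rank_among_non_birdies(birdie_score, non_birdie_scores_ascending):
    """Rank a birdie multiplet would have among the non-birdie multiplets.

    Rank 1 means it outscores every non-birdie multiplet.
    Assumption: higher score is better; the list is sorted ascending.
    """
    n = len(non_birdie_scores_ascending)
    # Number of non-birdie multiplets with a strictly higher score.
    higher = n - bisect_right(non_birdie_scores_ascending, birdie_score)
    return higher + 1


def is_detected(birdie_score, non_birdie_scores_ascending, top_n=1000):
    """True if the birdie's best multiplet would land in the top `top_n`."""
    return rank_among_non_birdies(birdie_score, non_birdie_scores_ascending) <= top_n
```

Ranking against a fixed, sorted list of non-birdie scores keeps each birdie's result independent of the other birdies in the run.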

I've figured out some ways to implement this. First, some general notes:

  • We can’t study sensitivity to pulsed signals since we haven't implemented pulsed birdies. Future work.
  • We can’t study factors specific to Gaussians (e.g. amount of time observed at Gaussian slew rates) since we currently don’t make Gaussians for birdies. Future work.
  • Our handling of bary and non-bary signals is somewhat different, and the multiplet scores aren’t necessarily comparable. So we’ll handle the two signal classes separately.
  • To get an accurate list of non-birdie multiplets, we need to do a complete Nebula run (RFI removal and scoring) without birdies. The presence of birdies could mask high-scoring non-birdie multiplets. Of course, birdies could mask each other; if this looks like an issue we can generate birdies so that they don’t overlap.
  • We may want to estimate sensitivity where one parameter is limited and others are not. What should be the distribution of the “free” parameters? For observation time, it’s the actual distribution in our observations. For planetary parameters, we may as well use the distribution that Eric defined for generating birdies, based on the statistics of stars and observed exoplanets. Same for intrinsic bandwidth.

Now let’s return to the question of exactly how to estimate sensitivity for an area A (say, barycentric signals in pixels observed less than 10 minutes). Suppose we’ve generated a set B of birdies in A. For each birdie b we have the pair

(power(b), rank(b))

where rank(b) is the rank of the highest-scoring multiplet containing signals in b, or +inf if there is none. Think of the scatter plot of these points. Ideally, rank generally decreases as power increases, and beyond some power most of the ranks are under 1000.

For a given power p, define F(p) as

F(p) = (# birdies with power >= p and rank < 1000) / (# birdies with power >= p)

F is the fraction of birdies of power at least p which we detected. It’s piecewise constant. Ideally we’d like it to be monotonically increasing and asymptote to 1, but in practice neither of these is necessarily true.
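A minimal sketch of F, taking the birdies as a list of (power, rank) pairs. It counts birdies of power at least p, as in the description above; the function name and the `rank_cutoff` parameter are illustrative.

```python
import math


def detected_fraction(birdies, p, rank_cutoff=1000):
    """F(p): fraction of birdies with power >= p whose rank is below rank_cutoff.

    birdies: list of (power, rank) pairs; rank is math.inf when no multiplet
    contains the birdie's signals. Returns None if no birdie has power >= p,
    since the fraction is undefined there.
    """
    ranks = [rank for power, rank in birdies if power >= p]
    if not ranks:
        return None
    # Follows the formula's "rank < 1000"; use <= for a strict top-1000 test.
    return sum(1 for rank in ranks if rank < rank_cutoff) / len(ranks)


if __name__ == "__main__":
    sample = [(1.0, math.inf), (2.0, 5000), (3.0, 400), (4.0, 12)]
    for p in (0.0, 2.0, 3.0):
        print(p, detected_fraction(sample, p))
```

F only changes value at the birdie powers, so evaluating it at each distinct power is enough to plot it.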

Now pick a number 0 < C < 1. C is our target probability of finding ET. Let’s say that C = 0.5.

Define sensitivity(A) as the least p0 such that F(p) > C for all p > p0. In other words, if there’s a signal in A with power at least p0, the probability that we’ll find it is at least C.

There may be no such p0, in which case our search is not sensitive within A. This means that no matter how powerful a signal is, our chance of finding it doesn’t go above the threshold C.
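The definition can be sketched in code as below. This is an illustration under stated assumptions: F counts birdies of power at least p, F is left unconstrained above the strongest birdie (where it is undefined), and if F exceeds C everywhere the lowest birdie power is returned as a conservative answer. The function name is hypothetical.

```python
def sensitivity(birdies, c=0.5, rank_cutoff=1000):
    """Least p0 such that F(p) > c for every sampled power p > p0, else None.

    birdies: list of (power, rank) pairs, rank = float("inf") if undetected.
    """
    powers = sorted({power for power, _ in birdies})
    if not powers:
        return None

    def f(p):
        ranks = [rank for power, rank in birdies if power >= p]
        return sum(1 for rank in ranks if rank < rank_cutoff) / len(ranks)

    # If even the strongest birdies fall short, the search is not sensitive in A.
    if f(powers[-1]) <= c:
        return None
    # F is constant between consecutive birdie powers, so the answer is the
    # largest birdie power at which F fails the threshold.
    for p in reversed(powers):
        if f(p) <= c:
            return p
    # F > c at every sampled power: report the lowest power actually tested.
    return powers[0]
```

This is quadratic in the number of birdies, which should be fine for a few hundred birdies per area; a single sorted pass would do the same job for larger sets.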

How do we generate birdies so that F(p) is statistically significant? How many birdies do we need, and what powers? I don’t currently have any concrete ideas. We could use input from a statistician. General suggestions:

  • List the areas A for which we want to estimate sensitivity.
  • Do a scoring run with some population of birdies.
  • For each A, eyeball the scatter plot.
  • If there aren’t enough birdies in A (say 50 or 100) add more.
  • If there aren’t enough birdies with rank < 1000, add more high-power birdies in A.
  • Get a rough idea of where sensitivity(A) is. Make sure there are a number of birdies with powers well below and well above this point.

But these are just heuristics. We should find a way to quantify the error in our estimates.
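One possible way to put an error bar on F(p) at a fixed p is a Wilson score interval for a binomial proportion. This is a suggestion rather than something we've settled on: it assumes each birdie is an independent detect-or-not trial, and it gives the uncertainty in F(p), not directly in sensitivity(A).

```python
import math


def wilson_interval(detected, total, z=1.96):
    """Wilson score interval for a detection fraction.

    detected: number of birdies with power >= p that were detected.
    total: number of birdies with power >= p.
    z: normal quantile (1.96 gives roughly a 95% interval).
    """
    if total <= 0:
        raise ValueError("total must be positive")
    phat = detected / total
    denom = 1 + z * z / total
    center = (phat + z * z / (2 * total)) / denom
    half = z * math.sqrt(phat * (1 - phat) / total
                         + z * z / (4 * total * total)) / denom
    return max(0.0, center - half), min(1.0, center + half)
```

With 5 of 10 birdies detected the interval is roughly 0.24 to 0.76, which suggests why 50 or 100 birdies per area is a more comfortable number than 10.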

What areas of signal space should we study? Here’s a proposal:

Barycentric signals

  1. All pixels
  2. Pixels with < 1 minute observation
  3. Pixels with 1 - 10 minutes
  4. Pixels with > 10 minutes
  5. Signals with intrinsic BW < 1 Hz (optional)
  6. Signals with intrinsic BW > 10 Hz (optional)

(In 2-4, replace 1 and 10 minutes with the terciles of actual observation times.)

Non-bary signals

  1. All pixels, all freq variations
  2. Freq variation < 20 kHz
  3. Freq variation 20 kHz - 100 kHz
  4. Freq variation > 100 kHz
  5. 2-4, in pixels with < 1 min observation
  6. 2-4, in pixels with 1 - 10 min
  7. 2-4, in pixels with > 10 min

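That’s a total of 19 areas (6 barycentric, 4 non-bary, plus 9 combinations of frequency-variation range and observation-time range), and it covers the important dimensions. If we can show our sensitivity in each of these areas, that will make a nice paper; I'll be happy with that.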
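The count of 19 can be checked by enumerating the areas. The sketch below assumes the last three non-bary entries each combine one observation-time range with each of the three frequency-variation ranges; the labels are just illustrative strings.

```python
from itertools import product

OBS_TIME_BINS = ["< 1 min", "1 - 10 min", "> 10 min"]
FREQ_VAR_BINS = ["< 20 kHz", "20 - 100 kHz", "> 100 kHz"]

# Barycentric: all pixels, three observation-time bins, two optional BW cuts.
barycentric = (
    ["all pixels"]
    + [f"obs time {t}" for t in OBS_TIME_BINS]
    + ["intrinsic BW < 1 Hz", "intrinsic BW > 10 Hz"]
)

# Non-bary: everything, three frequency-variation bins, and their
# combinations with the three observation-time bins.
non_bary = (
    ["all pixels, all freq variations"]
    + [f"freq variation {f}" for f in FREQ_VAR_BINS]
    + [f"freq variation {f}, obs time {t}"
       for f, t in product(FREQ_VAR_BINS, OBS_TIME_BINS)]
)

areas = [("bary", a) for a in barycentric] + [("non-bary", a) for a in non_bary]

if __name__ == "__main__":
    print(len(barycentric), len(non_bary), len(areas))
```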
ID: 2036357
lunkerlander
Joined: 23 Jul 18
Posts: 82
Credit: 1,353,232
RAC: 4
United States
Message 2036671 - Posted: 8 Mar 2020, 3:23:01 UTC - in response to Message 2036357.  
Last modified: 8 Mar 2020, 3:23:20 UTC

Thanks for this update!

Hopefully after the conclusion of this study there will be some sort of improved version of seti@home that uses updated apps and searching techniques using what you've learned from these results.
ID: 2036671
mohavewolfpup
Joined: 20 Oct 18
Posts: 32
Credit: 3,666,574
RAC: 24
United States
Message 2036768 - Posted: 8 Mar 2020, 16:29:49 UTC

So for a layman (honestly, a lot of it goes right over my head; math isn't my strong suit and it kinda reads like that): how long would this data take to process accurately until you are satisfied?

Don't worry, I'm not looking for it to be finished in 5 minutes and then seti@home is fired up again. Just more curious, with the 20+ years of processed data you've got laying around, what it would take to process all that for conclusions?

Do you think there would be stages where the currently processed data may need more reworking from clients? Or it's basically "finished" and nothing else can be gleaned from it via more volunteer computing?
Historian for the Defunct Riviera Hotel and Casino, Former Classic Seti@home user for Team Art Bell. Greetings from the High Desert!
ID: 2036768
Jon Golding
Joined: 20 Apr 00
Posts: 103
Credit: 841,861
RAC: 0
United Kingdom
Message 2036930 - Posted: 9 Mar 2020, 13:20:33 UTC - in response to Message 2036357.  

I'm guessing that the receiving hardware had several updates over the 20 years of this project, to make incremental improvements to the signal detection sensitivity.
Perhaps (unfortunately) that's yet another parameter that needs to be taken into account? More recent observations of a pixel will have a greater sensitivity than older observations, so WU results need to be binned according to the available sensitivity at the epoch they were recorded.
Moreover, several pixels were recorded by different antennas (Arecibo, Green Bank, etc.), each of which has intrinsically different sensitivities, probably at different wavelengths too.
The whole analysis becomes hideously complex.
No wonder you need to switch off work units to allow full-time work on the data analysis.
ID: 2036930
Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6316
Credit: 106,370,077
RAC: 121
Message 2037007 - Posted: 9 Mar 2020, 21:39:30 UTC - in response to Message 2036357.  

* We can’t study sensitivity to pulsed signals since we haven't implemented pulsed birdies. Future work.
* We can’t study factors specific to Gaussians (e.g. amount of time observed at Gaussian slew rates) since we currently don’t make Gaussians for birdies. Future work.

Oops %)
And how is PulseFind algorithm validation done then?
SETI apps news
We're not gonna fight them. We're gonna transcend them.
ID: 2037007

Joined: 29 Feb 16
Posts: 19
Credit: 1,353,463
RAC: 3
Message 2037442 - Posted: 12 Mar 2020, 2:08:22 UTC

Well, what if there ARE other WOW signal(s) waiting to be discovered, but we just assumed all of them are noise, so we consistently get bad sensitivity because those ET signals rank higher than birdies but we think they are noise or non-repeatable?
ID: 2037442
Rich Project Donor
Joined: 4 Sep 99
Posts: 15
Credit: 10,381,162
RAC: 53
United States
Message 2041752 - Posted: 30 Mar 2020, 21:40:23 UTC - in response to Message 2036357.  

Please go ahead and get the statistical help, do the analyses of all the work that your SETI supporters have produced, and publish. I'm sure that we all suspect the Search For Extra Terrestrial Intelligence will go on in various forms until mankind comes to a conclusion, or conclusions, regarding the issue. Clearly, the current SETI supporters, especially those commenting in these posts, represent a support potential that should be harnessed to move the investigation along in the near and medium future. The organization and focus that the SETI project brings to the issue should be sharpened and magnified to the extent possible, using your published and professional analyses as a guide. SETI stimulates the use and advancement of many disciplines... it is truly a seminal project.
Best Regards,
ID: 2041752

©2021 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.