Posts by David Anderson

1) Message boards : News : Nebula progress report (Message 2115485)
Posted 3 Mar 2023 by David Anderson
Post:
Check out our latest newsletter: Final update.
2) Message boards : Nebula : Final update (Message 2115484)
Posted 3 Mar 2023 by David Anderson
Post:
<div style="width:640px">Sorry for the long gap between reports. Since last April, my colleagues (Dan Werthimer, Eric Korpela, Jeff Cobb, Wei Liu) have done two re-observation sessions at FAST, looking at a few dozen of the top-scoring multiplets found by Nebula. So far it's been all barycentric spike/gaussian multiplets. With our remaining observing time we'll look at some pulse/triplet and autocorr multiplets, and some non-barycentric, as well.

We haven't examined the re-observation data yet, and it's unclear how we're going to do this. For barycentric multiplets I think the plan is to manually look at waterfall plots, possibly of the raw FFT data, possibly after processing by the SETI@home client (running on our own computers). For non-barycentric this isn't feasible - the frequency range is big and there's lots of RFI. I don't know; I'm out of the loop at this point.

Whether or not we find ET, there's still the matter of writing papers describing what we did, and (in the case of Nebula) giving sensitivity bounds. We had planned to write two papers, one about the front end and one about the back end (Nebula). These papers are about 80% done, but there has been no progress on them in over a year. I can't go into details. I've done everything I can to complete these papers, but I can't do it by myself.

My involvement in SETI@home has ended. I'd like to thank everyone who has participated, especially those who have read and commented on these Nebula reports. Since then, I've worked for a while on <a href=https://github.com/panoseti/panoseti/wiki>PanoSETI</a>, an optical SETI project. I've also worked on two music-related projects: <a href=https://music-match.org/>Music Match</a> and <a href=https://github.com/davidpanderson/numula/wiki>Numula</a>. I'm also working as a contractor for <a href=http://imslp.org/wiki/Main_Page>IMSLP</a>, the online classical music score library. And I continue to run the <a href=https://boinc.berkeley.edu/>BOINC project</a>.

</div>
3) Message boards : Nebula : Update (Message 2097897)
Posted 16 Apr 2022 by David Anderson
Post:
<div style="width:640px">Nebula has done its job - we have millions of "multiplets" - candidate signals - in 6 categories, ordered by 7 different score functions. Dan and Eric will, at some point, manually examine lots of these and pick 100 or so for re-observation at FAST, where we've been granted 20 hours of observing time.

To do this re-observation, we need to build a new data recorder - hardware and software. Much of this work is being done by Wei Liu, an astronomy post-doc who's visiting here for 2 years. We recently had an in-person meeting with one of the leaders of FAST to discuss the technical details of the re-observation.

I'm working on the paper about Nebula and its results. This is going slowly. I've recently started going over the paper with Bruce Allen (leader of Einstein@home), whose input and questions are very valuable.
</div>
4) Message boards : News : Nebula progress report (Message 2088645)
Posted 21 Nov 2021 by David Anderson
Post:
Check out our latest newsletter: New zones, a milestone, and next steps.
5) Message boards : Nebula : New zones, a milestone, and next steps (Message 2088641)
Posted 21 Nov 2021 by David Anderson
Post:
<div style="width:640px">My <a href=https://setiathome.berkeley.edu/forum_thread.php?id=85783>last missive</a> described how we changed zone RFI removal to work for pulses. In this case the "zones" are ranges of pulse period rather than frequency: a lot of the RFI is aviation radar that results in pulses with periods that are multiples of 12 seconds. To figure things out, Eric looked at histograms of pulse counts as a function of period. Since then, he has repeated this exercise for triplets, and then autocorrs. The details were different in each case, but the basic idea was the same.

Once this was done - a couple of weeks ago - I did another Nebula run on the full sky - i.e. finding and scoring multiplets in all pixels. Eric, Dan and I then examined the top-scoring pulse/triplet and autocorr multiplets, looking for RFI that should have been removed by the zone filter. Good news! Although some multiplets still had this sort of RFI, many of them didn't - enough that we'll be able to find plenty of multiplets that are worth re-observing.

So until further notice we're done with computing. No more algorithm-fiddling or scoring runs. Nebula has done its job. This is a huge milestone for SETI@home, and for me personally. I've been working on Nebula for 5 years, and it's been some of the most challenging work - algorithm design, programming, and debugging - I've done.

The immediate next step is for us to go through the top-scoring multiplets - of all the various detection types, bary/non-bary, and scoring variants - and pick 100 or so to re-observe at FAST. That's about how many we'll have time for in the 24 hours of observing time that we've been granted - we'll observe each one for a minute or two, and it can take several minutes to slew from one sky location to the next.

After that, we'll need to adapt our computers at FAST (which currently run SERENDIP spectrometers) to produce the time-domain data needed for SETI@home. Then we have to figure out how to analyze this data; we'll probably do the first part using the existing SETI@home client running on cluster nodes either in China or at the Atlas cluster in Hannover. After that we'll need to take these detections and decide whether they "confirm" the corresponding multiplet. We haven't figured out the details. In the case of barycentric multiplets - where we know what frequency to look at - this might involve manually looking at waterfall plots of the new detections. For non-barycentric multiplets - where the frequency could be anywhere in a wide range - we could add the new detections to the SETI@home detections, re-run Nebula (at least the multiplet-finding part) and see if it finds the original multiplets with additional detections that increase the score.
</div>
6) Message boards : News : Nebula progress report (Message 2085144)
Posted 28 Sep 2021 by David Anderson
Post:
Check out our latest newsletter: Finding a Pulse.
7) Message boards : Nebula : Finding a pulse (Message 2085143)
Posted 28 Sep 2021 by David Anderson
Post:
<div style="width:640px">Here, "pulse" refers to the detection type, though I should add that SETI@home itself still has a pulse in spite of the low rate of progress over the last couple of months. I've become involved in another SETI project, <a href=https://oirlab.ucsd.edu/PANOSETI.html>PanoSETI</a>. And there are other factors.

Anyway, the good news is that I think we're done with narrow-band signals (spikes and Gaussians). We're happy with RFI removal, and with multiplet finding and scoring.

So we've turned our attention to pulsed signals (pulses and triplets). Until recently we've ignored these to some extent, because the birdie mechanism doesn't extend to pulsed signals so we don't have it to guide us.

Recently, when Eric examined the top-scoring pulse/triplet multiplets, he found that they consisted of detections that were zone RFI: in particular, their periods were multiples of 12s, which is the period of some kind of radar that pollutes our data.

Now, we have a "zone" RFI filter that is supposed to find stuff like this. We wrote this filter for narrow-band signals, where the zones are in frequency space: TV station sidebands and the like. We used the same algorithm for pulses and triplets, but using zones in period (i.e. the period of the pulse, or the spacing of the triplet) rather than frequency. (Actually, the zones are in log(period)).

But it turned out - once Eric looked at the aforementioned top-scoring multiplets - that this didn't work. The reason is that the periods of pulses and triplets are (unlike spike/gaussian frequency) not smoothly distributed. They're concentrated at particular values that arise from our FFT lengths and the algorithms we use to find pulses and triplets. Our zone-finding algorithm wasn't finding RFI - it was just finding groups of detections at these periods.

So Eric put on his thinking cap, looked at a lot of histograms in log(period) space, and came up with new zone-finding algorithms for pulses and triplets. I don't know exactly what these algorithms are - Eric did the work in IDL and just sent me a list of zones.

Anyway, we're still working out some kinks, but this looks like the right way to go. We'll know when we re-run the pipeline and look at the new pulse/triplet multiplets.

In other news: our application for time at <a href=https://en.wikipedia.org/wiki/Five-hundred-meter_Aperture_Spherical_Telescope>FAST</a> to re-observe candidates was approved - I may have mentioned this in an earlier post - and we're scrambling to make a data recorder to use at FAST. We can't - for various reasons - use the same approach we used at Arecibo. After we record this data, BTW, we'll analyze it with the SETI@home client program. But we'll do this on cluster nodes, not home PCs. There won't be very much data, and it will be easier to handle that way. But who knows - maybe this collaboration will lead to an eventual un-hibernation.

</div>
8) Message boards : News : Nebula progress report (Message 2081557)
Posted 4 Aug 2021 by David Anderson
Post:
Another in the All in the Timing series.
9) Message boards : Nebula : All in the Timing V (Message 2081556)
Posted 4 Aug 2021 by David Anderson
Post:
<div style="width:640px">We (mostly Eric) examined lots of top-ranking multiplets from the recent all-pixels scoring run. Many of the multiplets were clearly RFI, and we'll be tweaking the filters a bit to reduce the number of such cases.

In addition, we found a multiplet whose time factor seemed way too high. We figured out why.
Here's what happened: when looking for multiplets in a pixel P, we assemble the detections from a "disk" that includes all of P and parts of the 8 adjacent pixels. For each pixel, we know the time intervals during which we observed it (more precisely: the intervals during which a beam was within a half beam-width of the pixel center). The time factor compares the time we observed P (the central pixel) with the time covered by detections. For example, if we observed for 26 seconds and the multiplet has two 13-second spikes, that would get a high score. If we observed for 1000 seconds, it would get a lower score.

The problem is that the observation times of adjacent pixels can be wildly different. In the problem case, the central pixel was observed for 0.3 seconds and one of the adjacent pixels was observed for 1000s of seconds. The multiplet included spikes from the adjacent pixel (in fact, it only had spikes from that pixel). Their duration was way more than 0.3 seconds, so the multiplet (incorrectly) got a high time factor.

How to address this? One approach is to include, in the calculation of time factor, the observation intervals of the adjacent pixels as well as those of P (or perhaps, for a given multiplet, just the pixels for which the multiplet has detections). But this is flawed: the detection disk includes only a part of each adjacent pixel; we shouldn't include times when the beam didn't overlap this part.

Another approach is to include, in the calculation of time factor, only the multiplet's detections that are in the central pixel. I think this is the way to go; it's certainly the simplest option. It would (correctly) give a low time factor score in the above example.
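
In code, the fix is basically a one-line filter. Here's a rough sketch - the struct, the function name, and the exact formula are stand-ins, not the real scoring code; the point is the central-pixel check:
<pre>
// Sketch: compute a time factor using only the multiplet's detections that
// lie in the central pixel P. The formula here is a stand-in; the point is
// the "det.pixel == central_pixel" filter.
#include <algorithm>
#include <vector>

struct Detection {
    int pixel;          // pixel index
    double start, dur;  // start time and duration, seconds
};

double time_factor(
    const std::vector<Detection>& multiplet,
    int central_pixel,
    double observed_seconds     // total time the central pixel was observed
) {
    double covered = 0;
    for (const auto& det : multiplet) {
        if (det.pixel != central_pixel) continue;   // the fix: central pixel only
        covered += det.dur;
    }
    if (observed_seconds <= 0 || covered <= 0) return 0;
    // more coverage relative to observation time -> higher factor (capped at 1)
    return std::min(1.0, covered / observed_seconds);
}
</pre>
In the example above, a multiplet whose spikes all come from the adjacent pixel has no central-pixel detections, so it now gets a time factor of zero instead of an inflated one.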

If a multiplet consists primarily of detections from an adjacent pixel, this approach would tend to give it a low time factor score. But that's OK - when we score the adjacent pixel we'll see a (probably better) version of this multiplet, and it would get a higher time factor in that pixel.

A related issue: during multiplet finding we do something called "observation consistency pruning", which means e.g. ignoring a 13-second spike that occurs during a 1 sec observation. We do this only for detections that lie in the central pixel, because that's the only pixel for which we've read the observation intervals. In theory we could do this for adjacent pixels too, but I don't think it's worth it.
</div>
10) Message boards : News : Nebula progress report (Message 2080046)
Posted 15 Jul 2021 by David Anderson
Post:
Oops! Fixed. -- David
11) Message boards : News : Nebula progress report (Message 2079760)
Posted 12 Jul 2021 by David Anderson
Post:
... wherein it's asserted that we're in the home stretch.
12) Message boards : Nebula : The home stretch (Message 2079759)
Posted 12 Jul 2021 by David Anderson
Post:
<div style="width:640px">There, I said it: after 25 years working on SETI@home, and 5 years working on Nebula, we're finally on the home stretch. The finish line (producing a final candidate list, and writing a paper) is in view, maybe a few months off. It's been a slog, and I'm eager to have it behind me.
<h3>A full-sky Nebula run</h3>
Over the last several years (as chronicled here) we've been developing and refining algorithms for detecting RFI, generating birdies, finding signal candidates (multiplets) and scoring these candidates. For these purposes it wasn't necessary to look for multiplets in all 15 million pixels; I generally looked in only 256K per cycle, sometimes 1M. Finding multiplets uses lots of computing and produces lots of files (2 per pixel). I didn't want to overstay my welcome at the Atlas computing cluster.

But a few weeks ago we decided that these algorithms had reached the point of being good enough, and I did, for the first time, a Nebula run that scored all 15 million pixels. This went faster than I thought it would. I processed batches of 1.25M pixels at a time, and each batch took only a few hours, running on about 1000 cluster nodes.

It turned out that the "multiplet uniqueness" step - ensuring that multiplets in adjacent pixels are disjoint - would take pretty long, like a week. So I did a minor rewrite of this program to increase its efficiency; now it's down to about a day.

Also, copying the 30M or so files from Atlas (Germany) to Centurion (Berkeley) took a couple of days. It seemed to confuse rsync; I had to do a couple of tries to get everything copied.
<h3>Keeping birdies separate</h3>
Until now, we've processed and scored birdie detections and real detections together. I decided to change things so that we do two separate Nebula runs: a) using only real (non-birdie) detections, and scoring all pixels; b) using real and birdie detections, and scoring only birdie pixels. There are two reasons for this:
<ul>
<li> A birdie could mask (i.e. prevent us from detecting) an actual ET signal in the same pixel. The odds of this are small - with 3,000 birdies, only .02% of pixels have a birdie - but still.
<li> We may want to change the number or parameters of our birdies, perhaps repeatedly. It would be good to be able to do this without rescoring all pixels (and transferring all the files).
</ul>
It took me a while to figure out a clean way to keep the two runs separate, in terms of files. I was already keeping all score-related files in a subdirectory, score/. What I settled on is:
<ul>
<li> On Atlas, everything stays the same. When we do a run, either birdie or non-birdie, the results go in score/.
<li> On Centurion, we have a new directory score_birdie/. After a birdie run, we copy score/ on Atlas to score_birdie/ on Centurion.
</ul>
This meant changing the scripts and PHP pages that run on Centurion to look for birdie-related data in score_birdie/, rather than in specially-named files in score/. This wasn't hard to do, and actually makes things a bit simpler.
<h3>Human rating of multiplets</h3>
The final output of Nebula is a bunch of score-ranked lists of multiplets. We already know that some fraction of these will be RFI of the sort that's apparent to a human observer, but hard to identify algorithmically. That's OK. The goal of our RFI algorithms is not to remove 100% of RFI; doing so would probably remove ET signals too. The goal is to remove enough RFI that a good fraction (say, at least half) of the top-ranking multiplets are not obvious RFI.
The final (post-Nebula) stages of SETI@home are
<ul>
<li> Manually examine the top few thousand multiplets, in all the various categories and score variants, and remove the ones that are obvious RFI. At least initially we'll do this as a group, on Zoom, to make sure we agree on what constitutes obvious RFI.
<li> Make a list of the multiplets that remain; re-observe those spots in the sky (hopefully using FAST), analyze the resulting data (probably using the SETI@home client running on a cluster) and see if we find detections consistent with the multiplets. If we do, maybe that's ET.
</ul>
So I extended our existing "bookmark" system to let you rate multiplets. When you bookmark a multiplet you can now give it a 0-10 rating, as well as a comment. The web page for each multiplet shows the ratings that have been reported so far. This mechanism is intended for use by our group (me, Eric, Dan, Jeff) but any SETI@home user can browse and rate multiplets. Feel free to do so!
</div>
13) Message boards : News : Nebula progress report (Message 2077682)
Posted 10 Jun 2021 by David Anderson
Post:
Check out <a href=https://setiathome.berkeley.edu/forum_thread.php?id=85745&postid=2077681#2077681>recent advances in drifting RFI removal</a>.
14) Message boards : Nebula : Drifting (on a sea of...) (Message 2077681)
Posted 10 Jun 2021 by David Anderson
Post:
<div style="width:640px">We've been working almost entirely on refining the drifting RFI algorithm. This - I hope - is the last big piece before we do our final Nebula run and finish the paper.
<p>
The goal, as with all RFI algorithms, is to:
<ul>
<li> <b>Remove all the RFI</b>. We've <a href=https://setiathome.berkeley.edu/nebula/bookmark.php?action=list>bookmarked</a> a lot of examples of drifting RFI. There's a spectrum: some examples are clearly RFI, others are less obvious. When we change the algorithm, we look at these examples to make sure we're still removing at least the obvious ones. We also look at the top-ranking spike/gaussian multiplets. It's OK if a small fraction of these are RFI; we can skip over them in the manual inspection process. But if most of them are RFI we need to change the algorithm.
<li> <b>Remove only RFI</b>. We don't want the algorithm to remove an ET signal. To check for this, we see what fraction of birdie spikes are flagged as drifting RFI. This is inevitably nonzero - some birdie detections happen to lie in regions of drifting RFI - but it shouldn't exceed a few percent. Also, we monitor the fraction of all spikes flagged as drifting RFI; 10% is plausible; 20% is probably too high.
</ul>
Recall that the drifting algorithm takes a "vertex" detection D and forms two fans of triangles in time/frequency space, emanating from D in the positive and negative time directions. We count the number of detections in each triangle that are "far" from D (a couple of beam widths or more) and compute a probability for each triangle; the more detections in the triangle, the lower the probability.
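<p>
For the curious, here's one simplified way such a probability could be computed: treat the count of far detections in a triangle as Poisson, with a mean set by the triangle's area and the local detection density, and take the upper tail. This is a sketch with made-up names, not the production code:
<pre>
// Sketch: probability that a triangle contains at least far_count "far"
// detections by chance, assuming a Poisson count with mean density * area.
// The Poisson model and the density estimate are illustrative assumptions.
#include <cmath>

double poisson_upper_tail(int k, double lambda) {
    // P(N >= k) = 1 - sum_{i < k} e^-lambda * lambda^i / i!
    double term = std::exp(-lambda), cdf = 0;
    for (int i = 0; i < k; i++) {
        cdf += term;
        term *= lambda / (i + 1);
    }
    return 1 - cdf;
}

double triangle_probability(
    int far_count,         // detections in the triangle, far from the vertex
    double triangle_area,  // area in (time, frequency) units
    double density         // mean detections per unit area in this band
) {
    return poisson_upper_tail(far_count, density * triangle_area);
}
</pre>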
<p>
Recent changes:
<ul>
<li>Previously, we looked at the product of the probabilities of opposing triangles; if this was below a threshold, we flagged both of them as RFI. This didn't work well in some cases; we tried various alternatives. Our current approach uses two thresholds. If a triangle's probability is below 1e-8, we flag it as RFI. If a triangle and an opposing triangle are both below 1e-4, we flag them both.
<li>Previously, when we flagged a triangle as RFI, we flagged all the detections in the triangle, including its vertex. But there are cases (including lots of birdies) where a triangle has lots of close, non-RFI detections, that happen to be far from the vertex detection; these were erroneously getting flagged. So we changed to a "vertex-only" scheme where - if any triangle is flagged as RFI - we flag only the vertex detection, not the detections in the triangles.
<li>The vertex-only policy introduced a wrinkle: before we apply the drifting algorithm, we group detections into "clusters" of nearly-identical signals. From each cluster, one detection is picked as the "master". Only the master detections are used in the drifting algorithm; including the non-masters would skew the statistics. Previously, we flagged non-master detections lying in flagged triangles. With the vertex-only policy, we must do things differently: when we flag a vertex detection, we flag the other detections in its cluster. This required adding a data structure to keep track of clusters (see the sketch after this list).
<li>Previously, "clusters" meant detections within 1 second and 1 Hz. This wasn't appropriate for short or long FFT lengths. We changed it to use rectangles based on FFT bin sizes.
</ul>
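Putting the current rules together, the flagging step looks roughly like this sketch. The type names and the cluster map are illustrative; the two thresholds are the ones quoted above.
<pre>
// Sketch: two-threshold, vertex-only drift flagging with cluster propagation.
// Type names and the cluster map are illustrative; thresholds are from the post.
#include <map>
#include <vector>

struct Triangle { double prob; const Triangle* opposing; };

const double SINGLE_THRESH = 1e-8;   // a lone triangle below this is RFI
const double PAIR_THRESH   = 1e-4;   // opposing triangles both below this: RFI

bool vertex_is_drift_rfi(const std::vector<Triangle>& fans) {
    for (const auto& t : fans) {
        if (t.prob < SINGLE_THRESH) return true;
        if (t.opposing && t.prob < PAIR_THRESH && t.opposing->prob < PAIR_THRESH) {
            return true;
        }
    }
    return false;
}

// Vertex-only policy: flag the vertex detection itself, then propagate the
// flag to the other detections in the vertex's cluster.
void flag_vertex_and_cluster(
    int vertex_id,
    const std::map<int, std::vector<int>>& cluster_members,  // master -> members
    std::vector<bool>& rfi_flag
) {
    rfi_flag[vertex_id] = true;
    auto it = cluster_members.find(vertex_id);
    if (it == cluster_members.end()) return;
    for (int member : it->second) rfi_flag[member] = true;
}
</pre>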
With these changes, the algorithm is working pretty well. It removes only 2.9% of birdie spikes and 11.8% of all spikes, it removes the obvious examples, and few of the top multiplets are RFI. So maybe we're done with drifting.
<p>
Other news: we're talking with astronomers at FAST about using Nebula as part of a SETI sky survey there. Needless to say, that would be very exciting!
</div>
15) Message boards : News : Nebula progress report (Message 2074148)
Posted 24 Apr 2021 by David Anderson
Post:
Check out recent data analysis progress: Reobservation and drifting RFI.
16) Message boards : Nebula : Reobservation and drifting RFI (Message 2074147)
Posted 24 Apr 2021 by David Anderson
Post:
<div style="width:640px">We at SETI@home are fully vaxxed now, but we're still meeting on Zoom. Progress has been good. We're finally generating spikes for most birdies, and "finding" their multiplets. We're trying lower-power birdies to explore sensitivity. Here's what we've been doing/thinking recently.
<h3>Reobservation</h3>
If a radio SETI project (like SETI@home) detects something resembling an ET signal, it could also be RFI, noise, a satellite transmission, or an artifact of the project's hardware or software. One way to rule these out is to reobserve that sky location, preferably using a different telescope and data analysis system, and see if you detect the same thing. (This assumes, of course, that ET's transmitter is on for long periods.)

SETI@home, because it looks at most sky locations multiple times, has a limited form of reobservation built in. We use this (in our multiplet score function) to weed out one-time anomalies. But, to have confidence that a candidate is ET, we still need to reobserve separately. We started making plans to do this.

The system used for reobservation must be at least as sensitive as the original. Arecibo would have been fine, but alas. Another option is the FAST radio telescope in China, and we're applying for observing time there.
<h3>Drifting RFI algorithm</h3>
The drifting RFI algorithm had been erroneously removing lots of birdie spikes (~30% of them). We fixed this and made some other improvements to the algorithm:
<ul>
<li> Lower the probability thresholds (i.e. remove less).
<li> Change the notion of "average" detections per bin so that it uses the mean if the median is zero. This makes the algorithm work better when there are few total detections.
<li> Change the way we measure the "entropy" of detections in a bin so that it works even when there are few detections.
</ul><h3>Observing time</h3>
I got curious about how much total Arecibo observing time we (commensally) used. It turns out to be less than I thought - about 400 days of data (per beam) in 12 calendar years, or about 9%. This is because the ALFA receiver, which we used, is just one of several Arecibo receivers, and it was only installed part of the time. Details are <a href=https://setiathome.berkeley.edu/nebula/obs.php?action=obs_totals>here</a> and <a href=https://setiathome.berkeley.edu/nebula/fullsky/day_obs.png>here</a>.

<a href=https://setiathome.berkeley.edu/forum_thread.php?id=85692>A while back</a> I discussed "bandwidth-dependent sky coverage" - the fact that for long FFT lengths (like 128K samples, or ~13 seconds) we don't have sufficiently long observations in some pixels, and therefore we have effectively covered less sky at those bandwidths.

I was calculating this in a "pessimistic" way: counting a pixel only if it was certain to have a sufficiently long interval. I.e.: you need a 26-second observation to be sure that a 13-second period of unknown start time is contained within it. Dan pointed out that if you have a shorter observation - say 14 seconds - the random 13-second interval MIGHT be contained in it. So I calculated the bandwidth-dependent sky coverage with this "optimistic" assumption. Indeed, sky coverage is greater - but at 128K FFT length, not by much. Results are <a href=https://setiathome.berkeley.edu/nebula/coverage.php>here</a>.
<h3>Frequency range pruning</h3>
Our existing RFI algorithms are run, together, on the entire set of detections. We added a new type of RFI rejection which is done later in the pipeline: namely, during multiplet finding. It applies only to spike/gaussian barycentric multiplets. It's based on the fact that the barycentric frequencies of the detections in these multiplets shouldn't vary much, but the topocentric frequencies (i.e. the actual received frequency, not adjusted by Doppler shift from Arecibo's motion) SHOULD vary because of this motion. For RFI (from terrestrial sources) it's the other way around.

We call this "frequency range pruning". What we do is to remove groups of detections that are within a .1 day interval, and for which B > 2*max(max_bw, D), where B is the RMS variation in barycentric frequency, D is the RMS variation in detection frequency, and max_bw is the max bandwidth (FFT freq resolution) of the detections. This is done at the very end (i.e. after time overlap pruning).
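
As a sketch, the test reads roughly as follows; the struct and function names are made up, and rms_deviation() is just a root-mean-square about the mean.
<pre>
// Sketch of the frequency-range pruning test: a group of detections (all
// within a 0.1-day interval) is pruned if its barycentric frequency spread B
// exceeds twice the larger of its topocentric spread D and its max bandwidth.
#include <algorithm>
#include <cmath>
#include <vector>

struct Det {
    double bary_freq;   // barycentric frequency, Hz
    double topo_freq;   // topocentric (received) frequency, Hz
    double bandwidth;   // FFT frequency resolution, Hz
};

static double rms_deviation(const std::vector<double>& x) {
    double mean = 0, ss = 0;
    for (double v : x) mean += v;
    mean /= x.size();                      // group assumed non-empty
    for (double v : x) ss += (v - mean) * (v - mean);
    return std::sqrt(ss / x.size());
}

bool prune_group(const std::vector<Det>& group) {
    std::vector<double> bary, topo;
    double max_bw = 0;
    for (const auto& d : group) {
        bary.push_back(d.bary_freq);
        topo.push_back(d.topo_freq);
        max_bw = std::max(max_bw, d.bandwidth);
    }
    double B = rms_deviation(bary);
    double D = rms_deviation(topo);
    return B > 2 * std::max(max_bw, D);
}
</pre>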
<h3>Miscellany</h3>
I changed score factor normalization so the 25th percentile goes to -1, and the 75th goes to 1. This puts the mean somewhere around 0, so it's easier to interpret a score.
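
Concretely, the normalization is just a linear rescale based on the two quartiles. A minimal sketch (not the actual scoring code):
<pre>
// Sketch: rescale a score factor so its 25th percentile maps to -1 and its
// 75th percentile maps to +1.
#include <algorithm>
#include <vector>

double percentile(std::vector<double> v, double p) {
    std::sort(v.begin(), v.end());          // v assumed non-empty
    size_t i = (size_t)(p * (v.size() - 1));
    return v[i];
}

void normalize_factor(std::vector<double>& values) {
    double q25 = percentile(values, 0.25);
    double q75 = percentile(values, 0.75);
    double scale = (q75 > q25) ? 2.0 / (q75 - q25) : 1.0;
    for (double& v : values) {
        v = (v - q25) * scale - 1.0;         // q25 -> -1, q75 -> +1
    }
}
</pre>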

Finally, a couple of programming things. I fixed an extremely rare crashing bug in RFI removal. I was getting a "reference" to the first item in a deque, then removing the item, then using the reference. This is a no-no. Once I saw this in the debugger it was obvious, but it took many hours to find because it happened in 1 process out of 56, about 1 hour into a 90-minute computation. This made me briefly wish I was using a language that prevented this sort of mistake (C++ does not).
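
Here's a distilled version of that class of bug - a minimal reproduction, not the actual Nebula code:
<pre>
// Minimal reproduction: using a reference into a deque after the referenced
// element has been removed is undefined behavior.
#include <cstdio>
#include <deque>

int main() {
    std::deque<int> q = {1, 2, 3};

    int& front = q.front();   // reference to the first element
    q.pop_front();            // removes that element...
    // printf("%d\n", front); // ...so this read would be undefined behavior

    // Safe version: copy the value before removing the element.
    int value = q.front();
    q.pop_front();
    printf("%d\n", value);    // prints 2
    return 0;
}
</pre>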

Also, I discovered that the program that computes RFI zones was crashing, but I hadn't noticed it because the way I run the Nebula pipeline (using the Unix "make" program) wasn't checking the exit code. So we were using zone bitmaps from 6 months ago, though I don't think this made any difference. In any case, the makefile now checks all exit codes.
17) Message boards : Nebula : All in the Timing IV (Message 2070907)
Posted 17 Mar 2021 by David Anderson
Post:
<div style="width:640px">(The title refers to <a href=https://en.wikipedia.org/wiki/All_in_the_Timing>a group of one-act plays by David Ives</a>.)

A while back, Eric and I had the idea of "observation pruning": removing spikes whose duration (a function of their FFT length) is longer than the observation in which they occur.

For example, suppose a 13-second spike S (the longest spike, hence the narrowest bandwidth) occurs at a time when the telescope beam is moving at a speed where it crosses a point in the sky in 1 second. Then it's unlikely that the source of S is a cosmic signal; it's more likely RFI or noise. And if S is from a cosmic signal, that signal will probably be detected with greater power at a shorter FFT length.

Anyway, a few weeks ago I wrote some code to do observation pruning. But - surprise! - I didn't get it right the first time.

I decided to do observation pruning in 'nebula_score', the program that finds and scores multiplets, because that's the only place where we do things on a per-pixel basis, and fetching a pixel's observation history is expensive.

For a given pixel P, nebula_score reads and processes detections from a sky "disc" that includes parts of the 8 adjacent pixels. In my first attempt, I was comparing the intervals of these detections with the observation intervals of P (for the beam in which the detection occurs). This removed too many detections - not surprisingly, since the adjacent pixels have different observation intervals.

In my second attempt, I merged (for each beam) the observation intervals of all 9 pixels and compared against that. But this didn't help at all! I had a panicky feeling that there was some insidious problem in how I was computing the intervals, or in the data itself. I started working on tools for studying this.

But then I looked at the code for a while, and I realized the problem was that, after concatenating the 9 lists of observation intervals, I wasn't time-sorting the result, as required by the pruning logic.

A few seconds later, I realized that merging the lists was the wrong idea to begin with! I should just compare each detection against the observation intervals for the pixel that it's in. D'oh!

It wasn't immediately clear how to implement this efficiently, but eventually I figured out a way that, although not quite optimal, required changing only a few lines of code. Sweet.

I tested this on a few pixels and it seems to work; it removes about 10% of spikes, which is about what I expected. I did a scoring run using the new version of nebula_score; it's online now.
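
For the record, the per-pixel check is conceptually simple. Here's a sketch; the data structures and names are illustrative, and the real code is organized differently for efficiency:
<pre>
// Sketch: observation pruning. Keep a detection only if its own pixel has an
// observation interval (for the detection's beam) long enough to contain it.
#include <map>
#include <utility>
#include <vector>

struct Interval { double start, end; };
struct Spike { int pixel, beam; double start, dur; };

// observation intervals per (pixel, beam), assumed time-sorted
using ObsMap = std::map<std::pair<int,int>, std::vector<Interval>>;

bool keep_spike(const Spike& s, const ObsMap& obs) {
    auto it = obs.find({s.pixel, s.beam});
    if (it == obs.end()) return false;
    for (const Interval& iv : it->second) {
        // the spike must lie entirely within some observation interval
        if (s.start >= iv.start && s.start + s.dur <= iv.end) return true;
    }
    return false;
}
</pre>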

This experience highlights a big problem in Nebula: we're not doing design review or code review. The only programmers are Eric and me. Each of us has other projects - Eric has ICON, I have BOINC - as well as our Nebula work. Neither of us has quite enough time to look carefully at what the other one is doing. We pay a price for this: we make mistakes, and it costs us time. Eric has been mired in birdie_gen for months now, while I lurch along with RFI and multiplet stuff. And, of course, the real fear is that we're making undetected mistakes.

In case you, Dear Reader, are in a position to help, the Nebula source code repo is <a href=https://sourceforge.net/p/seti-science/code/ci/master/tree/>here</a>. Warning: the learning curve is huge.
18) Message boards : News : Birdies and drifting RFI (Message 2069318)
Posted 25 Feb 2021 by David Anderson
Post:
Check out the latest entry in the <a href=https://setiathome.berkeley.edu/forum_thread.php?id=85712>Nebula Blog</a>.
19) Message boards : Nebula : Birdies and drifting RFI (Message 2069317)
Posted 25 Feb 2021 by David Anderson
Post:
<div style="width:640px">In the last month we've made progress in several areas.
<h3>Birdie generation</h3>
One property of birdies (simulated ET signals) is bandwidth. Before, we were generating almost entirely narrow-band birdies; we weren't generating enough with wider bandwidths to give us sensitivity stats. I changed things so that we generate 100 birdies in the bandwidth range corresponding to each of SETI@home's 15 FFT lengths. This uses a configuration file so we can, for example, specify different power ranges for each bandwidth range.
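
Conceptually, the configuration amounts to a small table with one row per FFT length. Here's a sketch; the field names and the generation loop are illustrative, not the real config format:
<pre>
// Sketch: per-bandwidth-range birdie generation driven by a config table,
// one entry per FFT length. Field names are illustrative.
#include <random>
#include <vector>

struct BirdieRange {
    double bandwidth_hz;            // bandwidth for this FFT length
    double power_min, power_max;    // power range for this bandwidth
    int count;                      // e.g. 100 birdies per range
};

struct Birdie { double bandwidth_hz, power; };

std::vector<Birdie> generate_birdies(const std::vector<BirdieRange>& config) {
    std::mt19937 rng(12345);
    std::vector<Birdie> birdies;
    for (const auto& range : config) {
        std::uniform_real_distribution<double> power(range.power_min, range.power_max);
        for (int i = 0; i < range.count; i++) {
            birdies.push_back({range.bandwidth_hz, power(rng)});
        }
    }
    return birdies;
}
</pre>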

I did a Nebula run with this, and the results can be seen in the <a href=https://setiathome.berkeley.edu/nebula/sensitivity.php?dir=fullsky>sensitivity summary page</a>. In most cases, we're finding almost all birdies, regardless of their bandwidth and power. This isn't what we expected; it suggests that there are some remaining problems, probably in how we generate birdie detections.
<h3>Birdie detection generation</h3>
Eric made several changes to how we generate detections (spikes) for birdies:
<ul>
<li> Limit of 8 spikes per WU (since this is what the SETI@home client does).
<li> For planetary birdies (i.e. where the transmitter is on the surface of a rotating planet), assume that ET has put two transmitters at antipodal positions, so that one of them is always visible (a geometric sketch follows this list). Before, we were assuming a single transmitter and that the planet is transparent to radio waves (not realistic); before that, we were assuming a single transmitter and that the planet is opaque, so that the transmitter is visible only about half the time.
<li> Make sure detections start an integer number of FFT lengths from the start of the WU (since that's what the SETI@home client does).
<li> The "time" of a detection is the midpoint of the FFT bin, not the start.
</ul>
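As an aside on the antipodal-transmitter item above: the visibility test reduces to a sign check, and with two transmitters at antipodal points one of them always faces the observer. A minimal geometric sketch (the real birdie_gen code obviously does much more):
<pre>
// Sketch: with transmitters at antipodal surface points T and -T, whichever
// has a non-negative dot product with the direction to the observer is the
// one currently visible, so one of the pair is always in view.
struct Vec3 { double x, y, z; };

double dot(const Vec3& a, const Vec3& b) {
    return a.x*b.x + a.y*b.y + a.z*b.z;
}

// t and to_observer are unit vectors from the planet's center.
Vec3 visible_transmitter(const Vec3& t, const Vec3& to_observer) {
    if (dot(t, to_observer) >= 0) return t;
    return {-t.x, -t.y, -t.z};
}
</pre>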
<h3>Drifting RFI removal</h3>
The drifting RFI algorithm looks at the upper and lower "fans" of triangles in freq/time space centered at a given signal. If two nearly-opposed triangles both have a statistical excess of detections, their contents are flagged as drifting RFI.

A while back we added a requirement that the detections in each triangle be spread out in time by a certain factor; this avoided getting rid of everything between parallel RFI bands. However, it also prevented us from flagging a certain type of RFI, namely repeated clumps of detections at about the same frequency (the "8 spikes per WU" rule can produce such clumps).

I fixed this by removing the time-spread requirement for the middle 3 triangles, i.e. the ones with little drift. Ad hoc, but it did the job in the cases we looked at.

Second, I fixed a subtle bug. Originally the drifting algorithm had only an "upper" fan of triangles. As we flagged drift triangles, they were always in increasing time order. The logic for maintaining windows of triangles assumed this order. When we added a lower fan of triangles, we no longer flag triangles in time order.

For the triangles flagged for the current detection type, I don't think this actually caused any problems, other than making things slightly less efficient (we didn't flush some triangles as early as we could).

But for spike triangles - which are used to flag gaussians - it did result in a bug. The triangles were being written to a file out of order. So there were cases where a triangle wasn't being read when it was needed, and a gaussian wasn't being flagged that should have been.

The fix for this was slightly involved. We now maintain two double-ended queues, one for upper triangles and one for lower. (Upper triangles are generated in time order, and so are lower triangles, but the combination is not). When we flush triangles from the window - and, in the case of spikes, write them to a file - we alternate between the deques in a way that writes them in time order.
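
The flush step is essentially a two-way merge of time-sorted streams. A minimal sketch of the idea (names are mine, and the real code flushes incrementally as the window advances):
<pre>
// Sketch: flush triangles from two time-sorted deques (upper and lower fans)
// so that the combined output is in time order - a standard two-way merge.
#include <deque>
#include <vector>

struct Triangle { double time; /* plus frequency bounds, flags, ... */ };

std::vector<Triangle> flush_in_time_order(
    std::deque<Triangle>& upper, std::deque<Triangle>& lower, double flush_before
) {
    std::vector<Triangle> out;
    auto ready = [&](std::deque<Triangle>& q) {
        return !q.empty() && q.front().time < flush_before;
    };
    while (ready(upper) || ready(lower)) {
        std::deque<Triangle>& src =
            !ready(lower) ? upper :
            !ready(upper) ? lower :
            (upper.front().time <= lower.front().time ? upper : lower);
        out.push_back(src.front());
        src.pop_front();
    }
    return out;
}
</pre>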
</div>
20) Message boards : Nebula : Did SETI@home waste computing? (Message 2068005)
Posted 10 Feb 2021 by David Anderson
Post:
<div style="width:640px">As described in <a href=https://setiathome.berkeley.edu/forum_thread.php?id=85692>my previous blog post</a>, we recently realized that our search for narrow-band signals was less effective during observation periods where the telescope beams are moving fast, and that such periods comprise the majority of our data. People then pointed out that - since much of the SETI@home front-end computation involves looking for narrow-band signals - it seems like we wasted a lot of computing time.

I'd like to respond to this. I take SETI@home - and volunteer computing in general - very seriously. When people run a BOINC project, they invest time and energy, and their electric bills may increase. BOINC projects have an obligation to use their contributions efficiently.

In this case, we could have been more efficient, but the computing time spent doing long FFTs on short observations wasn't wasted. First, we detected narrow-band RFI (which generally doesn't depend on pointing) during these periods; this is important e.g. for identifying RFI "zones". Second, our current notion of "observation" is conservative; we count it as an observation only when the beam is close to the pixel center. If an ET signal is powerful, it could be detected even if the beam is farther away. So the fraction of the sky where we can detect powerful narrow-band signals is larger than the number I gave.

But it's true that it took us a long time to realize that narrow-band signals and short observations don't go well together. Here's a long-winded explanation of why this happened:

SETI@home "piggybacks" on Arecibo; other science projects (like pulsar search and Hydrogen mapping) control the telescope pointing. When we designed SETI@home, and for the first 7 years of data collection, we used a dedicated "<a href=https://www.researchgate.net/figure/The-Arecibo-Observatory-feeds-in-July-1972-The-new-430-MHz-linefeed-is-mounted-on_fig4_258787780>flat feed</a>" antenna, which has a relatively large beam. When the main antenna (in the Gregorian dome) is tracking a point in the sky, the flat feed's beam moves across the sky at about twice the sidereal rate (i.e. the rate of Earth's rotation).

Given this slew rate and beam diameter, it takes the flat feed's beam about 13 seconds to pass over a point. Frequency resolution is important when looking for narrow-band signals. We chose our longest FFT length - i.e. our best frequency resolution - accordingly: 128K samples at 9765.625 samples/sec is 13.4 seconds. We tailored our search algorithm to the data source.

In 2006 we switched from the flat feed to the new <a href=https://alfalfasurvey.wordpress.com/2009/01/13/what-is-alfa/>ALFA receiver</a>, which has some big advantages:
<ul>
<li> It uses very low noise cryogenic receivers.
<li> It has a larger collecting area.
<li> It has 7 beams, which lets us cover the sky faster and also lets us do "multi-beam" RFI detection, which rejects similar signals that occur in 2 beams at the same time.
</ul>
When we switched to ALFA we didn't know what its patterns of pointings would be like. It turned out to move faster, on average, than the flat feed had; we didn't initially know this. And because the beams are smaller, observations are shorter. Our application kept using long FFT lengths, even on data from short observations. We could have made the application more efficient by doing only shorter FFT lengths for such data.

Around the same time, SETI@home transitioned to a sort of "maintenance mode": we focused on system administration - keeping our servers and databases running - and porting the application to GPUs and Android. This kept us busy; we had full-time jobs doing non-SETI things, and SETI@home's funding sources (other than donations) had dried up. SETI@home kept accumulating detections (spikes, Gaussians etc.), but we didn't study them.

In 2016, I started Nebula, and we began to analyze detections. It took a couple of years to get the Nebula pipeline (RFI removal and multiplet finding) working. We created the "birdie" mechanism, which lets us test algorithms. Everything we looked at required rethinking and often replacement. At the same time, we undertook the huge project of downsizing our server complex.

So it wasn't until recently that - while trying to figure out why narrow-band birdies weren't producing spikes - we realized that looking for narrow-band signals works best when you have long observations. In retrospect we could have figured this out much earlier.

Anyway, the lessons I've learned from this include:
<ol>
<li> Don't start doing large-scale computing until your entire scientific pipeline is in place, or at least enough of it to let you assess the scientific value of the computing results.
<li> Run the pipeline as soon as the computing results start to come in; don't wait until the computing is all done.
<li> Make sure you have enough funding to do 1) and 2).
</ol>
</div>

