The collapse of the Arecibo telescope was bad news for radio astronomy. The main impact on SETI@home is that we'll have to find other telescopes (e.g. Green Bank or FAST) to do reobservations. In the meantime, progress continues on Nebula.
Sensitivity summary info
SETI@home's scientific goal, other than finding ET, is to make a quantitative statement about our sensitivity. In practice, this means seeing what fraction of birdies we "find", and studying how this fraction depends on the power of the birdies and on the observation time for their location.
I finally got around to making a page that shows this info explicitly; it's the Sensitivity summary link under Birdies. This shows, for bary and non-bary, the percent of birdies we found as a function of power, of observation time, and of their product.
Ideally, the percentage should approach 100% as these quantities increase. It currently doesn't, which means we have more work to do. In the previous run the percentage went up to around 35%; the current run is worse, presumably because of problems with birdie generation.
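The per-bin percentages on that page can be illustrated with a minimal sketch. It assumes each birdie is a record with its power, the observation time at its location, and a "found" flag; the field names and `percent_found_by_bin` are hypothetical, not Nebula's actual code. The same function covers all three views by changing the key (power, observation time, or their product), and would be run separately on bary and non-bary birdies.

```python
import bisect

def percent_found_by_bin(birdies, key, bin_edges):
    """Percent of birdies found in each bin of key(birdie).

    birdies:   iterable of dicts with a boolean "found" entry (hypothetical layout).
    key:       function returning the quantity to bin on, e.g. power,
               observation time, or their product.
    bin_edges: sorted bin boundaries; bin i covers [bin_edges[i], bin_edges[i+1]).
    Returns one value per bin: percent found, or None if the bin is empty.
    """
    nbins = len(bin_edges) - 1
    found = [0] * nbins
    total = [0] * nbins
    for b in birdies:
        i = bisect.bisect_right(bin_edges, key(b)) - 1
        if 0 <= i < nbins:  # ignore values outside the binned range
            total[i] += 1
            found[i] += bool(b["found"])
    return [100.0 * f / t if t else None for f, t in zip(found, total)]

# Toy data (made up for illustration only).
birdies = [
    {"power": 5, "obs_time": 1, "found": False},
    {"power": 5, "obs_time": 2, "found": True},
    {"power": 50, "obs_time": 3, "found": True},
    {"power": 50, "obs_time": 10, "found": True},
]
print(percent_found_by_bin(birdies, lambda b: b["power"], [0, 10, 100]))  # [50.0, 100.0]
print(percent_found_by_bin(birdies, lambda b: b["power"] * b["obs_time"], [0, 100, 1000]))  # [50.0, 100.0]
```

A sensitivity curve that works as intended would show the last bins approaching 100.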
Multiplets near stars
SETI@home is more or less a sky survey; when Arecibo has stared at fixed points, they've generally been hydrogen clouds or pulsars, not the kinds of stars that have habitable planets. But the beams randomly pass over stars, some of them many times. So I wanted to look at the best-scoring multiplets that are close to (i.e. in the same pixel as) a star. For this purpose, I use the list of 118,000 stars pinpointed by the ESA Hipparcos mission.
The results of this are in the Multiplets near stars page, linked under Multiplets. For each category (detection type and baryness) this shows a list of stars for which there is a multiplet of that category in the same pixel. There are different lists for each score variant. Note: these are partial lists, since we only scored 256K out of 16M pixels in this run.
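The star-to-multiplet matching amounts to a join on pixel ID. The sketch below assumes a simple equal-angle RA/Dec grid purely for illustration; `pixel_of`, `PIXEL_DEG`, and the multiplet field names are hypothetical and are not Nebula's actual pixelization or data layout. For one category it keeps the best-scoring multiplet per pixel, then reports each star whose pixel has one.

```python
PIXEL_DEG = 0.1  # hypothetical pixel size in degrees; NOT Nebula's real pixelization

def pixel_of(ra_deg, dec_deg, size=PIXEL_DEG):
    """Hypothetical equal-angle grid: map (RA, Dec) in degrees to an integer pixel id."""
    n_ra = int(round(360.0 / size))
    n_dec = int(round(180.0 / size))
    row = min(int((dec_deg + 90.0) // size), n_dec - 1)  # clamp Dec = +90 into last row
    col = min(int((ra_deg % 360.0) // size), n_ra - 1)
    return row * n_ra + col

def best_multiplets_near_stars(stars, multiplets, category):
    """For each star, the best score of a `category` multiplet in the star's pixel.

    stars:      list of (star_id, ra_deg, dec_deg), e.g. from the Hipparcos catalog.
    multiplets: list of dicts with "pixel", "category", "score" (hypothetical fields).
    Returns [(star_id, score), ...] sorted by descending score.
    """
    best = {}  # pixel id -> best-scoring multiplet of this category
    for m in multiplets:
        if m["category"] != category:
            continue
        p = m["pixel"]
        if p not in best or m["score"] > best[p]["score"]:
            best[p] = m
    result = []
    for star_id, ra, dec in stars:
        m = best.get(pixel_of(ra, dec))
        if m is not None:
            result.append((star_id, m["score"]))
    result.sort(key=lambda item: item[1], reverse=True)
    return result
```

Running it once per category and score variant gives one list per (category, variant), matching the structure of the page.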
Detection time intervals
Detections (e.g. spikes) are not instantaneous - they occupy a time interval whose duration depends on FFT length. I had been assuming incorrectly that the "time" attribute of detections was the start of this interval; actually it's the midpoint. I fixed this. This affects, probably in a minor way, RFI removal (the multibeam algorithm) and multiplet finding (time overlap pruning).
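Treating `time` as the midpoint means a detection covers half its duration on each side. A minimal sketch, assuming (as the post says later) that a 128K-point FFT spans about 13 seconds, so the duration is proportional to FFT length; `SECONDS_PER_SAMPLE`, `detection_interval`, and `intervals_overlap` are hypothetical names. The example shows how an overlap test, like the one used in time overlap pruning, can change under the two readings.

```python
# Assumption taken from the text: a 128K-point (131072) FFT spans about 13 s,
# so one sample lasts roughly 13 / 131072 s. Nebula's real constant may differ.
SECONDS_PER_SAMPLE = 13.0 / 131072

def detection_interval(time_mid, fft_len, sec_per_sample=SECONDS_PER_SAMPLE):
    """(start, end) of a detection whose `time` attribute is the interval MIDPOINT."""
    half = 0.5 * fft_len * sec_per_sample
    return (time_mid - half, time_mid + half)

def intervals_overlap(a, b):
    """True if closed intervals a = (start, end) and b = (start, end) overlap."""
    return a[0] <= b[1] and b[0] <= a[1]

# A full-length detection at t = 100 s covers 93.5..106.5 s, not 100..113 s.
long_det = detection_interval(100.0, 131072)
short_det = detection_interval(95.0, 8)
print(long_det)                                      # (93.5, 106.5)
print(intervals_overlap(long_det, short_det))        # True
print(intervals_overlap((100.0, 113.0), short_det))  # False under the old start-time reading
```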
New time factor
If a persistent signal is emanating from a point in space, we'd expect to hear it most of the time a beam is near that point. The "time factor" in a multiplet score is intended to reflect the extent to which this holds.
Our current definition of time factor doesn't take into account the duration of detections, as discussed above. This seemed wrong to me: if we have a 13-second observation, and there's a 1-millisecond detection somewhere in the middle, that shouldn't count as much as a 13-second (i.e. 128K FFT length) detection that spans the whole interval.
So I proposed a new time factor that takes detection duration into account. Suppose we've observed a pixel for a set of time intervals adding up to X seconds, and that we have a multiplet consisting of a set of detections. These detections lie (locally) in a 250 Hz frequency band. We take the union of the time intervals of these detections; say this adds up to Y seconds.
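Computing Y is a standard interval-union problem: sort the detection intervals by start time, merge any that overlap, and add up the merged lengths, so overlapping detections aren't double-counted. A minimal sketch (`union_length` is a hypothetical name):

```python
def union_length(intervals):
    """Total time covered by the union of (start, end) intervals.

    Sort by start, merge overlapping or touching intervals, and sum the
    lengths of the merged runs, so overlapping detections are not double-counted.
    """
    total = 0.0
    run_start = run_end = None
    for start, end in sorted(intervals):
        if run_end is None or start > run_end:
            # Gap before this interval: close the current run, start a new one.
            if run_end is not None:
                total += run_end - run_start
            run_start, run_end = start, end
        else:
            # Overlaps the current run: extend it if needed.
            run_end = max(run_end, end)
    if run_end is not None:
        total += run_end - run_start
    return total

print(union_length([(0.0, 2.0), (1.0, 3.0), (5.0, 6.0)]))  # 4.0
```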
The proposed time factor is the probability that, over X seconds, there's a set of detections in a 250 Hz band whose combined time is at least Y. This requires knowing the average fraction of time, over our entire data set, that a signal is present in a random 250 Hz band. I computed this; it's about 5% for spikes and Gaussians, and 50% for pulses and triplets (which generally have much longer durations).
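The post doesn't spell out how this probability is evaluated. A minimal sketch, assuming (a simplification for illustration, not necessarily Nebula's method) that the X observed seconds split into independent slots of fixed length, each containing a signal in a random 250 Hz band with probability p (about 0.05 for spikes and Gaussians, 0.5 for pulses and triplets). The chance that chance detections cover at least Y seconds is then a binomial tail. `time_factor` and its `slot` parameter are hypothetical.

```python
import math

def time_factor(x_obs, y_union, p, slot=1.0):
    """Hypothetical binomial model of the proposed time factor.

    ASSUMPTION: the x_obs seconds of observation split into independent slots
    of `slot` seconds, each holding a signal in a random 250 Hz band with
    probability p. Returns P(chance detections cover >= y_union seconds).
    """
    n = max(1, round(x_obs / slot))           # number of slots observed
    k = math.ceil(y_union / slot - 1e-9)      # slots that must be occupied
    if k <= 0:
        return 1.0
    if k > n:
        return 0.0
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))
```

Under this model a smaller value means the multiplet's detections cover more of the observation than chance would predict, and the same coverage is more significant for spikes (p = 0.05) than for pulses (p = 0.5). The slot length is a free parameter the post doesn't specify.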
I've coded this but didn't use it in the current scoring run; we'll try it out later and see how it works.
Scoring I/O efficiency
The administrators of the Atlas cluster politely told me that when I'm doing a scoring run (which uses 500-1000 cluster nodes) the resulting disk I/O is swamping the main file server. I thought about this. The scoring program reads the set of detections in the pixel from disk, but this should be pretty efficient: 9 index reads and 9 sequential reads from a pixel-sorted file. The culprit, I guessed, was reading the result angle range for each pulse and triplet detection; this uses a memory-mapped file, but each access could potentially cause a disk read.
So I changed things so that, during RFI removal, we take the result angle range (which we're already accessing at that point) and store it in an otherwise unused field of the detection records. That way we don't need to look it up in the scoring program. Hopefully this will fix the problem; it's critical that we stay in the good graces of the Atlas admins.
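The change is a form of caching: copy a value into the record during a pass that already has it, so a later pass never touches the slow store. A minimal sketch under stated assumptions: `Detection.spare` stands in for the otherwise unused field, and `AngleRangeStore` stands in for the memory-mapped angle-range file (it just counts lookups, each of which could be a disk read). All names are hypothetical, not Nebula's.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

@dataclass
class Detection:
    """Hypothetical detection record."""
    result_id: int
    kind: str                      # e.g. "spike", "pulse", "triplet"
    spare: Optional[float] = None  # otherwise unused field; caches the result angle range

class AngleRangeStore:
    """Stand-in for the memory-mapped angle-range file; counts potential disk reads."""

    def __init__(self, ranges: Dict[int, float]):
        self._ranges = ranges
        self.lookups = 0

    def get(self, result_id: int) -> float:
        self.lookups += 1
        return self._ranges[result_id]

def rfi_removal(dets: List[Detection], store: AngleRangeStore) -> None:
    # RFI removal already reads the angle range; stash it in the spare field.
    for d in dets:
        d.spare = store.get(d.result_id)
        # ... multibeam RFI tests using d.spare would go here ...

def scoring_angle_ranges(dets: List[Detection]) -> List[float]:
    # Scoring uses the cached value and never touches the angle-range store.
    return [d.spare for d in dets if d.kind in ("pulse", "triplet")]
```

With this split, the number of store lookups is fixed by the RFI removal pass, and scoring adds none, however many nodes run it.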
Eric continues to develop the way we generate spikes for birdies. Recent changes:
A nasty bug
I looked at the results of a scoring run and saw that, for a small fraction of pixels, the program was crashing. This turned out to be corruption of the malloc heap: a nightmare to find, because the corruption happens long before the crash and could be anywhere in the program.
After many highly stressful hours, I found the problem in the "weighted activity selection problem" algorithm that I added a couple of months ago. The pseudocode I based this on used FORTRAN notation, where arrays go from 1..n rather than 0..n-1.
This reminded me again that C++, even with STL and all the fancy new features, is a primitive language compared to e.g. Java or Python. It's easy to make mistakes that neither the compiler nor the runtime system can catch. Too late to change now.
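The post doesn't show the algorithm or its code. For reference, here is the textbook dynamic program for weighted activity selection (weighted interval scheduling), written 0-based in Python with comments marking where 1-based pseudocode indices shift. It is a sketch, not Nebula's C++ implementation, and `max_weight_schedule` is a hypothetical name.

```python
import bisect

def max_weight_schedule(activities):
    """Weighted activity selection (weighted interval scheduling), 0-based.

    activities: list of (start, finish, weight) tuples. Two activities are
    compatible if one finishes no later than the other starts.
    Returns (best_total_weight, chosen_activities_in_finish_order).
    """
    acts = sorted(activities, key=lambda a: a[1])  # sort by finish time
    finishes = [a[1] for a in acts]
    n = len(acts)

    # p[j]: how many activities before acts[j] finish by acts[j]'s start.
    # Because acts is sorted by finish, those are exactly acts[0..p[j]-1].
    p = [bisect.bisect_right(finishes, acts[j][0], 0, j) for j in range(n)]

    # opt[i]: best weight using only the first i activities, acts[0..i-1].
    # opt has n + 1 entries (opt[0] = 0 for the empty prefix) while acts has n,
    # so opt index i pairs with acts index i - 1. Confusing these two offsets
    # is the 1..n versus 0..n-1 trap.
    opt = [0] * (n + 1)
    for i in range(1, n + 1):
        _, _, w = acts[i - 1]
        opt[i] = max(opt[i - 1], w + opt[p[i - 1]])

    # Walk back through opt to recover which activities were taken.
    chosen = []
    i = n
    while i > 0:
        _, _, w = acts[i - 1]
        if w + opt[p[i - 1]] >= opt[i - 1]:
            chosen.append(acts[i - 1])
            i = p[i - 1]
        else:
            i -= 1
    chosen.reverse()
    return opt[n], chosen
```

In Python an index past the end raises `IndexError` (although a stray -1 silently wraps to the last element), whereas in C++ an out-of-bounds write into a heap array can silently corrupt the malloc heap, as happened here.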
Ryan Lee (a UC Berkeley student) has been working on using Nebula to analyze SERENDIP data recorded at Arecibo over the last few years. When I develop new features in Nebula I concentrate on SETI@home and sometimes they don't work for SERENDIP. I went back and fixed some of these.
©2021 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.