Scientific Newsletter - July 24, 2001
Eric Korpela, Steve Fulton
Because SETI@home is run by millions of users using many different types of computers, we often get asked how we know that everyone is getting the right answers when they process data for us.
There are several reasons why a result returned by a SETI@home volunteer might be incorrect. The most common is processor malfunction. If a processor overheats, perhaps because there is dust buildup inside the machine, or maybe it's just a really hot day, the first part of the chip to fail will be the most complex part, the floating-point unit. A failure of the floating-point unit, which is responsible for most of the calculations performed by SETI@home, will usually not cause a computer to crash. Instead, it will cause the computer to generate incorrect results. These innocent failures are responsible for most of the incorrect results we see. The most common symptom of this problem is that every result from a malfunctioning computer contains hundreds of potential signals. Of course, some valid results also contain hundreds of signals, so the signal count alone is not proof of a malfunction.
Fortunately, most incorrect results of this type contain values that could not result from a correct SETI@home calculation. By checking that the parameters of a signal are within the allowed bounds we can exclude most signals of this type before they cause any confusion.
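The bounds check described above can be sketched in a few lines of Python. This is only an illustration: the parameter names (`power`, `chirp_rate_hz_per_s`) and the numeric limits are assumptions made for the example, not the limits the project actually uses.

```python
import math

# Hypothetical bounds, chosen only for illustration. The real limits used
# by the project are not given in this newsletter.
ALLOWED_BOUNDS = {
    "power": (0.0, 1.0e6),
    "chirp_rate_hz_per_s": (-50.0, 50.0),
}


def signal_in_bounds(signal, bounds=ALLOWED_BOUNDS):
    """Return True only if every bounded parameter is present, finite, and in range."""
    for name, (low, high) in bounds.items():
        value = signal.get(name)
        if not isinstance(value, (int, float)):
            return False  # missing or non-numeric parameter
        if math.isnan(value) or math.isinf(value):
            return False  # a failing floating-point unit often produces NaN or infinity
        if not (low <= value <= high):
            return False
    return True


def filter_signals(signals, bounds=ALLOWED_BOUNDS):
    """Keep only the signals whose parameters could come from a correct calculation."""
    return [s for s in signals if signal_in_bounds(s, bounds)]
```

A signal whose power is NaN, or whose chirp rate is far outside the searched range, is rejected before it reaches any later comparison step.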
There are also a few irresponsible people who are running hacked versions of the SETI@home client that also send back bad results. Usually these results have no detected signals at all. If someone sends back thousands of results, but never finds anything, even test signals, we get suspicious.
There's a third type of incorrect result that occurs, too. Sometimes, very rarely, a computer will get the wrong answer to a calculation for no apparent reason. This appears to happen about once in every 3,000,000,000,000,000,000 calculations. If you let your computer run SETI@home for a thousand years, it would get the wrong answer about once. (Of course, by then your computer would have failed for some other reason.) But since SETI@home gets a thousand years of CPU time every day, we see one or more of these failures per day.
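The arithmetic behind these figures can be checked with a short Python sketch. It assumes a single computer running continuously at a constant rate. The calculation rate it derives is implied by the numbers in the text, not a figure stated by the project.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # about 3.16e7 seconds

CALCS_PER_ERROR = 3e18       # from the text: one error per 3,000,000,000,000,000,000 calculations
YEARS_PER_ERROR = 1000       # from the text: one error per thousand years on one computer
CPU_YEARS_PER_DAY = 1000     # from the text: CPU time the project receives each day


def implied_calcs_per_second():
    """Calculation rate a single computer must sustain for the two figures to agree."""
    return CALCS_PER_ERROR / (YEARS_PER_ERROR * SECONDS_PER_YEAR)


def expected_errors_per_day():
    """Expected number of random errors across the whole project per day."""
    return CPU_YEARS_PER_DAY / YEARS_PER_ERROR


print(f"implied rate: {implied_calcs_per_second():.2e} calculations per second")
print(f"expected random errors per day: {expected_errors_per_day():.1f}")
```

The implied rate comes out a little under 100 million calculations per second, and the expected number of random errors across the project is about one per day, which matches the figure quoted above.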
Because these errors happen, it's good to have a check of the results. Fortunately, SETI@home has enough volunteers that we can process each piece of data more than once and compare the potential signals detected by different computers to each other. We use the result of the comparison to rank our results by how confident we are that they were processed correctly. The possible outcomes of the comparison of a signal are:
1. We mark the signal as fully verified if 60% or more of the results for this work unit contain a matching signal.
2. If the signal cannot be verified we mark the signal as unverified. This can happen for two reasons. Early in the project, when we had fewer users, we were unable to process every work unit multiple times, so some early work units cannot be verified. There are also many work units that were processed by more than one version of the SETI@home client. More recent versions include analysis that was not present in the early versions, so certain signals will only be found with new versions.
3. If a signal is present in more than one of the compared work units, but less than 60%, we mark it as questionable.
4. If a signal is present in only one work unit, but should have been detected in others, we mark it as an incorrect signal.
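The four outcomes above can be expressed as a small decision function. The following Python sketch is one reading of the rules, not the project's actual verifier. It assumes we already know how many results contain a matching signal (`matching_results`) and how many results were capable of detecting the signal at all (`comparable_results`, which accounts for older client versions that lacked some analysis). Both names are hypothetical.

```python
from enum import Enum


class Status(Enum):
    VERIFIED = "verified"
    UNVERIFIED = "unverified"
    QUESTIONABLE = "questionable"
    INCORRECT = "incorrect"


VERIFY_PERCENT = 60  # threshold from the text


def classify_signal(matching_results, comparable_results):
    """Classify one signal from counts of matching and comparable results.

    matching_results:   results containing a matching signal, including the
                        result that reported it (so always at least 1).
    comparable_results: results for the work unit that could have detected
                        the signal (assumption: same or newer analysis).
    """
    if matching_results < 1 or matching_results > comparable_results:
        raise ValueError("need 1 <= matching_results <= comparable_results")
    if comparable_results < 2:
        # Nothing to compare against: processed once, or only one capable client.
        return Status.UNVERIFIED
    # Integer comparison avoids floating-point rounding at exactly 60%.
    if matching_results * 100 >= VERIFY_PERCENT * comparable_results:
        return Status.VERIFIED
    if matching_results > 1:
        return Status.QUESTIONABLE
    return Status.INCORRECT
```

For example, a signal found in 3 of 5 comparable results is verified, one found in 2 of 4 is questionable, and one found in only 1 of 3 is marked incorrect.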
Using the results of this comparison, we assign each result for a work unit a numerical score. Based upon this score, we choose the best one and copy it to our master database, where it will be examined in further stages of the SETI@home data analysis. (Don't worry, everyone who processed the work unit still receives credit for having processed it, and will share in the credit should we discover E.T.)
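The newsletter does not say how the score is computed, so the following Python sketch is purely illustrative. It assumes a hypothetical weight for each signal status, scores a result as the average weight of its signals, and picks the highest-scoring result for the work unit.

```python
# Hypothetical weights: the actual scoring formula is not described in the text.
STATUS_WEIGHTS = {
    "verified": 1.0,
    "unverified": 0.5,
    "questionable": 0.25,
    "incorrect": 0.0,
}


def score_result(signal_statuses):
    """Average the per-signal weights for one returned result.

    A result with no signals gets the neutral 'unverified' weight
    (an assumption made only for this example).
    """
    if not signal_statuses:
        return STATUS_WEIGHTS["unverified"]
    total = sum(STATUS_WEIGHTS[status] for status in signal_statuses)
    return total / len(signal_statuses)


def choose_best_result(results):
    """Return the id of the highest-scoring result.

    results: dict mapping a result id to the list of its signal statuses.
    """
    return max(results, key=lambda result_id: score_result(results[result_id]))
```

With these made-up weights, a result whose signals are all verified would be chosen over one that also contains an incorrect signal.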
So far we've run the results from 327 tapes through the result verifier. This represents 47.5% of the SETI@home database.
The verification scores will be used in later processing when choosing potential candidate signals. Those that are fully verified will be given higher priority than those that cannot be verified. Those that are marked as incorrect will not be considered further.