This paper appeared in Communications of the ACM, Vol. 45 No. 11, November 2002, pp. 56-61.
SETI@home: An Experiment in Public-Resource Computing
SETI (Search for Extraterrestrial Intelligence) is a scientific area whose goal is to detect intelligent life outside Earth [SHO98]. One approach, known as radio SETI, uses radio telescopes to listen for narrow-bandwidth radio signals from space. Such signals are not known to occur naturally, so a detection would provide evidence of extraterrestrial technology [COC59].
Radio telescope signals consist primarily of noise (from celestial sources and the receiver's electronics) and man-made signals such as TV stations, radar, and satellites. Modern radio SETI projects analyze the data digitally. This analysis generally involves three phases: computing a time-varying power spectrum, finding candidate signals in the spectra, and rejecting candidates that are probably natural or man-made interference.
More computing power enables searches to cover greater frequency ranges with more sensitivity. Radio SETI, therefore, has an insatiable appetite for computing power.
Previous radio SETI projects have used special-purpose supercomputers, located at the telescope, to do the bulk of the data analysis. In 1995, David Gedye proposed doing radio SETI using a virtual supercomputer composed of large numbers of Internet-connected computers, and he organized the SETI@home project to explore this idea. SETI@home has not found signs of extraterrestrial life. But, together with related distributed computing and storage projects, it has succeeded in establishing the viability of public-resource computing (so-called because the computing resources are provided by the general public).
Public-resource computing is neither a panacea nor a free lunch. For many tasks, huge computing power implies huge network bandwidth, and bandwidth is typically expensive or limited. This factor limited the frequency range searched by SETI@home, since greater range implies more bits per second. Compared to other radio SETI projects, SETI@home covers a narrower frequency range but does a more thorough search in that range (see Table 1).
The first challenge for SETI@home was to find a good radio telescope. The ideal choice was the telescope in Arecibo, Puerto Rico, the world's largest and most sensitive radio telescope. Arecibo is used for various astronomical and atmospheric research projects, and we could not obtain its long-term exclusive use. However, in 1997 the U.C. Berkeley SERENDIP project developed a technique for piggybacking a secondary antenna at Arecibo [WER97]. As the main antenna tracks a fixed point in the sky (under the control of other researchers), the secondary antenna traverses an arc that eventually covers the entire band of sky visible to the telescope. This data source can be used for a sky survey that covers billions of stars.
We arranged for SETI@home to share SERENDIP's data source. Unlike SERENDIP, we needed to distribute data through the Internet. At that time, Arecibo's Internet connection was a 56 Kbps modem, so we decided to record data on removable tapes (35GB DLT cartridges, the largest available at the time), have them mailed from Arecibo to our lab at U.C. Berkeley, and distribute data from servers there.
We decided to record data at 5 Mbps. This rate was low enough that the recording time per tape would be a manageable 16 hours, and it would be feasible to distribute the data through our lab's 100 Mbps Internet connection. But it was high enough to do significant science. With 1-bit complex sampling this rate yields a frequency band of 2.5 MHz - enough to handle Doppler shifts for relative velocities up to 260 km/sec, or about the galactic rotation rate (radio signals are Doppler shifted in proportion to the sender's velocity relative to the receiver). Like many other radio SETI projects, we centered our band at the Hydrogen line (1.42 GHz), within a frequency range where man-made transmissions are prohibited by an international treaty.
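The figures above can be cross-checked with a little arithmetic; the following sketch uses only the values quoted in the text (5 Mbps, 1-bit complex sampling, the 1.42 GHz center frequency, and 35 GB tapes):

```python
# Sanity-check the recording parameters quoted above.
C_KM_S = 299792.458  # speed of light, km/s

# 5 Mbps with 1-bit complex sampling -> 2.5 MHz of bandwidth:
# each complex sample is 2 bits, and complex sampling at rate B covers B Hz.
band_hz = 5e6 / 2

# Largest line-of-sight velocity whose Doppler shift (about f * v / c)
# stays within half the band around the 1.42 GHz Hydrogen line.
max_velocity_km_s = C_KM_S * (band_hz / 2) / 1.42e9

# Time to fill a 35 GB DLT cartridge at 5 Mbps.
tape_hours = 35e9 * 8 / 5e6 / 3600

print(round(max_velocity_km_s), round(tape_hours, 1))  # 264 15.6
```

Both numbers agree with the text: roughly 260 km/sec of Doppler headroom, and about 16 hours of recording per tape.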
SETI@home's computational model is simple. The signal data is divided into fixed-size work units that are distributed, via the Internet, to a client program running on numerous computers. The client program computes a result (a set of candidate signals), returns it to the server, and gets another work unit. There is no communication between clients.
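The model can be sketched in a few lines. The transport (HTTP in the real system) is abstracted into callbacks here, and all names are illustrative rather than the project's actual interfaces:

```python
# Minimal sketch of the SETI@home computational model: fetch a work
# unit, analyze it, return the result, repeat.
def run_client(fetch_work_unit, analyze, return_result, rounds):
    """One client; it talks only to the server, never to other clients."""
    for _ in range(rounds):
        work_unit = fetch_work_unit()       # e.g. HTTP GET from the server
        candidates = analyze(work_unit)     # list of candidate signals
        return_result(work_unit["id"], candidates)  # e.g. HTTP POST back

# In-memory stand-ins for the server, just to exercise the loop:
pending = [{"id": 1, "data": [0.0] * 8}, {"id": 2, "data": [1.0] * 8}]
returned = {}
run_client(fetch_work_unit=lambda: pending.pop(0),
           analyze=lambda wu: ["signal"] if any(wu["data"]) else [],
           return_result=returned.__setitem__,
           rounds=2)
# returned is now {1: [], 2: ["signal"]}
```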
SETI@home does redundant computation: each work unit is processed multiple times. This lets us detect and discard results from faulty processors and from malicious users. A redundancy level of two to three is adequate for this purpose. We generate work units at a bounded rate and never turn away a client asking for work, so the redundancy level increases with the number of clients and their average speed. These quantities have increased greatly during the life of the project. We have kept the redundancy level within the desired range by revising the client to do more computation per work unit.
The task of creating and distributing work units is done by a server complex located in our lab (see Figure 1). Our reasons for centralizing the server functions were largely pragmatic: for example, tape handling is minimized.
Figure 1: The distribution of data.
Work units are formed by dividing the 2.5 MHz signal into 256 frequency bands, each about 10 kHz wide. Each band is then divided into 107-second segments, overlapped in time by 20 seconds. This ensures that the signals we're looking for (which last up to 20 seconds; see below) are contained entirely in at least one work unit. The resulting work units are 350 KB - enough data to keep a typical computer busy for about a day, but small enough to download over a slow modem in a few minutes.
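The slicing can be sketched directly from the figures above; the point of the 20-second overlap is that it equals the longest signal searched for, so any such signal lies entirely within at least one segment:

```python
# Work-unit slicing: 256 bands of the 2.5 MHz recording, each cut into
# 107-second segments whose start times step by 107 - 20 = 87 seconds.
BAND_HZ = 2.5e6 / 256            # ~9.77 kHz per band
SEG_LEN_S = 107
OVERLAP_S = 20
STEP_S = SEG_LEN_S - OVERLAP_S   # 87 s between segment starts

def segments(total_seconds):
    """(start, end) times of one band's overlapping segments."""
    return [(t, t + SEG_LEN_S)
            for t in range(0, total_seconds - SEG_LEN_S + 1, STEP_S)]

def contains_whole(signal_start, signal_len=OVERLAP_S, total_seconds=1000):
    """True if a signal of the given length fits entirely in some segment."""
    return any(s <= signal_start and signal_start + signal_len <= e
               for s, e in segments(total_seconds))
```

Because consecutive segments overlap by exactly the maximum signal duration, `contains_whole` holds for every start time the recording covers.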
We use a relational database to store information about tapes, work units, results, users, and other aspects of the project. We developed a multithreaded data/result server to distribute work units to clients. It uses an HTTP-based protocol so that clients inside firewalls can contact it. Work units are sent in least-recently-sent order.
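Least-recently-sent ordering can be sketched with a priority queue keyed on a monotonically increasing "send clock"; class and method names here are illustrative:

```python
# Hand out work units in least-recently-sent order.
import heapq
import itertools

class WorkUnitQueue:
    def __init__(self, work_units):
        self._clock = itertools.count()   # monotonic send counter
        self._heap = [(next(self._clock), wu) for wu in work_units]
        heapq.heapify(self._heap)

    def next_to_send(self):
        _, wu = heapq.heappop(self._heap)                    # sent longest ago
        heapq.heappush(self._heap, (next(self._clock), wu))  # now most recent
        return wu

queue = WorkUnitQueue(["wu-a", "wu-b", "wu-c"])
sent = [queue.next_to_send() for _ in range(6)]
# Never-sent units go out first; thereafter the order cycles fairly.
```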
A garbage collector program removes work units from disk, clearing an on-disk flag in their database records; we have experimented with two policies for deciding when a work unit becomes eligible for removal.
Keeping the server system running has been the most difficult and expensive part of SETI@home. The sources of failure, both hardware and software, have seemed limitless. We have converged to an architecture that minimizes dependencies between server subsystems. For example, the data/result server can be run in a mode where, instead of using the database to enumerate work units to send, it gets this information from a disk file. This lets us distribute data even when the database is down.
The client program repeatedly gets a work unit from the data/result server, analyzes it, and returns the result (a list of candidate signals) to the server. It needs an Internet connection only while communicating with the server. The client can be configured to compute only when its host is idle, or to compute constantly at a low priority. The program periodically writes its state to a disk file, and reads this file on startup; hence it makes progress even if the host is frequently turned off.
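The checkpoint/restart behavior might be sketched as follows; the file name, state layout, and checkpoint interval are hypothetical:

```python
# Periodically write analysis state to disk; resume from it on startup.
import json
import os

CHECKPOINT = "state.sah"   # hypothetical checkpoint file name

def load_state():
    """Resume from the last checkpoint, or start fresh."""
    if os.path.exists(CHECKPOINT):
        with open(CHECKPOINT) as f:
            return json.load(f)
    return {"next_step": 0, "signals": []}

def save_state(state):
    """Write to a temp file, then rename, so a crash can't corrupt it."""
    tmp = CHECKPOINT + ".tmp"
    with open(tmp, "w") as f:
        json.dump(state, f)
    os.replace(tmp, CHECKPOINT)

def analyze(total_steps=10, checkpoint_every=3):
    state = load_state()
    while state["next_step"] < total_steps:
        state["next_step"] += 1            # one unit of analysis work
        if state["next_step"] % checkpoint_every == 0:
            save_state(state)
    return state
```

Writing to a temporary file and renaming makes each checkpoint update atomic, so a host that is switched off mid-write still restarts from the previous checkpoint.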
Analyzing a work unit involves computing signal power as a function of frequency and time, then looking for several types of patterns in this power function: spikes (short bursts), Gaussians (narrow-bandwidth signals with a 20-second Gaussian envelope, corresponding to the telescope's beam movement across a point), pulsed signals (Gaussian signals pulsed with arbitrary period, phase, and duty cycle), and triplets (three equally-spaced spikes at the same frequency; a simple pulsed signal). Signals whose power and goodness-of-fit exceed thresholds are recorded in the output file.
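As an illustration, the simplest of these patterns, the triplet, can be detected with a few lines. The power array and threshold below are toy stand-ins for the client's computed power function:

```python
# Triplet search: three equally spaced power spikes at one frequency.
def find_spikes(power, threshold):
    """Time steps where the power at this frequency exceeds the threshold."""
    return [t for t, p in enumerate(power) if p > threshold]

def find_triplets(power, threshold):
    """All (t1, t2, t3) among the spikes with t2 - t1 == t3 - t2."""
    spikes = find_spikes(power, threshold)
    hits = set(spikes)
    triplets = []
    for i, t1 in enumerate(spikes):
        for t3 in spikes[i + 2:]:        # the midpoint must lie between them
            mid, rem = divmod(t1 + t3, 2)
            if rem == 0 and mid in hits:
                triplets.append((t1, mid, t3))
    return triplets
```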
Outer loops vary two parameters: the assumed Doppler drift rate of the candidate signal and the frequency resolution of the analysis [KOR01].
Figure 2: The SETI@home display, showing the power spectrum currently being computed (bottom) and the best-fit Gaussian (left).
The SETI@home client program is written in C++. The code consists of a platform-independent framework for distributed computing (6,423 lines), components with platform-specific implementations, such as the graphics library (2,058 lines in the UNIX version), SETI-specific data analysis code (6,572 lines), and SETI-specific graphics code (2,247 lines).
The client has been ported to 175 different platforms. The GNU tools, including gcc and autoconf, have greatly facilitated this task. We maintain the Windows, Macintosh, and SPARC/Solaris versions ourselves; all other porting is done by volunteers.
The client can run as a background process, as a GUI application, or as a screensaver. To support these different modes on multiple platforms, we use an architecture in which one thread does communication and data processing, a second thread handles GUI interactions, and a third thread (perhaps in a separate address space) renders graphics based on a shared-memory data structure.
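A toy version of this split, with a lock-protected dictionary standing in for the shared-memory data structure (thread roles and field names are illustrative):

```python
# Worker thread updates shared state; renderer thread reads snapshots.
import threading
import time

shared = {"progress": 0.0, "best_gaussian": None}
lock = threading.Lock()
done = threading.Event()

def worker():
    """Communication and data processing: writes the shared state."""
    for step in range(1, 101):
        with lock:
            shared["progress"] = step / 100
    done.set()

def renderer(frames):
    """Graphics: periodically takes a consistent snapshot and draws it."""
    snapshot = {}
    for _ in range(frames):
        with lock:
            snapshot = dict(shared)  # copy under the lock, draw outside it
        time.sleep(0.001)            # a real renderer would draw here
    return snapshot
```

In the real client the renderer may live in a separate address space, sharing the structure via shared memory rather than a Python dict.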
Results are returned to the SETI@home server complex, where they are recorded and analyzed (see Figure 3).
Figure 3: The collection and analysis of results.
Handling a result consists of two tasks: a scientific task (recording the result's candidate signals in the database) and an accounting task (crediting the user and updating summary statistics).
By buffering updates in disk files, the server system can handle periods of database outage and overload.
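A sketch of this buffering, with illustrative names and a generic `db` object standing in for the relational database:

```python
# Buffer database updates in a disk file during outages; replay later.
import json
import os

BUFFER = "pending_updates.log"   # hypothetical buffer file name

def record_result(db, result):
    """Try the database first; append to the on-disk buffer on failure."""
    try:
        db.insert(result)
    except ConnectionError:
        with open(BUFFER, "a") as f:
            f.write(json.dumps(result) + "\n")

def flush_buffer(db):
    """Replay buffered updates once the database is back up."""
    if not os.path.exists(BUFFER):
        return 0
    with open(BUFFER) as f:
        pending = [json.loads(line) for line in f]
    for result in pending:
        db.insert(result)
    os.remove(BUFFER)
    return len(pending)
```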
Eventually, each work unit has a number of results in the database. A redundancy elimination program examines each group of redundant results - which may differ in number of signals and signal parameters - and uses an approximate consensus policy to choose a "canonical" result for that work unit. Canonical results are copied to a separate database.
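An approximate-consensus choice might look like the following sketch; the signal representation, tolerance, and matching rule are illustrative, not the project's actual policy:

```python
# Pick a canonical result: the one agreeing with the most other results.
def signals_match(a, b, tol=0.01):
    """Two candidate signals agree if all their parameters are close."""
    return len(a) == len(b) and all(abs(x - y) <= tol for x, y in zip(a, b))

def results_agree(r1, r2, tol=0.01):
    """Results agree if every signal in one has a close match in the other."""
    return (len(r1) == len(r2)
            and all(any(signals_match(s, t, tol) for t in r2) for s in r1))

def canonical_result(results):
    """The result supported by the largest number of redundant results."""
    return max(results, key=lambda r: sum(results_agree(r, o) for o in results))
```

Results from faulty processors or malicious users tend to disagree with the honest majority, so they lose this vote and never become canonical.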
The final phase, back-end processing, consists of several steps. To verify the system, we check for the test signals injected at the telescope. Man-made signals (RFI) are identified and eliminated. We look for signals with similar frequency and sky coordinates detected at different times. These "repeat signals", as well as one-time signals of sufficient merit, are investigated further, potentially leading to a final cross-check by other radio SETI projects according to an accepted protocol [DOP90].
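The repeat-signal search might be sketched as a pairwise match on frequency and sky position; the tolerances and record layout here are illustrative:

```python
# Find pairs of detections at similar frequency and sky position but
# different observation times.
def repeats(signals, freq_tol_hz=100.0, sky_tol_deg=0.1):
    """Each signal: {'freq': Hz, 'ra': deg, 'dec': deg, 't': seconds}."""
    hits = []
    for i, a in enumerate(signals):
        for b in signals[i + 1:]:
            if (abs(a["freq"] - b["freq"]) <= freq_tol_hz
                    and abs(a["ra"] - b["ra"]) <= sky_tol_deg
                    and abs(a["dec"] - b["dec"]) <= sky_tol_deg
                    and a["t"] != b["t"]):
                hits.append((a, b))
    return hits
```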
The public response to SETI@home
We announced plans for SETI@home in 1998, and 400,000 people pre-registered during the next year. In May 1999 we released the Windows and Macintosh versions of the client. Within a week about 200,000 people had downloaded and run the client, and this number has grown to over 3.83 million as of July 2002. People in 226 countries run SETI@home; about half are in the U.S.
In the 12 months starting in July 2001 SETI@home participants processed 221 million work units. The average throughput during that period was 27.36 TeraFLOPS. Overall, the computation has performed 1.7 × 10^21 floating-point operations, the largest computation on record.
SETI@home relies primarily on mass-media news coverage and word-of-mouth to attract participants. Our web site (http://setiathome.berkeley.edu) explains the project, lets users download the client program, and provides scientific and technical news.
The web site shows leader boards (based on work units processed) for individuals and for groupings such as countries and email domains. Users can form teams that compete within categories; 97,000 such teams exist. Leader-board competition (between individuals, teams, owners of different computer types, and so on) has helped attract and retain participants. In addition, users are recognized on the web site and thanked by email when they pass work unit milestones.
We have tried to foster a SETI@home community in which users can exchange information and opinions. The web site lets users submit profiles and pictures of themselves, and has an online poll with questions concerning demographics, SETI, and distributed computing (of the 95,000 users who have completed this poll, for example, 93% are male). We assisted in the creation of a newsgroup, sci.astro.seti, devoted largely to SETI@home. Users have created various ancillary software, such as proxy data servers and systems for graphically displaying work progress. Our web site contains links to these contributions. Users have translated the web site into 30 languages.
We have worked to prevent our client program from acting as a vector for software viruses, and thus far we have succeeded: our code download server has not (as far as we know) been penetrated, and the client program does not download or install code. There have been two noteworthy attacks. Our web server was compromised, but only as a prank; the hackers did not, for example, install a Trojan-horse download page. Later, exploiting a design flaw in our client/server protocol, hackers obtained some user email addresses. We fixed the flaw, but not until thousands of addresses had been stolen. In addition, a user developed an email-propagated virus that downloads and installs SETI@home on the infected computer, configuring it to give credit to his SETI@home account. This might have been prevented by requiring a manual step in the install process.
Conversely, we have had to protect SETI@home from misbehaving and malicious participants. There have been many instances of this, although only a tiny fraction of participants are involved. A relatively benign example: some users modified the client executable to improve its performance on specific processors. We didn't trust the correctness of such modifications, and we didn't want SETI@home to be used in "benchmark wars", so we adopted a policy banning modifications.
We have discovered and defeated many schemes for getting undeserved work credit, and some users have deliberately sent erroneous results. These activities are hard to prevent if users can run the client program under a debugger, analyze its logic, and obtain embedded encryption keys [MOL00]. Our redundancy checking, together with the error tolerance of our computing task, is sufficient for dealing with the problem; other mechanisms have been proposed [SAR01].
Conclusions
Public-resource computing relies on personal computers with excess capacity, such as idle CPU time. The idea of using these cycles for distributed computing was pioneered by the Worm computation project at Xerox PARC, which used workstations within a research lab [SHO82], and was later explored by academic projects such as Condor.
Large-scale public-resource computing became feasible with the growth of the Internet in the 1990s. Two major public-resource projects predate SETI@home. The Great Internet Mersenne Prime Search (GIMPS), which searches for prime numbers, started in 1996. Distributed.net, which demonstrates brute-force decryption, started in 1997. More recent applications include protein folding (folding@home) and drug discovery (the Intel-United Devices Cancer Research Project).
Several efforts are underway to develop general-purpose frameworks for public-resource and other large-scale distributed computing. Projects collectively called The Grid are developing systems for resource-sharing between academic and research organizations [FOS99]. Private companies such as Platform Computing, Entropia and United Devices are developing systems for distributed computation and storage in both public and organizational settings.
On a more general level, public-resource computing is an aspect of the "peer-to-peer paradigm", which involves shifting resource-intensive functions from central servers to workstations and home PCs [ORA01].
What tasks are amenable to public-resource computing? There are several factors. First, the task should have a high computing-to-data ratio. Each SETI@home work unit takes 3.9 trillion floating-point operations, or about 10 hours on a 500 MHz Pentium II, yet involves only a 350 KB download and a 1 KB upload. This high ratio keeps server network traffic at a manageable level, and imposes minimal load on client networks. Applications such as computer graphics rendering require large amounts of data per unit of computation, perhaps making them unsuitable for public-resource computing. However, falling bandwidth costs will ease this problem, and multicast techniques can reduce costs when much of the data is constant across work units.
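Worked out explicitly, the ratio quoted above comes to roughly ten million floating-point operations per byte transferred:

```python
# Computing-to-data ratio of one SETI@home work unit, from the figures
# quoted above.
FLOPS_PER_UNIT = 3.9e12            # 3.9 trillion floating-point operations
BYTES_MOVED = 350 * 1024 + 1024    # ~350 KB download + ~1 KB upload

ops_per_byte = FLOPS_PER_UNIT / BYTES_MOVED
print(f"{ops_per_byte:.1e} operations per byte")  # about 1.1e+07
```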
Second, tasks with independent parallelism are easier to handle. SETI@home work unit computations are independent, so participant computers never have to wait for or communicate with one another. If a computer fails while processing a work unit, the work unit is eventually sent to another computer. Applications that require frequent synchronization and communication between nodes have been parallelized using hardware-based approaches such as shared-memory multiprocessors, and more recently via software-based cluster computing, such as PVM [SUN90]. Public-resource computing, with its frequent computer outages and network disconnections, seems ill-suited to these applications. However, scheduling mechanisms that find and exploit groups of LAN-connected machines may eliminate these difficulties.
Third, tasks that tolerate errors are more amenable to public-resource computing. For example, if a SETI@home work unit is analyzed incorrectly or not at all, it affects the overall goal only slightly. Furthermore, the omission is remedied when the telescope scans the same point in the sky.
Finally, a public-resource computing project must attract participants. There are currently enough Internet-connected computers for about 100 projects the size of SETI@home, and interesting and worthwhile projects have been proposed in areas such as global climate modeling and ecological simulation. To gain users, we believe that projects must explain and justify their goals, and must provide compelling views of local and global progress. Screensaver graphics are an excellent medium for this display, and also provide a form of viral marketing. The success of public-resource computing projects will have the ancillary benefits of increasing public awareness of science and democratizing, to an extent, the allocation of research resources.
Acknowledgements
SETI@home was conceived by David Gedye. Woody Sullivan, Craig Kasnov, Kyle Granger, Charlie Fenton, Hiram Clawson, Peter Leiser, Eric Heien, and Steve Fulton have contributed ideas and effort to SETI@home. Thanks to our participants, and thanks to The Planetary Society, Sun Microsystems, the U.C. DiMI program, Fujifilm, Quantum, Informix, Network Appliances, and the other organizations and individuals who have supported SETI@home.