How much RAC does SETI@home need for pseudorealtime analysis? 

Message boards : Number crunching : How much RAC does SETI@home need for pseudorealtime analysis?

Greetings folks, I'm new to the whole BOINC scene so please forgive my ignorance. I have a technical question about the SETI@home project itself:  
ID: 802629 ·  
"Greetings folks, I'm new to the whole BOINC scene so please forgive my ignorance. I have a technical question about the SETI@home project itself:"

Not sure if I totally understand your question. The data is collected at Arecibo Observatory and shipped by courier, on 750 GB HDDs, to the Space Sciences Laboratory at UC Berkeley. (It used to be DLT tapes, and the project staff still refer to a recording session on disk as a "tape".) Of course, the data received at Arecibo may have taken thousands of years to arrive at the telescope from some distant star system.

HTH, Keith
____________
Sir Arthur C Clarke 1917-2008
ID: 802700 ·  
"Greetings folks, I'm new to the whole BOINC scene so please forgive my ignorance. I have a technical question about the SETI@home project itself:"

Interesting question. To rephrase: how fast do we have to crunch to stay "caught up" with data from the telescope?

The problem is that the telescope (or the receiver) may be offline for extended periods. I'm sure someone who watches can tell us how many actual "observing days" or "recording days" are available in the average year.

I suspect that we're actually crunching faster, on average, than data arrives.
ID: 802708 ·  
Sorry, let me clarify: Assume for a second I had an infinite amount of money to spend on computing power (which I don't, I'm just curious), and I wanted to process the entire batch in the same amount of time as it would normally take for another batch to arrive. How much computing power (in units of RAC) would I have to purchase to do this? A hypothetical answer would be: "Well dude, we get a new batch every 'X' days. So if you wanted to be able to complete an entire batch worth of processing in X or less days, given infinite bandwidth and server capacity on our end, you would need a sustained RAC of about 'Y' to do it." EDIT: Ned Ludd worded it even better than I did the second time around too...  
ID: 802710 ·  
If what you mean is to process it as fast as the telescope can collect it (when it's actually collecting data which SAH can use), you're talking orders of magnitude more crunching horsepower than what we have currently available.  
ID: 802712 ·  
"If what you mean is to process it as fast as the telescope can collect it (when it's actually collecting data which SAH can use), you're talking orders of magnitude more crunching horsepower than what we have currently available."

Thank you Alinator. From the mention of Joe's name I was able to track down a post: http://setiathome.berkeley.edu/forum_thread.php?id=45805&nowrap=true#723956

From there I got some numbers to crunch...

If the receiver picks up 151,793 work units / hour (per Joe), and receives for an average of 9.5 hours on any given day (per Joe), and I assume that the average multibeam work unit leads to 75 claimed credits, then we have 108,152,512.5 credits / day of work before replication. If I then assume that a work unit needs to be crunched an average of 2.5 times, I come up with an average workload of 270,381,281.25 credits / day.

From http://boincstats.com/stats/project_graph.php?pr=sah : SETI@home is currently calculating with a RAC of 45,648,330. Since each host that crunches a work unit also claims credit for it, the RAC figure already includes replicated work, so it can be compared directly against the replicated workload with no further adjustment for the average 2.5 crunches per work unit.

Therefore, SETI@home is currently calculating at about 17% of the rate needed for "pseudorealtime" analysis.

Where I could use more clarification at this point, assuming my calculations aren't completely out in left field, is a better approximation for the average number of credits per work unit and for the average number of times a work unit is likely to be processed before being retired.

Of course this does not apply to Astropulse, which I guess is the more important figure to want to know at this point.
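The arithmetic in this post can be reproduced with a short sketch. Every input is a figure or assumption quoted in the post (75 credits per work unit and 2.5 crunches per work unit are the poster's guesses); the variable and function names are made up for illustration.

```python
# First-pass estimate; all inputs are the poster's figures or assumptions.
WUS_PER_HOUR = 151_793      # multibeam work units recorded per hour (per Joe)
HOURS_PER_DAY = 9.5         # average recording hours per day (per Joe)
CREDITS_PER_WU = 75         # assumed average claimed credit per work unit
REPLICATION = 2.5           # assumed average crunches per work unit
PROJECT_RAC = 45_648_330    # project-wide RAC quoted from boincstats


def credits_per_day(wus_per_day, credits_per_wu, replication):
    """Credits per day needed to keep up with the recorder."""
    return wus_per_day * credits_per_wu * replication


wus_per_day = WUS_PER_HOUR * HOURS_PER_DAY
needed = credits_per_day(wus_per_day, CREDITS_PER_WU, REPLICATION)
fraction = PROJECT_RAC / needed

print(f"{needed:,.2f} credits/day needed; current RAC covers {fraction:.0%}")
```

Because RAC is itself a credits-per-day average, dividing it by the daily workload gives the "fraction of real time" directly.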
ID: 802745 ·  
"If what you mean is to process it as fast as the telescope can collect it (when it's actually collecting data which SAH can use), you're talking orders of magnitude more crunching horsepower than what we have currently available."

Before Eric's credit adjustment, I believe the average credit per task was 53. If the 15% reduction target is reached, that could be adjusted to 46 per task. And going by the "tasks/workunits to be purged" figures, it would seem only ~8% of workunits need to be replicated more than twice.
ID: 802777 ·  
Ah, cool. The revised estimate is then 137,973,765.28 credits / day, which means SETI@home actually has about 33% of the computing power it needs for the standard multibeam project.  
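A minimal sketch of how the revised figure follows from the two corrections (46 credits per task, ~8% of workunits replicated more than twice). The replication factor of 2.08 assumes those 8% need exactly one extra result, which matches the 2.08 used in later posts of this thread; the names are illustrative only.

```python
# Revised inputs: 46 credits/task and ~8% of WUs needing a third result.
WUS_PER_DAY = 151_793 * 9.5          # peak-day multibeam WUs (per Joe)
CREDITS_PER_WU = 46                  # average credit after the adjustment
EXTRA_REPLICATION_SHARE = 0.08       # share of WUs replicated more than twice
PROJECT_RAC = 45_648_330             # project-wide RAC from boincstats

# Assumption: every WU gets 2 results, and the 8% get exactly one more.
replication = 2 + EXTRA_REPLICATION_SHARE

needed = WUS_PER_DAY * CREDITS_PER_WU * replication
fraction = PROJECT_RAC / needed

print(f"replication factor: {replication:.2f}")
print(f"{needed:,.2f} credits/day needed; current RAC covers {fraction:.0%}")
```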
ID: 802793 ·  
if the 33% is about accurate and I take a look at the Berkeley data density page.  
ID: 802803 ·  
Hmmmm...  
ID: 802846 ·  
"if the 33% is about accurate and I take a look at the Berkeley data density page."

The multibeam recorder was installed in late June 2006, so about 2.17 calendar years ago. Then there was the nearly 9 months of downtime for repainting, which perhaps should be subtracted. I'd call it 17 months of operation, perhaps 520 days. Based on what's been split, roughly 36.5% of days produced some data.

The MB group count divided by 520 days gives nearly 1293 groups per day on average. That's 331008 WUs per day, and reflects the MB work which has been done (and is being done now or in queue). The real question is how much data has been returned from Arecibo and not yet split. Some is kept locally and some put into storage at LBNL HPSS; we don't know even an approximate total.

The MB group count multiplied by about 6.38 approximates how many AstroPulse WUs could be made from the same data; call it 2.8 million. As a very rough guess we've done maybe 50 thousand of those. The factor for Line Feed work (all called Classic on the data density page) is about 5.96 since it had more overlap than MultiBeam; there's another potential 6.9 million AP WUs.

My feeling is that we have about the crunching power needed to keep up with incoming Arecibo data IF it were all Very High Angle Range (shorty) WUs and the project didn't go into meltdown. For the long term mix of angle ranges and assuming the same data is split for AP, I think we only have about one half to one third of the real time capability.

OTOH, if Arecibo in future is only funded for looking for Near Earth Asteroids with the radar, there won't be any ALFA data to record.

Joe
ID: 802859 ·  
"Josef W. Segur" wrote:
OK then, we have a new round of estimates, now including Astropulse. The new assumptions are that the average Astropulse work unit yields 750 credits and that this credit takes 25% longer to earn than the equivalent credit from multibeam work units. Two different amounts are computed: the average credit needed per day and the peak credit needed per day.

RF = Replication Factor (how many times a work unit has to be processed before being retired)
AP AC adjustment factor = how long an AP credit takes to process vs. an MB credit

Average Credit Per Day Needed

331,008 MB WUs / day
331,008 * 6.38 (AP WU factor) = 2,111,831.04 AP WUs / day
Average MB WU: 46 credits
Average AP WU: 750 credits
15,226,368 MB credits/day * replication factor 2.08 = 31,670,845.44
1,583,873,280 AP credits/day * replication factor 2.08 * AP AC adjustment factor 1.25 = 4,118,070,528
MB workload: 31,670,845.44 credits / day
AP workload: 4,118,070,528 credits / day
SETI@Home Average Credit Needed / day = 4,149,741,373.44
SETI@Home RAC (08-28-2008) = 45,648,330.00
Which is 1.1% of total needed

Peak Credit Per Day Needed

151,793 MB WUs / hour * 9.5 hours/day = 1,442,033.5 MB WUs / day
1,442,033.5 * 6.38 = 9,200,173.73 AP WUs / day
Average MB WU: 46 credits
Average AP WU: 750 credits
66,333,541 MB credits/day * replication factor 2.08 = 137,973,765.28
6,900,130,297.5 AP credits/day * replication factor 2.08 * AP AC adjustment factor 1.25 = 17,940,338,773.5
MB workload: 137,973,765.28 credits / day
AP workload: 17,940,338,773.5 credits / day
SETI@Home Peak Credit Needed / day = 18,078,312,538.78
SETI@Home RAC (08-28-2008) = 45,648,330.00
Which is 0.25% of total needed

In summary, we are running at up to 1.1% of the capacity required to process ALL of SETI@Home's workload as fast as new data is sampled. Also, Astropulse would represent over 99% of the entire workload.

"Josef W. Segur" wrote:
The calculations don't seem to support Joe's conclusions, but I suspect that is because the average credit / AP WU prediction is way off. I estimated this value from the few AP WUs I saw go through my queue before I switched to optimized clients.
ID: 802910 ·  
... The 6.38 multiplier is for a group of MB WUs: the data for each group of 256 MB WUs can also produce 6.38 AP WUs. So to calculate from individual MB WUs it's 6.38/256 ~= 0.025. Then 331,008 MB WUs / day gives about 8275 AP WUs / day.

Joe
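As a quick check of the per-group conversion described here, the sketch below applies the 6.38 factor per group of 256 MB WUs rather than per WU. The numbers are Joe's; the variable names are illustrative. It also shows why the rounded factor 0.025 gives "about 8275" while the unrounded ratio gives a slightly lower value.

```python
MB_WUS_PER_GROUP = 256      # MB work units in one group (per Joe)
AP_WUS_PER_GROUP = 6.38     # AP work units obtainable from the same data
MB_WUS_PER_DAY = 331_008    # average MB work units per day

groups_per_day = MB_WUS_PER_DAY / MB_WUS_PER_GROUP
ap_per_mb = AP_WUS_PER_GROUP / MB_WUS_PER_GROUP      # a little under 0.025

ap_exact = groups_per_day * AP_WUS_PER_GROUP         # unrounded ratio
ap_rounded = MB_WUS_PER_DAY * 0.025                  # ratio rounded to 0.025

print(f"groups/day: {groups_per_day:.0f}")
print(f"AP WUs per MB WU: {ap_per_mb:.4f}")
print(f"AP WUs/day: {ap_exact:.1f} (exact ratio), {ap_rounded:.1f} (using 0.025)")
```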
ID: 803193 ·  
Oh, wow! I redid the calculations using this information and they are now in line with Joe's estimates:

RF = Replication Factor (how many times a work unit has to be processed before being retired)
AP AC adjustment factor = how long an AP credit takes to process vs. an MB credit

Average Credit Per Day Needed

331,008 MB WUs / day
(331,008/256) * 6.38 (AP WU factor) = 8249 AP WUs / day
Average MB WU: 46 credits
Average AP WU: 750 credits
15,226,368 MB credits/day * replication factor 2.08 = 31,670,845.44
6,186,750 AP credits/day * replication factor 2.08 * AP AC adjustment factor 1.25 = 16,085,550
MB workload: 31,670,845.44 credits / day
AP workload: 16,085,550 credits / day
SETI@Home Average Credit Needed / day = 47,756,395.44
SETI@Home RAC (08-28-2008) = 45,648,330.00
Which is 95.6% of total needed

Peak Credit Per Day Needed

151,793 MB WUs / hour * 9.5 hours/day = 1,442,033.5 MB WUs / day
(1,442,033.5/256) * 6.38 = 35,938 AP WUs / day
Average MB WU: 46 credits
Average AP WU: 750 credits
66,333,541 MB credits/day * replication factor 2.08 = 137,973,765.28
26,953,500 AP credits/day * replication factor 2.08 * AP AC adjustment factor 1.25 = 70,079,100
MB workload: 137,973,765.28 credits / day
AP workload: 70,079,100 credits / day
SETI@Home Peak Credit Needed / day = 208,052,865.28
SETI@Home RAC (08-28-2008) = 45,648,330.00
Which is 22% of total needed

In summary, we are running at 22% of the capacity required to process ALL of SETI@Home's workload as fast as new data is normally sampled. Of this workload, Astropulse represents about 34%. However, SETI@Home currently has 95.6% of the computing power it needs over a sustained period of time.
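The two scenarios above can be reproduced with one small function. This is only a sketch of the poster's model: the 46 and 750 credit averages, the 2.08 replication factor and the 1.25 AP adjustment are the assumptions stated in the post, and the function and parameter names are invented for illustration. The AP WU count is rounded to a whole number, as the post does.

```python
PROJECT_RAC = 45_648_330  # project-wide RAC quoted in the post


def daily_workload(mb_wus_per_day, mb_credit=46, ap_credit=750,
                   replication=2.08, ap_adjust=1.25,
                   ap_per_group=6.38, mb_per_group=256):
    """Return (MB credits/day, AP credits/day) under the post's assumptions."""
    # 6.38 AP WUs come from each group of 256 MB WUs; round as the post does.
    ap_wus_per_day = round(mb_wus_per_day / mb_per_group * ap_per_group)
    mb = mb_wus_per_day * mb_credit * replication
    ap = ap_wus_per_day * ap_credit * replication * ap_adjust
    return mb, ap


scenarios = {
    "average": 331_008,        # long-term average MB WUs/day
    "peak": 151_793 * 9.5,     # MB WUs on a full recording day
}

report = {}
for name, mb_wus in scenarios.items():
    mb, ap = daily_workload(mb_wus)
    total = mb + ap
    report[name] = (total, PROJECT_RAC / total, ap / total)
    print(f"{name}: {total:,.2f} credits/day needed, "
          f"RAC covers {PROJECT_RAC / total:.1%}, AP share {ap / total:.0%}")
```

Keeping the assumptions as keyword arguments makes it easy to see how sensitive the result is to the guessed 750 credits per AP work unit.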
ID: 803255 ·  
"The 6.38 multiplier is for a group of MB WUs: the data for each group of 256 MB WUs can also produce 6.38 AP WUs. So to calculate from individual MB WUs it's 6.38/256 ~= 0.025. Then 331,008 MB WUs / day gives about 8275 AP WUs / day."

Ah, this gives a markedly different result... But there's still something that doesn't seem to add up with the numbers...

If the project uses on average 2.1 results per WU, this means:
331008 SETI WUs/day gives 695117 tasks/day. With 365000 bytes/task, this means 23.5 Mbit/s of download usage.
8275 Astropulse WUs/day gives 17378 tasks/day. With 8 million bytes/task, this means 12.8 Mbit/s of download usage.

But, looking at the Cricket graph, outgoing bandwidth was at around 20 Mbit/s in May 2007, and had steadily increased to around 45 Mbit/s in July, before the release of Astropulse. Now, this is by no means exact, since application downloads and some other traffic also show up on the graph. Still, the usage in July is roughly 2x what would be needed to run 700k tasks/day...

Looking at Scarecrow's graphs, September 2007 (the first month with the full "wait purge") shows on average 323396 WUs in the purge queue, and 702442 results. If there is a 24-hour wait before purging, these numbers are a good indication of how many WUs/day and results/day are processed. It also shows on average 8.449 results/second generated, which means on average 729993 results/day.

For later months, the lowest was November 2007, with 351k WUs and 779k results in the purge queue, and 763k results/day generated.

For June 2008, this had increased to 553840 WUs, 1164483 results and 1191560 generated/day, and, based on returned results/hour, 1187016 results/day were returned. These numbers also indicate an average download rate of 39 Mbit/s.

July 2008 gives a large spike, with 767670 WUs, 1.6 M results, 1.25 M created and 1.25 M returned. Since the number of results waiting to be purged is much higher than the number generated and returned, this may indicate a batch of "bad" WUs or something similar, so I wouldn't read too much into these figures.

Looking at the first half of 2008, the averages are 515k WUs, 1.1 M results, 1.1 M created and 1.1 M returned, and 37 Mbit/s. Also worth mentioning: on average 2.14 results per WU.

While I have no idea how accurate the graphs really are, the numbers seem to fit with each other, and also seem to be fairly close to the bandwidth usage indicated on the Cricket graph...

So, if 331k WUs/day recorded is accurate, at least to me it looks like SETI@home has been crunching roughly 50% more WUs/day than the recording capacity...

How the addition of Astropulse will influence things is less certain, since I have no idea how many "short" SETI WUs there are on average...
____________
"I make so many mistakes. But then just think of all the mistakes I don't make, although I might."
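The bandwidth figures in this post follow from tasks per day times bytes per task, spread evenly over 24 hours. Below is a minimal sketch using the post's inputs; it assumes decimal megabits (1 Mbit = 10^6 bits) and counts only work unit downloads, and the function name is illustrative.

```python
SECONDS_PER_DAY = 86_400


def download_mbit_per_s(wus_per_day, results_per_wu, bytes_per_task):
    """Average download rate in Mbit/s if tasks are sent evenly over the day."""
    tasks_per_day = wus_per_day * results_per_wu
    bits_per_day = tasks_per_day * bytes_per_task * 8
    return bits_per_day / SECONDS_PER_DAY / 1e6


mb_rate = download_mbit_per_s(331_008, 2.1, 365_000)    # multibeam tasks
ap_rate = download_mbit_per_s(8_275, 2.1, 8_000_000)    # Astropulse tasks

# 8.449 results generated per second, expressed per day
results_per_day = 8.449 * SECONDS_PER_DAY

print(f"MB: {mb_rate:.1f} Mbit/s, AP: {ap_rate:.1f} Mbit/s")
print(f"results/day at 8.449 per second: {results_per_day:,.0f}")
```

With these inputs the Astropulse rate comes out just under 12.9 Mbit/s, so the 12.8 quoted in the post is presumably a truncated value.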
ID: 803301 ·  
Copyright © 2016 University of California 