Message boards :
Number crunching :
What does SETI need? BOINC?
Author | Message |
---|---|
HAL9000 Send message Joined: 11 Sep 99 Posts: 6534 Credit: 196,805,888 RAC: 57 |
..that pricing hasn't been announced. Set up a RAID10 of 16 of them and we might start having enough drive i/o. Hopefully the cloud solution proves viable, so they don't have to own and maintain equipment like that. Which makes me wonder if making ntpckr a BOINC project would be viable. SETI@home classic workunits: 93,865 CPU time: 863,447 hours Join the [url=http://tinyurl.com/8y46zvu]BP6/VP6 User Group[/url] |
Al Send message Joined: 3 Apr 99 Posts: 1682 Credit: 477,343,364 RAC: 482 |
Wow, so we'd need 16 of these, just to start? Would that number be expected to grow down the road, or is the amount of data processed expected to stay about the same, just flowing in and out? Not really up on what that server does; is there a link to some discussion about it? |
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14654 Credit: 200,643,578 RAC: 874 |
The whole idea is that the science database holds the canonical result of every workunit computed since the public launch of the project in 1999. That's two billion records since the switch to the BOINC platform in 2005, with up to 30 signals (of potential interest) stored for each workunit - and growing by about a million workunits a day. It grows faster and faster, as people here throw bigger and bigger GPUs into the mix. Pick any two needles out of that haystack, and see if they match. And do it again, and again, and again, and again, and again, and again, and again, and again, and again, and again, and again, and again, and again, ... |
Al Send message Joined: 3 Apr 99 Posts: 1682 Credit: 477,343,364 RAC: 482 |
Oooh, I see. Hmm, this seems like depressing news unless another way is found to do this: with the exponential growth of the data, sooner or later we'll have more data than can be stored... On Earth! lol Seriously though, is there a longer-term plan? It doesn't seem sustainable the way it's done right now. It appears the hard bit is having to compare new results with all the ones that came before, and the need to store them all in one place so they can be compared quickly and efficiently? I assume people a lot smarter than I am are working on and puzzling over this one; I hope they come up with a solution. Any thoughts as to what type of housing/backplane would be used to support that many drives with that high a data throughput? |
HAL9000 Send message Joined: 11 Sep 99 Posts: 6534 Credit: 196,805,888 RAC: 57 |
The whole idea is that the science database holds the canonical result of every workunit computed since the public launch of the project in 1999. That's two billion records since the switch to the BOINC platform in 2005, with up to 30 signals (of potential interest) stored for each workunit - and growing by about a million workunits a day. It grows faster and faster, as people here throw bigger and bigger GPUs into the mix. I'm thinking it was some of the YouTube videos of Jeff or Matt talking about how quickly the size grows, but I've slept since then. I want to say the db at that time was around 90TB or so... maybe it was 190TB. Edit: A quick search turned up a post Matt made last year under "How could we improve things?": "2. More and faster storage. If we could get, like, a few hundred usable TB of archival (i.e. not necessarily fast) storage and, say, 50-100 TB of usable SSD storage" I don't know the status of any of the items on that list, but they haven't asked for any specific stuff in a while. |
Gary Charpentier Send message Joined: 25 Dec 00 Posts: 30701 Credit: 53,134,872 RAC: 32 |
Pick any two needles out of that haystack, and see if they match. And do it again, and again, and again, ... Well, not quite. The sky has been divided into pixels. So you have the most recent result file in hand, go to the pixel, pick the needles (results) already in the pixel, and generate a new score for the pixel. Now, is that the highest score of all the pixels? Oh, can't wait around - there is another most-recent result file. This is the I/O bottleneck. You have to have a single master file, because it must be current up to the nanosecond. Unfortunately this means the problem can't be shipped out via a BOINC project to a bunch of computers, because that bunch of computers can't have nanosecond access to a single master file. Another way to think of the data problem is to look at the pixels like seats on a plane when you are trying to make a reservation. If two people want the same seat at the same time, the airline's reservation system has to decide who gets it. You can't have more than one copy of the seat list, or two different people could be assigned the same seat. [Of course airlines intentionally overbook, but that is a different issue.] |
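The single-master-copy argument above can be sketched in a few lines. This is a hypothetical illustration (the names and structure are mine, not ntpckr's): concurrent updates to the same pixel's score must be serialized through one authoritative copy, or an update gets lost - exactly the seat-reservation problem.

```python
import threading

# Hypothetical sketch of why the pixel-score table needs one
# serialized master copy: two updates to the same pixel must be
# applied one at a time, or one of them is silently lost.
pixel_scores = {}            # pixel id -> best score seen so far
lock = threading.Lock()      # guards the single authoritative copy

def record_result(pixel, score):
    """Fold a new result into the master table atomically."""
    with lock:
        best = pixel_scores.get(pixel, float("-inf"))
        if score > best:
            pixel_scores[pixel] = score

# Three "result files" arriving concurrently for the same pixel.
threads = [threading.Thread(target=record_result, args=(42, s))
           for s in (1.5, 3.0, 2.2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

assert pixel_scores[42] == 3.0   # highest score wins, none lost
```

The lock is cheap on one machine; the point of the post is that you cannot hand it out to thousands of BOINC hosts, because they can't all hold that lock with nanosecond latency.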
Al Send message Joined: 3 Apr 99 Posts: 1682 Credit: 477,343,364 RAC: 482 |
Thanks for the explanation. I was curious, so I did a little more searching for pricing info. No one has solid pricing yet, but here is an interesting guess from this article - http://www.datacenterdynamics.com/servers-storage/samsung-sets-the-pace-with-a-monster-15tb-solid-state-drive/95805.fullarticle The South Korean company has released plenty of specifications, except the price. The drives will be sold through Samsung resellers so prices will filter through, but the channel is not expecting tags below £15,000. Which at today's exchange rate: 15,000 British pounds = 21,327.75 US dollars. Talk about a fund drive! Let's say we got 10, to have 1-2 as hot spares plus whatever is needed for parity, with a little extra room to grow - that would be about $200k. And that doesn't include whatever enclosure is needed to house these bad boys; talk about a high-speed backplane. Imagine how nice things would run with those babies in place. One can dream, eh? :) |
HAL9000 Send message Joined: 11 Sep 99 Posts: 6534 Credit: 196,805,888 RAC: 57 |
Thanks for the explanation. I was curious, so I did a little more searching for pricing info. No one has solid pricing yet, but here is an interesting guess from this article - If pricing is expected to be £15,000 GBP, I would imagine it would be closer to $17,000 USD - probably about $17,500, as goods from South Korea tend to be about 17% more than the same items in the UK before VAT. Once you figure in VAT, the amounts are nearly the same or a bit less in USD. I believe UK VAT is about 17.5% at the moment. Only $200k? It's not like that is about half of SETI@home's annual budget. This is probably why they are looking to a cloud solution: there is no cost to maintain the hardware & they only have to pay for the computing they need. I'm playing with a Google Cloud Compute instance, as they give new users 60 days and $300 in credit to do whatever they like. The 8 CPU instance costs about $7.30/day averaging 99.982% load. Scaled to a 64 CPU instance, that comes to $58.40/day. Things get cheaper as you scale, so it would actually be less than that. However, even at $58.40/day that is $21,316/year, <$1,800/mo, with no upfront costs. I don't know the Amazon cloud compute rates for the instance SETI@home is using to experiment. I imagine they are getting better rates since they went with that service, and education rates are often cheaper too. |
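The scaling arithmetic in the post above can be checked back-of-the-envelope style. All figures come from the post itself (observed trial pricing, not a published Google rate card), and the linear 8-to-64 CPU scaling is the post's own simplifying assumption:

```python
# Figures quoted in the post; linear scaling is an assumption.
cost_8cpu_per_day = 7.30
cost_64cpu_per_day = cost_8cpu_per_day * (64 / 8)  # $58.40/day

per_year = cost_64cpu_per_day * 365                # $21,316/year
per_month = per_year / 12                          # just under $1,800/mo

print(f"64-CPU instance: ${cost_64cpu_per_day:.2f}/day, "
      f"${per_year:,.0f}/yr, ${per_month:,.0f}/mo")
```

The numbers confirm the post: $58.40/day works out to $21,316/year, under $1,800/month, with no upfront hardware cost.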
Al Send message Joined: 3 Apr 99 Posts: 1682 Credit: 477,343,364 RAC: 482 |
Interesting info, but I'm slightly confused by what you are referencing. I'm not a huge cloud fan personally, so I'm probably not up on the exact terminology. You say Google Cloud Compute instance, and then talk about 8 or 64 CPU instances, which to my way of thinking indicates processing (CPU) power, where I thought the discussion we were having was about large, fast online storage capacity. Could you please explain it a bit for those of us who aren't up to speed on things Cloud? |
HAL9000 Send message Joined: 11 Sep 99 Posts: 6534 Credit: 196,805,888 RAC: 57 |
Interesting info, but I'm slightly confused by what you are referencing. I'm not a huge cloud fan personally, so I'm probably not up on the exact terminology. Could you please explain it a bit for those of us who aren't up to speed on things Cloud? It's been mentioned a few times, but currently SETI@home is developing ntpckr in a cloud environment, which would make purchasing/building an ntpckr server with super-fast storage unnecessary. Edit: The disk i/o from a cloud storage farm is mind-boggling. |
Al Send message Joined: 3 Apr 99 Posts: 1682 Credit: 477,343,364 RAC: 482 |
So, is it unlimited storage per CPU instance? We were talking about GBs, not CPUs, initially. Not trying to be thick here, just trying to understand how it all works together. |
HAL9000 Send message Joined: 11 Sep 99 Posts: 6534 Credit: 196,805,888 RAC: 57 |
So, is it unlimited storage per CPU instance? Not trying to be thick here, just trying to understand how it all works together. Generally you configure however much storage you need. I configured mine with 10GB, as that was the minimum & more than I needed. Google does limit me to 65,536 GB per volume, but there is no limit on the number of volumes I can create. If I needed 200TB I could create 4 50TB volumes & then use the OS to set them up in RAID0. In this case RAID0 is unlikely to be an issue for data loss, as the volumes are virtual devices coming from a SAN. However, configuring RAID10 using volumes from different data centers could be used for absolute redundancy. It looks like the way Google prices storage is based on the amount of data read/written rather than a flat fee for the size - much like network traffic. So far for March my disk usage is: Compute Engine Storage Capacity: 1.333 Gibibyte-months $0.05 EDIT: TL;DR The amount of storage is only limited by your pocketbook. |
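The volume math above is easy to sanity-check. A minimal sketch, assuming only the 65,536 GB per-volume cap quoted in the post (the helper name is mine): to reach a target capacity you create enough max-size-or-smaller volumes and stripe them together in the guest OS.

```python
# Per-volume cap quoted in the post (GiB); everything else is illustration.
VOLUME_LIMIT_GIB = 65_536

def volumes_needed(total_tib: int) -> int:
    """Minimum number of capped volumes to stripe (RAID0) for a target size."""
    total_gib = total_tib * 1024
    return -(-total_gib // VOLUME_LIMIT_GIB)   # ceiling division

# 200 TiB target -> 4 volumes (e.g. 4 x 50 TiB), matching the post.
assert volumes_needed(200) == 4
# 64 TiB fits exactly in one max-size volume.
assert volumes_needed(64) == 1
```

Since each virtual volume is already redundant on the provider's SAN, RAID0 here only aggregates capacity and bandwidth; it doesn't add the usual single-disk failure risk.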
Al Send message Joined: 3 Apr 99 Posts: 1682 Credit: 477,343,364 RAC: 482 |
Based upon what we estimate the amount of data to be, and guessing at the I/O (R/W) we generate, do you have a guess at the break-even point of internal vs. external? I'd think it would generate I/O every time a new result needs to be compared to all the other results, or am I not thinking of it correctly? I understand the storage now, but that I/O fee adds a new wrinkle to the calculation. |
HAL9000 Send message Joined: 11 Sep 99 Posts: 6534 Credit: 196,805,888 RAC: 57 |
Based upon what we estimate the amount of data to be, and guessing at the I/O (R/W) we generate, do you have a guess at the break-even point of internal vs. external? Amazon might have a different billing structure than Google; I haven't tested their service myself, but it is probably similar in some fashion. We do know that the ntpckr server was being bottlenecked by disk i/o & that a server with high-speed SSDs would be required. If we knew the disk bandwidth the server was using, that could serve as a baseline to estimate cost. I'm not sure if ntpckr was using internal storage or was connected to their Fibre Channel storage array. As far as costs go, we also have to keep in mind that the SETI@home server equipment is housed in a campus colocation facility that charges a fee per rack unit. If we built one, we would probably be looking at at least a 3U server filled with SSDs. EDIT: Berkeley charges $14/mo per rack unit, so $504/yr for a 3U server. |
Filipe Send message Joined: 12 Aug 00 Posts: 218 Credit: 21,281,677 RAC: 20 |
The record set by Samsung will only last so long. In this competition to build the largest drive, the ranking changes regularly. Last year Toshiba, for its part, anticipated that SSDs would reach 32 TB by the end of 2016, then 64 TB the following year. By 2018, devices could even reach 128 TB. |
bluestar Send message Joined: 5 Sep 12 Posts: 7038 Credit: 2,084,789 RAC: 3 |
This could also be an option: https://en.wikipedia.org/wiki/Bubble_memory It should not be confused with SSD, however. |
Al Send message Joined: 3 Apr 99 Posts: 1682 Credit: 477,343,364 RAC: 482 |
I just did a little digging, and look what's now available! SAMSUNG SSD PM1633A 15360GB - Mfg. Part: MZILS15THMLS-00003 And here is the actual pricing: $10,311.99 Advertised Price Lease Option ($321.73/month) Availability: 11-13 days Orders placed today will ship within 13 days So, I added 16 to the 'ol shopping cart (that should future-proof us a little, I'd hope?), and it comes to the low, low price of $164,991.84. Time to start the Spring Fund Drive, eh? ;-) Too bad about the 2-week wait on them, though; we need them NOW! lol I presume there would be some sort of specialized box with a ridiculously high-speed/high-bandwidth backplane to handle the I/O capabilities of an array of these things? |
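The shopping-cart total checks out, and it's worth noting what a RAID10 of these drives (the layout suggested earlier in the thread) would actually yield in usable space. Drive capacity and price are from the post; the RAID10 halving is standard mirroring overhead:

```python
# Drive model figures quoted in the post (Samsung PM1633a).
drive_tb = 15.36
price_each = 10_311.99
n_drives = 16

total_cost = price_each * n_drives   # matches the cart: $164,991.84
raw_tb = drive_tb * n_drives         # 245.76 TB raw
usable_tb = raw_tb / 2               # RAID10 mirrors everything: ~122.9 TB

print(f"${total_cost:,.2f} buys {raw_tb:.2f} TB raw, "
      f"{usable_tb:.2f} TB usable in RAID10")
```

So roughly $165k buys about 123 TB of usable mirrored SSD - comfortably in the range of the "50-100 TB of usable SSD storage" Matt asked for, with room to grow.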
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.