What does SETI need? BOINC?

HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 1769928 - Posted: 6 Mar 2016, 1:39:53 UTC - in response to Message 1769862.  

..that Pricing Hasn't Been Announced.

Just saw one that thought it would go for upwards of $5,000, which is nothing.



I'd say >$2,000 per terabyte (using Fixstars' 13TB SSD for comparison).

WD has some high-capacity HDDs (6TB, 8TB, and 10TB) for much less money. These are helium-filled units; according to Western Digital's own figures, power consumption is 23% less than its conventional air-filled drives.



Say What? ~$30K per Drive? I don't know, but to me, if that is true, that's insane...

I should have been clearer. The 13TB drive works out to $1,000/TB (in 'normal' mode); then, further into the article:

"it also features a proprietary “high durability” mode. With this mode enabled, the durability and longevity of the drive is promised to be tripled but in return, the capacity is cut in half. Oh dear, “only” 6.5TB remain on your $13,000 dollar, 13TB SSD—for those of you that aren’t keeping count, that is $2,000 per Terabyte."

Set up a RAID10 array of 16 of them and then we might start having enough drive I/O.
Hopefully the cloud solution proves viable, so they don't have to own and maintain equipment like that. Which makes me wonder whether making ntpckr a BOINC project would be viable.
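A quick back-of-envelope in Python, using just the figures above (a sketch, nothing official):

[code]
# Back-of-envelope from the figures above (sketch, not vendor pricing).
drive_price = 13_000           # $ for the 13 TB Fixstars SSD
capacity_tb = 13

normal_per_tb = drive_price / capacity_tb          # $1,000/TB
durable_per_tb = drive_price / (capacity_tb / 2)   # capacity halved -> $2,000/TB

# RAID10 mirrors then stripes: usable space is half the raw total.
drives = 16
usable_tb = drives * capacity_tb / 2               # 104 TB usable
array_cost = drives * drive_price                  # $208,000
print(normal_per_tb, durable_per_tb, usable_tb, array_cost)
[/code]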
SETI@home classic workunits: 93,865 CPU time: 863,447 hours
Join the [url=http://tinyurl.com/8y46zvu]BP6/VP6 User Group[/url]
Al
Joined: 3 Apr 99
Posts: 1682
Credit: 477,343,364
RAC: 482
United States
Message 1770009 - Posted: 6 Mar 2016, 11:05:36 UTC - in response to Message 1769928.  

Wow, so we'd need 16 of these just to start? Would that number be expected to grow down the road, or is the amount of data processed expected to stay about the same, just flowing in and out? I'm not really up on what that server does; is there a link to some discussion about it?

Richard Haselgrove
Volunteer tester
Joined: 4 Jul 99
Posts: 14653
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1770011 - Posted: 6 Mar 2016, 11:34:12 UTC - in response to Message 1770009.  

The whole idea is that the science database holds the canonical result of every workunit computed since the public launch of the project in 1999. That's two billion records since the switch to the BOINC platform in 2005, with up to 30 signals (of potential interest) stored for each workunit - and growing by about a million workunits a day. It grows faster and faster, as people here throw bigger and bigger GPUs into the mix.

Pick any two needles out of that haystack, and see if they match. And do it again, and again, and again, and again, and again, and again, and again, and again, and again, and again, and again, and again, and again, ...
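For a sense of scale, a quick sketch using those round numbers (not exact database statistics):

[code]
# Rough scale check of the numbers above (round figures, not DB stats).
results = 2_000_000_000        # canonical results stored since 2005
signals_per_result = 30        # up to 30 signals kept per workunit
growth_per_day = 1_000_000     # new workunits arriving per day

max_signals = results * signals_per_result   # up to 60 billion signals
per_year = growth_per_day * 365              # ~365 million new results/year
print(f"up to {max_signals:,} signals, +{per_year:,} results/year")
[/code]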
Al
Joined: 3 Apr 99
Posts: 1682
Credit: 477,343,364
RAC: 482
United States
Message 1770015 - Posted: 6 Mar 2016, 11:58:29 UTC - in response to Message 1770011.  

Oooh, I see. Hmm, this is depressing news unless another way of doing this is found: with the exponential growth of the data, sooner or later we'll have more data than can be stored... on Earth! lol

Seriously though, is there a longer-term plan? It doesn't seem sustainable the way it's done right now. It appears the hard bit is having to compare new results with all the ones that came before, which means storing them all in one place so they can be compared quickly and efficiently. I assume people a lot smarter than I am are puzzling over this one; I hope they come up with a solution.

Any thoughts as to what type of housing/backplane would be used to support that many drives with that high a data throughput?

HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 1770057 - Posted: 6 Mar 2016, 17:18:07 UTC - in response to Message 1770011.  
Last modified: 6 Mar 2016, 17:27:21 UTC

The whole idea is that the science database holds the canonical result of every workunit computed since the public launch of the project in 1999. That's two billion records since the switch to the BOINC platform in 2005, with up to 30 signals (of potential interest) stored for each workunit - and growing by about a million workunits a day. It grows faster and faster, as people here throw bigger and bigger GPUs into the mix.

Pick any two needles out of that haystack, and see if they match. And do it again, and again, and again, and again, and again, and again, and again, and again, and again, and again, and again, and again, and again, ...

I'm thinking it was one of the YouTube videos of Jeff or Matt talking about how quickly the size grows, but I've slept since then.
I want to say the db at that time was around 90TB or so... maybe it was 190TB.

Edit: A quick search turned up a post Matt made last year under "How could we improve things?": "2. More and faster storage. If we could get, like, a few hundred usable TB of archival (i.e. not necessarily fast) storage and, say, 50-100 TB of usable SSD storage"

I don't know the status of any of those items on the list, but they haven't asked for specific stuff in a while.
SETI@home classic workunits: 93,865 CPU time: 863,447 hours
Join the [url=http://tinyurl.com/8y46zvu]BP6/VP6 User Group[/url]
Gary Charpentier
Volunteer tester
Joined: 25 Dec 00
Posts: 30669
Credit: 53,134,872
RAC: 32
United States
Message 1770058 - Posted: 6 Mar 2016, 17:26:33 UTC - in response to Message 1770011.  

Pick any two needles out of that haystack, and see if they match. And do it again, and again, and again, and again, and again, and again, and again, and again, and again, and again, and again, and again, and again, ...
Well, not quite. The sky has been divided into pixels. So you have the most recent result file in hand; you go to its pixel, pick up the needles (results) already in that pixel, and generate a new score for the pixel. Now, is that the highest score of all the pixels? Oh, can't wait around; there's another most-recent result file.

This is the I/O bottleneck. You have to have a single master file, because it must be current up to the nanosecond. Unfortunately this means the problem can't be shipped out via a BOINC project to a bunch of computers, because the bunch of computers can't have nanosecond access to a single master file.

Another way to think of the data problem is to look at the pixels like seats on a plane when you are trying to make a reservation. If two people want the same seat at the same time, the airline's reservation system has to decide who gets it. You can't have more than one copy of the seat list, or two different people could be assigned the same seat. [Of course airlines intentionally overbook, but that is a different issue.]
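A toy illustration of that constraint in Python (hypothetical code, not anything from ntpckr): with one authoritative copy, a single lock or transaction can serialize racing writers; scatter copies across machines and that guarantee is gone.

[code]
import threading

class SeatList:
    """One authoritative seat list; a lock arbitrates racing writers."""
    def __init__(self, seats):
        self.assignments = {s: None for s in seats}
        self.lock = threading.Lock()

    def reserve(self, seat, passenger):
        # Atomic check-then-set: only one caller can win a given seat.
        with self.lock:
            if self.assignments[seat] is None:
                self.assignments[seat] = passenger
                return True
            return False   # seat already taken

seats = SeatList(["14A"])
print(seats.reserve("14A", "Alice"))   # True  (first writer wins)
print(seats.reserve("14A", "Bob"))     # False (second is rejected)
[/code]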
Al
Joined: 3 Apr 99
Posts: 1682
Credit: 477,343,364
RAC: 482
United States
Message 1770110 - Posted: 6 Mar 2016, 22:09:56 UTC

Thanks for the explanation. I was curious, so I did a little more searching for pricing info; no one has solid pricing yet, but here is an interesting guess from this article:

http://www.datacenterdynamics.com/servers-storage/samsung-sets-the-pace-with-a-monster-15tb-solid-state-drive/95805.fullarticle

The South Korean company has released plenty of specifications, except the price. The drives will be sold through Samsung resellers so prices will filter through, but the channel is not expecting tags below £15,000.


Which at today's exchange rate: £15,000 = $21,327.75 US.

Talk about a fund drive! Let's say we got 10, to have 1-2 as hot spares and whatever is needed for parity, plus a little extra room to grow; that would be about $200k. And that doesn't include whatever enclosure is needed to house these bad boys (talk about a high-speed backplane...). Imagine how nice things would run with those babies in place. One can dream, eh? :)

HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 1770124 - Posted: 6 Mar 2016, 22:52:52 UTC - in response to Message 1770110.  

Thanks for the explanation. I was curious, so I did a little more searching for pricing info; no one has solid pricing yet, but here is an interesting guess from this article:

http://www.datacenterdynamics.com/servers-storage/samsung-sets-the-pace-with-a-monster-15tb-solid-state-drive/95805.fullarticle

The South Korean company has released plenty of specifications, except the price. The drives will be sold through Samsung resellers so prices will filter through, but the channel is not expecting tags below £15,000.


Which at today's exchange rate: £15,000 = $21,327.75 US.

Talk about a fund drive! Let's say we got 10, to have 1-2 as hot spares and whatever is needed for parity, plus a little extra room to grow; that would be about $200k. And that doesn't include whatever enclosure is needed to house these bad boys (talk about a high-speed backplane...). Imagine how nice things would run with those babies in place. One can dream, eh? :)

If pricing is expected to be £15,000 I would imagine the US price would be closer to $17,000, probably about $17,500, as goods in the UK tend to run about 17% more than the same items without VAT; once you figure in VAT the amounts are nearly the same, or a bit less in USD. I believe UK VAT is about 17.5% at the moment.

Only $200k? It's not like that's about half of SETI@home's annual budget. This is probably why they are looking to a cloud solution: there is no cost to maintain the hardware, and they only have to pay for the computing they need.

I'm playing with a Google Cloud Compute instance, as they give new users 60 days and $300 in credit to do whatever they like. The 8-CPU instance costs about $7.30/day averaging 99.982% load. Scaled to a 64-CPU instance, that comes to $58.40/day; things get cheaper as you scale, so it would actually be less than that. However, at $58.40/day that is $21,316/year, under $1,800/mo, with no upfront costs. I don't know the Amazon cloud rates for the instance SETI@home is using to experiment, but I imagine they are getting a better rate since they went with that service. Education rates are often cheaper too.
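The scaling math, as a sketch (it assumes cost scales linearly with CPU count; sustained-use discounts would actually bring the larger instance in cheaper):

[code]
# Linear scaling estimate from the observed 8-CPU daily cost.
cost_8cpu = 7.30                    # $/day observed at ~100% load
cost_64cpu = cost_8cpu * (64 / 8)   # $58.40/day
per_year = cost_64cpu * 365         # ~$21,316/year
per_month = per_year / 12           # ~$1,776/month
print(f"${cost_64cpu:.2f}/day -> ${per_year:,.0f}/yr, ${per_month:,.0f}/mo")
[/code]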
SETI@home classic workunits: 93,865 CPU time: 863,447 hours
Join the [url=http://tinyurl.com/8y46zvu]BP6/VP6 User Group[/url]
Al
Joined: 3 Apr 99
Posts: 1682
Credit: 477,343,364
RAC: 482
United States
Message 1770130 - Posted: 6 Mar 2016, 23:47:42 UTC - in response to Message 1770124.  

Interesting info, but I'm slightly confused as to what you are referencing. I'm not a huge cloud fan personally, so I'm probably not up on the exact terminology. You say Google Cloud Compute instance and then talk about an 8- or 64-CPU instance, which to my way of thinking indicates processing (CPU) power, whereas I thought the discussion we were having was about large, fast online storage capacity. Could you please explain it a bit for those of us who aren't up to speed on things cloud?

HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 1770141 - Posted: 7 Mar 2016, 0:03:39 UTC - in response to Message 1770130.  
Last modified: 7 Mar 2016, 0:05:20 UTC

Interesting info, but I'm slightly confused as to what you are referencing. I'm not a huge cloud fan personally, so I'm probably not up on the exact terminology. You say Google Cloud Compute instance and then talk about an 8- or 64-CPU instance, which to my way of thinking indicates processing (CPU) power, whereas I thought the discussion we were having was about large, fast online storage capacity. Could you please explain it a bit for those of us who aren't up to speed on things cloud?

It's been mentioned a few times, but currently SETI@home is developing ntpckr in a cloud environment, which would make purchasing or building an ntpckr server with super-fast storage unnecessary.

Edit: The disk I/O from a cloud storage farm is mind-boggling.
SETI@home classic workunits: 93,865 CPU time: 863,447 hours
Join the [url=http://tinyurl.com/8y46zvu]BP6/VP6 User Group[/url]
Al
Joined: 3 Apr 99
Posts: 1682
Credit: 477,343,364
RAC: 482
United States
Message 1770145 - Posted: 7 Mar 2016, 0:10:41 UTC - in response to Message 1770141.  

So, is it unlimited storage per CPU instance? We were talking about GBs, not CPUs, initially. Not trying to be thick here, just trying to understand how it all works together.

HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 1770152 - Posted: 7 Mar 2016, 0:40:07 UTC - in response to Message 1770145.  
Last modified: 7 Mar 2016, 0:43:55 UTC

So, is it unlimited storage per CPU instance? We were talking about GBs, not CPUs, initially. Not trying to be thick here, just trying to understand how it all works together.

Generally you configure however much storage you need. I configured mine with 10GB, as that was the minimum and more than I needed. Google does limit me to 65,536 GB per volume, but there is no limit on the number of volumes I can create. If I needed 200TB I could create four 50TB volumes and then use the OS to set them up in RAID0. In this case RAID0 is unlikely to be an issue for data loss, as the volumes are virtual devices coming from a SAN. However, configuring RAID10 using volumes from different data centers could be used for absolute redundancy.

It looks like Google prices storage based on the amount of data read from and written to it, rather than a flat fee for the size, much like network traffic.
So far for March, my disk usage is:
Compute Engine Storage Capacity: 1.333 Gibibyte-months $0.05

EDIT: TL;DR The amount of storage is only limited by your pocketbook.
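The sizing arithmetic, sketched (the 64TB figure is just the per-volume cap mentioned above, not an official API):

[code]
import math

VOLUME_CAP_TB = 64   # 65,536 GB per-volume limit from the post

def volumes_needed(target_tb, volume_tb):
    # How many equal-sized volumes to stripe (RAID0) for a target capacity.
    assert volume_tb <= VOLUME_CAP_TB, "exceeds per-volume limit"
    return math.ceil(target_tb / volume_tb)

print(volumes_needed(200, 50))   # -> 4 x 50TB volumes striped to 200TB
[/code]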
SETI@home classic workunits: 93,865 CPU time: 863,447 hours
Join the [url=http://tinyurl.com/8y46zvu]BP6/VP6 User Group[/url]
Al
Joined: 3 Apr 99
Posts: 1682
Credit: 477,343,364
RAC: 482
United States
Message 1770162 - Posted: 7 Mar 2016, 1:28:48 UTC - in response to Message 1770152.  
Last modified: 7 Mar 2016, 1:29:21 UTC

Based on our estimate of the amount of data we have, and guessing at the I/O (R/W) we generate, do you have a guess at the break-even point of internal vs. external? I'd think it would generate I/O every time a new result needs to be compared to all the other results, or am I not thinking of it correctly? I understand the storage now, but that I/O fee adds a new wrinkle to the calculation.

HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 1770174 - Posted: 7 Mar 2016, 2:25:27 UTC - in response to Message 1770162.  
Last modified: 7 Mar 2016, 2:59:56 UTC

Based on our estimate of the amount of data we have, and guessing at the I/O (R/W) we generate, do you have a guess at the break-even point of internal vs. external? I'd think it would generate I/O every time a new result needs to be compared to all the other results, or am I not thinking of it correctly? I understand the storage now, but that I/O fee adds a new wrinkle to the calculation.

Amazon might have a different billing structure than Google; I haven't tested their service myself. However, it is probably similar in some fashion.

We do know that the ntpckr server was being bottlenecked by disk I/O, and that a server with high-speed SSDs would be required. If we knew the disk bandwidth the server was using, that could serve as a baseline to estimate cost. I'm not sure whether ntpckr was using internal storage or was connected to their Fibre Channel storage array.

As far as costs go, we also have to keep in mind that the SETI@home server equipment is housed in a campus colocation facility that charges a fee per rack unit. If we built one, we would probably be looking at at least a 3U server filled with SSDs.

EDIT: Berkeley charges $14/mo per rack unit, so $504/yr for a 3U server.
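A hedged break-even sketch with the rough numbers from this thread (illustrative only: it ignores cloud I/O and egress fees, power, admin time, and hardware refresh):

[code]
hardware_cost = 200_000      # ~10-16 enterprise SSDs (thread estimate)
colo_per_year = 14 * 3 * 12  # $14/mo per rack unit x 3U = $504/yr
cloud_per_year = 21_316      # 64-CPU instance estimate from earlier

def owned_cost(years):
    return hardware_cost + colo_per_year * years

def cloud_cost(years):
    return cloud_per_year * years

for years in range(1, 16):
    if owned_cost(years) <= cloud_cost(years):
        print(f"Owning breaks even after ~{years} years")
        break
else:
    print("Cloud stays cheaper over 15 years")
[/code]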
SETI@home classic workunits: 93,865 CPU time: 863,447 hours
Join the [url=http://tinyurl.com/8y46zvu]BP6/VP6 User Group[/url]
Filipe
Joined: 12 Aug 00
Posts: 218
Credit: 21,281,677
RAC: 20
Portugal
Message 1770229 - Posted: 7 Mar 2016, 10:58:42 UTC

The record set by Samsung will only last so long; in this competition to build the largest drive, the ranking changes regularly. Last year, a division of Toshiba anticipated that SSDs would reach 32 TB by the end of 2016, then 64 TB the following year. By 2018, the devices could even reach 128 TB.

bluestar
Joined: 5 Sep 12
Posts: 7032
Credit: 2,084,789
RAC: 3
Message 1770254 - Posted: 7 Mar 2016, 14:57:12 UTC

https://en.wikipedia.org/wiki/Bubble_memory

This option could also be possible.

It should not be confused with SSD, however.
Al
Joined: 3 Apr 99
Posts: 1682
Credit: 477,343,364
RAC: 482
United States
Message 1787637 - Posted: 15 May 2016, 4:19:37 UTC

I just did a little digging, and look what's now available!
SAMSUNG SSD PM1633A 15360GB - Mfg. Part: MZILS15THMLS-00003

And here is the actual pricing:
$10,311.99 Advertised Price
Lease Option ($321.73 /month)
Availability: 11-13 days
Orders placed today will ship within 13 days

So, I added 16 to the ol' shopping cart (that should future-proof us a little, I'd hope?), and it comes to the low, low price of $164,991.84. Time to start the Spring Fund Drive, eh? ;-) Too bad about the two-week wait on them, though; we need them NOW! lol I presume there would be some sort of specialized box with a ridiculously high-speed, high-bandwidth backplane to handle the I/O capabilities of an array of these things?
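For the curious, a quick sketch of what 16 of those drives would yield under two common RAID layouts (illustrative arithmetic only; the enclosure question still stands):

[code]
price_each = 10_311.99   # advertised price quoted above
drive_tb = 15.36         # 15,360 GB per drive
n = 16

total_cost = n * price_each     # $164,991.84
raw_tb = n * drive_tb           # 245.76 TB raw
raid10_tb = raw_tb / 2          # mirrored pairs: 122.88 TB usable
raid6_tb = (n - 2) * drive_tb   # two parity drives: 215.04 TB usable
print(f"${total_cost:,.2f}: raw {raw_tb:.2f} TB, "
      f"RAID10 {raid10_tb:.2f} TB, RAID6 {raid6_tb:.2f} TB")
[/code]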
