Moving on... (Apr 08 2013)



Message boards : Technical News : Moving on... (Apr 08 2013)

Profile Matt Lebofsky
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 1 Mar 99
Posts: 1384
Credit: 74,079
RAC: 0
United States
Message 1354823 - Posted: 8 Apr 2013, 22:10:38 UTC

So! We made the big move to the colocation facility without too much pain and anguish. In fact, thanks to some precise planning and preparation, we were pretty much back online a day earlier than expected.

Were there any problems during the move? Nothing too crazy. Some expected confusion about the network/DNS configuration. A lot of expected struggle due to the frustrating lack of standards for rack rails. And one unexpected nuisance: the power strips mounted in the back of the rack were blocking the external SATA ports on the JBOD that holds georgem/paddym's disks. However, if we moved the strip, it would block other ports on other servers. It was a bit of a puzzle, eventually solved.

It feels great knowing our servers are on real backup power for the first time ever, on a functional KVM, and behind a more robust firewall that we control ourselves. We also no longer have that 100 Mbit hardware limit in our way, so we can use the full gigabit of Hurricane Electric bandwidth.

Jeff and I predicted, based on previous demand, that once things settled down we'd see average bandwidth usage of 150 Mbit/second (as long as both multibeam and astropulse workunits were available). And in fact that is what we're seeing, though we are still tuning some throttle mechanisms to make sure we don't go much higher than that.

Why not go higher? At least three reasons for now. First, we don't really have the data or the ability to split workunits faster than that. Second, we eventually hope to move off Hurricane and onto the campus network (and wantonly grabbing all the bits we can for no clear scientific reason wouldn't set a good example that we are in control of our needs/traffic). Third, and perhaps most importantly, it seems our result storage server can't handle a much higher load. Yes, that is our big bottleneck at this point: that server can't write results to disk much faster than current demand requires. We expected as much. We'll look into improving the disk I/O on that system soon. And we'll see how we fare after tomorrow's outage...
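Matt doesn't describe how the throttle mechanisms work, but a common way to cap average throughput at a target like 150 Mbit/s is a token bucket. The sketch below is purely illustrative — the class, rate, and capacity numbers are made up for the example, not SETI@home's actual configuration:

```python
import time

class TokenBucket:
    """Token-bucket rate limiter: tokens accrue at `rate` per second
    up to `capacity`; sending n units requires (and consumes) n tokens."""

    def __init__(self, rate, capacity):
        self.rate = rate          # sustained rate (e.g. Mbit per second)
        self.capacity = capacity  # maximum burst size
        self.tokens = capacity
        self.last = time.monotonic()

    def try_send(self, n):
        """Return True if n units may be sent now, consuming tokens."""
        now = time.monotonic()
        # Refill tokens for the elapsed time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= n:
            self.tokens -= n
            return True
        return False

# Cap the long-run average at ~150 units/sec, allowing bursts up to 300.
bucket = TokenBucket(rate=150, capacity=300)
```

The appeal of this scheme is that short bursts above the target are allowed (up to the bucket capacity), while the long-run average stays pinned at the refill rate.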

What's next? We still have a couple more servers to move down, perhaps next week, like the BOINC/CASPER web servers and Eric's GALFA machines. None of these will have any impact on SETI@home. Meanwhile there are lots of minor annoyances. Remember that a lot of our server issues stemmed from a crazy web of cross-dependencies (mostly NFS). Well in advance we started to untangle that web to get these servers on different subnets, but as you can imagine we missed some pieces, and we're now dealing with the fallout: a decade's worth of scripts scattered around a decade's worth of random locations, each expecting a mount to exist and not finding it. Nothing remotely tragic, and we may very well be past all that at this point.
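As an aside, the "script expects a mount that isn't there" failure mode is easy to guard against: fail loudly up front rather than letting a script silently read or write an empty local directory. A minimal defensive check might look like this (the path shown is hypothetical, not one of the lab's real mounts):

```python
import os

def require_mount(path):
    """Raise immediately if `path` is not a live mount point, so a
    script aborts instead of working against an empty stub directory."""
    if not os.path.ismount(path):
        raise RuntimeError(f"expected mount missing: {path}")

# Hypothetical example; a real script would name its actual NFS mount.
# require_mount("/mnt/results")
```

Calling this at the top of each cron job turns a silent mid-run surprise into an obvious one-line error.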

- Matt

____________
-- BOINC/SETI@home network/web/science/development person
-- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude

Profile SciManStev
Volunteer tester
Joined: 20 Jun 99
Posts: 4670
Credit: 77,308,431
RAC: 42,570
United States
Message 1354826 - Posted: 8 Apr 2013, 22:27:51 UTC
Last modified: 8 Apr 2013, 22:28:22 UTC

Awesome Matt!
This move has been a huge breath of fresh air for everyone! Crunching SETI just got a lot easier, and a lot more maintainable on your end!

Well done!

Steve
____________
Warning, addicted to SETI crunching!
Crunching as a member of GPU Users Group.
GPUUG Website

Profile Alex Storey
Volunteer tester
Joined: 14 Jun 04
Posts: 533
Credit: 1,576,508
RAC: 517
Greece
Message 1354827 - Posted: 8 Apr 2013, 22:43:36 UTC

Matt, you sound... happy!:D

Congrats!! And like Steve said, awesome!

juan BFB
Volunteer tester
Joined: 16 Mar 07
Posts: 4619
Credit: 233,840,386
RAC: 351,163
Brazil
Message 1354830 - Posted: 8 Apr 2013, 22:51:49 UTC

Good news for all: downloads and uploads are now fast and error-free. All we need now is a small increase in the GPU WU limit to be totally happy.
____________

Profile Gary Charpentier
Volunteer tester
Joined: 25 Dec 00
Posts: 11732
Credit: 5,969,877
RAC: 0
United States
Message 1354832 - Posted: 8 Apr 2013, 23:00:43 UTC - in response to Message 1354823.

(and wantonly grabbing all the bits we can for no clear scientific reason

Speaking of that, how are we doing on raw data collection vs. the speed at which we can crunch it? I know the data page hasn't been updated in ages as to what has been collected.

____________

Profile rebest
Volunteer tester
Joined: 16 Apr 00
Posts: 1296
Credit: 30,923,449
RAC: 13,500
United States
Message 1354834 - Posted: 8 Apr 2013, 23:01:33 UTC

Thanks, Matt.

I hope this means that you, Eric, and the guys will have time to actually be creative rather than fighting fires.
____________

Join the PACK!

Profile {BDC} Thomas Dupont
Volunteer tester
Joined: 9 Dec 11
Posts: 2604
Credit: 1,178,379
RAC: 1,140
France
Message 1354925 - Posted: 9 Apr 2013, 6:38:10 UTC

Good game, Matt! :)
Congrats to the whole SETI@home roster!
Everyone has felt the success of the migration, and the whole world is really happy.
A new era for the project has opened!
And what a relief for the team in terms of logistics...
Thanks for the heads-up, and thanks again for this big move.
____________
Team Founder BRIGADE DU COSMOS




BRIGADE DU COSMOS is proudly sponsored by Zenovia Digital Exchange

msattler
Volunteer tester
Joined: 9 Jul 00
Posts: 37412
Credit: 500,418,920
RAC: 533,059
United States
Message 1354931 - Posted: 9 Apr 2013, 7:15:28 UTC

Matt...
Thank you for both your news post and the dedication to the project you have shown during the transition.

I would like one bit of explanation, though, regarding this statement:

"Why not go higher? At least three reasons for now. First, we don't really have the data or the ability to split workunits faster than that."

It would appear that the splitters have been keeping up with demand, and that work distribution with the increased bandwidth has been doing rather well since coming back up.

Are you saying that you are not able to acquire enough data from Arecibo to continue to distribute work at the improved rates?
And if so, would more drives for data shuttle service help?
Or is it a limit on the rate that you are able to record data or the time allowed to do so?

This is an important question for some of us devoted souls.
____________
******************
Seti whacko, resident evil, and town clown...

Crunching Seti, loving all of God's kitties.

I have met a few friends in my life.
Most were cats.

Profile Matt Lebofsky
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 1 Mar 99
Posts: 1384
Credit: 74,079
RAC: 0
United States
Message 1355085 - Posted: 9 Apr 2013, 21:07:45 UTC - in response to Message 1354931.

Are you saying that you are not able to acquire enough data from Arecibo to continue to distribute work at the improved rates?
And if so, would more drives for data shuttle service help?
Or is it a limit on the rate that you are able to record data or the time allowed to do so?


Right... our observation time is the bottleneck in this case. When we are able to use the telescope we can record at the rates we desire, and we are easily able to shuttle all the data back to UCB on the drives we have.

- Matt
____________
-- BOINC/SETI@home network/web/science/development person
-- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude

Profile ML1
Volunteer tester
Joined: 25 Nov 01
Posts: 7945
Credit: 4,011,543
RAC: 862
United Kingdom
Message 1355097 - Posted: 9 Apr 2013, 21:44:20 UTC - in response to Message 1355085.

... our observation time is the bottleneck in this case. When we are able to use the telescope we can record at the rates we desire, and we are easily able to shuttle all the data back to UCB on the drives we have.

Roll-on Real Time Processing :-)


(Just a few GPUs needed?...)

Next bit of research is how to sift through the huge database of results in parallel? ;-)


Happy fast crunchin',
Martin

____________
See new freedom: Mageia4
Linux Voice See & try out your OS Freedom!
The Future is what We make IT (GPLv3)

Profile Gary Charpentier
Volunteer tester
Joined: 25 Dec 00
Posts: 11732
Credit: 5,969,877
RAC: 0
United States
Message 1355100 - Posted: 9 Apr 2013, 21:57:10 UTC - in response to Message 1355085.

Are you saying that you are not able to acquire enough data from Arecibo to continue to distribute work at the improved rates?
And if so, would more drives for data shuttle service help?
Or is it a limit on the rate that you are able to record data or the time allowed to do so?


Right... our observation time is the bottleneck in this case. When we are able to use the telescope we can record at the rates we desire, and we are easily able to shuttle all the data back to UCB on the drives we have.

- Matt

Thanks for the straight answer.

So just how much would it cost (ballpark) to put a receiver at another telescope, so we can use more of our big fat pipe?

____________

Profile Chris S
Volunteer tester
Joined: 19 Nov 00
Posts: 29560
Credit: 8,988,698
RAC: 27,601
United Kingdom
Message 1355102 - Posted: 9 Apr 2013, 22:13:25 UTC

Thanks for the update Matt :-)


____________
Damsel Rescuer, Kitty Patron, Uli Fan, Julie Supporter, CAMRA
ES99 Admirer, Raccoon Friend, IFAW, PETA, 5% Badge


Cheopis
Joined: 17 Sep 00
Posts: 139
Credit: 9,601,819
RAC: 9,054
United States
Message 1355112 - Posted: 9 Apr 2013, 23:00:46 UTC - in response to Message 1355085.

Are you saying that you are not able to acquire enough data from Arecibo to continue to distribute work at the improved rates?
And if so, would more drives for data shuttle service help?
Or is it a limit on the rate that you are able to record data or the time allowed to do so?


Right... our observation time is the bottleneck in this case. When we are able to use the telescope we can record at the rates we desire, and we are easily able to shuttle all the data back to UCB on the drives we have.

- Matt


Any hints on where the team might be planning to go next? More sites for the same depth of analysis, or deeper analysis of the data we already have a pipeline for? Or more background work shifted to the remote computers? RFI / NTPCKR / splitting?

If the answer is that you guys aren't sure yet because you are still watching how things develop, that's fine too!

Josef W. Segur
Volunteer developer
Volunteer tester
Joined: 30 Oct 99
Posts: 4137
Credit: 1,004,349
RAC: 238
United States
Message 1355115 - Posted: 9 Apr 2013, 23:09:17 UTC - in response to Message 1355100.

Are you saying that you are not able to acquire enough data from Arecibo to continue to distribute work at the improved rates?
And if so, would more drives for data shuttle service help?
Or is it a limit on the rate that you are able to record data or the time allowed to do so?


Right... our observation time is the bottleneck in this case. When we are able to use the telescope we can record at the rates we desire, and we are easily able to shuttle all the data back to UCB on the drives we have.

- Matt

Thanks for the straight answer.

So just how much does it cost to put a receiver at another telescope (ballpark) so we can use more of our big fat pipe?

Or how much would it cost to increase the bandwidth recorded at Arecibo? Dan Werthimer noted that as a desirable change in a request for donations two or three years ago, and doubling the bandwidth doubles the amount of data per unit of observation time.
Joe
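Joe's point follows directly from the sampling math: recorded data volume scales linearly with bandwidth, observing time, and channel count. A quick sketch, using illustrative numbers rather than the Arecibo recorder's actual parameters:

```python
def bytes_recorded(bandwidth_hz, seconds, bits_per_sample, channels):
    """Data volume for complex (I/Q) sampling: a complex-sampled band
    of width B yields B samples per second per channel."""
    samples = bandwidth_hz * seconds * channels
    return samples * bits_per_sample / 8

# Illustrative only: 2.5 MHz band, one hour, 4-bit complex samples,
# a single channel -- then the same with the bandwidth doubled.
base = bytes_recorded(2.5e6, 3600, 4, 1)
doubled = bytes_recorded(5.0e6, 3600, 4, 1)
assert doubled == 2 * base  # doubling bandwidth doubles the data
```

Since every factor enters linearly, doubling recorded bandwidth is exactly equivalent, data-wise, to doubling observing time — which is why it was pitched as a way around the telescope-time bottleneck.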

Wolverine
Joined: 9 Jan 00
Posts: 35
Credit: 7,195,165
RAC: 1,549
Canada
Message 1355127 - Posted: 9 Apr 2013, 23:48:39 UTC

Nice work! All that and I didn't feel a thing.

Time for another $$ drive to fix that bottleneck? What is needed to eliminate it?

Hardware?

Cost??

____________

HAL9000
Volunteer tester
Joined: 25 Mar 13
Posts: 12
Credit: 255,288
RAC: 1
Canada
Message 1355147 - Posted: 10 Apr 2013, 1:05:51 UTC

w00t, great to hear all is well ;)
____________

10.7 billion ops/sec

rob smith
Volunteer moderator
Joined: 7 Mar 03
Posts: 7680
Credit: 44,948,057
RAC: 76,538
United Kingdom
Message 1355204 - Posted: 10 Apr 2013, 5:55:39 UTC - in response to Message 1355127.

Nice work! All that and I didn't feel a thing.

Time for another $$ drive to fix that bottleneck? What is needed to eliminate it?

Hardware?

Cost??


GPUUG has a funding drive running for replacement servers; take a look at this thread: http://setiathome.berkeley.edu/forum_thread.php?id=70511
Then make a donation.
____________
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?

Cheopis
Joined: 17 Sep 00
Posts: 139
Credit: 9,601,819
RAC: 9,054
United States
Message 1355296 - Posted: 10 Apr 2013, 13:13:48 UTC - in response to Message 1355204.
Last modified: 10 Apr 2013, 13:24:20 UTC

Nice work! All that and I didn't feel a thing.

Time for another $$ drive to fix that bottleneck? What is needed to eliminate it?

Hardware?

Cost??


GPUUG has a funding drive running for replacement servers; take a look at this thread: http://setiathome.berkeley.edu/forum_thread.php?id=70511
Then make a donation.


Well, the two servers currently being fundraised for are an upload and a download server. As it turns out, simply putting the existing servers into a better environment seems to have drastically improved upload/download handling. Sure, beefier upload/download servers could do more, but do we really need new servers for those roles, or should we redirect funds toward a different goal? Maybe we really do need them, but after seeing what the old machines are doing now, perhaps the specifications for the new machines can be reduced significantly while still leaving room for upgrades later.

At the very least, I hope a reconsideration of upload and download server needs is in the works before any new hardware for those roles is purchased for the new server facility.

Upload and download aren't ALL that the two new servers are slated to do, but a completely different server architecture might be best for that other work, leaving the current upload and download servers doing what they have now proven they can do quite nicely :)

N9JFE
Volunteer tester
Joined: 4 Oct 99
Posts: 9316
Credit: 11,936,339
RAC: 13,172
United States
Message 1355299 - Posted: 10 Apr 2013, 13:43:03 UTC - in response to Message 1355085.

Are you saying that you are not able to acquire enough data from Arecibo to continue to distribute work at the improved rates?
And if so, would more drives for data shuttle service help?
Or is it a limit on the rate that you are able to record data or the time allowed to do so?


Right... our observation time is the bottleneck in this case. When we are able to use the telescope we can record at the rates we desire, and we are easily able to shuttle all the data back to UCB on the drives we have.

- Matt

Hey, weren't we supposed to start seeing some data from somewhere else, Green Bank I think? Whatever happened with that?

____________
David
Sitting on my butt while others boldly go,
Waiting for a message from a small furry creature from Alpha Centauri.


Filipe
Joined: 12 Aug 00
Posts: 109
Credit: 3,323,372
RAC: 5,040
Portugal
Message 1355323 - Posted: 10 Apr 2013, 15:26:10 UTC

Well, the two servers currently being fundraised for are an upload and a download server. As it turns out, simply putting the existing servers into a better environment seems to have drastically improved upload/download handling. Sure, beefier upload/download servers could do more, but do we really need new servers for those roles, or should we redirect funds toward a different goal? Maybe we really do need them, but after seeing what the old machines are doing now, perhaps the specifications for the new machines can be reduced significantly while still leaving room for upgrades later.

At the very least, I hope a reconsideration of upload and download server needs is in the works before any new hardware for those roles is purchased for the new server facility.

Upload and download aren't ALL that the two new servers are slated to do, but a completely different server architecture might be best for that other work, leaving the current upload and download servers doing what they have now proven they can do quite nicely :)


A dedicated "NTPCKR" server is the way to go.

There is no point in having all the results sitting unchecked in the database.



____________


Copyright © 2014 University of California