Moving on... (Apr 08 2013)



Message boards : Technical News : Moving on... (Apr 08 2013)

Author Message
Profile Matt Lebofsky
Volunteer moderator
Project administrator
Project developer
Project scientist
Avatar
Send message
Joined: 1 Mar 99
Posts: 1391
Credit: 74,079
RAC: 10
United States
Message 1354823 - Posted: 8 Apr 2013, 22:10:38 UTC

So! We made the big move to the colocation facility without too much pain and anguish. In fact, thanks to some precise planning and preparation, we were pretty much back online a day earlier than expected.

Were there any problems during the move? Nothing too crazy. Some expected confusion about the network/DNS configuration. A lot of expected struggle with the frustrating lack of standards for rack rails. And one unexpected nuisance: the power strips mounted in the back of the rack were blocking the external SATA ports on the JBOD that holds georgem/paddym's disks. However, if we moved the strip, it would block other ports on other servers. It was a bit of a puzzle, eventually solved.

It feels great knowing our servers are on real backup power for the first time ever, on a functional KVM, and behind a more rigid firewall that we control ourselves. As well, we no longer have that 100Mbit hardware limit in our way, so we can use the full gigabit of Hurricane Electric bandwidth.

Jeff and I predicted based on previous demand that we'd see, once things settled down, a bandwidth usage average of 150Mbits/second (as long as both multibeam and astropulse workunits were available). And in fact this is what we're seeing, though we are still tuning some throttle mechanisms to make sure we don't go much higher than that.

Why not go higher? At least three reasons for now. First, we don't really have the data or the ability to split workunits faster than that. Second, we eventually hope to move off Hurricane and get on the campus network (and wantonly grabbing all the bits we can for no clear scientific reason wouldn't set a good example that we are in control of our needs/traffic). Third, and perhaps most importantly, it seems that our result storage server can't handle a much higher load. Yes, that seems to be our big bottleneck at this point: that server simply can't write results to disk much faster than current demand requires. We expected as much. We'll look into improving the disk I/O on that system soon. And we'll see how we fare after tomorrow's outage...
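For scale, the arithmetic behind that bottleneck is easy to sketch. The disk write rate below is a made-up assumption for illustration, not a measurement of the actual result server:

```python
# Back-of-envelope check: how does network traffic compare to what a
# storage server must write to disk? The disk figure is a hypothetical
# assumption, not a measured spec of the SETI@home result server.

TRAFFIC_MBITS = 150.0       # sustained traffic from the post (Mbit/s)
DISK_WRITE_MBYTES = 40.0    # assumed sustainable result-write rate (MB/s)

traffic_mbytes = TRAFFIC_MBITS / 8.0   # 150 Mbit/s is only 18.75 MB/s
headroom = DISK_WRITE_MBYTES / traffic_mbytes

print(f"traffic arrives at ~{traffic_mbytes:.2f} MB/s")
print(f"assumed disk headroom factor: ~{headroom:.1f}x")
```

The point is that sustained writes, not raw bandwidth, set the ceiling: even a modest-sounding megabit rate becomes a continuous random-write load once every result has to land on disk.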

What's next? We still have a couple more servers to bring down, perhaps next week, like the BOINC/CASPER web servers and Eric's GALFA machines. None of these will have any impact on SETI@home. Meanwhile there are lots of minor annoyances. Remember that a lot of our server issues stemmed from a crazy web of cross dependencies (mostly NFS). Well in advance, we started to untangle that web to get these servers on different subnets, but as you can imagine we missed some pieces, and the resulting fallout: a decade's worth of scripts scattered around a decade's worth of random locations, each expecting a mount to exist and not finding it. Nothing remotely tragic, and we may very well be beyond all that at this point.
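The kind of sanity check that catches those stale NFS dependencies before a script fails halfway through is simple to sketch. The mount paths here are invented for illustration, not the project's real mount layout:

```python
import os

# Hypothetical list of NFS mount points a legacy script might expect.
# These paths are illustrative, not SETI@home's actual mounts.
REQUIRED_MOUNTS = ["/mnt/data", "/mnt/results"]

def missing_mounts(paths):
    """Return the expected mount points that are not actually mounted."""
    return [p for p in paths if not os.path.ismount(p)]

# A cron job or startup script can bail out early with a clear message
# instead of failing mysteriously partway through its work:
missing = missing_mounts(REQUIRED_MOUNTS)
if missing:
    print("refusing to run, missing mounts:", ", ".join(missing))
```

Failing fast like this turns a silent "mount expected but not there" bug into an obvious one-line error.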

- Matt

____________
-- BOINC/SETI@home network/web/science/development person
-- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude

Profile SciManStevProject donor
Volunteer tester
Avatar
Send message
Joined: 20 Jun 99
Posts: 4905
Credit: 84,286,859
RAC: 27,052
United States
Message 1354826 - Posted: 8 Apr 2013, 22:27:51 UTC
Last modified: 8 Apr 2013, 22:28:22 UTC

Awesome Matt!
This move has been a huge breath of fresh air for everyone! Crunching SETI just got a lot easier, and a lot more maintainable on your end!

Well done!

Steve
____________
Warning, addicted to SETI crunching!
Crunching as a member of GPU Users Group.
GPUUG Website

Profile Alex Storey
Volunteer tester
Avatar
Send message
Joined: 14 Jun 04
Posts: 568
Credit: 1,690,163
RAC: 363
Greece
Message 1354827 - Posted: 8 Apr 2013, 22:43:36 UTC

Matt, you sound... happy!:D

Congrats!! And like Steve said, awesome!

juan BFBProject donor
Volunteer tester
Avatar
Send message
Joined: 16 Mar 07
Posts: 5486
Credit: 316,109,584
RAC: 145,882
Brazil
Message 1354830 - Posted: 8 Apr 2013, 22:51:49 UTC

Good news for all: downloads and uploads are now fast and error-free. All we need now is a small increase in the GPU WU limit to be totally happy.
____________

Profile Gary CharpentierProject donor
Volunteer tester
Avatar
Send message
Joined: 25 Dec 00
Posts: 13168
Credit: 7,901,609
RAC: 14,171
United States
Message 1354832 - Posted: 8 Apr 2013, 23:00:43 UTC - in response to Message 1354823.

(and wantonly grabbing all the bits we can for no clear scientific reason

Speaking of that, how are we doing on raw data collection versus the speed at which we can crunch it? I know the data page showing what's been collected hasn't been updated in ages.

____________

Profile rebestProject donor
Volunteer tester
Avatar
Send message
Joined: 16 Apr 00
Posts: 1296
Credit: 33,477,933
RAC: 14,736
United States
Message 1354834 - Posted: 8 Apr 2013, 23:01:33 UTC

Thanks, Matt.

I hope that this means that you, Eric and the guys will have time to actually be creative rather than fighting fires.
____________

Join the PACK!

ThomasProject donor
Volunteer tester
Send message
Joined: 9 Dec 11
Posts: 1499
Credit: 1,345,505
RAC: 501
France
Message 1354925 - Posted: 9 Apr 2013, 6:38:10 UTC

Good game, Matt! :)
Congrats to the whole SETI@home roster!
Everyone has felt the success of the migration, and the whole world is really happy.
A new era for the project has begun!
And what a relief for the team in terms of logistics...
Thanks for the heads-up, and thanks again for this big move.
____________

Profile Matt Lebofsky
Volunteer moderator
Project administrator
Project developer
Project scientist
Avatar
Send message
Joined: 1 Mar 99
Posts: 1391
Credit: 74,079
RAC: 10
United States
Message 1355085 - Posted: 9 Apr 2013, 21:07:45 UTC - in response to Message 1354931.

Are you saying that you are not able to acquire enough data from Arecibo to continue to distribute work at the improved rates?
And if so, would more drives for data shuttle service help?
Or is it a limit on the rate that you are able to record data or the time allowed to do so?


Right... our observation time is the bottleneck in this case. When we are able to use the telescope we can record at the rates we desire, and we are easily able to shuttle all the data back to UCB on the drives we have.

- Matt
____________
-- BOINC/SETI@home network/web/science/development person
-- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude

Profile ML1
Volunteer tester
Send message
Joined: 25 Nov 01
Posts: 8601
Credit: 4,256,543
RAC: 1,326
United Kingdom
Message 1355097 - Posted: 9 Apr 2013, 21:44:20 UTC - in response to Message 1355085.

... our observation time is the bottleneck in this case. When we are able to use the telescope we can record at the rates we desire, and we are easily able to shuttle all the data back to UCB on the drives we have.

Roll-on Real Time Processing :-)


(Just a few GPUs needed?...)

Next bit of research is how to sift through the huge database of results in parallel? ;-)


Happy fast crunchin',
Martin

____________
See new freedom: Mageia4
Linux Voice See & try out your OS Freedom!
The Future is what We make IT (GPLv3)

Profile Gary CharpentierProject donor
Volunteer tester
Avatar
Send message
Joined: 25 Dec 00
Posts: 13168
Credit: 7,901,609
RAC: 14,171
United States
Message 1355100 - Posted: 9 Apr 2013, 21:57:10 UTC - in response to Message 1355085.

Are you saying that you are not able to acquire enough data from Arecibo to continue to distribute work at the improved rates?
And if so, would more drives for data shuttle service help?
Or is it a limit on the rate that you are able to record data or the time allowed to do so?


Right... our observation time is the bottleneck in this case. When we are able to use the telescope we can record at the rates we desire, and we are easily able to shuttle all the data back to UCB on the drives we have.

- Matt

Thanks for the straight answer.

So just how much does it cost to put a receiver at another telescope (ballpark) so we can use more of our big fat pipe?

____________

Profile Chris SProject donor
Volunteer tester
Avatar
Send message
Joined: 19 Nov 00
Posts: 32626
Credit: 14,484,198
RAC: 13,248
United Kingdom
Message 1355102 - Posted: 9 Apr 2013, 22:13:25 UTC

Thanks for the update Matt :-)


____________
Damsel Rescuer, Uli Devotee, Julie Supporter, ES99 Admirer,
Raccoon Friend, Anniet fan, Shining Knight in Armour


Cheopis
Send message
Joined: 17 Sep 00
Posts: 140
Credit: 11,592,340
RAC: 943
United States
Message 1355112 - Posted: 9 Apr 2013, 23:00:46 UTC - in response to Message 1355085.

Are you saying that you are not able to acquire enough data from Arecibo to continue to distribute work at the improved rates?
And if so, would more drives for data shuttle service help?
Or is it a limit on the rate that you are able to record data or the time allowed to do so?


Right... our observation time is the bottleneck in this case. When we are able to use the telescope we can record at the rates we desire, and we are easily able to shuttle all the data back to UCB on the drives we have.

- Matt


Any hints on where the team might be planning to go next? More sites with the same depth of analysis, or deeper analysis of the data we already have a pipeline for? Or more background work shifted to the remote computers? RFI / NTPCKR / splitting?

If the answer is that you guys aren't sure yet because you are still watching how things develop, that's fine too!

Josef W. SegurProject donor
Volunteer developer
Volunteer tester
Send message
Joined: 30 Oct 99
Posts: 4346
Credit: 1,123,600
RAC: 733
United States
Message 1355115 - Posted: 9 Apr 2013, 23:09:17 UTC - in response to Message 1355100.

Are you saying that you are not able to acquire enough data from Arecibo to continue to distribute work at the improved rates?
And if so, would more drives for data shuttle service help?
Or is it a limit on the rate that you are able to record data or the time allowed to do so?


Right... our observation time is the bottleneck in this case. When we are able to use the telescope we can record at the rates we desire, and we are easily able to shuttle all the data back to UCB on the drives we have.

- Matt

Thanks for the straight answer.

So just how much does it cost to put a receiver at another telescope (ballpark) so we can use more of our big fat pipe?

Or how much would it cost to increase the bandwidth recorded at Arecibo? Dan Werthimer noted that as a desirable upgrade in a request for donations 2 or 3 years ago, and doubling the recorded bandwidth doubles the amount of data per unit of observation time.
Joe

Wolverine
Avatar
Send message
Joined: 9 Jan 00
Posts: 35
Credit: 7,361,717
RAC: 27
Canada
Message 1355127 - Posted: 9 Apr 2013, 23:48:39 UTC

Nice work! All that and I didn't feel a thing.

Time for another $$ drive to fix that bottleneck? What is needed to eliminate it?

Hardware?

Cost??

____________

HAL9000
Volunteer tester
Avatar
Send message
Joined: 25 Mar 13
Posts: 12
Credit: 259,905
RAC: 1
Canada
Message 1355147 - Posted: 10 Apr 2013, 1:05:51 UTC

w00t, great to hear all is well ;)
____________

10.7 billion ops/sec

rob smithProject donor
Volunteer tester
Send message
Joined: 7 Mar 03
Posts: 8809
Credit: 62,863,551
RAC: 74,660
United Kingdom
Message 1355204 - Posted: 10 Apr 2013, 5:55:39 UTC - in response to Message 1355127.

Nice work! All that and I didn't feel a thing.

Time for another $$ drive to fix that bottleneck? What is needed to eliminate it?

Hardware?

Cost??


GPUUG has a funding drive running for replacement servers; take a look at this thread: http://setiathome.berkeley.edu/forum_thread.php?id=70511
Then make a donation.
____________
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?

Cheopis
Send message
Joined: 17 Sep 00
Posts: 140
Credit: 11,592,340
RAC: 943
United States
Message 1355296 - Posted: 10 Apr 2013, 13:13:48 UTC - in response to Message 1355204.
Last modified: 10 Apr 2013, 13:24:20 UTC

Nice work! All that and I didn't feel a thing.

Time for another $$ drive to fix that bottleneck? What is needed to eliminate it?

Hardware?

Cost??


GPUUG has a funding drive running for replacement servers; take a look at this thread: http://setiathome.berkeley.edu/forum_thread.php?id=70511
Then make a donation.


Well, the two servers funds are being collected for now are an upload server and a download server. As it turns out, simply putting the existing servers into a better environment seems to have drastically improved upload/download handling. Sure, beefier upload/download servers could do more, but do we really need new servers for those roles, or should we redirect funds toward a different goal? Maybe we really do need them, but after seeing what the old machines are doing now, perhaps the specifications for the new machines can be dropped significantly, while still leaving room for upgrades later?

At the very least, I hope a reconsideration of upload and download server needs is in the works before any new hardware for those roles is purchased for the new server facility.

Upload and download aren't ALL that the two new servers are slated to do, but a completely different server architecture might be appropriate for best performance on the other work, leaving the current upload and download servers doing what they have now proven they can do quite nicely :)

N9JFE David SProject donor
Volunteer tester
Avatar
Send message
Joined: 4 Oct 99
Posts: 12627
Credit: 14,982,455
RAC: 9,136
United States
Message 1355299 - Posted: 10 Apr 2013, 13:43:03 UTC - in response to Message 1355085.

Are you saying that you are not able to acquire enough data from Arecibo to continue to distribute work at the improved rates?
And if so, would more drives for data shuttle service help?
Or is it a limit on the rate that you are able to record data or the time allowed to do so?


Right... our observation time is the bottleneck in this case. When we are able to use the telescope we can record at the rates we desire, and we are easily able to shuttle all the data back to UCB on the drives we have.

- Matt

Hey, weren't we supposed to start seeing some data from somewhere else, Green Bank I think? Whatever happened with that?

____________
David
Sitting on my butt while others boldly go,
Waiting for a message from a small furry creature from Alpha Centauri.


Filipe
Send message
Joined: 12 Aug 00
Posts: 111
Credit: 4,120,429
RAC: 235
Portugal
Message 1355323 - Posted: 10 Apr 2013, 15:26:10 UTC

Well, the two servers funds are being collected for now are an upload server and a download server. As it turns out, simply putting the existing servers into a better environment seems to have drastically improved upload/download handling. Sure, beefier upload/download servers could do more, but do we really need new servers for those roles, or should we redirect funds toward a different goal? Maybe we really do need them, but after seeing what the old machines are doing now, perhaps the specifications for the new machines can be dropped significantly, while still leaving room for upgrades later?

At the very least, I hope a reconsideration of upload and download server needs is in the works before any new hardware for those roles is purchased for the new server facility.

Upload and download aren't ALL that the two new servers are slated to do, but a completely different server architecture might be appropriate for best performance on the other work, leaving the current upload and download servers doing what they have now proven they can do quite nicely :)


A dedicated "NTPCKR" server is the way to go.

There's no point in having all the results sitting unchecked in the database.



____________


Copyright © 2014 University of California