Moving on... (Apr 08 2013)

Message boards : Technical News : Moving on... (Apr 08 2013)
Profile Matt Lebofsky
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 1 Mar 99
Posts: 1444
Credit: 957,058
RAC: 0
United States
Message 1354823 - Posted: 8 Apr 2013, 22:10:38 UTC

So! We made the big move to the colocation facility without too much pain and anguish. In fact, thanks to some precise planning and preparation, we were pretty much back online a day earlier than expected.

Were there any problems during the move? Nothing too crazy. There was some expected confusion about the network/DNS configuration, and a lot of expected struggle with the frustrating lack of standards for rack rails. There was also one unexpected nuisance: the power strips mounted in the back of the rack were blocking the external SATA ports on the JBOD that holds georgem/paddym's disks. However, if we moved the strip, it would block other ports on other servers. It was a bit of a puzzle, eventually solved.

It feels great knowing our servers are on real backup power for the first time ever, on a functional KVM, and behind a more rigid firewall that we control ourselves. We also no longer have that 100Mbit hardware limit in our way, so we can use the full gigabit of Hurricane Electric bandwidth.

Jeff and I predicted, based on previous demand, that once things settled down we'd see an average bandwidth usage of 150Mbits/second (as long as both multibeam and astropulse workunits were available). And in fact this is what we're seeing, though we are still tuning some throttle mechanisms to make sure we don't go much higher than that.
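
As a rough illustration of what a bandwidth throttle of this kind could look like, here is a generic token-bucket sketch. It is not the mechanism actually used on the SETI@home servers, which the post does not describe; the class name, parameters, and burst size are all hypothetical.

```python
import time


class TokenBucket:
    """Hypothetical token-bucket throttle (illustration only)."""

    def __init__(self, rate_bits_per_s, burst_bits, clock=time.monotonic):
        self.rate = float(rate_bits_per_s)   # long-run average rate allowed
        self.capacity = float(burst_bits)    # largest burst allowed at once
        self.tokens = float(burst_bits)      # start with a full bucket
        self.clock = clock                   # injectable clock, for testing
        self.last = clock()

    def allow(self, size_bits):
        """Return True if size_bits may be sent now, spending tokens if so."""
        now = self.clock()
        # Refill in proportion to elapsed time, never beyond the capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if size_bits <= self.tokens:
            self.tokens -= size_bits
            return True
        return False


# Assumed numbers: cap the average at 150 Mbit/s with a one-second burst.
bucket = TokenBucket(rate_bits_per_s=150e6, burst_bits=150e6)
```

A sender would call `bucket.allow(n)` before transmitting `n` bits and wait or defer when it returns False, which keeps the long-run average at or below the configured rate.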

Why not go higher? At least three reasons for now. First, we don't really have the data or the ability to split workunits faster than that. Second, we eventually hope to move off Hurricane and get on the campus network (and wantonly grabbing all the bits we can for no clear scientific reason wouldn't be setting a good example that we are in control of our needs/traffic). Third, and perhaps most importantly, it seems that our result storage server can't handle a much higher load. Yes, that seems to be our big bottleneck at this point: that server can't write results to disk much faster than current demand requires. We expected as much. We'll look into improving the disk I/O on that system soon. And we'll see how we fare after tomorrow's outage...

What's next? We still have a couple more servers to bring down, perhaps next week, like the BOINC/CASPER web servers and Eric's GALFA machines. None of these will have any impact on SETI@home. Meanwhile, there are lots of minor annoyances. Remember that a lot of our server issues stemmed from a crazy web of cross dependencies (mostly NFS). Well in advance we started to untangle that web to get these servers on different subnets, but we missed some pieces, and you can imagine the resulting fallout: a decade's worth of scripts scattered around in a decade's worth of random locations, each expecting a mount to exist and not getting it. Nothing remotely tragic, and we may very well be beyond all that at this point.
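
One common way to make scripts fail loudly when an expected mount is absent is to check for it up front. The sketch below is only an illustration of that idea, not anything taken from the SETI@home scripts; the function names and the example path are hypothetical.

```python
import os
import sys


def missing_mounts(required, ismount=os.path.ismount):
    """Return the paths in `required` that are not currently mount points."""
    return [path for path in required if not ismount(path)]


def main(required, ismount=os.path.ismount):
    """Refuse to run (exit status 1) if any required mount is absent."""
    missing = missing_mounts(required, ismount)
    if missing:
        print("refusing to run; not mounted: " + ", ".join(missing),
              file=sys.stderr)
        return 1
    # The script's real work would go here.
    return 0


if __name__ == "__main__":
    # "/mnt/example_nfs" is a made-up path used purely for illustration.
    sys.exit(main(["/mnt/example_nfs"]))
```

Checking before doing any work means a missing NFS mount produces one clear error message instead of a script quietly writing to, or reading from, an empty local directory.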

- Matt

-- BOINC/SETI@home network/web/science/development person
-- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude
ID: 1354823 · Report as offensive
Profile SciManStev Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 20 Jun 99
Posts: 6657
Credit: 121,090,076
RAC: 0
United States
Message 1354826 - Posted: 8 Apr 2013, 22:27:51 UTC
Last modified: 8 Apr 2013, 22:28:22 UTC

Awesome Matt!
This move has been a huge breath of fresh air for everyone! Crunching SETI just got a lot easier, and a lot more maintainable on your end!

Well done!

Steve
Warning, addicted to SETI crunching!
Crunching as a member of GPU Users Group.
GPUUG Website
ID: 1354826 · Report as offensive
Profile shizaru
Volunteer tester
Joined: 14 Jun 04
Posts: 1130
Credit: 1,967,904
RAC: 0
Greece
Message 1354827 - Posted: 8 Apr 2013, 22:43:36 UTC

Matt, you sound... happy! :D

Congrats!! And like Steve said, awesome!
ID: 1354827 · Report as offensive
juan BFP Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 1354830 - Posted: 8 Apr 2013, 22:51:49 UTC

Good news for all: DL/UL are now fast and error-free. We just need a small increase in the GPU WU limit to be totally happy.
ID: 1354830 · Report as offensive
Profile Gary Charpentier Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 25 Dec 00
Posts: 30936
Credit: 53,134,872
RAC: 32
United States
Message 1354832 - Posted: 8 Apr 2013, 23:00:43 UTC - in response to Message 1354823.  

(and wantonly grabbing all the bits we can for no clear scientific reason

Speaking of that, how are we doing on raw data collection vs. the speed at which we can crunch the data? I know the data page hasn't been updated in a coon's age as to what is collected.

ID: 1354832 · Report as offensive
Profile rebest Project Donor
Volunteer tester
Joined: 16 Apr 00
Posts: 1296
Credit: 45,357,093
RAC: 0
United States
Message 1354834 - Posted: 8 Apr 2013, 23:01:33 UTC

Thanks, Matt.

I hope that this means that you, Eric and the guys will have time to actually be creative rather than fighting fires.

Join the PACK!
ID: 1354834 · Report as offensive
Thomas
Volunteer tester

Joined: 9 Dec 11
Posts: 1499
Credit: 1,345,576
RAC: 0
France
Message 1354925 - Posted: 9 Apr 2013, 6:38:10 UTC

Good game, Matt! :)
Congrats to the whole SETI@home team!
Everyone has felt the success of the migration and the whole world is really happy.
A new era for the project has begun!
And what a relief for the team in terms of logistics...
Thanks for the heads-up and thanks again for this big move.
ID: 1354925 · Report as offensive
kittyman Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 9 Jul 00
Posts: 51477
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1354931 - Posted: 9 Apr 2013, 7:15:28 UTC

Matt...
Thank you for both your news post and the dedication to the project you have shown during the transition.

I would like one bit of explanation, though, regarding this statement:

"Why not go higher? At least three reasons for now. First, we don't really have the data or the ability to split workunits faster than that."

It would appear that the splitters have been keeping up with demand rather well since coming back up, even with the improved distribution of work that the increased bandwidth allows.

Are you saying that you are not able to acquire enough data from Arecibo to continue to distribute work at the improved rates?
And if so, would more drives for data shuttle service help?
Or is it a limit on the rate that you are able to record data or the time allowed to do so?

This is an important question for some of us devoted souls.
"Time is simply the mechanism that keeps everything from happening all at once."

ID: 1354931 · Report as offensive
Profile Matt Lebofsky
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 1 Mar 99
Posts: 1444
Credit: 957,058
RAC: 0
United States
Message 1355085 - Posted: 9 Apr 2013, 21:07:45 UTC - in response to Message 1354931.  

Are you saying that you are not able to acquire enough data from Arecibo to continue to distribute work at the improved rates?
And if so, would more drives for data shuttle service help?
Or is it a limit on the rate that you are able to record data or the time allowed to do so?


Right... our observation time is the bottleneck in this case. When we are able to use the telescope we can record at the rates we desire, and we are easily able to shuttle all the data back to UCB on the drives we have.

- Matt
-- BOINC/SETI@home network/web/science/development person
-- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude
ID: 1355085 · Report as offensive
Profile ML1
Volunteer moderator
Volunteer tester

Joined: 25 Nov 01
Posts: 21019
Credit: 7,508,002
RAC: 20
United Kingdom
Message 1355097 - Posted: 9 Apr 2013, 21:44:20 UTC - in response to Message 1355085.  

... our observation time is the bottleneck in this case. When we are able to use the telescope we can record at the rates we desire, and we are easily able to shuttle all the data back to UCB on the drives we have.

Roll-on Real Time Processing :-)


(Just a few GPUs needed?...)

Next bit of research is how to sift through the huge database of results in parallel? ;-)


Happy fast crunchin',
Martin

See new freedom: Mageia Linux
Take a look for yourself: Linux Format
The Future is what We all make IT (GPLv3)
ID: 1355097 · Report as offensive
Profile Gary Charpentier Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 25 Dec 00
Posts: 30936
Credit: 53,134,872
RAC: 32
United States
Message 1355100 - Posted: 9 Apr 2013, 21:57:10 UTC - in response to Message 1355085.  

Are you saying that you are not able to acquire enough data from Arecibo to continue to distribute work at the improved rates?
And if so, would more drives for data shuttle service help?
Or is it a limit on the rate that you are able to record data or the time allowed to do so?


Right... our observation time is the bottleneck in this case. When we are able to use the telescope we can record at the rates we desire, and we are easily able to shuttle all the data back to UCB on the drives we have.

- Matt

Thanks for the straight answer.

So just how much does it cost to put a receiver at another telescope (ballpark) so we can use more of our big fat pipe?

ID: 1355100 · Report as offensive
Cheopis

Joined: 17 Sep 00
Posts: 156
Credit: 18,451,329
RAC: 0
United States
Message 1355112 - Posted: 9 Apr 2013, 23:00:46 UTC - in response to Message 1355085.  

Are you saying that you are not able to acquire enough data from Arecibo to continue to distribute work at the improved rates?
And if so, would more drives for data shuttle service help?
Or is it a limit on the rate that you are able to record data or the time allowed to do so?


Right... our observation time is the bottleneck in this case. When we are able to use the telescope we can record at the rates we desire, and we are easily able to shuttle all the data back to UCB on the drives we have.

- Matt


Any hints on where the team might be planning on going next? More sites for the same depth of analysis, or deeper analysis of the data that we already have a pipeline for? Or more background work shifted to the remote computers? RFID / NTPCKR/ splitting?

If the answer is that you guys aren't sure yet because you are still watching how things develop, that's fine too!
ID: 1355112 · Report as offensive
Josef W. Segur
Volunteer developer
Volunteer tester

Joined: 30 Oct 99
Posts: 4504
Credit: 1,414,761
RAC: 0
United States
Message 1355115 - Posted: 9 Apr 2013, 23:09:17 UTC - in response to Message 1355100.  

Are you saying that you are not able to acquire enough data from Arecibo to continue to distribute work at the improved rates?
And if so, would more drives for data shuttle service help?
Or is it a limit on the rate that you are able to record data or the time allowed to do so?


Right... our observation time is the bottleneck in this case. When we are able to use the telescope we can record at the rates we desire, and we are easily able to shuttle all the data back to UCB on the drives we have.

- Matt

Thanks for the straight answer.

So just how much does it cost to put a receiver at another telescope (ballpark) so we can use more of our big fat pipe?

Or how much would it cost to increase the bandwidth recorded at Arecibo? Dan Werthimer noted that as a desirable change in a request for donations 2 or 3 years ago, and doubling the bandwidth doubles the amount of data per unit of observation time.
Joe
ID: 1355115 · Report as offensive
Wolverine
Joined: 9 Jan 00
Posts: 35
Credit: 7,361,717
RAC: 0
Canada
Message 1355127 - Posted: 9 Apr 2013, 23:48:39 UTC

Nice work! All that and I didn't feel a thing.

Time for another $$ drive to fix that bottleneck? What is needed to eliminate it?

Hardware?

Cost??

ID: 1355127 · Report as offensive
HAL9000
Volunteer tester
Joined: 25 Mar 13
Posts: 12
Credit: 259,905
RAC: 0
Canada
Message 1355147 - Posted: 10 Apr 2013, 1:05:51 UTC

w00t, great to hear all is well ;)

10.7 billion ops/sec
ID: 1355147 · Report as offensive
rob smith Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer moderator
Volunteer tester

Joined: 7 Mar 03
Posts: 22460
Credit: 416,307,556
RAC: 380
United Kingdom
Message 1355204 - Posted: 10 Apr 2013, 5:55:39 UTC - in response to Message 1355127.  

Nice work! All that and I didn't feel a thing.

Time for another $$ drive to fix that bottleneck? What is needed to eliminate it?

Hardware?

Cost??


GPUUG has a funding drive running for replacement servers; take a look at this thread: http://setiathome.berkeley.edu/forum_thread.php?id=70511
Then make a donation.
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
ID: 1355204 · Report as offensive
Cheopis

Joined: 17 Sep 00
Posts: 156
Credit: 18,451,329
RAC: 0
United States
Message 1355296 - Posted: 10 Apr 2013, 13:13:48 UTC - in response to Message 1355204.  
Last modified: 10 Apr 2013, 13:24:20 UTC

Nice work! All that and I didn't feel a thing.

Time for another $$ drive to fix that bottleneck? What is needed to eliminate it?

Hardware?

Cost??


GPUUG has a funding drive running for replacement servers; take a look at this thread: http://setiathome.berkeley.edu/forum_thread.php?id=70511
Then make a donation.


Well, the two servers that donations are being collected for right now are an upload and a download server. As it turns out, simply putting the servers into a better environment seems to have drastically improved upload/download handling. Sure, beefier upload/download servers could do more, but do we really need new servers for upload and download, or should we redirect funds towards a different goal? Maybe we really do need a new upload and download server, but after seeing what the old machines are doing now, perhaps the specifications for the new machines can be dropped significantly, while still leaving room for upgrades at a later time?

At the very least I hope a reconsideration of upload and download server needs is in the works before any new hardware for those roles is purchased for use in the new server facility.

Upload and download aren't ALL that the two new servers are slated to do, but it might be that a completely different server architecture would give the best performance on the other work, leaving the current upload and download servers doing what they have now proven they can do quite nicely :)
ID: 1355296 · Report as offensive
David S
Volunteer tester
Joined: 4 Oct 99
Posts: 18352
Credit: 27,761,924
RAC: 12
United States
Message 1355299 - Posted: 10 Apr 2013, 13:43:03 UTC - in response to Message 1355085.  

Are you saying that you are not able to acquire enough data from Arecibo to continue to distribute work at the improved rates?
And if so, would more drives for data shuttle service help?
Or is it a limit on the rate that you are able to record data or the time allowed to do so?


Right... our observation time is the bottleneck in this case. When we are able to use the telescope we can record at the rates we desire, and we are easily able to shuttle all the data back to UCB on the drives we have.

- Matt

Hey, weren't we supposed to start seeing some data from somewhere else, Green Bank I think? Whatever happened with that?

David
Sitting on my butt while others boldly go,
Waiting for a message from a small furry creature from Alpha Centauri.

ID: 1355299 · Report as offensive
Filipe

Joined: 12 Aug 00
Posts: 218
Credit: 21,281,677
RAC: 20
Portugal
Message 1355323 - Posted: 10 Apr 2013, 15:26:10 UTC

Well, the two servers that donations are being collected for right now are an upload and a download server. As it turns out, simply putting the servers into a better environment seems to have drastically improved upload/download handling. Sure, beefier upload/download servers could do more, but do we really need new servers for upload and download, or should we redirect funds towards a different goal? Maybe we really do need a new upload and download server, but after seeing what the old machines are doing now, perhaps the specifications for the new machines can be dropped significantly, while still leaving room for upgrades at a later time?

At the very least I hope a reconsideration of upload and download server needs is in the works before any new hardware for those roles is purchased for use in the new server facility.

Upload and download aren't ALL that the two new servers are slated to do, but it might be that a completely different server architecture would give the best performance on the other work, leaving the current upload and download servers doing what they have now proven they can do quite nicely :)


A dedicated "NTPCKR" server is the way to go.

There is no point in having all the results sitting unchecked in the database.



ID: 1355323 · Report as offensive
Profile Gary Charpentier Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 25 Dec 00
Posts: 30936
Credit: 53,134,872
RAC: 32
United States
Message 1355389 - Posted: 10 Apr 2013, 17:34:32 UTC - in response to Message 1355323.  

A dedicated "NTPCKR" server is the way to go.

There is no point in having all the results sitting unchecked in the database.

They aren't
http://setiathome.berkeley.edu/ntpckr.php
It just isn't ready for prime time ...

ID: 1355389 · Report as offensive
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.