Hurricane (Feb 21 2007)

Matt Lebofsky
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 1 Mar 99
Posts: 1444
Credit: 957,058
RAC: 0
United States
Message 521306 - Posted: 22 Feb 2007, 0:38:10 UTC
Last modified: 22 Feb 2007, 0:38:27 UTC

Major success today: The final big step of our network upgrade was completed this morning. I've been purposefully vague about the details of what we're doing because it involves many parties and contractual agreements. We'll have a formal writeup at some point, but the basic gist of it is: we're moving away from using Cogent as our ISP.

Some brief history: We used to send all our traffic through campus until our one data server accounted for 33% of the entire university's outgoing bandwidth. With the advent of broadband (and undergraduate/staff addiction to file sharing) the ethernet pipes were clogged, so we were forced to buy our own plumbing. Cogent became our ISP, and we got a dedicated 100 Mbit link for what was a good deal at the time (circa 2002).

Time passed, and with inflation this deal became less and less affordable. Eventually we had to start looking elsewhere. Hurricane Electric (HE) offered us 10 times the bandwidth at one fourth the price, so we started moving in this direction. This was about 18 months ago. Why so slow? Because unlike our Cogent link, we had to have a router under our control at the PAIX, which is rather expensive. Enter Packet Clearing House (PCH), who graciously gave us space in their rack at the PAIX (and a couple routers to boot). Part of this endeavor required setting up a tunnel from the PAIX, through CENIC, through campus, and up to our lab - so campus' Communication & Network Services (CNS) were greatly involved as well.
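
To put the HE offer in perspective, here is a minimal sketch of the arithmetic. It assumes the "10 times the bandwidth at one fourth the price" figures are exact and that the 100 Mbit Cogent link is the baseline; the function name is made up for illustration.

```python
from fractions import Fraction

def relative_cost_per_mbit(bandwidth_factor, price_factor):
    """Cost per Mbit of the new link relative to the old one."""
    return Fraction(price_factor) / Fraction(bandwidth_factor)

# Figures from the post: 10x the bandwidth at 1/4 the price.
ratio = relative_cost_per_mbit(10, Fraction(1, 4))
new_bandwidth_mbit = 100 * 10  # the old link was 100 Mbit

print(ratio)               # 1/40
print(new_bandwidth_mbit)  # 1000
```

Under those assumptions each Mbit on the new link costs one fortieth of what it did on the old one.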

This pretty much explains why this took so long. There were several third party entities (HE, PCH, and CNS) who were involved, and none of them (including us) had infinite resources to devote to this project. So organizing meetings, developing and revising convoluted networking diagrams, holding hands and making sure balls didn't get dropped, was slow and painful (this would be the case no matter who was involved, so there's no bitterness in this regard of course). Throw in vacation fragmentation, Court leaving, bureaucratic snags galore, and we were lucky to see any progress month to month. Nevertheless, here we are.

So where are we? As of yesterday, the upload server (and one of the two public web servers) were already on HE. We got this to work over the past couple of weeks, hence the odd DNS changes that wreaked havoc in some BOINC clients. This morning we put the download server (the one that accounts for most of the bandwidth) on HE, and removed all the "safety net" routing configuration. We plan to get other servers on HE eventually, but for now we're completely off Cogent, and hoping we won't have to fall back.

Meanwhile, Eric was up in the lab doing surgery on many servers, all in an effort to improve them (add some recently donated memory, and in one case install a new motherboard). I was doing my own surgery, finally adding the new drives to sidious. We are closer to having that become our new BOINC database server, but it took me all afternoon to get mdadm to behave and have the new RAID 10 partition survive reboot. There's a surprising amount of great documentation on mdadm on the web, but nothing about how to make RAID 10 survive reboot (well, nothing that works). The RAID 1 devices would be fine, but ultimately I had to add some lines to /etc/rc.sysinit to make a block device before mdadm tried to assemble the RAID 0 part.
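
The post does not give the actual lines added to /etc/rc.sysinit, and the real fix was presumably a shell command in that script. Purely as an illustration of the idea (make sure the device node exists before the RAID 0 layer is assembled), here is a rough Python sketch. The path /dev/md2, the minor number 2, and both function names are assumptions, not details from the post.

```python
import os
import stat

MD_MAJOR = 9  # Linux assigns major number 9 to md (software RAID) devices

def is_block_device(path):
    """Return True if path exists and is a block device node."""
    try:
        return stat.S_ISBLK(os.stat(path).st_mode)
    except OSError:
        return False

def ensure_md_node(path, minor, dry_run=True):
    """Create the md device node if it is missing (needs root unless dry_run)."""
    if is_block_device(path):
        return f"{path} already exists"
    if not dry_run:
        os.mknod(path, 0o660 | stat.S_IFBLK, os.makedev(MD_MAJOR, minor))
    # Describe the equivalent mknod invocation for the log.
    return f"mknod {path} b {MD_MAJOR} {minor}"

# Hypothetical example: /dev/md2 as the RAID 0 layer over two RAID 1 arrays.
print(ensure_md_node("/dev/md2", 2))
```

With dry_run left at its default the function only reports what it would do, so it is safe to run without root.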

There's more, I guess, but I need to go home.

- Matt

-- BOINC/SETI@home network/web/science/development person
-- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude
ID: 521306
Wander Saito
Volunteer tester
Joined: 7 Jul 03
Posts: 555
Credit: 2,136,061
RAC: 0
Brazil
Message 521326 - Posted: 22 Feb 2007, 1:04:23 UTC

So all that is good news :)
I guess the only thing left to do is to congratulate all involved for a job well done. Despite the outage warning, I hardly felt any downtime in any of my rigs or accessing SAH's website. Get some rest, you certainly earned it.

Regards,
Wander
ID: 521326
John Clark
Volunteer tester
Joined: 29 Sep 99
Posts: 16515
Credit: 4,418,829
RAC: 0
United Kingdom
Message 521330 - Posted: 22 Feb 2007, 1:11:22 UTC - in response to Message 521326.  

So all that is good news :)
I guess the only thing left to do is to congratulate all involved for a job well done. Despite the outage warning, I hardly felt any downtime in any of my rigs or accessing SAH's website. Get some rest, you certainly earned it.

Regards,
Wander


I would like to add my warm support to this!
It's good to be back amongst friends and colleagues.



ID: 521330
JPLiz
Volunteer tester
Joined: 19 Jul 99
Posts: 6
Credit: 560,130
RAC: 0
Canada
Message 521364 - Posted: 22 Feb 2007, 2:55:22 UTC

Well done guys.

ID: 521364
n7rfa
Volunteer tester
Joined: 13 Apr 04
Posts: 370
Credit: 9,058,599
RAC: 0
United States
Message 521367 - Posted: 22 Feb 2007, 3:07:20 UTC

Way to go! Nice to have a faster link for less money!

Is there a Cricket Graph of the new link yet?
ID: 521367
Keith T.
Volunteer tester
Joined: 23 Aug 99
Posts: 962
Credit: 537,293
RAC: 9
United Kingdom
Message 521504 - Posted: 22 Feb 2007, 10:26:56 UTC - in response to Message 521367.  

Is there a Cricket Graph of the new link yet?


Try here.

Well done Matt and everyone else from me too.

ID: 521504
Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 521506 - Posted: 22 Feb 2007, 10:34:52 UTC - in response to Message 521504.  

Is there a Cricket Graph of the new link yet?


Try here.

Well done Matt and everyone else from me too.

No, that looks like the old one. Since it's flatlining, and we're sending/receiving OK at the moment, I think we can take it the new link is working well.

We'll just have to see if someone adds a new link on this page, perhaps below the 668 reference in the top-left column.
ID: 521506
Keith T.
Volunteer tester
Joined: 23 Aug 99
Posts: 962
Credit: 537,293
RAC: 9
United Kingdom
Message 521510 - Posted: 22 Feb 2007, 10:52:01 UTC - in response to Message 521506.  
Last modified: 22 Feb 2007, 10:55:48 UTC

Is there a Cricket Graph of the new link yet?


Try here.

Well done Matt and everyone else from me too.

No, that looks like the old one. Since it's flatlining, and we're sending/receiving OK at the moment, I think we can take it the new link is working well.

We'll just have to see if someone adds a new link on this page, perhaps below the 668 reference in the top-left column.


This one on the main page is not flatlining; I agree that most of the graphs on http://fragment1.berkeley.edu/newcricket/cricket/inr-668.html are now flatlining.

I used to get confused by Cricket graphs until I figured out that they use Berkeley local time, not UTC (my local time is UTC in the winter). Keith.
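
For anyone else lining up post timestamps (UTC) with the graphs (Berkeley local time), here is a small sketch of the conversion. It assumes a fixed UTC-8 offset, which is right for Pacific Standard Time in February but not during daylight saving; the function name is just for illustration.

```python
from datetime import datetime, timedelta, timezone

# Assumption: Pacific Standard Time as a fixed UTC-8 offset (valid in winter).
# A real tool would use a time zone database to handle daylight saving.
PST = timezone(timedelta(hours=-8), "PST")

def utc_to_berkeley_winter(utc_dt):
    """Convert an aware UTC datetime to Berkeley winter local time."""
    return utc_dt.astimezone(PST)

# Timestamp of this post as shown by the forum, in UTC.
posted = datetime(2007, 2, 22, 10, 52, 1, tzinfo=timezone.utc)
local = utc_to_berkeley_winter(posted)
print(local.strftime("%d %b %Y %H:%M:%S %Z"))  # 22 Feb 2007 02:52:01 PST
```

So a mid-morning UTC post corresponds to the small hours on the graph's time axis.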
ID: 521510
littlegreenmanfrommars
Volunteer tester
Joined: 28 Jan 06
Posts: 1410
Credit: 934,158
RAC: 0
Australia
Message 521554 - Posted: 22 Feb 2007, 12:55:42 UTC

Congrats on a heck of a job well done!

And thanks for filling us in on the hidden stuff you guys do, it's truly illuminating.

Goodonyer, Matt and Eric :)
ID: 521554
speedimic
Volunteer tester
Joined: 28 Sep 02
Posts: 362
Credit: 16,590,653
RAC: 0
Germany
Message 521697 - Posted: 22 Feb 2007, 19:56:36 UTC

Congratulations! Well done! I'm surprised how small the disturbance was for that huge change!

Matt,

mdadm means software RAID, right? Is money the reason you don't use a RAID controller, or are there other reasons?
mic.


ID: 521697
Matt Lebofsky
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 1 Mar 99
Posts: 1444
Credit: 957,058
RAC: 0
United States
Message 521707 - Posted: 22 Feb 2007, 20:33:51 UTC - in response to Message 521697.  

Congratulations! Well done! I'm surprised how small the disturbance was for that huge change!


Jeff and Court deserve about 98% of the credit. I purposefully stayed out of the loop until after Court left and then I was forced into the loop.

mdadm means software RAID, right? Is money the reason you don't use a RAID controller, or are there other reasons?


We've been bitten by hardware RAID in the past (causing random database corruption in one case, not effectively reporting failures/errors in several others), so I'm a bit biased. In my experience, the control and access of software RAID far outweigh any performance loss from not using hardware. I'm the kind of guy that hates hidden levels of anything - I like to know exactly what's going on, how and when and where and why, at all times, and you don't seem to get any of that from a hardware RAID device. This is also why I despise object-oriented programming. I grew up having to peek and poke Pet and Apple computers to get anything to work, so I can't stand having these details hidden from me. Hiding important details in hardware and black-box software defies true understanding, and therefore turns computing from an art into, well, something that isn't very different from bureaucracy. Wow. That was a mini-rant. I had a couple sips of coffee this morning. Sorry.

- Matt
-- BOINC/SETI@home network/web/science/development person
-- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude
ID: 521707
1mp0£173
Volunteer tester
Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 521809 - Posted: 22 Feb 2007, 23:03:29 UTC - in response to Message 521697.  

Congratulations! Well done! I'm surprised how small the disturbance was for that huge change!

That simply means that someone managed timeouts, etc. in DNS well.

Since the dominant DNS server out there is "BIND", and the "B" in BIND stands for "Berkeley", I'm not surprised.

... but I see many cases where it isn't done correctly, and they definitely did everything exactly right -- and deserve all the compliments.
ID: 521809
Fuzzy Hollynoodles
Volunteer tester
Joined: 3 Apr 99
Posts: 9659
Credit: 251,998
RAC: 0
Message 521847 - Posted: 22 Feb 2007, 23:49:05 UTC - in response to Message 521707.  
Last modified: 22 Feb 2007, 23:55:22 UTC

... This is also why I despise object-oriented programming. I grew up having to peek and poke Pet and Apple computers to get anything to work, so I can't stand having these details hidden from me. Hiding important details in hardware and black-box software defies true understanding, and therefore turns computing from an art into, well, something that isn't very different from bureaucracy. Wow. That was a mini-rant. I had a couple sips of coffee this morning. Sorry.

- Matt


Opening up a can of worms again, huh? ;-)

Drink some more coffee.


"I'm trying to maintain a shred of dignity in this world." - Me

ID: 521847
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13731
Credit: 208,696,464
RAC: 304
Australia
Message 522060 - Posted: 23 Feb 2007, 8:55:19 UTC - in response to Message 521367.  

Is there a Cricket Graph of the new link yet?

This looks like a likely candidate.

Grant
Darwin NT
ID: 522060
Pepo
Volunteer tester
Joined: 5 Aug 99
Posts: 308
Credit: 418,019
RAC: 0
Slovakia
Message 522096 - Posted: 23 Feb 2007, 12:00:47 UTC - in response to Message 522060.  
Last modified: 23 Feb 2007, 12:04:21 UTC

Is there a Cricket Graph of the new link yet?

This looks like a likely candidate.

Good bet! Found it on the inr-250 page (gigabitethernet2_3: 169.229.0.190: SETI@Home_P2P_to_sslringeva1fes_Gi0/1).

Peter
ID: 522096
n7rfa
Volunteer tester
Joined: 13 Apr 04
Posts: 370
Credit: 9,058,599
RAC: 0
United States
Message 522124 - Posted: 23 Feb 2007, 13:57:56 UTC - in response to Message 522060.  
Last modified: 23 Feb 2007, 14:03:59 UTC

Is there a Cricket Graph of the new link yet?

This looks like a likely candidate.

Thanks! That's the one! It also matches the picture in this thread.

I am wondering if the labels aren't reversed.

Before, the Outbound traffic was very high in relation to the Inbound traffic. Now I see that the Inbound traffic is 10+ times the Outbound traffic.
ID: 522124
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13731
Credit: 208,696,464
RAC: 304
Australia
Message 522437 - Posted: 24 Feb 2007, 1:41:01 UTC - in response to Message 522124.  

I am wondering if the labels aren't reversed.

Before, the Outbound traffic was very high in relation to the Inbound traffic. Now I see that the Inbound traffic is 10+ times the Outbound traffic.

I think it's a case of the port being monitored.
Previously it was the port connected to the outside world. This time it's the port connected to the server, so the outgoing traffic from the server shows as ingoing traffic to the switch (and the incoming traffic to the server shows as outgoing traffic from the switch).
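
A tiny sketch of that relabelling, assuming the graph is drawn from the switch port's point of view. The class, the function, and the numbers are all made up; they only show the 10:1 flip being discussed.

```python
from dataclasses import dataclass

@dataclass
class PortCounters:
    inbound: float   # traffic entering the switch through this port
    outbound: float  # traffic leaving the switch through this port

def server_view(server_facing_port):
    """Translate switch-port counters into the attached server's perspective."""
    return {
        "server_sends": server_facing_port.inbound,
        "server_receives": server_facing_port.outbound,
    }

# Made-up numbers, chosen only to show the 10:1 flip.
port = PortCounters(inbound=50.0, outbound=5.0)
print(server_view(port))  # {'server_sends': 50.0, 'server_receives': 5.0}
```

What the switch counts as inbound on a server-facing port is what the server sent, so the labels can swap without the traffic itself changing.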
Grant
Darwin NT
ID: 522437
n7rfa
Volunteer tester
Joined: 13 Apr 04
Posts: 370
Credit: 9,058,599
RAC: 0
United States
Message 522699 - Posted: 24 Feb 2007, 15:52:41 UTC - in response to Message 522437.  

I am wondering if the labels aren't reversed.

Before, the Outbound traffic was very high in relation to the Inbound traffic. Now I see that the Inbound traffic is 10+ times the Outbound traffic.

I think it's a case of the port being monitored.
Previously it was the port connected to the outside world. This time it's the port connected to the server, so the outgoing traffic from the server shows as ingoing traffic to the switch (and the incoming traffic to the server shows as outgoing traffic from the switch).

While that may be, I still find it rather interesting (to say the least) that the Inbound and Outbound values have almost exactly reversed.

After all, they send out ~350kb WUs and receive 15-30kb Results. They receive an HTML request and send out a page of data. I would naturally expect that the Outbound traffic would be higher than the Inbound traffic.
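
As a rough check on that expectation, here is the ratio worked out from the sizes quoted above. It assumes one Result comes back for each WU sent and ignores web pages and protocol overhead; the names are only for illustration.

```python
def outbound_to_inbound_ratio(workunit_size, result_size):
    """Ratio of data sent per workunit to data received per result."""
    return workunit_size / result_size

WORKUNIT = 350           # "~350kb" WUs, as quoted above
RESULT_RANGE = (15, 30)  # "15-30kb" Results, as quoted above

low = outbound_to_inbound_ratio(WORKUNIT, RESULT_RANGE[1])
high = outbound_to_inbound_ratio(WORKUNIT, RESULT_RANGE[0])
print(f"{low:.1f}x to {high:.1f}x")  # 11.7x to 23.3x
```

On those figures the servers should send somewhere between roughly 12 and 23 times what they receive, which is the same order as the 10+ times seen on the graph, only with the labels the other way round.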
ID: 522699
Adam Weichel
Joined: 30 Jul 02
Posts: 22
Credit: 25,877,509
RAC: 46
Canada
Message 522704 - Posted: 24 Feb 2007, 16:18:42 UTC

Nice work! Big infrastructure / ISP moves are always a pain, no matter how much planning. The smoothness of the transfer is a testament to just how professional and competent the S@H tech staff are.

Congrats on a very successful migration.
Computer nut, Distributed Computing freak, Jeeper and Dodge Ram driver.

Life is worth living... and worth discovering.

I run VMWare ESXi Free - why don't you?
ID: 522704
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13731
Credit: 208,696,464
RAC: 304
Australia
Message 522933 - Posted: 25 Feb 2007, 0:04:26 UTC - in response to Message 522699.  

After all, they send out ~350kb WUs and receive 15-30kb Results. They receive an HTML request and send out a page of data. I would naturally expect that the Outbound traffic would be higher than the Inbound traffic.

As I suggested above, I expect that what is considered inbound and what is considered outbound just depends on the port being monitored.
Grant
Darwin NT
ID: 522933
 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.