Hurricane (Feb 21 2007)
Author | Message |
---|---|
Matt Lebofsky Send message Joined: 1 Mar 99 Posts: 1444 Credit: 957,058 RAC: 0 |
Major success today: The final big step of our network upgrade was completed this morning. I've been purposefully vague about the details of what we're doing because it involves many parties and contractual agreements. We'll have a formal writeup at some point, but the basic gist of it is: we're moving away from using Cogent as our ISP. Some brief history: We used to send all our traffic through campus until our one data server accounted for 33% of the entire university's outgoing bandwidth. With the advent of broadband (and undergraduate/staff addiction to file sharing) the ethernet pipes were clogged so we were forced to buy our own plumbing. Cogent became our ISP, and we got a dedicated 100 Mbit link for what was a good deal at the time (circa 2002). Time passed, and with inflation this deal became less and less affordable. Eventually we had to start looking elsewhere. Hurricane Electric (HE) offered us 10 times the bandwidth at one fourth the price, so we started moving in this direction. This was about 18 months ago. Why so slow? Because unlike our Cogent link, we had to have a router under our control at the PAIX, which is rather expensive. Enter Packet Clearing House (PCH), who graciously gave us space in their rack at the PAIX (and a couple routers to boot). Part of this endeavor required setting up a tunnel from the PAIX, through CENIC, through campus, and up to our lab - so campus' Communication & Network Services (CNS) were greatly involved as well. This pretty much explains why this took so long. There were several third-party entities (HE, PCH, and CNS) who were involved, and none of them (including us) had infinite resources to devote to this project. So organizing meetings, developing and revising convoluted networking diagrams, holding hands and making sure balls didn't get dropped, was slow and painful (this would be the case no matter who was involved, so there's no bitterness in this regard of course). 
Throw in vacation fragmentation, Court leaving, bureaucratic snags galore, and we were lucky to see any progress month to month. Nevertheless, here we are. So where are we? As of yesterday, the upload server (and one of the two public web servers) was already on HE. We got this to work over the past couple of weeks, hence the odd DNS changes that wreaked havoc in some BOINC clients. This morning we put the download server (the one that accounts for most of the bandwidth) on HE, and removed all the "safety net" routing configuration. We plan to get other servers on HE eventually, but for now we're completely off Cogent, and hoping we won't have to fall back. Meanwhile, Eric was up in the lab doing surgery on many servers, all in an effort to improve them (add some recently donated memory, and in one case install a new motherboard). I was doing my own surgery, finally adding the new drives to sidious. We are closer to having that become our new BOINC database server, but it took me all afternoon to get mdadm to behave and have the new RAID 10 partition survive reboot. There's a surprising amount of great documentation on mdadm on the web, but nothing about how to make RAID 10 survive reboot (well, nothing that works). The RAID 1 devices would be fine, but ultimately I had to add some lines to /etc/rc.sysinit to make a block device before mdadm tried to assemble the RAID 0 part. There's more, I guess, but I need to go home. - Matt -- BOINC/SETI@home network/web/science/development person -- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude |
Wander Saito Send message Joined: 7 Jul 03 Posts: 555 Credit: 2,136,061 RAC: 0 |
So all that is good news :) I guess the only thing left to do is to congratulate all involved for a job well done. Despite the outage warning, I hardly felt any downtime in any of my rigs or accessing SAH's website. Get some rest, you certainly earned it. Regards, Wander |
John Clark Send message Joined: 29 Sep 99 Posts: 16515 Credit: 4,418,829 RAC: 0 |
So all that is good news :) I would like to add my warm support to this! It's good to be back amongst friends and colleagues |
JPLiz Send message Joined: 19 Jul 99 Posts: 6 Credit: 560,130 RAC: 0 |
|
n7rfa Send message Joined: 13 Apr 04 Posts: 370 Credit: 9,058,599 RAC: 0 |
Way to go! Nice to have a faster link for less money! Is there a Cricket Graph of the new link yet? |
Keith T. Send message Joined: 23 Aug 99 Posts: 962 Credit: 537,293 RAC: 9 |
Is there a Cricket Graph of the new link yet? Try here. Well done Matt and everyone else from me too. |
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14653 Credit: 200,643,578 RAC: 874 |
Is there a Cricket Graph of the new link yet? No, that looks like the old one. Since it's flatlining, and we're sending/receiving OK at the moment, I think we can take it the new link is working well. We'll just have to see if someone adds a new link on this page, perhaps below the 668 reference in the top-left column. |
Keith T. Send message Joined: 23 Aug 99 Posts: 962 Credit: 537,293 RAC: 9 |
Is there a Cricket Graph of the new link yet? This one on the main page is not flatlining. I agree that most of the graphs on http://fragment1.berkeley.edu/newcricket/cricket/inr-668.html are now flatlining. I used to get confused by Cricket graphs until I figured out that they use Berkeley local time, not UTC (my local time is UTC in the winter). Keith. |
littlegreenmanfrommars Send message Joined: 28 Jan 06 Posts: 1410 Credit: 934,158 RAC: 0 |
Congrats on a heck of a job well done! And thanks for filling us in on the hidden stuff you guys do, it's truly illuminating. Goodonyer, Matt and Eric :) |
speedimic Send message Joined: 28 Sep 02 Posts: 362 Credit: 16,590,653 RAC: 0 |
Congratulations! Well done! I'm surprised how small the disturbance was for that huge change! Matt, mdadm means software RAID, right? Is money the reason you don't use a RAID controller, or are there other reasons? mic. |
Matt Lebofsky Send message Joined: 1 Mar 99 Posts: 1444 Credit: 957,058 RAC: 0 |
Congratulations! Well done! I'm surprised how small the disturbance was for that huge change! Jeff and Court deserve about 98% of the credit. I purposefully stayed out of the loop until after Court left and then I was forced into the loop. mdadm means software-raid, right? Is it the money-problem why you don't use a raid-controller or are there other reasons? We've been bitten by hardware raid in the past (causing random database corruption in one case, not effectively reporting failures/errors in several others), so I'm a bit biased. In my experience, I feel the control and access of software raid far outweighs any performance loss from not using hardware. I'm the kind of guy that hates hidden levels of anything - I like to know exactly what's going on how and when and where and why at all times, and you don't seem to get any of that from a hardware raid device. This is also why I despise object oriented programming. I grew up having to peek and poke Pet and Apple computers to get anything to work, so I can't stand having these details hidden from me. Hiding important details in hardware and black-box software defies true understanding, and therefore turns computing from an art into, well, something that isn't very different from bureaucracy. Wow. That was a mini-rant. I had a couple sips of coffee this morning. Sorry. - Matt -- BOINC/SETI@home network/web/science/development person -- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude |
1mp0£173 Send message Joined: 3 Apr 99 Posts: 8423 Credit: 356,897 RAC: 0 |
Congratulations! Well done! I'm surprised how small the disturbance was for that huge change! That simply means that someone managed the timeouts, etc., in DNS well. Since the dominant DNS server out there is "BIND" and the "B" in BIND stands for "Berkeley" I'm not surprised. ... but I see many cases where it isn't done correctly, and they definitely did everything exactly right -- and deserve all the compliments. |
Fuzzy Hollynoodles Send message Joined: 3 Apr 99 Posts: 9659 Credit: 251,998 RAC: 0 |
... This is also why I despise object oriented programming. I grew up having to peek and poke Pet and Apple computers to get anything to work, so I can't stand having these details hidden from me. Hiding important details in hardware and black-box software defies true understanding, and therefore turns computing from an art into, well, something that isn't very different from bureaucracy. Wow. That was a mini-rant. I had a couple sips of coffee this morning. Sorry. Opening up a can of worms again, huh? ;-) Drink some more coffee. "I'm trying to maintain a shred of dignity in this world." - Me |
Grant (SSSF) Send message Joined: 19 Aug 99 Posts: 13744 Credit: 208,696,464 RAC: 304 |
|
Pepo Send message Joined: 5 Aug 99 Posts: 308 Credit: 418,019 RAC: 0 |
Is there a Cricket Graph of the new link yet? Good bet! Found it on the inr-250 page (gigabitethernet2_3: 169.229.0.190: SETI@Home_P2P_to_sslringeva1fes_Gi0/1). Peter |
n7rfa Send message Joined: 13 Apr 04 Posts: 370 Credit: 9,058,599 RAC: 0 |
Is there a Cricket Graph of the new link yet? Thanks! That's the one! It also matches the picture in this thread. I am wondering if the labels aren't reversed. Before, the Outbound traffic was very high in relation to the Inbound traffic. Now I see that the Inbound traffic is 10+ times the Outbound traffic. |
Grant (SSSF) Send message Joined: 19 Aug 99 Posts: 13744 Credit: 208,696,464 RAC: 304 |
I am wondering if the labels aren't reversed. I think it's a case of the port being monitored. Previously it was the port connected to the outside world. This time it's the port connected to the server, so the outgoing traffic from the server shows as ingoing traffic to the switch (and the incoming traffic to the server shows as outgoing traffic from the switch). Grant Darwin NT |
n7rfa Send message Joined: 13 Apr 04 Posts: 370 Credit: 9,058,599 RAC: 0 |
I am wondering if the labels aren't reversed. While that may be, I still find it rather interesting (to say the least) that the Inbound and Outbound values have almost exactly reversed. After all, they send out ~350kb WUs and receive in 15-30kb Results. They receive an HTML request and send out a page of data. I would naturally expect that the Outbound traffic would be higher than the Inbound traffic. |
Adam Weichel Send message Joined: 30 Jul 02 Posts: 22 Credit: 25,877,509 RAC: 46 |
Nice work! Big infrastructure / ISP moves are always a pain, no matter how much planning. The smoothness of the transfer is a testament to just how professional and competent the S@H tech staff are. Congrats on a very successful migration. Computer nut, Distributed Computing freak, Jeeper and Dodge Ram driver. Life is worth living... and worth discovering. I run VMWare ESXi Free - why don't you? |
Grant (SSSF) Send message Joined: 19 Aug 99 Posts: 13744 Credit: 208,696,464 RAC: 304 |
After all, they send out ~350kb WUs and receive in 15-30kb Results. They receive an HTML request and send out a page of data. I would naturally expect that the Outbound traffic would be higher than the Inbound traffic. As I suggested above, I expect it's just a case of the port being monitored as to what is considered inbound & what is considered outbound. Grant Darwin NT |
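
The port-perspective explanation at the end of the thread can be sketched in a few lines of Python. This is only an illustration, not how Cricket or the campus switches actually compute anything: the `Counters` class and the `as_seen_by_switch_port` helper are hypothetical names, and the 22.5 figure is simply an assumed midpoint of the "15-30kb" result size quoted above, set against the "~350kb" workunit size.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Counters:
    """Traffic totals from one observer's point of view (units as quoted in the thread)."""
    inbound: float
    outbound: float


def as_seen_by_switch_port(server: Counters) -> Counters:
    """Return the same traffic as counted on the switch port facing the server.

    Whatever the server sends arrives at that port, so it is counted as inbound
    there, and whatever the server receives leaves that port as outbound.
    """
    return Counters(inbound=server.outbound, outbound=server.inbound)


# Assumed per-workunit figures: ~350 sent out, 15-30 (midpoint 22.5) received back.
server_view = Counters(inbound=22.5, outbound=350.0)
switch_view = as_seen_by_switch_port(server_view)

# Ratio of inbound to outbound as the switch port would report it.
ratio = switch_view.inbound / switch_view.outbound
print(switch_view, round(ratio, 1))
```

Under these assumed sizes the switch-facing view reports inbound traffic well over ten times the outbound traffic, which is consistent with the "10+ times" observation in the thread, without the graph labels being wrong.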
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.