Hurricane (Feb 21 2007)

Matt Lebofsky Volunteer moderator Project administrator Project developer Project scientist Send message Joined: 1 Mar 99 Posts: 1444 Credit: 957,058 RAC: 0	Message 521306 - Posted: 22 Feb 2007, 0:38:10 UTC Last modified: 22 Feb 2007, 0:38:27 UTC Major success today: The final big step of our network upgrade was completed this morning. I've been purposefully vague about the details of what we're doing because it involves many parties and contractual agreements. We'll have a formal writeup at some point, but the basic gist of it is: we're moving away from using Cogent as our ISP.
Some brief history: We used to send all our traffic through campus until our one data server accounted for 33% of the entire university's outgoing bandwidth. With the advent of broadband (and undergraduate/staff addiction to file sharing) the ethernet pipes were clogged so we were forced to buy our own plumbing. Cogent became our ISP, and we got a dedicated 100 Mbit link for what was a good deal at the time (circa 2002). Time passed, and with inflation this deal became less and less affordable. Eventually we had to start looking elsewhere. Hurricane Electric (HE) offered us 10 times the bandwidth at one fourth the price, so we started moving in this direction. This was about 18 months ago.
Why so slow? Because unlike our Cogent link, we had to have a router under our control at the PAIX, which is rather expensive. Enter Packet Clearing House (PCH), who graciously gave us space in their rack at the PAIX (and a couple routers to boot). Part of this endeavor required setting up a tunnel from the PAIX, through CENIC, through campus, and up to our lab - so campus' Communication & Network Services (CNS) were greatly involved as well.
This pretty much explains why this took so long. There were several third party entities (HE, PCH, and CNS) who were involved, and none of them (including us) had infinite resources to devote to this project. So organizing meetings, developing and revising convoluted networking diagrams, holding hands and making sure balls didn't get dropped, was slow and painful (this would be the case no matter who was involved, so there's no bitterness in this regard of course). Throw in vacation fragmentation, Court leaving, bureaucratic snags galore, and we were lucky to see any progress month to month. Nevertheless, here we are.
So where are we? As of yesterday, the upload server (and one of the two public web servers) were already on HE. We got this to work over the past couple of weeks, hence the odd DNS changes that wreaked havoc in some BOINC clients. This morning we put the download server (the one that accounts for most of the bandwidth) on HE, and removed all the "safety net" routing configuration. We plan to get other servers on HE eventually, but for now we're completely off Cogent, and hoping we won't have to fall back.
Meanwhile, Eric was up in the lab doing surgery on many servers, all in an effort to improve them (add some recently donated memory, and in one case install a new motherboard). I was doing my own surgery, finally adding the new drives to sidious. We are closer to having that become our new BOINC database server, but it took me all afternoon to get mdadm to behave and have the new RAID 10 partition survive reboot. There's a surprising amount of great documentation on mdadm on the web, but nothing about how to make RAID 10 survive reboot (well, nothing that works). The RAID 1 devices would be fine, but ultimately I had to add some lines to /etc/rc.sysinit to make a block device before mdadm tried to assemble the RAID 0 part.
There's more, I guess, but I need to go home. - Matt -- BOINC/SETI@home network/web/science/development person -- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude ID: 521306 ·
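As a rough illustration of the "10 times the bandwidth at one fourth the price" comparison in the post above, here is a minimal sketch of the cost-per-Mbit arithmetic. The function name is hypothetical, and only the two ratios come from the post; no actual prices were given, so none are assumed.

```python
from fractions import Fraction


def relative_cost_per_mbit(bandwidth_factor: Fraction, price_factor: Fraction) -> Fraction:
    """Return the new cost per Mbit as a fraction of the old cost per Mbit.

    bandwidth_factor: new bandwidth divided by old bandwidth
    price_factor:     new price divided by old price
    """
    return price_factor / bandwidth_factor


# Ratios quoted in the post: 10x the bandwidth, 1/4 of the price.
ratio = relative_cost_per_mbit(Fraction(10), Fraction(1, 4))

# Each Mbit on the new link costs this fraction of what it cost on the old one.
print(f"New cost per Mbit is {ratio} of the old cost per Mbit")
```

Exact fractions are used instead of floats so the result is not blurred by rounding.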

Wander Saito Volunteer tester Send message Joined: 7 Jul 03 Posts: 555 Credit: 2,136,061 RAC: 0	Message 521326 - Posted: 22 Feb 2007, 1:04:23 UTC So all that is good news :) I guess the only thing left to do is to congratulate all involved for a job well done. Despite the outage warning, I hardly felt any downtime in any of my rigs or accessing SAH's website. Get some rest, you certainly earned it. Regards, Wander ID: 521326 ·

John Clark Volunteer tester Send message Joined: 29 Sep 99 Posts: 16515 Credit: 4,418,829 RAC: 0	Message 521330 - Posted: 22 Feb 2007, 1:11:22 UTC - in response to Message 521326. So all that is good news :) I guess the only thing left to do is to congratulate all involved for a job well done. Despite the outage warning, I hardly felt any downtime in any of my rigs or accessing SAH's website. Get some rest, you certainly earned it. Regards, Wander I would like to add my warm support to this! It's good to be back amongst friends and colleagues. ID: 521330 ·

JPLiz Volunteer tester Send message Joined: 19 Jul 99 Posts: 6 Credit: 560,130 RAC: 0	Message 521364 - Posted: 22 Feb 2007, 2:55:22 UTC Well done guys. ID: 521364 ·

n7rfa Volunteer tester Send message Joined: 13 Apr 04 Posts: 370 Credit: 9,058,599 RAC: 0	Message 521367 - Posted: 22 Feb 2007, 3:07:20 UTC Way to go! Nice to have a faster link for less money! Is there a Cricket Graph of the new link yet? ID: 521367 ·

Keith T. Volunteer tester Send message Joined: 23 Aug 99 Posts: 962 Credit: 537,293 RAC: 9	Message 521504 - Posted: 22 Feb 2007, 10:26:56 UTC - in response to Message 521367. Is there a Cricket Graph of the new link yet? Try here. Well done Matt and everyone else from me too. ID: 521504 ·

Richard Haselgrove Volunteer tester Send message Joined: 4 Jul 99 Posts: 14653 Credit: 200,643,578 RAC: 874	Message 521506 - Posted: 22 Feb 2007, 10:34:52 UTC - in response to Message 521504. Is there a Cricket Graph of the new link yet? Try here. Well done Matt and everyone else from me too. No, that looks like the old one. Since it's flatlining, and we're sending/receiving OK at the moment, I think we can take it the new link is working well. We'll just have to see if someone adds a new link on this page, perhaps below the 668 reference in the top-left column. ID: 521506 ·

Keith T. Volunteer tester Send message Joined: 23 Aug 99 Posts: 962 Credit: 537,293 RAC: 9	Message 521510 - Posted: 22 Feb 2007, 10:52:01 UTC - in response to Message 521506. Last modified: 22 Feb 2007, 10:55:48 UTC Is there a Cricket Graph of the new link yet? Try here. Well done Matt and everyone else from me too. No, that looks like the old one. Since it's flatlining, and we're sending/receiving OK at the moment, I think we can take it the new link is working well. We'll just have to see if someone adds a new link on this page, perhaps below the 668 reference in the top-left column. This one on the main page is not flatlining, but I agree that most of the graphs on http://fragment1.berkeley.edu/newcricket/cricket/inr-668.html are now flatlining. I used to get confused by Cricket graphs until I figured out that they use Berkeley local time, not UTC (my local time is UTC in the winter). Keith. ID: 521510 ·
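To line up a forum timestamp (UTC) with a graph drawn in Berkeley local time, as described in the post above, a minimal sketch follows. It assumes a fixed UTC-8 offset (Pacific Standard Time), which only holds while daylight saving time is not in effect, as in February. The function name is hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Assumption: fixed UTC-8 offset (PST). Valid for winter dates such as
# February 2007; it does NOT handle daylight saving time.
PST = timezone(timedelta(hours=-8), "PST")


def utc_to_berkeley_winter(utc_dt: datetime) -> datetime:
    """Convert a timezone-aware UTC datetime to Berkeley winter local time."""
    return utc_dt.astimezone(PST)


# Timestamp of the post above: 22 Feb 2007, 10:52:01 UTC.
posted_utc = datetime(2007, 2, 22, 10, 52, 1, tzinfo=timezone.utc)
posted_local = utc_to_berkeley_winter(posted_utc)

print(posted_utc.isoformat(), "->", posted_local.isoformat())
```

For dates that may fall in summer, a real time zone database (for example `zoneinfo.ZoneInfo("America/Los_Angeles")` in Python 3.9+) would be the safer choice than a fixed offset.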

littlegreenmanfrommars Volunteer tester Send message Joined: 28 Jan 06 Posts: 1410 Credit: 934,158 RAC: 0	Message 521554 - Posted: 22 Feb 2007, 12:55:42 UTC Congrats on a heck of a job well done! And thanks for filling us in on the hidden stuff you guys do, it's truly illuminating. Goodonyer, Matt and Eric :) ID: 521554 ·

speedimic Volunteer tester Send message Joined: 28 Sep 02 Posts: 362 Credit: 16,590,653 RAC: 0	Message 521697 - Posted: 22 Feb 2007, 19:56:36 UTC Congratulations! Well done! I'm surprised how small the disturbance was for that huge change! Matt, mdadm means software-raid, right? Is it the money-problem why you don't use a raid-controller or are there other reasons? mic. ID: 521697 ·

Matt Lebofsky Volunteer moderator Project administrator Project developer Project scientist Send message Joined: 1 Mar 99 Posts: 1444 Credit: 957,058 RAC: 0	Message 521707 - Posted: 22 Feb 2007, 20:33:51 UTC - in response to Message 521697. Congratulations! Well done! I'm surprised how small the disturbance was for that huge change! Jeff and Court deserve about 98% of the credit. I purposefully stayed out of the loop until after Court left and then I was forced into the loop. mdadm means software-raid, right? Is it the money-problem why you don't use a raid-controller or are there other reasons? We've been bitten by hardware raid in the past (causing random database corruption in one case, not effectively reporting failures/errors in several others), so I'm a bit biased. In my experience, I feel the control and access of software raid far outweighs any performance loss from not using hardware. I'm the kind of guy that hates hidden levels of anything - I like to know exactly what's going on how and when and where and why at all times, and you don't seem to get any of that from a hardware raid device. This is also why I despise object oriented programming. I grew up having to peek and poke Pet and Apple computers to get anything to work, so I can't stand having these details hidden from me. Hiding important details in hardware and black-box software defies true understanding, and therefore turns computing from an art into, well, something that isn't very different from bureaucracy. Wow. That was a mini-rant. I had a couple sips of coffee this morning. Sorry. - Matt -- BOINC/SETI@home network/web/science/development person -- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude ID: 521707 ·

1mp0£173 Volunteer tester Send message Joined: 3 Apr 99 Posts: 8423 Credit: 356,897 RAC: 0	Message 521809 - Posted: 22 Feb 2007, 23:03:29 UTC - in response to Message 521697. Congratulations! Well done! I'm surprised how small the disturbance was for that huge change! That simply means that someone managed timeouts, etc. in DNS well. Since the dominant DNS server out there is "BIND" and the "B" in BIND stands for "Berkeley" I'm not surprised. ... but I see many cases where it isn't done correctly, and they definitely did everything exactly right -- and deserve all the compliments. ID: 521809 ·

Fuzzy Hollynoodles Volunteer tester Send message Joined: 3 Apr 99 Posts: 9659 Credit: 251,998 RAC: 0	Message 521847 - Posted: 22 Feb 2007, 23:49:05 UTC - in response to Message 521707. Last modified: 22 Feb 2007, 23:55:22 UTC ... This is also why I despise object oriented programming. I grew up having to peek and poke Pet and Apple computers to get anything to work, so I can't stand having these details hidden from me. Hiding important details in hardware and black-box software defies true understanding, and therefore turns computing from an art into, well, something that isn't very different from bureaucracy. Wow. That was a mini-rant. I had a couple sips of coffee this morning. Sorry. - Matt Opening up a can of worms again, huh? ;-) Drink some more coffee. "I'm trying to maintain a shred of dignity in this world." - Me ID: 521847 ·

Grant (SSSF) Volunteer tester Send message Joined: 19 Aug 99 Posts: 13744 Credit: 208,696,464 RAC: 304	Message 522060 - Posted: 23 Feb 2007, 8:55:19 UTC - in response to Message 521367. Is there a Cricket Graph of the new link yet? This looks like a likely candidate. Grant Darwin NT ID: 522060 ·

Pepo Volunteer tester Send message Joined: 5 Aug 99 Posts: 308 Credit: 418,019 RAC: 0	Message 522096 - Posted: 23 Feb 2007, 12:00:47 UTC - in response to Message 522060. Last modified: 23 Feb 2007, 12:04:21 UTC Is there a Cricket Graph of the new link yet? This looks like a likely candidate. Good bet! Found it on the inr-250 page (gigabitethernet2_3: 169.229.0.190: SETI@Home_P2P_to_sslringeva1fes_Gi0/1). Peter ID: 522096 ·

n7rfa Volunteer tester Send message Joined: 13 Apr 04 Posts: 370 Credit: 9,058,599 RAC: 0	Message 522124 - Posted: 23 Feb 2007, 13:57:56 UTC - in response to Message 522060. Last modified: 23 Feb 2007, 14:03:59 UTC Is there a Cricket Graph of the new link yet? This looks like a likely candidate. Thanks! That's the one! It also matches the picture in this thread. I am wondering if the labels aren't reversed. Before the Outbound traffic was very high in relation to the Inbound traffic. Now I see that the Inbound traffic is 10+ times the Outbound traffic. ID: 522124 ·

Grant (SSSF) Volunteer tester Send message Joined: 19 Aug 99 Posts: 13744 Credit: 208,696,464 RAC: 304	Message 522437 - Posted: 24 Feb 2007, 1:41:01 UTC - in response to Message 522124. I am wondering if the labels aren't reversed. Before the Outbound traffic was very high in relation to the Inbound traffic. Now I see that the Inbound traffic is 10+ times the Outbound traffic. I think it's a case of the port being monitored. Previously it was the port connected to the outside world. This time it's the port connected to the server- so the outgoing traffic from the server shows as ingoing traffic to the switch (and the incoming traffic to the server shows as outgoing traffic from the switch). Grant Darwin NT ID: 522437 ·
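The explanation above, that "inbound" and "outbound" swap depending on which end of the link is being monitored, can be sketched as follows. The class and function names are hypothetical, and the traffic numbers are made up purely for illustration; they are not readings from the actual Cricket graph.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PortCounters:
    """Traffic rates as reported by one monitored port (arbitrary units)."""
    inbound: float   # traffic entering the monitored port
    outbound: float  # traffic leaving the monitored port


def as_seen_from_other_end(counters: PortCounters) -> PortCounters:
    """Return the same traffic as reported by the port at the far end of the link.

    Whatever leaves one end enters the other, so the two values swap.
    """
    return PortCounters(inbound=counters.outbound, outbound=counters.inbound)


# Made-up example: a server that sends far more than it receives.
server_view = PortCounters(inbound=5.0, outbound=60.0)
switch_view = as_seen_from_other_end(server_view)

print("server port:", server_view)
print("switch port:", switch_view)
```

Under this reading the traffic itself has not changed; only the labels depend on the vantage point.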

n7rfa Volunteer tester Send message Joined: 13 Apr 04 Posts: 370 Credit: 9,058,599 RAC: 0	Message 522699 - Posted: 24 Feb 2007, 15:52:41 UTC - in response to Message 522437. I am wondering if the labels aren't reversed. Before the Outbound traffic was very high in relation to the Inbound traffic. Now I see that the Inbound traffic is 10+ times the Outbound traffic. I think it's a case of the port being monitored. Previously it was the port connected to the outside world. This time it's the port connected to the server- so the outgoing traffic from the server shows as ingoing traffic to the switch (and the incoming traffic to the server shows as outgoing traffic from the switch). While that may be, I still find it rather interesting (to say the least) that the Inbound and Outbound values have almost exactly reversed. After all, they send out ~350kb WUs and receive 15-30kb Results. They receive an HTML request and send out a page of data. I would naturally expect that the Outbound traffic would be higher than the Inbound traffic. ID: 522699 ·
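A quick back-of-the-envelope check of the sizes quoted above: if each ~350 workunit sent out is matched by one 15-30 result coming back (same unit for both, so the ratio is unit-free), the expected ratio of server-outbound to server-inbound data can be estimated as below. This is only a sketch; it assumes one result per workunit and ignores web pages, scheduler requests, and protocol overhead.

```python
# Sizes quoted in the post, in the same (unspecified) unit.
workunit_size = 350          # sent from the server to a client
result_size_range = (15, 30)  # returned from a client to the server

# Assumption: one result comes back for each workunit sent.
ratios = [workunit_size / result_size for result_size in result_size_range]
high_ratio, low_ratio = max(ratios), min(ratios)

print(f"Expected outbound/inbound ratio: roughly {low_ratio:.1f}x to {high_ratio:.1f}x")
```

A ratio in this range is at least compatible with the "10+ times" figure reported in the thread, with the direction depending on which side of the link the graph labels refer to.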

Adam Weichel Send message Joined: 30 Jul 02 Posts: 22 Credit: 25,877,509 RAC: 46	Message 522704 - Posted: 24 Feb 2007, 16:18:42 UTC Nice work! Big infrastructure / ISP moves are always a pain, no matter how much planning. The smoothness of the transfer is a testament to just how professional and competent the S@H tech staff are. Congrats on a very successful migration. Computer nut, Distributed Computing freak, Jeeper and Dodge Ram driver. Life is worth living... and worth discovering. I run VMWare ESXi Free - why don't you? ID: 522704 ·

Grant (SSSF) Volunteer tester Send message Joined: 19 Aug 99 Posts: 13744 Credit: 208,696,464 RAC: 304	Message 522933 - Posted: 25 Feb 2007, 0:04:26 UTC - in response to Message 522699. After all, they send out ~350kb WUs and receive 15-30kb Results. They receive an HTML request and send out a page of data. I would naturally expect that the Outbound traffic would be higher than the Inbound traffic. As I suggested above, I expect it's just a case of which port is being monitored as to what is considered inbound and what is considered outbound. Grant Darwin NT ID: 522933 ·
