Unexpected Crisis du Jour (Mar 13 2007)



Message boards : Technical News : Unexpected Crisis du Jour (Mar 13 2007)

Matt Lebofsky
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 1 Mar 99
Posts: 1389
Credit: 74,079
RAC: 0
United States
Message 530887 - Posted: 13 Mar 2007, 22:42:21 UTC

We had the usual database outage, this time exercising the new replica. We stopped the project and confirmed all the table counts matched. That gave me warm fuzzies. We then simultaneously compressed the tables on the master while backing up to disk from the replica. Doing these things in parallel would normally have shortened the length of the outage...
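The table-count check described above can be sketched as follows. This is a minimal illustration, not the project's actual procedure: it assumes SQLite (Python's built-in `sqlite3`) as a stand-in for the real master and replica servers, and the `table_counts` and `count_mismatches` helpers and the `result` table in the demo are hypothetical names.

```python
import sqlite3


def table_counts(conn: sqlite3.Connection) -> dict[str, int]:
    """Return {table_name: row_count} for every table in the database."""
    names = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name")]
    # Names come from the catalog itself, so quoting them is enough here.
    return {name: conn.execute(f'SELECT COUNT(*) FROM "{name}"').fetchone()[0]
            for name in names}


def count_mismatches(master: sqlite3.Connection,
                     replica: sqlite3.Connection) -> dict[str, tuple]:
    """Map each table whose counts differ to (master_count, replica_count).

    A table missing on one side shows up with None for that side.
    """
    m, r = table_counts(master), table_counts(replica)
    return {name: (m.get(name), r.get(name))
            for name in sorted(m.keys() | r.keys())
            if m.get(name) != r.get(name)}


if __name__ == "__main__":
    master = sqlite3.connect(":memory:")
    replica = sqlite3.connect(":memory:")
    for db in (master, replica):
        # Hypothetical table, filled identically on both sides.
        db.execute("CREATE TABLE result (id INTEGER PRIMARY KEY)")
        db.executemany("INSERT INTO result (id) VALUES (?)",
                       [(i,) for i in range(3)])
    print(count_mismatches(master, replica))  # {} means all counts match
```

Matching counts are a quick sanity check rather than proof that the contents are identical; a per-table checksum would be a stronger test.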

But Jeff and I took this opportunity to clean up the closet. It's a mess in there and we're trying to get rid of unused junk to make way for new stuff. Today we kept it simple: remove the switch/firewall used for our (now defunct) Cogent link, and move the current set of routers/switches into one general location on the rack so wires won't be all over the place. The latter required power-cycling the router, which is our end of the tunnel from our current ISP (Hurricane Electric). Upon reboot, packet traffic wasn't passing through at all.

Well, that's not entirely true: packets were going through (in both directions) but more or less stopping dead after that. It was a total mystery. A five-minute reboot became a four-hour detective case. Jeff and I pored through IOS manuals and configurations, testing this, rebooting that, and googling our way into and out of several red herrings.

Long story short, after a few hours we noticed traffic was back to normal and had been for some time. Hunh? Apparently one of our tests had tickled something into working, so we rebooted the router again, bringing us back into the mystery state. We finally found the magic bullet: pinging from inside the router to the next physical hop down on campus opened the floodgates. Why? That's still a mystery, but at least we know the fix if we get jammed again. It probably has something to do with a router configuration somewhere that expected an established connection before passing packets along.
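The guess that some piece of state had to be established before packets would be passed along can be illustrated with a toy model. This is purely a sketch of that hypothesis, not a diagnosis of the actual router: the `ToyForwarder` class, its methods, and the `campus-gw` hop name are all hypothetical, and nothing here reflects real IOS behavior.

```python
class ToyForwarder:
    """Hypothetical toy model of the 'established connection' guess.

    Forwarding toward next_hop is refused until the router itself has
    sent a probe to that hop since the last reboot. Not real IOS logic.
    """

    def __init__(self, next_hop: str) -> None:
        self.next_hop = next_hop
        self.established: set[str] = set()  # volatile state, lost on reboot

    def reboot(self) -> None:
        # Power-cycling wipes whatever state had been built up.
        self.established.clear()

    def ping(self, target: str) -> None:
        # A probe originated from inside the router seeds the missing state.
        self.established.add(target)

    def forward(self, packet: str) -> bool:
        """Return True if the packet would be passed along to next_hop."""
        return self.next_hop in self.established


if __name__ == "__main__":
    router = ToyForwarder("campus-gw")  # hypothetical hop name
    router.reboot()
    print(router.forward("pkt"))  # False: stuck in the "mystery state"
    router.ping("campus-gw")
    print(router.forward("pkt"))  # True: floodgates open
```

The model only captures the shape of the symptom (traffic blocked after reboot, unblocked by a ping from the router to its next hop); the real cause was never identified.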

- Matt

____________
-- BOINC/SETI@home network/web/science/development person
-- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude

Cherokee150
Joined: 11 Nov 99
Posts: 111
Credit: 25,351,167
RAC: 19,839
United States
Message 530945 - Posted: 14 Mar 2007, 0:05:08 UTC - in response to Message 530887.

Wow! And I thought I was having a bad day, not to mention the stock market.
It's days like this that almost make you believe in superstition.

Wander Saito
Volunteer tester
Joined: 7 Jul 03
Posts: 555
Credit: 2,136,061
RAC: 0
Brazil
Message 530964 - Posted: 14 Mar 2007, 0:38:24 UTC

LOL... computing is not an exact science :)

Regards,
Wander

hiamps
Volunteer tester
Joined: 23 May 99
Posts: 4292
Credit: 72,971,319
RAC: 0
United States
Message 530981 - Posted: 14 Mar 2007, 1:08:42 UTC

Looks like it is acting up again...
____________
Official Abuser of Boinc Buttons...
And no good credit hound!

zoom314
Project donor
Joined: 30 Nov 03
Posts: 46544
Credit: 36,893,262
RAC: 5,040
United States
Message 531249 - Posted: 14 Mar 2007, 17:09:39 UTC - in response to Message 530945.

Yeah, and I thought I was having a bad day yesterday when my PC4 went down with PSU-itis (it needed a reactivation as a result of changing the RAM and PSU, plus about 4 hours on the phone with MS at various numbers, not all of them audible; a nightmare, I tell you). But PC4 is back up with 1 stick of RAM (1GB) and an OCZ 700W PSU. Hopefully it will last longer than the Tt Toughpower 750W PSU that preceded it. But I'll just have to get an 850W OCZ, as PC4 has a slightly less overclocked quad core (3.2GHz vs 3.24GHz). Such is life. :D
____________
My Facebook, War Commander, 2015

littlegreenmanfrommars
Volunteer tester
Joined: 28 Jan 06
Posts: 1410
Credit: 934,158
RAC: 0
Australia
Message 531594 - Posted: 15 Mar 2007, 4:10:42 UTC

Dare I say that sounds like DNS acting up?

Once you pinged, DNS then had a record of that IP. Maybe, perhaps, possibly...???

Copyright © 2014 University of California