Get Out of My House (Jan 18 2011)



Message boards : Technical News : Get Out of My House (Jan 18 2011)

Matt Lebofsky
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 1 Mar 99
Posts: 1389
Credit: 74,079
RAC: 0
United States
Message 1068032 - Posted: 18 Jan 2011, 22:02:28 UTC

Nothing like coming back from a long holiday weekend and having one of your main production servers croak as soon as you arrive. It's a sunny day outside and I was stuck wearing my fleece jacket and fingerless gloves inside a well air-conditioned server closet.

So what happened? Not sure exactly, but bruno (the upload server, as well as the main boincadm administrative server) was all hung up as soon as we started the normal Tuesday outage. I had to reboot it, and that was that - it wouldn't come up properly again.

It seems to be a multiple-part problem. There was a disk failure, and the 3ware card in this system has always given us trouble. What kind of trouble? Well, if you reboot the system (without a full power cycle) random drives go missing. That's kind of a problem, no? I don't think this is a single broken card - a labmate has similar problems with the same model in his system (I forget the model #, but it's 24-channel). Anyway, the big RAID10 holding all the results was tagged as degraded and is rebuilding now.

That's fine, except the OS (which is on separate partitions and not under the jurisdiction of the 3ware card) isn't booting either. Jeez! The good news is I can boot off a Fedora live CD and see both the root and upload storage drives, so there's no data loss. It just won't boot!

The other good news is that, if we need it, we have a backup system already: synergy! It might be getting pulled into prime time sooner than expected. It doesn't have nearly as many disk spindles as bruno, but this might not be an issue - there's still plenty of disk space on it. And a lot of memory for potential file system caching. It's still undecided if we're going to make synergy the new bruno, but I'm at least copying everything there now just to be safe.

I might still be able to get bruno up this afternoon, but if not, looks like we're down for the evening (it'll take that long to copy everything over to synergy).

- Matt

____________
-- BOINC/SETI@home network/web/science/development person
-- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude

perryjay
Volunteer tester
Joined: 20 Aug 02
Posts: 3377
Credit: 15,487,204
RAC: 11,278
United States
Message 1068038 - Posted: 18 Jan 2011, 22:13:58 UTC - in response to Message 1068032.

Ain't it grand knowing you have a spare, brand new computer just waiting to show what it can do? :-) Good luck with Bruno but if you can't get it up we'll see you tomorrow!
____________


PROUD MEMBER OF Team Starfire World BOINC

QSilver
Joined: 26 May 99
Posts: 228
Credit: 4,633,861
RAC: 3,026
United States
Message 1068040 - Posted: 18 Jan 2011, 22:17:54 UTC

Thanks for the update, Matt. And good luck!
____________

Jim_S
Project donor
Joined: 23 Feb 00
Posts: 4520
Credit: 18,524,974
RAC: 5,671
United States
Message 1068042 - Posted: 18 Jan 2011, 22:23:31 UTC

Thanks for the update Matt. And I'll second that good luck!!!
____________

I Desire Peace and Justice, Jim Scott (Mod-Ret.)

Todd Hebert
Volunteer tester
Joined: 16 Jun 00
Posts: 647
Credit: 217,127,962
RAC: 0
United States
Message 1068043 - Posted: 18 Jan 2011, 22:25:21 UTC

Never been a big fan of the 3Ware cards of days past - they were bought by LSI to capture market share and are partnered with Intel so there is plenty of development budget behind them now.

It is good that there is a "spare" around that would be up to the task to take over where bruno left off.

I sense another fundraiser in our future - I still have access to another barebones server from Intel at a discounted price.

Todd
____________

[seti.international] Dirk Sadowski
Project donor
Volunteer tester
Joined: 6 Apr 07
Posts: 7069
Credit: 60,279,620
RAC: 18,636
Germany
Message 1068046 - Posted: 18 Jan 2011, 22:44:25 UTC

Matt, Todd ;-), thanks for the news!

____________
BR

SETI@home Needs your Help ... $10 & U get a Star!

Team seti.international

Das Deutsche Cafe. The German Cafe.

Claggy
Project donor
Volunteer tester
Joined: 5 Jul 99
Posts: 4087
Credit: 32,993,533
RAC: 5,959
United Kingdom
Message 1068051 - Posted: 18 Jan 2011, 23:11:41 UTC - in response to Message 1068032.

Thanks for the update Matt, good luck with Bruno,

Claggy

Gary Charpentier
Project donor
Volunteer tester
Joined: 25 Dec 00
Posts: 12488
Credit: 6,799,443
RAC: 6,424
United States
Message 1068056 - Posted: 18 Jan 2011, 23:29:44 UTC

Good luck on Bruno and thanks for keeping us informed.

____________

Zapped Sparky
Project donor
Volunteer moderator
Volunteer tester
Joined: 30 Aug 08
Posts: 7592
Credit: 1,257,054
RAC: 1,315
United Kingdom
Message 1068058 - Posted: 18 Jan 2011, 23:43:58 UTC

That's some trouble, good luck, hope you can get Bruno going again.

Matt Lebofsky
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 1 Mar 99
Posts: 1389
Credit: 74,079
RAC: 0
United States
Message 1068065 - Posted: 19 Jan 2011, 0:01:52 UTC

By the way, no dice on bruno. I tried a bunch of things, it still won't boot - even though I can see all the drives/data when I boot from CD. And the RAID card refuses to NOT tag the RAID as degraded.

We're probably going to be down for at least another day or two. I'll post something to the front page shortly.

- Matt
____________
-- BOINC/SETI@home network/web/science/development person
-- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude

perryjay
Volunteer tester
Joined: 20 Aug 02
Posts: 3377
Credit: 15,487,204
RAC: 11,278
United States
Message 1068071 - Posted: 19 Jan 2011, 0:27:07 UTC - in response to Message 1068065.

Well. I guess that means Hello Synergy!! If it's half as good as it appears it should be able to handle the new assignment with no problems. After checking my tasks on my BOINC Manager I see I can easily handle a couple of days downtime.

Even if I don't make it until you get back up I have some new toys coming in that I can install during the down time. I'll be here ready to go whenever you get back.
____________


PROUD MEMBER OF Team Starfire World BOINC

Saaby900T
Joined: 24 Dec 10
Posts: 76
Credit: 4,971,171
RAC: 0
United States
Message 1068081 - Posted: 19 Jan 2011, 1:15:34 UTC

Can we still get new tasks?

Gary Charpentier
Project donor
Volunteer tester
Joined: 25 Dec 00
Posts: 12488
Credit: 6,799,443
RAC: 6,424
United States
Message 1068095 - Posted: 19 Jan 2011, 2:14:06 UTC
Last modified: 19 Jan 2011, 2:15:10 UTC

Time for the spring cleaning. Reseat all the connectors from head to tail. I know that will take a couple hours, especially with the connectors in the drives. Don't forget the ground strap.

If that fails, I guess Synergy is Bruno.

In any case it sounds like you have the data and that is the important thing.


Or is Bruno jealous of Synergy?
____________

Bill Walker
Joined: 4 Sep 99
Posts: 3372
Credit: 2,070,243
RAC: 2,178
Canada
Message 1068100 - Posted: 19 Jan 2011, 2:24:53 UTC

Alas, poor Bruno! I knew him, Horatio, a server of infinite jest, of most excellent fancy. He hath bore me on his back a thousand times, and now he done broke.

Hey Matt, any hints on how you are picking your thread titles these days?

____________

soft^spirit
Joined: 18 May 99
Posts: 6374
Credit: 28,631,059
RAC: 12
United States
Message 1068101 - Posted: 19 Jan 2011, 2:26:41 UTC

so...

Bruno just...

....

....Froze up? *ducks and runs*
____________

Janice

Swibby Bear
Joined: 1 Aug 01
Posts: 236
Credit: 7,276,504
RAC: 102
United States
Message 1068110 - Posted: 19 Jan 2011, 3:01:54 UTC

If I recall, the current Bruno is the former Bambi, and the old Bruno was ditched when Oscar and Carolyn came in. Our heads are in a swirl!

Good luck, Matt

RottenMutt
Joined: 15 Mar 01
Posts: 992
Credit: 207,654,737
RAC: 0
United States
Message 1068115 - Posted: 19 Jan 2011, 3:22:10 UTC - in response to Message 1068065.
Last modified: 19 Jan 2011, 3:23:46 UTC

By the way, no dice on bruno. I tried a bunch of things, it still won't boot - even though I can see all the drives/data when I boot from CD. And the RAID card refuses to NOT tag the RAID as degraded.
...
- Matt


is the boot loader working?
how far into boot do you get?
____________

zoom314
Project donor
Joined: 30 Nov 03
Posts: 46261
Credit: 36,671,366
RAC: 5,266
Message 1068123 - Posted: 19 Jan 2011, 3:59:51 UTC

Matt, Chin Up You'll figure out what Gremlin is bugging Ya and squash It flat in no time once Ya do figure It all out. Me I just have a learning curve with Vista Business x64 and with RealTemp 3.58, It isn't pretty, As I had to start It up with Task Scheduler instead of the Startup folder, It should start when the PC does now, I may upgrade to 7 Pro sooner than I thought, At least I have the DVD already. Good Luck.
____________
My Facebook, War Commander, 2015

-BeNt-
Joined: 17 Oct 99
Posts: 1234
Credit: 10,116,112
RAC: 0
United States
Message 1068131 - Posted: 19 Jan 2011, 4:34:00 UTC

Thanks for the heads up Matt, explains why all my uploads have been backed off 3 hours. Kind of figured it was just an extended outage on the upload server. Oh well thank goodness you have another server to throw in there.
____________
Traveling through space at ~67,000mph!

Harri Liljeroos
Joined: 29 May 99
Posts: 46
Credit: 19,004,364
RAC: 12,292
Finland
Message 1068183 - Posted: 19 Jan 2011, 7:37:04 UTC - in response to Message 1068100.
Last modified: 19 Jan 2011, 7:37:29 UTC

Hey Matt, any hints on how you are picking your thread titles these days?


The titles seem to be names of Kate Bush songs.
____________


Copyright © 2014 University of California