Get Out of My House (Jan 18 2011)

Profile Matt Lebofsky
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 1 Mar 99
Posts: 1444
Credit: 957,058
RAC: 0
United States
Message 1068032 - Posted: 18 Jan 2011, 22:02:28 UTC

Nothing like coming back from a long holiday weekend and having one of your main production servers croak as soon as you arrive. It's a sunny day outside and I was stuck wearing my fleece jacket and fingerless gloves inside a well air-conditioned server closet.

So what happened? Not sure exactly, but bruno (the upload server, as well as the main boincadm administrative server) was all hung up as soon as we started the normal Tuesday outage. I had to reboot it, and that was that - it wouldn't come up properly again.

It seems to be a multiple-part problem. There was a disk failure, and the 3ware card in this system has always given us trouble. What kind of trouble? Well, if you reboot the system (without a full power cycle) random drives go missing. That's kind of a problem, no? I don't think this is a single broken card - a labmate has similar problems with the same model in his system (I forget the model #, but it's 24-channel). Anyway, the big RAID10 holding all the results was tagged as degraded and is rebuilding now.

That's fine, except the OS (which is on separate partitions and not under the jurisdiction of the 3ware card) isn't booting either. Jeez! The good news is I can boot off a Fedora live CD and see both the root and upload storage drives, so there's no data loss. It just won't boot!

The other good news is that, if we need it, we have a backup system already: synergy! It might be getting pulled into prime time sooner than expected. It doesn't have nearly as many disk spindles as bruno, but this might not be an issue - there's still plenty of disk space on it. And a lot of memory for potential file system caching. It's still undecided if we're going to make synergy the new bruno, but I'm at least copying everything there now just to be safe.

I might still be able to get bruno up this afternoon, but if not, looks like we're down for the evening (it'll take that long to copy everything over to synergy).

- Matt

-- BOINC/SETI@home network/web/science/development person
-- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude
ID: 1068032 · Report as offensive
Profile perryjay
Volunteer tester
Joined: 20 Aug 02
Posts: 3377
Credit: 20,676,751
RAC: 0
United States
Message 1068038 - Posted: 18 Jan 2011, 22:13:58 UTC - in response to Message 1068032.  

Ain't it grand knowing you have a spare, brand new computer just waiting to show what it can do? :-) Good luck with Bruno but if you can't get it up we'll see you tomorrow!


PROUD MEMBER OF Team Starfire World BOINC
ID: 1068038 · Report as offensive
QSilver

Joined: 26 May 99
Posts: 232
Credit: 6,452,764
RAC: 0
United States
Message 1068040 - Posted: 18 Jan 2011, 22:17:54 UTC

Thanks for the update, Matt. And good luck!
ID: 1068040 · Report as offensive
Profile Jim_S
Joined: 23 Feb 00
Posts: 4705
Credit: 64,560,357
RAC: 31
United States
Message 1068042 - Posted: 18 Jan 2011, 22:23:31 UTC

Thanks for the update Matt. And I'll second that good luck!!!

I Desire Peace and Justice, Jim Scott (Mod-Ret.)
ID: 1068042 · Report as offensive
Profile Todd Hebert
Volunteer tester
Joined: 16 Jun 00
Posts: 648
Credit: 228,292,957
RAC: 0
United States
Message 1068043 - Posted: 18 Jan 2011, 22:25:21 UTC

Never been a big fan of the 3Ware cards of days past - they were bought by LSI to capture market share and are partnered with Intel so there is plenty of development budget behind them now.

It is good that there is a "spare" around that would be up to the task to take over where bruno left off.

I sense another fundraiser in our future - I still have access to another barebones server from Intel at a discounted price.

Todd
ID: 1068043 · Report as offensive
Profile Dirk Sadowski
Volunteer tester

Joined: 6 Apr 07
Posts: 7105
Credit: 147,663,825
RAC: 5
Germany
Message 1068046 - Posted: 18 Jan 2011, 22:44:25 UTC

Matt, Todd ;-), thanks for the news!

ID: 1068046 · Report as offensive
Claggy
Volunteer tester

Joined: 5 Jul 99
Posts: 4654
Credit: 47,537,079
RAC: 4
United Kingdom
Message 1068051 - Posted: 18 Jan 2011, 23:11:41 UTC - in response to Message 1068032.  

Thanks for the update Matt, good luck with Bruno,

Claggy
ID: 1068051 · Report as offensive
Profile Gary Charpentier
Volunteer tester
Joined: 25 Dec 00
Posts: 31015
Credit: 53,134,872
RAC: 32
United States
Message 1068056 - Posted: 18 Jan 2011, 23:29:44 UTC

Good luck on Bruno and thanks for keeping us informed.

ID: 1068056 · Report as offensive
Profile Dimly Lit Lightbulb 😀
Volunteer tester
Joined: 30 Aug 08
Posts: 15399
Credit: 7,423,413
RAC: 1
United Kingdom
Message 1068058 - Posted: 18 Jan 2011, 23:43:58 UTC

That's some trouble, good luck, hope you can get Bruno going again.
ID: 1068058 · Report as offensive
Profile Matt Lebofsky
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 1 Mar 99
Posts: 1444
Credit: 957,058
RAC: 0
United States
Message 1068065 - Posted: 19 Jan 2011, 0:01:52 UTC

By the way, no dice on bruno. I tried a bunch of things, it still won't boot - even though I can see all the drives/data when I boot from CD. And the RAID card refuses to NOT tag the RAID as degraded.

We're probably going to be down for at least another day or two. I'll post something to the front page shortly.

- Matt
-- BOINC/SETI@home network/web/science/development person
-- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude
ID: 1068065 · Report as offensive
Profile perryjay
Volunteer tester
Joined: 20 Aug 02
Posts: 3377
Credit: 20,676,751
RAC: 0
United States
Message 1068071 - Posted: 19 Jan 2011, 0:27:07 UTC - in response to Message 1068065.  

Well. I guess that means Hello Synergy!! If it's half as good as it appears it should be able to handle the new assignment with no problems. After checking my tasks on my BOINC Manager I see I can easily handle a couple of days downtime.

Even if I don't make it until you get back up I have some new toys coming in that I can install during the down time. I'll be here ready to go whenever you get back.


PROUD MEMBER OF Team Starfire World BOINC
ID: 1068071 · Report as offensive
Saaby900T

Joined: 24 Dec 10
Posts: 76
Credit: 4,971,171
RAC: 0
United States
Message 1068081 - Posted: 19 Jan 2011, 1:15:34 UTC

Can we Still get new tasks?
ID: 1068081 · Report as offensive
Profile Gary Charpentier
Volunteer tester
Joined: 25 Dec 00
Posts: 31015
Credit: 53,134,872
RAC: 32
United States
Message 1068095 - Posted: 19 Jan 2011, 2:14:06 UTC
Last modified: 19 Jan 2011, 2:15:10 UTC

Time for the spring cleaning. Reseat all the connectors from head to tail. I know that will take a couple hours, especially with the connectors in the drives. Don't forget the ground strap.

If that fails, I guess Synergy is Bruno.

In any case it sounds like you have the data and that is the important thing.


Or is Bruno jealous of Synergy?
ID: 1068095 · Report as offensive
Profile Bill Walker
Joined: 4 Sep 99
Posts: 3868
Credit: 2,697,267
RAC: 0
Canada
Message 1068100 - Posted: 19 Jan 2011, 2:24:53 UTC

Alas, poor Bruno! I knew him, Horatio, a server of infinite jest, of most excellent fancy. He hath bore me on his back a thousand times, and now he done broke.

Hey Matt, any hints on how you are picking your thread titles these days?


ID: 1068100 · Report as offensive
Profile soft^spirit
Joined: 18 May 99
Posts: 6497
Credit: 34,134,168
RAC: 0
United States
Message 1068101 - Posted: 19 Jan 2011, 2:26:41 UTC

so...

Bruno just...

....

....Froze up? *ducks and runs*
Janice
ID: 1068101 · Report as offensive
Swibby Bear

Joined: 1 Aug 01
Posts: 246
Credit: 7,945,093
RAC: 0
United States
Message 1068110 - Posted: 19 Jan 2011, 3:01:54 UTC

If I recall, the current Bruno is the former Bambi, and the old Bruno was ditched when Oscar and Carolyn came in. Our heads are in a swirl!

Good luck, Matt
ID: 1068110 · Report as offensive
Profile RottenMutt
Joined: 15 Mar 01
Posts: 1011
Credit: 230,314,058
RAC: 0
United States
Message 1068115 - Posted: 19 Jan 2011, 3:22:10 UTC - in response to Message 1068065.  
Last modified: 19 Jan 2011, 3:23:46 UTC

By the way, no dice on bruno. I tried a bunch of things, it still won't boot - even though I can see all the drives/data when I boot from CD. And the RAID card refuses to NOT tag the RAID as degraded.
...
- Matt


Is the boot loader working?
How far into boot do you get?
ID: 1068115 · Report as offensive
Profile zoom3+1=4
Volunteer tester
Joined: 30 Nov 03
Posts: 66362
Credit: 55,293,173
RAC: 49
United States
Message 1068123 - Posted: 19 Jan 2011, 3:59:51 UTC

Matt, Chin Up You'll figure out what Gremlin is bugging Ya and squash It flat in no time once Ya do figure It all out. Me I just have a learning curve with Vista Business x64 and with RealTemp 3.58, It isn't pretty, As I had to start It up with Task Scheduler instead of the Startup folder, It should start when the PC does now, I may upgrade to 7 Pro sooner than I thought, At least I have the DVD already. Good Luck.
Savoir-Faire is everywhere!
The T1 Trust, T1 Class 4-4-4-4 #5550, America's First HST

ID: 1068123 · Report as offensive
-BeNt-
Joined: 17 Oct 99
Posts: 1234
Credit: 10,116,112
RAC: 0
United States
Message 1068131 - Posted: 19 Jan 2011, 4:34:00 UTC

Thanks for the heads up Matt, explains why all my uploads have been backed off 3 hours. Kind of figured it was just an extended outage on the upload server. Oh well thank goodness you have another server to throw in there.
Traveling through space at ~67,000mph!
ID: 1068131 · Report as offensive
Harri Liljeroos
Joined: 29 May 99
Posts: 4868
Credit: 85,281,665
RAC: 126
Finland
Message 1068183 - Posted: 19 Jan 2011, 7:37:04 UTC - in response to Message 1068100.  
Last modified: 19 Jan 2011, 7:37:29 UTC

Hey Matt, any hints on how you are picking your thread titles these days?


The titles seem to be names of Kate Bush songs.
ID: 1068183 · Report as offensive


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.