Get Out of My House (Jan 18 2011)
Author | Message |
---|---|
Matt Lebofsky Joined: 1 Mar 99 Posts: 1444 Credit: 957,058 RAC: 0 |
Nothing like coming back from a long holiday weekend and having one of your main production servers croak as soon as you arrive. It's a sunny day outside and I was stuck wearing my fleece jacket and fingerless gloves inside a well air-conditioned server closet. So what happened? Not sure exactly, but bruno (the upload server, as well as the main boincadm administrative server) was all hung up as soon as we started the normal Tuesday outage. I had to reboot it, and that was that - it wouldn't come up properly again. It seems to be a multiple-part problem. There was a disk failure, and the 3ware card in this system has always given us trouble. What kind of trouble? Well, if you reboot the system (without a full power cycle) random drives go missing. That's kind of a problem, no? I don't think this is a single broken card - a labmate has similar problems with the same model in his system (I forget the model #, but it's 24-channel). Anyway, the big RAID10 holding all the results was tagged as degraded and is rebuilding now. That's fine, except the OS (which is on separate partitions and not under the jurisdiction of the 3ware card) isn't booting either. Jeez! The good news is I can boot off a Fedora live CD and see both the root and upload storage drives, so there's no data loss. It just won't boot! The other good news is that, if we need it, we have a backup system already: synergy! It might be getting pulled into prime time sooner than expected. It doesn't have nearly as many disk spindles as bruno, but this might not be an issue - there's still plenty of disk space on it, and a lot of memory for potential file system caching. It's still undecided whether we're going to make synergy the new bruno, but I'm at least copying everything there now just to be safe. I might still be able to get bruno up this afternoon, but if not, it looks like we're down for the evening (it'll take that long to copy everything over to synergy).
- Matt -- BOINC/SETI@home network/web/science/development person -- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude |
perryjay Joined: 20 Aug 02 Posts: 3377 Credit: 20,676,751 RAC: 0 |
Ain't it grand knowing you have a spare, brand new computer just waiting to show what it can do? :-) Good luck with Bruno but if you can't get it up we'll see you tomorrow! PROUD MEMBER OF Team Starfire World BOINC |
QSilver Joined: 26 May 99 Posts: 232 Credit: 6,452,764 RAC: 0 |
Thanks for the update, Matt. And good luck! |
Jim_S Joined: 23 Feb 00 Posts: 4705 Credit: 64,560,357 RAC: 31 |
Thanks for the update Matt. And I'll second that good luck!!! I Desire Peace and Justice, Jim Scott (Mod-Ret.) |
Todd Hebert Joined: 16 Jun 00 Posts: 648 Credit: 228,292,957 RAC: 0 |
Never been a big fan of the 3Ware cards of days past - they were bought by LSI to capture market share and are partnered with Intel, so there is plenty of development budget behind them now. It is good that there is a "spare" around that would be up to the task of taking over where bruno left off. I sense another fundraiser in our future - I still have access to another barebones server from Intel at a discounted price. Todd |
Dirk Sadowski Joined: 6 Apr 07 Posts: 7105 Credit: 147,663,825 RAC: 5 |
Matt, Todd ;-), thanks for the news! |
Claggy Joined: 5 Jul 99 Posts: 4654 Credit: 47,537,079 RAC: 4 |
Thanks for the update Matt, good luck with Bruno, Claggy |
Gary Charpentier Joined: 25 Dec 00 Posts: 30936 Credit: 53,134,872 RAC: 32 |
Good luck on Bruno and thanks for keeping us informed. |
Dimly Lit Lightbulb 😀 Joined: 30 Aug 08 Posts: 15399 Credit: 7,423,413 RAC: 1 |
That's some trouble, good luck, hope you can get Bruno going again. |
Matt Lebofsky Joined: 1 Mar 99 Posts: 1444 Credit: 957,058 RAC: 0 |
By the way, no dice on bruno. I tried a bunch of things, it still won't boot - even though I can see all the drives/data when I boot from CD. And the RAID card refuses to NOT tag the RAID as degraded. We're probably going to be down for at least another day or two. I'll post something to the front page shortly. - Matt -- BOINC/SETI@home network/web/science/development person -- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude |
perryjay Joined: 20 Aug 02 Posts: 3377 Credit: 20,676,751 RAC: 0 |
Well. I guess that means Hello Synergy!! If it's half as good as it appears it should be able to handle the new assignment with no problems. After checking my tasks on my BOINC Manager I see I can easily handle a couple of days downtime. Even if I don't make it until you get back up I have some new toys coming in that I can install during the down time. I'll be here ready to go whenever you get back. PROUD MEMBER OF Team Starfire World BOINC |
Saaby900T Joined: 24 Dec 10 Posts: 76 Credit: 4,971,171 RAC: 0 |
Can we still get new tasks? |
Gary Charpentier Joined: 25 Dec 00 Posts: 30936 Credit: 53,134,872 RAC: 32 |
Time for the spring cleaning. Reseat all the connectors from head to tail. I know that will take a couple of hours, especially with the connectors on the drives. Don't forget the ground strap. If that fails, I guess Synergy is Bruno. In any case it sounds like you have the data, and that is the important thing. Or is Bruno jealous of Synergy? |
Bill Walker Joined: 4 Sep 99 Posts: 3868 Credit: 2,697,267 RAC: 0 |
Alas, poor Bruno! I knew him, Horatio, a server of infinite jest, of most excellent fancy. He hath bore me on his back a thousand times, and now he done broke. Hey Matt, any hints on how you are picking your thread titles these days? |
soft^spirit Joined: 18 May 99 Posts: 6497 Credit: 34,134,168 RAC: 0 |
so... Bruno just... .... ....Froze up? *ducks and runs* Janice |
Swibby Bear Joined: 1 Aug 01 Posts: 246 Credit: 7,945,093 RAC: 0 |
If I recall, the current Bruno is the former Bambi, and the old Bruno was ditched when Oscar and Carolyn came in. Our heads are in a swirl! Good luck, Matt |
RottenMutt Joined: 15 Mar 01 Posts: 1011 Credit: 230,314,058 RAC: 0 |
Quote: "By the way, no dice on bruno. I tried a bunch of things, it still won't boot - even though I can see all the drives/data when I boot from CD. And the RAID card refuses to NOT tag the RAID as degraded." Is the boot loader working? How far into the boot do you get? |
zoom3+1=4 Joined: 30 Nov 03 Posts: 66219 Credit: 55,293,173 RAC: 49 |
Matt, Chin Up You'll figure out what Gremlin is bugging Ya and squash It flat in no time once Ya do figure It all out. Me I just have a learning curve with Vista Business x64 and with RealTemp 3.58, It isn't pretty, As I had to start It up with Task Scheduler instead of the Startup folder, It should start when the PC does now, I may upgrade to 7 Pro sooner than I thought, At least I have the DVD already. Good Luck. Savoir-Faire is everywhere! The T1 Trust, T1 Class 4-4-4-4 #5550, America's First HST |
-BeNt- Joined: 17 Oct 99 Posts: 1234 Credit: 10,116,112 RAC: 0 |
Thanks for the heads-up, Matt; that explains why all my uploads have been backed off 3 hours. Kind of figured it was just an extended outage on the upload server. Oh well, thank goodness you have another server to throw in there. Traveling through space at ~67,000mph! |
Harri Liljeroos Joined: 29 May 99 Posts: 4673 Credit: 85,281,665 RAC: 126 |
Quote: "Hey Matt, any hints on how you are picking your thread titles these days?" The titles seem to be names of Kate Bush songs. |
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.