Happy Lupercalia! (Feb 14 2011) |
Message boards : Technical News : Happy Lupercalia! (Feb 14 2011)
| Author | Message |
|---|---|
|
Slow, steady progress... We're hoping to have everything copied from gowron onto thumper by tomorrow. Yeah, I know it's going slowly, but there are lots of bottlenecks (degraded RAID, NFS, tons of small files as opposed to a few big ones). After the usual outage we might actually have thumper ready to be the temporary workunit storage server so we can get back to business while doing the necessary upgrades on gowron (which may take as much as a week, unobtrusively running in the background). | |
| ID: 1077328 · | |
|
Thanks for the update there Matt. | |
| ID: 1077334 · | |
It would be nice to see a list of them and what they do. For security reasons, I tend to only name and define the systems that are already public facing, or otherwise already known. - Matt ____________ -- BOINC/SETI@home network/web/science/development person -- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude | |
| ID: 1077349 · | |
|
Thank You For the Update!!!!!! | |
| ID: 1077351 · | |
|
Hi Matt, and THANKS for the update. Glad to hear that things are looking up, and that things might be up and running by tomorrow. As usual, I'll be here waiting. I might hold off a day or two until the "TRAFFIC" dies down a little bit, though. I have a feeling that the servers will be swamped for a while. | |
| ID: 1077354 · | |
Thank You For the Update!!!!!! GOOD IDEA ! Later on, when and IF you have time, maybe you could tell us what all is needed, and the cost of what is needed. Sure wouldn't hurt to let us know about it. | |
| ID: 1077356 · | |
If you just count the unix-based machines, there are currently 26 systems all told. Combining all the stuff inside, we have roughly 100 CPUs, 500GB RAM, and 150 TB raw storage. There are also several appliances (routers, switches, UPSes, kvms, remote controlled power strips, etc. etc.). Wow, that's a fair bit of equipment. Thanks for all the work we DON'T hear about that you do. | |
| ID: 1077363 · | |
|
Matt, thanks for the news! | |
| ID: 1077405 · | |
|
Maybe you should implement the slow ramp-up that was the rule back during the three-day outage era... | |
| ID: 1077406 · | |
|
Just an FYI. I was up in Oakland during the last big one during the World Series. Anyway, all our equipment went down even though we had UPS's on everything. What caused it? Seems that the batteries in the UPS's are not eternal and have to be replaced like every 3 - 5 years! Could there be any relation to what is causing your issues? | |
| ID: 1077407 · | |
|
Thank you for all the hard work. I still have some W.U.'s left to do, so will wait. | |
| ID: 1077411 · | |
Usually in these threads I'm griping about public facing servers, or ones causing the BOINC back end to jam up for one reason or another. I rarely mention the mundane, day-to-day, garden variety IT stuff. Well, hopefully, Matt, sometime in the immediate future we can hear more about your mundane, day-to-day operations and less about the major issues. Growing pains are difficult, and mixing them with older hardware like you are dealing with can be... less than perfect, to put it mildly. Best of luck, gentlemen, and hopefully the IT gods will bless you and the gremlins let you stomp them for a change. ____________ Traveling through space at ~67,000mph! | |
| ID: 1077436 · | |
|
Thanks for the update Matt. Trust you had a good gig the other night to re-align the neurons. Always good and appreciative thoughts toward you and your colleagues. Cheers mate. | |
| ID: 1077450 · | |
Just an FYI. I was up in Oakland during the last big one during the World Series. Anyway, all our equipment went down even though we had UPS's on everything. What caused it? Seems that the batteries in the UPS's are not eternal and have to be replaced like every 3 - 5 years! Could there be any relation to what is causing your issues? Are you talking about the Earthquake Series in '89 or last year? The routinely spaced hits at the same wall-clock time two weeks apart that Matt is talking about don't seem to be battery-related to me... (assuming, of course, that Matt has checked the possibility that someone [who comes in alternate weeks...] is routinely turning off the circuit that the UPS in question is plugged into...) ____________ . | |
| ID: 1077758 · | |
Just an FYI. I was up in Oakland during the last big one during the World Series. Anyway, all our equipment went down even though we had UPS's on everything. What caused it? Seems that the batteries in the UPS's are not eternal and have to be replaced like every 3 - 5 years! Could there be any relation to what is causing your issues? The cleaning crew unplugging something to crank up their vacuum cleaners? ____________ BOINC WIKI | |
| ID: 1077769 · | |
The routinely spaced hits at the same wall-clock time two weeks apart that Matt is talking about don't seem to be battery-related to me... (assuming, of course, that Matt has checked the possibility that someone [who comes in alternate weeks...] is routinely turning off the circuit that the UPS in question is plugged into...) In the school where I work it used to be standard procedure to turn off the power to all the computers in the computer lab every night, via 4 master switches in the lab director's office (now they keep them on 24/7). One night, I saw the custodian turn on the power so he could plug in the vacuum... which of course turned on all the computers. When he was done, he turned them all off again without doing a proper shutdown of Windows. At the time, they were, probably, Pentium 3s running Windows NT. It didn't seem to do them any harm, though (except maybe an unquantifiable shortening of their lives, but they were retired long before that became an issue). David ____________ David Sitting on my butt while others boldly go, Waiting for a message from a small furry creature from Alpha Centauri. | |
| ID: 1077921 · | |
Regularly, at the same time, every other weekend? Doesn't seem likely... The SSL doesn't have carpeted floors, so substitute "floor polisher(s)" for "vacuum cleaner"... ;-) ____________ . | |
| ID: 1077952 · | |
|
How tightly timed are the failures? | |
| ID: 1078087 · | |
|
I think the timing of the reboots is kinda suspicious, and those kitties of Kittieman look kinda suspicious, hmmm... It's the cats: they must be aliens in disguise, and Kittieman is the overlord. They are holding the data back cuz they know the next wow signal is ready to be crunched.... | |
| ID: 1078300 · | |
| Copyright © 2013 University of California |