Major Power Outage at SSL
Author | Message |
---|---|
zoom3+1=4 Joined: 30 Nov 03 Posts: 66332 Credit: 55,293,173 RAC: 49 |
Great job getting everything back up, guys! So say We all? Savoir-Faire is everywhere! The T1 Trust, T1 Class 4-4-4-4 #5550, America's First HST |
Claggy Joined: 5 Jul 99 Posts: 4654 Credit: 47,537,079 RAC: 4 |
"Am I reading this correctly that when the power went out the servers all..." No, they have UPSes, so the servers should shut down gracefully. Claggy |
Jeffrey Petro Joined: 24 Apr 12 Posts: 2 Credit: 41,248 RAC: 0 |
Claggy, that's fair... I guess you and I just read certain things differently, like when I read, for example: "A mixture of 'all hands on deck' and incredible luck that nothing really got corrupted/fried when the power suddenly disappeared. There are some RAID resyncs happening at the moment, but looking good thus far..." I do not get a warm fuzzy feeling that the servers shut down "gracefully"... lol |
tullio Joined: 9 Apr 04 Posts: 8797 Credit: 2,930,782 RAC: 1 |
A UPS should allow a system to make a regular shutdown. I had a power failure just now, and my system went down ungracefully, UPS notwithstanding. So when power came back I restarted only the router, not the system, since I know that power outages tend to repeat here. And indeed, when the power failed again, the router stayed up. Evidently the battery in my UPS can carry only the router, not my system. Anyway, I restarted the system, which ran a full filesystem check (Linux), and it is working again. Tullio |
Cosmic_Ocean Joined: 23 Dec 00 Posts: 3027 Credit: 13,516,867 RAC: 13 |
They do have UPSes and remote-controlled power switches (or those are part of a UPS or two; this lets them SSH in and power-cycle a machine). I remember reading a few years ago that at one point all the UPSes were basically just power strips, as the battery capacity in them was basically zero. Even with a large 3000VA UPS with 4-6 servers on it, some of which have 30+ HDDs, you're talking 5 minutes on brand-new batteries for a graceful shutdown, and you can't just tell them all to shut down at once. Some have to go down before others. Linux laptop: record uptime: 1511d 20h 19m (ended due to the power brick giving up) |
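The point about run time and shutdown order can be made concrete with a rough energy budget. This is a minimal Python sketch, not anything the SSL admins are known to use: it assumes the servers are shut down strictly one after another, that each keeps drawing a constant wattage until its own shutdown finishes, and that the UPS inverter is about 90% efficient. The host names, wattages, shutdown times and battery size are all made up for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Server:
    name: str          # hypothetical host name
    watts: float       # assumed constant draw until this host powers off
    shutdown_s: float  # assumed seconds from "shut down" to power-off


def staged_shutdown(battery_wh: float, servers: list[Server],
                    efficiency: float = 0.9) -> tuple[float, float, bool]:
    """Shut servers down one at a time, in list order.

    While server i shuts down, it and every server after it in the list
    are still drawing power; the ones before it are already off, so the
    load on the UPS steps down as the sequence goes on.

    Returns (total seconds, Wh delivered to the servers, fits_in_battery).
    """
    usable_wh = battery_wh * efficiency  # assumed inverter losses
    elapsed_s = 0.0
    used_wh = 0.0
    for i, server in enumerate(servers):
        load_w = sum(s.watts for s in servers[i:])      # hosts still running
        used_wh += load_w * server.shutdown_s / 3600.0  # W*s -> Wh
        elapsed_s += server.shutdown_s
    return elapsed_s, used_wh, used_wh <= usable_wh


if __name__ == "__main__":
    # Made-up numbers; "storage" goes last because the others depend on it.
    servers = [Server("web", 300, 60), Server("db", 400, 90),
               Server("storage", 500, 120)]
    print(staged_shutdown(200, servers))
    print(staged_shutdown(200, list(reversed(servers))))
```

With these invented figures, shutting down web, db, storage in that order takes 270 s and delivers about 59.2 Wh, while the reverse order needs 62.5 Wh, because the 120-second stage then runs while all three hosts are still on. In practice the order is forced by dependencies, so the budget has to be checked for whatever order those dictate, and real batteries deliver less than their nameplate energy at high load.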
W5DMG - Dave Joined: 19 May 99 Posts: 155 Credit: 33,162,251 RAC: 0 |
'Tis nice to see this back online; thanks for all the hard work, guys. Also thanks to all at Lunatics for keeping us informed. My uploads are working, but no reporting as of yet. Still, I know everything will be back to normal shortly. |
Terry Byatt (R.T.Fishall) Joined: 4 Jan 00 Posts: 19 Credit: 2,262,059 RAC: 2 |
Thanks, guys, for all the work in getting it back up and running again. However, it is a shame the Berkeley Uni website could not find space in its news to say what had happened to seti@home! That does not bode well for interstellar contact if we can't get the communications right here, does it? |
John Clark Joined: 29 Sep 99 Posts: 16515 Credit: 4,418,829 RAC: 0 |
Good to see you all back online, and so far nothing corrupted. As soon as I get home next week, I hope, I can get my rigs rebooted. It's good to be back amongst friends and colleagues |
Dimly Lit Lightbulb 😀 Joined: 30 Aug 08 Posts: 15399 Credit: 7,423,413 RAC: 1 |
To get it all up and back online from scratch in two hours was a major feat of teamwork. I take my hat off to you and the lads; well done! When I got the GPUUG newsletter about the powerline short, my first thought was: uh-oh. And from power restored to Seti being online in two hours? Wow. I'll be keeping my fingers crossed for the resyncs. Well, until my fingers hurt, at least :) Member of the People Encouraging Niceness In Society club. |
Dave Joined: 29 Mar 02 Posts: 778 Credit: 25,001,396 RAC: 0 |
"Thanks, guys, for all the work in getting it back up and running again. However, it is a shame the Berkeley Uni website could not find space in its news to say what had happened to seti@home!" They did find space; it was here: http://ucbsystems.org/category/active/unscheduled-outage/ |
Jimmy Gondek Joined: 1 Oct 06 Posts: 20 Credit: 715,874 RAC: 0 |
...being retired from telecom, I know you folks had your hands full getting things back up and running! Kudos to everyone for a fine, fine job!... :) |
TPCBF Joined: 18 May 99 Posts: 54 Credit: 4,594,980 RAC: 0 |
Well, after some initial problems uploading finished WUs (OK, only 5 of them), they are now gone and reported too, just sitting in PV jail, as usual. Got one new one as well, so from here it looks as if things are back to normal... ;-) Ralf |
Al Joined: 3 Apr 99 Posts: 1682 Credit: 477,343,364 RAC: 482 |
"They do have UPSes and remote-controlled power switches (or those are part of a UPS or two; this lets them SSH in and power-cycle a machine). I remember reading a few years ago that at one point all the UPSes were basically just power strips, as the battery capacity in them was basically zero." I know that this was kind of a freak occurrence, but it does appear to show a weakness in the systems. Would it possibly make sense to direct some of our fundraising contributions towards an even more robust UPS setup? Most UPSes of this size let you daisy-chain multiple batteries together, giving more time to shut things down before they run out of juice. This makes sense to me; what do you guys there in the thick of it think? |
Grant (SSSF) Joined: 19 Aug 99 Posts: 13854 Credit: 208,696,464 RAC: 304 |
Attn Matt: There are thousands of WUs marked "Too many errors (may have bug) WU cancelled" from late on the 15th and early on the 16th of May as a result of download errors, but many of those that haven't been cancelled have been downloaded OK today. We suspect it's a result of the power failure: the Scheduler and download servers were still up, but the WU storage wasn't (or at least wasn't accessible). Grant Darwin NT |
Grant (SSSF) Joined: 19 Aug 99 Posts: 13854 Credit: 208,696,464 RAC: 304 |
"Most UPSes of this size let you daisy-chain multiple batteries together, giving more time to shut things down before they run out of juice. This makes sense to me; what do you guys there in the thick of it think?" It would be good if they had enough UPS capacity to keep all systems up (including routers etc.) and enable a controlled shutdown. It would probably require the purchase of more UPSes, as well as batteries. In my case, I just replaced my 7Ah UPS batteries with a couple of cheap car batteries. Run time at full load went from a few minutes to about 6 hours. Grant Darwin NT |
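Part of why a bigger battery can gain more run time than its capacity ratio alone suggests is commonly described by Peukert's law: lead-acid batteries deliver less of their rated capacity at high discharge currents, so a small battery run hard loses proportionally more. The sketch below uses the usual form t = H * (C / (I * H))^k, where C is the rated capacity in Ah at the H-hour rate, I the load current and k the Peukert exponent. The 20-hour rating, k = 1.2, the 20 A load and the 60 Ah car battery are assumptions for illustration, not Grant's actual figures, and the law is only an approximation.

```python
def peukert_runtime_h(capacity_ah: float, load_a: float,
                      rated_hours: float = 20.0, k: float = 1.2) -> float:
    """Estimated runtime in hours from Peukert's law.

    capacity_ah is the rated capacity at the rated_hours discharge rate;
    k = 1 would be an ideal battery whose runtime is simply capacity / load.
    """
    return rated_hours * (capacity_ah / (load_a * rated_hours)) ** k


if __name__ == "__main__":
    load = 20.0                             # amps, hypothetical
    small = peukert_runtime_h(7.0, load)    # 7 Ah UPS battery
    large = peukert_runtime_h(60.0, load)   # assumed 60 Ah car battery
    print(f"7 Ah: {small * 60:.1f} min, 60 Ah: {large * 60:.1f} min")
    print(f"capacity ratio {60 / 7:.1f}x, runtime ratio {large / small:.1f}x")
```

Under these assumptions the 60 Ah battery has about 8.6 times the capacity but gives roughly 13 times the run time (around 9 minutes versus about 2 hours). The model does not try to reproduce Grant's numbers; those also depend on how much capacity the car batteries really had and what the actual load was. Starting batteries are also not built for repeated deep discharge, so real results vary.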
mg_man1 Joined: 3 Apr 99 Posts: 5 Credit: 41,714,879 RAC: 0 |
I'm still trying to get all the results uploaded to you guys, and I need new work as well, as my PC has finished everything that was on it. |
Al Joined: 3 Apr 99 Posts: 1682 Credit: 477,343,364 RAC: 482 |
Lol, kind of bush, but very effective! :D I have a couple of APC 1100 UPSes that I have been looking at replacing the batteries in, at a hundred-plus a crack, and you could get a decent deep-cycle battery for that kind of ching. If it weren't in my house, like in my workshop or something, I might just consider it. But back on topic, I doubt they'd do that, even though it's quite cost-effective. Maybe Matt will chime in and let us know if he feels the upgrade is a good use of funds at this time. |
Cosmic_Ocean Joined: 23 Dec 00 Posts: 3027 Credit: 13,516,867 RAC: 13 |
[very long UPS-related thoughts] You can go with larger-Ah batteries, but more modern units have a limit to how much they'll charge, and for how long. For example, the APC 1400 that I'm using had two 6Ah batteries in it, and I replaced them with 9Ah batteries left over from ordering the wrong kit for two Tripp-Lite 1400s a few years ago: 50% more run time. I have heard of people using car batteries, and it does work most of the time (some of those batteries are 60-150Ah), but the problem is that unless they are sealed/maintenance-free, you are very, very strongly advised not to use them indoors, due to the acid fumes during discharge and charge cycles. Also, you can't keep rack-mount stuff neat and tidy if you are using batteries that won't fit within the chassis of the particular unit. Regarding being able to shut down remotely in that situation: I don't think they could have anyway. Network connectivity went down as well, not just the servers themselves, so there was no way to tell the servers to shut down unless someone was already in the lab when it happened. However, most OSes can connect to a UPS and monitor its condition. For example, Windows 7 sees mine just fine, and I have it set to shut down when the battery reaches 50%. That gives me about 10 minutes from the power going out to the OS doing a graceful shutdown. For servers, I would set them to begin the shutdown routine around 30 seconds after AC is lost. The only problem is that if you have 5 servers hooked up to one unit, they can't all plug into the status port. You could, though, hook up the server that is last to go down and set up a script on it that, when AC is lost, will ssh to the other machines and tell them 'init 0' plus whatever else you have to do ('umount /all/remote/mounts', etc.). Even then, there are still two things to find out:
How long does that particular unit last when all the servers are on and consuming power, and how long does it take them to do a graceful shutdown? As you start shutting them down, the run time will increase, but then you need to find out by how much. [/very long UPS-related thoughts] Linux laptop: record uptime: 1511d 20h 19m (ended due to the power brick giving up) |
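The "one UPS-connected host tells the others to halt" idea can be sketched in a few lines of Python. This is a hedged illustration only: it assumes a UPS monitoring daemon (NUT and apcupsd are common examples) on the last-to-go host runs this script once AC has been lost, that the script runs as a user with key-based SSH access to the other machines, and that the host names and the /mnt/remote mount point are hypothetical placeholders. It defaults to a dry run that only prints the ssh commands.

```python
from __future__ import annotations

import subprocess

# Hypothetical hosts, in the order they should be shut down.
OTHER_HOSTS = ["web1", "web2", "db1"]

# Steps run on each remote host; /mnt/remote is a placeholder mount point.
# Joined with "; " rather than "&&" so "init 0" still runs if umount fails.
REMOTE_STEPS = ["umount /mnt/remote", "init 0"]


def build_commands(hosts: list[str], steps: list[str],
                   user: str = "root") -> list[list[str]]:
    """Return one ssh argument list per host, in shutdown order."""
    remote = "; ".join(steps)
    return [["ssh", "-o", "ConnectTimeout=5", f"{user}@{host}", remote]
            for host in hosts]


def shut_down_others(hosts: list[str], steps: list[str],
                     dry_run: bool = True) -> list[list[str]]:
    """Send the shutdown steps to each host in turn; only print if dry_run."""
    commands = build_commands(hosts, steps)
    for argv in commands:
        if dry_run:
            print(" ".join(argv))
            continue
        try:
            # The connection usually drops once the host halts, so a
            # non-zero exit status is expected and not treated as fatal.
            subprocess.run(argv, timeout=60, check=False)
        except subprocess.TimeoutExpired:
            print(f"warning: no response from {argv[-2]}, moving on")
    return commands


if __name__ == "__main__":
    shut_down_others(OTHER_HOSTS, REMOTE_STEPS, dry_run=True)
```

Note that ssh returns once the remote `init 0` has been accepted, not when that host is fully powered off, so if strict ordering matters you would also need to wait (for example, until the host stops answering) before moving on. The roughly 30-second grace period after AC loss and the UPS host's own final shutdown are left to the monitoring daemon and are not shown here.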
soft^spirit Joined: 18 May 99 Posts: 6497 Credit: 34,134,168 RAC: 0 |
The basic premise of a commercial UPS system is to hold the equipment online long enough for backup generators to come online. They are surge/spike/brownout resistant as well. In smaller or home use, "long enough to turn things off" is the primary selling point. Side note: I see the AP data is showing as pretty much nonexistent. Is this a residual problem, or just a side effect of Jocelyn apparently taking some time off? Janice |
kittyman Joined: 9 Jul 00 Posts: 51478 Credit: 1,018,363,574 RAC: 1,004 |
I think it's because the AP data shown is still for the little bit of v505 still in the wild. It does not reflect the v6 info yet. "Time is simply the mechanism that keeps everything from happening all at once." |
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.