Power (May 22 2012)
Message boards : Technical News : Power (May 22 2012)

During the normal weekly outage last week I took the opportunity to convert georgem not only into the workunit storage server, but also into a single workunit download server (as opposed to using vader and anakin, which mount georgem's disks over the network). This was a bust. I believe I had apache cranked way too high, and the kernel crashed. Before it went down for the count completely, there were some NFS inconsistencies that caused corrupt workunits to be generated on georgem. This only happened for a short time, and we didn't notice until they had already been sent out.
ID: 1235167

Thanks for all your efforts getting everything back up.
ID: 1235172

Thanks for the update Matt. I hope the server switching is figured out and works the next time you try.
ID: 1235173

You can go back to feeling heroic, Matt.
ID: 1235239

Thanks for everything, Matt.
ID: 1235252

I don't suppose the electricians pulled a gigabit Ethernet cable up the conduit for Seti, since they were mucking around there anyway? I would have been happy to run over to Fry's to get the wire for them! (Well, maybe not, since I live in Pennsylvania.)
ID: 1235299

Thanks not only for briefing us on the power outage, but also for explaining the batch of bad WUs, which was completely separate from and unrelated to the outage.
ID: 1235425

One thing you might want to consider is protecting the servers against the situation where the power goes out and then comes back on momentarily, which is what usually does the damage. In laboratories where this has occurred in the past, I have recommended installing a self-latching relay on the main power line to the instruments. The relay has a Set and a Reset switch; any electrician can make one up, and it only requires a contactor and a couple of switches. When the power goes off, the contactor drops out, and if the power comes back on, the contactor stays dropped out. This way a person has to press the switch to energise the contactor and put power back onto the circuits, so a person decides when it is appropriate to reconnect power to the instruments/servers. It takes all the worry out of the situation if a power failure occurs when the lab is not occupied. I had another customer take it one step further and include a temperature control, so if the aircon fails the power disconnects before everything overheats. That's my 2 cents' worth; keep up the good work. Cheers from Aus
ID: 1235668
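
The latching behaviour described above can be modelled as a tiny state machine. This is a minimal Python sketch, not a wiring diagram: the class name `LatchingContactor`, its `tick`/`press_set`/`press_reset` methods and the 35 °C cutout threshold are all hypothetical. In a typical physical start/stop circuit the latch is an auxiliary contact on the contactor wired in parallel with the Set button, so once mains drops the coil has nothing left to hold it in.

```python
from dataclasses import dataclass


@dataclass
class LatchingContactor:
    """Hypothetical model of a self-latching contactor: once mains is lost,
    it drops out and stays out until someone presses Set."""

    # Illustrative over-temperature limit for the optional aircon-failure cutout.
    max_temp_c: float = 35.0
    energised: bool = False

    def press_set(self, mains_present: bool, temp_c: float) -> None:
        # A person decides when to restore power; the coil can only pull in
        # if mains is present and the room is below the temperature limit.
        if mains_present and temp_c <= self.max_temp_c:
            self.energised = True

    def press_reset(self) -> None:
        # The manual Reset switch opens the latch.
        self.energised = False

    def tick(self, mains_present: bool, temp_c: float) -> bool:
        # Losing mains (or overheating) drops the contactor out. Nothing here
        # re-latches it when mains returns; only press_set() can do that.
        if not mains_present or temp_c > self.max_temp_c:
            self.energised = False
        return self.energised
```

Feeding it an outage followed by a brief return of mains leaves `energised` at `False` until `press_set` is called again, which is exactly the "someone has to press the switch" behaviour the post describes.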

> … I have recommended installing a Self Latching Relay on the main power line to the instruments. …

Sounds like a good idea, but you don't want a short interruption that the UPS covers to mean someone has to drive in from 30 miles away to press a reset button. What you likely want is to put the sensing on the UPS side, so the servers don't go down for short drops, since the UPS should handle those. Then add another sense that automatically reconnects when the UPS batteries reach, say, 90% charge, indicating that the power has been on for a while and is hopefully stable. Of course, connecting the UPS to the systems so they can do a graceful shutdown before the battery runs out is another necessary step. Unfortunately, I don't think you can get a remote power-on when juice is available in that situation. That's something someone needs to work on for the kernel/BIOS.
ID: 1235675
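
The UPS-side scheme in the reply above is essentially a controller with hysteresis: ride through short drops on battery, shut down gracefully when the battery gets low, and reconnect only after mains is back and the battery has recharged to about 90%. The following Python sketch assumes a hypothetical `UpsPolicy` class; the 90% reconnect figure comes from the post, while the 30% shutdown threshold and the single-step shutdown are illustrative assumptions. It only decides when to close the feed; whether the servers then boot by themselves is the separate kernel/BIOS question raised in the post.

```python
from enum import Enum, auto


class Load(Enum):
    ON = auto()
    SHUTTING_DOWN = auto()
    OFF = auto()


class UpsPolicy:
    """Hypothetical controller for the UPS-side sensing scheme."""

    def __init__(self, shutdown_at_pct: float = 30.0,
                 reconnect_at_pct: float = 90.0) -> None:
        # Assumed thresholds; the gap between them provides the hysteresis.
        if shutdown_at_pct >= reconnect_at_pct:
            raise ValueError("reconnect threshold must be above shutdown threshold")
        self.shutdown_at_pct = shutdown_at_pct
        self.reconnect_at_pct = reconnect_at_pct
        self.state = Load.ON

    def update(self, mains_present: bool, battery_pct: float) -> Load:
        if self.state is Load.ON:
            # Short drops are ridden through on battery; only a low battery
            # during an outage triggers the graceful shutdown.
            if not mains_present and battery_pct <= self.shutdown_at_pct:
                self.state = Load.SHUTTING_DOWN
        elif self.state is Load.SHUTTING_DOWN:
            # Assumption: the graceful shutdown finishes within one update.
            self.state = Load.OFF
        elif self.state is Load.OFF:
            # Reconnect only once mains is back AND the battery has recharged,
            # which suggests mains has been stable for a while.
            if mains_present and battery_pct >= self.reconnect_at_pct:
                self.state = Load.ON
        return self.state
```

Requiring the reconnect threshold to sit well above the shutdown threshold is what stops the load from flapping on and off while mains is still unstable.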

> … I have recommended installing a Self Latching Relay on the main power line to the instruments. …

There are lots of USB-controlled switches available these days: log in and use the switch remotely.

Grant, Darwin NT.
ID: 1235710

It was the comment
ID: 1236252

For the various UPSes:
ID: 1236515

^ That was along the same lines as my suggestion over in the news thread regarding the power failure. Someone asked about UPSes and I explained it fairly well, I believe (seen here).
ID: 1236825

Ahhhh... But does Matt ever get around to reading the replies?...
ID: 1236837

As far as a hardwired latching power lockout upon fail......
ID: 1236839

Having trouble with uploads, regardless of what the "system Status" says about the upload server being "online": all my uploads are quitting after about 40 seconds with an "HTTP error", both production and Beta...
ID: 1237389

Yes, I noticed it about 4 hours ago; I have several machines that are not uploading. I'm guessing it's a problem with the upload servers, and of course, since it's the weekend, nobody has noticed yet. lol Figure I'll just keep crunching and wait.
ID: 1237398

> Yes, I noticed it about 4 hours ago, I have several machines that are not uploading. …

This is being discussed over in Number Crunching.
ID: 1237450

It's been kicked. Thanks to whoever did this on a weekend.
ID: 1237456

> Ahhhh... But does Matt ever get around to reading the replies?...

Knowing Matt's total dedication to this project over many years, I am quite sure that he does, and I am also sure that he is grateful for any suggestions that are made.
ID: 1237498

Copyright © 2013 University of California