Power - the Reunion Tour (Jun 11 2012)
Author | Message |
---|---|
Matt Lebofsky Joined: 1 Mar 99 Posts: 1444 Credit: 957,058 RAC: 0 |
Kind of a bumpy weekend. So we moved that database (which handles the seti.berkeley.edu website) from Dan's new but oddly crashy desktop to my new desktop. Then over the weekend MY new desktop started crashing at random. You'd think this is now clearly related to the database, but Dan's desktop continued to crash after moving the MySQL database off of it. And upon further inspection both systems sometimes crash before the OS is even loaded. So this looks like a hardware problem after all. Funny how both of these new systems are failing in the same manner. We think it has to do with the power outages from a couple weeks ago sending some jolts into these perhaps more sensitive systems. But speaking of outages, completely separate from those previous power issues, which have since been fixed, there was a brand new problem affecting just this building (and all the projects within it, including SETI@home/BOINC). This one was worse, starting in the middle of the night, and by the time anybody could do anything power had gone up and down several times, with some outlets delivering half power, etc. The repairs were much faster, and we were stable again around noon, but upon turning everything back on we found we had completely lost thinman, the main web server. Totally dead. However, quite luckily, we happened to have a spare old Frankenstein machine kicking around, and I was able to do a "brain transplant", i.e. swap the drives from thinman to this other machine. Now this other machine thinks it is thinman and is working quite well as a web server. Dodged a major bullet there. I also happened to have my old desktop nearby, so I'm using that as I diagnose the new crashy one. Not sure who is responsible for all these damages and lost time, but it definitely shouldn't be us. - Matt -- BOINC/SETI@home network/web/science/development person -- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude |
Claggy Joined: 5 Jul 99 Posts: 4654 Credit: 47,537,079 RAC: 4 |
Thanks for the update Matt, Claggy |
Gary Charpentier Joined: 25 Dec 00 Posts: 30987 Credit: 53,134,872 RAC: 32 |
Thanks for the update, and let us know if you need a petition drive to hold the powers that be responsible for the damage. Actually, it wouldn't surprise me if the first outage stressed something in the building and it went. |
DJStarfox Joined: 23 May 01 Posts: 1066 Credit: 1,226,053 RAC: 2 |
Now would be a great time to get the funds for those whole-closet UPS devices. How much could that possibly cost the school? ;) |
Cosmic_Ocean Joined: 23 Dec 00 Posts: 3027 Credit: 13,516,867 RAC: 13 |
Now would be a great time to get the funds for those whole-closet UPS devices. How much could that possibly cost the school? ;) Or at the very least, some line conditioners, which are usually built into UPS units. Line conditioners will clean up noisy power, and most of the time they also handle very strong surges just fine. They may help keep weird power scenarios from taking out machines, or dirty/noisy power may be what is causing those strange and random crashes. One of my long-since retired crunchers continues to do other things for me around the house, and it was acting weird and would randomly crash. Sometimes it would be weeks before it did it, other times it would happen repeatedly for an hour or so. I ran memtest on it and discovered the RAM needed more voltage. Instead of the 2.6 that it wanted, I already had the board set for 2.8, so I had to crank it to 2.9, and that fixed it. Might just be a power issue, either internal or external. Linux laptop: record uptime: 1511d 20h 19m (ended due to the power brick giving-up) |
jason_gee Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
... Dan's new but oddly crashy desktop on my new desktop. Then over the weekend MY new desktop started crashing at random. You'd think this is now clearly related to the database, but Dan's desktop continued to crash after moving the mysql database off of it. And upon further inspection both systems sometimes crash before the OS is even loaded. One relatively newer possibility, in addition to the usual checks, that's quick & easy to eliminate: there's been a general trend evolving lately to supply XMP profile (or other high frequency with tight latency) memory defaulting to 'normal undervolts'. After a typical 14 hour or so burn-in period the crashy symptoms appear, & gradually worsen over time. Heavy RAM usage patterns in particular then throw either controller or RAM modules over the edge, while memtests often show clear. The quick check is to make sure the DIMM voltage matches the XMP profile spec, and that VID (memory controller in the CPU) is set to about 70% of that (which is for impedance matching purposes, maximising signal integrity & stopping the memory controller sinking excessive current). Jason "Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions. |
kittyman Joined: 9 Jul 00 Posts: 51477 Credit: 1,018,363,574 RAC: 1,004 |
I have wondered this out loud before, but doesn't the campus have some kind of comprehensive insurance coverage that might cover the loss of equipment in cases like this? I find it hard to believe that lab and computer equipment might not be covered. Even most basic homeowner's insurance covers this kind of thing for example, in the case of a lightning strike. It might be worthwhile to ask some serious questions of the proper authorities..... Just sayin'. "Time is simply the mechanism that keeps everything from happening all at once." |
Grant (SSSF) Joined: 19 Aug 99 Posts: 13847 Credit: 208,696,464 RAC: 304 |
"No UPS will last for a 5 or 6 hour outage." They can, but it takes big batteries. The main use for UPSs is protection from surges, brownouts & power failures. If the failure is long enough, the UPS allows the hardware to be shut down normally. Larger UPS units are designed to keep systems up till such time as a backup generator can come online, and then keep things up when that shuts down & the system switches back to mains power. Grant Darwin NT |
Cheopis Joined: 17 Sep 00 Posts: 156 Credit: 18,451,329 RAC: 0 |
I do not think it is reasonable to try to get a UPS system that will do more than protect the machines, and allow them enough time to gracefully power off after a short timeframe running with no power. Maybe 10 minutes. Power conditioning and voltage regulation, if they are not already a part of the lab's UPS system, should be considered. Every time you have an outage like this one (especially in an older building), some other part of the electrical system gets stressed. You might have cascading problems every few weeks for the next year before everything is all ironed out. |
Slavac Joined: 27 Apr 11 Posts: 1932 Credit: 17,952,639 RAC: 0 |
We've floated the idea of power stabilizing hardware to the lab; I'll let everyone know if they decide they'd like some of the same. It's heartbreaking that our two new workstations got crippled, but given the past few weeks it's understandable. We'll replace the damaged components ASAP once Matt et al. figure out the issues. Executive Director GPU Users Group Inc. - brad@gpuug.org |
edjcox Joined: 20 May 99 Posts: 96 Credit: 5,878,353 RAC: 0 |
Even some small UPS equipment for the PCs would help keep the power gremlins from disturbing circuitry and such and shortening lifespan. I have all my gear at home on UPS for graceful shutdown and power conditioning at all times... Find out who your campus engineer is and raise hell ... Let people know they are destroying equipment with their shenanigans. This should be upchanneled as much as possible to let management know this is costing them money, time, equipment... Never engage stupid people at their level, they then have the home court advantage..... |