Meh (Nov 09 2009)
Author | Message |
---|---|
Matt Lebofsky Joined: 1 Mar 99 Posts: 1444 Credit: 957,058 RAC: 0 |
Our master mysql database server (mork) crashed on Sunday. The first crash when we brought mork on line way back when was a "fluke" - the crash a few weeks ago was explainable (or so we thought) - but now we're in the realm of "grave concern" about this particular server. However, the result of each crash is just an annoying chunk of downtime - the actual data remain intact after recovery, and recovery goes along without too much ado. Maybe we have just been lucky so far. I could see a flat-out crash being a bit more disastrous. Eric did the remote work of initial and post-reboot cleanup; Dan actually came up to the lab to physically power cycle the machine, which Jeff walked him through over the phone. I assumed we'd all just wait until the next day when we're all back at the lab to set things right (after all, we've had longer unexpected outages before). When I returned from prior obligations to find the projects up I was pleased by the heroic effort. Still, I quickly noticed that the splitters were in a funny state which required my intervention, or else we would have immediately run out of work to send out, so I fixed all that. Anyway, we'll have to do some extra recovery tasks tomorrow during the regular outage. This will include putting a debug kernel on mork and some other crash-test stuff that may give us clues if mork decides to disappear again. - Matt -- BOINC/SETI@home network/web/science/development person -- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude |
DJStarfox Joined: 23 May 01 Posts: 1066 Credit: 1,226,053 RAC: 2 |
I hate flaky hardware; I can appreciate the effort involved. If the debug kernel doesn't save the errors before crashing, you could always do that trick of redirecting the console & stderr to a serial port. (Have a laptop or computer record the serial data.) Here's the quick-n-dirty HOW-TO link if you need it. http://tldp.org/HOWTO/Remote-Serial-Console-HOWTO/configure-kernel-grub.html |
Keith T. Joined: 23 Aug 99 Posts: 962 Credit: 537,293 RAC: 9 |
Is any of the hardware in your server closet of the vintage where it could be prone to the "Capacitor Plague" (http://en.wikipedia.org/wiki/Capacitor_plague)? |
Keith T. Joined: 23 Aug 99 Posts: 962 Credit: 537,293 RAC: 9 |
Happy New Year to all the staff. Thanks for working on a holiday to get the project back on line. I suspect it was Mork which crashed again today. Any news on the hardware side of things? Could a PSU or UPS be causing power spikes due to insufficient filtering, maybe some capacitors just on the limits of tolerance? [edit]changed "suffering from spikes" to "causing spikes".[/edit] |
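
For the serial-console suggestion earlier in the thread, the recording side could look roughly like the sketch below. It is only an illustration, not anything the project is known to run: the function name `log_console` is made up here, and it assumes the crashing machine already sends its console to a serial port (for example via a `console=ttyS0,...` kernel parameter) and that the port on the recording machine has already been set to the matching speed, e.g. with `stty`. It uses only the Python standard library and timestamps each line so a crash can be lined up against other logs.

```python
from datetime import datetime, timezone
from typing import BinaryIO, TextIO


def log_console(source: BinaryIO, sink: TextIO) -> int:
    """Copy lines from `source` to `sink`, prefixing each with a UTC timestamp.

    Returns the number of lines written. Stops when `source` reports end of file.
    """
    count = 0
    for raw in iter(source.readline, b""):
        # Console output around a crash may contain stray bytes;
        # replace undecodable bytes instead of raising.
        text = raw.decode("utf-8", errors="replace").rstrip("\r\n")
        stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
        sink.write(f"{stamp} {text}\n")
        # Flush after every line so a power loss on the logging side
        # loses as little as possible.
        sink.flush()
        count += 1
    return count
```

On the recording machine this might be driven by opening the serial device unbuffered, e.g. `open("/dev/ttyS0", "rb", buffering=0)`, and an append-mode log file, then passing both to `log_console`. The device path is an assumption and will differ per machine.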
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.