Meh (Nov 09 2009)

Author	Message
Matt Lebofsky Volunteer moderator Project administrator Project developer Project scientist Joined: 1 Mar 99 Posts: 1444 Credit: 957,058 RAC: 0	Message 946245 - Posted: 10 Nov 2009, 0:24:48 UTC
Our master mysql database server (mork) crashed on Sunday. The first crash when we brought mork on line way back when was a "fluke" - the crash a few weeks ago was explainable (or so we thought) - but now we're in the realm of "grave concern" about this particular server. However, the result of each crash is just an annoying chunk of downtime - the actual data remain intact after recovery, and recovery goes along without too much ado. Maybe we have just been lucky so far. I could see a flat out crash being a bit more disastrous.

Eric did the remote work of initial and post-reboot cleanup; Dan actually came up to the lab to physically power cycle the machine, which Jeff walked him through over the phone. I assumed we'd all just wait until the next day when we're all back at the lab to set things right (after all, we've had longer unexpected outages before). When I returned from prior obligations to find the projects up I was pleased by the heroic effort. Still, I quickly noticed that the splitters were in a funny state which required my intervention or else we would have immediately run out of work to send out, so I fixed all that.

Anyway, we'll have to do some extra recovery tasks tomorrow during the regular outage. This will include putting a debug kernel on mork and some other crash-test stuff that may hopefully give us clues if mork decides to disappear again.

- Matt
-- BOINC/SETI@home network/web/science/development person
-- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude

DJStarfox Joined: 23 May 01 Posts: 1066 Credit: 1,226,053 RAC: 2	Message 946295 - Posted: 10 Nov 2009, 4:00:18 UTC - in response to Message 946245.
I hate flaky hardware; I can appreciate the effort involved. If the debug kernel doesn't save the errors before crashing, you could always do that trick of redirecting the console & stderr to a serial port. (Have a laptop or computer record the serial data.) Here's the quick-n-dirty HOW-TO link if you need it: http://tldp.org/HOWTO/Remote-Serial-Console-HOWTO/configure-kernel-grub.html
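The "have a laptop record the serial data" half of that suggestion can be sketched roughly as below. This is only an illustration, not anything the project is known to have run: it assumes the crashing machine already sends its console to a serial line (for example via a kernel parameter such as `console=ttyS0,9600n8`, which is what the linked HOWTO covers) and that the recording machine's port has already been set to the matching speed. The function name `log_console_stream`, the device path and the log file name are all hypothetical.

```python
import datetime
from typing import BinaryIO, TextIO


def log_console_stream(source: BinaryIO, sink: TextIO) -> int:
    """Copy console output from source to sink, one line at a time.

    Each line is prefixed with a UTC timestamp so the last messages
    before a crash can be matched against the time of the outage.
    Returns the number of lines written.
    """
    count = 0
    # readline() returns b"" at end of stream, which ends the loop.
    for raw in iter(source.readline, b""):
        text = raw.decode("ascii", errors="replace").rstrip("\r\n")
        stamp = datetime.datetime.now(datetime.timezone.utc).isoformat(
            timespec="seconds"
        )
        sink.write(f"{stamp} {text}\n")
        # Flush after every line so the log file is current even if the
        # monitored machine goes silent mid-message.
        sink.flush()
        count += 1
    return count


# Hypothetical usage on the recording machine (paths are assumptions):
#
#     with open("/dev/ttyS0", "rb", buffering=0) as port, \
#             open("mork-console.log", "a") as log:
#         log_console_stream(port, log)
```

Using only the standard library keeps the sketch self-contained; a real setup would more likely use an existing terminal program or a serial library to configure the port and handle reconnects.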

Keith T. Volunteer tester Joined: 23 Aug 99 Posts: 962 Credit: 537,293 RAC: 9	Message 946374 - Posted: 10 Nov 2009, 12:44:02 UTC
Is any of the hardware in your server closet of the vintage where it could be prone to the "Capacitor Plague" (http://en.wikipedia.org/wiki/Capacitor_plague)?

Keith T. Volunteer tester Joined: 23 Aug 99 Posts: 962 Credit: 537,293 RAC: 9	Message 959975 - Posted: 1 Jan 2010, 23:49:09 UTC Last modified: 2 Jan 2010, 0:02:09 UTC
Happy New Year to all the staff. Thanks for working on a holiday to get the project back on line. I suspect it was Mork which crashed again today. Any news on the hardware side of things? Could a PSU or UPS be causing power spikes due to insufficient filtering, maybe some capacitors just on the limits of tolerance? [edit]changed "suffering from spikes" to "causing spikes".[/edit]

©2024 University of California

SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.