Oh yeah.. That.. (Aug 04 2009)
Author | Message |
---|---|
Joined: 1 Mar 99 Posts: 1444 Credit: 957,058 RAC: 0 |
Tuesday is our usual outage day, as many of you are firmly aware. Today was the usual drill, except we now have two replica databases to deal with. We set the "alter table" scripts running on both systems simultaneously, prepared to laugh at how much faster mork would perform than sidious. And it was doing great, even faster than the master database (jocelyn)... until it crashed. And it was the worst kind of crash: the system simply froze, requiring a hard reset, and upon reboot there was not a trace of evidence anywhere about what happened. So now we have the complete opposite of a warm fuzzy feeling about mork. Nevertheless, even with this setback and the ensuing InnoDB database recovery, it still wrapped up all its tasks around the same time as the master database, so both master and replica are back online and serving requests. I didn't need to temporarily turn off the "show tasks" pages, because we can handle them, even right after an outage. The old replica (sidious) is still chugging away on its table compression tasks and will probably be done with those around midnight. Meanwhile, I've spent the rest of the day gathering data and making plots to better understand the radars that clobber our Arecibo data. Selecting thresholds is rather difficult, as where the baby ends and the bathwater begins changes from file to file. Sigh. But we're close, and we can do a rough enough job of getting most of the radar out without losing too much data. People asked about the NTPCkr pages. Oh yeah.. That.. Jeff and I were pushing on those last month, then I disappeared on vacation, then we were both at OSCON in San Jose, and then the new replica server finally started working, so that's been occupying our time, along with scrounging data together to process. Sorry about the delays. I know we're close to publishing something.
This is kind of an important addition to the web site, so we want to make sure it kinda works before embarrassing ourselves with broken/misleading information. - Matt -- BOINC/SETI@home network/web/science/development person -- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude |
Claggy Joined: 5 Jul 99 Posts: 4654 Credit: 47,537,079 RAC: 4 |
Thanks for the update. Claggy |
DJStarfox Joined: 23 May 01 Posts: 1066 Credit: 1,226,053 RAC: 2 |
Hard locks are tough creatures to tame. If it happens again, someone will have to start at the hardware/BIOS level to figure out what can be turned off or changed for stability. Hopefully, it's just a software bug. Is the disk subsystem on sidious that much slower than mork's? What are you going to do with sidious once it's no longer a replica DB? |
Joined: 29 Feb 00 Posts: 16019 Credit: 794,685 RAC: 0 |
Thanks for the update, Matt. Nice work from ALL of you @ Berkeley . . . Science Status Page . . . |
Joined: 11 Sep 06 Posts: 3093 Credit: 2,652,287 RAC: 0 |
Matt, it's great to hear you're doing a little bit with the NitPicker again. Every masterpiece takes time to perfect. But look at it this way: the NitPicker is probably the most science information SETI@home has ever added to this website, so it's worth the wait to get everything perfect! Looking at the 10th Anniversary videos, this NitPicker is going to be very cool! Thanks Matt, John. |
Joined: 30 Jun 09 Posts: 619 Credit: 15,732 RAC: 0 |
Cheers for the update, keep up the good work! The processable data situation is a little scary... But as you've said in previous posts, once you have the new systems and software in place you can start filtering out the RFI and doing the radar blanking on the archived data. "Only two things are infinite, the universe and human stupidity, and I'm not sure about the former." - Albert Einstein Vextor Homepage | Vextor Blog |
Joined: 11 Sep 99 Posts: 6533 Credit: 196,805,888 RAC: 57 |
Sounds like this quirky Compaq server I have. It runs Windows Server 2003 just fine, unless it is SP2. Then I get random reboots without any clue as to what is going on. After months of trying to track down any driver, service, or hardware issue, I just said the hell with it and have left it running SP1. The odd bit is that it ran for some time on SP2 without any errors. Then one day it said bloop, and it has been a pain in my side ever since. SETI@home classic workunits: 93,865 CPU time: 863,447 hours |
Cosmic_Ocean Joined: 23 Dec 00 Posts: 3027 Credit: 13,516,867 RAC: 13 |
Could have been the 'automatic reboot' option when there is a Blue Screen of Death. A lot of people confuse "random reboot" with "there was a BSOD, but I didn't get to see it in time." Not trying to prove anyone wrong here, but that is something to look into, and I would say from my experience that 95% of the time, a BSOD is caused by a driver. Linux laptop: record uptime: 1511d 20h 19m (ended due to the power brick giving up) |
©2023 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.