Oh yeah.. That.. (Aug 04 2009)

Message boards : Technical News : Oh yeah.. That.. (Aug 04 2009)

Matt Lebofsky
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 1 Mar 99
Posts: 1444
Credit: 957,058
RAC: 0
United States
Message 923614 - Posted: 4 Aug 2009, 22:52:27 UTC

Tuesday is our usual outage day, as many of you are well aware. Today was the usual drill, except we had two replica databases to deal with. We started the "alter table" scripts on these two systems simultaneously, prepared to laugh at how much faster mork would perform than sidious.

And mork was doing great, even faster than the master database (jocelyn)... until it crashed. And it was the worst kind of crash: the system simply froze, requiring a hard reset, and there was no trace of evidence anywhere upon reboot about what happened. So now we have the complete opposite of a warm fuzzy feeling about mork. Nevertheless, even with this setback and the ensuing InnoDB database recovery, it still wrapped up all its tasks around the same time as the master database, so both master and replica are back online and serving requests. I didn't need to temporarily turn off the "show tasks" pages because we can handle them, even right after an outage. The old replica (sidious) is still chugging away on its table compression tasks, and will probably be done with those around midnight.

Meanwhile the rest of the day I've been gathering data and making plots to better understand the radars that clobber our Arecibo data. Selecting thresholds is rather difficult, as it changes from file to file where the baby ends and the bathwater begins. Sigh. But we're close, and can do a rough enough job of getting most of the radar out without losing too much data.

People asked about the NTPCkr pages. Oh yeah.. That.. Jeff and I were pushing on those last month, then I disappeared on vacation, and then we both were at OSCON in San Jose, and then the new replica server finally started working, so that's been occupying our time, along with scrounging data together to process. Sorry about the delays. I know we're close to publishing something. This is kind of an important addition to the web site, so we want to make sure it kinda works before embarrassing ourselves with broken/misleading information.

- Matt

-- BOINC/SETI@home network/web/science/development person
-- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude
ID: 923614
Claggy
Volunteer tester

Joined: 5 Jul 99
Posts: 4654
Credit: 47,537,079
RAC: 4
United Kingdom
Message 923620 - Posted: 4 Aug 2009, 22:57:32 UTC - in response to Message 923614.  

Thanks for the update.

Claggy
ID: 923620
DJStarfox

Joined: 23 May 01
Posts: 1066
Credit: 1,226,053
RAC: 2
United States
Message 923637 - Posted: 4 Aug 2009, 23:45:00 UTC - in response to Message 923614.  

Hard locks are tough creatures to tame. If it happens again, someone will have to start at the hardware/BIOS level to figure out what can be turned off or changed for stability. Hopefully, it's just a software bug.

Is the disk subsystem on Sidious that much slower than Mork? What are you going to do with Sidious once it's no longer a replica DB?
ID: 923637
Dr. C.E.T.I.
Joined: 29 Feb 00
Posts: 16019
Credit: 794,685
RAC: 0
United States
Message 923650 - Posted: 5 Aug 2009, 0:23:02 UTC

Thanks for the Update Matt - nice work from ALL of you @ Berkeley . . .
BOINC Wiki . . .

Science Status Page . . .
ID: 923650
Johnney Guinness
Volunteer tester
Joined: 11 Sep 06
Posts: 3093
Credit: 2,652,287
RAC: 0
Ireland
Message 923718 - Posted: 5 Aug 2009, 8:31:22 UTC

Matt,
It's great to hear you're doing a little bit with the NitPicker again. Every masterpiece takes time to perfect. But look at it this way: the NitPicker is probably the most science information SETI@home has ever added to this website, so it's worth the wait to get everything perfect!

Looking at the 10th Anniversary videos, this NitPicker is going to be very cool!

Thanks Matt,
John.
ID: 923718
Kai
Volunteer tester
Joined: 30 Jun 09
Posts: 619
Credit: 15,732
RAC: 0
United Kingdom
Message 923792 - Posted: 5 Aug 2009, 17:25:43 UTC

Cheers for the update, keep up the good work!

The processable data situation is a little scary... But as you've said in previous posts, once you have the new systems and software in place you can start filtering out the RFI and doing the radar blanking on the archived data.
"Only two things are infinite, the universe and human stupidity, and I'm not sure about the former." - Albert Einstein
Vextor Homepage | Vextor Blog
ID: 923792
HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 923826 - Posted: 5 Aug 2009, 19:50:47 UTC

Sounds like this quirky Compaq server I have. It runs Windows Server 2003 just fine, unless it is SP2. Then I get random reboots w/o any clue as to what is going on. After months of trying to track down any driver, service, or hardware issue, I just said the hell with it and have left it running SP1. The odd bit is that it was running for some time on SP2 w/o any errors. Then one day it said bloop, and it has been a pain in my side ever since.
SETI@home classic workunits: 93,865 CPU time: 863,447 hours
Join the BP6/VP6 User Group (http://tinyurl.com/8y46zvu)
ID: 923826
Cosmic_Ocean
Joined: 23 Dec 00
Posts: 3027
Credit: 13,516,867
RAC: 13
United States
Message 923873 - Posted: 5 Aug 2009, 21:40:23 UTC - in response to Message 923826.  

Sounds like this quirky Compaq server I have. It runs Windows Server 2003 just fine, unless it is SP2. Then I get random reboots w/o any clue as to what is going on. After months of trying to track down any driver, service, or hardware issue, I just said the hell with it and have left it running SP1. The odd bit is that it was running for some time on SP2 w/o any errors. Then one day it said bloop, and it has been a pain in my side ever since.

Could have been the 'automatic reboot' option when there is a Blue Screen of Death. A lot of people confuse "random reboot" with "there was a BSOD, but I didn't get to see it in time."

Not trying to prove anyone wrong here, but that is something to look into. In my experience, a BSOD is caused by a driver about 95% of the time.
Linux laptop:
record uptime: 1511d 20h 19m (ended due to the power brick giving up)
ID: 923873
