Database replica thread

Matt Lebofsky
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 1 Mar 99
Posts: 1444
Credit: 957,058
RAC: 0
United States
Message 32768 - Posted: 4 Oct 2004, 23:42:41 UTC

I'll start with the message I just posted on the main site:

"There will be an outage tomorrow morning (17:00 UTC) for several hours for database reconfiguration. Specifically, the transitioners and validators are falling way behind, making the crediting process quite slow. Last week we successfully created a replica database on a system with a much faster disk array than the master database system. Tomorrow, we hope to swap the two, as the database is seemingly I/O bound and faster disks will mean faster database queries which in turn will mean the validator/transitioner queues should quickly drain."

Since I get lots of complaints about our front page news being too technical, I figured, for now, I'd move further (i.e. nerdy) discussions on this matter off to this forum, which is full of people who actually care about such things.

FYI, currently we have four main servers, three of them attached to separate disk arrays:

1. master database - attached to a bunch of disks with software raid 5.
2. slave/replica database - attached to a bunch of disks with more spindles than the master, and it's raid 10.
3. scheduling server - i.e. the upload/download server, attached to the snapappliance (which is the fastest of the disk arrays, but its configuration is in flux so we are not doing anything else with this just yet).
4. web server

The last big hardware shift, happening behind the scenes, was making the replica database, which took a while to get going (as we had to scrape the hardware together and reinstall the OS, set up the raid, copy the data, resync the data, etc.). Now that it's up and running it's pretty cool. Getting this going allowed us to get back to dumping xml stats for other web sites to slurp up and display without slowing down the master production database.
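
(For the technically curious: the copy-and-resync step under MySQL-style replication boils down to roughly the sketch below once a snapshot of the data has been loaded on the new machine. The hostname, user, and log coordinates are illustrative placeholders, not our actual values.)

    -- On the replica, after loading a snapshot of the master's data:
    CHANGE MASTER TO
      MASTER_HOST = 'master-db.example',      -- placeholder hostname
      MASTER_USER = 'repl',
      MASTER_PASSWORD = '********',
      MASTER_LOG_FILE = 'mysql-bin.000123',   -- binlog coordinates recorded when the snapshot was taken
      MASTER_LOG_POS = 4;
    START SLAVE;
    SHOW SLAVE STATUS;   -- watch Seconds_Behind_Master fall toward 0 as it resyncs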

We were hoping to wait a while to let us break in the replica database before swapping it with the master, but with the validator/transitioners lagging behind, there's no time like the present to move forward.
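
(Purely as a sketch of what such a swap involves under MySQL-style replication, not a record of the exact procedure: the idea is to stop writes on the old master, let the replica catch up, and then promote it.)

    -- On the current master: stop writes so the replica can fully catch up.
    FLUSH TABLES WITH READ LOCK;
    SET GLOBAL read_only = ON;

    -- On the replica: once it has applied everything, promote it.
    SHOW SLAVE STATUS;        -- wait until Seconds_Behind_Master reaches 0
    STOP SLAVE;
    RESET MASTER;             -- start a fresh binary log as the new master
    SET GLOBAL read_only = OFF;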

The outage tomorrow is made a little trickier as the two databases are on similar server machines, but have different amounts of RAM. So some memory exchange will have to occur during the outage. No biggie - just more down time.

Questions? I'll try to answer them if I have time. I make no guarantees.

- Matt
BOINC/SETI@home
ID: 32768

Arm
Joined: 12 Sep 03
Posts: 308
Credit: 15,584,777
RAC: 0
Message 32772 - Posted: 4 Oct 2004, 23:50:03 UTC
Last modified: 4 Oct 2004, 23:50:17 UTC

Thanks for the info, Matt. No questions from me. I'm sure you're doing the right thing.
Outage? I have downloaded enough WUs to survive those few hours :))
Good luck tomorrow and thank you for respecting us!



S@h Berkeley's Staff Friends Club ©
ID: 32772

Ulrich Metzner
Volunteer tester
Joined: 3 Jul 02
Posts: 1256
Credit: 13,565,513
RAC: 13
Germany
Message 32773 - Posted: 4 Oct 2004, 23:51:27 UTC - in response to Message 32772.  

> Good luck tomorrow and thank you for respecting us!
>

I second that :)



greetz, Uli
ID: 32773

Papa Zito
Joined: 7 Feb 03
Posts: 257
Credit: 624,881
RAC: 0
United States
Message 32779 - Posted: 5 Oct 2004, 0:15:00 UTC - in response to Message 32768.  

>
> 1. master database - attached to a bunch of disks with software raid 5.

Why software RAID?




------------------------------------


The game High/Low is played by tossing two nuclear warheads into the air. The one whose bomb explodes higher wins. This game is usually played by people of low intelligence, hence the name High/Low.
ID: 32779

'bosh
Volunteer tester
Joined: 7 Feb 03
Posts: 46
Credit: 3,314,901
RAC: 0
Canada
Message 32809 - Posted: 5 Oct 2004, 1:47:47 UTC - in response to Message 32768.  


> 1. master database - attached to a bunch of disks with software raid 5.

I thought you guys had stopped using software raid after the problems you had with it the "last time". Which raises the question: why use it?
ID: 32809

Matt Lebofsky
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 1 Mar 99
Posts: 1444
Credit: 957,058
RAC: 0
United States
Message 32812 - Posted: 5 Oct 2004, 1:57:01 UTC

Why software raid? Good question. Answer: No hardware raid!

That's the short answer (and all I have time for as I'm actually not at the lab right now). When I say we have no money, I mean it. We use what is generously donated to us and sometimes have to get creative.

- Matt
BOINC/SETI@home
ID: 32812

Shontzomania
Joined: 20 Sep 04
Posts: 4
Credit: 4,147
RAC: 0
United States
Message 32818 - Posted: 5 Oct 2004, 2:08:21 UTC

Ah, the old "fix coding and SQL problems with more hardware" motif!

(Just kidding...been a DBA for 10 years and I can never resist the opportunity to use that line!)

In all seriousness, good luck with the migration. I can only imagine that a DB with the (I am guessing) INSANE amount of I/O thrashing you guys take would be an absolute nightmare to keep in line. Just remember to take out a few "SELECT *"'s while you are at it!

:)

I would also like to say that I would be happy to lend any assistance if anybody ever needs an extra set of eyes to look at a SQL script/procedure.

Cheers,
Doug
ID: 32818

Everette Dobbins
Joined: 13 Jan 00
Posts: 291
Credit: 22,594,655
RAC: 0
United States
Message 32820 - Posted: 5 Oct 2004, 2:12:56 UTC - in response to Message 32768.  

Can participants donate hardware? I have a 160 GB Maxtor hard drive; it's too much for running this program. Can I donate it to the Berkeley SETI@home/BOINC project?
ID: 32820

Stephen Balch
Joined: 20 Apr 00
Posts: 141
Credit: 13,912
RAC: 0
United States
Message 32822 - Posted: 5 Oct 2004, 2:29:05 UTC

Matt,

Thanks for the information. Some of us, at least, really do appreciate being kept informed as to what is happening.

I wish I could donate a few thousand dollars to you guys, but I'm back in school and have very limited funds myself.

Best of luck for tomorrow.

Stephen
<P>"I want to go dancing on the moon, I want to frolic in zero gravity!....", and now, I just might be able to go! Thanks, SpaceShipOne and crew!<BR><a>
ID: 32822

Janus
Volunteer developer
Joined: 4 Dec 01
Posts: 376
Credit: 967,976
RAC: 0
Denmark
Message 32890 - Posted: 5 Oct 2004, 7:46:57 UTC - in response to Message 32768.  

A little question:

Will the downtime be used to delete some of the unused columns in the user table? - deleting them should give you a few extra Megs of space and help keep the size of the table as small as possible...

I'm talking about 'signature' and 'posts', which have both been moved to another table.
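
(If the schema is as described above, dropping them is a one-liner. The table and column names below are simply the ones mentioned in this thread, so treat this as a sketch rather than the actual statement:)

    -- Drop the columns whose data now lives in another table (assumed names).
    ALTER TABLE user
      DROP COLUMN signature,
      DROP COLUMN posts;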
ID: 32890

Steve Cressman
Volunteer tester
Joined: 6 Jun 02
Posts: 583
Credit: 65,644
RAC: 0
Canada
Message 32956 - Posted: 5 Oct 2004, 15:25:35 UTC

Thanx Matt. Keeping us informed is always a good thing :)




Application has reported a 'Not My Fault' in module KRNL.EXE in line 0200:103F
ID: 32956

oldlefthander
Joined: 19 Dec 01
Posts: 1
Credit: 622,589
RAC: 0
United States
Message 32960 - Posted: 5 Oct 2004, 15:44:42 UTC

Matt,

Is there a document describing the hardware configuration that you guys are using for SETI? I am just curious how you have everything set up. I appreciate the update on the database move, and I'm sure you guys are doing what you can with what you have. Keep up the good work.

Pete
ID: 32960

Matt Lebofsky
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 1 Mar 99
Posts: 1444
Credit: 957,058
RAC: 0
United States
Message 32968 - Posted: 5 Oct 2004, 16:06:04 UTC

Couple answers:

1. We should have a hardware configuration document but don't, mainly because it would take a bunch of time to spell out what we got and because it's changing week to week. The suggestion has been thrown around to maybe add a lot more text and other good stuff to the server status page - once we get the time to do so that might actually help a lot in this regard.

2. Regarding deleting columns - that's actually part of the plan. So good suggestion. But not on the user table. The real problem is the result table, which is much larger and accessed more frequently. After we move today we'll start a job that slowly purges deleted results from that table (the signals from the deleted results are already in our master database - and we'll save the deleted rows to disk in xml format just in case we need them again later).
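
(A purge job along those lines might look roughly like the sketch below. The column names and state values are guesses based on the BOINC result table, and the XML archiving would be a separate step, so this is not the literal job being run.)

    -- Delete small batches of already-handled results over and over, so the
    -- production table is never locked for long (columns/states are assumptions).
    DELETE FROM result
     WHERE server_state = 5          -- assumed: result is over / no longer needed
       AND file_delete_state = 2     -- assumed: its files have already been deleted
     LIMIT 10000;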

Okay.. everybody's here now. I'll get going on this.

- Matt
BOINC/SETI@home
ID: 32968

Matt Lebofsky
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 1 Mar 99
Posts: 1444
Credit: 957,058
RAC: 0
United States
Message 32979 - Posted: 5 Oct 2004, 18:08:41 UTC

Okay.. just swapped memory and changed replica/master pointers. Halfway there. Testing out the system now with this here post..

- Matt
BOINC/SETI@home
ID: 32979

Ulrich Metzner
Volunteer tester
Joined: 3 Jul 02
Posts: 1256
Credit: 13,565,513
RAC: 13
Germany
Message 32983 - Posted: 5 Oct 2004, 18:23:30 UTC

Hello everybody,

how come I can upload/download WUs and report them right this moment?
Don't get me wrong, I like that ;)



greetz, Uli
ID: 32983

texasfit
Joined: 11 May 03
Posts: 223
Credit: 500,626
RAC: 0
United States
Message 32986 - Posted: 5 Oct 2004, 18:31:52 UTC - in response to Message 32979.  

> Okay.. just swapped memory and changed replica/master pointers. Halfway there.
> Testing out the system now with this here post..
>
> - Matt
> BOINC/SETI@home
>

Thanks for the information, Matt.

We really do appreciate all the hard work and time that you and the team put into keeping this project running with your limited or non-existent funds.

----------



Join the Overclockers.com SETI Team!
ID: 32986

Matt Lebofsky
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 1 Mar 99
Posts: 1444
Credit: 957,058
RAC: 0
United States
Message 32987 - Posted: 5 Oct 2004, 18:37:46 UTC

Wow. Things are really chugging along on this new set of disks. This is really good. The transitioners which have been backlogged for weeks look like they will fully catch up in an hour.

Not sure how to check the validator status (I'll ask David when he gets in) but that'll probably catch up quickly too. Imagine - users getting credit when credit is due.
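
(One rough way to gauge the validation backlog straight from the database would be a count of results per validation state, something like the query below; the validate_state column and its meaning are assumptions drawn from the BOINC schema.)

    -- How many results sit in each validation state? The pending ones are the backlog.
    SELECT validate_state, COUNT(*) AS num_results
      FROM result
     GROUP BY validate_state;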

Funny thing is.. we still have high I/O waits, so imagine what we can do once we do get even faster disks!

I'll post something to the front page eventually. We're not out of the woods yet.
Still checking things out for a while..

- Matt
BOINC/SETI@home
ID: 32987

[HWU] GHz & CO. - BOINC.Italy
Volunteer tester
Joined: 1 Jul 02
Posts: 139
Credit: 1,466,611
RAC: 0
Italy
Message 32997 - Posted: 5 Oct 2004, 19:14:13 UTC - in response to Message 32987.  
Last modified: 5 Oct 2004, 19:15:36 UTC

Good news Matt! Thanks for your work and information on the server status :)

Just an observation and a question about the servers :)
BOINC needs faster hardware, and SETI classic will have to be shut down in the near future. So, wouldn't it be possible during this migration to reuse some of the SETI classic servers for BOINC? Or would that be a problem for the SETI classic server structure?

Good work.

GHz

Hardware Upgrade - Seti@home

ID: 32997

Captain Avatar
Volunteer tester
Joined: 17 May 99
Posts: 15133
Credit: 529,088
RAC: 0
United States
Message 33000 - Posted: 5 Oct 2004, 19:27:50 UTC
Last modified: 5 Oct 2004, 22:44:12 UTC

You guys do rock! How do we donate money? (Not much, but I want to help.)

Thanks for the info it really helps us as part of the team...


Timmy


ID: 33000

TPR_Mojo
Volunteer tester
Joined: 18 Apr 00
Posts: 323
Credit: 7,001,052
RAC: 0
United Kingdom
Message 33009 - Posted: 5 Oct 2004, 20:00:33 UTC

Everything seems really snappy here :)
ID: 33009