Database replica thread

Matt Lebofsky
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 1 Mar 99
Posts: 1444
Credit: 957,058
RAC: 0
United States
Message 32768 - Posted: 4 Oct 2004, 23:42:41 UTC

I'll start with the message I just posted on the main site:

"There will be an outage tomorrow morning (17:00 UTC) for several hours for database reconfiguration. Specifically, the transitioners and validators are falling way behind, making the crediting process quite slow. Last week we successfully created a replica database on a system with a much faster disk array than the master database system. Tomorrow, we hope to swap the two, as the database is seemingly I/O bound and faster disks will mean faster database queries which in turn will mean the validator/transitioner queues should quickly drain."

Since I get lots of complaints about our front page news being too technical, I figured, for now, I'd move further (i.e. nerdy) discussions on this matter off to this forum, which is full of people who actually care about such things.

FYI, currently we have four main servers, three of them attached to separate disk arrays:

1. master database - attached to a bunch of disks with software raid 5.
2. slave/replica database - attached to a bunch of disks with more spindles than the master, and it's raid 10.
3. scheduling server - i.e. the upload/download server, attached to the snapappliance (which is the fastest of the disk arrays, but its configuration is in flux so we are not doing anything else with this just yet).
4. web server

The last big hardware shift, happening behind the scenes, was making the replica database, which took a while to get going (as we had to scrape the hardware together and reinstall the OS, set up the raid, copy the data, resync the data, etc.). Now that it's up and running it's pretty cool. Getting this going allowed us to get back to dumping xml stats for other web sites to slurp up and display without slowing down the master production database.
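
(For the technically curious: the copy-and-resync step under MySQL-style replication boils down to roughly the sketch below once a snapshot of the data has been loaded on the new machine. The hostname, user, and log coordinates are illustrative placeholders, not our actual values.)

    -- On the replica, after loading a snapshot of the master's data:
    CHANGE MASTER TO
      MASTER_HOST = 'master-db.example',      -- placeholder hostname
      MASTER_USER = 'repl',
      MASTER_PASSWORD = '********',
      MASTER_LOG_FILE = 'mysql-bin.000123',   -- binlog coordinates recorded when the snapshot was taken
      MASTER_LOG_POS = 4;
    START SLAVE;
    SHOW SLAVE STATUS;   -- watch Seconds_Behind_Master fall toward 0 as it resyncs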

We were hoping to wait a while to let us break in the replica database before swapping it with the master, but with the validator/transitioners lagging behind, there's no time like the present to move forward.
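
(Purely as a sketch of what such a swap involves under MySQL-style replication, not a record of the exact procedure: the idea is to stop writes on the old master, let the replica catch up, and then promote it.)

    -- On the current master: stop writes so the replica can fully catch up.
    FLUSH TABLES WITH READ LOCK;
    SET GLOBAL read_only = ON;

    -- On the replica: once it has applied everything, promote it.
    SHOW SLAVE STATUS;        -- wait until Seconds_Behind_Master reaches 0
    STOP SLAVE;
    RESET MASTER;             -- start a fresh binary log as the new master
    SET GLOBAL read_only = OFF;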

The outage tomorrow is made a little trickier as the two databases are on similar server machines, but have different amounts of RAM. So some memory exchange will have to occur during the outage. No biggie - just more down time.

Questions? I'll try to answer them if I have time. I make no guarantees.

- Matt
BOINC/SETI@home
ID: 32768

Arm
Joined: 12 Sep 03
Posts: 308
Credit: 15,584,777
RAC: 0
Message 32772 - Posted: 4 Oct 2004, 23:50:03 UTC
Last modified: 4 Oct 2004, 23:50:17 UTC

Thanks for the info, Matt. No questions from me. I'm sure you're doing the right thing.
Outage? I have downloaded enough WUs to survive those few hours :))
Good luck tomorrow and thank you for respecting us!



S@h Berkeley's Staff Friends Club ©
ID: 32772

Ulrich Metzner
Volunteer tester
Joined: 3 Jul 02
Posts: 1256
Credit: 13,565,513
RAC: 13
Germany
Message 32773 - Posted: 4 Oct 2004, 23:51:27 UTC - in response to Message 32772.  

> Good luck tomorrow and thank you for respecting us!
>

I second that :)



greetz, Uli
ID: 32773

Papa Zito
Joined: 7 Feb 03
Posts: 257
Credit: 624,881
RAC: 0
United States
Message 32779 - Posted: 5 Oct 2004, 0:15:00 UTC - in response to Message 32768.  

>
> 1. master database - attached to a bunch of disks with software raid 5.

Why software RAID?




------------------------------------


The game High/Low is played by tossing two nuclear warheads into the air. The one whose bomb explodes higher wins. This game is usually played by people of low intelligence, hence the name High/Low.
ID: 32779

'bosh
Volunteer tester
Joined: 7 Feb 03
Posts: 46
Credit: 3,314,901
RAC: 0
Canada
Message 32809 - Posted: 5 Oct 2004, 1:47:47 UTC - in response to Message 32768.  


> 1. master database - attached to a bunch of disks with software raid 5.

I thought you guys had stopped using software raid after the problems you had with it the "last time". Which raises the question: why use it?
ID: 32809

Matt Lebofsky
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 1 Mar 99
Posts: 1444
Credit: 957,058
RAC: 0
United States
Message 32812 - Posted: 5 Oct 2004, 1:57:01 UTC

Why software raid? Good question. Answer: No hardware raid!

That's the short answer (and all I have time for as I'm actually not at the lab right now). When I say we have no money, I mean it. We use what is generously donated to us and sometimes have to get creative.

- Matt
BOINC/SETI@home
ID: 32812

Shontzomania
Joined: 20 Sep 04
Posts: 4
Credit: 4,147
RAC: 0
United States
Message 32818 - Posted: 5 Oct 2004, 2:08:21 UTC

Ah, the old "fix coding and SQL problems with more hardware" motif!

(Just kidding...been a DBA for 10 years and I can never resist the opportunity to use that line!)

In all seriousness, good luck with the migration. I can only imagine that a DB with the (I am guessing) INSANE amount of I/O thrashing you guys take would be an absolute nightmare to keep in line. Just remember to take out a few "SELECT *"'s while you are at it!

:)

I would also like to say that I would be happy to lend any assistance if anybody ever needs an extra set of eyes to look at a SQL script/procedure.

Cheers,
Doug
ID: 32818

Everette Dobbins
Joined: 13 Jan 00
Posts: 291
Credit: 22,594,655
RAC: 0
United States
Message 32820 - Posted: 5 Oct 2004, 2:12:56 UTC - in response to Message 32768.  

Can participants donate hardware? I have a 160 GB Maxtor hard drive; it's too much for running this program. Can I donate it to the Berkeley SETI@home/BOINC project?
ID: 32820

Stephen Balch
Joined: 20 Apr 00
Posts: 141
Credit: 13,912
RAC: 0
United States
Message 32822 - Posted: 5 Oct 2004, 2:29:05 UTC

Matt,

Thanks for the information. Some of us, at least, really do appreciate being kept informed as to what is happening.

I wish I could donate a few thousand dollars to you guys, but I'm back in school and have very limited funds myself.

Best of luck for tomorrow.

Stephen
<P>"I want to go dancing on the moon, I want to frolic in zero gravity!....", and now, I just might be able to go! Thanks, SpaceShipOne and crew!<BR><a>
ID: 32822

Janus
Volunteer developer
Joined: 4 Dec 01
Posts: 376
Credit: 967,976
RAC: 0
Denmark
Message 32890 - Posted: 5 Oct 2004, 7:46:57 UTC - in response to Message 32768.  

A little question:

Will the downtime be used to delete some of the unused columns in the user table? - deleting them should give you a few extra Megs of space and help keep the size of the table as small as possible...

I'm talking about 'signature' and 'posts', which have both been moved to another table.
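
(If the schema is as described above, dropping them is a one-liner. The table and column names below are simply the ones mentioned in this thread, so treat this as a sketch rather than the actual statement:)

    -- Drop the columns whose data now lives in another table (assumed names).
    ALTER TABLE user
      DROP COLUMN signature,
      DROP COLUMN posts;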
ID: 32890

Steve Cressman
Volunteer tester
Joined: 6 Jun 02
Posts: 583
Credit: 65,644
RAC: 0
Canada
Message 32956 - Posted: 5 Oct 2004, 15:25:35 UTC

Thanx Matt. Keeping us informed is always a good thing :)




Application has reported a 'Not My Fault' in module KRNL.EXE in line 0200:103F
ID: 32956

oldlefthander
Joined: 19 Dec 01
Posts: 1
Credit: 622,589
RAC: 0
United States
Message 32960 - Posted: 5 Oct 2004, 15:44:42 UTC

Matt,

Is there a document describing the hardware configuration that you guys are using for SETI? I am just curious how you have everything set up. I appreciate the update on the database move, and I'm sure you guys are doing what you can with what you have. Keep up the good work.

Pete
ID: 32960

Matt Lebofsky
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 1 Mar 99
Posts: 1444
Credit: 957,058
RAC: 0
United States
Message 32968 - Posted: 5 Oct 2004, 16:06:04 UTC

Couple answers:

1. We should have a hardware configuration document but don't, mainly because it would take a bunch of time to spell out what we got and because it's changing week to week. The suggestion has been thrown around to maybe add a lot more text and other good stuff to the server status page - once we get the time to do so that might actually help a lot in this regard.

2. Regarding deleting columns - that's actually part of the plan. So good suggestion. But not on the user table. The real problem is the result table, which is much larger and accessed more frequently. After we move today we'll start a job that slowly purges deleted results from that table (the signals from the deleted results are already in our master database - and we'll save the deleted rows to disk in xml format just in case we need them again later).
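
(A purge job along those lines might look roughly like the sketch below. The column names and state values are guesses based on the BOINC result table, and the XML archiving would be a separate step, so this is not the literal job being run.)

    -- Delete small batches of already-handled results over and over, so the
    -- production table is never locked for long (columns/states are assumptions).
    DELETE FROM result
     WHERE server_state = 5          -- assumed: result is over / no longer needed
       AND file_delete_state = 2     -- assumed: its files have already been deleted
     LIMIT 10000;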

Okay.. everybody's here now. I'll get going on this.

- Matt
BOINC/SETI@home
ID: 32968

Matt Lebofsky
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 1 Mar 99
Posts: 1444
Credit: 957,058
RAC: 0
United States
Message 32979 - Posted: 5 Oct 2004, 18:08:41 UTC

Okay.. just swapped memory and changed replica/master pointers. Halfway there. Testing out the system now with this here post..

- Matt
BOINC/SETI@home
ID: 32979

Ulrich Metzner
Volunteer tester
Joined: 3 Jul 02
Posts: 1256
Credit: 13,565,513
RAC: 13
Germany
Message 32983 - Posted: 5 Oct 2004, 18:23:30 UTC

Hello everybody,

how come I can upload/download WUs and report them right this moment?
Don't get me wrong, I like that ;)



greetz, Uli
ID: 32983

texasfit
Joined: 11 May 03
Posts: 223
Credit: 500,626
RAC: 0
United States
Message 32986 - Posted: 5 Oct 2004, 18:31:52 UTC - in response to Message 32979.  

> Okay.. just swapped memory and changed replica/master pointers. Halfway there.
> Testing out the system now with this here post..
>
> - Matt
> BOINC/SETI@home
>

Thanks for the information, Matt.

We really do appreciate all the hard work and time that you and the team put into keeping this project running with your limited or non-existent funds.

----------



Join the Overclockers.com SETI Team!
ID: 32986

Matt Lebofsky
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 1 Mar 99
Posts: 1444
Credit: 957,058
RAC: 0
United States
Message 32987 - Posted: 5 Oct 2004, 18:37:46 UTC

Wow. Things are really chugging along on this new set of disks. This is really good. The transitioners which have been backlogged for weeks look like they will fully catch up in an hour.

Not sure how to check the validator status (I'll ask David when he gets in) but that'll probably catch up quickly too. Imagine - users getting credit when credit is due.
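
(One rough way to gauge the validation backlog straight from the database would be a count of results per validation state, something like the query below; the validate_state column and its meaning are assumptions drawn from the BOINC schema.)

    -- How many results sit in each validation state? The pending ones are the backlog.
    SELECT validate_state, COUNT(*) AS num_results
      FROM result
     GROUP BY validate_state;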

Funny thing is.. we still have high I/O waits, so imagine what we can do once we do get even faster disks!

I'll post something to the front page eventually. We're not out of the woods yet.
Still checking things out for a while..

- Matt
BOINC/SETI@home
ID: 32987

[HWU] GHz & CO. - BOINC.Italy
Volunteer tester
Joined: 1 Jul 02
Posts: 139
Credit: 1,466,611
RAC: 0
Italy
Message 32997 - Posted: 5 Oct 2004, 19:14:13 UTC - in response to Message 32987.  
Last modified: 5 Oct 2004, 19:15:36 UTC

Good news Matt! Thanks for your work and information on the server status :)

Just an observation and a question about the servers :)
BOINC needs faster hardware, and SETI classic will have to be shut down in the near future. So, wouldn't it be possible during this migration to reuse some of the SETI classic servers for BOINC? Or would that be a problem for the SETI classic server structure?

Good work.

GHz

Hardware Upgrade - Seti@home

ID: 32997

Captain Avatar
Volunteer tester
Joined: 17 May 99
Posts: 15133
Credit: 529,088
RAC: 0
United States
Message 33000 - Posted: 5 Oct 2004, 19:27:50 UTC
Last modified: 5 Oct 2004, 22:44:12 UTC

You guys do rock! How do we donate money? (Not much, but I want to help.)

Thanks for the info it really helps us as part of the team...


Timmy


ID: 33000

TPR_Mojo
Volunteer tester
Joined: 18 Apr 00
Posts: 323
Credit: 7,001,052
RAC: 0
United Kingdom
Message 33009 - Posted: 5 Oct 2004, 20:00:33 UTC

Everything seems really snappy here :)
ID: 33009