Database Shuffle (Jul 12 2012)



Message boards : Technical News : Database Shuffle (Jul 12 2012)

Matt Lebofsky
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 1 Mar 99
Posts: 1391
Credit: 74,079
RAC: 10
United States
Message 1259580 - Posted: 12 Jul 2012, 20:13:42 UTC

There has been all kinds of slow shuffling behind the scenes lately, but the bottom line is: paddym is now our new master science database server, having taken over all duties from oscar! The final switchover took place over the past few days (hence some minor workunit shortages) and had the usual expected unexpected issues slowing us down yesterday (a comment in a config file that wasn't actually acting like a comment, and some nfs issues).

What we gain using paddym is a faster system in general, with more disk spindles (which enhances read/write i/o), a much faster (and more usable) hardware RAID configuration, and most importantly a LOT more disk space to play with. We have several large database tables that were each fragmented across multiple smaller tables - now we have the extra room to merge them back together (something that several database cleaning projects have been waiting on for months). And the extra disk i/o really does help - a full database backup yesterday took about 7 hours. On oscar it usually took about 40.

So that's all good news, and thanks again to the GPU User's Group gang who helped us acquire this much needed gear! And, lest we forget, as an added bonus we now have oscar up for grabs in our server closet - it will become a wonderful compute server, among other things.

Meanwhile our mysql replica database on jocelyn has been falling behind too much lately. It's swapping, so I've been adjusting various memory configuration variables and trying to tune it up. I'm thinking this is becoming a new issue as, unlike the result and workunit tables which are constantly churning and roughly staying the same size, the user and host tables slowly grow without bounds. Maybe we're starting to see the useful portions of the database not fitting into memory on jocelyn anymore...
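For readers wondering what "memory configuration variables" means in practice, here is the kind of my.cnf fragment involved. The variable names are real MySQL options, but the values below are purely illustrative guesses, not the project's actual settings:

```ini
# Hypothetical my.cnf fragment -- illustrative values only
[mysqld]
innodb_buffer_pool_size = 16G   # main InnoDB cache; the usual first lever when a server swaps
key_buffer_size         = 512M  # MyISAM index cache
query_cache_size        = 64M   # kept small; an oversized query cache just wastes RAM
max_connections         = 200   # each connection costs additional per-thread buffers
```

The usual tuning tension is exactly what Matt describes: caches sized for the hot data (active users, recent results) start thrashing once tables that grow without bound no longer fit.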

- Matt

____________
-- BOINC/SETI@home network/web/science/development person
-- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude

Sirius B
Volunteer tester
Joined: 26 Dec 00
Posts: 11940
Credit: 1,794,803
RAC: 613
Bermuda
Message 1259581 - Posted: 12 Jul 2012, 20:15:46 UTC - in response to Message 1259580.

Thanks for the updates, Matt. Was that also the reason behind the forums' snail pace?
____________

Claggy
Project donor
Volunteer tester
Joined: 5 Jul 99
Posts: 4238
Credit: 34,917,140
RAC: 24,403
United Kingdom
Message 1259584 - Posted: 12 Jul 2012, 20:21:54 UTC - in response to Message 1259580.

Thanks for the update too, Matt,

Claggy

Gary Charpentier
Project donor
Volunteer tester
Joined: 25 Dec 00
Posts: 13157
Credit: 7,884,950
RAC: 13,726
United States
Message 1259595 - Posted: 12 Jul 2012, 21:03:59 UTC

Thanks for the update, Matt.

____________

Richard Haselgrove
Project donor
Volunteer tester
Joined: 4 Jul 99
Posts: 8801
Credit: 53,363,730
RAC: 41,781
United Kingdom
Message 1260035 - Posted: 13 Jul 2012, 22:54:01 UTC - in response to Message 1259580.

Matt wrote:
Meanwhile our mysql replica database on jocelyn has been falling behind too much lately. It's swapping, so I've been adjusting various memory configuration variables and trying to tune it up. I'm thinking this is becoming a new issue as, unlike the result and workunit tables which are constantly churning and roughly staying the same size, the user and host tables slowly grow without bounds. Maybe we're starting to see the useful portions of the database not fitting into memory on jocelyn anymore...

- Matt

According to BOINCstats, only ~150,000 out of ~1,300,000 users (11.7%), and ~225,000 out of ~3,200,000 hosts (7%) are what Willy calls 'active'.

Would I be right in thinking that the only reason for loading the other ~88.3%/~93% of those tables into memory would be to generate the daily stats dump?

I can't think of a way of 'archiving' the inactive parts of the tables while still generating a single, complete version of the dump files for the stats sites to import, but a workaround might be:

* stop replication process
* generate stats dump, overcommitting memory in the process
* shut down database, flushing from memory
* restart database
* restart replication process

Could the cron job handle that?
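Richard's five steps could indeed be driven from a cron job. A minimal sketch, in dry-run form: the host name "jocelyn", the db_dump arguments, and the init script path are all assumptions for illustration, not the project's real setup.

```python
# Sketch of the stop-replica / dump / bounce / resume sequence as a cron job.
# Host name "jocelyn", db_dump's arguments, and the init script path are
# hypothetical; set dry_run=False only in a real deployment.
import subprocess

STEPS = [
    ["mysql", "-h", "jocelyn", "-e", "STOP SLAVE;"],   # 1. stop replication
    ["db_dump", "-d", "2"],                            # 2. generate the stats dump
    ["/etc/init.d/mysql", "stop"],                     # 3. shut down, flushing memory
    ["/etc/init.d/mysql", "start"],                    # 4. restart with a cold cache
    ["mysql", "-h", "jocelyn", "-e", "START SLAVE;"],  # 5. resume replication
]

def run_shuffle(dry_run=True):
    """Run the steps in order; with dry_run, just report what would run."""
    executed = []
    for cmd in STEPS:
        executed.append(" ".join(cmd))
        if not dry_run:
            subprocess.check_call(cmd)  # stop on the first failing step
    return executed

if __name__ == "__main__":
    for line in run_shuffle(dry_run=True):
        print(line)
```

Stopping on the first failure matters here: if the dump step dies, the script should not blindly bounce the database underneath it.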

John McLeod VII
Volunteer developer
Volunteer tester
Joined: 15 Jul 99
Posts: 24806
Credit: 529,645
RAC: 346
United States
Message 1260105 - Posted: 14 Jul 2012, 1:59:53 UTC - in response to Message 1260035.

Matt wrote:
Meanwhile our mysql replica database on jocelyn has been falling behind too much lately. It's swapping, so I've been adjusting various memory configuration variables and trying to tune it up. I'm thinking this is becoming a new issue as, unlike the result and workunit tables which are constantly churning and roughly staying the same size, the user and host tables slowly grow without bounds. Maybe we're starting to see the useful portions of the database not fitting into memory on jocelyn anymore...

- Matt

Richard wrote:
According to BOINCstats, only ~150,000 out of ~1,300,000 users (11.7%), and ~225,000 out of ~3,200,000 hosts (7%) are what Willy calls 'active'.

Would I be right in thinking that the only reason for loading the other ~88.3%/~93% of those tables into memory would be to generate the daily stats dump?

I can't think of a way of 'archiving' the inactive parts of the tables while still generating a single, complete version of the dump files for the stats sites to import, but a workaround might be:

* stop replication process
* generate stats dump, overcommitting memory in the process
* shut down database, flushing from memory
* restart database
* restart replication process

Could the cron job handle that?

I can see a way of doing it if the XML is formatted correctly.

1) Save the current timestamp in a temporary variable.
2) Query the db for any users that have had a change in credit since the last stats job ran (this query could be major fun). Save the results in a temporary table.
3) For each user in the XML stats, query the temporary table. If the user is found there, update the credit. If not, they have not had a credit change.
4) Mark the user in the temporary table as read.
5) After the pass through the XML, query the temporary table for any unmarked rows (this picks up new users). Append the credit for these users to the end of the XML.
6) Copy the saved timestamp into the location used by the next credit run's query.
7) Delete the temporary variable and the temporary table.
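As a toy model of those seven steps: the dump is represented below as an ordered {user_id: credit} mapping rather than real XML, and the "temporary table" as a plain dict. A sketch of the merge logic only, not working project code.

```python
def incremental_dump(prev_dump, changed):
    """Merge credit changes into the previous stats dump.

    prev_dump: {user_id: credit} from the previous dump (stands in for the XML).
    changed:   {user_id: new_credit} for users whose credit changed since the
               saved timestamp (the step-2 query).
    """
    pending = dict(changed)                       # the temporary table
    new_dump = {}
    for uid, credit in prev_dump.items():         # step 3: pass through the "XML"
        new_dump[uid] = pending.pop(uid, credit)  # update if changed; pop = "mark as read"
    new_dump.update(pending)                      # step 5: append brand-new users
    return new_dump

# Example: user 2's credit changed, user 3 is brand new.
print(incremental_dump({1: 10.0, 2: 20.0}, {2: 25.0, 3: 5.0}))
# {1: 10.0, 2: 25.0, 3: 5.0}
```

The key property is that only the changed-users query touches the database; users with no credit change are copied straight through without ever being looked up.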
____________


BOINC WIKI

Juha
Volunteer tester
Joined: 7 Mar 04
Posts: 191
Credit: 143,947
RAC: 47
Finland
Message 1261941 - Posted: 18 Jul 2012, 18:58:29 UTC - in response to Message 1259580.

Matt wrote:
Meanwhile our mysql replica database on jocelyn has been falling behind too much lately. It's swapping, so I've been adjusting various memory configuration variables and trying to tune it up. I'm thinking this is becoming a new issue as, unlike the result and workunit tables which are constantly churning and roughly staying the same size, the user and host tables slowly grow without bounds. Maybe we're starting to see the useful portions of the database not fitting into memory on jocelyn anymore...

It seems you've gotten jocelyn under control but I'll post this anyway.

I've understood that we read the forums from the replica. As you probably know, they have been a bit slow to load every now and then lately. You did post about some bots causing trouble, but I imagine swapping isn't helping either.

Richard suggested that it might be the stats dump that's causing the swapping. Since the forums have been slow at any time of day I'm thinking it might be something else (too).

Like search engines scanning retired hosts and long-gone users, not to mention the result and workunit tables. After taking a peek at your robots.txt it looks like you have all the holes nicely locked out. Except that Google sneaks in through /sah/ and /beta// .

On an almost related note: while playing with Google I tried to find the last remaining Astropulse v505 workunits. Google still has about 100 of those in its index, but all the links to the workunits give "can't find workunit". So are the numbers on the Server status page correct?

Richard Haselgrove
Project donor
Volunteer tester
Joined: 4 Jul 99
Posts: 8801
Credit: 53,363,730
RAC: 41,781
United Kingdom
Message 1261964 - Posted: 18 Jul 2012, 20:22:34 UTC - in response to Message 1261941.

Juha wrote:
I've understood that we read the forums from the replica.

We're usually able to post to, and read back from, the forum in real time even when the replica is over a day behind the master, or completely disabled. Forum posters see up-to-date stats below their name, and the same for other contributors to the discussion. Unless there's a two-way replication going on (with new forum posts going first into the 'replica' database, and being replicated into the 'master' database from there), I don't see how we can have a single authoritative database to be backed up, and then restored in a crisis. No, I think this must be an urban myth.

What has certainly been the case in previous 'behind' events is that the task listings have been delayed by a similar amount - I think confirming that task lists are (were? I haven't noticed it this time) served from the replica. That makes more sense - we don't have any way of interacting with the task records; they're read-only as far as the user website is concerned. But - and I'm in danger of creating my own urban myth here - I believe the admins have fairly easy tools for switching the task list display between master and replica databases, so that service can be maintained when the replica is completely offline. I think that might even be the normal running mode since the more powerful 'carolyn' took over master database duties.

Link
Joined: 18 Sep 03
Posts: 840
Credit: 1,578,051
RAC: 55
Germany
Message 1261973 - Posted: 18 Jul 2012, 20:41:00 UTC
Last modified: 18 Jul 2012, 20:44:44 UTC

According to the server status page Oscar is now the replica db and can keep up with the master so far (which is not surprising, since it's just as powerful as Carolyn). So probably we don't need any workarounds for generating the stats anymore.
____________

Juha
Volunteer tester
Joined: 7 Mar 04
Posts: 191
Credit: 143,947
RAC: 47
Finland
Message 1261981 - Posted: 18 Jul 2012, 21:08:17 UTC - in response to Message 1261964.
Last modified: 18 Jul 2012, 21:09:27 UTC

Richard wrote:
I've understood that we read forums from replica.

We're usually able to post to, and read back from, the forum in real time even when the replica is over a day behind the master, or completely disabled.

Too many parts to remember how everything is wired...

Yeah, I wasn't thinking about what I was saying and got the forums and the task listing mixed up.

Link wrote:
According to the server status page Oscar is now the replica db and can keep up with the master so far (which is not surprising, since it's just as powerful as Carolyn). So probably we don't need any workarounds for generating the stats anymore.

Before I posted I checked how jocelyn was doing but completely missed that they had changed servers.

Well anyway, Google has access to the ever-changing workunit/result list. Whether or not the servers can handle it, it's not very useful to index something that's gone in a few days.

Link
Joined: 18 Sep 03
Posts: 840
Credit: 1,578,051
RAC: 55
Germany
Message 1261995 - Posted: 18 Jul 2012, 21:28:34 UTC - in response to Message 1261981.

Juha wrote:
Well anyway, Google has access to the ever-changing workunit/result list. Whether or not the servers can handle it, it's not very useful to index something that's gone in a few days.

According to robots.txt, results (and many other things) should not be indexed. I'd assume that Google (and most other search engines) follows that.
____________

Juha
Volunteer tester
Joined: 7 Mar 04
Posts: 191
Credit: 143,947
RAC: 47
Finland
Message 1262004 - Posted: 18 Jul 2012, 21:38:58 UTC - in response to Message 1261995.

Link wrote:
According to robots.txt, results (and many other things) should not be indexed. I'd assume that Google (and most other search engines) follows that.

Yes, except that somehow, somewhere Google found a link that lets it access all the things listed in robots.txt. If you don't believe me try searching for some workunits.

I didn't want to write the links out in full because I think that would be an invitation for all the other search engines. I thought that writing /sah/ and /beta// would be obvious enough, but maybe I was wrong.

Link
Joined: 18 Sep 03
Posts: 840
Credit: 1,578,051
RAC: 55
Germany
Message 1262018 - Posted: 18 Jul 2012, 22:10:52 UTC - in response to Message 1262004.

Hmm... the issue seems to be that all those things can also be accessed over "/sah//" and "/sah/", which are not excluded... so basically the robots.txt is wrong: either the entire /sah tree should be disallowed, or its subdirectories, like for beta.
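The loophole Link describes follows from robots.txt rules being simple path prefixes, which is easy to demonstrate with Python's stdlib parser. The Disallow rule and URLs below are hypothetical stand-ins, not SETI@home's actual file:

```python
# Demonstrate how a prefix-based Disallow rule misses an aliased path.
import urllib.robotparser

rp = urllib.robotparser.RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /workunit.php",   # hypothetical rule blocking the root path only
])

# The listed path is blocked...
print(rp.can_fetch("*", "http://example.com/workunit.php"))      # False
# ...but the same script reached through an un-listed alias is not.
print(rp.can_fetch("*", "http://example.com/sah/workunit.php"))  # True
```

Since "/sah/workunit.php" does not start with the prefix "/workunit.php", a well-behaved crawler is free to fetch it; the fix is exactly what Link suggests, disallowing the whole tree or every alias.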
____________

Swibby Bear
Joined: 1 Aug 01
Posts: 236
Credit: 7,276,504
RAC: 1
United States
Message 1262082 - Posted: 19 Jul 2012, 2:21:41 UTC - in response to Message 1259580.

Matt wrote:
And lest we forget as an added bonus we now have Oscar up for grabs in our server closet...

Maybe we're starting to see the useful portions of the database not fitting into memory on Jocelyn anymore...


Well, that didn't take long! Oscar replacing Jocelyn was sure a fast way to get more memory and horsepower onto the secondary database. Is the Oscar move expected to be permanent, or can you add more RAM to Jocelyn and return her to the same old duties? At first blush, Oscar seems over-powered for that job.

You all are stars in our eyes!

Whit


Copyright © 2014 University of California