Database Shuffle (Jul 12 2012)

Author	Message
Matt Lebofsky Volunteer moderator Project administrator Project developer Project scientist Send message Joined: 1 Mar 99 Posts: 1444 Credit: 957,058 RAC: 0	Message 1259580 - Posted: 12 Jul 2012, 20:13:42 UTC There has been all kinds of slow shuffling behind the scenes lately, but the bottom line is: paddym is now our new master science database server, having taken over all duties from oscar! The final switchover process was over the past few days (hence some minor workunit shortages) and had the usual expected unexpected issues slowing us down yesterday (a comment in a config file that actually wasn't acting like a comment, and some nfs issues). What we gain using paddym is a faster system in general, with more disk spindles (which enhances read/write i/o), a much faster (and more usable) hardware RAID configuration, and most importantly a LOT more disk space to play with. We have several database tables that are actually fragmented over several tables - now we have the extra room to merge these tables together again (something that several database cleaning projects have been waiting on for months). And, the extra disk i/o seems to help - a full database backup yesterday took about 7 hours. On oscar it usually took about 40. So that's all good news, and thanks again to the GPU User's Group gang who helped us acquire this much needed gear! And lest we forget as an added bonus we now have oscar up for grabs in our server closest - it will become a wonderful compute server, among other things. Meanwhile our mysql replica database on jocelyn has been falling behind too much lately. It's swapping, so I've been adjusting various memory configuration variables and trying to tune it up. I'm thinking this is becoming a new issue as, unlike the result and workunit tables which are constanly churning and roughly staying the same size, the user and host tables slowly grow without bounds. Maybe we're starting to see the useful portions of the database not fitting into memory on jocelyn anymore... - Matt -- BOINC/SETI@home network/web/science/development person -- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude ID: 1259580 ·

Sirius B Volunteer tester Send message Joined: 26 Dec 00 Posts: 24891 Credit: 3,081,182 RAC: 7	Message 1259581 - Posted: 12 Jul 2012, 20:15:46 UTC - in response to Message 1259580. Thanks for the updates Matt. Was that also the reason behind the forums snail pace? ID: 1259581 ·

Claggy Volunteer tester Send message Joined: 5 Jul 99 Posts: 4654 Credit: 47,537,079 RAC: 4	Message 1259584 - Posted: 12 Jul 2012, 20:21:54 UTC - in response to Message 1259580. Thanks for the update too Matt, Claggy ID: 1259584 ·

Gary Charpentier Volunteer tester Send message Joined: 25 Dec 00 Posts: 30812 Credit: 53,134,872 RAC: 32	Message 1259595 - Posted: 12 Jul 2012, 21:03:59 UTC Thanks for the update news Matt. ID: 1259595 ·

Richard Haselgrove Volunteer tester Send message Joined: 4 Jul 99 Posts: 14661 Credit: 200,643,578 RAC: 874	Message 1260035 - Posted: 13 Jul 2012, 22:54:01 UTC - in response to Message 1259580. Meanwhile our mysql replica database on jocelyn has been falling behind too much lately. It's swapping, so I've been adjusting various memory configuration variables and trying to tune it up. I'm thinking this is becoming a new issue as, unlike the result and workunit tables which are constanly churning and roughly staying the same size, the user and host tables slowly grow without bounds. Maybe we're starting to see the useful portions of the database not fitting into memory on jocelyn anymore... - Matt According to BOINCstats, only ~150,000 out of ~1,300,000 users (11.7%), and ~225,000 out of ~3,200,000 hosts (7%) are what Willy calls 'active'. Would I be right in thinking that the only reason for loading the other ~88.3%/~93% of those tables into memory would be to generate the daily stats dump? I can't think of a way of 'archiving' the inactive parts of the tables, but still generating a single, complete, version of the dump files for the stats sites to import, but a workround might be: * stop replication process * generate stats dump, overcommitting memory in the process * shut down database, flushing from memory * restart database * restart replication process Could the cron job handle that? ID: 1260035 ·

John McLeod VII Volunteer developer Volunteer tester Send message Joined: 15 Jul 99 Posts: 24806 Credit: 790,712 RAC: 0	Message 1260105 - Posted: 14 Jul 2012, 1:59:53 UTC - in response to Message 1260035. Meanwhile our mysql replica database on jocelyn has been falling behind too much lately. It's swapping, so I've been adjusting various memory configuration variables and trying to tune it up. I'm thinking this is becoming a new issue as, unlike the result and workunit tables which are constanly churning and roughly staying the same size, the user and host tables slowly grow without bounds. Maybe we're starting to see the useful portions of the database not fitting into memory on jocelyn anymore... - Matt According to BOINCstats, only ~150,000 out of ~1,300,000 users (11.7%), and ~225,000 out of ~3,200,000 hosts (7%) are what Willy calls 'active'. Would I be right in thinking that the only reason for loading the other ~88.3%/~93% of those tables into memory would be to generate the daily stats dump? I can't think of a way of 'archiving' the inactive parts of the tables, but still generating a single, complete, version of the dump files for the stats sites to import, but a workround might be: * stop replication process * generate stats dump, overcommitting memory in the process * shut down database, flushing from memory * restart database * restart replication process Could the cron job handle that? I can see a way of doing it if the XML is formatted correctly. 1) Save the timestamp of now in a temporary. 2) query the db for any users that have had a change in credit since the last stats job was run. (This query could be major fun). Save this temporary table. 3) For each user in the XML stats query the temporary table. If they are found in the temporary table, update the credit. If not, they have not had a credit change. 4) Mark the user in the temporary table as read. 5) After the pass through the XML, query the temporary table for any unmarked rows (this is to pick up new users). Append the credit for these users to the end of the XML. 6) Copy the temporary value where the time stamp is stored into the location for the query for the next credit run. 7) Delete the temporary and the temporary table. BOINC WIKI ID: 1260105 ·

Juha Volunteer tester Send message Joined: 7 Mar 04 Posts: 388 Credit: 1,857,738 RAC: 0	Message 1261941 - Posted: 18 Jul 2012, 18:58:29 UTC - in response to Message 1259580. Meanwhile our mysql replica database on jocelyn has been falling behind too much lately. It's swapping, so I've been adjusting various memory configuration variables and trying to tune it up. I'm thinking this is becoming a new issue as, unlike the result and workunit tables which are constanly churning and roughly staying the same size, the user and host tables slowly grow without bounds. Maybe we're starting to see the useful portions of the database not fitting into memory on jocelyn anymore... It seems you've gotten jocelyn under control but I'll post this anyway. I've understood that we read forums from replica. As you probably know they have been a bit slow to load every now and then lately. You did post about some bots causing trouble but I imagine swapping isn't helping either. Richard suggested that it might be the stats dump that's causing the swapping. Since the forums have been slow at any time of day I'm thinking it might be something else (too). Like search engines scanning retired hosts and long gone users not to mention result and workunit tables. After taking a peek into your robots.txt it looks like you have all the holes nicely locked out. Except that Google sneaks in through /sah/ and /beta// . On an almost related note: While playing with Google I tried to find the last remaining Astropulse v505 workunits. Google still has about 100 of those in its index but all the links to the workunits give "can't find workunit". So are the numbers in Server status page correct? ID: 1261941 ·

Richard Haselgrove Volunteer tester Send message Joined: 4 Jul 99 Posts: 14661 Credit: 200,643,578 RAC: 874	Message 1261964 - Posted: 18 Jul 2012, 20:22:34 UTC - in response to Message 1261941. I've understood that we read forums from replica. We're usually able to post to, and read back from, the forum in real time even when the replica is over a day behind the master, or completely disabled. Forum posters see up-to-date stats below their name, and the same for other contributors to the discussion. Unless there's a two-way replication going on (with new forum posts going first into the 'replica' database, and being replicated into the 'master' database from there), I don't see how we can have a single authoritative database to be backed up, and then restored in a crisis. No, I think this must be an urban myth. What has certainly been the case in previous 'behind' events is the task listings have been delayed by a similar amount - I think confirming that task lists are (were? I haven't noticed it this time) served from the replica. That makes more sense - we don't have any way of interacting with the task records, they're read-only as far as the user website is concerned. But - and I'm in danger of creating my own urban myth here - I believe the admins have fairly easy tools for switching the task list display between master and replica databases, so that service can be maintained when the replica is completely offline. I think that might even be the normal running mode since the more powerful 'carolyn' took over master database duties. ID: 1261964 ·

Link Send message Joined: 18 Sep 03 Posts: 834 Credit: 1,807,369 RAC: 0	Message 1261973 - Posted: 18 Jul 2012, 20:41:00 UTC Last modified: 18 Jul 2012, 20:44:44 UTC According to the server status page Oscar is now the replica db and can keep up with the master so far (which is not surprising, since it's same powerful as Carolyn). So probably we don't need any workarounds for generating the stats anymore. ID: 1261973 ·

Juha Volunteer tester Send message Joined: 7 Mar 04 Posts: 388 Credit: 1,857,738 RAC: 0	Message 1261981 - Posted: 18 Jul 2012, 21:08:17 UTC - in response to Message 1261964. Last modified: 18 Jul 2012, 21:09:27 UTC Richard wrote: I've understood that we read forums from replica. We're usually able to post to, and read back from, the forum in real time even when the replica is over a day behind the master, or completely disabled. Too many parts to remember how everything is wired... Yeah I didn't think what I was saying and got the forums and task listing mixed up. Link wrote: According to the server status page Oscar is now the replica db and can keep up with the master so far (which is not surprising, since it's same powerful as Carolyn). So probably we don't need any workarounds for generating the stats anymore. Before I posted I checked how jocelyn was doing but completely missed that they had changed servers. Well anyway, Google has access to the ever-changing workunit/result list. Whether or not the servers can handle it it's not very useful to index something that's gone in a few days. ID: 1261981 ·

Link Send message Joined: 18 Sep 03 Posts: 834 Credit: 1,807,369 RAC: 0	Message 1261995 - Posted: 18 Jul 2012, 21:28:34 UTC - in response to Message 1261981. Well anyway, Google has access to the ever-changing workunit/result list. Whether or not the servers can handle it it's not very useful to index something that's gone in a few days. According to robots.txt results (and many other things) should not be indexed. I'd assume that google (and most other search engines) follows that. ID: 1261995 ·

Juha Volunteer tester Send message Joined: 7 Mar 04 Posts: 388 Credit: 1,857,738 RAC: 0	Message 1262004 - Posted: 18 Jul 2012, 21:38:58 UTC - in response to Message 1261995. According to robots.txt results (and many other things) should not be indexed. I'd assume that google (and most other search engines) follows that. Yes, except that somehow, somewhere Google found a link that lets it access all the things listed in robots.txt. If you don't believe me try searching for some workunits. I didn't want to write down the links in full because I think that would be an invitation for all the other search engines. I thought that writing /sah/ and /beta// would be obvious enough but maybe I was wrong. ID: 1262004 ·

Link Send message Joined: 18 Sep 03 Posts: 834 Credit: 1,807,369 RAC: 0	Message 1262018 - Posted: 18 Jul 2012, 22:10:52 UTC - in response to Message 1262004. Hmm... the issue seems to be, that all those things can also be accessed over "/sah//" and "/sah/", which are not excluded... so basically the robots.txt is wrong, either the entire /sah should be disallowed or subdirectories of it like for beta. ID: 1262018 ·

Swibby Bear Send message Joined: 1 Aug 01 Posts: 246 Credit: 7,945,093 RAC: 0	Message 1262082 - Posted: 19 Jul 2012, 2:21:41 UTC - in response to Message 1259580. And lest we forget as an added bonus we now have Oscar up for grabs in our server closet... Maybe we're starting to see the useful portions of the database not fitting into memory on Jocelyn anymore... Well, that didn't take long! Oscar replacing Jocelyn was sure a fast way to get more memory and horsepower onto the secondary database. Is the Oscar move expected to be permanent, or can you add more ram to Jocelyn and return her to the same old duties? At first blush, Oscar seems over-powered for that job. You all are stars in our eyes! Whit ID: 1262082 ·

©2024 University of California

SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.