Lesson Today: Databases (Apr 29 2008)

Matt Lebofsky
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 1 Mar 99
Posts: 1444
Credit: 957,058
RAC: 0
United States
Message 745857 - Posted: 29 Apr 2008, 22:08:03 UTC

During today's outage, Jeff and I did yet more reorganization of room 329, culminating in finally, for the first time ever, putting sidious in a rack. This was a major step in filling this particular rack, which will hopefully replace one of the three racks in the closet sooner rather than later. We also took the steps to rebuild the replica database, which is happening in the background now. It may complete tonight or tomorrow; after that it will "catch up" quickly and we'll be back in business on that front.

Clarifying the bottleneck I mentioned yesterday: this is strictly due to our current data processing rate. Drives with raw data come in, and we always archive them to off-site storage as well as copy them into our processing directory (where the splitters read them to make workunits). In a perfect world we'd be processing data as fast as we archive it, but to do so would require a lot more active users. So our 8-terabyte processing directory frequently fills up with unsplit data, and everything logjams. This isn't a database bottleneck - it's a data bottleneck. More people/computers are the solution.

Still, people asked for more info about the quality/quantity of database throughput. Here's a short essay about that. It's by no means complete, but it's a good start.

We have two databases: the mysql database, which is BOINC-specific (running on jocelyn, replicated on sidious - we call it the "BOINC" database), and the informix database, which is SETI-specific (running on thumper, replicated on bambi - we call it the "science" database).

The science database, while very very large (billions of rows), is not a problem under normal conditions, even as we insert over a million new rows every day. This is because inserts generally land at the ends of tables, so it's pretty much all sequential writes. With the introduction of actual scientific data analysis, though, come large numbers of random-access reads. Earlier this year, tests using the NTPCkr (our software to do such analysis) showed this will be a problem, so we spent a couple of months reconfiguring the science database server/RAID systems to optimize random-access performance. We seem to be in the clear for now as we continue NTPCkr testing.
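
To illustrate the two access patterns (a sketch only - the table and column names here are made up for illustration, and the real science database speaks Informix SQL):

  -- Inserting new signals appends at the end of the table:
  -- cheap, sequential writes.
  INSERT INTO spike (time, ra, decl, power)
    VALUES (2454585.5, 3.1047, 25.02, 1.73);

  -- Analysis gathers all signals near one sky position, which
  -- means random reads scattered across billions of rows.
  SELECT time, power FROM spike
    WHERE ra BETWEEN 3.10 AND 3.11 AND decl BETWEEN 24.9 AND 25.1;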

The BOINC database is largely where problems arise, partly because it's our public-facing database, i.e. users notice quickly when it isn't working. It contains all the data pertaining to user stats, the web site, result/workunit flow, and the whole BOINC backend state machine. On average it gets about 600 queries per second, peaking at well over 2000 per second (like now, as we recover from today's outage). Thanks to many years of accumulated expertise in forming proper queries and creating proper indexes, 99% of these queries are super duper fast. But there are still unavoidable issues.
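
For a flavor of the fast 99% (a sketch - the column names come from the public BOINC schema, but the literal values are made up):

  -- Point lookup on the primary key: effectively free.
  SELECT * FROM result WHERE id = 123456789;

  -- Secondary-index lookup, e.g. one user's hosts for a web page:
  SELECT id, domain_name FROM host WHERE userid = 42;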

The lifetime of a particular workunit and its constituent results is long: they are created, sit on disk waiting to be sent, hang out in the database while users process them, succumb to the whole validation/assimilation/deletion cycle, and finally get purged after a 24-hour grace period (so users can still see finished results up on the web for some time after completion).

Due to this lifetime, at any given point we have roughly 3 million workunits and 6 million results in the BOINC database. This is all important data, but it's mostly metadata - the scientific stuff is contained in larger files on disk. So even with these large tables, plus the user/host tables and the forum/post/thread tables, all the commonly accessed parts of the database fit into memory cache when it's all "tightly packed."

We create upwards of a million workunits/results a day in this database, which means the tables would quickly grow too large to be useful. That's why we purge (i.e. delete) them when they are finished - the useful data has been assimilated into the science database by that point anyhow. But deletion isn't sequential - it's random, as results don't return in sequential order. When rows are deleted from a mysql table, the space isn't freed until ALL the rows in a given database page are deleted - something that isn't likely when deletion happens in random order. So even though row counts remain roughly constant on these two tables, the tables bloat to about twice their size on disk by week's end, and the mysql memory cache takes a major hit. This is why we have a weekly outage to, among other things, compress the tables (or "repack" them).
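
In mysql terms, the weekly "repack" boils down to something like this (a sketch - I'm not reproducing our actual maintenance scripts):

  -- Compare on-disk size (Data_length) to row count to see the bloat.
  SHOW TABLE STATUS LIKE 'result';

  -- Rebuild the tables to reclaim the space left by deleted rows.
  OPTIMIZE TABLE workunit, result;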

Meanwhile, there are unavoidable daily long queries, for example the user/host/team stats dumps. Dumping all this data means reading whole tables into memory (not just the pertinent rows/fields), and queries like this temporarily choke the memory cache. Indexes won't help - we're reading in everything no matter what.
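
A stats dump boils down to this kind of query (a sketch, with field names from the public BOINC user table):

  -- No WHERE clause, so no index can help: every row gets read,
  -- pushing hotter pages out of the memory cache as it goes.
  SELECT id, name, total_credit, expavg_credit FROM user;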

Also meanwhile, I haven't mentioned the "credited_job" table, which is actually the largest table in the BOINC database. We're still just inserting into it (harmless sequential writes), but I'm afraid this is a disaster waiting to happen once we start actually reading from it.
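
The kind of read I'd worry about would look something like this (hypothetical - assuming, as in the public BOINC schema, that credited_job stores (userid, workunitid) pairs):

  -- A non-unique lookup over a table this large means scattered
  -- random page reads unless an index happens to cover the query.
  SELECT userid FROM credited_job WHERE workunitid = 123456;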

Bottom line, the BOINC/mysql database is usually fine as of now. It beautifully handles a stunning variety of queries from several public servers and a rather busy backend - a perfect open source solution that folds nicely into the general BOINC philosophy (keep it standard and free). SETI@home is rather large compared to other BOINC projects, so we've had to put a lot more TLC into maintaining our mysql servers, and we pass our improvements on to the general BOINC community.

- Matt
-- BOINC/SETI@home network/web/science/development person
-- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude
Dr. C.E.T.I.
Joined: 29 Feb 00
Posts: 16019
Credit: 794,685
RAC: 0
United States
Message 745866 - Posted: 29 Apr 2008, 22:21:54 UTC


. . . breaking it all down to terms that can be understood by individual users - nice work, Matt - thanks for the explanation, sir

. . . and that thanks goes out to the rest of the crew @ Berkeley, as usual

BOINC Wiki . . .

Science Status Page . . .
Nicolas
Joined: 30 Mar 05
Posts: 161
Credit: 12,985
RAC: 0
Argentina
Message 745877 - Posted: 29 Apr 2008, 22:49:57 UTC - in response to Message 745857.  

Bottom line, the BOINC/mysql database is usually fine as of now. It beautifully handles a stunning variety of queries from several public servers and a rather busy backend.

So when people say "we can't add that feature because it would need more DB queries and SETI@Home's DB would explode if it takes a single query per second more than now"... it's actually just a myth?


Contribute to the Wiki!
Matt Lebofsky
Message 745887 - Posted: 29 Apr 2008, 23:12:45 UTC - in response to Message 745877.  

So when people say "we can't add that feature because it would need more DB queries and SETI@Home's DB would explode if it takes a single query per second more than now"... it's actually just a myth?


That statement is an oversimplification - it really depends on the query. For example, we can handle a ton more web page queries; indexed lookups are basically zero load. But when we turn on the "resend lost results" query, it blows everything up. Random-access joins, even with indexes, can be quite painful. This holds for any kind of database, not just mysql.
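
To make the contrast concrete (a sketch, not the actual scheduler code - server_state 4 means "in progress" in the public BOINC schema):

  -- Cheap: unique index lookup, basically zero load.
  SELECT * FROM host WHERE id = 42;

  -- Painful at scale: walks a secondary index, then fetches
  -- scattered rows - once per scheduler request, for every host.
  SELECT id, name FROM result
    WHERE hostid = 42 AND server_state = 4;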

- Matt
-- BOINC/SETI@home network/web/science/development person
-- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude
Nicolas
Message 745897 - Posted: 29 Apr 2008, 23:52:17 UTC - in response to Message 745887.  
Last modified: 29 Apr 2008, 23:53:48 UTC

That statement is an oversimplification - it really depends on the query. For example, we can handle a ton more web page queries; indexed lookups are basically zero load. But when we turn on the "resend lost results" query, it blows everything up. Random-access joins, even with indexes, can be quite painful. This holds for any kind of database, not just mysql.

Task count on the host list was recently removed, with DB load cited as the reason. Was that query actually causing too much load?

(do not try to make a link between my oversimplified statement in the previous post and the task count thing; they're different issues)

EDIT: and yes, I admit that statement was a huge oversimplification :)

Contribute to the Wiki!
DJStarfox
Joined: 23 May 01
Posts: 1066
Credit: 1,226,053
RAC: 2
United States
Message 745954 - Posted: 30 Apr 2008, 3:03:24 UTC - in response to Message 745887.  
Last modified: 30 Apr 2008, 3:04:52 UTC

That statement is an oversimplification - it really depends on the query. For example, we can handle a ton more web page queries; indexed lookups are basically zero load. But when we turn on the "resend lost results" query, it blows everything up. Random-access joins, even with indexes, can be quite painful. This holds for any kind of database, not just mysql.

- Matt


My experience says that's true, unless the joins are unique index lookups; then the overhead is very small. Any kind of count() or non-unique lookup will slow a query down significantly.
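
For example (hypothetical values, with column names from the public BOINC schema):

  -- Unique-index join: each outer row touches exactly one inner row.
  SELECT r.id, h.domain_name
    FROM result r JOIN host h ON h.id = r.hostid
    WHERE r.id = 123456789;

  -- count() over a non-unique key: walks every matching index entry.
  SELECT COUNT(*) FROM result WHERE hostid = 42;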

Honestly, I never gave mysql much credit until I heard about what it deals with every day at SETI. You cited statistics ("...peaking at well over 2000 per second...") - an intense load for any OLTP database - showing mysql's capability and performance. It's enough to make one stand up and take notice.
PhonAcq
Joined: 14 Apr 01
Posts: 1656
Credit: 30,658,217
RAC: 1
United States
Message 746114 - Posted: 30 Apr 2008, 15:25:01 UTC
Last modified: 30 Apr 2008, 15:38:41 UTC

There always seems to be a lot of discussion about, and energy spent on, the replica database. What is it, and what purpose does it serve? (I had always assumed that the database being replicated was also being archived from time to time - somewhere.) -Thx

(By the way, Matt, you did an awesome job in your post this morning. I don't usually compliment you every day as some do, but you really deserve it for today's effort. Your next job could be as a science writer for the NYT!)
Matt Lebofsky
Message 746124 - Posted: 30 Apr 2008, 16:08:24 UTC - in response to Message 745897.  

Some replies...

Task count on the host list was recently removed, with DB load cited as the reason. Was that query actually causing too much load?


What feature/query are you referring to exactly? Bear in mind there are lots of BOINC features that come and go without my knowledge as I'm mostly focused on SETI stuff and not user interface stuff.

My experience says that's true, unless the joins are unique index lookups. Then, it's very small overhead.


Right... and I think this is the case sometimes. My experience is that indexes aren't always helpful (depending on memory usage, disk i/o, whether the binary tree is too flat to be of much use, etc.). We try to keep the number of indexes to a minimum anyway.
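
One way to check whether mysql will actually use an index for a given query is plain EXPLAIN (standard mysql, nothing SETI-specific):

  -- "type: ref" with a sane rows estimate means the index is helping;
  -- "type: ALL" means a full table scan despite the index.
  EXPLAIN SELECT COUNT(*) FROM result WHERE hostid = 42;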

There always seems to be a lot of discussion about and energy spent on the replica database. What is it and what purpose does it provide?


The replica serves at least two purposes. First, it's an immediate hot backup (if the primary fails, we can switch over to using the replica). Second, we can offload some of the big queries to the replica (as long as they don't update anything) to help reduce the strain on the master. In theory we could also use it to vastly reduce our Tuesday outage time by swapping between the two every week, but that was proving to be too much trouble, and we like the regular outage anyway for doing hardware-related stuff.
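
On the mysql side, checking how far the replica has fallen behind is a single statement (standard mysql replication, nothing SETI-specific):

  -- The Seconds_Behind_Master field shows how much the replica
  -- still has to "catch up" after a rebuild like this week's.
  SHOW SLAVE STATUS;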

- Matt
-- BOINC/SETI@home network/web/science/development person
-- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude
Nicolas
Message 746130 - Posted: 30 Apr 2008, 16:25:16 UTC - in response to Message 746124.  

Task count on the host list was recently removed, with DB load cited as the reason. Was that query actually causing too much load?


What feature/query are you referring to exactly? Bear in mind there are lots of BOINC features that come and go without my knowledge as I'm mostly focused on SETI stuff and not user interface stuff.

http://setiathome.berkeley.edu/hosts_user.php
It used to have a "Tasks" column, linking to the task list and also saying how many tasks the host had. The column was completely removed. After user complaints the link was added back, but the task count wasn't, again with DB load cited as the reason.

OK, the query used was probably something like SELECT COUNT(*) FROM result WHERE hostid=X, one query per host... which probably sucks. It might be faster as a single query: SELECT hostid, COUNT(*) FROM result WHERE userid=X GROUP BY hostid. And it would definitely be faster if the task count were cached in the host table (although that would mean changing the backend to keep the field updated). But still... is it *that* bad?
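
A sketch of the cached-count idea (hypothetical - the host table has no such column today, and the backend would have to maintain it):

  -- One-time: add the cache column and backfill it.
  ALTER TABLE host ADD COLUMN task_count INT NOT NULL DEFAULT 0;
  UPDATE host h
    SET task_count = (SELECT COUNT(*) FROM result r WHERE r.hostid = h.id);

  -- From then on the backend increments on create and decrements on purge:
  UPDATE host SET task_count = task_count + 1 WHERE id = 42;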


Contribute to the Wiki!
