Database Specs
Author | Message |
---|---|
DJStarfox Joined: 23 May 01 Posts: 1066 Credit: 1,226,053 RAC: 2 |
Hi everyone. I was talking with a colleague about moving off the Oracle database system to MySQL (or some other alternative). I mentioned SETI, and we were wondering how large the data at SETI is. IIRC, there are 2 databases: science and BOINC web. Does anyone here know what technologies each one uses and approximately what size they are? Also, about how many transactions per day does SETI handle? Thanks in advance for the info. |
BilBg Joined: 27 May 07 Posts: 3720 Credit: 9,385,827 RAC: 0 |
From memory: I think the size is >1 TB. For the BOINC database, "Master database queries/second" is often >2000; see http://setiathome.berkeley.edu/sah_status.html. At the bottom of that page, under Programs: BOINC master database: The mysql database that contains all BOINC related information (user stats, forum messages, basic workunit/result information, etc.). BOINC replica database: A back-up server which contains an identical copy of everything in the BOINC database. Read-only queries can be aimed at this server to lessen the load on the BOINC database. SETI@home science database: The informix database that contains final science products returned by SETI@home clients. Astropulse science database: The informix database that contains final science products returned by Astropulse clients. - ALF - "Find out what you don't do well ..... then don't do it!" :) |
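The read-only offloading described for the BOINC replica can be illustrated with a minimal sketch. This is not how the SETI@home servers are actually configured; the host names and the `choose_server` helper are hypothetical, and the rule (route by the first SQL keyword) is a deliberate simplification.

```python
# Hypothetical host names, for illustration only.
MASTER = "boinc-master.example"
REPLICA = "boinc-replica.example"

# Statements assumed to be safe to run against a read-only replica.
READ_ONLY_KEYWORDS = ("select", "show", "explain")


def choose_server(sql: str) -> str:
    """Return the host a statement should go to.

    Read-only statements go to the replica; everything else
    (INSERT, UPDATE, DELETE, DDL, empty input) goes to the master.
    Note: this naive check would also send "SELECT ... FOR UPDATE"
    to the replica, which a real router would have to handle.
    """
    words = sql.split(None, 1)
    first_word = words[0].lower() if words else ""
    return REPLICA if first_word in READ_ONLY_KEYWORDS else MASTER


if __name__ == "__main__":
    for stmt in ("SELECT name FROM user WHERE id = 1",
                 "UPDATE user SET total_credit = 0 WHERE id = 1"):
        print(choose_server(stmt), "<-", stmt)
```

A replica typically lags the master slightly, so a real setup would keep reads that must see the latest writes on the master.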
DJStarfox Joined: 23 May 01 Posts: 1066 Credit: 1,226,053 RAC: 2 |
Thanks for the info. I'm still interested in the rough estimate of DB size. Any idea? |
Cosmic_Ocean Joined: 23 Dec 00 Posts: 3027 Credit: 13,516,867 RAC: 13 |
I know the DB is huge. But we don't need access to all of it at all times. The "active" portion of it (results out in the field) is, I believe, somewhere between 100 and 150GB. It is optimized in such a way that within an hour or two of the server being rebooted, the entire "active" section of the DB fits in RAM as a sort-of cached copy, and changes get written to disk as they happen. Also, I very vaguely remember Matt mentioning ball-park figures for the DB sizes in one of the Tech News posts, more than a year ago but less than 3 years ago. I know I remember seeing that the DB was over 1TB, and I want to say it was over 10TB. Linux laptop: record uptime: 1511d 20h 19m (ended due to the power brick giving-up) |
DJStarfox Joined: 23 May 01 Posts: 1066 Credit: 1,226,053 RAC: 2 |
That is pretty big. I was talking to a co-worker about how big a MySQL database can get, properly configured and with good hardware. That's what spawned this discussion. Who needs Oracle or M$SQL? :) |
Ulrich Metzner Joined: 3 Jul 02 Posts: 1256 Credit: 13,565,513 RAC: 13 |
There is a reason for Oracle and IBM DB2 databases, believe me! If there weren't, you could handle every conceivable application with H2DB. >;) http://www.h2database.com/html/main.html Aloha, Uli |
Cosmic_Ocean Joined: 23 Dec 00 Posts: 3027 Credit: 13,516,867 RAC: 13 |
I decided to do some digging with the search function. The first mention of a chunk of data that I see is from 2012: georgem was holding ~13TB of data from a Green Bank galactic center survey. In early 2011, there was a tally of what makes up the S@H project: 26 systems, 100 CPUs, 500GB RAM, 150TB of raw storage capacity. In late 2010, somebody did some research on Informix and found it supported a maximum database size of 4TB (I wonder if IBM has updated the capabilities since then?). In 2009, Matt mentioned that part of the purpose of the weekly maintenance was to defragment the 20GB of 'active' database fragmentation, out of the 800GB total. (Remember, that was 5 years ago, and GPUs have come a LONG way since then.) And I couldn't really find anything else noteworthy, other than the mention of when the resultIDs hit a DB limit of 2^31 (signed 4-byte integer) and Matt had to reconfigure things; it is now an unsigned 8-byte integer, "meaning we can now add a million results per day for the next 6.3 billion years. If S@H is still searching 6.3 billion years from now, I think we can all agree there's no one else out there, right?" But I know I've read something from Matt before that said the DB was close to or just over 10TB; I just can't find it. Linux laptop: record uptime: 1511d 20h 19m (ended due to the power brick giving-up) |
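For anyone who wants to redo the ID-range arithmetic, here is a back-of-the-envelope sketch. The function name, the 365.25-day year, and the constant rate are assumptions made for illustration. The result depends heavily on the assumed rate, so it need not match the figure in the quote above, which may rest on different assumptions; either way, the 8-byte column leaves enormous headroom.

```python
# Largest values that fit in the two column types mentioned above.
SIGNED_32_MAX = 2**31 - 1      # signed 4-byte integer
UNSIGNED_64_MAX = 2**64 - 1    # unsigned 8-byte integer

DAYS_PER_YEAR = 365.25         # assumption: average year length


def years_until_exhausted(max_id: int, ids_per_day: float) -> float:
    """Years until sequential IDs run out at a constant daily rate."""
    if ids_per_day <= 0:
        raise ValueError("ids_per_day must be positive")
    return max_id / ids_per_day / DAYS_PER_YEAR


if __name__ == "__main__":
    rate = 1_000_000  # assumed: one million new results per day
    print(f"signed 32-bit:   {years_until_exhausted(SIGNED_32_MAX, rate):.1f} years")
    print(f"unsigned 64-bit: {years_until_exhausted(UNSIGNED_64_MAX, rate):.3g} years")
```

At a million IDs per day, the signed 4-byte range lasts only a few years, which is consistent with the project hitting that limit.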
DJStarfox Joined: 23 May 01 Posts: 1066 Credit: 1,226,053 RAC: 2 |
OK, that gives me some ball-park figures to use. Thanks for everyone's help! |
Cosmic_Ocean Joined: 23 Dec 00 Posts: 3027 Credit: 13,516,867 RAC: 13 |
Resurrected this one because of new info. As of December 2014, the AP database alone is ~4.5 TB. I would have to imagine the MB database is at least 3 times that size, simply because of how much longer it has been around and how many more WUs it has processed. Linux laptop: record uptime: 1511d 20h 19m (ended due to the power brick giving-up) |
Cosmic_Ocean Joined: 23 Dec 00 Posts: 3027 Credit: 13,516,867 RAC: 13 |
Resurrected this one because of new info. That was discussed quite a bit over in the News and Tech News threads. It is an old version of Informix, but they continue to (or did for a while) get updates/patches that may have fixed that issue. I also remember reading that there were talks of splitting the DB into smaller pieces, so that old portions from years ago can be archived off to the side, leaving the active DB smaller and therefore with less overhead. I don't know how that ended up working out, or if it even worked. Linux laptop: record uptime: 1511d 20h 19m (ended due to the power brick giving-up) |
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.