Database Specs

Message boards : Number crunching : Database Specs

DJStarfox

Joined: 23 May 01
Posts: 1066
Credit: 1,226,053
RAC: 2
United States
Message 1578172 - Posted: 26 Sep 2014, 14:06:31 UTC

Hi everyone. I was having a talk with a colleague about moving off the Oracle database system to MySQL (or some other alternative). I mentioned SETI, and we were wondering how big the data at SETI is. IIRC, there are 2 databases: science and BOINC web. Does anyone here know what technology each one uses and approximately how large they are? Also, roughly how many transactions per day does SETI handle?

Thanks in advance for the info.
BilBg
Volunteer tester
Joined: 27 May 07
Posts: 3720
Credit: 9,385,827
RAC: 0
Bulgaria
Message 1578466 - Posted: 26 Sep 2014, 20:52:04 UTC - in response to Message 1578172.  

From memory: I think the size is >1 TB

BOINC Database - "Master database queries/second" is often >2000
http://setiathome.berkeley.edu/sah_status.html
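To put the status-page figure in per-day terms (the original question asked about transactions per day), here is a back-of-envelope conversion. It assumes the ~2000 queries/second rate holds around the clock, which real load won't do exactly, and database queries are not the same thing as transactions, so treat the result as an order-of-magnitude figure only.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def queries_per_day(queries_per_second: float) -> float:
    """Scale a sustained per-second query rate up to a daily total."""
    return queries_per_second * SECONDS_PER_DAY


if __name__ == "__main__":
    # ~2000 queries/second is the figure quoted from the status page.
    print(f"{queries_per_day(2000):,.0f} queries/day")  # prints "172,800,000 queries/day"
```

So "often >2000 queries/second" works out to well over a hundred million queries per day.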

At the bottom of that page:

Programs:
BOINC master database: The mysql database that contains all BOINC related information (user stats, forum messages, basic workunit/result information, etc.).
BOINC replica database: A back-up server which contains an identical copy of everything in the BOINC database. Read-only queries can be aimed at this server to lessen the load on the BOINC database.

SETI@home science database: The informix database that contains final science products returned by SETI@home clients.
Astropulse science database: The informix database that contains final science products returned by Astropulse clients.
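The master/replica split described in that list can be sketched as a tiny query router: read-only statements go to the replica, everything else to the master. This is a minimal illustration under stated assumptions, not how BOINC actually routes its queries; the class name, the host labels, and the keyword-based classification are all hypothetical.

```python
READ_ONLY_KEYWORDS = ("SELECT", "SHOW", "EXPLAIN")


class QueryRouter:
    """Hypothetical router: reads to the replica, writes to the master."""

    def __init__(self, master: str, replica: str) -> None:
        # Plain labels here; a real router would hold database connections.
        self.master = master
        self.replica = replica

    def target_for(self, sql: str) -> str:
        words = sql.split()
        if not words:
            raise ValueError("empty SQL statement")
        keyword = words[0].upper()
        # SELECT ... FOR UPDATE takes row locks, so it must go to the master.
        if keyword in READ_ONLY_KEYWORDS and "FOR UPDATE" not in sql.upper():
            return self.replica
        # Anything unrecognized goes to the master, which is the safe default.
        return self.master
```

For example, `QueryRouter("master", "replica").target_for("SELECT COUNT(*) FROM result")` returns `"replica"`. A production setup would also need to account for replication lag, since a read sent to the replica right after a write may not see that write yet.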
 


- ALF - "Find out what you don't do well ..... then don't do it!" :)
 
DJStarfox

Joined: 23 May 01
Posts: 1066
Credit: 1,226,053
RAC: 2
United States
Message 1579030 - Posted: 28 Sep 2014, 12:17:06 UTC - in response to Message 1578466.  

Thanks for the info. I'm still interested in a rough estimate of the DB size. Any idea?
Cosmic_Ocean
Joined: 23 Dec 00
Posts: 3027
Credit: 13,516,867
RAC: 13
United States
Message 1579274 - Posted: 29 Sep 2014, 4:42:26 UTC

I know the DB is huge, but we don't need access to all of it at all times. The "active" portion of it (results out in the field) is, I believe, somewhere between 100 and 150 GB.

It is optimized so that, within an hour or two after the server is rebooted, the entire "active" section of the DB is held in RAM as a kind of cached copy, and changes are written to disk as they happen.
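The behaviour described above (hot data served from RAM, changes written to disk as they happen, and a warm-up period after a restart) can be modeled as a simple write-through cache. This is a simplified sketch of the concept, not the actual SETI@home setup: real engines such as MySQL's InnoDB keep pages in a buffer pool and log changes to a redo log rather than rewriting each page immediately. The class name and the `load`/`store` callbacks are hypothetical stand-ins for disk I/O.

```python
from typing import Callable, Dict, Generic, TypeVar

K = TypeVar("K")
V = TypeVar("V")


class WriteThroughCache(Generic[K, V]):
    """Simplified model: hot rows live in memory, every write also goes to disk.

    `load` and `store` are hypothetical callbacks standing in for disk I/O.
    """

    def __init__(self, load: Callable[[K], V], store: Callable[[K, V], None]) -> None:
        self._load = load    # reads a row from disk on a cache miss
        self._store = store  # persists a row to disk
        self._cache: Dict[K, V] = {}

    def get(self, key: K) -> V:
        # Right after a restart the cache is empty, so early reads miss and hit
        # the disk; the cache "warms up" as the active rows are read once.
        if key not in self._cache:
            self._cache[key] = self._load(key)
        return self._cache[key]

    def put(self, key: K, value: V) -> None:
        # Write-through: persist first, then update memory, so the on-disk
        # copy is never behind what readers see.
        self._store(key, value)
        self._cache[key] = value
```

With a plain dict standing in for the disk, the first `get` for a key calls `load` and later ones are served from memory, while each `put` updates both copies. Write-through trades some write latency for a disk copy that is always current.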



Also, I very vaguely remember Matt mentioning ball-park figures for the DB sizes in one of the Tech News posts, sometime between one and three years ago. I remember seeing that the DB was over 1 TB, and I want to say it was over 10 TB.
Linux laptop:
record uptime: 1511d 20h 19m (ended due to the power brick giving-up)
DJStarfox

Joined: 23 May 01
Posts: 1066
Credit: 1,226,053
RAC: 2
United States
Message 1579374 - Posted: 29 Sep 2014, 13:58:56 UTC

That is pretty big. I was talking to a co-worker about how big a MySQL database can get, properly configured and with good hardware. That's what spawned this discussion. Who needs Oracle or M$SQL? :)
Ulrich Metzner
Volunteer tester
Joined: 3 Jul 02
Posts: 1256
Credit: 13,565,513
RAC: 13
Germany
Message 1579414 - Posted: 29 Sep 2014, 16:29:17 UTC

There is a reason for Oracle and IBM DB2 databases, believe me!

If not, you could handle every conceivable application with H2DB. >;)
http://www.h2database.com/html/main.html
Aloha, Uli

Cosmic_Ocean
Joined: 23 Dec 00
Posts: 3027
Credit: 13,516,867
RAC: 13
United States
Message 1579775 - Posted: 30 Sep 2014, 4:29:07 UTC

I decided to do some digging through the search function.

The first mention of a chunk of data that I found: in 2012, georgem was holding ~13 TB of data from a Green Bank galactic center survey.

In early 2011, a tally of what makes up the S@H project: 26 systems, 100 CPUs, 500 GB of RAM, and 150 TB of raw storage capacity.

In late 2010, somebody did some research on Informix and found it supported a maximum database size of 4 TB (I wonder if IBM has raised that limit since then?)

In 2009, Matt mentioned that part of the purpose of the weekly maintenance was to defragment the 'active' part of the database (about 20 GB, out of 800 GB total). (Remember, that was 5 years ago, and GPUs have come a LONG way since then.)



And... I couldn't really find anything else noteworthy, other than the mention of when the result IDs hit the DB limit of 2^31 - 1 (the largest signed 4-byte integer) and Matt had to reconfigure things so that the ID is now an unsigned 8-byte integer, "meaning we can now add a million results per day for the next 6.3 billion years. If S@H is still searching 6.3 billion years from now, I think we can all agree there's no one else out there, right?"
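A quick check of the arithmetic behind those ID limits, assuming a steady one million new results per day as in Matt's quote. It shows why a signed 4-byte ID runs out within about six years at that rate, while an 8-byte ID (signed or unsigned) lasts tens of billions of years, so the quoted 6.3 billion years is, if anything, conservative. The constant and function names are illustrative only.

```python
RESULTS_PER_DAY = 1_000_000  # assumed steady rate, taken from Matt's quote
DAYS_PER_YEAR = 365.25

INT32_SIGNED_MAX = 2**31 - 1    # 2,147,483,647: the old result ID ceiling
INT64_SIGNED_MAX = 2**63 - 1
INT64_UNSIGNED_MAX = 2**64 - 1


def years_until_exhausted(max_id: int, per_day: int = RESULTS_PER_DAY) -> float:
    """Years until IDs counting up from 1 reach max_id at a constant daily rate."""
    return max_id / per_day / DAYS_PER_YEAR


if __name__ == "__main__":
    limits = [
        ("signed 4-byte", INT32_SIGNED_MAX),
        ("signed 8-byte", INT64_SIGNED_MAX),
        ("unsigned 8-byte", INT64_UNSIGNED_MAX),
    ]
    for name, limit in limits:
        print(f"{name:>15}: {years_until_exhausted(limit):.3g} years")
```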

But I know I've read something from Matt before that said the DB was close to or just over 10TB, I just can't find it.
DJStarfox

Joined: 23 May 01
Posts: 1066
Credit: 1,226,053
RAC: 2
United States
Message 1583061 - Posted: 7 Oct 2014, 13:24:49 UTC

OK, that gives me some ball-park figures to use. Thanks for everyone's help!
Cosmic_Ocean
Joined: 23 Dec 00
Posts: 3027
Credit: 13,516,867
RAC: 13
United States
Message 1613495 - Posted: 13 Dec 2014, 19:58:33 UTC

Resurrected this one because of new info.

As of December 2014, the AP database alone is ~4.5 TB.

I would have to imagine the MB database is at least 3 times that size, simply because it has been around much longer and has processed many more WUs.
Cosmic_Ocean
Joined: 23 Dec 00
Posts: 3027
Credit: 13,516,867
RAC: 13
United States
Message 1613522 - Posted: 13 Dec 2014, 20:41:34 UTC - in response to Message 1613500.  

Quote from Message 1613500 (which quoted the post above):

    And you wrote in Sep:

    "Late 2010, somebody did some research on Informix, supporting a maximum database size of 4TB (I wonder if IBM has updated the capabilities since then?)"

    As far as I know, SETI is using an old Informix version donated by IBM. I wonder if this is part of the recent problems?

That was discussed quite a bit in the News and Tech News threads. It is an old version of Informix, but they continue to receive (or did for a while) updates/patches that may have fixed that issue.

I also remember reading that there were talks of splitting the DB into smaller pieces, so that old portions from years ago could be archived off to the side, leaving the active DB smaller and therefore with less overhead.

I don't know how that ended up working out, or if that even worked.
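The idea of archiving old portions off to the side can be sketched with Python's built-in sqlite3 module: rows older than a cutoff are copied to an archive table and deleted from the hot table in a single transaction. The schema (`result`, `result_archive`, `received_day`) and function names are hypothetical, and a project at SETI@home's scale might instead move old data to separate storage or a separate database server.

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    """In-memory demo database with a hypothetical hot table and archive table."""
    conn = sqlite3.connect(":memory:")
    for table in ("result", "result_archive"):
        conn.execute(f"CREATE TABLE {table} (id INTEGER PRIMARY KEY, received_day INTEGER)")
    conn.executemany("INSERT INTO result VALUES (?, ?)", [(1, 10), (2, 20), (3, 30)])
    conn.commit()
    return conn


def archive_older_than(conn: sqlite3.Connection, cutoff_day: int) -> int:
    """Move rows with received_day < cutoff_day from result into result_archive.

    Both statements run in one transaction, so if anything fails the whole move
    is rolled back and no row ends up in both tables or in neither.
    Returns the number of rows moved.
    """
    with conn:  # commits on success, rolls back if either statement raises
        conn.execute(
            "INSERT INTO result_archive SELECT * FROM result WHERE received_day < ?",
            (cutoff_day,),
        )
        cur = conn.execute("DELETE FROM result WHERE received_day < ?", (cutoff_day,))
    return cur.rowcount


if __name__ == "__main__":
    db = make_demo_db()
    moved = archive_older_than(db, cutoff_day=25)
    remaining = db.execute("SELECT COUNT(*) FROM result").fetchone()[0]
    print(moved, remaining)  # prints "2 1"
```

The hot table keeps only recent rows, which is the "smaller active DB, less overhead" effect described in the post; queries that need old data would have to look in the archive as well.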

©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.