Marching on... (March 4, 2015)
Author | Message |
---|---|
Julie Send message Joined: 28 Oct 09 Posts: 34060 Credit: 18,883,157 RAC: 18 |
|
Grant (SSSF) Send message Joined: 19 Aug 99 Posts: 13847 Credit: 208,696,464 RAC: 304 |
"By the way my estimates were off. Without any changes to what I was doing as of last week, it will take 5 months to rebuild the astropulse database!! But we can and will do a lot better than that...." Is the bottleneck disk I/O, or I/O within the database itself? I remember, back in the days of dBase with stepper-motor HDDs, I had a large (for the time) database that was taking ages to export to a new database. We got a new system with a voice-coil HDD, and while things were a lot faster, they weren't nearly as fast as we expected them to be. In the end we found we were better off transferring the old database to the new one without any indexes, then rebuilding all the indexes on the new database from scratch. Grant Darwin NT |
Ananas Send message Joined: 14 Dec 01 Posts: 195 Credit: 2,503,252 RAC: 0 |
"... but has partitioning been looked at ..." That was my first idea too; I think it would be worth thinking about. Our DataWarehouse guys do that with DB2 databases (on AIX, storage on an EMC² SAN, temp dbspace on SSDs, which are also located in the EMC²), and I know that Informix tables have such an option too. Another thing to fiddle with would be the extent sizes: making the first extent as large as possible and making the next extents very large too, in order to reduce the total number of extents ... if that hasn't already been done anyway. The default size is 16, which only makes sense for small tables up to 32k. The rules we use to calculate the extent sizes for large tables: The size limit is 1963000; if a table fills that, we use it for the 1st extent. If the table is large enough to fill more than 7 of those max. extents, we choose 1963000 as the next extent size too. If the table is large enough to fill from 4 up to 7 more of those max. extents, we choose 1963000/2 as the next extent size. If the table is large enough to fill from 1 up to 3 more of those max. extents, we choose 1963000/3 as the next extent size. This is not my own knowledge; our Informix admin gave us those rules for optimizing tables with too many extents - he usually starts to get nervous when there are more than 50 extents for one table. (I didn't even know that SETI uses Informix btw.) |
JaundicedEye Send message Joined: 14 Mar 12 Posts: 5375 Credit: 30,870,693 RAC: 1 |
One possible reason a newer database hasn't been tried I would imagine is the cost, unless a suitable freeware db already exists. In that case I would imagine staff time in migration and integration would be the limiting factor. "Sour Grapes make a bitter Whine." <(0)> |
Cruncher-American Send message Joined: 25 Mar 02 Posts: 1513 Credit: 370,893,186 RAC: 340 |
"One possible reason a newer database hasn't been tried I would imagine is the cost, unless a suitable freeware db already exists. In that case I would imagine staff time in migration and integration would be the limiting factor." Well, yeah, but it would seem to be time to bite the bullet on that one. The main problem seems to be the use of "academic" DBs (MySQL, Informix) which just were not designed with heavy usage (like SETI) in mind. Perhaps a license for something more suited could be obtained? Anybody? And staff time is already being over-consumed either because of (IMO) unsuitable DBs, or (maybe) lack of staff knowledge of the intricacies of the DBs in use (not an attack, just a possibility), so what's to lose? |
Donald L. Johnson Send message Joined: 5 Aug 02 Posts: 8240 Credit: 14,654,533 RAC: 20 |
"One possible reason a newer database hasn't been tried I would imagine is the cost, unless a suitable freeware db already exists. In that case I would imagine staff time in migration and integration would be the limiting factor." Anybody know what the license fees for a "suitable" db program would be? Maybe that's where our next major fundraising campaign should go. Donald Infernal Optimist / Submariner, retired |
Dad Send message Joined: 21 May 99 Posts: 44 Credit: 35,266,844 RAC: 10 |
Maybe we could lobby Eric to allow a fairly open-ended Bitcoin Utopia campaign to start the funding until a new db product can be found. I would contribute a considerable amount of computer time to it exclusively. With all of the SETI users contributing 1 or 2 percent of their computer resources each, it should not take too long. Perhaps it could be used to fund a db administrator/programmer for the project. Just my thoughts Dad |
Matt Lebofsky Send message Joined: 1 Mar 99 Posts: 1444 Credit: 957,058 RAC: 0 |
Okay... Looks like using a method Eric suggests of rebuilding the database in 8 separate fragments and then merging them will actually speed things up considerably. Like 6-7 weeks instead of 5 months. That's something. I'm still not exactly sure what the bottleneck is - the disk/CPU I/O is basically nothing - this is some mysterious Informix internal voodoo. Since we're getting this time down, we're considering in parallel restarting the actual Astropulse database server (not the temporary one I'm doing all this rebuilding on) just to insert new signals in the meantime. And then we can merge 6-7 weeks of new data into the fixed database and we'll be back on track. Anyway, still a work in progress, but we may get AP back online fairly soon. Key words: "may" and "fairly." Also Dave reminded me to turn off the "resend lost results" flag when the BOINC/MySQL database is getting overloaded, as it has recently, so I did that today. That will help with the general slowness we've been seeing with the scheduler lately. - Matt -- BOINC/SETI@home network/web/science/development person -- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude |
OzzFan Send message Joined: 9 Apr 02 Posts: 15691 Credit: 84,761,841 RAC: 28 |
Thanks for the update Matt, and good to see you back. Do you think a different database would behave any better? It looks like if there's a need, people are willing. Wish they would do that for staff too, but hey, take your wins where you can! |
betreger Send message Joined: 29 Jun 99 Posts: 11414 Credit: 29,581,041 RAC: 66 |
The question that keeps bugging me is since so many more MBs are produced vs APs why hasn't this problem surfaced there? |
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14679 Credit: 200,643,578 RAC: 874 |
"Also Dave reminded me to turn off the 'resend lost results' flag when the BOINC/MySQL database is getting overloaded, as it has recently, so I did that today. That will help with the general slowness we've been seeing with the scheduler lately." Err, Matt, are you sure that's a good idea? May I remind you of the events of Sunday, 4 November 2012, and my email thread "SETI server - runaway database bloat". Eric replied I've stopped the splitters and doubled the httpd timeout. David We got up to more than 10 million results "in the field" (and hence in the database too) that weekend. David may be an inspired and instinctive coder, but I don't think he's aware of all the feedback loops in server administration. |
Brent Norman Send message Joined: 1 Dec 99 Posts: 2786 Credit: 685,657,289 RAC: 835 |
I would say because files didn't get deleted from the MB database. |
betreger Send message Joined: 29 Jun 99 Posts: 11414 Credit: 29,581,041 RAC: 66 |
"I would say because files didn't get deleted from the MB database." If that is true, then the MB database keeps getting bigger. Isn't the problem that the AP database is too big? |
Donald L. Johnson Send message Joined: 5 Aug 02 Posts: 8240 Credit: 14,654,533 RAC: 20 |
"I would say because files didn't get deleted from the MB database." Are you talking about the Seti@Home/MB Science (MySQL) Database, or the Master & replica BOINC (Informix) databases? The BOINC databases deal with results creation, distribution, & validation, and have records added & deleted frequently, whereas the Science databases only get added to. Donald Infernal Optimist / Submariner, retired |
Cosmic_Ocean Send message Joined: 23 Dec 00 Posts: 3027 Credit: 13,516,867 RAC: 13 |
"I would say because files didn't get deleted from the MB database." And also, the BOINC database keeps track of things like user accounts, machines, WUs that are in play, and these forums. The science DB has much more data in it: You know how every MB has spikes, triplets, pulses, gaussians, and auto-correlation counts? Every single one of those that gets found in every single WU.. becomes a new row in the respective tables. Based on what I'm seeing on my one cruncher, we've passed 4 billion tasks, and are nearing 1.75 billion WUs. Considering MB has five kinds of signals, and can have 30(?) of each, that adds up to a massive number of rows in the DB. Plus there's everything that AP has found thus far, with its two kinds of signals. I know I'm generalizing in the explanation here, but the science DB has easily 10x more rows than the BOINC DB, and much larger tables, and more of them, with a lot of indexes. It's no wonder the science DB is struggling very hard. I know AP has been stored in a separate DB, but it is still tables and indexes. It would seem that the reason the MB side of things is fine and AP is not.. is probably because, for whatever reason, the MB tables are configured..more efficiently? AP seems to have been playing "catch-up" for a while.. new problems get found, band-aids and quick fixes are applied to resolve the problem, and then a larger problem comes from that before anyone has had the time to get around to fixing the first one, and it just kind of.. snowballed into a massive crash. So now the only option there is.. is to fix it right and try to future-proof it, too. Linux laptop: record uptime: 1511d 20h 19m (ended due to the power brick giving-up) |
Brent Norman Send message Joined: 1 Dec 99 Posts: 2786 Credit: 685,657,289 RAC: 835 |
If I'm not mistaken, it is the Science databases that are Informix, not the Master/User databases. |
David S Send message Joined: 4 Oct 99 Posts: 18352 Credit: 27,761,924 RAC: 12 |
"The science DB has much more data in it:" Really? "Based on what I'm seeing on my one cruncher, we've passed 4 billion tasks, and are nearing 1.75 billion WUs. Considering MB has five kinds of signals, and can have 30(?) of each, that adds up to a massive number of rows in the DB." No, there can only be 30 signals total, in any combination of the 5 types. It can be 30 spikes, or 15 spikes and 15 pulses, or 22-2-2-2-2, but not 30-30-30-30-30. Right now I have an inconclusive where my wingman overflowed with 24-4-2-0-0. David Sitting on my butt while others boldly go, Waiting for a message from a small furry creature from Alpha Centauri. |
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14679 Credit: 200,643,578 RAC: 874 |
"The science DB has much more data in it:" Yes, really. The BOINC database is probably 'wider', in that it holds more data about any one WU/task, but the science database is much, much 'deeper'. The BOINC database contains the equivalent of about 3 days' average crunching; the science database contains 15 years of results. |
Bill Butler Send message Joined: 26 Aug 03 Posts: 101 Credit: 4,270,697 RAC: 0 |
"... the science database contains 15 years of results." So I assume, then, that the science database is the one that Nitpicker will work on (when enough staff and time can be resourced, i.e., funded). If so, it is a relief that an independent, largely uncoupled database will be used, instead of the overloaded MB and AP databases. Of course messin' with that big beast is askin' for trouble too. "It is often darkest just before it turns completely black." |
Julie Send message Joined: 28 Oct 09 Posts: 34060 Credit: 18,883,157 RAC: 18 |
The science DB has much more data in it: Paddym is the best! Marvin causes a lot of worries but without him, the project wouldn't be complete:) rOZZ Music Pictures |
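
For readers who want to try the extent-sizing rules Ananas quotes earlier in the thread, here is a minimal Python sketch. It is only an illustration of the rules as posted, not anything the SETI@home staff or Informix itself provides. The function name `extent_sizes` is made up, sizes are in whatever unit the post intends (it does not say), and two cases the post leaves open are filled with labeled assumptions: tables smaller than the limit, and tables that fill less than one additional maximum extent.

```python
from typing import Tuple

MAX_EXTENT = 1_963_000  # size limit quoted in the post (unit not stated there)
DEFAULT_EXTENT = 16     # default extent size quoted in the post


def extent_sizes(table_size: int) -> Tuple[int, int]:
    """Return (first_extent, next_extent) following the posted rules."""
    if table_size < MAX_EXTENT:
        # Assumption: the post gives no rule for tables below the limit.
        # Here the first extent simply covers the whole table and the
        # next extent stays at the default.
        return max(table_size, DEFAULT_EXTENT), DEFAULT_EXTENT

    # Number of additional maximum-size extents the table would fill
    # beyond the first one.
    extra_full = (table_size - MAX_EXTENT) // MAX_EXTENT

    if extra_full > 7:
        next_extent = MAX_EXTENT
    elif extra_full >= 4:
        next_extent = MAX_EXTENT // 2
    else:
        # Covers "1 up to 3 more". Assumption: 0 additional full extents
        # is treated the same way, since the post does not cover it.
        next_extent = MAX_EXTENT // 3

    return MAX_EXTENT, next_extent


if __name__ == "__main__":
    for size in (1_000, MAX_EXTENT * 3, MAX_EXTENT * 6, MAX_EXTENT * 10):
        first, nxt = extent_sizes(size)
        print(f"table size {size}: first extent {first}, next extent {nxt}")
```

Integer division is used so the results are whole numbers; whether the admin rounds 1963000/3 up or down is not stated in the post.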
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.