Marching on... (March 4, 2015)

Julie Volunteer moderator Volunteer tester Send message Joined: 28 Oct 09 Posts: 34053 Credit: 18,883,157 RAC: 18	Message 1650184 - Posted: 7 Mar 2015, 0:46:53 UTC - in response to Message 1650094. Cosmic you are making a lot of sense. I try. He does make a lot of sense. Thanx for the update Matt! rOZZ Music Pictures ID: 1650184 ·

Grant (SSSF) Volunteer tester Send message Joined: 19 Aug 99 Posts: 13736 Credit: 208,696,464 RAC: 304	Message 1650197 - Posted: 7 Mar 2015, 1:33:49 UTC - in response to Message 1650150. Last modified: 7 Mar 2015, 1:41:08 UTC By the way my estimates were off. Without any changes to what I was doing as of last week, it will take 5 months to rebuild the astropulse database!! But we can and will do a lot better than that.... - Matt Is the bottleneck disk I/O, or the database itself? I remember back in the days of DBase with stepper motor HDDs I had a large (for the time) database that was taking ages to export to a new database. We got a new system with a Voice Coil HDD, and while things were a lot faster, they weren't nearly as fast as we expected them to be. In the end we found we were better off transferring the old database to the new one without any indexes, then rebuilding all the indexes on the new database from scratch. Grant Darwin NT ID: 1650197 ·

Ananas Volunteer tester Send message Joined: 14 Dec 01 Posts: 195 Credit: 2,503,252 RAC: 0	Message 1650401 - Posted: 7 Mar 2015, 19:13:10 UTC - in response to Message 1649772. Last modified: 7 Mar 2015, 19:35:51 UTC ... but has partitioning been looked at ... That was my first idea too, I think it would be worth thinking about. Our DataWarehouse guys do that with DB2 databases (on AIX, storage on EMC² SAN, temp dbspace on SSDs, which are also located in the EMC²) and I know that Informix tables have such an option too. Another thing to fiddle with would be the extent sizes, things like making the first extent as large as possible and making the next extents very large too, in order to reduce the total number of extents ... if that hasn't already been done anyway. Default size is 16, which only makes sense for small tables up to 32k. The rule for how we calculate the extent sizes for large tables: Size limit is 1963000, if a table fills that, we use it for the 1st extent. If the table is large enough to fill more than 7 of those max. extents, we choose 1963000 as the next extent size too. If the table is large enough to fill from 4 up to 7 more of those max. extents, we choose 1963000/2 as the next extent size. If the table is large enough to fill from 1 up to 3 more of those max. extents, we choose 1963000/3 as the next extent size. This is not my own knowledge, our Informix admin gave us those rules for optimizing tables with too high a total number of extents - he usually starts to get nervous when there are more than 50 extents for one table. (I didn't even know that SETI uses Informix btw.) ID: 1650401 ·
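
Editor's sketch: the extent-size rules of thumb quoted in the post above can be written down as a small function. This is only an illustration of those rules, not an Informix tool. The function names are hypothetical, the sizes are taken to be in the same unit the post uses, "more than 7" is read as "more than 7 further maximum-size extents", and the post gives no rule for tables that fit inside one maximum extent, so the sketch returns None there.

```python
MAX_EXTENT = 1963000  # size limit quoted in the post (unit as used there)


def first_extent_size(table_size):
    """First extent: the whole table, capped at the quoted size limit."""
    return min(table_size, MAX_EXTENT)


def next_extent_size(table_size):
    """Next-extent size following the quoted rules of thumb.

    Returns None when the post gives no rule (the table does not fill
    at least one further maximum-size extent beyond the first).
    """
    # How many further maximum-size extents the table would fill
    # after the first one (assumed reading of "N more of those").
    more = (table_size - MAX_EXTENT) // MAX_EXTENT
    if more > 7:
        return MAX_EXTENT
    if more >= 4:
        return MAX_EXTENT // 2
    if more >= 1:
        return MAX_EXTENT // 3
    return None  # not covered by the rules in the post


if __name__ == "__main__":
    for multiple in (1, 3, 6, 10):
        size = MAX_EXTENT * multiple
        print(multiple, first_extent_size(size), next_extent_size(size))
```

Integer division is used so the result is a whole number of units; whether the admin rounds the /3 case up or down is not stated in the post.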

JaundicedEye Send message Joined: 14 Mar 12 Posts: 5375 Credit: 30,870,693 RAC: 1	Message 1650708 - Posted: 8 Mar 2015, 16:55:08 UTC One possible reason a newer database hasn't been tried I would imagine is the cost, unless a suitable freeware db already exists. In that case I would imagine staff time in migration and integration would be the limiting factor. "Sour Grapes make a bitter Whine." <(0)> ID: 1650708 ·

Cruncher-American Send message Joined: 25 Mar 02 Posts: 1513 Credit: 370,893,186 RAC: 340	Message 1650736 - Posted: 8 Mar 2015, 18:26:08 UTC - in response to Message 1650708. One possible reason a newer database hasn't been tried I would imagine is the cost, unless a suitable freeware db already exists. In that case I would imagine staff time in migration and integration would be the limiting factor. Well, yeah, but it would seem to be time to bite the bullet on that one. The main problem seems to be the use of "academic" DBs (MySQL, Informix) which just were not designed with heavy usage (like SETI) in mind. Perhaps a license for something more suited could be obtained? Anybody? And staff time is already being over-consumed either because of (IMO) unsuitable DBs, or (maybe) lack of staff knowledge of the intricacies of the DBs in use (not an attack, just a possibility), so what's to lose? ID: 1650736 ·

Donald L. Johnson Send message Joined: 5 Aug 02 Posts: 8240 Credit: 14,654,533 RAC: 20	Message 1650798 - Posted: 8 Mar 2015, 20:33:04 UTC - in response to Message 1650736. One possible reason a newer database hasn't been tried I would imagine is the cost, unless a suitable freeware db already exists. In that case I would imagine staff time in migration and integration would be the limiting factor. Well, yeah, but it would seem to be time to bite the bullet on that one. The main problem seems to be the use of "academic" DBs (MySQL, Informix) which just were not designed with heavy usage (like SETI) in mind. Perhaps a license for something more suited could be obtained? Anybody? And staff time is already being over-consumed either because of (IMO) unsuitable DBs, or (maybe) lack of staff knowledge of the intricacies of the DBs in use (not an attack, just a possibility), so what's to lose? Anybody know what the license fees for a "suitable" db program would be? Maybe that's where our next major fundraising campaign should go. Donald Infernal Optimist / Submariner, retired ID: 1650798 ·

Dad Volunteer tester Send message Joined: 21 May 99 Posts: 44 Credit: 35,266,844 RAC: 10	Message 1650836 - Posted: 8 Mar 2015, 21:57:39 UTC Maybe we could lobby Eric to allow a fairly open ended Bitcoin Utopia campaign to start the funding until a new db product can be found. I would contribute a considerable amount of computer time to it exclusively. With all of the Seti users, 1 or 2 percent of computer resources from each, it should not take too long. Perhaps it could be used to fund a db administrator/programmer for the project. Just my thoughts Dad ID: 1650836 ·

Matt Lebofsky Volunteer moderator Project administrator Project developer Project scientist Send message Joined: 1 Mar 99 Posts: 1444 Credit: 957,058 RAC: 0	Message 1651168 - Posted: 9 Mar 2015, 23:22:46 UTC Okay... Looks like using a method Eric suggests of rebuilding the database in 8 separate fragments and then merging them will actually speed things up considerably. Like 6-7 weeks instead of 5 months. That's something. I'm still not exactly sure what the bottleneck is - the disk/cpu i/o is basically nothing - this is some mysterious Informix internal voodoo. Since we're getting this time down, we're considering in parallel restarting the actual Astropulse database server (not the temporary one I'm doing all this rebuilding on) just to insert new signals in the meantime. And then we can merge 6-7 weeks of new data into the fixed database and we'll be back on track. Anyway, still a work in progress, but we may get AP back online fairly soon. Key words: "may" and "fairly." Also Dave reminded me to turn off the "resend lost results" flag when the BOINC/mysql database is getting overloaded as it has recently, so I did that today. That will help with the general slowness we've been seeing with the scheduler lately. - Matt -- BOINC/SETI@home network/web/science/development person -- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude ID: 1651168 ·
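
Editor's sketch: the "rebuild in 8 separate fragments, then merge" idea from the post above, shown as a loose analogy in plain Python. Nothing here is Informix-specific or taken from the project's actual procedure; the function names are hypothetical, sorting stands in for the expensive per-fragment rebuild, and the fragments are processed one after another here although in practice they could run in parallel.

```python
import heapq


def split_into_fragments(rows, n_fragments=8):
    """Deal rows round-robin into n_fragments lists."""
    fragments = [[] for _ in range(n_fragments)]
    for i, row in enumerate(rows):
        fragments[i % n_fragments].append(row)
    return fragments


def rebuild_fragment(fragment):
    """Stand-in for the costly rebuild step: order the fragment by key."""
    return sorted(fragment)


def rebuild_and_merge(rows, n_fragments=8):
    """Rebuild each fragment on its own, then merge the ordered results."""
    rebuilt = [rebuild_fragment(f)
               for f in split_into_fragments(rows, n_fragments)]
    # heapq.merge combines already-sorted inputs in a single pass.
    return list(heapq.merge(*rebuilt))


if __name__ == "__main__":
    print(rebuild_and_merge([5, 3, 9, 1, 7, 2, 8, 6, 4, 0]))
```

The point of the analogy is that each fragment is small enough to rebuild cheaply, and the final merge only has to walk through already-ordered pieces once.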

OzzFan Volunteer tester Send message Joined: 9 Apr 02 Posts: 15691 Credit: 84,761,841 RAC: 28	Message 1651170 - Posted: 9 Mar 2015, 23:32:55 UTC - in response to Message 1651168. Last modified: 9 Mar 2015, 23:35:25 UTC Thanks for the update Matt, and good to see you back. Do you think a different database would behave any better? It looks like if there's a need, people are willing. Wish they would do that for staff too, but hey, take your wins where you can! ID: 1651170 ·

betreger Send message Joined: 29 Jun 99 Posts: 11361 Credit: 29,581,041 RAC: 66	Message 1651190 - Posted: 10 Mar 2015, 0:12:24 UTC The question that keeps bugging me is since so many more MBs are produced vs APs why hasn't this problem surfaced there? ID: 1651190 ·

Richard Haselgrove Volunteer tester Send message Joined: 4 Jul 99 Posts: 14650 Credit: 200,643,578 RAC: 874	Message 1651199 - Posted: 10 Mar 2015, 0:19:20 UTC - in response to Message 1651168. Also Dave reminded me to turn off the "resend lost results" flag when the BOINC/mysql database is getting overloaded as it has recently, so I did that today. That will help with the general slowness we've been seeing with the scheduler lately. - Matt Err, Matt, are you sure that's a good idea? May I remind you of the events of Sunday, 4 November 2012, and my email thread "SETI server - runaway database bloat". Eric replied: I've stopped the splitters and doubled the httpd timeout. David turned off "Resend Lost Result" yesterday in an attempt to fix the timeout problem. It appears that made things worse, so I'm turning it back on. I think we're going to need to at least temporarily go back to restricting workunits in progress on a per host basis and per RPC basis, regardless of what complaints we get about people being unable to keep their hosts busy. We got up to more than 10 million results "in the field" (and hence in the database too) that weekend. David may be an inspired and instinctive coder, but I don't think he's aware of all the feedback loops in server administration. ID: 1651199 ·

Brent Norman Volunteer tester Send message Joined: 1 Dec 99 Posts: 2786 Credit: 685,657,289 RAC: 835	Message 1651202 - Posted: 10 Mar 2015, 0:23:28 UTC - in response to Message 1651190. I would say because files didn't get deleted from the MB database. ID: 1651202 ·

betreger Send message Joined: 29 Jun 99 Posts: 11361 Credit: 29,581,041 RAC: 66	Message 1651206 - Posted: 10 Mar 2015, 0:45:48 UTC - in response to Message 1651202. I would say because files didn't get deleted from the MB database. If that is true then the MB database keeps getting bigger, isn't that the problem that the AP database is too big? ID: 1651206 ·

Donald L. Johnson Send message Joined: 5 Aug 02 Posts: 8240 Credit: 14,654,533 RAC: 20	Message 1651252 - Posted: 10 Mar 2015, 4:13:46 UTC - in response to Message 1651206. Last modified: 10 Mar 2015, 4:15:39 UTC I would say because files didn't get deleted from the MB database. If that is true then the MB database keeps getting bigger, isn't that the problem that the AP database is too big? Are you talking about the Seti@Home/MB Science (MySQL) Database, or the Master & replica BOINC (Informix) databases? The BOINC databases deal with results creation, distribution, & validation, and have records added & deleted frequently, whereas the Science databases only get added to. Donald Infernal Optimist / Submariner, retired ID: 1651252 ·

Cosmic_Ocean Send message Joined: 23 Dec 00 Posts: 3027 Credit: 13,516,867 RAC: 13	Message 1651273 - Posted: 10 Mar 2015, 5:14:08 UTC - in response to Message 1651252. I would say because files didn't get deleted from the MB database. If that is true then the MB database keeps getting bigger, isn't that the problem that the AP database is too big? Are you talking about the Seti@Home/MB Science (MySQL) Database, or the Master & replica BOINC (Informix) databases? The BOINC databases deal with results creation, distribution, & validation, and have records added & deleted frequently, whereas the Science databases only get added to. And also, the BOINC database keeps track of things like user accounts, machines, WUs that are in play, and these forums. The science DB has much more data in it: You know how every MB has spikes, triplets, pulses, gaussians, and auto-correlation counts? Every single one of those that gets found in every single WU.. becomes a new row in the respective tables. Based on what I'm seeing on my one cruncher, we've passed 4 billion tasks, and are nearing 1.75 billion WUs. Considering MB has five kinds of signals, and can have 30(?) of each, that adds up to a massive number of rows in the DB. Plus there's everything that AP has found thus far, with its two kinds of signals. I know I'm being rather generalized in the explanation here, but the science DB has easily 10x more rows than the BOINC DB, and much larger tables, and more of them, with a lot of indexes. It's no wonder the science DB is struggling very hard. I know AP has been stored in a separate DB, but it is still tables and indexes. It would seem that the reason the MB side of things is fine and AP is not.. is probably because for whatever reason, the MB tables are configured..more efficiently? AP seems to have been playing "catch-up" for a while.. new problems get found, band-aids and quick fixes are applied to resolve the problem, and then a larger problem comes from that before anyone had the time to get around to fixing the first one, and it just kind of.. snowballed into a massive crash. So now the only option there is.. is to fix it right and try to future-proof it, too. Linux laptop: record uptime: 1511d 20h 19m (ended due to the power brick giving-up) ID: 1651273 ·

Brent Norman Volunteer tester Send message Joined: 1 Dec 99 Posts: 2786 Credit: 685,657,289 RAC: 835	Message 1651279 - Posted: 10 Mar 2015, 5:35:17 UTC - in response to Message 1651252. If I'm not mistaken, it is the Science databases that are Informix, not the Master/User databases. ID: 1651279 ·

David S Volunteer tester Send message Joined: 4 Oct 99 Posts: 18352 Credit: 27,761,924 RAC: 12	Message 1651373 - Posted: 10 Mar 2015, 13:26:31 UTC - in response to Message 1651273. The science DB has much more data in it: You know how every MB has spikes, triplets, pulses, gaussians, and auto-correlation counts? Every single one of those that gets found in every single WU.. becomes a new row in the respective tables. Really? Based on what I'm seeing on my one cruncher, we've passed 4 billion tasks, and are nearing 1.75 billion WUs. Considering MB has five kinds of signals, and can have 30(?) of each, that adds up to a massive number of rows in the DB. No, there can only be 30 signals total, in any combination of the 5 types. It can be 30 spikes, or 15 spikes and 15 pulses, or 22-2-2-2-2, but not 30-30-30-30-30. Right now I have an inconclusive where my wingman overflowed with 24-4-2-0-0. David Sitting on my butt while others boldly go, Waiting for a message from a small furry creature from Alpha Centauri. ID: 1651373 ·
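
Editor's sketch: the 30-signal cap described in the post above, plus the row-count ceiling it implies. The function names are hypothetical, the order of the five counts inside a tuple is not specified in the thread and does not matter here, and treating "overflow" as "the total reaches the cap" is an assumption based on the 24-4-2-0-0 example. The ceiling is a worst case only, since it assumes every workunit reports the full 30 signals.

```python
MAX_SIGNALS_PER_RESULT = 30  # total across all five signal types, per the post
N_SIGNAL_TYPES = 5


def is_valid_report(counts):
    """True if five non-negative counts stay within the shared cap."""
    return (len(counts) == N_SIGNAL_TYPES
            and all(c >= 0 for c in counts)
            and sum(counts) <= MAX_SIGNALS_PER_RESULT)


def is_overflow(counts):
    """Assumption: a result 'overflows' when its total reaches the cap."""
    return is_valid_report(counts) and sum(counts) == MAX_SIGNALS_PER_RESULT


def max_signal_rows(n_workunits):
    """Worst-case number of signal rows if every workunit hit the cap."""
    return n_workunits * MAX_SIGNALS_PER_RESULT


if __name__ == "__main__":
    print(is_valid_report((22, 2, 2, 2, 2)))      # within the cap
    print(is_valid_report((30, 30, 30, 30, 30)))  # exceeds the cap
    print(is_overflow((24, 4, 2, 0, 0)))
    print(max_signal_rows(1_750_000_000))
```

With the roughly 1.75 billion workunits mentioned earlier in the thread, the ceiling works out to 1.75 billion × 30 = 52.5 billion rows; the real figure is presumably well below that.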

Richard Haselgrove Volunteer tester Send message Joined: 4 Jul 99 Posts: 14650 Credit: 200,643,578 RAC: 874	Message 1651384 - Posted: 10 Mar 2015, 13:52:53 UTC - in response to Message 1651373. The science DB has much more data in it: You know how every MB has spikes, triplets, pulses, gaussians, and auto-correlation counts? Every single one of those that gets found in every single WU.. becomes a new row in the respective tables. Really? Yes, really. The BOINC database is probably 'wider', in that it holds more data about any one WU/task, but the science database is much, much, 'deeper'. The BOINC database contains the equivalent of about 3 days average crunching: the science database contains 15 years of results. ID: 1651384 ·

Bill Butler Send message Joined: 26 Aug 03 Posts: 101 Credit: 4,270,697 RAC: 0	Message 1651421 - Posted: 10 Mar 2015, 16:16:06 UTC - in response to Message 1651384. ... the science database contains 15 years of results. So I assume, then, that the science database is the one that Nitpicker will work on (when enough staff and time can be resourced, i.e., funded). If so, it is a relief that an independent, largely uncoupled, database will be used, instead of the overloaded MB and AP databases. Of course messin' with that big beast is askin' for trouble too. "It is often darkest just before it turns completely black." ID: 1651421 ·

Julie Volunteer moderator Volunteer tester Send message Joined: 28 Oct 09 Posts: 34053 Credit: 18,883,157 RAC: 18	Message 1651457 - Posted: 10 Mar 2015, 22:04:38 UTC - in response to Message 1651384. Last modified: 10 Mar 2015, 22:08:33 UTC The science DB has much more data in it: You know how every MB has spikes, triplets, pulses, gaussians, and auto-correlation counts? Every single one of those that gets found in every single WU.. becomes a new row in the respective tables. Really? Yes, really. The BOINC database is probably 'wider', in that it holds more data about any one WU/task, but the science database is much, much, 'deeper'. The BOINC database contains the equivalent of about 3 days average crunching: the science database contains 15 years of results. Paddym is the best! Marvin causes a lot of worries but without him, the project wouldn't be complete:) rOZZ Music Pictures ID: 1651457 ·
