Unexplained database slowness


Christian Lynch

Joined: 16 Nov 17
Posts: 4
Credit: 23,290
RAC: 290
United States
Message 1927598 - Posted: 31 Mar 2018, 22:29:12 UTC - in response to Message 1927595.  

I know I'm the new guy here, and I'm not trying to step on anybody's toes. Just having a friendly conversation that I'm sure has come up many times. I enjoy this kind of stuff; I hope you all do as well.
popandbob
Volunteer tester
Joined: 19 Mar 05
Posts: 541
Credit: 3,401,051
RAC: 4
Canada
Message 1927626 - Posted: 1 Apr 2018, 1:14:40 UTC

My 2 cents' worth...
I think perhaps it's time for this project to rethink the one-master, one-slave database setup.
I don't see anything preventing a multi-master setup, or several separate databases.
I think the best move would be splitting the database into several smaller ones: one for ready-to-send work, one for in-progress, one for waiting-validation, and one for awaiting-purge.
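A toy sketch of what that split might look like, using SQLite in place of the project's real MySQL server (the stage names, table layout, and `advance` helper are all invented for illustration):

```python
import sqlite3

# Toy sketch of the suggested split: four small databases, one per
# workunit state, instead of one big table. SQLite stands in for the
# project's real database server; all names here are illustrative.
stages = ["ready_to_send", "in_progress", "waiting_validation", "awaiting_purge"]
dbs = {s: sqlite3.connect(":memory:") for s in stages}
for db in dbs.values():
    db.execute("CREATE TABLE workunit (id INTEGER PRIMARY KEY, name TEXT)")

def advance(wu_id, name, src, dst):
    """Move a workunit row from one stage database to the next."""
    if src is not None:
        dbs[src].execute("DELETE FROM workunit WHERE id = ?", (wu_id,))
    dbs[dst].execute("INSERT INTO workunit VALUES (?, ?)", (wu_id, name))

advance(1, "wu_0001", None, "ready_to_send")            # split out new work
advance(1, "wu_0001", "ready_to_send", "in_progress")   # sent to a host

# Each stage database stays small, so scans and purges touch less data.
counts = {s: dbs[s].execute("SELECT COUNT(*) FROM workunit").fetchone()[0]
          for s in stages}
```

The point of the sketch is only that each stage's table stays small and mostly append/remove-at-the-ends, which is the access pattern the post is arguing for.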

For those wondering about finances/fundraising... ever looked at how many times this project has managed to raise the full value of its budget? I haven't seen it happen once in the 12+ years I've been here. The simple fact that this project is still here after falling that short of its budget, year after year, is a testament to the time and effort put in by the crew here to keep the doors open.


Do you Good Search for Seti@Home? http://www.goodsearch.com/?charityid=888957
Or Good Shop? http://www.goodshop.com/?charityid=888957
Gary Charpentier (Crowdfunding Project Donor, Special Project $250 donor)
Volunteer tester
Joined: 25 Dec 00
Posts: 22207
Credit: 35,576,373
RAC: 35,600
United States
Message 1927647 - Posted: 1 Apr 2018, 5:09:30 UTC
Last modified: 1 Apr 2018, 5:10:47 UTC

Most posts here bitch about hardware this or hardware that. I think the real issue is software. SETI can't afford to pay monthly per-core license fees for enterprise-class software, so it has to get by on free software. Not that free is bad, but it just isn't tuned to the max. We have hit the limits of the free software, and faster hardware only goes so far. Being able to parallelize the software's processing to optimize the I/O can bring spectacular gains.

In any case, corruption is the bane of digital data.

[edit] I assume this is why Beta is down?
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 9605
Credit: 123,585,928
RAC: 78,751
Australia
Message 1927650 - Posted: 1 Apr 2018, 5:23:21 UTC - in response to Message 1927647.  
Last modified: 1 Apr 2018, 5:23:49 UTC

Being able to parallel the software processing to optimize the I/O can get spectacular gains.

Only so far as the hardware is capable.
If a device can only provide a certain amount of bandwidth and a certain number of I/O operations per second, it doesn't matter what the software that uses it does; the hardware can't exceed its physical limits.
Grant
Darwin NT
rob smith (Special Project $250 donor)
Volunteer tester
Joined: 7 Mar 03
Posts: 15874
Credit: 295,073,514
RAC: 315,501
United Kingdom
Message 1927654 - Posted: 1 Apr 2018, 6:44:19 UTC

As Richard said a few hours ago - it all boils down to money.
Money for staff, money for hardware, money for software, money for running costs.
Any change in the topology of the database would require a considerable amount of re-engineering, probably new hardware and new software, plus the staff to do the job and then the staff to maintain the system.
Any change from HDD to SSD would require a substantial investment in new hardware, require staff to do the transfer, possibly require new software...
I think you get the picture - MONEY, not donations of random bits of hardware, but well thought out, properly planned, specified and executed developments in both hardware and software.
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
Keith Myers (Special Project $250 donor)
Volunteer tester
Joined: 29 Apr 01
Posts: 4178
Credit: 242,001,817
RAC: 573,167
United States
Message 1927656 - Posted: 1 Apr 2018, 7:13:33 UTC - in response to Message 1927580.  

That money was for the Parkes data store & development

What good is bringing more data online if you can't process the data you have now because the hardware and software infrastructure is not up to the task?
Seti@Home classic workunits:20,676 CPU time:74,226 hours
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 9605
Credit: 123,585,928
RAC: 78,751
Australia
Message 1927660 - Posted: 1 Apr 2018, 7:43:18 UTC - in response to Message 1927656.  
Last modified: 1 Apr 2018, 7:44:08 UTC

That money was for the Parkes data store & development

What good is bringing more data online if you can't process the data you have now because the hardware and software infrastructure is not up to the task?

I've asked that a few times too, whenever it's been said that we need more crunchers to process the data we already have.
The more data from more sources we have, the greater the chance of finding something. But if we can't get it processed...
*shrug*
Grant
Darwin NT
Dena Wiltsie
Volunteer tester
Joined: 19 Apr 01
Posts: 1620
Credit: 14,972,033
RAC: 16,446
United States
Message 1927661 - Posted: 1 Apr 2018, 7:52:39 UTC
Last modified: 1 Apr 2018, 7:53:45 UTC

I am on a site that attempted to use SSD drives for main storage. They would last about a year and then fail. The reason is that SSDs wear on every write, so they have a limited number of writes. This isn't a problem as long as the database is only occasionally updated; however, if the drives are used for temporary storage, get ready for regular replacement. Fortunately, when we went through a hardware upgrade, somebody listened to me and a RAID array replaced the SSD drive. Yes, both my tower and my laptop came with an SSD, and I love how fast they boot, but I still use a good old hard drive for backup, and anything data-intensive gets moved to a hard drive.
https://techreport.com/review/24841/introducing-the-ssd-endurance-experiment/5
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 9605
Credit: 123,585,928
RAC: 78,751
Australia
Message 1927663 - Posted: 1 Apr 2018, 8:37:41 UTC - in response to Message 1927661.  
Last modified: 1 Apr 2018, 8:51:17 UTC

I am on a site that attempted to use SSD drives for main storage. They would last about a year and then fail. The reason is that SSDs wear on every write, so they have a limited number of writes. This isn't a problem as long as the database is only occasionally updated; however, if the drives are used for temporary storage, get ready for regular replacement.

Did they use drives actually designed for such use?
It sounds like they were using consumer drives for enterprise-type work. Enterprise SSDs are rated by DWPD (Drive Writes Per Day, i.e. writing the full capacity of the drive in a 24-hour period). Among enterprise SSDs there are models designed for mostly read-intensive work (1 to 4 DWPD) and models for write-intensive work (over 4 DWPD, some over 10, although if you're writing that much data to a single drive every day you need more and/or larger drives); consumer SSDs are around 0.4 DWPD or less. Enterprise drives have higher levels of over-provisioning, and can often be configured for greater or lesser levels using manufacturer-supplied software. Enterprise drives are also able to do garbage collection while still being heavily used; consumer drives do it during idle time, or when the TRIM command is explicitly issued.

Edit: the percentage of the drive's capacity in use also has a huge impact on its life expectancy for a given number of writes per day.
A drive that is 85% full will incur a huge number of background writes for a given amount of data written per day, and a huge decrease in its life expectancy. The same drive, with the same amount of data written but only 25% full, will see far fewer writes and a correspondingly large increase in its life expectancy.
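To put rough numbers on that, here's a deliberately simplified model. The 1/(1 - fullness) write-amplification curve and all the figures are illustrative assumptions, not datasheet formulas:

```python
def write_amplification(fullness):
    # Very rough model: less free space for garbage collection means
    # more background copying per host write. Purely illustrative,
    # not a real SSD controller's behaviour.
    assert 0.0 <= fullness < 1.0
    return 1.0 / (1.0 - fullness)

def endurance_days(capacity_gb, rated_dwpd, rated_days, host_gb_per_day, fullness):
    """Days until the rated write budget is exhausted (toy model)."""
    budget_gb = capacity_gb * rated_dwpd * rated_days      # total rated NAND writes
    nand_gb_per_day = host_gb_per_day * write_amplification(fullness)
    return budget_gb / nand_gb_per_day

# Same hypothetical 1 TB drive (1 DWPD for 5 years), same 500 GB/day
# workload: 25% full vs 85% full.
mostly_empty = endurance_days(1000, 1.0, 5 * 365, 500, 0.25)
mostly_full  = endurance_days(1000, 1.0, 5 * 365, 500, 0.85)
```

Under this toy model the 85%-full drive wears out five times faster than the 25%-full one for the identical daily workload, which is the effect being described.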
Grant
Darwin NT
kittyman (Special Project $250 donor)
Volunteer tester
Joined: 9 Jul 00
Posts: 49659
Credit: 904,244,337
RAC: 188,261
United States
Message 1927676 - Posted: 1 Apr 2018, 12:18:35 UTC
Last modified: 1 Apr 2018, 12:33:08 UTC

I had long ago suggested splitting the master database into 2 parts.
An 'archive' master database, and a 'working' master database.
The archive DB would include all WUs that have been completed and validated.
The working DB would include all WUs currently in play and waiting for results.
And this much smaller DB would be what the servers would have to deal with on a day to day basis when sending out new work.
The weekly outage would include transferring all completed work from the working DB to the archive DB. And since the archive DB would be static except when being updated during the outage, it could be backed up at any time during the week, rather than contributing to the downtime of the outage.
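A minimal sketch of that weekly transfer, with SQLite standing in for the real servers and an invented two-database, one-table schema:

```python
import sqlite3

# Sketch of the working/archive split: completed-and-validated rows are
# moved out of the working database during the weekly outage.
# Schema and state names are illustrative, not SETI's real ones.
working = sqlite3.connect(":memory:")
archive = sqlite3.connect(":memory:")
for db in (working, archive):
    db.execute("CREATE TABLE workunit (id INTEGER PRIMARY KEY, state TEXT)")

working.executemany("INSERT INTO workunit VALUES (?, ?)",
                    [(1, "validated"), (2, "in_progress"), (3, "validated")])

def weekly_transfer():
    """Move validated workunits from the working DB to the archive DB."""
    done = working.execute(
        "SELECT id, state FROM workunit WHERE state = 'validated'").fetchall()
    archive.executemany("INSERT INTO workunit VALUES (?, ?)", done)
    working.execute("DELETE FROM workunit WHERE state = 'validated'")
    return len(done)

moved = weekly_transfer()
remaining = working.execute("SELECT COUNT(*) FROM workunit").fetchone()[0]
archived = archive.execute("SELECT COUNT(*) FROM workunit").fetchone()[0]
```

After the transfer only the in-flight work remains in the working database, which is the "much smaller DB" the servers would deal with day to day.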

The biggest problem, as I see it, would be the rewrites of all of the software to properly direct all the daily queries to the correct DB. And that would most likely be an immense undertaking.
My body may be here, but my mind is in a galaxy far, far away.

Have made friends here.
Most were cats.
Richard Haselgrove (Project Donor)
Volunteer tester
Joined: 4 Jul 99
Posts: 11892
Credit: 115,465,811
RAC: 69,161
United Kingdom
Message 1927678 - Posted: 1 Apr 2018, 12:46:59 UTC - in response to Message 1927676.  
Last modified: 1 Apr 2018, 13:01:34 UTC

In theory, that is what we have already. The 'archive' database you describe is the science database (Informix on paddym, plus astropulse on marvin). The 'working' database is the master BOINC database on oscar, with some less time-critical functions offloaded to the replica on carolyn.

When things are working smoothly, tasks are removed from the working database 24 hours after validation - nothing 'archival' is held on oscar or carolyn.

Oops - except this message board, our user profiles, our PMs, and our host computer records. Now there's a thought I need to come back to later.

But what I was going to say was that the 'working' database is probably most efficient when it's working with freshly created tasks, newly added at the top end of the ID range. If a workunit is created, both replications are sent out, both are returned within hours, both validate, and both are removed after 24 hours, then that bit of the database stays clean and easy to index. The tasks which make the database 'messy' are the ones which hang around, and their wingmates: the errors, the failed validations, the ghosts, the users who walk away without clearing their caches.

Reducing that number should increase the database efficiency significantly, and the easiest way to start achieving that would be to halve the deadlines for all newly created work.
Gary Charpentier (Crowdfunding Project Donor, Special Project $250 donor)
Volunteer tester
Joined: 25 Dec 00
Posts: 22207
Credit: 35,576,373
RAC: 35,600
United States
Message 1927706 - Posted: 1 Apr 2018, 14:42:14 UTC - in response to Message 1927650.  
Last modified: 1 Apr 2018, 14:42:22 UTC

Being able to parallel the software processing to optimize the I/O can get spectacular gains.

Only so far as the hardware is capable.
If a device can only provide a certain amount of bandwidth and a certain number of I/O operations per second, it doesn't matter what the software that uses it does; the hardware can't exceed its physical limits.

No, but if the software isn't designed to use the device to its limit...
Do you do lots of little read/write chunks, or do you cache data for a while and do big chunks?
Do you do chunks scattered all across the address space, or do you do consecutive chunks?
The software needs to be tuned to the specific hardware to get the most out of it, and there can be spectacular gains in doing so.
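The little-chunks-versus-big-chunks point can be illustrated with a quick sketch: batching records in RAM turns thousands of tiny writes into a handful of large ones while producing byte-identical output (file names and record format invented):

```python
import os
import tempfile

# Sketch of batching small writes into large chunks to cut I/O operations.
records = [f"result {i}\n".encode() for i in range(10_000)]

def write_unbuffered(path):
    """One write() per record: many tiny I/Os hit the device."""
    with open(path, "wb", buffering=0) as f:
        for r in records:
            f.write(r)

def write_batched(path, chunk=64 * 1024):
    """Accumulate records in RAM and flush in large chunks."""
    buf = bytearray()
    with open(path, "wb", buffering=0) as f:
        for r in records:
            buf += r
            if len(buf) >= chunk:
                f.write(buf)
                buf.clear()
        if buf:
            f.write(buf)   # flush the final partial chunk

with tempfile.TemporaryDirectory() as d:
    a, b = os.path.join(d, "a"), os.path.join(d, "b")
    write_unbuffered(a)
    write_batched(b)
    with open(a, "rb") as fa, open(b, "rb") as fb:
        same = fa.read() == fb.read()
```

Same bytes on disk either way; the batched version just asks the device to do roughly a hundredth as many operations, which is where the tuning gains come from.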

When computers were slow, programmers were taught these lessons, and they wrote in assembler.
Today they write in high-level languages and have no clue what is happening at the machine level.
Richard Haselgrove (Project Donor)
Volunteer tester
Joined: 4 Jul 99
Posts: 11892
Credit: 115,465,811
RAC: 69,161
United Kingdom
Message 1927708 - Posted: 1 Apr 2018, 14:44:46 UTC - in response to Message 1927678.  

Now there's a thought I need to come back to later...
Well, that led to an interesting half-hour with a spreadsheet.

In common with standard database practice, our user interactions here have an ID number and a datestamp.

The questions were 'how many message board posts per day?', 'how many PMs per day?', 'how many new users per day?', and so on, taking figures averaged over the last three months. So far as I can tell, the answer is always about 200. It's not quite as neat as that, but it gives us an order of magnitude. That's how busy the 'archival' parts of the active database are, and they grow steadily; I don't think anything is ever deleted (except possibly old computers). That makes the indexing requirement very different from...

5 million per day.

That's roughly the number of workunits and tasks (combined) both added to, and deleted from the same database.

I think that, if I was defining the database today, I would put the workunit and task tables into a different database instance, and store it on a different RAID set, from the human-speed components. I don't know how easy it would be to split the current database - it might be as easy as changing some of the database connection ('join') definitions in https://github.com/BOINC/boinc/blob/master/db/boinc_db.cpp and re-directing several of the daemons (transitioner, purge), although I doubt it. But if we could back up, compress, and re-index the data area separately from the user area, we might have a structure which would be quicker to maintain and more robust in the long term.
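For what it's worth, here is how that separation looks in miniature using SQLite's ATTACH (MySQL would use its own mechanism, and every table and column name here is invented):

```python
import sqlite3

con = sqlite3.connect(":memory:")          # human-speed tables live here
con.execute("CREATE TABLE user (id INTEGER PRIMARY KEY, name TEXT)")
con.execute("INSERT INTO user VALUES (1, 'alice')")

# The fast-churn task table lives in its own database (a second in-memory
# DB here; in production it could sit on its own RAID set).
con.execute("ATTACH DATABASE ':memory:' AS data_area")
con.execute("CREATE TABLE data_area.task (id INTEGER PRIMARY KEY, user_id INTEGER)")
con.execute("INSERT INTO data_area.task VALUES (100, 1)")

# Cross-database joins still work, so in principle daemons need their
# connections redirected rather than their queries rewritten.
row = con.execute("SELECT u.name, t.id FROM user u "
                  "JOIN data_area.task t ON t.user_id = u.id").fetchone()
```

The data area can then be backed up, compressed, and re-indexed on its own schedule, which is the structural win being proposed.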

Anyone see a flaw in that argument before I pitch it to David and Eric?
bob
Joined: 30 Dec 17
Posts: 1
Credit: 9,936
RAC: 421
United Kingdom
Message 1927723 - Posted: 1 Apr 2018, 16:17:13 UTC - in response to Message 1927456.  

Thanks for update Eric have a nice day.
popandbob
Volunteer tester
Joined: 19 Mar 05
Posts: 541
Credit: 3,401,051
RAC: 4
Canada
Message 1927724 - Posted: 1 Apr 2018, 16:25:55 UTC - in response to Message 1927708.  

The flaw is that the majority of the workload lies in the workunit section; the forums/users add relatively little load.

If one were to separate the workunit database into separate sections (a lot of work, no denying that), it would give us the following:

Increased per-CPU/GPU task limits (we were told they were limited because of DB size)
Reduced weekly downtime (instead of dealing with one large DB you deal with only two smaller ones, since only two would get fragmented; the others would be linear in function)
The possibility of still issuing work during the weekly downtime (if ready-to-send work were in a separate DB, it could be done)
Reduced DB load (instead of querying the full DB to find tasks/WUs to delete or purge, it could be done by simply returning the first/last 20 rows of my suggested awaiting-purge DB)
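The purge idea in that last point could be as simple as this sketch (SQLite stand-in, invented table name):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE awaiting_purge (id INTEGER PRIMARY KEY)")
con.executemany("INSERT INTO awaiting_purge VALUES (?)",
                [(i,) for i in range(1, 101)])

def purge_batch(n=20):
    """Delete the n oldest rows. No scan of the whole table is needed,
    because ids are assigned in creation order and indexed."""
    ids = [r[0] for r in con.execute(
        "SELECT id FROM awaiting_purge ORDER BY id LIMIT ?", (n,))]
    con.executemany("DELETE FROM awaiting_purge WHERE id = ?",
                    [(i,) for i in ids])
    return len(ids)

purged = purge_batch()
left = con.execute("SELECT COUNT(*) FROM awaiting_purge").fetchone()[0]
```

Because the purge queue is its own small table, each batch is an indexed range delete at one end, rather than a query across the full database.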

Dealing with a single large DB is only going to get worse as hardware continues to get better and more users join. It won't get any smaller.


Do you Good Search for Seti@Home? http://www.goodsearch.com/?charityid=888957
Or Good Shop? http://www.goodshop.com/?charityid=888957
marsinph
Volunteer tester
Joined: 7 Apr 01
Posts: 149
Credit: 23,381,745
RAC: 954
Belgium
Message 1927732 - Posted: 1 Apr 2018, 17:25:09 UTC

Hello everyone,
Thanks Eric for the update.
But! I repeat: but!
For a few months now there have been more problems than anything else.
Problems connecting; servers down, slow, or unavailable (due to DNS errors);
unable to get enough WUs. I have a mid-range computer and a mid-range GPU.
I understand the limit of 100 WUs per CPU/GPU, to avoid never-ending WUs being sent to inactive crunchers.
Like most of us, after a few hours' outage we are left without WUs to crunch.
Why not adapt this limit based on the host's turnaround time?
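One possible shape for that adaptation, as a sketch only; the function, its parameters, and the numbers are invented, and this is not how the real scheduler computes quotas:

```python
def task_limit(base_limit, avg_turnaround_hours, target_hours=24,
               floor=10, ceiling=1000):
    """Scale the per-device task limit by how quickly a host returns work:
    fast hosts get more tasks, slow hosts get fewer. Illustrative only."""
    scale = target_hours / max(avg_turnaround_hours, 0.1)
    return int(min(ceiling, max(floor, base_limit * scale)))

fast_host = task_limit(100, avg_turnaround_hours=6)    # returns work in ~6 h
slow_host = task_limit(100, avg_turnaround_hours=96)   # takes ~4 days
```

A fast host could then ride out a few hours' outage on its deeper cache, while slow or inactive hosts would hold even fewer never-ending WUs than the flat 100 limit allows.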

I am sure more and more crunchers are moving to other projects.
I have been on SETI from the beginning. I try to crunch as much as possible, but now I am tired of all these problems.

I am also on a team, trying to keep as many users as possible; I often mail the team to ask them not to give up.
But more and more, we are tired of all these problems.

I really don't know what is happening at SETI, but it started in November 2017,
more or less at the same time as the change in credit per WU.
In just one month I received more credit on three other projects together than in twenty years on SETI!

What is really happening with SETI???
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 9605
Credit: 123,585,928
RAC: 78,751
Australia
Message 1927770 - Posted: 1 Apr 2018, 23:42:30 UTC - in response to Message 1927708.  
Last modified: 2 Apr 2018, 0:41:56 UTC

Anyone see a flaw in that argument before I pitch it to David and Eric?

As mentioned earlier in the thread, we really need to know just what the issue is in order to fix it.
From what's been going on, it looks very much like I/O contention. For several weeks, whenever the number of WUs awaiting deletion reached a certain level, splitter output would drop, and it would continue to drop as the number of WUs awaiting deletion grew. Then at a certain point the deletion backlog would clear, the splitters would crank up their output, and things would be good again until the backlog reached that point once more, splitter output would fall, and so on.
Lately, that hasn't been the case. WUs awaiting deletion keep growing, with the odd decline, but the backlog just doesn't clear. And now WUs awaiting purge have more than doubled and aren't clearing either. Splitter output now varies from 10/s to 70/s with no clear reason for doing so.

Giving the heaviest-I/O tables their own storage space would certainly reduce I/O contention significantly. But what of the network bandwidth between the file servers and the database server? I recall it being said that the files for splitting need to be read into memory before they can be split, and that much of the feeding of work relies on caching. Do we have hardware with the CPU and RAM resources needed to move tables to other file-server hardware and provide that caching? And the available bandwidth between them?

If the issue is I/O contention, replacing at least some of the main storage with flash would resolve it, with no changes to the DB schema required: just copy the data from the existing storage to the new storage. But it would cost. A lot.
Moving heavy-I/O tables onto different file servers would give the same result, but with considerable time and effort required. And it would only be worthwhile if it doesn't create other bottlenecks that would then require their own expensive time, effort, and hardware fixes.

EDIT: there is a third option.
For any file/DB server that hasn't got its maximum possible memory, max it out to help with caching.
Instead of (or as well as) moving tables to different file servers/pools, replace existing HDDs with (or just add) some enterprise SSDs and make them their own storage pool for all logs and indexes. These are smaller than the actual DB tables, yet their access has a significant impact on DB performance, particularly the logs.
Grant
Darwin NT
marsinph
Volunteer tester
Joined: 7 Apr 01
Posts: 149
Credit: 23,381,745
RAC: 954
Belgium
Message 1927795 - Posted: 2 Apr 2018, 14:22:04 UTC

Why not stop producing work entirely?
Why produce work if all the WUs/results go into the DB, but the system cannot purge them?

The bottleneck effect makes everything more and more difficult!
"Bruno" is the least powerful server, but has the most work!
So I will leave SETI until it works normally again (I hope it will!).
Why so pessimistic? Because since November there have been more and more problems.
In fact, there are more days with problems than normal days:
DB problems, connection problems, some DNS problems, ...
and, for sure, very little news from the staff.

Thanks Grant for the explanations. Reading you, one solution is to stop producing work until the DB works normally.
I repeat: why produce work if the system cannot manage the results?
rob smith (Special Project $250 donor)
Volunteer tester
Joined: 7 Mar 03
Posts: 15874
Credit: 295,073,514
RAC: 315,501
United Kingdom
Message 1927800 - Posted: 2 Apr 2018, 15:40:13 UTC

While it may sound like a "good solution" to stop producing tasks until such time as the database issues are "cured", the only way to actually know that the "cure" has worked is to stress the overall system.
There is certainly a very good argument for revising the database schema and structure, but, and this is a big but, neither should be undertaken lightly, and either will take time and money to implement. The trouble with the current database structure and implementation is that it is fragile: a small change to improve performance in one area can have a significant adverse impact on another function. I'm not sure, but the use of at least two different database engines (Informix and MySQL) possibly introduces some of the issues that are difficult to predict, diagnose, or resolve; but at the time the project was set up, the best part of twenty years ago, this was not an uncommon practice.
It is worth noting that, at the time of their adoption, both were considered suitable for scalable enterprise applications. It is also worth noting that porting data from one database engine to another is never as smooth as the salesman would have you believe...
Reducing deadlines would have a small impact, but probably not as much as is needed, since the fraction of "hanging" tasks is quite low, and certainly nowhere near 50%.
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
Richard Haselgrove (Project Donor)
Volunteer tester
Joined: 4 Jul 99
Posts: 11892
Credit: 115,465,811
RAC: 69,161
United Kingdom
Message 1927808 - Posted: 2 Apr 2018, 18:28:26 UTC

Good to see that we've all got our thinking caps on.

I think that the whole situation is slightly more compartmentalised than recent posters have made it seem. I think I see three logical black boxes, with complicated functions inside but very simple interactions between them.

BB One is the splitter/workunit generator. That's to do with taking big linear recordings from the telescopes, and turning them into bite-sized chunks. That's clearly subject to optimisations like disk access to those big 100 GB files, but the outputs are simple: one 700 KB file to disk, and one 'insert' query to the workunit table in the MySQL BOINC database. Which takes us to...

BB Two. This is where all the visible action takes place, and this is the (only) one I was addressing in my opening post. Something in here has to notice that new workunit and make the initial two tasks from it. Something has to notice when a result comes back, and decide what to do with it - wait for wingmate, validate now, make replacement task, etc., etc.

The most critical component here is the scheduler, which I know Eric is loath to tamper with because even he doesn't fully understand how it works. It has to take some very complicated decisions in real time - processing 100 or more finished task reports, and then working out which of the 'available' tasks are fit for sending back to you, while you're hanging on to the internet connection waiting for the reply. Almost anything else can wait, but this bit can't: all the bits in this black box probably need to be within a single server, so that data can flow at RAM bus speeds (feeder to scheduler, as a case in point), and so that database queries can take optimum advantage of caching. If we spread that across two or more machines, the speed of light starts to become problematic...

But on server-grade kit like this, they probably have access to multi-channel RAID controllers, and lots of drive bays, so my 'two database containers' suggestion doesn't mean two physical servers or even disk access contention.

Finally, there's another 'insert' query (in Informix format) to take the validator output (the canonical result) from BB2 into...

BB Three. This is where the huge (but to us invisible) Informix database holds court with the Nebula code. We haven't seen any outputs from that yet, but we will, one day. I'm not writing it off as a black hole yet.

The interconnects between the black boxes are incredibly simple, and don't need to be time critical. They could be in different cities, with the data sent by Fed-Ex: about the only constraint is that the workunit file from the splitter is available in local storage before the 'insert' query is actioned. The rest could have long, slow, FIFO buffers to cushion the effects of any slowdowns. It's only what happens inside Black Box 2 that I'm worried about.
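Those interconnects are simple enough to caricature in a few lines. A toy pipeline, with Python queues standing in for the long, slow FIFO buffers and every name invented:

```python
from queue import Queue

# Toy version of the three 'black boxes' joined by FIFO interconnects.
# Each stage only sees its input and output queue, so in principle the
# stages could run on different machines (or different cities).
splitter_out = Queue()    # BB1 -> BB2: new workunits
validator_out = Queue()   # BB2 -> BB3: canonical results

def splitter(recording, n_chunks):
    """BB1: turn a big linear recording into bite-sized workunits."""
    for i in range(n_chunks):
        splitter_out.put(f"{recording}:wu{i}")

def boinc_stage():
    """BB2: 'process' each workunit and emit a canonical result."""
    while not splitter_out.empty():
        wu = splitter_out.get()
        validator_out.put((wu, "canonical"))

splitter("tape42", 3)
boinc_stage()
results = []
while not validator_out.empty():
    results.append(validator_out.get())
```

A slowdown in any one box just lets its input queue grow; nothing downstream needs to be time-critical, which is the property being claimed for the real interconnects.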
©2018 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.