Marching on... (March 4, 2015)

Profile Julie
Volunteer moderator
Volunteer tester
Joined: 28 Oct 09
Posts: 34041
Credit: 18,883,157
RAC: 18
Belgium
Message 1650184 - Posted: 7 Mar 2015, 0:46:53 UTC - in response to Message 1650094.  


Cosmic you are making a lot of sense.

I try.


He does make a lot of sense.

Thanx for the update Matt!
rOZZ
Music
Pictures
ID: 1650184 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13720
Credit: 208,696,464
RAC: 304
Australia
Message 1650197 - Posted: 7 Mar 2015, 1:33:49 UTC - in response to Message 1650150.  
Last modified: 7 Mar 2015, 1:41:08 UTC

By the way my estimates were off. Without any changes to what I was doing as of last week, it will take 5 months to rebuild the astropulse database!! But we can and will do a lot better than that....

- Matt

Is the bottleneck disk I/O, or the database itself?


I remember back in the days of dBase with stepper-motor HDDs, I had a large (for the time) database that was taking ages to export to a new database. We got a new system with a voice-coil HDD, and while things were a lot faster, they weren't nearly as fast as we expected them to be.
In the end we found we were better off transferring the old database to the new one without any indexes, then rebuilding all the indexes on the new database from scratch.
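
A minimal sketch of that load-first, index-afterwards approach. It uses Python's built-in sqlite3 module purely as a stand-in (not dBase or Informix), and the table, column, and index names are hypothetical.

```python
import sqlite3

def copy_then_index(rows):
    """Bulk-load rows into a fresh table first, then build the index in one pass."""
    conn = sqlite3.connect(":memory:")
    # Create the target table with no indexes, so inserts do no index maintenance.
    conn.execute("CREATE TABLE signals (id INTEGER, power REAL)")
    with conn:  # one transaction for the whole bulk load
        conn.executemany("INSERT INTO signals VALUES (?, ?)", rows)
    # Only now build the index, from scratch, over the fully loaded table.
    conn.execute("CREATE INDEX idx_signals_power ON signals (power)")
    return conn

conn = copy_then_index((i, i * 0.5) for i in range(1000))
count = conn.execute("SELECT COUNT(*) FROM signals").fetchone()[0]
index_names = [row[1] for row in conn.execute("PRAGMA index_list('signals')")]
print(count, index_names)
```

Whether this ordering is actually faster depends on the database engine and the data, so it is worth timing both orders on a sample before committing to one.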
Grant
Darwin NT
ID: 1650197 · Report as offensive
Profile Ananas
Volunteer tester

Joined: 14 Dec 01
Posts: 195
Credit: 2,503,252
RAC: 0
Germany
Message 1650401 - Posted: 7 Mar 2015, 19:13:10 UTC - in response to Message 1649772.  
Last modified: 7 Mar 2015, 19:35:51 UTC

... but has partitioning been looked at ...

That was my first idea too; I think it would be worth thinking about. Our DataWarehouse guys do that with DB2 databases (on AIX, storage on EMC² SAN, temp dbspace on SSDs, which are also located in the EMC²), and I know that Informix tables have such an option too.

Another thing to fiddle with would be the extent sizes: making the first extent as large as possible and making the next extents very large too, in order to reduce the total number of extents ... if that hasn't already been done anyway.

Default size is 16, which only makes sense for small tables up to 32k.
The rules we use to calculate the extent sizes for large tables:
Size limit is 1963000; if a table fills that, we use it for the 1st extent.
If the table is large enough to fill more than 7 of those max. extents, we choose 1963000 as the next extent size too.
If the table is large enough to fill from 4 up to 7 more of those max. extents, we choose 1963000/2 as the next extent size.
If the table is large enough to fill from 1 up to 3 more of those max. extents, we choose 1963000/3 as the next extent size.
This is not my own knowledge; our Informix admin gave us those rules for optimizing tables with too many extents - he usually starts to get nervous when there are more than 50 extents for one table.
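
The rules above can be sketched as a small function. This is only an illustration: the function name is made up, the table size is assumed to be in the same units as the 1963000 limit, "more" is read as whole max-size extents beyond the first, and the function returns None wherever the rules as quoted say nothing.

```python
MAX_EXTENT = 1_963_000  # size limit quoted in the post (units as used by the admin)

def extent_sizes(table_size):
    """Return (first_extent, next_extent) following the quoted rules.

    None means the rules as quoted do not cover that case.
    """
    if table_size < MAX_EXTENT:
        return None, None  # table does not fill one max extent: not covered
    first = MAX_EXTENT
    # Whole max-size extents still needed after the first one.
    extra = (table_size - MAX_EXTENT) // MAX_EXTENT
    if extra > 7:
        next_size = MAX_EXTENT
    elif extra >= 4:
        next_size = MAX_EXTENT // 2
    elif extra >= 1:
        next_size = MAX_EXTENT // 3
    else:
        next_size = None  # less than one more full extent: not covered
    return first, next_size

big = extent_sizes(10 * MAX_EXTENT)    # 9 more max extents
medium = extent_sizes(6 * MAX_EXTENT)  # 5 more max extents
small = extent_sizes(3 * MAX_EXTENT)   # 2 more max extents
print(big, medium, small)
```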


(I didn't even know that SETI uses Informix btw.)
ID: 1650401 · Report as offensive
Profile JaundicedEye
Joined: 14 Mar 12
Posts: 5375
Credit: 30,870,693
RAC: 1
United States
Message 1650708 - Posted: 8 Mar 2015, 16:55:08 UTC

One possible reason a newer database hasn't been tried I would imagine is the cost, unless a suitable freeware db already exists. In that case I would imagine staff time in migration and integration would be the limiting factor.

"Sour Grapes make a bitter Whine." <(0)>
ID: 1650708 · Report as offensive
Cruncher-American Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor

Joined: 25 Mar 02
Posts: 1513
Credit: 370,893,186
RAC: 340
United States
Message 1650736 - Posted: 8 Mar 2015, 18:26:08 UTC - in response to Message 1650708.  

One possible reason a newer database hasn't been tried I would imagine is the cost, unless a suitable freeware db already exists. In that case I would imagine staff time in migration and integration would be the limiting factor.


Well, yeah, but it would seem to be time to bite the bullet on that one. The main problem seems to be the use of "academic" DBs (MySQL, Informix) which just were not designed with heavy usage (like SETI) in mind. Perhaps a license for something more suited could be obtained? Anybody?

And staff time is already being over-consumed either because of (IMO) unsuitable DBs, or (maybe) lack of staff knowledge of the intricacies of the DBs in use (not an attack, just a possibility), so what's to lose?
ID: 1650736 · Report as offensive
Profile Donald L. Johnson
Joined: 5 Aug 02
Posts: 8240
Credit: 14,654,533
RAC: 20
United States
Message 1650798 - Posted: 8 Mar 2015, 20:33:04 UTC - in response to Message 1650736.  

One possible reason a newer database hasn't been tried I would imagine is the cost, unless a suitable freeware db already exists. In that case I would imagine staff time in migration and integration would be the limiting factor.

Well, yeah, but it would seem to be time to bite the bullet on that one. The main problem seems to be the use of "academic" DBs (MySQL, Informix) which just were not designed with heavy usage (like SETI) in mind. Perhaps a license for something more suited could be obtained? Anybody?

And staff time is already being over-consumed either because of (IMO) unsuitable DBs, or (maybe) lack of staff knowledge of the intricacies of the DBs in use (not an attack, just a possibility), so what's to lose?

Anybody know what the license fees for a "suitable" db program would be? Maybe that's where our next major fundraising campaign should go.
Donald
Infernal Optimist / Submariner, retired
ID: 1650798 · Report as offensive
Dad
Volunteer tester

Joined: 21 May 99
Posts: 44
Credit: 35,266,844
RAC: 10
United States
Message 1650836 - Posted: 8 Mar 2015, 21:57:39 UTC

Maybe we could lobby Eric to allow a fairly open-ended Bitcoin Utopia campaign to start the funding until a new db product can be found. I would contribute a considerable amount of computer time to it exclusively.

With all of the Seti users, 1 or 2 percent of computer resources from each, it should not take too long.

Perhaps it could be used to fund a db administrator/programmer for the project.

Just my thoughts

Dad
ID: 1650836 · Report as offensive
Profile Matt Lebofsky
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 1 Mar 99
Posts: 1444
Credit: 957,058
RAC: 0
United States
Message 1651168 - Posted: 9 Mar 2015, 23:22:46 UTC

Okay... Looks like using a method Eric suggests of rebuilding the database in 8 separate fragments and then merging them will actually speed things up considerably. Like 6-7 weeks instead of 5 months. That's something.
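
For a rough sense of scale, the two estimates above can be compared directly. This is back-of-the-envelope arithmetic only; the month length is an assumed average and the variable names are illustrative.

```python
DAYS_PER_MONTH = 365.25 / 12  # assumed average month length

# Original estimate: 5 months, converted to weeks.
baseline_weeks = 5 * DAYS_PER_MONTH / 7

def speedup(new_weeks):
    """How many times faster the fragment-and-merge estimate is than the baseline."""
    return baseline_weeks / new_weeks

low, high = speedup(7), speedup(6)
print(f"{baseline_weeks:.1f} weeks baseline, {low:.1f}x to {high:.1f}x faster")
```

So the new estimate works out to roughly a threefold improvement over the original five months.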

I'm still not exactly sure what the bottleneck is - the disk/cpu i/o is basically nothing - this is some mysterious Informix internal voodoo.

Since we're getting this time down, we're considering in parallel restarting the actual Astropulse database server (not the temporary one I'm doing all this rebuilding on) just to insert new signals in the meantime. And then we can merge 6-7 weeks of new data into the fixed database and we'll be back on track. Anyway, still a work in progress, but we may get AP back online fairly soon. Key words: "may" and "fairly."

Also, Dave reminded me to turn off the "resend lost results" flag when the BOINC/MySQL database is getting overloaded, as it has recently, so I did that today. That will help with the general slowness we've been seeing with the scheduler lately.

- Matt
-- BOINC/SETI@home network/web/science/development person
-- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude
ID: 1651168 · Report as offensive
OzzFan Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 9 Apr 02
Posts: 15691
Credit: 84,761,841
RAC: 28
United States
Message 1651170 - Posted: 9 Mar 2015, 23:32:55 UTC - in response to Message 1651168.  
Last modified: 9 Mar 2015, 23:35:25 UTC

Thanks for the update Matt, and good to see you back.

Do you think a different database would behave any better? It looks like if there's a need, people are willing.

Wish they would do that for staff too, but hey, take your wins where you can!
ID: 1651170 · Report as offensive
Profile betreger Project Donor
Joined: 29 Jun 99
Posts: 11358
Credit: 29,581,041
RAC: 66
United States
Message 1651190 - Posted: 10 Mar 2015, 0:12:24 UTC

The question that keeps bugging me is: since so many more MBs are produced than APs, why hasn't this problem surfaced there?
ID: 1651190 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14649
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1651199 - Posted: 10 Mar 2015, 0:19:20 UTC - in response to Message 1651168.  

Also, Dave reminded me to turn off the "resend lost results" flag when the BOINC/MySQL database is getting overloaded, as it has recently, so I did that today. That will help with the general slowness we've been seeing with the scheduler lately.

- Matt

Err, Matt, are you sure that's a good idea? May I remind you of the events of Sunday, 4 November 2012, and my email thread "SETI server - runaway database bloat". Eric replied

I've stopped the splitters and doubled the httpd timeout. David turned off "Resend Lost Result" yesterday in an attempt to fix the timeout problem. It appears that made things worse, so I'm turning it back on. I think we're going to need to at least temporarily go back to restricting workunits in progress on a per host basis and per RPC basis, regardless of what complaints we get about people being unable to keep their hosts busy.

We got up to more than 10 million results "in the field" (and hence in the database too) that weekend.

David may be an inspired and instinctive coder, but I don't think he's aware of all the feedback loops in server administration.
ID: 1651199 · Report as offensive
Profile Brent Norman Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester

Joined: 1 Dec 99
Posts: 2786
Credit: 685,657,289
RAC: 835
Canada
Message 1651202 - Posted: 10 Mar 2015, 0:23:28 UTC - in response to Message 1651190.  

I would say because files didn't get deleted from the MB database.
ID: 1651202 · Report as offensive
Profile betreger Project Donor
Joined: 29 Jun 99
Posts: 11358
Credit: 29,581,041
RAC: 66
United States
Message 1651206 - Posted: 10 Mar 2015, 0:45:48 UTC - in response to Message 1651202.  

I would say because files didn't get deleted from the MB database.

If that is true, then the MB database keeps getting bigger. Isn't the problem that the AP database is too big?
ID: 1651206 · Report as offensive
Profile Donald L. Johnson
Joined: 5 Aug 02
Posts: 8240
Credit: 14,654,533
RAC: 20
United States
Message 1651252 - Posted: 10 Mar 2015, 4:13:46 UTC - in response to Message 1651206.  
Last modified: 10 Mar 2015, 4:15:39 UTC

I would say because files didn't get deleted from the MB database.

If that is true, then the MB database keeps getting bigger. Isn't the problem that the AP database is too big?

Are you talking about the Seti@Home/MB Science (MySQL) Database, or the Master & replica BOINC (Informix) databases?

The BOINC databases deal with results creation, distribution, & validation, and have records added & deleted frequently, whereas the Science databases only get added to.
Donald
Infernal Optimist / Submariner, retired
ID: 1651252 · Report as offensive
Cosmic_Ocean
Joined: 23 Dec 00
Posts: 3027
Credit: 13,516,867
RAC: 13
United States
Message 1651273 - Posted: 10 Mar 2015, 5:14:08 UTC - in response to Message 1651252.  

I would say because files didn't get deleted from the MB database.

If that is true, then the MB database keeps getting bigger. Isn't the problem that the AP database is too big?

Are you talking about the Seti@Home/MB Science (MySQL) Database, or the Master & replica BOINC (Informix) databases?

The BOINC databases deal with results creation, distribution, & validation, and have records added & deleted frequently, whereas the Science databases only get added to.

And also, the BOINC database keeps track of things like user accounts, machines, WUs that are in play, and these forums.

The science DB has much more data in it:

You know how every MB has spikes, triplets, pulses, gaussians, and auto-correlation counts? Every single one of those that gets found in every single WU.. becomes a new row in the respective tables. Based on what I'm seeing on my one cruncher, we've passed 4 billion tasks, and are nearing 1.75 billion WUs. Considering MB has five kinds of signals, and can have 30(?) of each, that adds up to a massive number of rows in the DB.

Plus there's everything that AP has found thus far, with its two kinds of signals.

I know I'm being rather generalized in the explanation here, but the science DB has easily 10x more rows than the BOINC DB, and much larger tables, and more of them, with a lot of indexes. It's no wonder the science DB is struggling very hard.

I know AP has been stored in a separate DB, but it is still tables and indexes. It would seem that the reason the MB side of things is fine and AP is not.. is probably because for whatever reason, the MB tables are configured..more efficiently? AP seems to have been playing "catch-up" for a while.. new problems get found, band-aids and quick fixes are applied to resolve the problem, and then a larger problem comes from that before anyone had the time to get around to fixing the first one, and it just kind of.. snowballed into a massive crash.

So now the only option there is.. is to fix it right and try to future-proof it, too.
Linux laptop:
record uptime: 1511d 20h 19m (ended due to the power brick giving-up)
ID: 1651273 · Report as offensive
Profile Brent Norman Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester

Joined: 1 Dec 99
Posts: 2786
Credit: 685,657,289
RAC: 835
Canada
Message 1651279 - Posted: 10 Mar 2015, 5:35:17 UTC - in response to Message 1651252.  

If I'm not mistaken, it is the science databases that are Informix, not the master/user databases.
ID: 1651279 · Report as offensive
David S
Volunteer tester
Joined: 4 Oct 99
Posts: 18352
Credit: 27,761,924
RAC: 12
United States
Message 1651373 - Posted: 10 Mar 2015, 13:26:31 UTC - in response to Message 1651273.  

The science DB has much more data in it:

You know how every MB has spikes, triplets, pulses, gaussians, and auto-correlation counts? Every single one of those that gets found in every single WU.. becomes a new row in the respective tables.

Really?

Based on what I'm seeing on my one cruncher, we've passed 4 billion tasks, and are nearing 1.75 billion WUs. Considering MB has five kinds of signals, and can have 30(?) of each, that adds up to a massive number of rows in the DB.

No, there can only be 30 signals total, in any combination of the 5 types. It can be 30 spikes, or 15 spikes and 15 pulses, or 22-2-2-2-2, but not 30-30-30-30-30.

Right now I have an inconclusive where my wingman overflowed with 24-4-2-0-0.
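
To make the cap concrete, here is a small sketch that checks a set of per-type signal counts against the 30-signal total. The function name, the order of the signal types, and the reading of "overflow" as hitting the cap exactly are assumptions for illustration, not taken from the SETI@home code.

```python
MAX_SIGNALS = 30  # total cap per result, per the post
# Order of the five signal types is assumed for illustration.
SIGNAL_TYPES = ("spike", "gaussian", "pulse", "triplet", "autocorr")

def classify(counts):
    """Classify a tuple of five per-type counts as normal, overflow, or impossible."""
    if len(counts) != len(SIGNAL_TYPES) or any(c < 0 for c in counts):
        raise ValueError("expected five non-negative counts")
    total = sum(counts)
    if total > MAX_SIGNALS:
        return "impossible"  # exceeds the total cap
    return "overflow" if total == MAX_SIGNALS else "normal"

print(classify((24, 4, 2, 0, 0)))      # the wingman's result: totals 30
print(classify((30, 30, 30, 30, 30)))  # 150 in total, over the cap
print(classify((3, 1, 0, 0, 0)))       # well under the cap
```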
David
Sitting on my butt while others boldly go,
Waiting for a message from a small furry creature from Alpha Centauri.

ID: 1651373 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14649
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1651384 - Posted: 10 Mar 2015, 13:52:53 UTC - in response to Message 1651373.  

The science DB has much more data in it:

You know how every MB has spikes, triplets, pulses, gaussians, and auto-correlation counts? Every single one of those that gets found in every single WU.. becomes a new row in the respective tables.

Really?

Yes, really.

The BOINC database is probably 'wider', in that it holds more data about any one WU/task, but the science database is much, much, 'deeper'.

The BOINC database contains the equivalent of about 3 days average crunching: the science database contains 15 years of results.
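
As a rough illustration of "deeper", the two time spans can be compared directly. This compares spans of time only, not row counts, and assumes an average year length.

```python
DAYS_PER_YEAR = 365.25  # assumed average year length

science_days = 15 * DAYS_PER_YEAR  # about 15 years of results
boinc_days = 3                     # about 3 days of average crunching

ratio = science_days / boinc_days
print(f"The science database spans roughly {ratio:.0f} times as much crunching time")
```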
ID: 1651384 · Report as offensive
Bill Butler
Joined: 26 Aug 03
Posts: 101
Credit: 4,270,697
RAC: 0
United States
Message 1651421 - Posted: 10 Mar 2015, 16:16:06 UTC - in response to Message 1651384.  

... the science database contains 15 years of results.

So I assume, then, that the science database is the one that Nitpicker will work on (when enough staff and time can be resourced, i.e., funded).

If so, it is a relief that an independent, largely uncoupled, database will be used, instead of the overloaded MB and AP databases. Of course messin' with that big beast is askin' for trouble too.
"It is often darkest just before it turns completely black."
ID: 1651421 · Report as offensive
Profile Julie
Volunteer moderator
Volunteer tester
Joined: 28 Oct 09
Posts: 34041
Credit: 18,883,157
RAC: 18
Belgium
Message 1651457 - Posted: 10 Mar 2015, 22:04:38 UTC - in response to Message 1651384.  
Last modified: 10 Mar 2015, 22:08:33 UTC

The science DB has much more data in it:

You know how every MB has spikes, triplets, pulses, gaussians, and auto-correlation counts? Every single one of those that gets found in every single WU.. becomes a new row in the respective tables.

Really?

Yes, really.

The BOINC database is probably 'wider', in that it holds more data about any one WU/task, but the science database is much, much, 'deeper'.

The BOINC database contains the equivalent of about 3 days average crunching: the science database contains 15 years of results.


Paddym is the best! Marvin causes a lot of worries but without him, the project wouldn't be complete:)
rOZZ
Music
Pictures
ID: 1651457 · Report as offensive


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.