Message boards :
Number crunching :
Let's help somehow...
Shaun Neff Joined: 15 Jul 05 Posts: 157 Credit: 34,715 RAC: 0
We have to help SETI if we all really want to crunch signals! My head is empty right now... I mean, jeez... 1 million WUs waiting for validation! I actually sent them an email suggesting that results waiting for validation be sent out to clients rather than new WUs. A couple thousand computers doing validation this way would make the backlog all but disappear, and it would take some of the strain off the SETI servers.
Absence of evidence is not evidence of absence. +250 Classic Seti
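What "validation" involves matters for this proposal. A minimal sketch, assuming a simplified model in which a workunit is validated by comparing the results returned by several hosts and accepting a canonical result once enough of them agree within a tolerance. The function name, the `min_quorum` parameter, and the single-float "result" are hypothetical simplifications, not BOINC's actual validator, which compares full science output with project-specific logic.

```python
from typing import Optional, Sequence


def find_canonical(results: Sequence[float], min_quorum: int,
                   tolerance: float = 1e-6) -> Optional[int]:
    """Return the index of a result that at least `min_quorum` results
    (itself included) agree with, or None if no such quorum exists yet.

    Hypothetical simplification: each result is a single float, and two
    results "agree" when they differ by at most `tolerance`.
    """
    for i, candidate in enumerate(results):
        agreeing = sum(1 for other in results
                       if abs(other - candidate) <= tolerance)
        if agreeing >= min_quorum:
            return i
    return None


# Two of three hosts agree, so with a quorum of 2 the first result is canonical.
print(find_canonical([3.14159, 3.14159, 2.71828], min_quorum=2))  # 0
# Only one result has arrived, so the workunit has to keep waiting.
print(find_canonical([3.14159], min_quorum=2))  # None
```

Whichever machine runs this comparison has to be trusted itself, which is exactly what makes handing it to volunteer clients tricky.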
Jerry Camden Joined: 10 Feb 01 Posts: 21 Credit: 716,374 RAC: 0
A.k.a. BLOB ... LOB, a.k.a. BLOB (Binary Large OBject) and CLOB (Character Large OBject), is a way of storing large data of semi-free format and length. No Paul, I certainly didn't miss the point that it is all still written to disk. In my experience, with the RDBMS I deal with, which handles a couple million transactions every day, accessing data within the DB has less overhead than accessing individual files. Keep in mind that all these little files are already linked to records in the DB! Maybe you missed my point about all the DASD activity that occurs for every open, creation, and deletion of a file in the file system. Each directory in the path has to be searched to find the next part of the path and its pointer, until the last one points to the file. Then the security system has to validate the authority required to access the file. Then the file system has to provide pointers to all the segments of the file. These things take time and require DASD activity. Also, again in my experience, backing up and restoring individual files takes orders of magnitude more time than backing up and restoring contiguous records in a well-performing RDBMS. What I don't have any experience with is the RDBMS used in this project, MySQL I believe. All of my comments are intended just as food for thought. If you already have experience with all the pieces used in this project and <B>know</B> that the file system is more efficient than the RDBMS, then I bow to your knowledge. If not, then I think it is unfair of you to say that "It just sounds like it will be faster." Jerry
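The path-walking step Jerry describes can be made concrete with a toy model. A minimal sketch, assuming a directory tree held in nested Python dicts, that counts one directory search per path component before the final component yields the file. The tree, the file names, and the `resolve` helper are all hypothetical; real kernels also cache recently used directory entries, so not every lookup turns into disk activity.

```python
from typing import Dict, Tuple, Union

# Toy directory tree: a directory maps names to subdirectories or to a
# file's "inode number". Purely illustrative; no real file system is this simple.
Node = Union[Dict[str, "Node"], int]


def resolve(root: Dict[str, Node], path: str) -> Tuple[int, int]:
    """Walk `path` one component at a time.

    Returns (inode, lookups), where `lookups` counts the directory searches
    needed before the last component points at a file.
    """
    node: Node = root
    lookups = 0
    for name in path.strip("/").split("/"):
        if not isinstance(node, dict):
            raise NotADirectoryError(name)
        lookups += 1  # one search of the current directory
        if name not in node:
            raise FileNotFoundError(path)
        node = node[name]
    if isinstance(node, dict):
        raise IsADirectoryError(path)
    return node, lookups


tree: Dict[str, Node] = {"upload": {"a1": {"result_123_0": 42}}}
print(resolve(tree, "/upload/a1/result_123_0"))  # (42, 3)
```

Every open, create, or delete repeats this walk before any permission check or segment lookup happens, which is the per-file overhead Jerry is pointing at.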
Jerry Camden Joined: 10 Feb 01 Posts: 21 Credit: 716,374 RAC: 0
A.k.a. BLOB ... Yes, a change to use the DB instead of millions of small files is not a quick fix. I agree that we need to continue the current cleanup attempts. I'm just afraid that the file system will never handle the volume of activity that will be placed on it. Jerry
Rom Walton (BOINC) Joined: 28 Apr 00 Posts: 579 Credit: 130,733 RAC: 0
If you move the result data into the database, you create a larger bottleneck at the database server. Instead of just worrying about the file system being slow, you now have to worry about the entire system slowing down, including the forums, stats, assimilator, file-deleter, db_purger, and transitioners. BOINC needs the database to be fast and nimble, which means you keep as much stuff out of it as possible. Adding result data to the database isn't even portable across projects.
----- Rom, BOINC Development Team, U.C. Berkeley
Jerry Camden Joined: 10 Feb 01 Posts: 21 Credit: 716,374 RAC: 0
Quote: "If you move the result data into the database you create a larger bottleneck at the database server. Now instead of just worrying about the filesystem being slow, you now have to worry about the entire system slowing down including the forums, stats, assimilator, file-deleter, db_purger, and transitioners." I guess I won't have to be told three times that it's a dumb idea. I'll quit at twice and shut up. [edit] Oh, the hell I will.... Rom, BLOBs can store anything a file can. The data in the files is specific to the project, so the content isn't portable across projects now. I can't see that the files are any more portable than any other storage method. But I will concede that you know more about the limitations of your software and hardware than I do. Good luck. Now, I'll shut up. Or, at least try very hard to. ;-) [/edit]
Anigel Joined: 5 Dec 99 Posts: 101 Credit: 643,544 RAC: 0
Ah, but how would you validate the validation@home results? Would you then have to have a validation@home validation@home to validate all the results from the validation@home project? ;P *grins, ducks and runs* Part of Teamseti. For SetiBoinc status graphs, visit Teamseti status graphs
1mp0£173 Joined: 3 Apr 99 Posts: 8423 Credit: 356,897 RAC: 0
"Can" and "should" are different things. Moving the result "files" into blobs is moving the problem "lots of results" from the filesystem to the database. That isn't necessarily good.
Rom Walton (BOINC) Joined: 28 Apr 00 Posts: 579 Credit: 130,733 RAC: 0
Ah, but BLOBs have the unfortunate side effect of increasing the number of page reads the database needs to get to the data. Not only do you have to find the row within the result set, but then you have to follow the pointer in that row to a different page, which is generally in a different part of the database altogether. BLOBs just kill perf. If we ever get a chance to revisit the schema, I would really like to move the BLOBs we already have into a separate table, move the fields involved with authentication out of the user table into their own table, and change a few fields from varchars to chars. Oh, and when MySQL supports stored procedures, I want to change BOINC's database layer so that it uses them. MS did a study while I was still working for the MSN org about the cost of maintaining data within a database vs. a flat file. With the organization structure and server infrastructure of the day, it came down to something like $30 per megabyte to store data in the database vs. $5 per megabyte to store something on the file system. The bottom line for database design became: unless you are going to need to write queries against it, data should not be in the database. The result data is only useful to the scientists after it has been validated and assimilated, so validation is handled via flat files, and the assimilator stores the results in the master science database in a way that can be queried.
----- Rom, BOINC Development Team, U.C. Berkeley
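Rom's idea of moving existing BLOBs into a separate table amounts to a schema split. A minimal sketch, assuming Python's built-in `sqlite3` module as a stand-in for MySQL; the table and column names (`result`, `result_output`, `server_state`, `stderr_text`) are hypothetical and not BOINC's actual schema. The point is only that routine scans of the narrow table never have to read the bulky column; how many page reads that saves depends on how a given engine stores large values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Narrow table holding only fields that queries filter or sort on.
    CREATE TABLE result (
        id           INTEGER PRIMARY KEY,
        workunit_id  INTEGER NOT NULL,
        server_state INTEGER NOT NULL
    );
    -- Large, rarely queried text moved into its own table, keyed by result id.
    CREATE TABLE result_output (
        result_id   INTEGER PRIMARY KEY REFERENCES result(id),
        stderr_text TEXT
    );
""")

conn.execute("INSERT INTO result VALUES (1, 100, 2)")
conn.execute("INSERT INTO result_output VALUES (1, ?)", ("x" * 100_000,))

# Routine scan: touches only the narrow table.
pending = conn.execute(
    "SELECT id FROM result WHERE server_state = 2").fetchall()
print(pending)  # [(1,)]

# The bulky text is fetched only when someone actually needs it.
(size,) = conn.execute(
    "SELECT length(stderr_text) FROM result_output WHERE result_id = ?",
    (1,)).fetchone()
print(size)  # 100000
```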
John McLeod VII Joined: 15 Jul 99 Posts: 24806 Credit: 790,712 RAC: 0
Quote: "If we ever get a chance to revisit the schema I would really like to move the BLOBs we already have into a separate table, move the fields involved with authentication out of the user table into their own table, and change a few fields from varchars to chars. Oh and when MySQL supports stored procedures I want to change BOINC's database layer so that it'll use them." Be very careful what you move from varchar to char. If the strings can vary in length, then they had better be varchar, as char means that the string is exactly X characters long (been there, done that, had that problem). BOINC WIKI
Rom Walton (BOINC) Joined: 28 Apr 00 Posts: 579 Credit: 130,733 RAC: 0
Sure, you have to trim the result. The primary goal is to get all the columns for a table to be fixed size. With most database engines, queries execute much faster with fixed column sizes.
----- Rom, BOINC Development Team, U.C. Berkeley
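John's warning and Rom's reply are both about padding: a fixed-width column holds every value at the full declared width, so shorter strings come back padded and have to be trimmed. A minimal sketch, assuming space padding and a hypothetical width of 16 (as in `CHAR(16)`); `to_fixed` and `from_fixed` are illustrative helpers, not BOINC code. Where the trim happens depends on the engine and client: MySQL, for instance, strips trailing spaces from `CHAR` values on retrieval unless the `PAD_CHAR_TO_FULL_LENGTH` SQL mode is enabled.

```python
WIDTH = 16  # hypothetical declared width, as in CHAR(16)


def to_fixed(value: str, width: int = WIDTH) -> str:
    """Pad a value so it occupies exactly `width` characters."""
    if len(value) > width:
        raise ValueError(f"{value!r} does not fit in CHAR({width})")
    return value.ljust(width)


def from_fixed(stored: str) -> str:
    """Trim the padding added by to_fixed.

    Caveat (John's point): a value that legitimately ended in spaces
    cannot be told apart from padding after this trim.
    """
    return stored.rstrip(" ")


stored = to_fixed("linux")
print(len(stored))               # 16
print(repr(from_fixed(stored)))  # 'linux'
```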
Janus Joined: 4 Dec 01 Posts: 376 Credit: 967,976 RAC: 0
Moving stuff out: Most certainly a good idea, and something that perhaps should be considered at some point. Already when the extra forum features were made, we started moving stuff out of the user table to improve performance (which reminds me: were those columns ever deleted?). Keeping all the stderr_out data in the result table really makes that table pretty large, especially in cases where a lot of debug info is saved, etc.
Fixed-size rows: The speed increase from using fixed-size rows is pretty impressive. It also reduces the number of I/O reads needed to find a particular row, and there's no overhead from cluttered rows. It usually uses more disk space, but it would be well worth it.
Anyway, the DB isn't an issue right now, but I agree that no more BLOBs should be moved into existing tables.
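Janus's point about fixed-size rows comes down to arithmetic: when every record has the same length, record i starts at byte i × record_size, so it can be read with a single seek instead of scanning the records before it. A minimal sketch using Python's `struct` module; the record layout (an 8-byte id, a 4-byte state code, a 16-byte name) and the helper names are hypothetical, and real database engines store rows in pages and usually locate them through indexes, so this only illustrates the offset calculation.

```python
import io
import struct
from typing import Iterable, Tuple

# Hypothetical fixed layout: 8-byte id, 4-byte state, 16-byte name (NUL-padded).
RECORD = struct.Struct("<qi16s")


def write_rows(buf: io.BytesIO, rows: Iterable[Tuple[int, int, str]]) -> None:
    for row_id, state, name in rows:
        # struct pads a name shorter than 16 bytes with NULs for the "16s" field.
        buf.write(RECORD.pack(row_id, state, name.encode()))


def read_row(buf: io.BytesIO, index: int) -> Tuple[int, int, str]:
    """Fetch row `index` with one seek: offset = index * record size."""
    buf.seek(index * RECORD.size)
    row_id, state, name = RECORD.unpack(buf.read(RECORD.size))
    return row_id, state, name.rstrip(b"\0").decode()


buf = io.BytesIO()
write_rows(buf, [(1, 0, "alpha"), (2, 1, "beta"), (3, 2, "gamma")])
print(RECORD.size)       # 28
print(read_row(buf, 2))  # (3, 2, 'gamma')
```

With variable-length rows, finding where row 2 begins would instead need an index or a scan, which is the extra I/O Janus refers to; the price is wasted space in the padding.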
Paul D. Buck Joined: 19 Jul 00 Posts: 3898 Credit: 1,158,042 RAC: 0
Quote: "How does one eliminate a file system and still have a functioning computer? Doesn't make sense to me. But what do I know." Jim, actually, you don't. Sorry, I was being a little sarcastic. But I am also still not doing well, so the reply was terse. In essence, all that happens, as others have stated, is that the file system is "hidden" by an abstraction layer. The intent is to make it "seem" better: conceptually, all information is now stored and accessed the same way. It is one of the holy grails of object-oriented design. Sadly, though the later generations of Microsoft OS are touted as being OO, they really are not truly OO. You can see that in the differences in the way things get handled in different areas of the system. These errors are minor and subtle, and I don't have a list of them for you to look at. But in the end, who cares, because we still don't really have much of a choice in the matter. Last point on LOBs: depending on the database, the base record has a pointer to the DB location where the LOB is stored. As Rom stated, this increases the number of I/O operations. The worst part is that you have forced a physical I/O... Even more telling, though the directories are large, the index searches for individual files are against a smaller index than the index search through the database records, which contains a "flat" list of all of the files known to the system. All of these points are why it is so hard to build a system of this complexity. Until you work with "real" data, there is really no way to know how the system will perform in the real world. Test simulations never find the operational problems, because those only develop when the system is operational. Which brings me back to my advice not to optimize until you are operational. Since BOINC went live, we have found many performance problems and worked them down as we could.
When Rom first started, he suggested changes to the queries used for the transitioner scans (if I recall correctly) that increased performance by at least a factor of 4.
Jim Baize Joined: 6 May 00 Posts: 758 Credit: 149,536 RAC: 0
Quote: "How does one eliminate a file system and still have a functioning computer? Doesn't make sense to me. But what do I know." Paul, thanks for the information. I was kinda thinking along these lines, but didn't know if there was some other method that I was unaware of. Sorry that I missed the sarcasm. You know how hard it is to hear the inflection of a person's voice through typed messages.
Chilean Joined: 6 Apr 03 Posts: 498 Credit: 3,200,504 RAC: 0
|
|
|
BarryAZ Joined: 1 Apr 01 Posts: 2580 Credit: 16,982,517 RAC: 0
|
|
|
Astro Joined: 16 Apr 02 Posts: 8026 Credit: 600,015 RAC: 0
Quote: "Hang in there Paul, I'm about to start chemo myself." I finished my radiation last September 3rd. Hang in there Barry. I'm praying for you. tony
Chilean Joined: 6 Apr 03 Posts: 498 Credit: 3,200,504 RAC: 0
Quote: "Hang in there Paul, I'm about to start chemo myself." I'll pray for you two as well... good luck :) They should really make grid.org (the cancer research project) available for BOINC; I would crunch for them non-stop.
|
|
BarryAZ Joined: 1 Apr 01 Posts: 2580 Credit: 16,982,517 RAC: 0
|
|
Paul D. Buck Joined: 19 Jul 00 Posts: 3898 Credit: 1,158,042 RAC: 0
Quote: "Hang in there Paul, I'm about to start chemo myself." Ugh. The good news is that this does not cost much. The bad news is that it is annoying to not be able to walk safely. Well, I am well enough now that I can get through the forums each day, but not much else yet. We'll see Monday... Is chemo still making you guys nauseated? I don't keep up with the bio stuff that much... and curiosity strikes. I hope it works in any case... Thankfully, though the fatality rate for depression is high, I don't think it is going to get me... but cancer, ugh... we had a friend die of pancreatic cancer...
Daniel Michel Joined: 2 Feb 04 Posts: 14925 Credit: 1,378,607 RAC: 6
My prayers and best wishes for Tony, Barry, and Paul... I hope to enjoy your company here for many years to come. PROUD TO BE TFFE!
©2026 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.