Lets help somehow...

Shaun Neff
Joined: 15 Jul 05
Posts: 157
Credit: 34,715
RAC: 0
United States
Message 153653 - Posted: 19 Aug 2005, 17:44:48 UTC - in response to Message 153302.  

We have to help SETI if we all want to really crunch signals! My head is empty right now... I mean jeez.. 1Mil of WU waiting for validation!

Any Ideas?


I actually sent them an email suggesting that Results waiting for validation be sent out to clients rather than new WUs. A couple thousand computers doing validation in this way would make the backlog all but disappear. And it would take some of the strain off the SETI servers.

Absence of evidence is not evidence of absence.

+250 Classic Seti
ID: 153653
Jerry Camden
Volunteer tester
Joined: 10 Feb 01
Posts: 21
Credit: 716,374
RAC: 0
United States
Message 153668 - Posted: 19 Aug 2005, 18:09:42 UTC - in response to Message 153505.  

A.k.a. BLOB ...

The point that is missed is that the database content is also written to disk. So, you still have the data written to disk. Now the difference is that you have the RDBMS doing the file searching. It does not mean that it is any more efficient. It just sounds like it will be faster.

Granted "Longhorn" was supposed to "eliminate" the file system ... Ha! ... Win95 was supposed to eliminate DOS too ...



LOB, AKA: BLOB (Binary Large OBject) and CLOB (Character Large OBject), a way of storing large semi-free format/length data.

No Paul, I certainly didn't miss the point that it is all still written to disk. In my experience, with the RDBMS I deal with, which handles a couple million transactions every day, accessing data within the DB has less overhead than accessing individual files. Keep in mind that all these little files are already linked to records in the DB!
Maybe you missed my point about all the DASD activity that occurs for every open, creation, and deletion of a file in the file system. Each directory in the path has to be searched to find the next part of the path and its pointer, until the pointer leads to a file. Then the security system has to validate the authority required to access the file. Then the file system has to provide pointers to all the segments of the file. These things take time and require DASD activity.
Also, again in my experience, backups and restores of individual files take orders of magnitude more time than the backup and restore of contiguous records in a well-performing RDBMS. What I don't have any experience with is the RDBMS in use in this project, MySQL I believe.
All of my comments are intended just as food for thought. If you already have experience with all the bits in use with this project and *know* that the file system is more efficient than the RDBMS, then I bow to your knowledge. If not, then I think it is unfair of you to say that "It just sounds like it will be faster."

Jerry
ID: 153668
Jerry Camden
Volunteer tester
Joined: 10 Feb 01
Posts: 21
Credit: 716,374
RAC: 0
United States
Message 153671 - Posted: 19 Aug 2005, 18:15:17 UTC - in response to Message 153597.  

A.k.a. BLOB ...

The point that is missed is that the database content is also written to disk. So, you still have the data written to disk. Now the difference is that you have the RDBMS doing the file searching. It does not mean that it is any more efficient. It just sounds like it will be faster.

Granted "Longhorn" was supposed to "eliminate" the file system ... Ha! ... Win95 was supposed to eliminate DOS too ...


Actually, Paul....

The idea of storing them as a BLOB isn't a bad idea because the problem is simply dealing with a directory that is full of files, not the files themselves.

So, if they were stored as a BLOB instead, they'd all be in one file.

But, as Matt pointed out, any proposal to make a change, BLOBs, a bigger directory fanout, multiple results per file, or any other idea means making a change to all of the programs that use those files (transitioners, validators, assimilators, etc.).

So, the project has to be taken down while those components are altered, then all of the files have to be moved to the new format.

The conversion process would be subject to all of the same struggles we're having right now, and validation would stop while it's being done.

Yeah, you could do something else, like put in code to look in the new place, and then the old place. I think it'd be a good idea to store part of the path in the database so that the directory assignments happened in one spot (and the fanout could change in just one spot).
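
To make the "new place, then old place" idea concrete, here is a minimal sketch in Python. It is only an illustration, not BOINC's code: the names `FANOUT`, `fanout_path`, and `find_result`, the use of an MD5 hash to pick a bucket, and the directory layout are all assumptions made for the example. The point is that the fan-out is defined in one spot, and readers fall back to the old flat directory while files are still being migrated.

```python
import hashlib
from pathlib import Path
from typing import Optional

# Hypothetical setting: the fan-out is defined once, so it can change in one spot.
FANOUT = 1024


def fanout_path(root: Path, filename: str, fanout: int = FANOUT) -> Path:
    """Map a file name to root/<bucket>/<filename> using a stable hash."""
    digest = hashlib.md5(filename.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % fanout
    return root / format(bucket, "x") / filename


def find_result(new_root: Path, old_root: Path, filename: str) -> Optional[Path]:
    """Look in the new fanned-out location first, then in the old flat directory."""
    candidate = fanout_path(new_root, filename)
    if candidate.exists():
        return candidate
    legacy = old_root / filename
    if legacy.exists():
        return legacy
    return None  # not found in either place
```

With something like this, every program that touches the files would still have to be changed to call the lookup, which is the expense being pointed out above; the sketch only shows that the migration itself would not have to happen all at once.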

But most of these just move the problem around, and at great expense.

So, it makes the most sense to stay with the current course and try to clean up the directories before something more dramatic is tried.


Yes, a change to use the DB instead of millions of small files is not a quick fix. I agree that we need to continue the current cleanup attempts. I'm just afraid that the file system will never handle the volume of activity that will be placed on it.

Jerry
ID: 153671
Rom Walton (BOINC)
Volunteer tester
Joined: 28 Apr 00
Posts: 579
Credit: 130,733
RAC: 0
United States
Message 153672 - Posted: 19 Aug 2005, 18:18:16 UTC

If you move the result data into the database you create a larger bottleneck at the database server. Instead of just worrying about the filesystem being slow, you now have to worry about the entire system slowing down, including the forums, stats, assimilator, file-deleter, db_purger, and transitioners.

BOINC needs the database to be fast and nimble, which means you keep as much stuff out of it as possible.

Adding result data to the database isn't even portable across projects.

----- Rom
BOINC Development Team, U.C. Berkeley
My Blog
ID: 153672
Jerry Camden
Volunteer tester
Joined: 10 Feb 01
Posts: 21
Credit: 716,374
RAC: 0
United States
Message 153754 - Posted: 19 Aug 2005, 21:44:51 UTC - in response to Message 153672.  
Last modified: 19 Aug 2005, 22:01:48 UTC

If you move the result data into the database you create a larger bottleneck at the database server. Instead of just worrying about the filesystem being slow, you now have to worry about the entire system slowing down, including the forums, stats, assimilator, file-deleter, db_purger, and transitioners.

BOINC needs the database to be fast and nimble, which means you keep as much stuff out of it as possible.

Adding result data to the database isn't even portable across projects.


I guess I won't have to be told three times that it's a dumb idea. I'll quit at twice and shut up.


[edit]
Oh, the hell I will....
Rom, BLOBs can store anything a file can. The data in the files is specific to the project, so the content isn't portable across projects now. I can't see that the files are any more portable than any other storage method. But I will concede that you know more about the limitations of your software and hardware than I would. Good luck.
Now, I'll shut up. Or at least try very hard to. ;-)
[/edit]
ID: 153754
Anigel
Volunteer tester
Joined: 5 Dec 99
Posts: 101
Credit: 643,544
RAC: 0
United Kingdom
Message 153766 - Posted: 19 Aug 2005, 22:17:20 UTC - in response to Message 153303.  


Validation@Home



Ah but how would you validate the validation@home results?

Would you then have to have a validation@home validation@home
to validate all the results from the validation@home project ;P
*grins, ducks and runs*
Part of Teamseti
For SetiBoinc status graphs visit Teamseti status graphs
ID: 153766
1mp0£173
Volunteer tester
Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 153793 - Posted: 19 Aug 2005, 23:36:31 UTC - in response to Message 153754.  


[edit]
Oh, the hell I will....
Rom, BLOBs can store anything a file can. The data in the files is specific to the project, so the content isn't portable across projects now. I can't see that the files are any more portable than any other storage method. But I will concede that you know more about the limitations of your software and hardware than I would. Good luck.
Now, I'll shut up. Or at least try very hard to. ;-)
[/edit]

"Can" and "should" are different things.

Moving the result "files" into blobs is moving the problem "lots of results" from the filesystem to the database.

That isn't necessarily good.
ID: 153793
Rom Walton (BOINC)
Volunteer tester
Joined: 28 Apr 00
Posts: 579
Credit: 130,733
RAC: 0
United States
Message 153799 - Posted: 20 Aug 2005, 0:05:07 UTC - in response to Message 153754.  


Oh, the hell I will....
Rom, BLOBs can store anything a file can. The data in the files is specific to the project, so the content isn't portable across projects now. I can't see that the files are any more portable than any other storage method. But I will concede that you know more about the limitations of your software and hardware than I would. Good luck.
Now, I'll shut up. Or at least try very hard to. ;-)


Ah but BLOBs have the unfortunate side-effect of increasing the number of page reads the database needs to traverse to get the data. Not only do you have to find the row within the result set, but then you have to traverse the pointer in that row to a different page which is generally in a different part of the database altogether. BLOBs just kill perf.

If we ever get a chance to revisit the schema I would really like to move the BLOBs we already have into a separate table, move the fields involved with authentication out of the user table into their own table, and change a few fields from varchars to chars. Oh and when MySQL supports stored procedures I want to change BOINC's database layer so that it'll use them.
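
As a rough illustration of what "move the BLOBs into a separate table" could look like, here is a small sketch. It uses Python's built-in `sqlite3` module rather than MySQL so that it runs anywhere, and the table and column names (`result`, `result_blob`, `stderr_out`, `server_state`) are chosen for the example rather than taken from the real BOINC schema. The idea is that the frequently scanned table stays narrow, and the large data is only touched when someone explicitly asks for it.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with the BLOB split out of the main table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE result (
            id           INTEGER PRIMARY KEY,
            name         TEXT    NOT NULL,
            server_state INTEGER NOT NULL
        );
        -- Large, rarely read data lives in its own table (illustrative names).
        CREATE TABLE result_blob (
            result_id  INTEGER PRIMARY KEY REFERENCES result(id),
            stderr_out BLOB NOT NULL
        );
        """
    )
    conn.execute("INSERT INTO result VALUES (1, 'wu_0001_0', 5)")
    conn.execute("INSERT INTO result_blob VALUES (1, ?)", (b"debug output ...",))
    conn.commit()
    return conn


def results_in_state(conn: sqlite3.Connection, state: int) -> list:
    """Frequent query: reads only the narrow table, never the BLOB pages."""
    rows = conn.execute(
        "SELECT name FROM result WHERE server_state = ? ORDER BY id", (state,)
    )
    return [row[0] for row in rows]


def stderr_for(conn: sqlite3.Connection, result_id: int):
    """Rare query: fetch the BLOB only when it is actually wanted."""
    row = conn.execute(
        "SELECT stderr_out FROM result_blob WHERE result_id = ?", (result_id,)
    ).fetchone()
    return row[0] if row else None
```

How much this helps depends on the storage engine and on how often the BLOB is actually read, so treat it as a picture of the schema change, not a measurement.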

MS did a study while I was still working for the MSN org about the cost of maintaining data within a database vs. a flat file. With the organization structure and server infrastructure of the day, it came down to something like $30 per megabyte to store data in the database vs. $5 per megabyte to store it on the file system. The bottom line for database design became: unless you are going to need to write queries against it, data should not be in the database.

The result data is only useful to the scientists after it has been validated and assimilated, so the validation is handled via flat files and the assimilator stores the results in the master science database in a way that can be queried.

----- Rom
BOINC Development Team, U.C. Berkeley
My Blog
ID: 153799
John McLeod VII
Volunteer developer
Volunteer tester
Joined: 15 Jul 99
Posts: 24806
Credit: 790,712
RAC: 0
United States
Message 153843 - Posted: 20 Aug 2005, 2:02:17 UTC - in response to Message 153799.  

If we ever get a chance to revisit the schema I would really like to move the BLOBs we already have into a separate table, move the fields involved with authentication out of the user table into their own table, and change a few fields from varchars to chars. Oh and when MySQL supports stored procedures I want to change BOINC's database layer so that it'll use them.

Be very careful what you move from varchar to char. If the strings can be of varied length, then they had better be varchar, as char means that the string is exactly X characters long (been there, done that, had that problem).


BOINC WIKI
ID: 153843
Rom Walton (BOINC)
Volunteer tester
Joined: 28 Apr 00
Posts: 579
Credit: 130,733
RAC: 0
United States
Message 153857 - Posted: 20 Aug 2005, 2:22:10 UTC - in response to Message 153843.  


Be very careful what you move from varchar to char. If the strings can be of varied length, then they had better be varchar, as char means that the string is exactly X characters long (been there, done that, had that problem).


Sure, you have to trim the result. The primary goal is to get all the columns for a table to be fixed size. With most database engines, queries execute much faster with fixed column sizes.

----- Rom
BOINC Development Team, U.C. Berkeley
My Blog
ID: 153857
Janus
Volunteer developer
Joined: 4 Dec 01
Posts: 376
Credit: 967,976
RAC: 0
Denmark
Message 153953 - Posted: 20 Aug 2005, 8:31:27 UTC
Last modified: 20 Aug 2005, 8:34:54 UTC

Moving stuff out

Most certainly a good idea - and something that perhaps should be considered at some point. Already when the extra forum features were made we started moving stuff out of the user table to improve performance (which reminds me - were those columns ever deleted?). Keeping all the stderr out data in the result table really makes that table pretty large - especially in cases where a lot of debug info is saved etc.

Fixed size rows

The speed increase from using fixed size rows is pretty impressive - it also reduces the number of I/O reads needed to find a particular row, and there's no overhead due to cluttering of rows. It usually uses more disk space though, but it would be well worth it.
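
A toy sketch of why fixed size rows are cheap to locate, and of the padding and trimming that come with char columns. This is not how MySQL stores rows internally; the 4-byte id plus CHAR(16) layout and the names `ROW`, `pack_row`, and `read_row` are assumptions made up for the illustration.

```python
import struct

# Hypothetical row layout: 4-byte unsigned id followed by a CHAR(16) name.
# "<" means little-endian with no alignment padding, so every row is 20 bytes.
ROW = struct.Struct("<I16s")


def pack_row(row_id: int, name: str) -> bytes:
    """Encode one row, space-padding the name to the full CHAR(16) width."""
    raw = name.encode("ascii")
    if len(raw) > 16:
        raise ValueError("name longer than CHAR(16)")
    return ROW.pack(row_id, raw.ljust(16, b" "))


def read_row(table: bytes, index: int):
    """Row i starts at i * ROW.size, so it can be read without scanning."""
    row_id, raw = ROW.unpack_from(table, index * ROW.size)
    # Trimming the padding also drops any trailing spaces the value really had.
    return row_id, raw.decode("ascii").rstrip(" ")
```

For example, with `table = b"".join(pack_row(i, n) for i, n in enumerate(["a", "bob", "carol"]))`, calling `read_row(table, 2)` jumps straight to byte offset 40. The space cost is visible too: three short names occupy 60 bytes, and the trim step is where the varchar-to-char caution above applies.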

Anyways, the DB isn't an issue right now, but I agree that no more BLOBS should be moved into existing tables.
ID: 153953
Paul D. Buck
Volunteer tester
Joined: 19 Jul 00
Posts: 3898
Credit: 1,158,042
RAC: 0
United States
Message 154047 - Posted: 20 Aug 2005, 15:34:41 UTC - in response to Message 153547.  

How does one eliminate a file system and still have a functioning computer? Doesn't make sense to me. But what do I know.

Jim,

Actually, you don't. Sorry, I was being a little sarcastic. But, I am also not doing well still, so, the reply was terse.

In essence, all that happens, as others have stated, is that the file system is "hidden" by an abstraction layer. The intent is to make it "seem" better.

Conceptually all information is stored and accessed now the same way. It is one of the holy grails of object oriented designs. Sadly, though the later generations of Microsoft OS are touted as being OO, they really are not truly OO. You can see that by looking at the differences in the way things get handled in different areas of the system. These errors are minor and subtle, and I don't have a list of them for you to look at. But it is also a case of "who cares", because we still really don't have much of a choice in the matter.

Last point on LOB: depending on the database, the base record has a pointer to the DB location where the LOB is located. As Rom stated, this increases the number of I/O operations. The worst part is that you have forced a physical I/O ... even more telling, though the directories are large, the index searches for individual files are against a smaller index than the index search through the database records, which contains a "flat" list of all of the files known to the system.

All of these points are why it is so hard to build a system of this complexity. Until you work with "real" data, there is really no way to know how the system will perform in the real world. Test simulations never do find the operational problems because they only develop when the system is operational. Which is back to my statement to not optimize until you are operational. As BOINC went live we have found many performance problems and worked them down as we could. When Rom first started he suggested changes to the queries used to do the transitioner scans (if I recall correctly), which increased the performance by at least a factor of 4.
ID: 154047
Jim Baize
Volunteer tester
Joined: 6 May 00
Posts: 758
Credit: 149,536
RAC: 0
United States
Message 154234 - Posted: 20 Aug 2005, 22:35:03 UTC - in response to Message 154047.  

How does one eliminate a file system and still have a functioning computer? Doesn't make sense to me. But what do I know.

Jim,

Actually, you don't. Sorry, I was being a little sarcastic. But, I am also not doing well still, so, the reply was terse.

In essence, all that happens, as others have stated, is that the file system is "hidden" by an abstraction layer. The intent is to make it "seem" better.


Paul, Thanks for the information. I was kinda thinking along these lines, but didn't know if there was some other method that I was unaware of.

Sorry that I missed the sarcasm. You know how hard it is to hear the inflection of a person's voice through typed messages.
ID: 154234
Chilean
Volunteer tester
Joined: 6 Apr 03
Posts: 498
Credit: 3,200,504
RAC: 0
Chile
Message 154307 - Posted: 21 Aug 2005, 2:27:03 UTC

How about donating for better servers?? SETI should allow people to donate for certain things. (Example: To donate for servers click HERE, To donate for....)
ID: 154307
BarryAZ
Joined: 1 Apr 01
Posts: 2580
Credit: 16,982,517
RAC: 0
United States
Message 154327 - Posted: 21 Aug 2005, 2:59:46 UTC - in response to Message 154047.  

Hang in there Paul, I'm about to start chemo myself.




Actually, you don't. Sorry, I was being a little sarcastic. But, I am also not doing well still, so, the reply was terse.



ID: 154327
Astro
Volunteer tester
Joined: 16 Apr 02
Posts: 8026
Credit: 600,015
RAC: 0
Message 154332 - Posted: 21 Aug 2005, 3:15:37 UTC - in response to Message 154327.  

Hang in there Paul, I'm about to start chemo myself.




Actually, you don't. Sorry, I was being a little sarcastic. But, I am also not doing well still, so, the reply was terse.



I finished my radiation last September 3rd. Hang in there Barry. I'm praying for you.

tony
ID: 154332
Chilean
Volunteer tester
Joined: 6 Apr 03
Posts: 498
Credit: 3,200,504
RAC: 0
Chile
Message 154345 - Posted: 21 Aug 2005, 3:38:42 UTC - in response to Message 154332.  
Last modified: 21 Aug 2005, 3:40:37 UTC

Hang in there Paul, I'm about to start chemo myself.




Actually, you don't. Sorry, I was being a little sarcastic. But, I am also not doing well still, so, the reply was terse.



I finished my radiation last September 3rd. Hang in there Barry. I'm praying for you.

tony

I'll pray 4 you two as well... good luck :) They should really make grid.org (the cancer research) available for BOINC, I would crunch 4 them non-stop.
ID: 154345
BarryAZ
Joined: 1 Apr 01
Posts: 2580
Credit: 16,982,517
RAC: 0
United States
Message 154400 - Posted: 21 Aug 2005, 5:31:47 UTC - in response to Message 154332.  

Thanks for the thoughts -- for me it is chemo -- and should be once every two weeks for the next six months.


I finished my radiation last September 3rd. Hang in there Barry. I'm praying for you.

tony


ID: 154400
Paul D. Buck
Volunteer tester
Joined: 19 Jul 00
Posts: 3898
Credit: 1,158,042
RAC: 0
United States
Message 154557 - Posted: 21 Aug 2005, 14:56:21 UTC - in response to Message 154327.  

Hang in there Paul, I'm about to start chemo myself.

Ugh.

The good news is that this does not cost much. The bad news is that it is annoying to not be able to walk safely. Well, I am better enough that I can get through the forums each day. But not much else yet. We'll see Monday...

Is chemo still making you guys nauseated? I don't keep up with the bio stuff that much... and curiosity strikes.

I hope it works in any case ... thankfully though the fatality rate for depression is high, I don't think it is going to get me ... but cancer, ugh ... we had a friend die of pancreatic cancer ...
ID: 154557
Daniel Michel
Volunteer tester
Joined: 2 Feb 04
Posts: 14925
Credit: 1,378,607
RAC: 6
United States
Message 154560 - Posted: 21 Aug 2005, 15:01:31 UTC
Last modified: 21 Aug 2005, 15:03:43 UTC

My prayers and best wishes for Tony, Barry, and Paul... I hope to enjoy your company here for many years to come.

PROUD TO BE TFFE!
ID: 154560


 
©2026 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.