Match Game (Apr 30 2007)

Matt Lebofsky Volunteer moderator Project administrator Project developer Project scientist Send message Joined: 1 Mar 99 Posts: 1444 Credit: 957,058 RAC: 0	Message 556984 - Posted: 30 Apr 2007, 21:34:16 UTC
Okay.. here's a better explanation to hopefully answer the question: why is it so hard to tie users with their processed workunits?
First issue is that we are using the generalized BOINC backend. Projects using BOINC may not necessarily care who does which workunit. So this logic (which would require database overhead, including extra tables or fields in the schema) isn't hard-coded into the server backend.
It is also up to the project to store their final BOINC products however they wish. In our case, we use an Informix database on a separate server. We require the database be as streamlined as possible due to performance constraints. So only science is allowed in the science db - the BOINC user ids have nothing to do with the eventual scientific analysis. If we put the user ids in the science database, this would increase disk usage and I/O (every completed result would require an additional table update, and an index update, on top of whatever is needed to do the actual selects on this user id data). So from a resource management and administrative cleanliness perspective, this isn't a good idea.
SETI@home is also somewhat unique in that we process large numbers of results/workunits very quickly. We can't keep growing the result/workunit tables in the BOINC database as the table sizes would expand out of memory bounds and basically grind the database engine to a halt. Most other projects do a small fraction of the transactions we do, so this isn't a problem for them. We are forced to run a BOINC utility db_purge which removes completed results/workunits from the BOINC database once the scientific data has been assimilated, but with a buffer of N days so users can see recently assimilated results on their personal account pages.
The db_purge program writes the result and workunit data safely to XML flat files before deleting them outright. The weekly "database reorgs" are necessary as this constant random access deleting creates significant disk fragmentation in the tables and so we need to regularly compress them.
What the BOINC backend does provide is a single floating point field in the workunit table called "opaque" for use as the specific projects see fit. In our case, the project-specific workunit creator (the splitter) creates a workunit in the science database and places its id in the opaque field in the BOINC database. This opaque data ends up in the aforementioned purged XML files. Until recently these files were collecting on a giant RAID filesystem and that was it. Only last week I wrote a script that parses the XML and finds a result id/user id pair in the files, ties that result id to the BOINC workunit id, and then via the opaque value ties that to the science database workunit id. Not very efficient, but given the architecture and hardware resources, this is the best we could do.
The game plan now is to use this script to populate a completely separate third database. As well, we'll retrofit the validator and add some logic to populate this database on the fly. It is only recently we had systems powerful enough to handle this extra load. It is still questionable whether or not this will clobber the system, or if the ensuing queries on this new data will clobber the system.
Adding to the complication is that we do redundant analysis of our workunits - also not a requirement for every BOINC project. Because of that, we have multiple results for each workunit, and an arbitrary number at that (anywhere from 1 to N results for any particular workunit, where N is the maximum level of allowable redundancy during the history of the whole project).
If we never did anything redundantly, we could have used the opaque field containing the remote science database's workunit id and left it at that. But since in our case any unique workunit can be tied to non-unique users/results, we had to create this new database which is really a simple table called "wuhash" which contains a workunit id, a user id, and a uniqueness constraint on the pair.
I doubt this all makes things perfectly clear, but maybe it helps.
- Matt
-- BOINC/SETI@home network/web/science/development person -- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude ID: 556984 ·
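
The matching steps described in the post above (result id/user id pair, then BOINC workunit id, then the opaque value, then a unique pair in "wuhash") can be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual script: the XML element names, the sample data, and the column names `wuid` and `userid` are hypothetical, and an in-memory SQLite database stands in for whatever engine the real third database uses.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical stand-in for one purged archive file. The real db_purge
# layout and element names are not given in the thread.
SAMPLE_XML = """
<archive>
  <workunit><id>10</id><opaque>5001.0</opaque></workunit>
  <workunit><id>11</id><opaque>5002.0</opaque></workunit>
  <result><id>100</id><workunitid>10</workunitid><userid>7</userid></result>
  <result><id>101</id><workunitid>10</workunitid><userid>8</userid></result>
  <result><id>102</id><workunitid>10</workunitid><userid>7</userid></result>
  <result><id>103</id><workunitid>11</workunitid><userid>7</userid></result>
</archive>
"""


def science_wu_user_pairs(xml_text):
    """Yield (science_wuid, userid) for every archived result."""
    root = ET.fromstring(xml_text.strip())
    # Map BOINC workunit id -> science database workunit id (the opaque value).
    opaque_by_wu = {
        int(wu.findtext("id")): int(float(wu.findtext("opaque")))
        for wu in root.iter("workunit")
    }
    for res in root.iter("result"):
        boinc_wuid = int(res.findtext("workunitid"))
        if boinc_wuid in opaque_by_wu:
            yield opaque_by_wu[boinc_wuid], int(res.findtext("userid"))


def load_wuhash(conn, pairs):
    """Insert pairs into wuhash; repeats are dropped by the uniqueness constraint."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS wuhash ("
        " wuid INTEGER NOT NULL,"
        " userid INTEGER NOT NULL,"
        " UNIQUE (wuid, userid))"
    )
    conn.executemany(
        "INSERT OR IGNORE INTO wuhash (wuid, userid) VALUES (?, ?)", pairs
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
load_wuhash(conn, science_wu_user_pairs(SAMPLE_XML))
rows = conn.execute(
    "SELECT wuid, userid FROM wuhash ORDER BY wuid, userid"
).fetchall()
print(rows)
```

In the sample, user 7 returned two results for the same workunit, so the uniqueness constraint keeps a single (workunit, user) row for that pair while still allowing several users per workunit.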

Haos.PL Volunteer tester Send message Joined: 18 Mar 04 Posts: 63 Credit: 3,268,546 RAC: 0	Message 557002 - Posted: 30 Apr 2007, 22:00:26 UTC At least for me:) We're gonna have a fourth database, after main, replica and science, and it is gonna store data on a permanent basis (like science), not a temporary one (like BOINC main and replica). Its sole purpose is gonna be to tie all the data in science to the user that krunched it. Great job guys! It is really nice to see that you care for your small and humble krunchers:) I can't wait till we are able to access it. ID: 557002 ·

Matt Lebofsky Volunteer moderator Project administrator Project developer Project scientist Send message Joined: 1 Mar 99 Posts: 1444 Credit: 957,058 RAC: 0	Message 557025 - Posted: 30 Apr 2007, 22:47:03 UTC Addendum: After posting all that me, Jeff, and Dave had a chat and we decided after all to put the userid/wuid code in the BOINC backend framework after all (as some kind of feature that is turned off by default). We'll worry about database resources and all that once it is working. So ignore everything I said. Ha ha. - Matt -- BOINC/SETI@home network/web/science/development person -- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude ID: 557025 ·

Redshift Send message Joined: 3 Apr 99 Posts: 122 Credit: 1,244,536 RAC: 0	Message 557128 - Posted: 1 May 2007, 3:38:38 UTC - in response to Message 557025. ...decided after all to put the userid/wuid code in the BOINC backend framework after all... Sounds like this may be easier to manage, over time, than a new database instance. And maybe there will eventually be another project that will use the feature. ID: 557128 ·

Odysseus Volunteer tester Send message Joined: 26 Jul 99 Posts: 1808 Credit: 6,701,347 RAC: 6	Message 557269 - Posted: 1 May 2007, 7:53:12 UTC - in response to Message 557128. And maybe there will eventually be another project that will use the feature. SzTAKI published the names of the accounts under which the most significant results from their first run were crunched; I don't know how they collected the data (of course there would have been orders of magnitude less than there is here), but I imagine a BOINC facility for doing so would have saved them some work. By the same token, some project teams may have already considered the idea favourably but found it infeasible to develop their own solutions. ID: 557269 ·

ML1 Volunteer moderator Volunteer tester Send message Joined: 25 Nov 01 Posts: 20289 Credit: 7,508,002 RAC: 20	Message 557385 - Posted: 1 May 2007, 13:05:16 UTC - in response to Message 556984. What the BOINC backend does provide is a single floating point field in the workunit table called "opaque" for use as the specific projects see fit... OK, a minor question: Any special reason for offering a floating point table as a general use 'opaque' rather than any other type such as say integer? I doubt this all makes things perfectly clear, but maybe it helps. Quite a nice and readable summary, thanks. And subsequently good that you're now getting better hooks into Boinc! Regards, Martin See new freedom: Mageia Linux Take a look for yourself: Linux Format The Future is what We all make IT (GPLv3) ID: 557385 ·
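
The thread does not answer the question above, and it does not say whether "opaque" is a single or double precision field. As general background only, the sketch below shows how far an integer id survives a round trip through an IEEE 754 float of either width; the helper names are made up for the illustration.

```python
import struct


def roundtrip_float32(n):
    """Store integer n in a 4-byte float and read it back as an integer."""
    return int(struct.unpack("f", struct.pack("f", float(n)))[0])


def roundtrip_float64(n):
    """Store integer n in an 8-byte float and read it back as an integer."""
    return int(struct.unpack("d", struct.pack("d", float(n)))[0])


# Ids at and just past the exact-integer limit of each width.
for n in (2**24, 2**24 + 1, 2**53, 2**53 + 1):
    print(n, roundtrip_float32(n) == n, roundtrip_float64(n) == n)
```

Every integer up to 2**24 survives a 4-byte float and every integer up to 2**53 survives an 8-byte float, so a workunit id stored this way stays exact only while it is below the limit for the field's width.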

Dr. C.E.T.I. Send message Joined: 29 Feb 00 Posts: 16019 Credit: 794,685 RAC: 0	Message 557407 - Posted: 1 May 2007, 13:36:32 UTC - in response to Message 557025. Addendum: After posting all that me, Jeff, and Dave had a chat and we decided after all to put the userid/wuid code in the BOINC backend framework after all (as some kind of feature that is turned off by default). We'll worry about database resources and all that once it is working. So ignore everything I said. Ha ha. - Matt cute Sir!!! . . . nice work though @ Berkeley from All of You . . . BOINC Wiki . . . Science Status Page . . . ID: 557407 ·

