Heads Up: Quorum Change


Matt Lebofsky
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 1 Mar 99
Posts: 1444
Credit: 957,058
RAC: 0
United States
Message 543372 - Posted: 9 Apr 2007, 22:29:09 UTC

Hello, generous donors of CPU time.

As we ramp down the current data analysis to make way for the hot, fresh multibeam data, we are going to change the quorums for result validation very soon. Perhaps tomorrow. Currently we generate four results per workunit, though three are enough for validation. We're changing this to three and two.
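
In BOINC terms these knobs are (if I recall the field names right) the workunit's target_nresults (how many copies go out) and min_quorum (how many matching results validation needs). A toy sketch of how they interact, in Python rather than the actual server code:

    # Toy sketch (not the actual BOINC server code) of replication vs. quorum.
    from collections import Counter

    def validate(results, min_quorum):
        """Return the canonical value once min_quorum results agree, else None."""
        value, count = Counter(results).most_common(1)[0]
        return value if count >= min_quorum else None

    old = dict(target_nresults=4, min_quorum=3)   # current policy
    new = dict(target_nresults=3, min_quorum=2)   # the change described above

    # Under the new policy a workunit is done after two agreeing results,
    # so each workunit consumes fewer donated CPU-hours overall.
    print(validate([42, 42, 41], new["min_quorum"]))   # -> 42 (validated)
    print(validate([42, 41, 40], new["min_quorum"]))   # -> None (no quorum yet)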

This means less redundancy, which has the wonderful side effect of consuming less of the world's power to do our data analysis. However, be warned that this may cause us to run out of work to process from time to time over the coming weeks, especially if we lower the quorum levels even more (two and two).

There's concern that "no work available" will cause us to lose valuable participants. Maybe it won't. In any case, hopefully we can get the multibeam data going before the well drains completely.

- Matt
-- BOINC/SETI@home network/web/science/development person
-- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude
ID: 543372
Labbie
Joined: 19 Jun 06
Posts: 4083
Credit: 5,930,102
RAC: 0
United States
Message 543378 - Posted: 9 Apr 2007, 22:34:03 UTC

Good news Matt. Thanks.
ID: 543378
Misfit
Volunteer tester
Joined: 21 Jun 01
Posts: 21804
Credit: 2,815,091
RAC: 0
United States
Message 543551 - Posted: 10 Apr 2007, 3:05:55 UTC - in response to Message 543372.  

Hello, generous donors of CPU time. - Matt

Never let it be said again we aren't recognized. :) (Unless I say it.)
me@rescam.org
ID: 543551
Byron Leigh Hatch @ team Carl Sagan
Volunteer tester
Joined: 5 Jul 99
Posts: 4548
Credit: 35,667,570
RAC: 4
Canada
Message 543586 - Posted: 10 Apr 2007, 4:37:42 UTC

ID: 543586
W-K 666 Project Donor
Volunteer tester

Joined: 18 May 99
Posts: 19048
Credit: 40,757,560
RAC: 67
United Kingdom
Message 543611 - Posted: 10 Apr 2007, 5:37:27 UTC

Thanks Matt,
I think we had better think carefully about the possibility of running short of work, as the multibeam data on my C2D and Pentium M reduce crunch time per task by about 70% on average. If this is true across all CPU/OS platforms, then reducing replication to 3 will mean the number of tasks completed approximately doubles.

And since Enhanced came online, most new CPUs are dual-core, so I hope the new servers can cope better with the database when more than a million tasks a day is once more upon us.

Andy
ID: 543611
Clyde C. Phillips, III

Joined: 2 Aug 00
Posts: 1851
Credit: 5,955,047
RAC: 0
United States
Message 543801 - Posted: 10 Apr 2007, 15:51:17 UTC

Thanks, Matt, for the news. Less redundancy means more science done with the same amount of computation. So ALFA data crunching is just around the corner? Einstein will be proud to get my cycles if SETI data runs out.
ID: 543801
PhonAcq

Joined: 14 Apr 01
Posts: 1656
Credit: 30,658,217
RAC: 1
United States
Message 543818 - Posted: 10 Apr 2007, 16:35:12 UTC

This is a good opportunity for me to get on my "change control soapbox". Would someone give a good (technical, quantitative) reason for the change? Would someone give a good (technical, quantitative) analysis of the risk involved? It just seems like good project management to go to the effort to engineer the change (and good PR to publish the results).
ID: 543818
Matt Lebofsky
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 1 Mar 99
Posts: 1444
Credit: 957,058
RAC: 0
United States
Message 543844 - Posted: 10 Apr 2007, 21:11:56 UTC

Reducing quorum means less redundancy, which in turn means we need to create more work to keep up with "demand." In theory, we still have a bunch of "standard" SETI@home data to work with, so we shouldn't run out... but it's on DLT IV tapes, which are slow to read, and we don't have a carousel or robot to feed the drive overnight or on weekends. Currently I can throw in one or two tapes a day (four days a week), and that's just enough. Reducing quorum to 3 will increase splitter demand by 25%. And we may soon reduce it to 2, meaning splitter demand will increase by 50%. We can't feed our (only) DLT drive fast enough. Another DLT drive won't help: neither our network nor the file server can handle the bandwidth. Plus we're very close to being finished with this data.

The tapes hold 35GB of data. The new multibeam data, however, are on 500GB/750GB hard drives. And we have two drive bays at a time available for reading. So we can keep a lot of data online without human intervention and feed it to the splitters over long weekends, etc. So running out of work is less of a worry, even with reduced redundancy.
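
Rough arithmetic with the numbers above (the feed rate is my approximation):

    # Back-of-envelope: how much splitter fodder one drive holds vs. tapes.
    TAPE_GB = 35
    DRIVE_GB = 750                     # the larger of the 500GB/750GB drives
    tapes_per_day = 2 * 4 / 7          # "one or two tapes a day, four days a week"

    tapes_per_drive = DRIVE_GB / TAPE_GB
    days_per_drive = tapes_per_drive / tapes_per_day
    print(f"{tapes_per_drive:.0f} tapes' worth per drive, ~{days_per_drive:.0f} days of work")
    # -> about 21 tapes' worth per drive, roughly 19 days at the old feed rate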

However! If we ever "catch up" and we have a high demand for work to analyze, we will most certainly run out of work to send. This is obvious, but has never been the case, so our user base might be annoyed by this even though this is a happy problem. We'll cross that bridge, etc.

- Matt

-- BOINC/SETI@home network/web/science/development person
-- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude
ID: 543844
PhonAcq

Joined: 14 Apr 01
Posts: 1656
Credit: 30,658,217
RAC: 1
United States
Message 543882 - Posted: 10 Apr 2007, 22:06:07 UTC
Last modified: 10 Apr 2007, 22:08:42 UTC

Thanks, Matt; this is enlightening. You have 1 tape drive that feeds the splitters. If the downstream demand increases due to reduced redundancy then someone needs to feed the drive at an increasing rate, which will lead to gaps during which the drive is dormant. You also have 'a bunch' of tapes left to analyze. So reduced redundancy will increase their processing rate, but potentially expose the project to gaps in WU availability.

But why is the redundancy criterion being modified? The premise since Classic is that SETI needed a high level of redundancy to be believed. (I seem to remember questioning this premise a couple of years ago and getting blasted and later ignored by the community.) What model tells us that we can live with 3 or 2 in a quorum? The risk to the project is false positives as well as missed positives. (I am not aware of what kind of probability analyses have been performed on this topic.)

A secondary question is how long it would take, with the current redundancy criterion, to finish processing the 'backlog' of data tapes, assuming the current computing capacity provided by the volunteers? That is, what is the aggregate number of WUs on the tapes, how many have been processed, and how many can 'we' process per day? I realize that the tapes are being re-analyzed with more stringent criteria, leading to either more WUs or more compute time, or both.

Finally, is it possible that the new (multibeam) data is such an improvement in S/N (or similar criteria) that we risk wasting resources looking at the old tape data? Is this the driving factor for the change ("hurry up and get to the new data")? That makes sense, but it does seem risky to modify the redundancy criteria, unless we have been wrong all along in requiring a quorum of 4. In this case, I suggest making a clean switch as soon as possible, rather than having a bunch of 'half'-analyzed datasets to worry about.
ID: 543882
Matt Lebofsky
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 1 Mar 99
Posts: 1444
Credit: 957,058
RAC: 0
United States
Message 543912 - Posted: 10 Apr 2007, 23:11:41 UTC

I forget the exact numbers, but if the redundancy level was 1, the reliability of the data was over 99% (i.e. less than 1% of the data was corrupt or bogus). Each level of redundancy above that is an improvement, but with diminishing returns.
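
As a toy model of the diminishing returns, assume each result is independently corrupt with probability e, about 1% per the figure above (real failure modes can correlate, so treat this as illustrative only):

    # Toy model: a quorum of n matching results is only fooled if all n
    # are corrupt *and* corrupt in the same way; even ignoring the "same
    # way" requirement, the bound shrinks geometrically with n.
    e = 0.01  # <1% of single results corrupt or bogus

    for quorum in (1, 2, 3):
        print(f"quorum {quorum}: corrupt-canonical probability <= {e ** quorum:.6f}")
    # quorum 1: <= 0.010000
    # quorum 2: <= 0.000100
    # quorum 3: <= 0.000001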

The current data being analyzed by SETI@home has already been analyzed before, but with older versions of the client that did far less analysis (so workunits wouldn't take weeks to finish). There's a scientific reason to do so, but we're champing at the bit to get to the new data because it's "deeper" in both frequency space and in sensitivity. Plus RFI will be easier to reject because of the multiple beams, etc. So reducing redundancy will help us push through the remaining old data.

One drawback of less redundancy is more "jagged" credit for workunits done (since they won't be averaged over the claimed credit of many users). Over the long term this will average out, but users who pay close attention will get annoyed by a few data points that make it seem they're being "short changed." Some PR will be necessary to smooth that over.
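
To illustrate with toy numbers why the credit gets "jagged" (modeling the grant simply as the mean of the claimed credits in a quorum, which is cruder than what the validator actually does):

    # Smaller quorums average over fewer claims, so individual grants
    # swing further from the long-run value (stdev ~ spread / sqrt(quorum)).
    import random, statistics

    random.seed(1)

    def granted(quorum, fair=30.0, spread=8.0):
        claims = [random.gauss(fair, spread) for _ in range(quorum)]
        return statistics.mean(claims)

    for q in (4, 3, 2):
        grants = [granted(q) for _ in range(10_000)]
        print(f"quorum {q}: stdev of granted credit = {statistics.stdev(grants):.2f}")
    # quorum 4: ~4.0, quorum 3: ~4.6, quorum 2: ~5.7 -- noisier per-WU
    # grants, even though the long-term average stays fair.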

- Matt
-- BOINC/SETI@home network/web/science/development person
-- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude
ID: 543912
1mp0£173
Volunteer tester

Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 543917 - Posted: 10 Apr 2007, 23:18:32 UTC - in response to Message 543882.  

Thanks, Matt; this is enlightening. You have 1 tape drive that feeds the splitters. If the downstream demand increases due to reduced redundancy then someone needs to feed the drive at an increasing rate, which will lead to gaps during which the drive is dormant. You also have 'a bunch' of tapes left to analyze. So reduced redundancy will increase their processing rate, but potentially expose the project to gaps in WU availability.

Remember that the new data is being shipped from Arecibo on nice, fast SATA hard drives, not tape.

ID: 543917
ML1
Volunteer moderator
Volunteer tester

Joined: 25 Nov 01
Posts: 20265
Credit: 7,508,002
RAC: 20
United Kingdom
Message 543921 - Posted: 10 Apr 2007, 23:27:34 UTC - in response to Message 543882.  
Last modified: 10 Apr 2007, 23:30:57 UTC

Thanks, Matt; this is enlightening. You have 1 tape drive that feeds the splitters...

Ouch! No redundancy there for a failed drive or a jammed tape...

But why is the redundancy criterion being modified? The premise since Classic is that SETI needed a high level of redundancy to be believed. (I seem to remember questioning this premise a couple of years ago and getting blasted and later ignored by the community.) What model tells us that we can live with 3 or 2 in a quorum? The risk to the project is false positives as well as missed positives. (I am not aware of what kind of probability analyses have been performed on this topic.)

I guess the question is: what is the probability (possibility) of getting two similarly incorrect results that fool the quorum checking?

Could two similar systems, similarly overclocked or similarly overheated, return results that are incorrect in the same way?

A secondary question is how long it would take, with the current redundancy criterion, to finish processing the 'backlog' of data tapes, assuming the current computing capacity provided by the volunteers? That is, what is the aggregate number of WUs on the tapes, how many have been processed, and how many can 'we' process per day? I realize that the tapes are being re-analyzed with more stringent criteria, leading to either more WUs or more compute time, or both.

Finally, is it possible that the new (multibeam) data is such an improvement in S/N (or similar criteria) that we risk wasting resources looking at the old tape data? Is this the driving factor for the change ("hurry up and get to the new data")? That makes sense, but it does seem risky to modify the redundancy criteria, unless we have been wrong all along in requiring a quorum of 4. In this case, I suggest making a clean switch as soon as possible, rather than having a bunch of 'half'-analyzed datasets to worry about.


All very good questions.

I think that the strategy of improving the efficiency and holding recrunching of classic WUs in reserve is a good idea.

I must admit that I too wonder whether the quorum logic and the real world can work well enough together with a minimum quorum of two. But then again, the main risk is that of false positives, and for those, the few WUs concerned can be reanalysed for confirmation. The other risk is that of false negatives, and whether that can open the door for users to cheat in some way...


Happy crunchin',
Martin

See new freedom: Mageia Linux
Take a look for yourself: Linux Format
The Future is what We all make IT (GPLv3)
ID: 543921
Pappa
Volunteer tester
Joined: 9 Jan 00
Posts: 2562
Credit: 12,301,681
RAC: 0
United States
Message 544096 - Posted: 11 Apr 2007, 4:05:20 UTC - in response to Message 543818.  

PhonAcq

You touched a nerve! The SETI staff and many volunteers have spent great amounts of time, effort, and money to give SETI a hope of survival!

This has been the best conversation from the SETI staff, and the most identifiable progress, in years...

Somewhere you missed that multibeam provides more data than even 10X the users currently crunching could handle... And that's without mentioning that the change to the number of machines (quorum) has been in testing on SETI Beta for months... So now it's ready for one of the next steps in SETI's evolution!

So, you are welcome to donate the first of the new servers that will start near-time persistency checking, which will produce the scientific results from the stored results... or the required reloading of old tapes for verification... You obviously did not read one of Matt's posts saying they now have enough data to finally start comparing the results, to see if there is actually something there...

Yes, it has been a very long journey! I am a supporter of that journey! I know that SETI volunteers recently found between seven and eight-plus thousand dollars' worth of hardware to solve the SETI server issues with the current hardware... That work is still progressing...

* Where were you then?

* Why are you creating hate and discontent now?

* Or again!


Yes, I know that Matt answered parts of your questions... I still do not understand why you chose this point to drag up old cr@p! Obviously you may have some perceived axe to grind, but your credits show that you are happy with SETI (status quo) and have not left in spite of some of your past complaints!

Regards

Pappa

This is a good opportunity for me to get on my "change control soapbox". Would someone give a good (technical, quantitative) reason for the change? Would someone give a good (technical, quantitative) analysis of the risk involved? It just seems like good project management to go to the effort to engineer the change (and good PR to publish the results).


Please consider a Donation to the Seti Project.

ID: 544096
Josef W. Segur
Volunteer developer
Volunteer tester

Joined: 30 Oct 99
Posts: 4504
Credit: 1,414,761
RAC: 0
United States
Message 544144 - Posted: 11 Apr 2007, 5:15:08 UTC - in response to Message 543921.  

...
I guess the question is: what is the probability (possibility) of getting two similarly incorrect results that fool the quorum checking?


Small, but the same as it has always been. A canonical result is one of a pair that is "strongly similar". Any additional results which are at least "weakly similar" are also granted credit. The "strongly similar" criteria are tight enough to ensure that reliable results are entered in the Master Science database; the "weakly similar" criteria are loose enough that users on different platforms or CPU generations, which may occasionally not be in complete agreement, will still be rewarded.
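
Sketching that logic in toy Python (illustrative only; the real validator's strongly/weakly similar tests compare whole lists of signals against project-specific tolerances, not a single number as here):

    # Illustrative sketch of the validation scheme described above.
    def strongly_similar(a, b, tol=1e-5):
        return abs(a - b) <= tol      # stand-in for the tight criteria

    def weakly_similar(a, b, tol=1e-3):
        return abs(a - b) <= tol      # stand-in for the loose criteria

    def check_set(results):
        """Pick a canonical result: one of a pair that is strongly similar.
        Every result at least weakly similar to it is granted credit."""
        for i, a in enumerate(results):
            for b in results[i + 1:]:
                if strongly_similar(a, b):
                    credited = [r for r in results if weakly_similar(r, a)]
                    return a, credited
        return None, []               # no quorum yet; wait for more results

    canonical, credited = check_set([1.000001, 1.000002, 1.0004])
    # The third result differs slightly (a different platform, say) but is
    # weakly similar to the canonical one, so it is still granted credit.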

Could two similar systems, similarly overclocked or similarly overheated, return results that are incorrect in the same way?


Highly unlikely, but of course possible.
                                                                 Joe
ID: 544144
ML1
Volunteer moderator
Volunteer tester

Joined: 25 Nov 01
Posts: 20265
Credit: 7,508,002
RAC: 20
United Kingdom
Message 544265 - Posted: 11 Apr 2007, 14:21:17 UTC - in response to Message 544096.  
Last modified: 11 Apr 2007, 14:24:37 UTC

This has been the best conversation from the Seti Staff and identifiable progress in years...

And excellent motivation all round, too.

Somewhere you missed that multibeam provides more data than even 10X the users currently crunching could handle... And that's without mentioning that the change to the number of machines (quorum) has been in testing on SETI Beta for months... So now it's ready for one of the next steps in SETI's evolution!...

The "x10" data throughput gives very good impetus to look at speeding up the analysis, provided...

This is a good opportunity for me to get on my "change control soapbox". Would someone give a good (technical, quantitative) reason for the change? ... (technical, quantitative) analysis of the risk involved?...

... that the Science isn't diluted.

Given that WU verification is still done by Berkeley themselves for those 'interesting' WUs found by us, and that the real 'proof' is in subsequently finding a consistent ET beacon, I consider this change 'affordable' and all to the good for progress. Especially so if it means doing something useful and timely with the new flood of data.

It would still be good to see some statistics and/or numbers for what the quorum change 'means' for the 'reliability' of the Science results.


Congratulations to Matt, Eric(?), Jeff(?) and all for pushing this far to get the ALFA data onstream! I'm sure we're all looking forward to seeing the new data!

Happy crunchin',
Martin

See new freedom: Mageia Linux
Take a look for yourself: Linux Format
The Future is what We all make IT (GPLv3)
ID: 544265
ML1
Volunteer moderator
Volunteer tester

Joined: 25 Nov 01
Posts: 20265
Credit: 7,508,002
RAC: 20
United Kingdom
Message 544269 - Posted: 11 Apr 2007, 14:39:23 UTC - in response to Message 543912.  

I forget the exact numbers, but if the redundancy level was 1, the reliability of the data was over 99% (i.e. less than 1% of the data was corrupt or bogus). Each level of redundancy above that is an improvement, but with diminishing returns.

Oooops, missed that comment. That answers one part of the quorum change questions.

The current data being analyzed by SETI@home has already been analyzed before, but with older versions of the client that did far less analysis (so workunits wouldn't take weeks to finish). There's a scientific reason to do so,

So the old data is still useful for searching again, but deeper, if we run out of the more sensitive new data:
but we're champing at the bit to get to the new data because it's "deeper" in both frequency space and in sensitivity. Plus RFI will be easier to reject because of the multiple beams, etc. So reducing redundancy will help us push through the remaining old data.



This following point (very important to some) harks back to old discussions about having 'golden calibrations' so that credits can be awarded more accurately...

One drawback of less redundancy is more "jagged" credit for workunits done (since they won't be averaged over the claimed credit of many users). Over the long term this will average out, but users who pay close attention will get annoyed by a few data points that make it seem they're being "short changed." Some PR will be necessary to smooth that over.

- Matt



Happy crunchin',
Martin

See new freedom: Mageia Linux
Take a look for yourself: Linux Format
The Future is what We all make IT (GPLv3)
ID: 544269
Labbie
Joined: 19 Jun 06
Posts: 4083
Credit: 5,930,102
RAC: 0
United States
Message 544316 - Posted: 11 Apr 2007, 16:29:32 UTC

I see that the change is already underway. I've gotten several WUs in the 3/2 format.

Yippee!!!
ID: 544316
jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 544331 - Posted: 11 Apr 2007, 17:10:27 UTC - in response to Message 544316.  

I see that the change is already underway. I've gotten several WUs in the 3/2 format.

Yippee!!!


Yay, a few here too :D Here's to a leaner, tidier database. Nice smooth outage and recovery this week too. Keep up the great work (but remember to take a little rest too!)

Jason

"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 544331
Clyde C. Phillips, III

Joined: 2 Aug 00
Posts: 1851
Credit: 5,955,047
RAC: 0
United States
Message 544360 - Posted: 11 Apr 2007, 18:37:51 UTC

Redundancy level of one: does that mean that there are two results that agree? If so, and the probability that those two results are truly correct is 99 percent, does that mean that each unchecked result has a 90 percent chance of being correct, and that three checks would imply a certainty of 99.9 percent? Of course it would seem that any false positive found in the results would easily be disproven by more results.
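
Actually, working it through under the simplest assumption, that each result errs independently, the numbers come out the other way:

    # If a *pair* of agreeing results is correct 99% of the time, each
    # single result must be more reliable than 99%, not 90%:
    p_pair = 0.99
    p_single = p_pair ** 0.5           # p such that p * p = 0.99
    print(f"single-result reliability: {p_single:.4f}")    # ~0.9950

    # And taking Matt's figure (>99% per single result), three results
    # all being wrong happens well under 0.1% of the time:
    print(f"all three wrong: {(1 - 0.99) ** 3:.6f}")       # 0.000001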
ID: 544360
1mp0£173
Volunteer tester

Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 544383 - Posted: 11 Apr 2007, 19:32:50 UTC - in response to Message 544269.  


This following very important point (to some) is harking back to old discussions about having 'golden calibrations' so that credits can be more accurately awarded...

One drawback of less redundancy is more "jagged" credit for workunits done (since they won't be averaged over the claimed credit of many users). Over the long term this will average out, but users who pay close attention will get annoyed by a few data points that make it seem they're being "short changed." Some PR will be necessary to smooth that over.

- Matt



Happy crunchin',
Martin

With flop-counting, most machines claim exactly the same credit. It'll only be an issue when a current BOINC client gets paired with an older BOINC client, and I think the lower score is usually thrown out, so it'll likely raise credit a tiny bit on average.
ID: 544383