Heads Up: Quorum Change


Message boards : SETI@home Staff Blog : Heads Up: Quorum Change

Author Message
Profile Matt Lebofsky
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 1 Mar 99
Posts: 1390
Credit: 74,079
RAC: 0
United States
Message 543372 - Posted: 9 Apr 2007, 22:29:09 UTC

Hello, generous donators of CPU time.

As we ramp down the current data analysis to make way for the hot, fresh multibeam data, we are going to change the quorums for result validation very soon. Perhaps tomorrow. Currently we generate four results per workunit, though three are enough for validation. We're changing this to three and two.

This means less redundancy, which has the wonderful side effect of consuming less of the world's power to do our data analysis. However, be warned that this may cause us to run out of work to process from time to time over the coming weeks, especially if we lower the quorum levels even more (two and two).

There's concern that "no work available" will cause us to lose valuable participants. Maybe it won't. In any case, hopefully we can get the multibeam data going before the well drains completely.

- Matt
____________
-- BOINC/SETI@home network/web/science/development person
-- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude
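The quorum scheme Matt describes (generate N results per workunit, validate once M of them agree) can be sketched roughly as follows. This is an illustrative assumption, not the actual SETI@home validator: real results are whole signal sets, and the tolerance rule here is invented for the sketch.

```python
# Hypothetical sketch of BOINC-style quorum validation (not the actual
# SETI@home validator). Results are modeled as single floats, and two
# results "match" if they agree within a tolerance -- both assumptions.

def validate(results, quorum, tol=1e-6):
    """Return a canonical value if at least `quorum` results agree, else None."""
    for candidate in results:
        matching = [r for r in results if abs(r - candidate) <= tol]
        if len(matching) >= quorum:
            # Canonical value: average of the agreeing results.
            return sum(matching) / len(matching)
    return None  # no quorum yet; the scheduler would wait for more results

# Old policy: 4 results generated, 3 needed; new policy: 3 generated, 2 needed.
print(validate([42.0, 42.0, 41.0], quorum=2))  # two agree -> canonical 42.0
print(validate([42.0, 41.0, 40.0], quorum=2))  # no two agree -> None
```

Lowering the quorum simply lowers the `quorum` argument: fewer hosts must return before a workunit can be retired, at the cost of less cross-checking per workunit.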

Profile Labbie
Joined: 19 Jun 06
Posts: 4083
Credit: 5,930,102
RAC: 0
United States
Message 543378 - Posted: 9 Apr 2007, 22:34:03 UTC

Good news Matt. Thanks.

Profile Misfit
Volunteer tester
Joined: 21 Jun 01
Posts: 21790
Credit: 2,510,901
RAC: 0
United States
Message 543551 - Posted: 10 Apr 2007, 3:05:55 UTC - in response to Message 543372.

Hello, generous donators of CPU time. - Matt

Never let it be said again we aren't recognized. :) (Unless I say it.)
____________

Join BOINC Synergy!

Profile Byron Leigh Hatch @ team Carl SaganProject donor
Volunteer tester
Joined: 5 Jul 99
Posts: 3621
Credit: 11,946,180
RAC: 1,134
Canada
Message 543586 - Posted: 10 Apr 2007, 4:37:42 UTC

Hi Matt ... thanks very much for the Good news!

WinterKnight
Volunteer tester
Joined: 18 May 99
Posts: 8748
Credit: 25,598,741
RAC: 8,199
United Kingdom
Message 543611 - Posted: 10 Apr 2007, 5:37:27 UTC

Thanks Matt,
I think we had better think carefully about the possibility of running short of work, as the multibeam data on my C2D and Pentium M reduce crunch time per task by about 70% on average. If this is true across all types of CPU/OS platforms, then reducing replication to 3 will mean the number of tasks completed will approximately double.

And since most new CPUs that have come online since Enhanced are dual-core, I hope the new servers can cope better with the database when more than a million tasks a day is once more upon us.

Andy

Profile Clyde C. Phillips, III
Joined: 2 Aug 00
Posts: 1851
Credit: 5,955,047
RAC: 0
United States
Message 543801 - Posted: 10 Apr 2007, 15:51:17 UTC

Thanks, Matt, for the news. Less redundancy means more science done with the same amount of computation. So, ALFA data crunching is just around the corner? Einstein will be glad to have my cycles if SETI data runs out.
____________

PhonAcq
Joined: 14 Apr 01
Posts: 1624
Credit: 22,526,648
RAC: 4,265
United States
Message 543818 - Posted: 10 Apr 2007, 16:35:12 UTC

This is a good opportunity for me to get on my "change control soapbox". Would someone give a good (technical, quantitative) reason for the change? Would someone give a good (technical, quantitative) analysis of the risk involved? It just seems like good project management to go to the effort to engineer the change (and good PR to publish the results).

Profile Matt Lebofsky
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 1 Mar 99
Posts: 1390
Credit: 74,079
RAC: 0
United States
Message 543844 - Posted: 10 Apr 2007, 21:11:56 UTC

Reducing quorum means less redundancy, which in turn means we need to create more work to keep up with "demand." In theory, we still have a bunch of "standard" SETI@home data to work with, so we shouldn't run out... but they are on DLT IV tapes which are slow to read, and we don't have a carousel or robot to feed the drive over nights or weekends. Currently I can throw one or two tapes in a day (four days a week) and that's just enough. Reducing quorum to 3 will increase splitter demand by 25%. And we may soon reduce it to 2, meaning splitter demand will increase by 50%. We can't feed our (only) DLT drive fast enough. Another DLT drive won't help - neither our network nor the file server can handle the bandwidth. Plus we're very close to being finished with this data.

The tapes hold 35GB of data. The new multibeam data, however, are on 500GB/750GB hard drives. And we have two drive bays at a time available for reading. So we can keep a lot of data online without human intervention and feed it to the splitters over long weekends, etc. So running out of work is less of a worry, even with reduced redundancy.

However! If we ever "catch up" and we have a high demand for work to analyze, we will most certainly run out of work to send. This is obvious, but has never been the case, so our user base might be annoyed by this even though this is a happy problem. We'll cross that bridge, etc.

- Matt

____________
-- BOINC/SETI@home network/web/science/development person
-- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude
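As a back-of-the-envelope illustration of why the disk-based delivery relieves the feeding problem: only the 35 GB tape and 500 GB/750 GB drive capacities come from the post above; the daily consumption rate is a made-up assumption.

```python
# Rough illustration of tape vs. disk as a splitter feed.
# Capacities are from Matt's post; the consumption rate is hypothetical.
TAPE_GB = 35
DISK_GB = 750                 # the larger of the 500 GB / 750 GB drives
CONSUMPTION_GB_PER_DAY = 50   # assumed splitter appetite, for illustration

print(f"one tape lasts  {TAPE_GB / CONSUMPTION_GB_PER_DAY:.1f} days")
print(f"one drive lasts {DISK_GB / CONSUMPTION_GB_PER_DAY:.1f} days")
# With two drive bays online, a long weekend runs unattended;
# a single 35 GB tape would be exhausted within hours.
```

Whatever the true consumption rate, the ratio is what matters: one drive holds roughly 15-20 tapes' worth of data, so the splitters can run unattended over nights and weekends.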

PhonAcq
Joined: 14 Apr 01
Posts: 1624
Credit: 22,526,648
RAC: 4,265
United States
Message 543882 - Posted: 10 Apr 2007, 22:06:07 UTC
Last modified: 10 Apr 2007, 22:08:42 UTC

Thanks, Matt; this is enlightening. You have 1 tape drive that feeds the splitters. If the downstream demand increases due to reduced redundancy then someone needs to feed the drive at an increasing rate, which will lead to gaps during which the drive is dormant. You also have 'a bunch' of tapes left to analyze. So reduced redundancy will increase their processing rate, but potentially expose the project to gaps in WU availability.

But why is the redundancy criterion being modified? The premise since Classic is that Seti needed a high level of redundancy to be believed. (I seem to remember questioning this premise a couple of years ago and getting blasted and later ignored by the community.) What model tells us that we can live with 3 or 2 in a quorum? The risk to the project is false positives as well as missed positives. (I am not aware of what kind of probability analyses have been performed on this topic.)

A secondary question is how long would it take with the current redundancy criterion to finish processing the 'backlog' of data tapes, assuming the current computing capacity as provided by the volunteers? That is, what is the aggregate number of WU's on the tapes, how many have been processed, how many can 'we' process per day? I realize that the tapes are being re-analyzed with more stringent criteria, leading to either more WU's or more compute time, or both.

Finally, is it possible that the new (multibeam) data is such an improvement in S/N (or similar criteria) that we risk wasting resources looking at the old tape data? Is this the driving factor for any change? ("hurry up and get to the new data") That makes sense, but it does seem risky to modify the redundancy criteria, unless we have been wrong all along by requiring a quorum of 4. In this case, I suggest just making a clean switch over as soon as possible, rather than having a bunch of 'half' analyzed datasets to worry about.

Profile Matt Lebofsky
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 1 Mar 99
Posts: 1390
Credit: 74,079
RAC: 0
United States
Message 543912 - Posted: 10 Apr 2007, 23:11:41 UTC

I forget the exact numbers, but if the redundancy level was 1, the reliability of the data was over 99% (i.e. less than 1% of the data was corrupt or bogus). Each level of redundancy above that is an improvement, but with diminishing returns.

The current data being analyzed by SETI@home has already been analyzed before, but with older versions of the client that did far less analysis (so workunits wouldn't take weeks to finish). There's a scientific reason to do so, but we're chomping on the bit to get to the new data because it's "deeper" in both frequency space and in sensitivity. Plus RFI will be easier to reject because of the multiple beams, etc. So reducing redundancy will help us push through the remaining old data.

One drawback of less redundancy is more "jagged" credit for workunits done (since they won't be averaged over the claimed credit of many users). Over the long term this will average out, but users who pay close attention will get annoyed by a few data points that make it seem they're being "short changed." Some PR will be necessary to smooth that over.

- Matt
____________
-- BOINC/SETI@home network/web/science/development person
-- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude
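Matt's point about diminishing returns can be illustrated with a naive independence model. The ~1% single-result error rate is taken from his rough figure above; real failures (e.g. a batch of similarly overclocked machines) need not be independent, so treat this as a sketch, not an analysis.

```python
# Naive independence model of redundancy's diminishing returns.
# p is the assumed probability that a single result is corrupt or bogus,
# taken from Matt's "less than 1%" figure above.
p = 0.01

for copies in (1, 2, 3, 4):
    # Probability that EVERY copy of a workunit is bad simultaneously,
    # assuming independent errors (correlated failures are ignored here).
    all_bad = p ** copies
    print(f"{copies} copies: P(every copy bad) = {all_bad:.8f}")
```

Under these assumptions the second copy buys a hundredfold improvement, the third buys another hundredfold on an already tiny number, and the fourth even less in absolute terms; that is the diminishing-returns argument in miniature.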

1mp0£173
Volunteer tester
Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 543917 - Posted: 10 Apr 2007, 23:18:32 UTC - in response to Message 543882.

Thanks, Matt; this is enlightening. You have 1 tape drive that feeds the splitters. If the downstream demand increases due to reduced redundancy then someone needs to feed the drive at an increasing rate, which will lead to gaps during which the drive is dormant. You also have 'a bunch' of tapes left to analyze. So reduced redundancy will increase their processing rate, but potentially expose the project to gaps in WU availability.

Remember that the new data is being shipped from Arecibo on nice, fast SATA hard drives, not tape.

____________

Profile ML1
Volunteer tester
Joined: 25 Nov 01
Posts: 8574
Credit: 4,234,019
RAC: 851
United Kingdom
Message 543921 - Posted: 10 Apr 2007, 23:27:34 UTC - in response to Message 543882.
Last modified: 10 Apr 2007, 23:30:57 UTC

Thanks, Matt; this is enlightening. You have 1 tape drive that feeds the splitters...

Ouch! No redundancy there for a failed drive or a jammed tape...

But why is the redundancy criterion being modified? The premise since Classic is that Seti needed a high level of redundancy to be believed. (I seem to remember questioning this premise a couple of years ago and getting blasted and later ignored by the community.) What model tells us that we can live with 3 or 2 in a quorum? The risk to the project is false positives as well as missed positives. (I am not aware of what kind of probability analyses have been performed on this topic.)

I guess the question is: what is the probability (possibility) of getting two similarly incorrect results that fool the quorum checking?

Could two similar systems similarly overclocked or similarly overheated return results in the same incorrect way?

A secondary question is how long would it take with the current redundancy criterion to finish processing the 'backlog' of data tapes, assuming the current computing capacity as provided by the volunteers? That is, what is the aggregate number of WU's on the tapes, how many have been processed, how many can 'we' process per day? I realize that the tapes are being re-analyzed with more stringent criteria, leading to either more WU's or more compute time, or both.

Finally, is it possible that the new (multibeam) data is such an improvement in S/N (or similar criteria) that we risk wasting resources looking at the old tape data? Is this the driving factor for any change? ("hurry up and get to the new data") That makes sense, but it does seem risky to modify the redundancy criteria, unless we have been wrong all along by requiring a quorum of 4. In this case, I suggest just making a clean switch over as soon as possible, rather than having a bunch of 'half' analyzed datasets to worry about.


All very good questions.

I think that the strategy of improving the efficiency and holding recrunching of classic WUs in reserve is a good idea.

I must admit that I too have the question as to whether the quorum logic and the real world can work well enough together for a minimum quorum of two. But then again, the main risk is that of false positives and for those positives, those few WUs can be reanalysed for confirmation. The other risk is that of false negatives and whether that can open the door for users to cheat in some way...


Happy crunchin',
Martin

____________
See new freedom: Mageia4
Linux Voice See & try out your OS Freedom!
The Future is what We make IT (GPLv3)

Profile Pappa
Volunteer tester
Joined: 9 Jan 00
Posts: 2562
Credit: 12,301,681
RAC: 0
United States
Message 544096 - Posted: 11 Apr 2007, 4:05:20 UTC - in response to Message 543818.

PhonAcq

You touched a Nerve! The Seti Staff and Many Volunteers have spent great amounts of Time, Effort and Money to move Seti to the Hope of Survival!

This has been the best conversation from the Seti Staff and identifiable progress in years...

Somewhere you missed that MultiBeam provides more data than 10X the current users could crunch... This does not mention that the change to the number of machines (Quorum) has been in testing (Seti Beta) for months... So now it is getting ready for one of the next steps in Seti Evolution!

So, You are Welcome to donate the First of the New Servers that will start Near Time Persistency checking that will produce the scientific results from the stored Results... Or require reloading of old tapes for verification... You obviously did not read one of Matt's posts that they now have enough data to finally start comparing the results, to see if there is something actually there...

Yes, it has been a very long Journey! I am a Supporter of that Journey! I know that SETI Volunteers recently found between Seven and Eight+ thousand dollars worth of hardware to solve the Seti Server issues with the current Seti hardware... That work is still progressing...

* Where were you then?

* Why are you creating Hate and Discontent now?

* Or Again!


Yes I know that Matt answered parts of your questions... I still do not understand why you chose this point to drag up old cr@p! Obviously, you may have some perceived axe to grind, but your Credits show that you are happy with Seti (Status Quo) and have not left in spite of some of your past complaints!

Regards

Pappa

This is a good opportunity for me to get on my "change control soapbox". Would someone give a good (technical, quantitative) reason for the change? Would someone give a good (technical, quantitative) analysis of the risk involved? It just seems like good project management to go to the effort to engineer the change (and good PR to publish the results).


____________
Please consider a Donation to the Seti Project.

Josef W. SegurProject donor
Volunteer developer
Volunteer tester
Joined: 30 Oct 99
Posts: 4335
Credit: 1,113,795
RAC: 779
United States
Message 544144 - Posted: 11 Apr 2007, 5:15:08 UTC - in response to Message 543921.

...
I guess the question is: what is the probability (possibility) of getting two similarly incorrect results that fool the quorum checking?


Small, but the same as it has always been. A canonical result is one of a pair that is "strongly similar". Any additional results which are at least "weakly similar" are also granted credit. The "strongly similar" criteria are sufficiently tight to ensure reliable results are entered in the Master Science database, the "weakly similar" criteria loose enough that those using different platforms or CPU generations which may occasionally not be in complete agreement will still be rewarded.

Could two similar systems similarly overclocked or similarly overheated return results in the same incorrect way?


Highly unlikely, but of course possible.
Joe
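Joe's two-tier comparison might be sketched like this. The numeric tolerances and the scalar-result simplification are invented for illustration; the real validator compares whole sets of detected signals, not single numbers.

```python
# Hypothetical sketch of the two-tier scheme Joe describes: the canonical
# result needs a "strongly similar" partner, while later results need only
# be "weakly similar" to the canonical one to earn credit.
# Both tolerances are invented assumptions.
STRONG_TOL = 1e-9   # tight: ensures reliable science-database entries
WEAK_TOL = 1e-4     # loose: forgives platform/CPU floating-point drift

def pick_canonical(results):
    """Return the first result that has a strongly-similar partner."""
    for i, a in enumerate(results):
        for b in results[i + 1:]:
            if abs(a - b) <= STRONG_TOL:
                return a
    return None

def grants_credit(result, canonical):
    """A result only needs to be weakly similar to be rewarded."""
    return abs(result - canonical) <= WEAK_TOL

canonical = pick_canonical([3.14159265, 3.14159265, 3.14161])
print(canonical)                          # the strongly-similar pair wins
print(grants_credit(3.14161, canonical))  # weakly similar: still rewarded
```

The design point is the asymmetry: the science database only ever ingests a strongly-matched canonical result, while the credit rule stays loose enough that a host with a slightly different FPU is not punished for rounding differences.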

Profile ML1
Volunteer tester
Joined: 25 Nov 01
Posts: 8574
Credit: 4,234,019
RAC: 851
United Kingdom
Message 544265 - Posted: 11 Apr 2007, 14:21:17 UTC - in response to Message 544096.
Last modified: 11 Apr 2007, 14:24:37 UTC

This has been the best conversation from the Seti Staff and identifiable progress in years...

And all excellent motivation all round, too.

Somewhere you missed that MultiBeam provides more data than 10X the current users could crunch... This does not mention that the change to the number of machines (Quorum) has been in testing (Seti Beta) for months... So now it is getting ready for one of the next steps in Seti Evolution!...

The "x10" data throughput gives very good impetus to look at speeding up the analysis, provided...

This is a good opportunity for me to get on my "change control soapbox". Would someone give a good (technical, quantitative) reason for the change? ... (technical, quantitative) analysis of the risk involved?...

... that the Science isn't diluted.

Given that WU verification is still done by Berkeley themselves for those 'interesting' WUs found by us, and that the real 'proof' is in subsequently finding a consistent ET beacon, then I consider that this change is 'affordable' and all good and better for progress. Especially so for doing something useful and timely with the new flood of data.

It would still be good to see some statistics and/or numbers for what the quorum change 'means' for the 'reliability' of the Science results.


Congratulations to Matt, Eric(?), Jeff(?) and All for pushing this far for getting the ALFA data onstream! I'm sure we're all looking forward to seeing the new data!

Happy crunchin',
Martin

____________
See new freedom: Mageia4
Linux Voice See & try out your OS Freedom!
The Future is what We make IT (GPLv3)

Profile ML1
Volunteer tester
Joined: 25 Nov 01
Posts: 8574
Credit: 4,234,019
RAC: 851
United Kingdom
Message 544269 - Posted: 11 Apr 2007, 14:39:23 UTC - in response to Message 543912.

I forget the exact numbers, but if the redundancy level was 1, the reliability of the data was over 99% (i.e. less than 1% of the data was corrupt or bogus). Each level of redundancy above that is an improvement, but with diminishing returns.

Oooops, missed that comment. That answers one part of the quorum change questions.

The current data being analyzed by SETI@home has already been analyzed before, but with older versions of the client that did far less analysis (so workunits wouldn't take weeks to finish). There's a scientific reason to do so,

So still useful as data to use to search again but deeper if we run out of the more sensitive new data:
but we're chomping on the bit to get to the new data because it's "deeper" in both frequency space and in sensitivity. Plus RFI will be easier to reject because of the multiple beams, etc. So reducing redundancy will help us push through the remaining old data.



This following very important point (to some) is harking back to old discussions about having 'golden calibrations' so that credits can be more accurately awarded...

One drawback of less redundancy is more "jagged" credit for workunits done (since they won't be averaged over the claimed credit of many users). Over the long term this will average out, but users who pay close attention will get annoyed by a few data points that make it seem they're being "short changed." Some PR will be necessary to smooth that over.

- Matt



Happy crunchin',
Martin

____________
See new freedom: Mageia4
Linux Voice See & try out your OS Freedom!
The Future is what We make IT (GPLv3)

Profile Labbie
Joined: 19 Jun 06
Posts: 4083
Credit: 5,930,102
RAC: 0
United States
Message 544316 - Posted: 11 Apr 2007, 16:29:32 UTC

I see that the change is already in process. I've gotten several WUs that are the 3/2 format.

Yippee!!!

Profile jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 5081
Credit: 74,110,404
RAC: 4,310
Australia
Message 544331 - Posted: 11 Apr 2007, 17:10:27 UTC - in response to Message 544316.

I see that the change is already in process. I've gotten several WUs that are the 3/2 format.

Yippee!!!


Yay, a few here too, :D. Here's to a leaner, tidier database. Nice smooth outage and recovery this week too. Keep up the great work ( but remember to take a little rest too!)

Jason

____________
"It is not the strongest of the species that survives, nor the most intelligent that survives. It is the one that is the most adaptable to change."
Charles Darwin

Profile Clyde C. Phillips, III
Joined: 2 Aug 00
Posts: 1851
Credit: 5,955,047
RAC: 0
United States
Message 544360 - Posted: 11 Apr 2007, 18:37:51 UTC

Redundancy level of one--- Does that mean that there are two results that agree? If so, and the probability that those two results are truly correct is 99 percent--- Does that mean that each unchecked result has a 90 percent chance of being correct, and that three checks would imply a certainty of 99.9 percent? Of course it would seem that any false positive found in the results would easily be disproven by more results.
____________

1mp0£173
Volunteer tester
Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 544383 - Posted: 11 Apr 2007, 19:32:50 UTC - in response to Message 544269.


This following very important point (to some) is harking back to old discussions about having 'golden calibrations' so that credits can be more accurately awarded...

One drawback of less redundancy is more "jagged" credit for workunits done (since they won't be averaged over the claimed credit of many users). Over the long term this will average out, but users who pay close attention will get annoyed by a few data points that make it seem they're being "short changed." Some PR will be necessary to smooth that over.

- Matt



Happy crunchin',
Martin

With flop-counting, most machines claim exactly the same credit. It'll only be an issue when a current BOINC client gets paired with an older BOINC client, and I think the lower score is usually thrown out, so it'll likely raise credit a tiny bit on average.
____________



Copyright © 2014 University of California