THE redundancy question

Message boards : Number crunching : THE redundancy question

Astro
Volunteer tester
Joined: 16 Apr 02
Posts: 8026
Credit: 600,015
RAC: 0
Message 93531 - Posted: 1 Apr 2005, 2:11:40 UTC
Last modified: 1 Apr 2005, 2:51:42 UTC

OK, I went and did it, but I haven't analysed it yet. I went through all my results back to Jan 8th, excluding WUs that are in progress or pending credit. For each WU I counted the successes against the total number of results sent out, then counted how many times each combination occurred. There were 160 WUs in this study. For example, 3/4/18 means that 3 results succeeded out of 4 sent out by SETI, and this happened 18 times among the 160 WUs.

Successes/WU sent/number of occurrences
3/4/18
3/5/5
3/6/4
3/7/4
3/8/5
3/9/1
3/10/1
4/4/76
4/5/15
4/6/11
4/7/13
4/8/2
5/6/2
5/7/1
6/7/1
7/8/1

I'm sending this out prior to analysis, so you can think about it.

tony

[edit] OK, if we just look at the WUs where only the original 4 were sent out, then we have 18 + 76 = 94. So 94 of the 160 validated without being sent out again; 59% of the time no resend was needed.

Also, since 4/4 occurred 76 times, we know those would have validated just fine if only 3 had been sent out. So 48% of the time the 4th copy did NOT need to be sent.
[end edit]
[second edit]
24% (38 times) of the 160 validated with 3 successful results.
73% (117 times) of the 160 validated with 4 successful results.
2% (3 times) of the 160 validated with 5 successful results.
Less than 1% (1 time each) of the 160 validated with 6 or 7 successful results.
[end edit]
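
The tallies in the edits above can be reproduced mechanically. This is only a sketch: the rows are copied from the list in this post, and the names `ROWS`, `by_successes` and `no_resend` are made up for illustration.

```python
from collections import Counter

# (successes, copies sent, occurrences), copied from the list in this post
ROWS = [
    (3, 4, 18), (3, 5, 5), (3, 6, 4), (3, 7, 4), (3, 8, 5), (3, 9, 1), (3, 10, 1),
    (4, 4, 76), (4, 5, 15), (4, 6, 11), (4, 7, 13), (4, 8, 2),
    (5, 6, 2), (5, 7, 1), (6, 7, 1), (7, 8, 1),
]

total = sum(n for _, _, n in ROWS)

# How many WUs validated with 3, 4, 5, ... successful results
by_successes = Counter()
for successes, _, n in ROWS:
    by_successes[successes] += n

# WUs that validated from the initial 4 copies, with no resend
no_resend = sum(n for _, sent, n in ROWS if sent == 4)

print(f"total WUs: {total}")
print(f"validated without a resend: {no_resend} ({no_resend / total:.0%})")
for successes in sorted(by_successes):
    count = by_successes[successes]
    print(f"{successes} successes: {count} ({count / total:.1%})")
```

Swapping in a different host's results only means replacing `ROWS`.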

ID: 93531 · Report as offensive
N/A
Volunteer tester

Joined: 18 May 01
Posts: 3718
Credit: 93,649
RAC: 0
Message 93537 - Posted: 1 Apr 2005, 2:43:59 UTC - in response to Message 93531.  

Gold star for your effort.
ID: 93537 · Report as offensive
Astro
Volunteer tester
Joined: 16 Apr 02
Posts: 8026
Credit: 600,015
RAC: 0
Message 93569 - Posted: 1 Apr 2005, 3:31:20 UTC

OK, 59% of the time (94 times) it took sending it out 4 times.
13% of the time (20 times) it took sending it out 5 times.
11% of the time (17 times) it took sending it out 6 times.
12% of the time (19 times) it took sending it out 7 times.
5% of the time (8 times) it took sending it out 8 times.
less than 1% of the time it was sent out 9 or 10 times (one time each)
ID: 93569 · Report as offensive
N/A
Volunteer tester

Joined: 18 May 01
Posts: 3718
Credit: 93,649
RAC: 0
Message 93576 - Posted: 1 Apr 2005, 3:41:18 UTC - in response to Message 93569.  
Last modified: 1 Apr 2005, 3:41:32 UTC

So in 40%±2.5% of the cases, the WU required a redundant retransmission.
That's good enough for me - Redundancy is my friend.
ID: 93576 · Report as offensive
Astro
Volunteer tester
Joined: 16 Apr 02
Posts: 8026
Credit: 600,015
RAC: 0
Message 93582 - Posted: 1 Apr 2005, 3:49:18 UTC - in response to Message 93576.  

> So in 40%±2.5% of the cases, the WU required a redundant
> retransmission.
> That's good enough for me - Redundancy is my friend.
>
Actually it's closer to 50/50. The 59% indicates it WAS sent 4 times, and since 4 is the minimum, it means about 40% of the time it took more than 4. There were 94 times that the WU was sent out 4 times: 18 of those validated with only 3 successes, and 76 had 100% success.

Certainly all of the 76 would have validated if sent out only 3 times, and probably a small number (let's use 4) of the 18 would have validated with only 3 sent. 76 + 4 = 80, and 80/160 = 50%, so about half the time sending it out only 3 times would have been successful.
ID: 93582 · Report as offensive
N/A
Volunteer tester

Joined: 18 May 01
Posts: 3718
Credit: 93,649
RAC: 0
Message 93583 - Posted: 1 Apr 2005, 3:53:32 UTC - in response to Message 93582.  

The idea I was trying to express was that the odds of needing 4 or more hosts to crunch a WU is significant, and therefore "Redundancy is my friend".
ID: 93583 · Report as offensive
Astro
Volunteer tester
Joined: 16 Apr 02
Posts: 8026
Credit: 600,015
RAC: 0
Message 93587 - Posted: 1 Apr 2005, 3:57:26 UTC

Ok, so sending the WU out :

3 times = 50% validation success
4 times = 59% validation success
5 times = 71% validation success
6 times = 82% validation success
7 times = 94% validation success
8 times = 99% validation success
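
The figures from 4 copies upward are running totals of the per-copy counts posted earlier in the thread; the 3-copy figure is an estimate and is not derived here. A sketch of the running total, where `SENT_COUNTS` and `cumulative_percent` are illustrative names:

```python
from itertools import accumulate

# WUs by total copies sent before validation, from the counts in this thread
SENT_COUNTS = {4: 94, 5: 20, 6: 17, 7: 19, 8: 8, 9: 1, 10: 1}
TOTAL = sum(SENT_COUNTS.values())


def cumulative_percent(counts):
    """Percent of WUs validated after at most k copies, for each k."""
    keys = sorted(counts)
    running = accumulate(counts[k] for k in keys)
    total = sum(counts.values())
    return {k: 100 * r / total for k, r in zip(keys, running)}


for copies, pct in cumulative_percent(SENT_COUNTS).items():
    print(f"{copies} copies: {pct:.1f}% validated")
```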

anyone wanna draw conclusions about redundancy?

tony

ID: 93587 · Report as offensive
1mp0£173
Volunteer tester

Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 93591 - Posted: 1 Apr 2005, 4:01:13 UTC - in response to Message 93587.  

> Ok, so sending the WU out :
>
> 3 times = 50% validation success
> 4 times = 59% validation success
> 5 times = 71% validation success
> 6 times = 82% validation success
> 7 times = 94% validation success
> 8 times = 99% validation success
>
> anyone wanna draw conclusions about redundancy?

Based on this data set (which comes from a dedicated cruncher, so we're pretty sure that 1 of 3 or 4 will always return a WU), I'd say there isn't a real strong argument for 3 vs. 4. 4 is better, but not that much better.

I'm amazed that we have to go to 8 to get above the 95th percentile.

The biggest conclusion: lots of folks with bad clients, or not much dedication.
ID: 93591 · Report as offensive
RDC
Volunteer tester
Joined: 17 May 99
Posts: 544
Credit: 1,215,728
RAC: 0
United States
Message 93593 - Posted: 1 Apr 2005, 4:02:59 UTC - in response to Message 93587.  

> Ok, so sending the WU out :
>
> 3 times = 50% validation success
> 4 times = 59% validation success
> 5 times = 71% validation success
> 6 times = 82% validation success
> 7 times = 94% validation success
> 8 times = 99% validation success
>
> anyone wanna draw conclusions about redundancy?
>
> tony
>

Even though your example is based on a very small portion of the WUs sent out and there will be variations per individual user, you're right on the money. If a full examination of all past WUs sent out were done, the numbers would most likely fall very close to the numbers you came up with.





To truly explore, one must keep an open mind...
ID: 93593 · Report as offensive
Astro
Volunteer tester
Joined: 16 Apr 02
Posts: 8026
Credit: 600,015
RAC: 0
Message 93594 - Posted: 1 Apr 2005, 4:03:36 UTC - in response to Message 93591.  

Damn you Ned, it was your post in the other thread that prompted me to go through all this. LOL Have a nice day. ;)
ID: 93594 · Report as offensive
N/A
Volunteer tester

Joined: 18 May 01
Posts: 3718
Credit: 93,649
RAC: 0
Message 93595 - Posted: 1 Apr 2005, 4:04:37 UTC - in response to Message 93591.  

1σ in 5×
2σ in 7×
3σ in 8×

Thems good stats!
ID: 93595 · Report as offensive
Alex

Joined: 26 Sep 01
Posts: 260
Credit: 2,327
RAC: 0
Canada
Message 93628 - Posted: 1 Apr 2005, 7:52:39 UTC
Last modified: 1 Apr 2005, 8:03:47 UTC

> 3 times = 50% validation success
> 4 times = 59% validation success
> 5 times = 71% validation success
> 6 times = 82% validation success
> 7 times = 94% validation success
> 8 times = 99% validation success

You need to weigh your results against the amount of work the server has to do.
So, suppose you have 100 work units you need crunched.

3. Send out 3x per result.
In a population of 100 results, send out 300 work units (3x100); 50 validate in the first round. Resend 150 (3x50), and 25 validate. Send out 75 (3x25) and 13 validate. Send out 36 (3x12) and 6 more validate. Send out 18 copies and 3 results validate. Send out 9 and 2 validate. Send out 3 and the last one validates.
Sum of work needed to be done by the BOINC server is 300+150+75+36+18+9+3 = 581 work units sent out to validate 100 results. 7 iterations required.


4. Send out 4x per result.
Immediately send out 400 work units. 59 validate. Send out 4x41=164 and 24 validate. Send out 4x24=96 and 14 validate. Send out 40 and you get 6 results validated. Send out 16 and get 3 results validated. Send out 4 and get the last result. 400+164+96+40+16+4 = 720 work units sent out to get 100 valid results.
6 iterations required.

5. Send out 5x100, and 71 results validate. Send out 5x29=145 and 21 results validate. Send out 5x8=40 and 6 results validate. Send out 5x2 and 2 results come back. 500+145+40+10 = 685 results sent out to validate 100 work units in only 4 iterations. This is less work than sending out 4x.

6. Send out 6x100 = 600, and 82 validate. Send out 6x18=49 and 15 validate. Send out 3x6=18, and 3 validate. So 600+49+18 = 657 work units sent out to validate 100 results in 3 iterations.

7. Send out 7x100 = 700 results. 94 validate. Send out 7x6=72 and the remainder validate. Work needed = 772 work units to get 100 results. 2 iterations.

8. Send out 800 work units, and 99 validate. Send out 8 more work units, and the last one validates. 808 work units. 2 iterations.

Redundancy   WUs sent out   Iterations
3            581            7
4            720            6
5            685            4
6            657            3
7            772            2
8            808            2

Using your data, 3x redundancy requires the least amount of computing power, but you have to wait a long time to get all your results done, so the SETI team would have to store old WUs for longer, i.e. 7 weeks to get a response.

The second-best option would be 6x redundancy per WU at 657.
It's only 13% more work for the BOINC server, but you get your results back in half the time.

The BOINC guys likely have more accurate results than you have, and in reality, statistics tend to cluster.

Addendum.
Also, if the BOINC server sends out 6 WUs instead of 3, it may mean 13% more WUs crunched and kept track of, but the results also come back faster because the faster machines respond first. So the net load on a BOINC server may be lower, in that it may keep track of WUs for 35% less time, for example.
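
The round-by-round bookkeeping in this post can be cross-checked with a short loop. This is a sketch under stated assumptions, not how the BOINC scheduler actually works: every round resends a full batch of copies for each unvalidated WU, the success rate stays constant from round to round, and the validated count is rounded to the nearest whole WU with at least one per round (a rounding rule chosen here, not taken from the thread). The function name `simulate` is hypothetical. Treat the output as a check on the hand arithmetic rather than a replacement for it.

```python
def simulate(copies, success_pct, work_units=100):
    """Return (total copies sent, rounds) until every work unit validates.

    Assumed model: each round, every unvalidated WU is sent to `copies`
    hosts and `success_pct` percent of those WUs validate, rounded to the
    nearest whole WU and never fewer than one, so the loop terminates.
    """
    remaining = work_units
    sent = 0
    rounds = 0
    while remaining > 0:
        sent += copies * remaining
        rounds += 1
        # Integer round-half-up avoids floating-point surprises
        validated = max(1, (remaining * success_pct + 50) // 100)
        remaining -= min(validated, remaining)
    return sent, rounds


# Success rates per redundancy level, as quoted at the top of this post
RATES = {3: 50, 4: 59, 5: 71, 6: 82, 7: 94, 8: 99}

for copies, pct in RATES.items():
    sent, rounds = simulate(copies, pct)
    print(f"{copies}x: {sent} copies sent over {rounds} rounds")
```

Changing the rounding rule or the per-round success rate shifts the totals, so any ranking of redundancy levels drawn from a model like this is sensitive to those choices.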
ID: 93628 · Report as offensive
Metod, S56RKO
Volunteer tester

Joined: 27 Sep 02
Posts: 309
Credit: 113,221,277
RAC: 9
Slovenia
Message 93635 - Posted: 1 Apr 2005, 9:27:11 UTC
Last modified: 1 Apr 2005, 9:27:41 UTC

@Alex: nice summary. It shows that there's a political decision to make in order to optimally fulfill several goals.

For the project, the most important goal should be to get as many verified results as possible. This means minimum redundancy (only the minimal number of clients will crunch the same WU).

This, in turn, might require larger storage space to keep track of unfinished WUs. Project managers might decide to increase the number of clients crunching the same WU in order to speed up the validation process (by receiving more results for the same WU from many clients).
Which makes some users unhappy due to too high redundancy ;)

The third goal is keeping users happy. This is achieved through the credit system, which makes users happy if credit granting doesn't involve long delays.

This last one works against both of the above.

And this was main reason to increase number of initially sent WUs from 3 to 4.

On a side note: in Einstein there was a thread about increasing the deadline (from 1 to 2 weeks). The argument was that this would enable users with slower machines to participate in the project. Bruce (the project manager) argued that this would stress the project servers quite a lot, as there would be a larger number of unfinished WUs.


@Tony: it would be great to know how many of the WUs that were sent 4 times and verified with only 3 results would have needed an additional WU sent if only 3 had been sent out initially. This number would actually show the benefit (if any) of the current vs. the previous scheme.
Metod ...
ID: 93635 · Report as offensive
N/A
Volunteer tester

Joined: 18 May 01
Posts: 3718
Credit: 93,649
RAC: 0
Message 93637 - Posted: 1 Apr 2005, 9:50:22 UTC
Last modified: 1 Apr 2005, 9:52:15 UTC

And here I thought that we went from 3× to 4× to prevent cheating. Isn't the 4th WU sent out when there's a disagreement between the three WUs, be it for validity or credit claim?
ID: 93637 · Report as offensive
Alex

Joined: 26 Sep 01
Posts: 260
Credit: 2,327
RAC: 0
Canada
Message 93640 - Posted: 1 Apr 2005, 10:19:29 UTC
Last modified: 1 Apr 2005, 10:30:15 UTC

It's one thing to use our results based on a sample from one user. It's another to use the actual project results to see what the real redundancy versus reliability data really is.

You'll probably find that the reliability number is more likely a variable, based on network traffic, server outages, and the number of people running unstable client software.

The 'fifty percent' success rate for a 3 wu redundancy seems a bit low to me.
For all we know, the data that the admins are working with likely point to 4 resulting in a more efficient system than 3.

As for the 'unhappy because of too high redundancy', I disagree.
People want to have valid work units. If 50% of your work units fail because the project sends out only 3 WUs and the group can't validate half the time, then we end up crunching more useless work units.
I think that 4 actually results in more happy crunchers than 3 (speculating that BOINC's data backs this up better than our own sample of data).

Edit:
If you quickly multiply the 'success rate' by the number of work units crunched, you will see that you get more 'happy crunchers' if they successfully validate more often.
Estimates...
3: 290 happy customers
4: 424
5: 486
6: 538
7: 725
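
For reference, these estimates are the posted success rate multiplied by the posted work-unit total for each redundancy level, truncated to a whole number. A sketch, with `POSTED` and `happy_crunchers` as illustrative names:

```python
# Redundancy level -> (success rate in %, total WUs sent), as posted in this thread
POSTED = {3: (50, 581), 4: (59, 720), 5: (71, 685), 6: (82, 657), 7: (94, 772)}


def happy_crunchers(success_pct, wus_sent):
    """Success rate times work units sent, truncated to a whole number."""
    return success_pct * wus_sent // 100


estimates = {level: happy_crunchers(*values) for level, values in POSTED.items()}

for level, value in estimates.items():
    print(f"{level}: {value}")
```

The estimates inherit whatever error is in the work-unit totals they are built from.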

ID: 93640 · Report as offensive
Astro
Volunteer tester
Joined: 16 Apr 02
Posts: 8026
Credit: 600,015
RAC: 0
Message 93653 - Posted: 1 Apr 2005, 11:28:13 UTC - in response to Message 93628.  
Last modified: 1 Apr 2005, 11:34:13 UTC

> 3. Send out 3x per result.
> In a population of 100 results, send out 300 work units (3x100), 50
> validate the first round. Resend out 150 again (3x50), and 25 validate. Send
> out 75 (3x25)and 13 validate. Send out 36 (3x12) and 6 more validate. Send
> out 18 copies and 3 results validate. Send out 9 and 2 validates. Send out 3
> and the last one validates.
> Sum of work needed to be done by boinc server is 300+150+75+36+18+9+3 = 581
> work units send out to validate 100 results. 7 iterations required.
>
Morning Alex. As I was going through MY results I noticed that they would send out the WU to only one new host when there was a "Download Error". For example, on 12 March the WU was sent to 4 hosts. One reported back immediately that it had a download error. On 13 March SETI sent the WU back out to ONE new host. I'm not sure how this might change your numbers.

I can appreciate your effort.

Tony

Metod wrote:
> @Tony: it would be great to know how many of 4-times sent WUs which verified
> with only 3 results would need sending additional WU if only 3 would be sent
> out initially. This number would actually show the benefit (if any) of current
> vs. previous scheme.

Hmmm, that would require me to go through all 160 again to look at response times..... Hmmm, I might do this later, but as I stated it would probably be a low number, like 4 times out of the 18 occurrences.

ID: 93653 · Report as offensive
Metod, S56RKO
Volunteer tester

Joined: 27 Sep 02
Posts: 309
Credit: 113,221,277
RAC: 9
Slovenia
Message 93659 - Posted: 1 Apr 2005, 11:55:35 UTC - in response to Message 93640.  
Last modified: 1 Apr 2005, 12:02:07 UTC

> You'll probably find that the reliability number is more likely a variable,
> based on network traffic, server outages, and the number of people running
> unstable client software.

The whole point of sending the same WU out more than once is to be positive (to a certain extent) that the results are OK. As has already been said, the minimum number of received results needed to verify a result is 3: if at least 2 agree, then this is (most probably) the correct result. The certainty of correctness increases with a higher number of independent results that are the same.
You can calculate the certainty something like this: say 1 out of 100 results is wrong. Statistically you have a 1/100 probability that any given result is incorrect. If you only receive one result, you have to live with that. If you receive 2 results and both agree, the probability of an incorrect result drops to 1/10,000 (1/100 * 1/100), assuming the errors are independent. If the results don't agree, you don't know anything. If you receive 3 results and all of them agree, the probability of a wrong answer drops further. And if not all agree, you at least have some chance of saying which result is correct and which is not (weighing a 1/100 against a 1/10,000 probability of being incorrect; either can be wrong, of course).
The number above (100) is arbitrary, but can be determined from validator statistics.
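
A small sketch of that calculation, assuming each result is wrong independently with the same probability (a simplification, since real hosts may fail in correlated ways). The 1/100 figure is the arbitrary one from above, and the function names are made up for illustration:

```python
from fractions import Fraction
from math import comb


def prob_all_wrong(n, p_wrong):
    """Probability that n independent results are all wrong."""
    return p_wrong ** n


def prob_quorum_correct(n, quorum, p_wrong):
    """Probability that at least `quorum` of n independent results are correct."""
    p_ok = 1 - p_wrong
    return sum(
        comb(n, k) * p_ok ** k * p_wrong ** (n - k)
        for k in range(quorum, n + 1)
    )


P_WRONG = Fraction(1, 100)  # arbitrary error rate, as in the post

print(prob_all_wrong(2, P_WRONG))                 # two results, both wrong
print(float(prob_quorum_correct(3, 2, P_WRONG)))  # at least 2 of 3 correct
```

Note that `prob_all_wrong` is an upper bound on accepting a bad answer, because two wrong results would also have to agree with each other to pass validation.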

Now, if 1 out of 3 received results is not there or not all results agree, we need more results. Which takes up to 2 weeks longer. And then we retry the validation.

And this is the moment where initial sending of WU to 4 machines comes into play: shorter time to validate.


Metod ...
ID: 93659 · Report as offensive
Astro
Volunteer tester
Joined: 16 Apr 02
Posts: 8026
Credit: 600,015
RAC: 0
Message 93669 - Posted: 1 Apr 2005, 12:51:35 UTC
Last modified: 1 Apr 2005, 13:21:53 UTC

OK, Alex caused me to go back and re-examine my results. This time I looked at the number of days it took Seti to send out results until it got a successful validation.

for example: 4 were sent out on 12 March, then one more was sent on 13 march. this would equal 2 days worth of time. It does not take into account how long it took to crunch.

My sample data is now 161 results.

Days/number of occurrences/% successful after X days of sending
1/102/63%
2/13/8%
3/7/4%
4/7/4%
5/6/4%
6/5
7/3
8/3
10/2
15/4
17/2
18/1
19/1
20/1
21/2
23/3
46/1

In general it did not take the two weeks for validation to occur.

so,
63% of the time it only took one day's worth of sending for a successful result
71% of the time it only took two days' worth
76% of the time it took three days
80% for 4 days
84% for 5 days
87% for 6 days
89% for 7 days
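
These running percentages can be recomputed from the first seven rows of the table. A sketch, using the sample size of 161 as stated in the post; `DAY_COUNTS` and `cumulative` are illustrative names:

```python
from itertools import accumulate

# Days of sending -> number of WUs, first seven rows of the table in this post
DAY_COUNTS = {1: 102, 2: 13, 3: 7, 4: 7, 5: 6, 6: 5, 7: 3}
TOTAL = 161  # sample size as stated in the post

# Running share of WUs that had validated after at most N days of sending
cumulative = {
    day: round(100 * running / TOTAL)
    for day, running in zip(DAY_COUNTS, accumulate(DAY_COUNTS.values()))
}

for day, pct in cumulative.items():
    print(f"{pct}% within {day} day(s) of sending")
```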


tony

[edit] Not once did SETI resend a WU to 4 new hosts in one bulk sending. It always just sent it to as many new hosts as had failed, as each failure was reported. [end edit]
ID: 93669 · Report as offensive
Daniel Schaalma
Volunteer tester
Joined: 28 May 99
Posts: 297
Credit: 16,953,703
RAC: 0
United States
Message 93680 - Posted: 1 Apr 2005, 13:28:42 UTC

Perhaps in a future version of the BOINC software, once the scheduler gets the 3 required validated results, the scheduler could instruct any hosts that have a redundant WU in cache to ignore the redundant WU, and move on to the next WU. I am not a programmer, so I don't know if it would be feasible to implement this feature...

Regards, Daniel.
ID: 93680 · Report as offensive
N/A
Volunteer tester

Joined: 18 May 01
Posts: 3718
Credit: 93,649
RAC: 0
Message 93684 - Posted: 1 Apr 2005, 13:42:04 UTC - in response to Message 93680.  

I've also had that thought, but that depends on the user's machine being online full time. The only time when the client is sure to be online is when it attaches, asks for work, resets, or detaches.

What I'd like is the ability to tell the server that I've stopped crunching a WU or lost it, and it would send out another copy of the WU to another client. That way users that have a spotty history (read: me) don't hold back other crunchers from getting their credit.
ID: 93684 · Report as offensive


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.