Question on this granted credit

Paul D. Buck
Volunteer tester

Joined: 19 Jul 00
Posts: 3898
Credit: 1,158,042
RAC: 0
United States
Message 165537 - Posted: 10 Sep 2005, 6:29:19 UTC - in response to Message 165288.

I think some of the confusion comes from the terms Valid and Successful. If you look at his WU, you will see it was completed successfully but it failed validation.

Success means only that the client side processing did not abend. The fact that a Result Data File was created with data is the biggest part of this.

*IF* that file contains data that does not match the contents of the other Result Data Files, or contains some other detectable error (I do not know for sure if they actually scan the files for "correctness" or not ... but it is a possibility during the project specific validation ... numerical errors, Not a Number values, etc.), the result will be tagged as invalid.
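
A rough way to picture the difference, as a Python sketch. Everything named here (`Result`, `client_success`, `is_sane`, `matches`, `is_valid`, the tolerance) is made up for illustration and is not taken from the BOINC or SETI@home code. I am only assuming, as described above, that a result can finish without a client error and still be rejected because its contents are not finite numbers or do not agree with another result.

```python
import math
from dataclasses import dataclass


@dataclass
class Result:
    client_success: bool   # client finished without aborting and wrote a result file
    values: list           # hypothetical stand-in for the result file contents


def is_sane(result):
    """Reject result files that contain NaN or infinite values."""
    return all(math.isfinite(v) for v in result.values)


def matches(a, b, tol=1e-6):
    """Two result files 'match' if they have the same length and close values."""
    return len(a.values) == len(b.values) and all(
        math.isclose(x, y, rel_tol=tol, abs_tol=tol)
        for x, y in zip(a.values, b.values)
    )


def is_valid(result, others):
    """Valid = client success, sane contents, and agreement with another usable result."""
    if not (result.client_success and is_sane(result)):
        return False
    return any(
        o.client_success and is_sane(o) and matches(result, o) for o in others
    )


good_a = Result(True, [1.0, 2.0, 3.0])
good_b = Result(True, [1.0, 2.0, 3.0])
odd = Result(True, [1.0, 2.5, 3.0])           # successful, but disagrees with the others
broken = Result(True, [1.0, float("nan")])    # successful, but contains a NaN
```

Here `odd` and `broken` both count as successful on the client side, yet `is_valid(odd, [good_a, good_b])` and `is_valid(broken, [good_a, good_b])` are both false.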

I hope this is more understandable and clears up the confusion. Not sure if this is or is not clearly stated in the Wiki ... and I am too tired to look tonight. But, if you look up the processing and this type of detail is not there ... send me a note tomorrow...
ID: 165537
Idefix
Volunteer tester

Joined: 7 Sep 99
Posts: 154
Credit: 482,193
RAC: 0
Germany
Message 165384 - Posted: 10 Sep 2005, 0:35:44 UTC - in response to Message 165361.

One point on the "create another copy" comment somewhere; there is only one server copy of the WU ever created

I didn't mean that the copies are on the server disk (ok, maybe it sounded like that).
There are separate results files, but not separate WU files.

There are at least four copies of the WU: one on each client which crunches this WU... ;-)
This is what I meant by "copies are created and are sent out". You may also call it "clients download the WU"... ;-)

Carsten
ID: 165384
Sergey Broudkov

Joined: 24 May 04
Posts: 221
Credit: 561,897
RAC: 0
Russia
Message 165376 - Posted: 10 Sep 2005, 0:11:01 UTC - in response to Message 165361.
Last modified: 10 Sep 2005, 0:11:44 UTC

Thanks. You may be right; there is logic in it, though it is not as straightforward as I had imagined. Just one note:

One point on the "create another copy" comment somewhere; there is only one server copy of the WU ever created, not one for every person who downloads it. There are separate results files, but not separate WU files. The "extra" one that shows up on the results web page is just another row in the database to hold the new result number

Yes, it's clear. "Create another copy" was used only as a figure of speech for simplicity, because that is how it looks from the client side. Surely there is only one file on the server, and the scheduler just gives out a link to it.


Sometimes pure efficiency must be sacrificed to make a system more robust.


Yes, good point, and I missed it. Thanks again.

EDIT: mistypes corrected.
Kitty@SETI team (Russia). Our cats also want to know if there is ETI out there
ID: 165376
Bill Michael
Volunteer tester

Joined: 4 Dec 03
Posts: 1122
Credit: 13,376,822
RAC: 44
United States
Message 165361 - Posted: 9 Sep 2005, 23:40:23 UTC - in response to Message 165325.


A result was late. Whatever program is responsible for doing so immediately sent this WU to another participant. This has nothing to do with validation, nothing to do with "seeing 3 results", or anything else. Result late = send out another. Result returns with client error = send out another.

Is it so simple? Do you know it for sure, or is it only your belief? I can't believe that the system is so stupid and non-optimal as to send another WU and wait for it (maybe another 2 weeks) when it already has all it needs. That's why I'm asking.


I have not looked at the server code, so I don't know it "for sure" - but I've looked at a LOT of result quorums, and read (for purposes of editing, so not just 'skimmed') most everything in the Wiki on this, and what I describe above matches to the best of my ability what I've seen.

I can definitely state that the validator is separate from the program that decides if sending the WU out again is necessary, because resends happen even when the validator is down or backlogged. It may be non-optimal to do such a resend in some cases, but there is a timing issue. There is no way to know WHEN the validator is going to come along and determine that two of the results already in place are matching closely, and therefore another result is unnecessary.

It won't _wait_ another 2 weeks for the resent result to come back, unless that result was truly necessary; the results already present will be validated and credit awarded when the validator comes along, if possible, and whenever the 'resend' does make it in, it'll just be given credit. If it WAS necessary however, then it's been sent at the earliest possible moment, and (if the scheduler is truly working at its best) to a participant likely to get it back quickly. Maybe a waste of a few CPU cycles, but the only way to get the credit awarded and the canonical result filed quickly - otherwise it could be days or weeks later. Imagine if they waited on the validator during the last outage and large WFV queue; two-week delay before finding you needed another result, then two-week delay for that result to come back in. Meanwhile, the WU file and 4 or 5 results files are sitting on disk and can't be deleted.

One point on the "create another copy" comment somewhere; there is only one server copy of the WU ever created, not one for every person who downloads it. There are separate results files, but not separate WU files. The "extra" one that shows up on the results web page is just another row in the database to hold the new result number while it is sent, returned, validated, deleted... for whatever reason, that record is created before it is absolutely known that it will actually be sent out, and thus is sometimes marked "not needed". Again, a timing issue, they can't 100% depend on every program seeing this particular WU "in order".

Sometimes pure efficiency must be sacrificed to make a system more robust. If every process had to occur in order on every WU, they'd never send out unneeded work, but the entire process would be held back by the slowest program, and any failure at all would stop the entire thing.
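
As a sketch of the timing issue described above (not the actual server code, which I have not looked at): two independent passes over the same WU record. The names `timeout_pass` and `validator_pass`, the state strings, and the 14-day deadline are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Result:
    rid: int
    deadline: float
    state: str = "unsent"   # unsent, in_progress, returned, timed_out, not_needed


@dataclass
class Workunit:
    results: list = field(default_factory=list)
    validated: bool = False
    next_id: int = 1

    def add_result(self, deadline, state="unsent"):
        # A new result is only a new database row; the WU file itself is shared.
        r = Result(self.next_id, deadline, state)
        self.next_id += 1
        self.results.append(r)
        return r


def timeout_pass(wu, now):
    """Hypothetical resend pass: it knows nothing about validation."""
    for r in list(wu.results):
        if r.state == "in_progress" and now > r.deadline:
            r.state = "timed_out"
            wu.add_result(deadline=now + 14)   # assumed 14-day deadline


def validator_pass(wu, quorum=3):
    """Hypothetical validator pass: runs whenever it gets around to this WU."""
    returned = [r for r in wu.results if r.state == "returned"]
    if len(returned) >= quorum:
        wu.validated = True
        for r in wu.results:
            if r.state == "unsent":
                r.state = "not_needed"


wu = Workunit()
for _ in range(4):
    wu.add_result(deadline=14, state="in_progress")
for r in wu.results[:3]:
    r.state = "returned"

timeout_pass(wu, now=14.1)   # fourth result is late, so a fifth row is created
validator_pass(wu)           # validator shows up afterwards
states = [r.state for r in wu.results]
```

Run in that order, the four original results plus the extra row end up as returned, returned, returned, timed_out, not_needed: the fifth row exists only because the resend pass ran before the validator did.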
ID: 165361
Sergey Broudkov

Joined: 24 May 04
Posts: 221
Credit: 561,897
RAC: 0
Russia
Message 165325 - Posted: 9 Sep 2005, 22:22:15 UTC - in response to Message 165314.


I still don't understand the question or problem.


Well, there is no problem, just my curiosity. We cats are very curious creatures, you know ;) I'm just trying to understand how things work. A kind of reverse engineering, if you like, as I don't know all internal details :)


A result was late. Whatever program is responsible for doing so immediately sent this WU to another participant. This has nothing to do with validation, nothing to do with "seeing 3 results", or anything else. Result late = send out another. Result returns with client error = send out another.

Is it so simple? Do you know it for sure, or is it only your belief? I can't believe that the system is so stupid and non-optimal as to send another WU and wait for it (maybe another 2 weeks) when it already has all it needs. That's why I'm asking.

Kitty@SETI team (Russia). Our cats also want to know if there is ETI out there
ID: 165325
Idefix
Volunteer tester

Joined: 7 Sep 99
Posts: 154
Credit: 482,193
RAC: 0
Germany
Message 165323 - Posted: 9 Sep 2005, 22:20:20 UTC - in response to Message 165267.
Last modified: 9 Sep 2005, 22:22:05 UTC

Just to remind others: I'm talking about Sergey's WU 24481018 with now five valid results, and not about Randy's WU with only two valid results.

OK, whatever component it was, even if it were the validator itself) this component X must see 3 results before making another WU.


Maybe that's the way it should be. But it is not the way it is.

Have a look at this WU:
WU 23804912

This is the history of this WU:

- three results were sent back before the deadline
- one result missed the deadline
- a fifth copy of the WU was created immediately after the deadline
- the validator had a look at the three results and noticed: "we have already three valid results, we don't need the others"
- the validator marked the fifth result as "didn't need"

If your "this component X must see 3 results before making another WU." were correct, this unnecessary fifth copy wouldn't have been created.

And most likely, nothing different happened in your case. But the copy was sent out before anything could mark this copy as "didn't need". Of course, there is still a chance that the first three results weren't in consensus and a fourth result was needed, but you can't tell from the results what exactly happened.

Carsten

ID: 165323
Bill Michael
Volunteer tester

Joined: 4 Dec 03
Posts: 1122
Credit: 13,376,822
RAC: 44
United States
Message 165314 - Posted: 9 Sep 2005, 21:59:35 UTC - in response to Message 165267.

But the validator has nothing to do with it. The server component that created the 5th copy (BTW which one does it, the transitioner? OK, whatever component it was, even if it were the validator itself) this component X must see 3 results before making another WU. Though it looks like it was the validator that evaluated one of the results as "not good enough" and that's why it asked for another one.


I still don't understand the question or problem. A result was late. Whatever program is responsible for doing so immediately sent this WU to another participant. This has nothing to do with validation, nothing to do with "seeing 3 results", or anything else. Result late = send out another. Result returns with client error = send out another. Neither is related to the validator, which requires 3 results before it validates. Once at least three non-error results are present - validate. Those that are valid get credit, those that aren't, don't. If no two "closely match", so we still don't know what "valid" is, send out yet another and try again later. ???
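
The validation half of that, as a minimal sketch. It assumes each non-error result boils down to a single number and that "closely match" means differing by at most a tolerance; the real comparison is project specific, and the function name `validate` and its return values are invented for this example.

```python
from itertools import combinations


def validate(values, min_quorum=3, tol=1e-6):
    """Decide what to do with the non-error results returned so far.

    Returns ("wait", None) if there are too few results,
    ("resend", None) if no two results closely match,
    or ("done", flags) where flags[i] says whether result i gets credit.
    """
    if len(values) < min_quorum:
        return "wait", None
    for a, b in combinations(range(len(values)), 2):
        if abs(values[a] - values[b]) <= tol:
            canonical = values[a]   # first matching pair defines what "valid" is
            flags = [abs(v - canonical) <= tol for v in values]
            return "done", flags
    return "resend", None
```

With three results where only two agree, `validate([1.0, 1.0, 2.0])` gives `("done", [True, True, False])`: two participants get credit and the third gets zero, with no client error anywhere. With `[1.0, 2.0, 3.0]` it gives `("resend", None)`.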
ID: 165314
J D K
Volunteer tester

Joined: 26 May 04
Posts: 1295
Credit: 311,371
RAC: 0
United States
Message 165288 - Posted: 9 Sep 2005, 21:02:10 UTC
Last modified: 9 Sep 2005, 21:04:46 UTC

I think some of the confusion comes from the terms Valid and Successful. If you look at his WU, you will see it was completed successfully but it failed validation.

There was a discussion on this in another project, I think it was LHC, if I remember correctly.

Now the question is should he get credit for having a successful WU or should he only get credit if the WU meets Validation parameters?????

The other question is what status is used to determine a quorum, three successful WUs or three Valid WUs...
And the beat goes on
Sonny and Cher

BOINC Wiki

ID: 165288
Sergey Broudkov

Joined: 24 May 04
Posts: 221
Credit: 561,897
RAC: 0
Russia
Message 165267 - Posted: 9 Sep 2005, 20:34:08 UTC - in response to Message 165183.

but the quorum was already there at the moment

Are you sure that the validator came across the WU before the deadline? If it didn't, the system didn't know that there were already three results in consensus, created a fifth copy of the WU immediately at the deadline and sent it out to you.


No, I'm not sure. But the validator has nothing to do with it. The server component that created the 5th copy (BTW which one does it, the transitioner? OK, whatever component it was, even if it were the validator itself) this component X must see 3 results before making another WU. Though it looks like it was the validator that evaluated one of the results as "not good enough" and that's why it asked for another one.

Kitty@SETI team (Russia). Our cats also want to know if there is ETI out there
ID: 165267
Idefix
Volunteer tester

Joined: 7 Sep 99
Posts: 154
Credit: 482,193
RAC: 0
Germany
Message 165183 - Posted: 9 Sep 2005, 17:37:22 UTC - in response to Message 164835.

but the quorum was already there at the moment

Are you sure that the validator came across the WU before the deadline? If it didn't, the system didn't know that there were already three results in consensus, created a fifth copy of the WU immediately at the deadline and sent it out to you.

Carsten

ID: 165183
Idefix
Volunteer tester

Joined: 7 Sep 99
Posts: 154
Credit: 482,193
RAC: 0
Germany
Message 165181 - Posted: 9 Sep 2005, 17:35:33 UTC - in response to Message 164659.

Once a third result returns the WU is eligible to be validated, but normally SETI waits for four before doing so.

This behavior was "normal" during the last weeks due to the growing "waiting for validation"-queue. But normally the validation process does not wait for the fourth result (no. 4 will get the same credit later as long as it is valid and it is reported in time).

Carsten
ID: 165181
Bill Barto

Joined: 28 Jun 99
Posts: 864
Credit: 58,712,313
RAC: 91
United States
Message 165172 - Posted: 9 Sep 2005, 17:20:49 UTC - in response to Message 165165.

If it takes at least 3 valid results to be returned to grant credit, how come 2 people were granted credit on this one and I was given zero? If there was an error on my returned result, how come they were granted credit? I have never had an error on a returned result, and it does not say I had an error; I was just given zero credit.

http://setiweb.ssl.berkeley.edu/workunit.php?wuid=25010645

Thanks


It says your result was invalid here.

But like you, I thought it took 3 valid results before credit is granted. If your result is invalid, the other two results should not have had credit granted until a 3rd valid result came in. You would still get zero, but the others' credit would be decided by the claimed credit of the 3 valid ones.

ID: 165172
Dorsai

Joined: 7 Sep 04
Posts: 474
Credit: 4,504,838
RAC: 0
United Kingdom
Message 165165 - Posted: 9 Sep 2005, 16:55:09 UTC - in response to Message 164598.

If it takes at least 3 valid results to be returned to grant credit, how come 2 people were granted credit on this one and I was given zero? If there was an error on my returned result, how come they were granted credit? I have never had an error on a returned result, and it does not say I had an error; I was just given zero credit.

http://setiweb.ssl.berkeley.edu/workunit.php?wuid=25010645

Thanks


It says your result was invalid here.

But like you, I thought it took 3 valid results before credit is granted. If your result is invalid, the other two results should not have had credit granted until a 3rd valid result came in. You would still get zero, but the others' credit would be decided by the claimed credit of the 3 valid ones.
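
To show what I mean by credit being decided from the claims of the valid results only, here is a small sketch. The combining rule is my assumption, not something stated in this thread: the lowest claim when one or two results are valid, otherwise the average after dropping the highest and lowest claim. The function name `granted_credit` is made up too.

```python
def granted_credit(claims):
    """claims: list of (claimed_credit, is_valid) pairs for one WU.

    Returns the credit granted to every valid result, under the
    assumed rule described in the text above. Invalid results get zero
    and their claims are ignored.
    """
    valid = sorted(c for c, ok in claims if ok)
    if not valid:
        return 0.0
    if len(valid) <= 2:
        return valid[0]          # assumed: lowest claim wins with 1 or 2 valid results
    trimmed = valid[1:-1]        # assumed: drop highest and lowest, average the rest
    return sum(trimmed) / len(trimmed)
```

Under that assumption, claims of 30 and 20 from two valid results plus a claim of 5 from an invalid one would grant 20 to each valid result, and the invalid claim would play no part.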

Foamy is "Lord and Master".
(Oh, + some Classic WUs too.)
ID: 165165
Alinator
Volunteer tester

Joined: 19 Apr 05
Posts: 4178
Credit: 4,647,982
RAC: 0
United States
Message 165153 - Posted: 9 Sep 2005, 16:19:53 UTC

Keep in mind most of us are accustomed to seeing the validation process working with the normal steady state flow of work through the system. Lately, with the performance problems and repair efforts going on, we've had a chance to see what happens during more unusual circumstances. These are documented, but rarely show up when things are running smoothly.

Alinator
ID: 165153
krgm
Volunteer tester

Joined: 2 Jun 05
Posts: 30
Credit: 72,152
RAC: 0
Canada
Message 165007 - Posted: 9 Sep 2005, 6:06:44 UTC - in response to Message 164598.

If it takes at least 3 valid results to be returned to grant credit, how come 2 people were granted credit on this one and I was given zero? If there was an error on my returned result, how come they were granted credit? I have never had an error on a returned result, and it does not say I had an error; I was just given zero credit.

http://setiweb.ssl.berkeley.edu/workunit.php?wuid=25010645

Thanks


I have had something similar happen to me with an E@H WU. I got very little credit, as the result that failed validation was used in deciding the granted credit.

http://einstein.phys.uwm.edu/workunit.php?wuid=1945766

The top three results were the first in. The WU passed validation, even though the 3rd result is invalid.
I wonder if a new version of the validator has been released, and it has a bug in it?


I noticed that both of you are using Windows Millennium Edition. I wonder if that has something to do with it?
ID: 165007
Sergey Broudkov

Joined: 24 May 04
Posts: 221
Credit: 561,897
RAC: 0
Russia
Message 164852 - Posted: 9 Sep 2005, 1:15:21 UTC - in response to Message 164851.

The quorum of 3 may not have been close enough, so it wanted a 4th,


Yes, indeed. Thanks.
Kitty@SETI team (Russia). Our cats also want to know if there is ETI out there
ID: 164852
Pooh Bear 27
Volunteer tester

Joined: 14 Jul 03
Posts: 3222
Credit: 4,603,826
RAC: 0
United States
Message 164851 - Posted: 9 Sep 2005, 1:12:20 UTC
Last modified: 9 Sep 2005, 1:30:19 UTC

The quorum of 3 may not have been close enough, so it wanted a 4th, but the late 4th came in after a 5th was asked for, so everyone got credit when they all came within the limits.


ID: 164851
Sergey Broudkov

Joined: 24 May 04
Posts: 221
Credit: 561,897
RAC: 0
Russia
Message 164835 - Posted: 9 Sep 2005, 0:28:50 UTC - in response to Message 164794.

It's even more interesting that my WU was generated almost exactly at the moment of deadline for 103229007. Is it the reason?


Yes... when a WU is late, another is issued - there's no way to know that it's going to show up in another couple of hours.


Yes, that's clear, but the quorum was already there at the moment. That is enough to assign credit and mark the WU ready for assimilation, so why ask for one more result?
Kitty@SETI team (Russia). Our cats also want to know if there is ETI out there
ID: 164835
Bill Michael
Volunteer tester

Joined: 4 Dec 03
Posts: 1122
Credit: 13,376,822
RAC: 44
United States
Message 164794 - Posted: 8 Sep 2005, 22:56:01 UTC - in response to Message 164781.

It's even more interesting that my WU was generated almost exactly at the moment of deadline for 103229007. Is it the reason?


Yes... when a WU is late, another is issued - there's no way to know that it's going to show up in another couple of hours.
ID: 164794
Sergey Broudkov

Joined: 24 May 04
Posts: 221
Credit: 561,897
RAC: 0
Russia
Message 164781 - Posted: 8 Sep 2005, 22:27:14 UTC

Can anybody explain the history of WU 24481018? To me it looks rather strange. As you can see, 4 WUs were sent before the outage, and one result (103229005) came back. The 3 others came right after the outage finished. It's OK so far. Yes, the result 103229007 was returned 2 hours after its deadline, but still in time to be validated. But why was an additional WU (for computer 1147710, which is mine) issued when there were already 3 results ready for a quorum? It's even more interesting that my WU was generated almost exactly at the moment of deadline for 103229007. Is it the reason?
Kitty@SETI team (Russia). Our cats also want to know if there is ETI out there
ID: 164781
©2020 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.