Message boards : Number crunching : Question on this granted credit
| Author | Message |
|---|---|
Paul D. Buck Joined: 19 Jul 00 Posts: 3898 Credit: 1,158,042 RAC: 0
|
I think some of the confusion comes from the terms Valid and Successful, if you look at his WU you will see it was completed successfully but it failed validation. Success means only that the client-side processing did not abend. The fact that a Result Data File was created with data is the biggest part of this. *IF* that file contains data that does not match the contents of the other Result Data Files, or contains some other detectable error (I do not know for sure whether they actually scan the files for "correctness" or not ... but it is a possibility during the project-specific validation ... numerical errors, Not a Number values, etc.), the result will be tagged as invalid. I hope this is more understandable and clears up the confusion. Not sure if this is or is not clearly stated in the Wiki ... and I am too tired to look tonight. But, if you look up the processing and this type of detail is not there ... send me a note tomorrow... |
|
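The distinction drawn above between a successful result and a valid one can be sketched in a few lines. This is only an illustration, not the project's validator: the names `Outcome`, `Validity`, `Result`, `has_detectable_error`, `closely_matches` and `validate` are hypothetical, a list of floats stands in for the Result Data File, and the tolerance is an arbitrary assumption.

```python
import math
from dataclasses import dataclass
from enum import Enum
from typing import List


class Outcome(Enum):
    CLIENT_ERROR = "client error"  # client-side processing abended
    SUCCESS = "success"            # a result data file was produced


class Validity(Enum):
    VALID = "valid"
    INVALID = "invalid"


@dataclass
class Result:
    outcome: Outcome
    values: List[float]  # hypothetical stand-in for the result data file


def has_detectable_error(result: Result) -> bool:
    # Assumed check: NaN or infinite values count as a detectable error.
    return any(math.isnan(v) or math.isinf(v) for v in result.values)


def closely_matches(a: Result, b: Result, rel_tol: float = 1e-6) -> bool:
    # Assumed comparison: same length and element-wise agreement within rel_tol.
    return len(a.values) == len(b.values) and all(
        math.isclose(x, y, rel_tol=rel_tol) for x, y in zip(a.values, b.values)
    )


def validate(result: Result, others: List[Result]) -> Validity:
    # A result that did not even finish cannot be valid.
    if result.outcome is not Outcome.SUCCESS:
        return Validity.INVALID
    # A finished result with bad numbers is tagged invalid.
    if has_detectable_error(result):
        return Validity.INVALID
    # A finished result must also agree with at least one other finished result.
    agrees = any(
        o.outcome is Outcome.SUCCESS and closely_matches(result, o) for o in others
    )
    return Validity.VALID if agrees else Validity.INVALID
```

In this sketch a result can have `outcome == Outcome.SUCCESS` and still come back from `validate` as `Validity.INVALID`, which is the case being asked about in this thread.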
Idefix Joined: 7 Sep 99 Posts: 154 Credit: 482,193 RAC: 0
|
One point on the "create another copy" comment somewhere; there is only one server copy of the WU ever created I didn't mean that the copies are on the server disk (ok, maybe it sounded like that). There are separate results files, but not separate WU files. There are at least four copies of the WU: one on each client which crunches this WU... ;-) This is what I meant with "copies are created and are sent out". You may also call it "clients download the WU"... ;-) Carsten |
|
Sergey Broudkov Joined: 24 May 04 Posts: 221 Credit: 561,897 RAC: 0
|
Thanks. You may be right, there is logic in it, though it is not as straightforward as I had imagined. Just one note: One point on the "create another copy" comment somewhere; there is only one server copy of the WU ever created, not one for every person who downloads it. There are separate results files, but not separate WU files. The "extra" one that shows up on the results web page is just another row in the database to hold the new result number Yes, it's clear. "Create another copy" was used only as a figure of speech for simplicity, because it looks that way from the client side. Surely there is only one file on the server, and the scheduler just gives out a link to it.
Yes, good point, and I missed it. Thanks again. EDIT: mistypes corrected. Kitty@SETI team (Russia). Our cats also want to know if there is ETI out there
|
Bill Michael Joined: 4 Dec 03 Posts: 1122 Credit: 13,376,822 RAC: 44
|
I have not looked at the server code, so I don't know it "for sure" - but I've looked at a LOT of result quorums, and read (for purposes of editing, so not just 'skimmed') most everything in the Wiki on this, and what I describe above matches to the best of my ability what I've seen. I can definitely state that the validator is separate from the program that decides if sending the WU out again is necessary, because resends happen even when the validator is down or backlogged. It may be non-optimal to do such a resend in some cases, but there is a timing issue. There is no way to know WHEN the validator is going to come along and determine that two of the results already in place are matching closely, and therefore another result is unnecessary. It won't _wait_ another 2 weeks for the resent result to come back, unless that result was truly necessary; the results already present will be validated and credit awarded when the validator comes along, if possible, and whenever the 'resend' does make it in, it'll just be given credit. If it WAS necessary however, then it's been sent at the earliest possible moment, and (if the scheduler is truly working at its best) to a participant likely to get it back quickly. Maybe a waste of a few CPU cycles, but the only way to get the credit awarded and the canonical result filed quickly - otherwise it could be days or weeks later. Imagine if they waited on the validator during the last outage and large WFV queue; two-week delay before finding you needed another result, then two-week delay for that result to come back in. Meanwhile, the WU file and 4 or 5 results files are sitting on disk and can't be deleted. One point on the "create another copy" comment somewhere; there is only one server copy of the WU ever created, not one for every person who downloads it. There are separate results files, but not separate WU files. 
The "extra" one that shows up on the results web page is just another row in the database to hold the new result number while it is sent, returned, validated, deleted... for whatever reason, that record is created before it is absolutely known that it will actually be sent out, and thus is sometimes marked "not needed". Again, a timing issue, they can't 100% depend on every program seeing this particular WU "in order". Sometimes pure efficiency must be sacrificed to make a system more robust. If every process had to occur in order on every WU, they'd never send out unneeded work, but the entire process would be held back by the slowest program, and any failure at all would stop the entire thing. |
|
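The timing issue described above can be illustrated with a toy simulation. It is a sketch under stated assumptions, not the server code: the row states, the functions `transitioner_pass`, `scheduler_pass`, `validator_pass` and `run`, the quorum of 3 and the 14-day turnaround are all hypothetical names and values chosen to mirror the posts in this thread.

```python
from dataclasses import dataclass, field
from typing import List

QUORUM = 3  # assumed: three returned results are enough to validate


@dataclass
class ResultRow:
    # "in_progress", "returned", "timed_out", "unsent" or "not_needed"
    state: str
    deadline: float = 0.0


@dataclass
class WorkUnit:
    rows: List[ResultRow] = field(default_factory=list)
    validated: bool = False


def transitioner_pass(wu: WorkUnit, now: float) -> None:
    # Resend logic: a late result immediately gets a replacement row.
    # It does not look at whether the returned results would already validate.
    if wu.validated:
        return
    for row in list(wu.rows):
        if row.state == "in_progress" and now > row.deadline:
            row.state = "timed_out"
            wu.rows.append(ResultRow("unsent"))


def scheduler_pass(wu: WorkUnit, now: float, turnaround: float = 14.0) -> None:
    # Hands any unsent row to a participant.
    for row in wu.rows:
        if row.state == "unsent":
            row.state = "in_progress"
            row.deadline = now + turnaround


def validator_pass(wu: WorkUnit) -> None:
    # Runs whenever it gets around to this WU, independently of the others.
    returned = [r for r in wu.rows if r.state == "returned"]
    if len(returned) >= QUORUM:
        wu.validated = True
        for row in wu.rows:
            if row.state == "unsent":
                row.state = "not_needed"


def run(order: List[str]) -> List[str]:
    # Three results are back; the fourth has a deadline at t=10 and is late.
    wu = WorkUnit(
        [ResultRow("returned"), ResultRow("returned"), ResultRow("returned"),
         ResultRow("in_progress", deadline=10.0)]
    )
    steps = {
        "transitioner": lambda: transitioner_pass(wu, now=10.5),
        "scheduler": lambda: scheduler_pass(wu, now=10.5),
        "validator": lambda: validator_pass(wu),
    }
    for name in order:
        steps[name]()
    return [r.state for r in wu.rows]
```

With the order `["transitioner", "validator", "scheduler"]` the fifth row ends up as `"not_needed"`; with `["transitioner", "scheduler", "validator"]` it has already gone out and stays `"in_progress"`; and if the validator runs first, no fifth row is created at all. Nothing forces the three programs to see a given WU in a fixed order, which is the point of the post above.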
Sergey Broudkov Joined: 24 May 04 Posts: 221 Credit: 561,897 RAC: 0
|
Well, there is no problem, just my curiosity. We cats are very curious creatures, you know ;) I'm just trying to understand how things work. A kind of reverse engineering, if you like, as I don't know all internal details :)
Is it so simple? Do you know it for sure, or is it only your belief? I can't believe that the system is so stupid and non-optimal as to send another WU and wait for it (maybe another 2 weeks) when it already has all it needs. That's why I'm asking. Kitty@SETI team (Russia). Our cats also want to know if there is ETI out there
|
|
Idefix Joined: 7 Sep 99 Posts: 154 Credit: 482,193 RAC: 0
|
Just to remind others: I'm talking about Sergey's WU 24481018 with now five valid results, and not about Randy's WU with only two valid results. OK, whatever component it was, even let it be validator itself) this component X must see 3 results before making another WU. Maybe that's the way it should be. But it is not the way it is. Have a look at this WU: WU 23804912 This is the history of this WU: - three results were sent back before the deadline - one result missed the deadline - a fifth copy of the WU was created immediately after the deadline - the validator had a look at the three results and noticed: "we already have three valid results, we don't need the others" - the validator marked the fifth result as "didn't need" If your "this component X must see 3 results before making another WU." were correct, this unnecessary fifth copy wouldn't have been created. And most likely, nothing different happened in your case. But the copy was sent out before anything could mark it as "didn't need". Of course, there is still a chance that the first three results weren't in consensus and a fourth result was needed, but you can't tell from the results what exactly happened. Carsten |
Bill Michael Joined: 4 Dec 03 Posts: 1122 Credit: 13,376,822 RAC: 44
|
But the validator has nothing to do with it. The server component who created the 5th copy (BTW who does it, a transitioner? OK, whatever component it was, even let it be validator itself) this component X must see 3 results before making another WU. Though it looks like it was the validator who evaluated one of the results as "not good enough" and that's why it asked for another one. I still don't understand the question or problem. A result was late. Whatever program is responsible for doing so immediately sent this WU to another participant. This has nothing to do with validation, nothing to do with "seeing 3 results", or anything else. Result late = send out another. Result returns with client error = send out another. Neither thing related to validator that requires 3 results before validation. Once at least three non-error results present - validate. Those that are valid get credit, those that aren't, don't. If no two "closely match", so we still don't know what "valid" is, send out yet another and try again later. ??? |
|
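The rules listed above fit in two small functions. This is a minimal sketch of the post's logic only: `on_result_event` and `try_validate` are hypothetical names, a single float stands in for each result's data, and the quorum and tolerance are assumed values.

```python
import math
from typing import List, Tuple


def on_result_event(event: str) -> str:
    # Resend rule: independent of the validator and of how many results are in.
    if event in ("late", "client_error"):
        return "send_another"
    return "wait"


def try_validate(
    values: List[float], quorum: int = 3, rel_tol: float = 1e-6
) -> Tuple[str, List[int]]:
    # values: one number per non-error result returned so far (a stand-in
    # for the real result data). Returns an action and the indices of the
    # results judged valid.
    if len(values) < quorum:
        return ("wait", [])
    best: List[int] = []
    for i, candidate in enumerate(values):
        group = [
            j for j, v in enumerate(values)
            if math.isclose(candidate, v, rel_tol=rel_tol)
        ]
        if len(group) > len(best):
            best = group
    if len(best) >= 2:
        # At least two results closely match: those get credit, the rest don't.
        return ("validated", best)
    # No two results agree, so "valid" is still unknown: ask for one more.
    return ("send_another", [])
```

For example, `try_validate([1.0, 1.0000001, 5.0])` returns `("validated", [0, 1])`: two participants get credit and the third gets zero, even though all three results were returned without a client error.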
J D K Joined: 26 May 04 Posts: 1295 Credit: 311,371 RAC: 0
|
I think some of the confusion comes from the terms Valid and Successful, if you look at his WU you will see it was completed successfully but it failed validation. There was a discussion on this in another project, I think it was LHC, if I remember correctly. Now the question is should he get credit for having a successful WU or should he only get credit if the WU meets Validation parameters????? The other question is what status is used to determine a quorum, three successful WUs or three Valid WUs... And the beat goes on Sonny and Cher BOINC Wiki |
|
Sergey Broudkov Joined: 24 May 04 Posts: 221 Credit: 561,897 RAC: 0
|
but the quorum was already there at the moment No, I'm not sure. But the validator has nothing to do with it. The server component who created the 5th copy (BTW who does it, a transitioner? OK, whatever component it was, even let it be validator itself) this component X must see 3 results before making another WU. Though it looks like it was the validator who evaluated one of the results as "not good enough" and that's why it asked for another one. Kitty@SETI team (Russia). Our cats also want to know if there is ETI out there
|
|
Idefix Joined: 7 Sep 99 Posts: 154 Credit: 482,193 RAC: 0
|
but the quorum was already there at the moment Are you sure that the validator came across the WU before the deadline? If it didn't, the system didn't know that there were already three results in consensus, created a fifth copy of the WU immediately at the deadline and sent it out to you. Carsten |
|
Idefix Joined: 7 Sep 99 Posts: 154 Credit: 482,193 RAC: 0
|
Once a third result returns the WU is eligible to be validated, but normally SETI waits for four before doing so. This behavior was "normal" during the last weeks due to the growing "waiting for validation"-queue. But normally the validation process does not wait for the fourth result (no. 4 will get the same credit later as long as it is valid and it is reported in time). Carsten |
|
Bill Barto Joined: 28 Jun 99 Posts: 864 Credit: 58,712,313 RAC: 91
|
If it takes at least 3 valid results to be returned to grant credit, how come 2 people were granted credit on this one and I was given zero? If there was an error on my returned result, how come they were granted credit? I have never had an error on a returned result and it does not say I had an error; I was just given zero credit. |
Dorsai Joined: 7 Sep 04 Posts: 474 Credit: 4,504,838 RAC: 0
|
If it takes at least 3 valid results to be returned to grant credit, how come 2 people were granted credit on this one and I was given zero? If there was an error on my returned result, how come they were granted credit? I have never had an error on a returned result and it does not say I had an error; I was just given zero credit. It says your result was invalid here. But like you, I thought it took 3 valid results before credit is granted. If your result is invalid, the other two results should not have had credit granted until a 3rd valid result came in. You would still get zero, but the others' credit would be decided by the claimed credit of the 3 valid ones. Foamy is "Lord and Master". (Oh, + some Classic WUs too.) |
|
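The thread does not spell out how granted credit is derived from the claimed credit of the valid results. As a rough illustration only, the sketch below assumes one plausible rule (not confirmed anywhere in this thread): with one valid result use its claim, with two use the lower claim, and with three or more drop the highest and lowest claims and average the rest. The function name `granted_credit` is hypothetical.

```python
from typing import List


def granted_credit(claimed: List[float]) -> float:
    # claimed: claimed credit of each VALID result; invalid results get zero
    # and are assumed to be excluded before this function is called.
    if not claimed:
        return 0.0
    if len(claimed) == 1:
        return claimed[0]
    if len(claimed) == 2:
        # Assumed rule: with only two valid results, grant the lower claim.
        return min(claimed)
    # Assumed rule: discard the highest and lowest claims, average the rest.
    trimmed = sorted(claimed)[1:-1]
    return sum(trimmed) / len(trimmed)
```

Under this assumed rule, two valid results can be granted credit without waiting for a third, and every valid result in the quorum is granted the same amount.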
Alinator Joined: 19 Apr 05 Posts: 4178 Credit: 4,647,982 RAC: 0
|
Keep in mind most of us are accustomed to seeing the validation process working with the normal steady state flow of work through the system. Lately, with the performance problems and repair efforts going on, we've had a chance to see what happens during more unusual circumstances. These are documented, but rarely show up when things are running smoothly. Alinator |
|
krgm Joined: 2 Jun 05 Posts: 30 Credit: 72,152 RAC: 0
|
If it takes at least 3 valid results to be returned to grant credit, how come 2 people were granted credit on this one and I was given zero? If there was an error on my returned result, how come they were granted credit? I have never had an error on a returned result and it does not say I had an error; I was just given zero credit. I have had something similar happen to me with an E@H WU. I got very little credit, as the WU that failed validation was used in deciding the granted credit. I noticed that both of you are using Windows Millennium Edition. I wonder if that has something to do with it?
|
|
Sergey Broudkov Joined: 24 May 04 Posts: 221 Credit: 561,897 RAC: 0
|
The quorum of 3 may not have been close enough, so it wanted a 4th, Yes, indeed. Thanks. Kitty@SETI team (Russia). Our cats also want to know if there is ETI out there
|
Pooh Bear 27 Joined: 14 Jul 03 Posts: 3222 Credit: 4,603,826 RAC: 0
|
The quorum of 3 may not have been close enough, so it wanted a 4th, but the late 4th came in after a 5th was asked for, so everyone got credit when they all came within the limits.
|
|
Sergey Broudkov Joined: 24 May 04 Posts: 221 Credit: 561,897 RAC: 0
|
It's even more interesting that my WU was generated almost exactly at the moment of deadline for 103229007. Is it the reason? Yes, that's clear, but the quorum was already there at the moment; it's enough to assign credits and mark the WU ready for assimilation, so why ask for one more result? Kitty@SETI team (Russia). Our cats also want to know if there is ETI out there
|
Bill Michael Joined: 4 Dec 03 Posts: 1122 Credit: 13,376,822 RAC: 44
|
It's even more interesting that my WU was generated almost exactly at the moment of deadline for 103229007. Is it the reason? Yes... when a WU is late, another is issued - there's no way to know that it's going to show up in another couple of hours. |
|
Sergey Broudkov Joined: 24 May 04 Posts: 221 Credit: 561,897 RAC: 0
|
Can anybody explain the history of WU 24481018? To me it looks rather strange. As you can see, 4 WUs were sent before the outage, and one result (103229005) came back. The 3 others came right after the outage finished. It's OK so far. Yes, the result 103229007 was returned 2 hours after its deadline, but still in time to be validated. But why was an additional WU (for computer 1147710, it's mine) issued when there were already 3 results ready for quorum? It's even more interesting that my WU was generated almost exactly at the moment of deadline for 103229007. Is it the reason? Kitty@SETI team (Russia). Our cats also want to know if there is ETI out there
|