Completed Too Late To Validate - the Hostage AP 5.05 Work Units |
Message boards : Number crunching : Completed Too Late To Validate - the Hostage AP 5.05 Work Units
| Author | Message |
|---|---|
|
Apparently the hostage situation of AP work units has been resolved by granting credit to those of us who completed those tasks in a timely manner, but our results are still flagged as invalid: "Completed, Too Late To Validate". So much for claims that the science is more important than the credit, since the results reported in a timely manner are treated as errors and invalid while the tasks returned days or weeks after the deadline are validated. That seems backwards to me: the errors were on the part of the delinquent computers or users, and those results should be the ones flagged as invalid and in error. | |
| ID: 1228824 · | |
|
I think you are getting it wrong (or I'm not getting you ;D ). | |
| ID: 1228841 · | |
|
In the case of the AP work units being held hostage this is a problem of flagging the wrong user as submitting results too late. | |
| ID: 1228851 · | |
In the case of the AP work units being held hostage this is a problem of flagging the wrong user as submitting results too late. Thanks for the clear explanation, Les. Can you hang on to that idea, please, and keep it (plus some sample data) for use in evidence. With this sudden breaking of the log-jam, and the rapid database purge settings at SETI, there's a danger that lessons for the future may be lost: we haven't even tried to work out whether the workunits have been held hostage by a SETI mis-configuration, or some flaw in the underlying BOINC code. I had hoped that when we got down to zero, the staff would have a chance to turn a keen analytic eye on the problem, but instead they seem to have swept it under the carpet - until the time comes for APv6 to be replaced. | |
| ID: 1228860 · | |
|
I agree that the "Completed, too late to validate" status looks like a black mark, but if BOINC managed to delete the canonical result before the last wingmate had uploaded and reported, it's the simple truth. I think the project staff is doing the civilized thing by granting credit. Joe | |
| ID: 1228864 · | |
I agree that the "Completed, too late to validate" status looks like a black mark, but if BOINC managed to delete the canonical result before the last wingmate had uploaded and reported, it's the simple truth. I think the project staff is doing the civilized thing by granting credit. I think this is a case where allowing the numbers to count down on the Server Status Page at the end of the run helped to clarify the cause. Having something in the low 40s of tasks 'in the field', with 12 thousand or more tasks awaiting validation, represents an unfeasibly large number of tasks per WU. I hope there's enough data left to enable some code to be devised to prevent it happening again. | |
| ID: 1228870 · | |
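To put a rough number on that ratio, here is a back-of-the-envelope sketch. It assumes (my assumption, not something stated on the Server Status Page) that a WU can only be legitimately waiting for validation if at least one of its tasks is still in the field, and it uses 42 as a stand-in for "low 40s".

```python
# Rough arithmetic only; both figures are approximations taken from the post above.
tasks_in_field = 42                    # assumed value for "low 40s"
results_awaiting_validation = 12_000   # "12 thousand or more"

# Assumption: each WU legitimately waiting needs at least one task still out,
# so at most `tasks_in_field` WUs can be waiting for a wingmate.
max_waiting_workunits = tasks_in_field

implied_results_per_wu = results_awaiting_validation / max_waiting_workunits
print(f"Implied results per waiting WU: {implied_results_per_wu:.0f}")
```

With the usual two results per WU, anything in the hundreds suggests that most of those pending results were not really waiting for a task still in the field.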
Thank you, Richard. I took a screen capture of the work unit for the tasks directly affecting me and my computer, but these tasks and work units will be purged by this time tomorrow. As I have complained in related posts, the additional problem is that the task was validated without considering the result that was submitted in a timely manner, so the assignment of the canonical result was flawed because valid work was ignored. While the differences between the actual analyses may be small, this precludes proper crediting of results because the best analysis was ignored. As Josef pointed out, a decision was made as to whom to punish for this problem. I prefer that the offender be punished, not the person or computer that played by the rules and reported results in a timely manner. | |
| ID: 1228880 · | |
|
I'll just keep my eye on these two v6 AP's to see if the same v505 problem occurs again. ;) | |
| ID: 1228889 · | |
Thanks for the clear explanation, Les. Joe will be the best person to comment on this, but I don't think the validator has any concept of 'best' result, over and above the requirement that the 'canonical result' be strongly similar to at least one other result. Since, for the vast majority of WUs, there are only two results, and each is strongly similar to the other, there's an element of luck here. The only possible remaining element of 'punishment' here would be if, by some remote, remote chance, one of the affected WUs might contain the 'discovery' signal for some new pulsar or other astronomical phenomenon. If the problem has only been considered at the level of credit (Eric's Benevolence, as I once called it), isn't there a danger that somebody might be left off the list of co-authors when the scientific discovery paper comes to be written? | |
| ID: 1228900 · | |
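As an illustration of the "element of luck" described above, here is a minimal sketch of choosing a canonical result as the first one that is strongly similar to at least one other. This is not the SETI validator or BOINC code: the names `strongly_similar` and `pick_canonical`, the use of a single float per result, and the 1% tolerance are all hypothetical.

```python
from typing import Callable, Optional, Sequence


def strongly_similar(a: float, b: float, rel_tol: float = 0.01) -> bool:
    """Hypothetical similarity test: values agree within a relative tolerance."""
    return abs(a - b) <= rel_tol * max(abs(a), abs(b))


def pick_canonical(
    results: Sequence[float],
    similar: Callable[[float, float], bool] = strongly_similar,
) -> Optional[int]:
    """Return the index of the first result that matches at least one other.

    There is no notion of a 'best' result here, only of agreement.
    """
    for i, candidate in enumerate(results):
        for j, other in enumerate(results):
            if i != j and similar(candidate, other):
                return i
    return None  # no two results agree, so nothing can be canonical


# Two results that agree: whichever happens to be listed first is chosen.
print(pick_canonical([1.000, 1.001]))
print(pick_canonical([1.001, 1.000]))
```

In this sketch, a result that was never compared at all (because it was not considered at validation time) can never be chosen, however good it is.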
|
I had something similar just occur on this. Nearly 2 1/2 months after I submitted it, the task was finally awarded credit but was marked "too late to validate", even though I had sent it back in about a week. | |
| ID: 1228932 · | |
I'll just keep my eye on these two v6 AP's to see if the same v505 problem occurs again. ;) It looks like task 953687194 is in that very state. I would have thought that returning a result after the deadline would mark the task as "too late to validate". Also the validator running while there are tasks "in progress" doesn't make much sense. I imagine there is no logic to check for a condition that "shouldn't exist". However it does. So hopefully the BOINC server dev guys have one or both defects in their queue for "will be fixed". | |
| ID: 1229097 · | |
I would have thought that returning a result after the deadline would mark the task as "too late to validate". Also the validator running while there are tasks "in progress" doesn't make much sense. I imagine there is no logic to check for a condition that "shouldn't exist". However it does. So hopefully the BOINC server dev guys have one or both defects in their queue for "will be fixed". Well, it works for MB: if a result is returned after the deadline, the validator waits for the resend and validates all 3 (or more) at once, hence we don't have such issues with MB. The optimal approach would be to pre-validate both results and, if they are OK, cancel the resend on the next scheduler request of the host that has it. BOINC can do that. Otherwise, if the resend gets reported on the next request, then revalidate all 3 together. | |
| ID: 1229099 · | |
I would have thought that returning a result after the deadline would mark the task as "too late to validate". Also the validator running while there are tasks "in progress" doesn't make much sense. I imagine there is no logic to check for a condition that "shouldn't exist". However it does. So hopefully the BOINC server dev guys have one or both defects in their queue for "will be fixed". It seems something would be better than the logic black hole these seem to fall into. | |
| ID: 1229103 · | |
I would have thought that returning a result after the deadline would mark the task as "too late to validate". Also the validator running while there are tasks "in progress" doesn't make much sense. I imagine there is no logic to check for a condition that "shouldn't exist". However it does. So hopefully the BOINC server dev guys have one or both defects in their queue for "will be fixed". I don't like the idea of BOINC cancelling the resend, especially with APs. On my slower PC, an AP takes 20 hours to process. If I had been processing it for 19.5 hours when the WU was cancelled I would be most displeased about the waste of my bandwidth, electricity and PC resources, to put it mildly! | |
| ID: 1229107 · | |
I would have thought that returning a result after the deadline would mark the task as "too late to validate". Also the validator running while there are tasks "in progress" doesn't make much sense. I imagine there is no logic to check for a condition that "shouldn't exist". However it does. So hopefully the BOINC server dev guys have one or both defects in their queue for "will be fixed". Cancelling resends is an available option within the BOINC server code, but it only cancels tasks NOT started, so you would be safe and get your credit if the task is started. | |
| ID: 1229112 · | |
I would have thought that returning a result after the deadline would mark the task as "too late to validate". Also the validator running while there are tasks "in progress" doesn't make much sense. I imagine there is no logic to check for a condition that "shouldn't exist". However it does. So hopefully the BOINC server dev guys have one or both defects in their queue for "will be fixed". BOINC has a couple of perfectly good bits of logic already, which normally cope with this without risk to the user. Say two tasks are created and sent out - the normal WU here at SETI. Say one is crunched and returned on time, but the other is late and not returned by deadline. As soon as the second task passes deadline, a replacement (third) task is created and put on the back of the queue for distribution - but it probably sits there for several hours. If the delayed task comes back in at this point, and validates, the third task can be cancelled without risk to anyone, before it has even been sent out. A little later, and the third task has been allocated and downloaded, but often enough it will remain unstarted in the new wingmate's cache for several hours or days. Even at this stage, BOINC can and will cancel the task if the belated original copy is returned and validated. BOINC will only do that if the new wingmate contacts the servers, and the servers therefore know that work hasn't started crunching - no CPU time is wasted, only a bit of download bandwidth. I think all the 'hostage' cases we're considering must be the final case: the original late copy is returned after the replacement has been created, allocated, downloaded, and crunching has already started. If that happens, BOINC lets work on the replacement continue until it has finished, which as we all know can be a long time for AP tasks. I really don't know why things go wrong for this small but not insignificant number of AP cases.
It might be that BOINC, in general, doesn't cope well with the long delays we're seeing here; or it might be that SETI's AP validator (specifically) puts the wrong marker on the files involved when the original two results are - belatedly - validated. | |
| ID: 1229126 · | |
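The three cases described in the post above can be summarised as a small decision table. This is only a sketch of the behaviour as described in this thread, not BOINC server code; the names `ReplacementState` and `action_for_replacement` are made up for illustration.

```python
from enum import Enum, auto


class ReplacementState(Enum):
    """Where the replacement (third) task is when the late original validates."""
    UNSENT = auto()            # created, still in the server's queue
    SENT_NOT_STARTED = auto()  # downloaded, sitting unstarted in a host's cache
    STARTED = auto()           # the new wingmate has begun crunching


def action_for_replacement(state: ReplacementState,
                           host_contacted_server: bool = False) -> str:
    """What happens to the replacement once the belated original has validated."""
    if state is ReplacementState.UNSENT:
        # Nothing has left the server, so nobody loses anything.
        return "cancel before sending"
    if state is ReplacementState.SENT_NOT_STARTED:
        # The server can only cancel when the host next talks to the scheduler.
        if host_contacted_server:
            return "cancel on this scheduler contact"
        return "wait for next scheduler contact"
    # Crunching has begun: the task is allowed to finish (the 'hostage' case).
    return "let it run to completion"


for state in ReplacementState:
    print(state.name, "->", action_for_replacement(state, host_contacted_server=True))
```

Only the last branch leaves a long-running task in progress after the WU already has two agreeing results, which matches the cases discussed in this thread.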
|
For the record, or as a reminder: this has actually been going on for many years. For a long time the programming assigned zero credit and invalidated work units that fell into this situation, as opposed to yesterday's approach of invalidating the work done but awarding credit. It will probably be difficult or impossible to reconstruct the ignored results from over the years should subsequent analysis confirm some discovery, such as the aforementioned new pulsar or other astronomical phenomenon. | |
| ID: 1229131 · | |
|
I don't get it... | |
| ID: 1229141 · | |
I don't get it... You're right: nothing is lost, scientifically. Tasks have validated, and from the validated tasks, a canonical result for the WU as a whole has been chosen. That's as good as it gets. The rest of the concerns are for users who have lost something, or fear they might be at risk of losing something, or are worried they might lose something in the future. Things like reputation (tasks marked 'invalid'), credits, electricity (for a part-crunched task), and scientific kudos (a name-check in a discovery paper). | |
| ID: 1229149 · | |
|
I just found three 5.05 units that should have cleared last year, but finally came up invalid. | |
| ID: 1229170 · | |
| Copyright © 2013 University of California |