Completed, can't validate
Author | Message |
---|---|
Ministry of Disinformation Joined: 19 Sep 06 Posts: 8 Credit: 17,475,791 RAC: 0 |
Was looking through the results I had returned recently and noticed: http://setiathome.berkeley.edu/workunit.php?wuid=420803531 So because of client errors from other people, those who do end up returning valid results are effectively going to have their credits denied in future? "Other people couldn't do it, so you can't have your credit even though you could" is hardly a way to keep your crunchers happy. Just wait until people who aren't running optimised apps, and who take far longer to crunch each WU, notice this. Fun times ahead, methinks. |
perryjay Joined: 20 Aug 02 Posts: 3377 Credit: 20,676,751 RAC: 0 |
http://setiathome.berkeley.edu/workunit.php?wuid=420803531 Made your link clickable Mark. Sorry to see that happen. PROUD MEMBER OF Team Starfire World BOINC |
Geek@Play Joined: 31 Jul 01 Posts: 2467 Credit: 86,146,931 RAC: 0 |
Unfortunately this happens from time to time. I have had several of these. There is nothing you can do about it so no need to expend energy worrying about it. Just keep on crunchin. Boinc....Boinc....Boinc....Boinc.... |
Josef W. Segur Joined: 30 Oct 99 Posts: 4504 Credit: 1,414,761 RAC: 0 |
http://setiathome.berkeley.edu/workunit.php?wuid=420803531 Hmm, that WU reached the "too many errors" state 11 Mar 2009 23:38:15 UTC. Host 3955382 contacted the servers about 14 times between then and when it actually started crunching the WU some time on the 15th. The question is why a server abort didn't take place. Has anyone seen one lately? If they're enabled, maybe BOINC doesn't have a linkage from that particular condition to the abort and it needs to be fixed. Joe |
_hiVe* Joined: 7 Aug 04 Posts: 9 Credit: 4,980,874 RAC: 0 |
Was looking through the results I had returned recently and noticed: I concur. Also happened to me quite a few times. Very unsettling >.> |
Ingleside Joined: 4 Feb 03 Posts: 1546 Credit: 15,832,022 RAC: 13 |
The question is why a server abort didn't take place. Has anyone seen one lately? If they're enabled, maybe BOINC doesn't have a linkage from that particular condition to the abort and it needs to be fixed. Server aborts put too much load on the database, so they have been turned off for many months. "I make so many mistakes. But then just think of all the mistakes I don't make, although I might." |
HAL Joined: 28 Mar 03 Posts: 704 Credit: 870,617 RAC: 0 |
Was looking through the results I had returned recently and noticed: Oh ye of little faith the admins have blessed us with a new category different from PENDING - called VALIDATE INCONCLUSIVE - (as an explanation). Just got 2 of them and SETI was prioritized to process them and BINGO - IT WORKS. KUDOS to the admins - Classic WU= 7,237 Classic Hours= 42,079 |
Josef W. Segur Joined: 30 Oct 99 Posts: 4504 Credit: 1,414,761 RAC: 0 |
The question is why a server abort didn't take place. Has anyone seen one lately? If they're enabled, maybe BOINC doesn't have a linkage from that particular condition to the abort and it needs to be fixed. Double whammy, then. If it were on it wouldn't have helped since that kind of fatal error isn't considered by the send_result_abort option. I've emailed David and the boinc_dev list suggesting inclusion. Joe |
Ministry of Disinformation Joined: 19 Sep 06 Posts: 8 Credit: 17,475,791 RAC: 0 |
Was looking through the results I had returned recently and noticed: That seems to be different though, or are you saying that they'll reverse the "can't validate" decision and send it out yet again? There is a fairly large difference between "Completed, can't validate" and "Completed, validation inconclusive." I'd like to think that multiple hours of CPU time haven't been wasted, but that's not how it's looking. |
Kinguni Joined: 15 Feb 00 Posts: 239 Credit: 9,043,007 RAC: 0 |
Keep an eye on it and see if it goes out again. I thought they would go out up to 10 times if needed? With these huge AP WUs it would make sense, especially after the initial application distribution problems for average non-optimized users. It's one thing to lose the crunch time of an MB WU, but quite another for someone to lose potentially days of crunching time in this way. If one does the work one should get the credit. Join Team Starfire BOINC Chat |
Platinum* Joined: 28 Dec 08 Posts: 1 Credit: 4,013 RAC: 0 |
http://setiathome.berkeley.edu/workunit.php?wuid=417873086 Here is one more: Completed, can't validate. So I have stopped crunching SETI because of that. |
RottenMutt Joined: 15 Mar 01 Posts: 1011 Credit: 230,314,058 RAC: 0 |
Here is another: 1178553096. |
perryjay Joined: 20 Aug 02 Posts: 3377 Credit: 20,676,751 RAC: 0 |
RottenMutt, yours is a bit different from what they are talking about here. One of you doing this WU found 30 spikes and the other one didn't, so it has been sent out to a third cruncher to see which is right. When the third cruncher completes this and sends it in, two or all of you will get credit. The WUs they are talking about in this thread were sent out one too many times, and though the sixth man completed it successfully, he can't get the credit because too many people had already errored out on it. PROUD MEMBER OF Team Starfire World BOINC |
perryjay Joined: 20 Aug 02 Posts: 3377 Credit: 20,676,751 RAC: 0 |
It looks to be shaping up to me having one of those too. My wingman appears to have gone astray on this one http://setiathome.berkeley.edu/workunit.php?wuid=417846562. He has until the 24th but I'm not holding my breath. PROUD MEMBER OF Team Starfire World BOINC |
Cosmic_Ocean Joined: 23 Dec 00 Posts: 3027 Credit: 13,516,867 RAC: 13 |
If that original wingman (_0) times out, the WU will still be alive, but it will all hinge on the last wingman (_6). The error limit is 5, which is inclusive..so up to, and including 5 errors. More than 5 will effectively kill that WU. Linux laptop: record uptime: 1511d 20h 19m (ended due to the power brick giving-up) |
perryjay Joined: 20 Aug 02 Posts: 3377 Credit: 20,676,751 RAC: 0 |
If that original wingman (_0) times out, the WU will still be alive, but it will all hinge on the last wingman (_6). The error limit is 5, which is inclusive..so up to, and including 5 errors. More than 5 will effectively kill that WU. Ahh, good then there is still hope. I might get my credits in a month or so. :) Guess I miscounted, I thought it was already up to the limit. PROUD MEMBER OF Team Starfire World BOINC |
Cosmic_Ocean Joined: 23 Dec 00 Posts: 3027 Credit: 13,516,867 RAC: 13 |
If that original wingman (_0) times out, the WU will still be alive, but it will all hinge on the last wingman (_6). The error limit is 5, which is inclusive..so up to, and including 5 errors. More than 5 will effectively kill that WU. Nope. Only 4 errors presently. Missed deadline I think counts as an error, which would be 5, meaning one more than that and it's dead. So yes, still hope on that one, but it's borderline. Linux laptop: record uptime: 1511d 20h 19m (ended due to the power brick giving-up) |
Ministry of Disinformation Joined: 19 Sep 06 Posts: 8 Credit: 17,475,791 RAC: 0 |
Well, the workunit looks to have been purged from the database now; the link resolves to a page that contains a message saying "can't find workunit." The situation sucks badly in my opinion. I just take some small comfort from the fact it was only 10 hours or so of CPU time, although that's not going to be of comfort to other people if they run into the same sort of problem. I'd hate to think how people with only one older computer, who need 40 hours or more to crunch a single AP WU, will feel. I don't think that this policy has been thought through very well, to be honest. While I appreciate we are running seti@home by choice, it is nice to think that when we do, we will receive credit for the work we actually do, not be told "tough, other people screwed you up by getting errors." I would like to see a response from one of the actual seti team about this, with a positive message that they are at least aware of the issues and are going to try to fix them. |
Cosmic_Ocean Joined: 23 Dec 00 Posts: 3027 Credit: 13,516,867 RAC: 13 |
It's not a policy issue, per se. The problem stems from the network issue that happened a few weeks ago and caused a TON of failed downloads. Under normal circumstances, even with some of the bugs in the app, one would expect there to never be more than 4 tasks for one WU. Compute errors happen for apparently no reason at all, but it's not predictable, so we just kind of hope for the best. That network issue that caused a lot of failed downloads is a problem though. A task being aborted by the client because the application does not exist due to a failed download is a logistical problem, in my opinion. I think the client should not download any tasks until the application is successfully downloaded. That would have mitigated all of those issues with that network problem. So, once we get all of those WUs out of the way that were involved in that network issue, things will be smooth sailing again, but we just have to get past the problem WUs. Some people will be burned by it, and though unfortunate, there's not really much that can be done about it. True there are going to be voids and holes in the science database since results were effectively just thrown out, but I think we may end up keeping the "tapes" that were affected and split them again at another date..maybe we have already. Linux laptop: record uptime: 1511d 20h 19m (ended due to the power brick giving-up) |
Josef W. Segur Joined: 30 Oct 99 Posts: 4504 Credit: 1,414,761 RAC: 0 |
It looks to be shaping up to me having one of those too. My wingman appears to have gone astray on this one http://setiathome.berkeley.edu/workunit.php?wuid=417846562. He has until the 24th but I'm not holding my breath. The wingmate's laptop may have delayed starting those 2 AP_v5 units it has, and would probably take a week or so to do one. Even if it misses deadline or otherwise errors, the WU will only have 5 errors so another wingmate would be assigned. Holding your breath until April 23rd would indeed be uncomfortable. Joe |
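
The error-limit rule described in this thread can be sketched roughly as follows. This is only an illustration of what the posts say (a limit of 5 that is inclusive, so a sixth error kills the WU, and a missed deadline counting as an error); it is not BOINC server code, and the names `MAX_ERROR_RESULTS` and `workunit_status` are made up for the example.

```python
# Hypothetical sketch of the rule as described in the thread, not BOINC source.
MAX_ERROR_RESULTS = 5  # assumed limit, taken from the posts above


def workunit_status(error_count: int, max_errors: int = MAX_ERROR_RESULTS) -> str:
    """Return 'alive' while the error count is within the inclusive limit."""
    if error_count < 0:
        raise ValueError("error_count cannot be negative")
    # The limit is inclusive: exactly max_errors errors still leaves the WU alive.
    return "alive" if error_count <= max_errors else "too many errors"


# 4 errors so far, then a missed deadline (assumed to count as an error),
# then one more failure on the replacement task.
for errors in (4, 5, 6):
    print(errors, workunit_status(errors))
```

Under these assumptions, a WU with 4 errors survives one more missed deadline, and the task sent out after that has to succeed, which matches the "borderline" case discussed above.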