Completed, can't validate

Message boards : Number crunching : Completed, can't validate

Ministry of Disinformation
Joined: 19 Sep 06
Posts: 8
Credit: 16,863,028
RAC: 0
United Kingdom
Message 875793 - Posted: 15 Mar 2009, 14:36:54 UTC

Was looking through the results I had returned recently and noticed:

http://setiathome.berkeley.edu/workunit.php?wuid=420803531

So because of client errors from other people, those who do end up returning valid results are going to effectively have their credits denied in future?

"Other people couldn't do it, so you can't have your credit even though you could" is hardly a way to keep your crunchers happy.

Just wait until you get people who aren't running optimised apps who take far longer to crunch each wu noticing this, fun times ahead me thinks.

perryjay
Volunteer tester
Joined: 20 Aug 02
Posts: 3377
Credit: 16,212,255
RAC: 4,912
United States
Message 875796 - Posted: 15 Mar 2009, 14:39:45 UTC - in response to Message 875793.

http://setiathome.berkeley.edu/workunit.php?wuid=420803531



Made your link clickable Mark. Sorry to see that happen.
____________


PROUD MEMBER OF Team Starfire World BOINC

Geek@Play
Volunteer tester
Joined: 31 Jul 01
Posts: 2467
Credit: 86,144,272
RAC: 279
United States
Message 875832 - Posted: 15 Mar 2009, 16:48:53 UTC

Unfortunately this happens from time to time. I have had several of these. There is nothing you can do about it so no need to expend energy worrying about it. Just keep on crunchin.
____________
Boinc....Boinc....Boinc....Boinc....

Josef W. Segur
Project donor
Volunteer developer
Volunteer tester
Joined: 30 Oct 99
Posts: 4335
Credit: 1,113,795
RAC: 779
United States
Message 875871 - Posted: 15 Mar 2009, 18:27:50 UTC - in response to Message 875796.

http://setiathome.berkeley.edu/workunit.php?wuid=420803531


Hmm, that WU reached the "too many errors" state 11 Mar 2009 23:38:15 UTC. Host 3955382 contacted the servers about 14 times between then and when it actually started crunching the WU some time on the 15th.

The question is why a server abort didn't take place. Has anyone seen one lately? If they're enabled, maybe BOINC doesn't have a linkage from that particular condition to the abort and it needs to be fixed.
Joe

_hiVe*
Joined: 7 Aug 04
Posts: 9
Credit: 4,267,226
RAC: 48
Slovakia
Message 875872 - Posted: 15 Mar 2009, 18:29:20 UTC - in response to Message 875793.

Was looking through the results I had returned recently and noticed:

http://setiathome.berkeley.edu/workunit.php?wuid=420803531

So because of client errors from other people, those who do end up returning valid results are going to effectively have their credits denied in future?

"Other people couldn't do it, so you can't have your credit even though you could" is hardly a way to keep your crunchers happy.

Just wait until you get people who aren't running optimised apps who take far longer to crunch each wu noticing this, fun times ahead me thinks.


I concur. Also happened to me quite a few times.
Very unsettling >.>

____________

Ingleside
Volunteer developer
Joined: 4 Feb 03
Posts: 1546
Credit: 4,339,598
RAC: 293
Norway
Message 875911 - Posted: 15 Mar 2009, 20:28:23 UTC - in response to Message 875871.

The question is why a server abort didn't take place. Has anyone seen one lately? If they're enabled, maybe BOINC doesn't have a linkage from that particular condition to the abort and it needs to be fixed.

Server aborts put too much load on the database, so they have been turned off for many months.

____________
"I make so many mistakes. But then just think of all the mistakes I don't make, although I might."

HAL
Joined: 28 Mar 03
Posts: 704
Credit: 870,617
RAC: 0
United States
Message 875919 - Posted: 15 Mar 2009, 20:47:26 UTC - in response to Message 875793.

Was looking through the results I had returned recently and noticed:

http://setiathome.berkeley.edu/workunit.php?wuid=420803531

So because of client errors from other people, those who do end up returning valid results are going to effectively have their credits denied in future?

"Other people couldn't do it, so you can't have your credit even though you could" is hardly a way to keep your crunchers happy.

Just wait until you get people who aren't running optimised apps who take far longer to crunch each wu noticing this, fun times ahead me thinks.

Oh ye of little faith
the admins have blessed us with a new category different from PENDING - called VALIDATE INCONCLUSIVE - (as an explanation). Just got 2 of them and SETI was prioritized to process them and BINGO - IT WORKS. KUDOS to the admins -
____________

Classic WU= 7,237 Classic Hours= 42,079

Josef W. Segur
Project donor
Volunteer developer
Volunteer tester
Joined: 30 Oct 99
Posts: 4335
Credit: 1,113,795
RAC: 779
United States
Message 875988 - Posted: 16 Mar 2009, 0:13:21 UTC - in response to Message 875911.

The question is why a server abort didn't take place. Has anyone seen one lately? If they're enabled, maybe BOINC doesn't have a linkage from that particular condition to the abort and it needs to be fixed.

Server aborts put too much load on the database, so they have been turned off for many months.

Double whammy, then. If it were on it wouldn't have helped since that kind of fatal error isn't considered by the send_result_abort option. I've emailed David and the boinc_dev list suggesting inclusion.
Joe

Ministry of Disinformation
Joined: 19 Sep 06
Posts: 8
Credit: 16,863,028
RAC: 0
United Kingdom
Message 876039 - Posted: 16 Mar 2009, 2:19:49 UTC - in response to Message 875919.

Was looking through the results I had returned recently and noticed:

http://setiathome.berkeley.edu/workunit.php?wuid=420803531

So because of client errors from other people, those who do end up returning valid results are going to effectively have their credits denied in future?

"Other people couldn't do it, so you can't have your credit even though you could" is hardly a way to keep your crunchers happy.

Just wait until you get people who aren't running optimised apps who take far longer to crunch each wu noticing this, fun times ahead me thinks.

Oh ye of little faith
the admins have blessed us with a new category different from PENDING - called VALIDATE INCONCLUSIVE - (as an explanation). Just got 2 of them and SETI was prioritized to process them and BINGO - IT WORKS. KUDOS to the admins -


That seems to be different though, or are you saying that they'll reverse the "can't validate" decision and send it out yet again?

There is a fairly large difference between "Completed, can't validate" and "Completed, validation inconclusive."

I'd like to think that multiple hours of cpu time haven't been wasted, but that's not how it's looking.

Kinguni
Volunteer tester
Joined: 15 Feb 00
Posts: 239
Credit: 9,043,007
RAC: 0
Canada
Message 876043 - Posted: 16 Mar 2009, 2:45:12 UTC - in response to Message 876039.

Keep an eye on it and see if it goes out again. I thought they would go out up to 10 times if needed? With these huge AP WUs it would make sense, especially after the initial application distribution problems for average non-optimized users. It's one thing to lose the crunch time of an MB WU, but quite another for someone to lose potentially days of crunching time in this way. If one does the work one should get the credit.
____________
Join Team Starfire
BOINC Chat

Platinum*
Joined: 28 Dec 08
Posts: 1
Credit: 4,013
RAC: 0
Finland
Message 876078 - Posted: 16 Mar 2009, 5:03:40 UTC

http://setiathome.berkeley.edu/workunit.php?wuid=417873086
Here is one more: Completed, can't validate. So I have stopped crunching SETI because of that.

RottenMutt
Joined: 15 Mar 01
Posts: 995
Credit: 208,412,651
RAC: 23,881
United States
Message 876138 - Posted: 16 Mar 2009, 13:25:28 UTC - in response to Message 876078.

Here is another: 1178553096.
____________

perryjay
Volunteer tester
Joined: 20 Aug 02
Posts: 3377
Credit: 16,212,255
RAC: 4,912
United States
Message 876141 - Posted: 16 Mar 2009, 13:36:15 UTC - in response to Message 876138.

RottenMutt,

Yours is a bit different from what they are talking about here. One of you doing this WU found 30 spikes and the other one didn't, so it has been sent out to a third cruncher to see which is right. When the third cruncher completes this and sends it in, two or all of you will get credit.
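
As a rough illustration of that situation (a hedged sketch, not BOINC's actual validator): assuming a quorum of 2 matching results, a disagreement between the first two results leaves the WU inconclusive until a third result arrives. The function name `check_quorum` and the idea of summarising a result by its spike count are assumptions made only for this example.

```python
from collections import Counter

def check_quorum(results, min_quorum=2):
    """Return the agreed result if at least min_quorum results match, else None.

    `results` is a list of hashable result summaries (here: spike counts).
    This is a simplification; a real validator compares results with tolerances.
    """
    if not results:
        return None
    # Most common summary and how many results reported it.
    value, count = Counter(results).most_common(1)[0]
    return value if count >= min_quorum else None

# Two crunchers disagree: one found 30 spikes, the other found none.
print(check_quorum([30, 0]))       # no agreement yet, a third cruncher is needed
# The third result agrees with the first and settles it.
print(check_quorum([30, 0, 30]))
```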

The WUs they are talking about in this thread were sent out one too many times and though the sixth man completed it successfully he can't get the credit because too many people had already errored out on it.
____________


PROUD MEMBER OF Team Starfire World BOINC

perryjay
Volunteer tester
Joined: 20 Aug 02
Posts: 3377
Credit: 16,212,255
RAC: 4,912
United States
Message 876155 - Posted: 16 Mar 2009, 14:59:35 UTC

It looks to be shaping up to me having one of those too. My wingman appears to have gone astray on this one http://setiathome.berkeley.edu/workunit.php?wuid=417846562. He has until the 24th but I'm not holding my breath.
____________


PROUD MEMBER OF Team Starfire World BOINC

Cosmic_Ocean
Joined: 23 Dec 00
Posts: 2327
Credit: 8,868,786
RAC: 781
United States
Message 876162 - Posted: 16 Mar 2009, 15:13:38 UTC

If that original wingman (_0) times out, the WU will still be alive, but it will all hinge on the last wingman (_6). The error limit is 5, which is inclusive, so up to and including 5 errors is allowed. More than 5 will effectively kill that WU.
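
To make the counting concrete, here is a minimal sketch of the rule as described in this post. It is not BOINC server code: the constant `MAX_ERRORS` and the function `workunit_is_dead` are hypothetical names, and the limit of 5 is simply the figure quoted above.

```python
MAX_ERRORS = 5  # assumed limit, taken from the post above

def workunit_is_dead(error_count, max_errors=MAX_ERRORS):
    """The limit is inclusive: only MORE than max_errors errors kills the WU."""
    return error_count > max_errors

# Walk through the borderline cases around the limit.
for errors in (4, 5, 6):
    state = "dead" if workunit_is_dead(errors) else "still alive"
    print(f"{errors} errors -> {state}")
```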
____________

Linux laptop uptime: 1484d 22h 42m
Ended due to UPS failure, found 14 hours after the fact

perryjay
Volunteer tester
Joined: 20 Aug 02
Posts: 3377
Credit: 16,212,255
RAC: 4,912
United States
Message 876166 - Posted: 16 Mar 2009, 15:34:28 UTC - in response to Message 876162.

If that original wingman (_0) times out, the WU will still be alive, but it will all hinge on the last wingman (_6). The error limit is 5, which is inclusive, so up to and including 5 errors is allowed. More than 5 will effectively kill that WU.



Ahh, good then there is still hope. I might get my credits in a month or so. :)

Guess I miscounted, I thought it was already up to the limit.

____________


PROUD MEMBER OF Team Starfire World BOINC

Cosmic_Ocean
Joined: 23 Dec 00
Posts: 2327
Credit: 8,868,786
RAC: 781
United States
Message 876172 - Posted: 16 Mar 2009, 15:44:46 UTC - in response to Message 876166.

If that original wingman (_0) times out, the WU will still be alive, but it will all hinge on the last wingman (_6). The error limit is 5, which is inclusive, so up to and including 5 errors is allowed. More than 5 will effectively kill that WU.



Ahh, good then there is still hope. I might get my credits in a month or so. :)

Guess I miscounted, I thought it was already up to the limit.

Nope. Only 4 errors presently. A missed deadline, I think, counts as an error, which would make 5, meaning one more than that and it's dead. So yes, still hope on that one, but it's borderline.
____________

Linux laptop uptime: 1484d 22h 42m
Ended due to UPS failure, found 14 hours after the fact

Ministry of Disinformation
Joined: 19 Sep 06
Posts: 8
Credit: 16,863,028
RAC: 0
United Kingdom
Message 876176 - Posted: 16 Mar 2009, 16:10:44 UTC - in response to Message 875793.

Well, the workunit looks to have been purged from the database now; the link resolves to a page that says "can't find workunit."

The situation sucks badly in my opinion. I just take some small comfort from the fact it was only 10 hours or so of CPU time, although that's not going to be of comfort to other people if they run into the same sort of problem.

I'd hate to think how people will feel who have, for instance, only one older computer that takes 40 hours or more to crunch a single AP WU.

I don't think that this policy has been thought through very well to be honest.

While I appreciate we are running seti@home by choice, it is nice to think that when we do, we will receive credit for the work we actually do. Not be told "tough, other people screwed you up by getting errors."

I would like to see a response from one of the actual seti team about this, with a positive message that they are at least aware of the issues and they are going to try to fix it.

Cosmic_Ocean
Joined: 23 Dec 00
Posts: 2327
Credit: 8,868,786
RAC: 781
United States
Message 876193 - Posted: 16 Mar 2009, 17:10:18 UTC

It's not a policy issue, per se. The problem stems from the network issue that happened a few weeks ago and caused a TON of failed downloads. Under normal circumstances, even with some of the bugs in the app, one would expect there to never be more than 4 tasks for one WU. Compute errors happen for apparently no reason at all, but it's not predictable, so we just kind of hope for the best.

That network issue that caused a lot of failed downloads is a problem though. A task being aborted by the client because the application does not exist due to a failed download is a logistical problem, in my opinion. I think the client should not download any tasks until the application is successfully downloaded. That would have mitigated all of those issues with that network problem.
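
A hedged sketch of the behaviour suggested here, not how the BOINC client actually works: tasks are accepted only when their application is already on disk, and everything else is deferred instead of being started and then aborted. The function `accept_tasks` and the app labels `ap_v5` and `mb` are hypothetical names used only for illustration.

```python
def accept_tasks(offered, downloaded_apps):
    """Split offered tasks into those that can run now and those to defer.

    `offered` is a list of (task_name, app_name) pairs.
    `downloaded_apps` is the set of application names fully downloaded.
    """
    accepted, deferred = [], []
    for task_name, app_name in offered:
        if app_name in downloaded_apps:
            accepted.append(task_name)
        else:
            # App download failed or is incomplete: do not take the task yet,
            # so it cannot end up as a client error against the WU.
            deferred.append(task_name)
    return accepted, deferred

offered = [("wu_1", "ap_v5"), ("wu_2", "mb")]
print(accept_tasks(offered, downloaded_apps={"mb"}))
```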

So, once we get all of those WUs out of the way that were involved in that network issue, things will be smooth sailing again, but we just have to get past the problem WUs. Some people will be burned by it, and though unfortunate, there's not really much that can be done about it. True, there are going to be voids and holes in the science database since results were effectively just thrown out, but I think we may end up keeping the "tapes" that were affected and splitting them again at a later date; maybe we already have.
____________

Linux laptop uptime: 1484d 22h 42m
Ended due to UPS failure, found 14 hours after the fact

Josef W. Segur
Project donor
Volunteer developer
Volunteer tester
Joined: 30 Oct 99
Posts: 4335
Credit: 1,113,795
RAC: 779
United States
Message 876226 - Posted: 16 Mar 2009, 19:23:31 UTC - in response to Message 876155.

It looks to be shaping up to me having one of those too. My wingman appears to have gone astray on this one http://setiathome.berkeley.edu/workunit.php?wuid=417846562. He has until the 24th but I'm not holding my breath.

The wingmate's laptop may have delayed starting those 2 AP_v5 units it has, and would probably take a week or so to do one. Even if it misses deadline or otherwise errors, the WU will only have 5 errors so another wingmate would be assigned. Holding your breath until April 23rd would indeed be uncomfortable.
Joe

Copyright © 2014 University of California