Completed, can't validate

Ministry of Disinformation
Joined: 19 Sep 06
Posts: 8
Credit: 16,873,992
RAC: 18
United Kingdom
Message 875793 - Posted: 15 Mar 2009, 14:36:54 UTC

Was looking through the results I had returned recently and noticed:

http://setiathome.berkeley.edu/workunit.php?wuid=420803531

So because of client errors from other people, those who do end up returning valid results are going to effectively have their credits denied in future?

"Other people couldn't do it, so you can't have your credit even though you could" is hardly a way to keep your crunchers happy.

Just wait until people who aren't running optimised apps, and who take far longer to crunch each WU, notice this. Fun times ahead, methinks.

perryjay
Volunteer tester
Joined: 20 Aug 02
Posts: 3377
Credit: 20,676,751
RAC: 0
United States
Message 875796 - Posted: 15 Mar 2009, 14:39:45 UTC - in response to Message 875793.

http://setiathome.berkeley.edu/workunit.php?wuid=420803531



Made your link clickable Mark. Sorry to see that happen.




PROUD MEMBER OF Team Starfire World BOINC

Geek@Play
Volunteer tester
Joined: 31 Jul 01
Posts: 2467
Credit: 86,146,931
RAC: 0
United States
Message 875832 - Posted: 15 Mar 2009, 16:48:53 UTC

Unfortunately this happens from time to time. I have had several of these. There is nothing you can do about it so no need to expend energy worrying about it. Just keep on crunchin.


Boinc....Boinc....Boinc....Boinc....

Josef W. Segur
Volunteer developer
Volunteer tester
Joined: 30 Oct 99
Posts: 4504
Credit: 1,414,761
RAC: 0
United States
Message 875871 - Posted: 15 Mar 2009, 18:27:50 UTC - in response to Message 875796.

http://setiathome.berkeley.edu/workunit.php?wuid=420803531


Hmm, that WU reached the "too many errors" state 11 Mar 2009 23:38:15 UTC. Host 3955382 contacted the servers about 14 times between then and when it actually started crunching the WU some time on the 15th.

The question is why a server abort didn't take place. Has anyone seen one lately? If they're enabled, maybe BOINC doesn't have a linkage from that particular condition to the abort and it needs to be fixed.
                                                                Joe

_hiVe*
Joined: 7 Aug 04
Posts: 9
Credit: 4,274,980
RAC: 0
Slovakia
Message 875872 - Posted: 15 Mar 2009, 18:29:20 UTC - in response to Message 875793.


I concur. Also happened to me quite a few times.
Very unsettling >.>

Ingleside
Volunteer developer
Joined: 4 Feb 03
Posts: 1546
Credit: 6,172,180
RAC: 550
Norway
Message 875911 - Posted: 15 Mar 2009, 20:28:23 UTC - in response to Message 875871.

The question is why a server abort didn't take place. Has anyone seen one lately? If they're enabled, maybe BOINC doesn't have a linkage from that particular condition to the abort and it needs to be fixed.

Server aborts put too much load on the database, so they have been turned off for many months.

"I make so many mistakes. But then just think of all the mistakes I don't make, although I might."

HAL
Joined: 28 Mar 03
Posts: 704
Credit: 870,617
RAC: 0
United States
Message 875919 - Posted: 15 Mar 2009, 20:47:26 UTC - in response to Message 875793.

Oh ye of little faith.
The admins have blessed us with a new category, different from PENDING, called VALIDATE INCONCLUSIVE (as an explanation). I just got 2 of them, SETI was prioritized to process them, and BINGO - IT WORKS. KUDOS to the admins.

Classic WU= 7,237 Classic Hours= 42,079

Josef W. Segur
Volunteer developer
Volunteer tester
Joined: 30 Oct 99
Posts: 4504
Credit: 1,414,761
RAC: 0
United States
Message 875988 - Posted: 16 Mar 2009, 0:13:21 UTC - in response to Message 875911.

The question is why a server abort didn't take place. Has anyone seen one lately? If they're enabled, maybe BOINC doesn't have a linkage from that particular condition to the abort and it needs to be fixed.

Server aborts put too much load on the database, so they have been turned off for many months.

Double whammy, then. If it were on it wouldn't have helped since that kind of fatal error isn't considered by the send_result_abort option. I've emailed David and the boinc_dev list suggesting inclusion.
                                                                Joe

Ministry of Disinformation
Joined: 19 Sep 06
Posts: 8
Credit: 16,873,992
RAC: 18
United Kingdom
Message 876039 - Posted: 16 Mar 2009, 2:19:49 UTC - in response to Message 875919.


That seems to be different though, or are you saying that they'll reverse the "can't validate" decision and send it out yet again?

There is a fairly large difference between "Completed, can't validate" and "Completed, validation inconclusive."

I'd like to think that multiple hours of cpu time haven't been wasted, but that's not how it's looking.

Kinguni
Volunteer tester
Joined: 15 Feb 00
Posts: 239
Credit: 9,043,007
RAC: 0
Canada
Message 876043 - Posted: 16 Mar 2009, 2:45:12 UTC - in response to Message 876039.

Keep an eye on it and see if it goes out again. I thought they would go out up to 10 times if needed? With these huge AP WUs it would make sense, especially after the initial application distribution problems for average non-optimized users. It's one thing to lose the crunch time of an MB WU, but quite another for someone to lose potentially days of crunching time this way. If one does the work, one should get the credit.


Join Team Starfire
BOINC Chat

Platinum*
Joined: 28 Dec 08
Posts: 1
Credit: 4,013
RAC: 0
Finland
Message 876078 - Posted: 16 Mar 2009, 5:03:40 UTC

http://setiathome.berkeley.edu/workunit.php?wuid=417873086
Here is one more: "Completed, can't validate". So I have stopped crunching SETI because of that.

RottenMutt
Joined: 15 Mar 01
Posts: 1011
Credit: 230,274,184
RAC: 0
United States
Message 876138 - Posted: 16 Mar 2009, 13:25:28 UTC - in response to Message 876078.

Here is another: 1178553096.


perryjay
Volunteer tester
Joined: 20 Aug 02
Posts: 3377
Credit: 20,676,751
RAC: 0
United States
Message 876141 - Posted: 16 Mar 2009, 13:36:15 UTC - in response to Message 876138.

RottenMutt,

Yours is a bit different from what they are talking about here. One of you doing this WU found 30 spikes and the other didn't, so it has been sent out to a third cruncher to see which is right. When the third cruncher completes it and sends it in, two or all three of you will get credit.

The WUs they are talking about in this thread were sent out one too many times, and though the sixth man completed it successfully, he can't get the credit because too many people had already errored out on it.




perryjay
Volunteer tester
Joined: 20 Aug 02
Posts: 3377
Credit: 20,676,751
RAC: 0
United States
Message 876155 - Posted: 16 Mar 2009, 14:59:35 UTC

It looks like I'm shaping up to have one of those too. My wingman appears to have gone astray on this one: http://setiathome.berkeley.edu/workunit.php?wuid=417846562. He has until the 24th, but I'm not holding my breath.




Cosmic_Ocean
Joined: 23 Dec 00
Posts: 2871
Credit: 10,621,407
RAC: 321
United States
Message 876162 - Posted: 16 Mar 2009, 15:13:38 UTC

If that original wingman (_0) times out, the WU will still be alive, but it will all hinge on the last wingman (_6). The error limit is 5, inclusive, so the WU survives up to and including 5 errors. More than 5 will effectively kill that WU.
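
For anyone trying to follow the arithmetic, here is a minimal Python sketch of the rule as described in this post. It is not BOINC's actual server code: the function name `workunit_is_dead` and the constant `ERROR_LIMIT` are made up for illustration, and the limit of 5 is simply the figure quoted above.

```python
# Hypothetical illustration of the "error limit is inclusive" rule from this thread.
# ERROR_LIMIT and workunit_is_dead are assumed names, not BOINC identifiers.
ERROR_LIMIT = 5  # value quoted in the thread for this project


def workunit_is_dead(error_count: int, limit: int = ERROR_LIMIT) -> bool:
    """Return True only once the error count exceeds the limit.

    Reaching the limit exactly still leaves the workunit alive.
    """
    return error_count > limit


if __name__ == "__main__":
    for errors in (4, 5, 6):
        state = "too many errors" if workunit_is_dead(errors) else "still alive"
        print(f"{errors} errors: {state}")
```

So a WU sitting at 4 errors can absorb one more (a missed deadline, say) and still be sent out again; it is the sixth error that finishes it off.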


Linux laptop:
record uptime: 1511d 20h 19m (ended due to the power brick giving-up)

perryjay
Volunteer tester
Joined: 20 Aug 02
Posts: 3377
Credit: 20,676,751
RAC: 0
United States
Message 876166 - Posted: 16 Mar 2009, 15:34:28 UTC - in response to Message 876162.

If that original wingman (_0) times out, the WU will still be alive, but it will all hinge on the last wingman (_6). The error limit is 5, which is inclusive..so up to, and including 5 errors. More than 5 will effectively kill that WU.



Ahh, good, then there is still hope. I might get my credits in a month or so. :)

Guess I miscounted, I thought it was already up to the limit.



Cosmic_Ocean
Joined: 23 Dec 00
Posts: 2871
Credit: 10,621,407
RAC: 321
United States
Message 876172 - Posted: 16 Mar 2009, 15:44:46 UTC - in response to Message 876166.

Nope. Only 4 errors presently. A missed deadline, I think, counts as an error, which would make 5, meaning one more than that and it's dead. So yes, still hope on that one, but it's borderline.

Ministry of Disinformation
Joined: 19 Sep 06
Posts: 8
Credit: 16,873,992
RAC: 18
United Kingdom
Message 876176 - Posted: 16 Mar 2009, 16:10:44 UTC - in response to Message 875793.

Well, the workunit looks to have been purged from the database now; the link resolves to a page saying "can't find workunit."

The situation sucks badly, in my opinion. I take some small comfort from the fact that it was only 10 hours or so of CPU time, although that's not going to be of comfort to other people if they run into the same sort of problem.

I'd hate to think how people with, for instance, only one older computer that takes 40 hours or more to crunch a single AP WU will feel.

I don't think that this policy has been thought through very well, to be honest.

While I appreciate we are running SETI@home by choice, it is nice to think that when we do, we will receive credit for the work we actually do, not be told "tough, other people screwed you up by getting errors."

I would like to see a response from someone on the actual SETI team about this, with a positive message that they are at least aware of the issues and are going to try to fix them.

Cosmic_Ocean
Joined: 23 Dec 00
Posts: 2871
Credit: 10,621,407
RAC: 321
United States
Message 876193 - Posted: 16 Mar 2009, 17:10:18 UTC

It's not a policy issue, per se. The problem stems from the network issue that happened a few weeks ago and caused a TON of failed downloads. Under normal circumstances, even with some of the bugs in the app, one would expect there to never be more than 4 tasks for one WU. Compute errors happen for apparently no reason at all and aren't predictable, so we just kind of hope for the best.

That network issue that caused a lot of failed downloads is a problem though. A task being aborted by the client because the application does not exist due to a failed download is a logistical problem, in my opinion. I think the client should not download any tasks until the application is successfully downloaded. That would have mitigated all of those issues with that network problem.

So, once we get all of those WUs that were involved in that network issue out of the way, things will be smooth sailing again, but we just have to get past the problem WUs. Some people will be burned by it, and though that's unfortunate, there's not really much that can be done about it. True, there are going to be voids and holes in the science database since results were effectively just thrown out, but I think we may end up keeping the "tapes" that were affected and splitting them again at another date. Maybe we already have.


Josef W. Segur
Volunteer developer
Volunteer tester
Joined: 30 Oct 99
Posts: 4504
Credit: 1,414,761
RAC: 0
United States
Message 876226 - Posted: 16 Mar 2009, 19:23:31 UTC - in response to Message 876155.

It looks to be shaping up to me having one of those too. My wingman appears to have gone astray on this one http://setiathome.berkeley.edu/workunit.php?wuid=417846562. He has until the 24th but I'm not holding my breath.

The wingmate's laptop may have delayed starting those 2 AP_v5 units it has, and would probably take a week or so to do one. Even if it misses the deadline or otherwise errors, the WU will only have 5 errors, so another wingmate would be assigned. Holding your breath until April 23rd would indeed be uncomfortable.
                                                               Joe
