Server Not Working Correctly


Profile BlackLuke
Volunteer tester
Joined: 19 Jul 99
Posts: 170
Credit: 75,609,147
RAC: 101,628
United States
Message 1134747 - Posted: 1 Aug 2011, 9:43:39 UTC

Here are two WUs that were marked too late to validate and the actual due dates as shown in a copy of client_state.xml:

CS.xml Due Date   Due Date             WU Name                                 WU ID
1314112265        8/23/2011 15:11:05   19no10ad.7092.69008.16.10.52_3          741571872
1314149027        8/24/2011 1:23:47    30ap11ad.29805.4975.12.10.253.vlar_0    778944033
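
The first column appears to be the raw deadline as stored in client_state.xml, a Unix timestamp in seconds, and the second the same instant as a calendar date. A minimal Python sketch of that conversion, assuming the values are read as UTC (the helper name `deadline_utc` is made up for illustration and is not part of BOINC):

```python
from datetime import datetime, timezone


def deadline_utc(epoch_seconds: int) -> str:
    """Format a Unix timestamp (in seconds) as a UTC date and time."""
    moment = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    return moment.strftime("%Y-%m-%d %H:%M:%S UTC")


# The two deadline values quoted in the table above.
for stamp in (1314112265, 1314149027):
    print(stamp, "->", deadline_utc(stamp))
```

Read as UTC, both values reproduce the calendar dates given in the second column.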

____________

OzzFan
Volunteer tester
Joined: 9 Apr 02
Posts: 13625
Credit: 30,987,212
RAC: 20,033
United States
Message 1134790 - Posted: 1 Aug 2011, 15:09:57 UTC - in response to Message 1134747.

In both cases, a quorum was already met and a canonical result was already chosen before you returned your work, making your work invalid even if it was before the deadline you were given.


What caused this?


People have been having problems getting work from and returning work to SETI's servers. The most likely scenario is that a quorum partner could not return their work in time, so the server generated another copy and sent it to you. In the meantime, the original quorum partner was able to get their upload through, making your result invalid.

Profile BilBg
Volunteer tester
Joined: 27 May 07
Posts: 2792
Credit: 6,305,470
RAC: 7,360
Bulgaria
Message 1134887 - Posted: 1 Aug 2011, 17:40:53 UTC - in response to Message 1134790.
Last modified: 1 Aug 2011, 18:15:56 UTC


I don't think this "likely scenario" is supposed to happen (it's not by design).

All the sent tasks (replications of the same WU) have to get credit (by design)
if they are returned (reported) before the deadline and the results are correct (compare OK with the other results).

There may be a bug in the validate logic.


http://setiathome.berkeley.edu/workunit.php?wuid=741571872

Task        Computer  Sent                        Time reported or deadline   Status                            Run time   CPU time   Credit  Application
1905523477  4414279   11 May 2011 | 4:22:00 UTC   18 May 2011 | 3:43:01 UTC   Completed and validated           1,239.48   48.44      116.19  SETI@home Enhanced Anonymous platform (NVIDIA GPU)
1905523478  5453750   11 May 2011 | 4:21:59 UTC   12 May 2011 | 19:26:35 UTC  Completed and validated           19,365.73  19,065.21  116.19  SETI@home Enhanced v6.03
1914542516  5813623   18 May 2011 | 8:21:43 UTC   6 Jul 2011 | 1:20:17 UTC    Completed and validated           11,407.08  11,329.26  116.19  SETI@home Enhanced v6.03
1982544584  6120747   6 Jul 2011 | 0:57:32 UTC    1 Aug 2011 | 6:34:32 UTC    Completed, too late to validate   9,591.20   8,826.14   0.00    SETI@home Enhanced Anonymous platform (CPU)







____________



- ALF - "Find out what you don't do well ..... then don't do it!" :)

OzzFan
Volunteer tester
Joined: 9 Apr 02
Posts: 13625
Credit: 30,987,212
RAC: 20,033
United States
Message 1134890 - Posted: 1 Aug 2011, 17:50:10 UTC - in response to Message 1134887.


> I don't think this "likely scenario" is supposed to happen (it's not by design).
>
> All the sent tasks (replications of the same WU) have to get credit (by design)
> if they are returned (reported) before the deadline and the results are correct (compare OK with the other results).
>
> There may be a bug in the validate logic.


In a perfect world, it's not supposed to happen, and this is a rarity, to be certain. But once a canonical result has been chosen, all other copies sent out will be "late" and will not receive credit for their work. This has been the case for many years now.

This is the reason server-side deletes were added to the BOINC server code, but the option is very stressful on servers, and SETI's servers simply cannot handle the extra load, so it was disabled.

No bug here. This is the design of BOINC for a while now.

Profile BilBg
Volunteer tester
Joined: 27 May 07
Posts: 2792
Credit: 6,305,470
RAC: 7,360
Bulgaria
Message 1134896 - Posted: 1 Aug 2011, 18:29:16 UTC - in response to Message 1134890.


But why was the canonical result chosen before all the already sent tasks were reported or had missed the deadline?


____________



- ALF - "Find out what you don't do well ..... then don't do it!" :)

OzzFan
Volunteer tester
Joined: 9 Apr 02
Posts: 13625
Credit: 30,987,212
RAC: 20,033
United States
Message 1134899 - Posted: 1 Aug 2011, 18:51:17 UTC - in response to Message 1134896.

Because there is no code that dictates that all tasks must be returned before a canonical result is chosen and credit is granted.

Only two tasks need to match closely in order for the validator to choose a canonical result. Any other results are disregarded.
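
As a rough illustration of the behaviour described here, the following Python sketch picks a canonical result as soon as enough returned results agree, without waiting for copies that are still out. It is not BOINC's validator code: the function name, the single-number "result", the tolerance, and the task ids are all assumptions made for the example.

```python
def choose_canonical(results, min_quorum=2, tolerance=0.01):
    """Return the id of the first result that enough results agree with, else None.

    `results` maps a hypothetical task id to one numeric answer. A real
    validator compares much richer output than a single number.
    """
    for task_id, value in results.items():
        # Count every returned result (including this one) within tolerance.
        agreeing = [other for other, v in results.items()
                    if abs(v - value) <= tolerance]
        if len(agreeing) >= min_quorum:
            return task_id
    return None


# Two results that disagree: no canonical result yet, a tie breaker is needed.
print(choose_canonical({"a": 1.0, "b": 1.5}))
# A third result matches the first: the quorum is met, whatever is still in progress.
print(choose_canonical({"a": 1.0, "b": 1.5, "c": 1.004}))
```

In this sketch nothing checks whether other copies are still in progress, which is the point under discussion.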

OzzFan
Volunteer tester
Joined: 9 Apr 02
Posts: 13625
Credit: 30,987,212
RAC: 20,033
United States
Message 1134902 - Posted: 1 Aug 2011, 18:59:58 UTC - in response to Message 1134887.
Last modified: 1 Aug 2011, 19:17:04 UTC

> http://setiathome.berkeley.edu/workunit.php?wuid=741571872
>
> Task        Computer  Sent                        Time reported or deadline   Status                            Run time   CPU time   Credit  Application
> 1905523477  4414279   11 May 2011 | 4:22:00 UTC   18 May 2011 | 3:43:01 UTC   Completed and validated           1,239.48   48.44      116.19  SETI@home Enhanced Anonymous platform (NVIDIA GPU)
> 1905523478  5453750   11 May 2011 | 4:21:59 UTC   12 May 2011 | 19:26:35 UTC  Completed and validated           19,365.73  19,065.21  116.19  SETI@home Enhanced v6.03
> 1914542516  5813623   18 May 2011 | 8:21:43 UTC   6 Jul 2011 | 1:20:17 UTC    Completed and validated           11,407.08  11,329.26  116.19  SETI@home Enhanced v6.03
> 1982544584  6120747   6 Jul 2011 | 0:57:32 UTC    1 Aug 2011 | 6:34:32 UTC    Completed, too late to validate   9,591.20   8,826.14   0.00    SETI@home Enhanced Anonymous platform (CPU)





In this specific case, we can see that the first two copies were sent out at the same time (May 11th, 2011), with the first returned result on the 12th. The second copy came in on the 18th. On that same date of the 18th, a third copy was generated, suggesting that the first two results did not match closely enough and so a "tie breaker" was needed.

This third workunit had a deadline of July 6th, and the third computer either did not finish or could not upload before that deadline. So a fourth workunit was sent out on the deadline of July 6th at 0:57 UTC, to BlackLuke's computer.

Then at 1:20 UTC (about 23 minutes after the fourth workunit was sent) the third computer was finally able to upload its result, thereby concluding the "tie breaker" and granting credit to the original three machines. BlackLuke's workunit may have been returned before his deadline, but it was already irrelevant because a quorum had been met, a canonical result chosen, and credit granted.
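
The intervals in this case can be checked directly against the task list for workunit 741571872. A small Python sketch, using only three timestamps quoted in that list (the variable names are just for illustration):

```python
from datetime import datetime

FMT = "%d %b %Y %H:%M:%S"

# Times (UTC) copied from the task list for workunit 741571872.
fourth_sent = datetime.strptime("6 Jul 2011 0:57:32", FMT)
third_reported = datetime.strptime("6 Jul 2011 1:20:17", FMT)
fourth_reported = datetime.strptime("1 Aug 2011 6:34:32", FMT)

# How long after the fourth copy went out did the third result arrive?
gap = third_reported - fourth_sent
print("third result arrived after:", gap)

# Was the fourth result reported after the third one?
print("fourth reported later:", fourth_reported > third_reported)
```

By these timestamps the third result arrived 22 minutes 45 seconds after the fourth copy was sent, and the fourth result was reported later than the third.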

OzzFan
Volunteer tester
Joined: 9 Apr 02
Posts: 13625
Credit: 30,987,212
RAC: 20,033
United States
Message 1134909 - Posted: 1 Aug 2011, 19:12:18 UTC - in response to Message 1134747.

> 1314149027 8/24/2011 1:23:47 30ap11ad.29805.4975.12.10.253.vlar_0 778944033


In the second example here, a workunit was sent out to BlackLuke's computer and a quorum partner on July 8th, 2011. The quorum partner returned his a day later on July 9th, 2011.

BlackLuke actually missed this deadline on July 31st, 2011, so a third copy was sent out. The third computer returned their copy on August 1st at 5:07 UTC, matching closely with the first result and completing the quorum (canonical result chosen and credit granted).

BlackLuke's computer didn't return his copy until August 1st, 2011 at 8:34 UTC, which was too late, therefore he didn't receive credit.

Profile BilBg
Volunteer tester
Joined: 27 May 07
Posts: 2792
Credit: 6,305,470
RAC: 7,360
Bulgaria
Message 1135039 - Posted: 2 Aug 2011, 3:19:29 UTC


My letter to David Anderson:

On 01-Aug-2011 11:57 AM, BilBg NotGates wrote:
>
> Can you take a look at this thread?:
>
> http://setiathome.berkeley.edu/forum_thread.php?id=64992
>
> Is this the proper way validator is designed to work?
>


His reply:

That looks like a bug;
a task completed by its deadline should get credit.
I'll look into it.
-- David


____________



- ALF - "Find out what you don't do well ..... then don't do it!" :)

OzzFan
Volunteer tester
Joined: 9 Apr 02
Posts: 13625
Credit: 30,987,212
RAC: 20,033
United States
Message 1135050 - Posted: 2 Aug 2011, 3:56:08 UTC - in response to Message 1135039.

Well I hope David Anderson at least posts here to let us know what the right answer is. As it is, I stand by my findings.

Profile BilBg
Volunteer tester
Joined: 27 May 07
Posts: 2792
Credit: 6,305,470
RAC: 7,360
Bulgaria
Message 1135078 - Posted: 2 Aug 2011, 5:18:18 UTC - in response to Message 1134902.
Last modified: 2 Aug 2011, 5:21:05 UTC


I agree with all you say here up to this point:
"... thereby concluding the "tie breaker" and granting credit to the original three machines."

Yes, that happened in reality - but this "concluding the "tie breaker"" is not supposed (IMHO) to happen while a task is still "In progress".
Fair play from the server would be to postpone the validation so that credit can be granted to all fair-playing clients.


____________



- ALF - "Find out what you don't do well ..... then don't do it!" :)

OzzFan
Volunteer tester
Joined: 9 Apr 02
Posts: 13625
Credit: 30,987,212
RAC: 20,033
United States
Message 1135151 - Posted: 2 Aug 2011, 11:54:24 UTC - in response to Message 1135078.

Why should it not happen? If the first two results didn't match and a third one is required, then what does it matter if there's four out there so long as the third one meets the quorum?

I've actually seen this happen before, where a computer can't upload its results due to the server problems, a third or fourth gets sent out because the second or third is marked late, and then the other one is eventually able to upload, causing the newly sent out workunit to be irrelevant.

Profile BlackLuke
Volunteer tester
Joined: 19 Jul 99
Posts: 170
Credit: 75,609,147
RAC: 101,628
United States
Message 1135217 - Posted: 2 Aug 2011, 14:37:46 UTC - in response to Message 1134890.

>No bug here. This is the design of BOINC for a while now.

Nuts! You did not read my post. The actual due dates for the WUs shown were 8/23/11 and 8/24/11; I know this from a backup copy of client_state.xml my system made. Something changed the due dates to 8/1/11, the system sent out a new WU to another user shortly thereafter, and that user returned his result before mine. Why our hard-nosed, sometimes less than user-centered, masters declared my result invalid while the WU was still in the DB is another question. Why not better late than never? In a recent interview with a writer/producer of a hit TV series ("Everybody Loves Raymond"?) on Public Radio, the interviewee said he did not start making any money until he consciously decided to concentrate on being good and being kind. I ask your prayers for less avaricious and more kindly masters.


____________

Profile Ageless
Joined: 9 Jun 99
Posts: 12324
Credit: 2,627,394
RAC: 948
Netherlands
Message 1135275 - Posted: 2 Aug 2011, 23:13:17 UTC - in response to Message 1135217.
Last modified: 2 Aug 2011, 23:16:36 UTC

As I see it, you have 5 tasks too late to validate. All with a due date of 31 Jul 2011 | 21:55:44 UTC.
None of them are the tasks you point out, so either those are still on your system, or they were already done and flushed.

I don't know why your computer was so late returning those tasks, but I do see that in all cases the computer returning the correct task was one with the stock application. So either your computer had too much work to do and was swapping between these tasks to get them in before the deadline (or, once that had passed, as fast as possible), or your computer has been doing something differently.

I mean, to be trumped by a stock application that takes 22,000 seconds, while your optimized application does it in 10,000 and even then misses out...

Your other computer doesn't have any invalid tasks. Errors yes, but no invalids.
____________
Jord

Fighting for the correct use of the apostrophe, together with Weird Al Yankovic

OzzFan
Volunteer tester
Joined: 9 Apr 02
Posts: 13625
Credit: 30,987,212
RAC: 20,033
United States
Message 1135305 - Posted: 3 Aug 2011, 0:37:40 UTC - in response to Message 1135217.

>No bug here. This is the design of BOINC for a while now.

> Nuts! You did not read my post.


There wasn't much to your post, how could I not read it? I even investigated each case enough to find out what happened, yet somehow I didn't read it.

> The actual due dates for the WUs shown were 8/23/11 and 8/24/11; I know this from a backup copy of client_state.xml my system made. Something changed the due dates to 8/1/11, the system sent out a new WU to another user shortly thereafter, and that user returned his result before mine.


All I can tell you is what I see. If you have proof of the "real" deadlines, then you should have provided that in your post. As it is, the results only stay visible for 24 hours, so there's a short window of opportunity to see what happened.

If the deadlines somehow changed, this would be the first known instance of the deadlines changing. It's more likely that you read the deadlines wrong.

> Why our hard-nosed, sometimes less than user-centered, masters declared my result invalid while the WU was still in the DB is another question. Why not better late than never? In a recent interview with a writer/producer of a hit TV series ("Everybody Loves Raymond"?) on Public Radio, the interviewee said he did not start making any money until he consciously decided to concentrate on being good and being kind. I ask your prayers for less avaricious and more kindly masters.


So BilBg posts in here that lead Project Administrator David Anderson seems to agree with you, and you call him "hard-nosed, less than user-centered, "masters""? Sounds like you're being quite harsh and over-dramatic.

Profile BilBg
Volunteer tester
Joined: 27 May 07
Posts: 2792
Credit: 6,305,470
RAC: 7,360
Bulgaria
Message 1135343 - Posted: 3 Aug 2011, 2:17:13 UTC - in response to Message 1135151.


"I've actually seen this happen before ..." - I don't say it does not happen, I say it should not happen
(not giving credit for correct work which the server asked from the client to do).

"... causing the newly sent out workunit to be irrelevant" is true considering science.

But considering the "deal" made between the server and the client it's unfair.
Almost anyone will be upset if the computing time spent by his computer is not rewarded/acknowledged by credit.
And it reduces "Max tasks per day".

The best the server can do is ask the client to cancel the already made "deal": send an abort request if the task has not yet started on the client.
If the client aborts the task, the workunit is free to become "case closed".
If the client does not report the task (as aborted or finished), the server has to wait until the deadline.

(If somebody makes a deal with you saying: "Do this work for me and I will pay you",
you will want your payment (for good, in-time completed work) even if the contractor does not need your work anymore by the time you finish it.

You don't care (and don't even know) that the work was given to you only because another person failed to complete the same work in-time.

What will be your verdict in this case?
(if you "play" judge this time ;) )
)


____________



- ALF - "Find out what you don't do well ..... then don't do it!" :)

OzzFan
Volunteer tester
Joined: 9 Apr 02
Posts: 13625
Credit: 30,987,212
RAC: 20,033
United States
Message 1135345 - Posted: 3 Aug 2011, 2:30:56 UTC - in response to Message 1135343.

> "I've actually seen this happen before ..." - I don't say it does not happen, I say it should not happen
> (not giving credit for correct work which the server asked from the client to do).


And I say it's fair not to award credit for a workunit that isn't returned on time or misses the quorum.

On the other hand, SETI@Home has been known to give out credits in situations like this when it's their servers and connection that causes the uploading issues. I don't have a problem with that.

> "... causing the newly sent out workunit to be irrelevant" is true considering science.
>
> But considering the "deal" made between the server and the client it's unfair.
> Almost anyone will be upset if the computing time spent by his computer is not rewarded/acknowledged by credit.


What deal? A client asks for work, the server hands it out with a deadline. The only reason why the deadline even exists is out of respect for the quorum partner. It's not like the ET signal is going to be suddenly gone if it's not returned on time.

BOINC is centered around the fact that work is being sent out to computers of unknown reliability or loyalty. People quit all the time without returning their work.

The whole premise of credits is to pay for valid work, which must be returned on time.

> And it reduces "Max tasks per day".


No, it doesn't. Only invalid or errored tasks reduce the "Max tasks per day". Missed deadlines do not.

> The best the server can do is ask the client to cancel the already made "deal": send an abort request if the task has not yet started on the client.


...and that is built into BOINC, but SETI's servers cannot handle the extra cross-checking.

> If the client aborts the task, the workunit is free to become "case closed".
> If the client does not report the task (as aborted or finished), the server has to wait until the deadline.


Yes, that's how it works. It would seem it's unfair to the server that some people join and never finish a workunit without at least first telling the server they aren't going to finish what they've downloaded.

> (If somebody makes a deal with you saying: "Do this work for me and I will pay you",
> you will want your payment (for good, in-time completed work) even if the contractor does not need your work anymore by the time you finish it.


But if it's made clear to me that I will only get paid for valid work, and I don't turn it in on time, I don't get credit.

My girlfriend is going to college, and that's the way her professors are. Her niece is going to grade school, and that's the way her teachers are. Kids don't get credit toward their grade if they return their work late.

I'm afraid your analogy doesn't work here.

> You don't care (and don't even know) that the work was given to you only because another person failed to complete the same work in-time.


Actually, you do. If you spend the time to check all your tasks to see if they got credit, you have the time to check to see why you got the task in the first place. All you have to do is click on each task to see what other computers they're assigned to.

If you don't care, well then you shouldn't care enough to demand credits for work returned late.

> What will be your verdict in this case?
> (if you "play" judge this time ;) )
> )


Well, I don't get to play judge. Only David or Eric or one of the other Project Admins do.

But if I did, I would probably manually intervene and grant credit only because the servers have been dropping connections, making it nearly impossible to return work on time.

But if the servers weren't dropping connections, then I would not grant credit for late work.

Profile BilBg
Volunteer tester
Joined: 27 May 07
Posts: 2792
Credit: 6,305,470
RAC: 7,360
Bulgaria
Message 1135362 - Posted: 3 Aug 2011, 3:15:15 UTC - in response to Message 1135345.


Sorry - most of my writing is misunderstood (judging by your responses)
Maybe because I'm Bulgarian and don't write English well


____________



- ALF - "Find out what you don't do well ..... then don't do it!" :)

John McLeod VII
Volunteer developer
Volunteer tester
Joined: 15 Jul 99
Posts: 24685
Credit: 522,659
RAC: 19
United States
Message 1135365 - Posted: 3 Aug 2011, 3:30:53 UTC

It is not fair to not give credit if a task is reported by its deadline - even if the quorum has already been met. You were asked to do the task, and you did the task and returned it by the time you were supposed to return the task.

It is fair to not give credit if the task is returned late and the quorum has already been met. You did not return it on time, and it is useless.

It is borderline whether to give credit if you are late and the quorum has not been met before you return the task. The design is that this task will get credit, as the project is using the result.

The design, and the way the server is supposed to work is based on the above. If you return the work on time or the project uses it anyway, AND you return a correct result, you get credit.
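
The rule in the last paragraph can be written down as a single boolean expression. A minimal Python sketch of the stated design, not the actual server code (the function and parameter names are invented for illustration):

```python
def should_grant_credit(correct: bool, on_time: bool, used_by_project: bool) -> bool:
    """Credit goes to a correct result that was on time or was used anyway."""
    return correct and (on_time or used_by_project)


# On time, quorum already met (result not needed): credit by design.
print(should_grant_credit(correct=True, on_time=True, used_by_project=False))
# Late and not used: no credit.
print(should_grant_credit(correct=True, on_time=False, used_by_project=False))
# Late but still used by the project: credit.
print(should_grant_credit(correct=True, on_time=False, used_by_project=True))
```

Under this reading, the only open question for the two tasks in this thread is whether they were in fact reported before their deadlines.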
____________


BOINC WIKI

OzzFan
Volunteer tester
Joined: 9 Apr 02
Posts: 13625
Credit: 30,987,212
RAC: 20,033
United States
Message 1135371 - Posted: 3 Aug 2011, 4:00:00 UTC - in response to Message 1135362.

> Sorry - most of my writing is misunderstood (judging by your responses)
> Maybe because I'm Bulgarian and don't write English well


Probably not your fault at all. In fact, many people seem to read anger or an acerbic tone into my posts that is not there - and I'm a native English speaker!

It's hard to tell based on the fact that most people can't read tone or inflection in words. I'll usually tell people when they're making me angry. Otherwise, if you could hear me face to face, I'm probably just speaking matter-of-factly instead of emotionally (with anger).


Copyright © 2014 University of California