Too late to validate?

Gatekeeper
Joined: 14 Jul 04
Posts: 887
Credit: 176,479,616
RAC: 0
United States
Message 1240992 - Posted: 4 Jun 2012, 2:59:48 UTC
Last modified: 4 Jun 2012, 3:00:37 UTC

Here is my list of "invalids" for my main cruncher: http://setiathome.berkeley.edu/results.php?hostid=6371091&offset=0&show_names=0&state=4&appid=

Note they all show as "completed too late to validate", but had a less than 2 day turnaround. What gets me is that all of them were sent to me as the third system, even though the first two rigs had completed and returned their results. Anybody have a thought as to what's going on here?
ID: 1240992
rob smith
Volunteer moderator
Volunteer tester
Joined: 7 Mar 03
Posts: 22217
Credit: 416,307,556
RAC: 380
United Kingdom
Message 1241025 - Posted: 4 Jun 2012, 6:58:35 UTC

This happens periodically. Most often after a big outage we see clusters of WUs coming out with impossibly short deadlines.

In this case they were initially sent out just before the outage to two other users, who processed them during the outage. The server then decided to send them out again just after the outage, before the other two users had reported, but with impossibly short deadlines.

I suspect we are going to see lots of these in the next few days :-(
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
ID: 1241025
Gatekeeper
Joined: 14 Jul 04
Posts: 887
Credit: 176,479,616
RAC: 0
United States
Message 1241026 - Posted: 4 Jun 2012, 7:05:48 UTC - in response to Message 1241025.  

This happens periodically. Most often after a big outage we see clusters of WUs coming out with impossibly short deadlines.

In this case they were initially sent out just before the outage to two other users, who processed them during the outage. The server then decided to send them out again just after the outage, before the other two users had reported, but with impossibly short deadlines.

I suspect we are going to see lots of these in the next few days :-(


Rob,

I understand what you're saying, as it happens with VLARs being re-sent on a GPU work request. But these were sent to me as user #3 AFTER users 1 and 2 had reported them. And I don't think they were sent to me with short deadlines; I returned them completed in 2 days (so I don't know what the project deadline for them was). I'm guessing that somehow they got scheduled for resend, even though they had been properly reported, due to some timing issue between the validators and the scheduler, but I was looking for someone to confirm my theory.
ID: 1241026
Link
Joined: 18 Sep 03
Posts: 834
Credit: 1,807,369
RAC: 0
Germany
Message 1241028 - Posted: 4 Jun 2012, 7:13:13 UTC - in response to Message 1240992.  

Note they all show as "completed too late to validate", but had a less than 2 day turnaround.

Well, they were returned about 5 minutes after the deadline and after both your wingmen returned their results, so technically that is correct.


What gets me is that all of them were sent to me as the third system, even though the first two rigs had completed and returned their results.

No, they were all sent to you before the 2nd result was returned. So that was right as well.

The real question is why those WUs had such short deadlines. A server bug?
ID: 1241028
Link
Joined: 18 Sep 03
Posts: 834
Credit: 1,807,369
RAC: 0
Germany
Message 1241030 - Posted: 4 Jun 2012, 7:16:08 UTC - in response to Message 1241026.  

And I don't think they were sent to me with short deadlines; I returned them completed in 2 days (so I don't know what the project deadline for them was).

You can see the deadline in the task details, for example:
Name: 12mr10aa.19911.24347.3.10.48_2
Workunit: 999039288
Created: 1 Jun 2012, 5:16:22 UTC
Sent: 1 Jun 2012, 8:05:14 UTC
Received: 3 Jun 2012, 17:29:43 UTC
Server state: Over
Outcome: Success
Client state: Done
Exit status: 0 (0x0)
Computer ID: 6371091
Report deadline: 3 Jun 2012, 17:24:30 UTC
Run time: 744.77 s
CPU time: 107.31 s
Validate state: Task was reported too late to validate
Credit: 0.00
Application version: SETI@home Enhanced, Anonymous platform (NVIDIA GPU)

ID: 1241030
Gatekeeper
Joined: 14 Jul 04
Posts: 887
Credit: 176,479,616
RAC: 0
United States
Message 1241033 - Posted: 4 Jun 2012, 7:32:22 UTC - in response to Message 1241030.  
Last modified: 4 Jun 2012, 7:43:38 UTC

And I don't think they were sent to me with short deadlines; I returned them completed in 2 days (so I don't know what the project deadline for them was).

You can see the deadline in the task details, for example:
Name: 12mr10aa.19911.24347.3.10.48_2
Workunit: 999039288
Created: 1 Jun 2012, 5:16:22 UTC
Sent: 1 Jun 2012, 8:05:14 UTC
Received: 3 Jun 2012, 17:29:43 UTC
Server state: Over
Outcome: Success
Client state: Done
Exit status: 0 (0x0)
Computer ID: 6371091
Report deadline: 3 Jun 2012, 17:24:30 UTC
Run time: 744.77 s
CPU time: 107.31 s
Validate state: Task was reported too late to validate
Credit: 0.00
Application version: SETI@home Enhanced, Anonymous platform (NVIDIA GPU)


OK, it's late here, and I'm tired, and these will all probably be deleted by tomorrow morning my time, but there are a lot of timing issues about these I don't understand. It's not the credits; it's my not understanding how these came about in the first place. Hopefully, we won't, as Rob suggested, be seeing a lot of these.

EDIT: For example, it's curious that the project "deadline" is the same in all 8 workunits, and is almost exactly 5 minutes before I returned them. It's as if, when I returned them, S@H said "oops, we don't need these, let's change the deadline and make them invalid".
ID: 1241033
Lionel
Joined: 25 Mar 00
Posts: 680
Credit: 563,640,304
RAC: 597
Australia
Message 1241035 - Posted: 4 Jun 2012, 7:45:56 UTC - in response to Message 1241025.  

This happens periodically. Most often after a big outage we see clusters of WUs coming out with impossibly short deadlines.

In this case they were initially sent out just before the outage to two other users, who processed them during the outage. The server then decided to send them out again just after the outage, before the other two users had reported, but with impossibly short deadlines.

I suspect we are going to see lots of these in the next few days :-(


yep, had 250+ of these the other day ... good to see that others are getting the good news as well ...

ID: 1241035
Josef W. Segur
Volunteer developer
Volunteer tester
Joined: 30 Oct 99
Posts: 4504
Credit: 1,414,761
RAC: 0
United States
Message 1241086 - Posted: 4 Jun 2012, 14:47:11 UTC - in response to Message 1241033.  

And I don't think they were sent to me with short deadlines; I returned them completed in 2 days (so I don't know what the project deadline for them was).

You can see the deadline in the task details, for example:
Name: 12mr10aa.19911.24347.3.10.48_2
Workunit: 999039288
Created: 1 Jun 2012, 5:16:22 UTC
Sent: 1 Jun 2012, 8:05:14 UTC
Received: 3 Jun 2012, 17:29:43 UTC
Server state: Over
Outcome: Success
Client state: Done
Exit status: 0 (0x0)
Computer ID: 6371091
Report deadline: 3 Jun 2012, 17:24:30 UTC
Run time: 744.77 s
CPU time: 107.31 s
Validate state: Task was reported too late to validate
Credit: 0.00
Application version: SETI@home Enhanced, Anonymous platform (NVIDIA GPU)


OK, it's late here, and I'm tired, and these will all probably be deleted by tomorrow morning my time, but there are a lot of timing issues about these I don't understand. It's not the credits; it's my not understanding how these came about in the first place. Hopefully, we won't, as Rob suggested, be seeing a lot of these.

EDIT: For example, it's curious that the project "deadline" is the same in all 8 workunits, and is almost exactly 5 minutes before I returned them. It's as if, when I returned them, S@H said "oops, we don't need these, let's change the deadline and make them invalid".

Yes, it's another side effect of the server mod to only accept 64 at a time. At 17:24:30 UTC your host reported more than 64, so the excess became subject to the resend lost tasks logic. When that found some WUs already had a canonical result it expired them immediately (set the deadline to 'now') rather than resending them. Then on the next attempt to report them at 17:29:43 UTC they were seen as too late.
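
In code terms, that behaviour would look roughly like the following (a minimal, self-contained C++ sketch; all names and structures here are illustrative, not actual BOINC scheduler source):

#include <algorithm>
#include <cstdio>
#include <ctime>
#include <vector>

// Illustrative sketch only, not real BOINC code.
struct Task {
    int id;
    bool wu_has_canonical_result;  // wingmen already validated this WU
    time_t report_deadline;
};

const size_t MAX_RESULTS_ACCEPTED = 64;  // the server-side truncation limit

// One scheduler RPC: accept at most 64 reported tasks; the excess is
// ignored, so it falls into the resend-lost-tasks path.
void handle_report(std::vector<Task>& reported, time_t now) {
    size_t n = std::min(reported.size(), MAX_RESULTS_ACCEPTED);
    // reported[0..n) go through the normal validation path (not shown).
    for (size_t i = n; i < reported.size(); i++) {
        if (reported[i].wu_has_canonical_result) {
            // WU already has a canonical result: expire the task
            // immediately (deadline = now) instead of resending it.
            reported[i].report_deadline = now;
        }
        // else: the task would be resent to the client.
    }
}

int main() {
    time_t now = time(nullptr);
    // 72 reported tasks: 64 accepted, 8 in excess (matching the 8
    // workunits in Gatekeeper's list), all with validated wingmen.
    std::vector<Task> reported(72, Task{0, true, now + 86400});
    handle_report(reported, now);
    // The 8 excess tasks now carry deadline == now; the next report
    // attempt a few minutes later sees them as "too late to validate".
    std::printf("task 65 expired: %s\n",
                reported[64].report_deadline == now ? "yes" : "no");
}
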
                                                                 Joe
ID: 1241086
Link
Joined: 18 Sep 03
Posts: 834
Credit: 1,807,369
RAC: 0
Germany
Message 1241155 - Posted: 4 Jun 2012, 16:52:02 UTC - in response to Message 1241086.  

You mean basically every user is now forced to set the limit to max 64 tasks per report in his cc_config.xml, otherwise there's a risk of losing credits? Not that I'm going to run into such issues anytime soon, just curious...

However, that still does not explain why some of the _0 and _1 results for these WUs had such short deadlines.
ID: 1241155
kittyman
Volunteer tester
Joined: 9 Jul 00
Posts: 51468
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1241167 - Posted: 4 Jun 2012, 17:03:21 UTC - in response to Message 1241086.  


Yes, it's another side effect of the server mod to only accept 64 at a time. At 17:24:30 UTC your host reported more than 64, so the excess became subject to the resend lost tasks logic. When that found some WUs already had a canonical result it expired them immediately (set the deadline to 'now') rather than resending them. Then on the next attempt to report them at 17:29:43 UTC they were seen as too late.
                                                                 Joe

Do we have any word if this wonderful kluge may be rescinded or fixed during tomorrow's outage?
"Freedom is just Chaos, with better lighting." Alan Dean Foster

ID: 1241167
Josef W. Segur
Volunteer developer
Volunteer tester
Joined: 30 Oct 99
Posts: 4504
Credit: 1,414,761
RAC: 0
United States
Message 1241246 - Posted: 4 Jun 2012, 18:58:51 UTC - in response to Message 1241155.  

You mean basically every user is now forced to set the limit to max 64 tasks per report in his cc_config.xml, otherwise there's a risk of losing credits? Not that I'm going to run into such issues anytime soon, just curious...

Yes, probably any host with RAC of 5000 or above ought to be using that safety measure.

However, that still does not explain why some of the _0 and _1 results for these WUs had such short deadlines.

I assume it was the same 64 limit causing the tasks to be resent, but I didn't look at those details while the WUs were still unpurged.
                                                                   Joe
ID: 1241246
BilBg
Volunteer tester
Joined: 27 May 07
Posts: 3720
Credit: 9,385,827
RAC: 0
Bulgaria
Message 1241260 - Posted: 4 Jun 2012, 19:29:42 UTC - in response to Message 1241257.  


I think if you set NNT (until everything is reported) the server bug will not send false resends to you?


 


- ALF - "Find out what you don't do well ..... then don't do it!" :)
 
ID: 1241260
HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 1241266 - Posted: 4 Jun 2012, 19:39:22 UTC - in response to Message 1241260.  


I think if you set NNT (until everything is reported) the server bug will not send false resends to you?


Resends are still sent even when NNT is set.

In related news I have had <max_tasks_reported>100</max_tasks_reported> set on my faster machines for some time. That seems to keep them happy when a large number of tasks build up.
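
For reference, that option lives in the <options> section of cc_config.xml in the BOINC data directory; a minimal example (assuming a client recent enough to support the option) would be:

<cc_config>
  <options>
    <!-- report at most 64 tasks per scheduler RPC, matching the
         server-side limit, so nothing gets truncated -->
    <max_tasks_reported>64</max_tasks_reported>
  </options>
</cc_config>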
SETI@home classic workunits: 93,865 CPU time: 863,447 hours
Join the [url=http://tinyurl.com/8y46zvu]BP6/VP6 User Group[/url]
ID: 1241266
BilBg
Volunteer tester
Joined: 27 May 07
Posts: 3720
Credit: 9,385,827
RAC: 0
Bulgaria
Message 1241271 - Posted: 4 Jun 2012, 19:48:28 UTC - in response to Message 1241266.  


I think if you set NNT (until everything is reported) the server bug will not send false resends to you?

Resends are still sent even when NNT is set.

Don't you have to actually ask for work like in:
02-Jun-2012 21:58:25 [SETI@home] Reporting 1 completed tasks, requesting new tasks for CPU and GPU
... for the resends logic to kick in?


 


- ALF - "Find out what you don't do well ..... then don't do it!" :)
 
ID: 1241271
Claggy
Volunteer tester
Joined: 5 Jul 99
Posts: 4654
Credit: 47,537,079
RAC: 4
United Kingdom
Message 1241273 - Posted: 4 Jun 2012, 19:49:28 UTC - in response to Message 1241266.  
Last modified: 4 Jun 2012, 19:50:15 UTC


I think if you set NNT (until everything is reported) the server bug will not send false resends to you?


Resends are still sent even when NNT is set.

No, they aren't, at least not at this project. At Einstein and other projects with older schedulers, resends are still sent with NNT set.

Changeset [trac]changeset:21823[/trac]

•scheduler: don't resend work if client isn't requesting work
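
In effect the scheduler gained a guard along these lines (an illustrative sketch with made-up names, not the actual changeset):

// only consider resending lost results when the client asked for work
if (client_is_requesting_work) {
    resend_lost_work();
}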


Claggy
ID: 1241273
HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 1241281 - Posted: 4 Jun 2012, 19:58:50 UTC - in response to Message 1241273.  


I think if you set NNT (until everything is reported) the server bug will not send false resends to you?


Resends are still sent even when NNT is set.

No, they aren't, at least not at this project. At Einstein and other projects with older schedulers, resends are still sent with NNT set.

Changeset [trac]changeset:21823[/trac]

•scheduler: don't resend work if client isn't requesting work


Claggy

Ah, OK. I must have seen that with an older client version I was running then. By the date of that changeset, it looks like 6.10.58 and newer should have that change. I was probably using 6.10.45 or something when I had that occur.
SETI@home classic workunits: 93,865 CPU time: 863,447 hours
Join the [url=http://tinyurl.com/8y46zvu]BP6/VP6 User Group[/url]
ID: 1241281
Claggy
Volunteer tester
Joined: 5 Jul 99
Posts: 4654
Credit: 47,537,079
RAC: 4
United Kingdom
Message 1241291 - Posted: 4 Jun 2012, 20:09:00 UTC - in response to Message 1241281.  
Last modified: 4 Jun 2012, 20:27:37 UTC


I think if you set NNT (until everything is reported) the server bug will not send false resends to you?


Resends are still sent even when NNT is set.

No, they aren't, at least not at this project. At Einstein and other projects with older schedulers, resends are still sent with NNT set.

Changeset [trac]changeset:21823[/trac]

•scheduler: don't resend work if client isn't requesting work


Claggy

Ah, OK. I must have seen that with an older client version I was running then. By the date of that changeset, it looks like 6.10.58 and newer should have that change. I was probably using 6.10.45 or something when I had that occur.

That's the scheduler on the server, i.e. on synergy, not the scheduler in the client. Older versions of BOINC (pre 6.10.x) used to still ask for work even if the preferences were set to not use a resource (it was a server-side preference then). BOINC 6.10.x and later use different preferences (I think they were combined on the website later) that stop the client from even asking for work.

Claggy
ID: 1241291
HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 1241376 - Posted: 4 Jun 2012, 22:33:18 UTC - in response to Message 1241291.  


I think if you set NNT (until everything is reported) the server bug will not send false resends to you?


Resends are still sent even when NNT is set.

No, they aren't, at least not at this project. At Einstein and other projects with older schedulers, resends are still sent with NNT set.

Changeset [trac]changeset:21823[/trac]

•scheduler: don't resend work if client isn't requesting work


Claggy

Ah, OK. I must have seen that with an older client version I was running then. By the date of that changeset, it looks like 6.10.58 and newer should have that change. I was probably using 6.10.45 or something when I had that occur.

That's the scheduler on the server, i.e. on synergy, not the scheduler in the client. Older versions of BOINC (pre 6.10.x) used to still ask for work even if the preferences were set to not use a resource (it was a server-side preference then). BOINC 6.10.x and later use different preferences (I think they were combined on the website later) that stop the client from even asking for work.

Claggy

It seems like it was only a few months ago that I set NNT on a machine and then proceeded to get numerous resends. Perhaps it was much longer ago than it seems.
SETI@home classic workunits: 93,865 CPU time: 863,447 hours
Join the [url=http://tinyurl.com/8y46zvu]BP6/VP6 User Group[/url]
ID: 1241376
Lionel
Joined: 25 Mar 00
Posts: 680
Credit: 563,640,304
RAC: 597
Australia
Message 1241624 - Posted: 5 Jun 2012, 9:43:38 UTC - in response to Message 1241257.  

[quote]You mean basically every user is now forced to set the limit to max 64 tasks per report in his cc_config.xml, otherwise there's a risk of losing credits? Not that I'm going to run into such issues anytime soon, just curious...

But since my versions of BOINC do not support setting the limits in cc_config, and the likelihood of me upgrading any of my clients is close to zero, my hope is that they fix the issue at the source.[/quote]

my thoughts exactly, Sten ...

ID: 1241624
Claggy
Volunteer tester
Joined: 5 Jul 99
Posts: 4654
Credit: 47,537,079
RAC: 4
United Kingdom
Message 1241627 - Posted: 5 Jun 2012, 9:53:39 UTC - in response to Message 1241624.  
Last modified: 5 Jun 2012, 9:54:16 UTC

[quote]You mean basically every user is now forced to set the limit to max 64 tasks per report in his cc_config.xml, otherwise there's a risk of losing credits? Not that I'm going to run into such issues anytime soon, just curious...

But since my versions of BOINC do not support setting the limits in cc_config, and the likelihood of me upgrading any of my clients is close to zero, my hope is that they fix the issue at the source.[/quote]

my thoughts exactly, Sten ...

David has already done a changeset about 6 hours ago; it might even be on Seti now (but I doubt it):

Changeset [trac]changeset:25733[/trac]

•scheduler: if we truncate the # of results accepted (like we're doing in SETI@home), don't resend lost results since we don't know what they are
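
Put simply, the fix adds a guard roughly like this (again an illustrative sketch with made-up names, not the actual changeset):

// if the report list was truncated at the 64 limit, the excess tasks
// can't be told apart from genuinely lost ones, so skip the
// resend-lost-results pass for this request
if (!results_were_truncated) {
    resend_lost_work();
}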


Claggy
ID: 1241627


 