Message boards : Number crunching : Too late to validate?
Gatekeeper Send message Joined: 14 Jul 04 Posts: 887 Credit: 176,479,616 RAC: 0

Here is my list of "invalids" for my main cruncher: http://setiathome.berkeley.edu/results.php?hostid=6371091&offset=0&show_names=0&state=4&appid= Note they all show as "completed, too late to validate", but had a turnaround of less than two days. What gets me is that all of them were sent to me as the third system, even though the first two rigs had completed and returned their results. Anybody have a thought as to what's going on here?
rob smith Send message Joined: 7 Mar 03 Posts: 22217 Credit: 416,307,556 RAC: 380

This happens periodically. Most often after a big outage we see clusters of WUs coming out with impossibly short deadlines. In this case they were initially sent out just before the outage to two other users, who processed them during the outage. The server then decided to send them out again just after the outage, but before the other two users had reported, and with impossibly short deadlines. I suspect we are going to see lots of these in the next few days :-(

Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
Gatekeeper Send message Joined: 14 Jul 04 Posts: 887 Credit: 176,479,616 RAC: 0

> This happens periodically. Most often after a big outage we see clusters of WUs coming out with impossibly short deadlines.

Rob, I understand what you're saying, as it happens with VLARs being re-sent on a GPU work request. But these were sent to me as user #3 AFTER users 1 and 2 had reported them. And I don't think they were sent to me with short deadlines; I returned them completed in 2 days (so I don't know what the project deadline for them was). I'm guessing that somehow they got scheduled for resend, even though they had been properly reported, due to some timing issue between the validators and the scheduler, but I was looking for someone to confirm my theory.
Link Send message Joined: 18 Sep 03 Posts: 834 Credit: 1,807,369 RAC: 0

> Note they all show as "completed, too late to validate", but had a turnaround of less than two days.

Well, they were returned about 5 minutes after the deadline and after both your wingmen returned their results, so technically that is correct.

> What gets me is that all of them were sent to me as the third system, even though the first two rigs had completed and returned their results.

No, they were always sent to you before the 2nd result was returned. So that was right as well. The real question is: why did those WUs have such short deadlines? Server bug?
Link Send message Joined: 18 Sep 03 Posts: 834 Credit: 1,807,369 RAC: 0

> And I don't think they were sent to me with short deadlines; I returned them completed in 2 days (so I don't know what the project deadline for them was).

You can see the deadline in the task details, for example: Name 12mr10aa.19911.24347.3.10.48_2
Gatekeeper Send message Joined: 14 Jul 04 Posts: 887 Credit: 176,479,616 RAC: 0

> And I don't think they were sent to me with short deadlines; I returned them completed in 2 days (so I don't know what the project deadline for them was).

OK, it's late here, and I'm tired, and these will all probably be deleted by tomorrow morning my time, but there are a lot of timing issues about these I don't understand. It's not the credits; it's my not understanding how these came about in the first place. Hopefully we won't, as Rob suggested, be seeing a lot of these.

EDIT: For example, it's curious that the project "deadline" is the same in all 8 workunits, and is almost exactly 5 minutes before I returned the workunits. It's as if, when I returned them, S@H said "oops, we don't need these, let's change the deadline and make them invalid".
Lionel Send message Joined: 25 Mar 00 Posts: 680 Credit: 563,640,304 RAC: 597

> This happens periodically. Most often after a big outage we see clusters of WUs coming out with impossibly short deadlines.

Yep, had 250+ of these the other day ... good to see that others are getting the good news as well ...
Josef W. Segur Send message Joined: 30 Oct 99 Posts: 4504 Credit: 1,414,761 RAC: 0

> And I don't think they were sent to me with short deadlines; I returned them completed in 2 days (so I don't know what the project deadline for them was).

Yes, it's another side effect of the server mod to only accept 64 results at a time. At 17:24:30 UTC your host reported more than 64, so the excess became subject to the resend-lost-tasks logic. When that logic found some WUs already had a canonical result, it expired them immediately (set the deadline to 'now') rather than resending them. Then, on the next attempt to report them at 17:29:43 UTC, they were seen as too late.

Joe
Link Send message Joined: 18 Sep 03 Posts: 834 Credit: 1,807,369 RAC: 0

You mean basically every user is now forced to set a limit of max 64 tasks per report in his cc_config.xml, otherwise there's a risk of losing credits? Not that I'm going to run into such issues anytime soon, just curious... However, that still does not explain why some of the _0 and _1 results for these WUs had such short deadlines.
kittyman Send message Joined: 9 Jul 00 Posts: 51468 Credit: 1,018,363,574 RAC: 1,004

Do we have any word on whether this wonderful kluge may be rescinded or fixed during tomorrow's outage?

"Freedom is just Chaos, with better lighting." Alan Dean Foster
Josef W. Segur Send message Joined: 30 Oct 99 Posts: 4504 Credit: 1,414,761 RAC: 0

> You mean basically every user is now forced to set a limit of max 64 tasks per report in his cc_config.xml, otherwise there's a risk of losing credits? Not that I'm going to run into such issues anytime soon, just curious...

Yes, probably any host with a RAC of 5000 or above ought to be using that safety measure.

> However, that still does not explain why some of the _0 and _1 results for these WUs had such short deadlines.

I assume it was the same 64 limit causing the tasks to be resent, but I didn't look at those details while the WUs were unpurged.

Joe
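For anyone wanting to apply that safety measure, the cc_config.xml entry would presumably look like this; it uses the <max_tasks_reported> option HAL9000 mentions below, and as Lionel notes later in the thread, older clients don't support it.

```xml
<!-- cc_config.xml in the BOINC data directory. Caps how many completed
     tasks the client reports per scheduler request, so the server's
     64-result limit is never exceeded; restart the client or re-read
     the config file for it to take effect. -->
<cc_config>
    <options>
        <max_tasks_reported>64</max_tasks_reported>
    </options>
</cc_config>
```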
BilBg Send message Joined: 27 May 07 Posts: 3720 Credit: 9,385,827 RAC: 0

I think if you set NNT (until everything is reported), the server bug will not make false resends to you?

- ALF - "Find out what you don't do well ..... then don't do it!" :)
HAL9000 Send message Joined: 11 Sep 99 Posts: 6534 Credit: 196,805,888 RAC: 57

Resends are still sent even when NNT is set. In related news, I have had <max_tasks_reported>100</max_tasks_reported> set on my faster machines for some time. That seems to keep them happy when a large number of tasks build up.

SETI@home classic workunits: 93,865 CPU time: 863,447 hours
Join the [url=http://tinyurl.com/8y46zvu]BP6/VP6 User Group[/url]
BilBg Send message Joined: 27 May 07 Posts: 3720 Credit: 9,385,827 RAC: 0

Don't you have to actually ask for work, like in:

02-Jun-2012 21:58:25 [SETI@home] Reporting 1 completed tasks, requesting new tasks for CPU and GPU

... for the resends logic to kick in?

- ALF - "Find out what you don't do well ..... then don't do it!" :)
Claggy Send message Joined: 5 Jul 99 Posts: 4654 Credit: 47,537,079 RAC: 4

No they aren't, not at this project anyway; at Einstein and other projects with older schedulers, yes, resends are sent with NNT set. Changeset [trac]changeset:21823[/trac]:

• scheduler: don't resend work if client isn't requesting work

Claggy
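What that changeset's one-line description amounts to might be sketched like this; the names are invented, not BOINC's real code.

```cpp
// Illustrative only: per the changeset note above, the lost-task
// resend path is skipped when the client isn't asking for work.
struct SchedulerRequest {
    double work_req_seconds;  // 0 when a modern client has NNT set
};

bool may_resend_lost_tasks(const SchedulerRequest& req) {
    // No work requested (e.g. NNT set): don't resend lost tasks.
    return req.work_req_seconds > 0;
}
```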
HAL9000 Send message Joined: 11 Sep 99 Posts: 6534 Credit: 196,805,888 RAC: 57

Ah, OK. I must have seen that with an older client version I was running then. By the date of that changeset it looks like 6.10.58 and newer should have that change. I was probably using 6.10.45 or something when I had that occur.

SETI@home classic workunits: 93,865 CPU time: 863,447 hours
Join the [url=http://tinyurl.com/8y46zvu]BP6/VP6 User Group[/url]
Claggy Send message Joined: 5 Jul 99 Posts: 4654 Credit: 47,537,079 RAC: 4

That's the scheduler on the server, i.e. on synergy, not the scheduler in the client. Older versions of BOINC (pre 6.10.x) used to still ask for work even if the preferences were set to not use a resource (it was a server-side preference then). BOINC 6.10.x and later use different preferences (I think they were combined on the website later) that stop those clients from even asking for work.

Claggy
HAL9000 Send message Joined: 11 Sep 99 Posts: 6534 Credit: 196,805,888 RAC: 57

It seems like it was only a few months ago that I set NNT on a machine and then proceeded to get numerous resends. Perhaps it was just much longer ago than it seems.

SETI@home classic workunits: 93,865 CPU time: 863,447 hours
Join the [url=http://tinyurl.com/8y46zvu]BP6/VP6 User Group[/url]
Lionel Send message Joined: 25 Mar 00 Posts: 680 Credit: 563,640,304 RAC: 597

> You mean basically every user is now forced to set a limit of max 64 tasks per report in his cc_config.xml, otherwise there's a risk of losing credits? Not that I'm going to run into such issues anytime soon, just curious...
>
> But since my versions of BOINC do not support setting the limits in cc_config, and the likelihood of me upgrading any of my clients is close to zero, my hope is that they fix the issue at the source.

My thoughts exactly, Sten ...
Claggy Send message Joined: 5 Jul 99 Posts: 4654 Credit: 47,537,079 RAC: 4

> You mean basically every user is now forced to set a limit of max 64 tasks per report in his cc_config.xml, otherwise there's a risk of losing credits? Not that I'm going to run into such issues anytime soon, just curious...

David has already done a changeset, about 6 hours ago; it might even be on Seti now (but I doubt it): Changeset [trac]changeset:25733[/trac]:

• scheduler: if we truncate the # of results accepted

Claggy
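The changeset description above is cut off, so the following is only a guess at the shape of the fix: results dropped by the 64-per-report cap get remembered, so the lost-task pass no longer treats them as lost. All names are invented.

```cpp
// Speculative sketch only (the changeset text above is truncated): a
// result the server itself refused to accept this time around is not
// "lost" - the client will simply report it on its next RPC.
#include <cstddef>
#include <set>
#include <vector>

const std::size_t MAX_RESULTS_ACCEPTED = 64;

void handle_report(const std::vector<int>& reported_ids,
                   std::set<int>& dropped_ids) {
    for (std::size_t i = 0; i < reported_ids.size(); ++i) {
        if (i < MAX_RESULTS_ACCEPTED) {
            // ...process the report normally (omitted)
        } else {
            dropped_ids.insert(reported_ids[i]);  // truncated, not lost
        }
    }
}

bool looks_lost(int result_id, const std::set<int>& dropped_ids) {
    // Skip the expire/resend logic for results we dropped ourselves.
    return dropped_ids.count(result_id) == 0;
}
```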