100+ WUs missed validation/back end processing!!
Author | Message |
---|---|
JDWhale Send message Joined: 6 Apr 99 Posts: 921 Credit: 21,935,817 RAC: 3 |
wuid=331600495 wuid=343506138 wuid=343506230

One of my hosts reported ~150 results after the outage earlier this week. Apparently there was a hiccup during the report, as the messages from the log show that some of the results were then refused...

14-Oct-2008 19:00:11 [SETI@home] Sending scheduler request: To report completed tasks. Requesting 0 seconds of work, reporting 147 completed tasks
14-Oct-2008 19:01:03 [SETI@home] Scheduler request failed: HTTP internal server error
14-Oct-2008 19:02:53 [SETI@home] Computation for task 22au08ag.27142.10297.10.8.91_1 finished
14-Oct-2008 19:02:53 [SETI@home] Starting 22au08ag.27142.10297.10.8.79_1
14-Oct-2008 19:02:53 [SETI@home] Starting task 22au08ag.27142.10297.10.8.79_1 using setiathome_enhanced version 528
14-Oct-2008 19:02:55 [SETI@home] Started upload of 22au08ag.27142.10297.10.8.91_1_0
14-Oct-2008 19:03:00 [SETI@home] Finished upload of 22au08ag.27142.10297.10.8.91_1_0
14-Oct-2008 19:03:23 [SETI@home] Sending scheduler request: To report completed tasks. Requesting 0 seconds of work, reporting 148 completed tasks
14-Oct-2008 19:03:59 [SETI@home] Scheduler request succeeded: got 0 new tasks
14-Oct-2008 19:03:59 [SETI@home] Message from server: Completed result 22au08ag.28785.4162.8.8.244_1 refused: result already reported as success
14-Oct-2008 19:03:59 [SETI@home] Message from server: Completed result 23au08ab.29087.4162.4.8.224_1 refused: result already reported as success
14-Oct-2008 19:03:59 [SETI@home] Message from server: Completed result 22au08ag.28785.4162.8.8.247_0 refused: result already reported as success
...
14-Oct-2008 19:03:59 [SETI@home] Message from server: Completed result 24au08aa.15229.6616.11.8.56_2 refused: result already reported as success
...
14-Oct-2008 19:03:59 [SETI@home] Message from server: Completed result 23au08ad.28372.3344.5.8.224_1 refused: result already reported as success

The problem is that the validator seems not to have been invoked for ~108 of these returned results, which are now stagnant, waiting for some kind of event to trigger them through the back end. WUs that my wingman reported after I reported have validated without issue... it is only the WUs for which I reported completed results after my wingman had reported that are "stuck". This host has been up and running BOINC without interruption for at least the past 10 days. Any ideas? |
kittyman Send message Joined: 9 Jul 00 Posts: 51468 Credit: 1,018,363,574 RAC: 1,004 |
More whiskey? "Freedom is just Chaos, with better lighting." Alan Dean Foster |
Daniel Send message Joined: 21 May 07 Posts: 562 Credit: 437,494 RAC: 0 |
|
kittyman Send message Joined: 9 Jul 00 Posts: 51468 Credit: 1,018,363,574 RAC: 1,004 |
To quote an old comedy album........from Woody Woodbury... 'Booze is the only Answer'..... "Freedom is just Chaos, with better lighting." Alan Dean Foster |
Uli Send message Joined: 6 Feb 00 Posts: 10923 Credit: 5,996,015 RAC: 1 |
Maybe not, so on topic please!!! Pluto will always be a planet to me. Seti Ambassador Not too late to order an Anni Shirt |
JDWhale Send message Joined: 6 Apr 99 Posts: 921 Credit: 21,935,817 RAC: 3 |
Bump... Maybe I'm the only user noticing this problem... but, IMO, it should be addressed... Here are some of my WUs that seem to have been Lost in Transition to the backend. There are ~76 more in my queue if I need to list them for resolution...
wuid=343341103 wuid=343341095 wuid=343341101 wuid=343341113 wuid=343341116 wuid=343341144 wuid=343341130
wuid=331395713 I am 3rd (missing?) wingman
wuid=343341156 wuid=343341189 wuid=343341250 wuid=343341245 wuid=343341270 wuid=343341276 wuid=343341276 wuid=343341291 wuid=343341303 wuid=343341315 wuid=343341298 wuid=343341301 wuid=343341307 wuid=343341313 wuid=343341319 wuid=343341317 wuid=343341323 wuid=343341314 wuid=343341320 wuid=343341326 wuid=343454800 wuid=343454806 wuid=343454850 wuid=343454856 wuid=343454836 wuid=343473042
Regards, JDWhale |
john deneer Send message Joined: 16 Nov 06 Posts: 331 Credit: 20,996,606 RAC: 0 |
Bump... Is there a simple way of recognizing them? I would rather not check all my pendings by hand; that might take quite some time... Have you seen any pattern that I could use to hunt for them? Or would it be sufficient to check for that 'refused result' message in the BOINC Manager logs? Regards, John. |
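The log check asked about above could be sketched roughly as follows. This is a minimal sketch, not a confirmed diagnostic: it assumes the client messages have been saved as plain text and that the server message has exactly the wording quoted in the first post. The function name `refused_results` and the sample text are hypothetical, and nothing in the thread establishes that a refused message reliably marks a stuck WU.

```python
import re

# Matches the server message wording quoted in the first post of this thread.
# The capture group is the result name, e.g. 22au08ag.28785.4162.8.8.244_1.
REFUSED = re.compile(
    r"Message from server: Completed result (\S+) refused: "
    r"result already reported as success"
)


def refused_results(log_text):
    """Return refused result names in first-seen order, without duplicates."""
    names = []
    for name in REFUSED.findall(log_text):
        if name not in names:
            names.append(name)
    return names


# Hypothetical sample built from lines quoted in the first post.
sample = (
    "14-Oct-2008 19:03:59 [SETI@home] Scheduler request succeeded: got 0 new tasks\n"
    "14-Oct-2008 19:03:59 [SETI@home] Message from server: Completed result "
    "22au08ag.28785.4162.8.8.244_1 refused: result already reported as success\n"
    "14-Oct-2008 19:03:59 [SETI@home] Message from server: Completed result "
    "23au08ab.29087.4162.4.8.224_1 refused: result already reported as success\n"
)

for name in refused_results(sample):
    print(name)
```

The names it prints are result names rather than wuids, so each one would still have to be looked up in the host's task list to see whether it is actually pending.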
JDWhale Send message Joined: 6 Apr 99 Posts: 921 Credit: 21,935,817 RAC: 3 |
Bump... My host's 'stats' chart displays a 10% drop in RAC because of the 100+ WUs (about 1 day's worth) lost in validation after last week's outage. All my lost WUs were reported at 15 Oct 2008 0:01:18 UTC and seem to be limited to one host. |
john deneer Send message Joined: 16 Nov 06 Posts: 331 Credit: 20,996,606 RAC: 0 |
"All my lost WUs were reported at 15 Oct 2008 0:01:18 UTC and seem to be limited to one host." I checked the task lists of both my hosts, but I have no such WUs. I don't keep a very detailed history of all my WUs, but on 15 October the earliest reports I can still trace back were made at 9:33 am and 8:12 am UTC respectively. I have no idea whether I actually reported any WUs shortly after that outage, but if I did they were validated etc. normally. Regards, John. |
gomeyer Send message Joined: 21 May 99 Posts: 488 Credit: 50,370,425 RAC: 0 |
Bump... A random sample of the examples given shows the wingpeople reported at different dates/times, all before the 15th. So if you're a wingperson to one of these it would be difficult to spot. |
JDWhale Send message Joined: 6 Apr 99 Posts: 921 Credit: 21,935,817 RAC: 3 |
"A random sample of the examples given show the wingpeople reported different dates/times and all before the 15th. So if you're a wingperson to one of these it would be difficult to spot." Correct... if the wingman reported after I did, then the WU validated without incident and was removed after 24 hours. It seems that the servers hiccuped when I reported these results, causing backend events to be lost. |
JDWhale Send message Joined: 6 Apr 99 Posts: 921 Credit: 21,935,817 RAC: 3 |
Bump... And a few more of these Lost in Transition WUs... As near as I can tell, the deadline on most of these WUs is coming up in ~5 days... Might that trigger validation/backend events? Only time will tell... Since no one has offered any insight, I'm assuming that this is an undocumented bug in some server-side process :-(
wuid=343473039 wuid=343473045 wuid=343473123 wuid=343473129 wuid=343473135 wuid=343473079 wuid=343473091 wuid=343473139 wuid=343473145 wuid=343473151 wuid=343473106 wuid=343473160 wuid=343473166 wuid=343473172 wuid=343473178 wuid=343473149 wuid=343473155 wuid=343473168 wuid=343473164 wuid=343473163 wuid=343473235 wuid=343473190 wuid=343473215 wuid=343473228 wuid=331551422 wuid=343473249 wuid=331551493 wuid=343473247 wuid=343473274 wuid=343473251
Regards, JDWhale |
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14653 Credit: 200,643,578 RAC: 874 |
"Bump... And a few more of these Lost in Transition WUs..." Actually, now you mention it, I think you're right on both counts: Yes, it is a bug - well, at least a glitch - in the back-end processes. And yes, there's a fair chance that it will self-correct when the 'reached deadline' event is processed - I've just remembered (well, you've just reminded me) that we've seen this before. |
JDWhale Send message Joined: 6 Apr 99 Posts: 921 Credit: 21,935,817 RAC: 3 |
Thanks for the info, I'll wait 'til after next week's outage & deadlines have passed to post the rest of the affected WUs. (~50 more) |
JDWhale Send message Joined: 6 Apr 99 Posts: 921 Credit: 21,935,817 RAC: 3 |
All but a couple of the 100+ WUs whose reports were Lost in Transition validated today when the original deadlines lapsed. The few unvalidated WUs either have MIA wingmen and were reissued, or are still awaiting deadline. This issue is resolved IMO, and I'm satisfied that BOINC handles the case of lost reports admirably. Regards, JDWhale BTW... a nice bump in RAC when 100 WUs validate instantaneously :) |