100+ WUs missed validation/back end processing!!

Profile JDWhale
Volunteer tester
Joined: 6 Apr 99
Posts: 921
Credit: 21,935,817
RAC: 3
United States
Message 819567 - Posted: 17 Oct 2008, 5:54:37 UTC
Last modified: 17 Oct 2008, 6:21:25 UTC

wuid=331600495
wuid=343506138
wuid=343506230

One of my hosts reported ~150 results after the outage earlier this week. Apparently there was a hiccup doing the report as the messages from the log show that some of the results were then refused...

14-Oct-2008 19:00:11 [SETI@home] Sending scheduler request: To report completed tasks.  Requesting 0 seconds of work, reporting 147 completed tasks
14-Oct-2008 19:01:03 [SETI@home] Scheduler request failed: HTTP internal server error
14-Oct-2008 19:02:53 [SETI@home] Computation for task 22au08ag.27142.10297.10.8.91_1 finished
14-Oct-2008 19:02:53 [SETI@home] Starting 22au08ag.27142.10297.10.8.79_1
14-Oct-2008 19:02:53 [SETI@home] Starting task 22au08ag.27142.10297.10.8.79_1 using setiathome_enhanced version 528
14-Oct-2008 19:02:55 [SETI@home] Started upload of 22au08ag.27142.10297.10.8.91_1_0
14-Oct-2008 19:03:00 [SETI@home] Finished upload of 22au08ag.27142.10297.10.8.91_1_0
14-Oct-2008 19:03:23 [SETI@home] Sending scheduler request: To report completed tasks.  Requesting 0 seconds of work, reporting 148 completed tasks
14-Oct-2008 19:03:59 [SETI@home] Scheduler request succeeded: got 0 new tasks
14-Oct-2008 19:03:59 [SETI@home] Message from server: Completed result 22au08ag.28785.4162.8.8.244_1 refused: result already reported as success
14-Oct-2008 19:03:59 [SETI@home] Message from server: Completed result 23au08ab.29087.4162.4.8.224_1 refused: result already reported as success
14-Oct-2008 19:03:59 [SETI@home] Message from server: Completed result 22au08ag.28785.4162.8.8.247_0 refused: result already reported as success
...
14-Oct-2008 19:03:59 [SETI@home] Message from server: Completed result 24au08aa.15229.6616.11.8.56_2 refused: result already reported as success
...
14-Oct-2008 19:03:59 [SETI@home] Message from server: Completed result 23au08ad.28372.3344.5.8.224_1 refused: result already reported as success



The problem is that the validator apparently was not invoked for ~108 of these returned results, and they are now stagnant, waiting for some kind of event to trigger them through the back end.

WUs that my wingman reported after I reported have validated without issue... it is only the WUs for which I reported completed results after my wingman had already reported that are "stuck".
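For anyone who wants to pull the refused result names out of a saved copy of the client messages, here is a minimal Python sketch. It assumes the log lines look like the excerpt above; the wording may differ in other client versions. The `workunit_name` helper assumes a result name is the workunit name plus a trailing `_N` copy suffix. A refused result is only a candidate for being stuck, so each one still needs checking against the task list.

```python
import re

# Pattern taken from the log excerpt in this post; treat the exact wording
# as an assumption for other BOINC client versions.
REFUSED = re.compile(
    r"Completed result (?P<result>\S+) refused: result already reported as success"
)

def refused_results(log_text):
    """Return the result names the server refused as already reported."""
    return [m.group("result") for m in REFUSED.finditer(log_text)]

def workunit_name(result_name):
    """Strip the trailing _N copy suffix (assumed naming convention)."""
    return result_name.rsplit("_", 1)[0]

sample = """\
14-Oct-2008 19:03:59 [SETI@home] Scheduler request succeeded: got 0 new tasks
14-Oct-2008 19:03:59 [SETI@home] Message from server: Completed result 22au08ag.28785.4162.8.8.244_1 refused: result already reported as success
14-Oct-2008 19:03:59 [SETI@home] Message from server: Completed result 23au08ab.29087.4162.4.8.224_1 refused: result already reported as success
"""

for name in refused_results(sample):
    print(name, "->", workunit_name(name))
```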

This host has been up and running BOINC without interruption for at least the past 10 days.

Any ideas?
ID: 819567 · Report as offensive
kittyman Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 9 Jul 00
Posts: 51468
Credit: 1,018,363,574
RAC: 1,004
United States
Message 819570 - Posted: 17 Oct 2008, 6:00:52 UTC - in response to Message 819567.  


Any ideas?

More whiskey?
"Freedom is just Chaos, with better lighting." Alan Dean Foster

ID: 819570 · Report as offensive
Profile Daniel
Volunteer tester
Joined: 21 May 07
Posts: 562
Credit: 437,494
RAC: 0
United States
Message 819571 - Posted: 17 Oct 2008, 6:01:23 UTC - in response to Message 819570.  


Any ideas?

More whiskey?


Seconded.
Daniel

ID: 819571 · Report as offensive
kittyman Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 9 Jul 00
Posts: 51468
Credit: 1,018,363,574
RAC: 1,004
United States
Message 819573 - Posted: 17 Oct 2008, 6:07:13 UTC - in response to Message 819571.  
Last modified: 17 Oct 2008, 6:10:34 UTC


Any ideas?

More whiskey?


Seconded.

To quote an old comedy album........from Woody Woodbury...
'Booze is the only Answer'.....
"Freedom is just Chaos, with better lighting." Alan Dean Foster

ID: 819573 · Report as offensive
Profile Uli
Volunteer tester
Joined: 6 Feb 00
Posts: 10923
Credit: 5,996,015
RAC: 1
Germany
Message 819581 - Posted: 17 Oct 2008, 6:42:41 UTC

Maybe not, so on topic please!!!
Pluto will always be a planet to me.

Seti Ambassador
Not too late to order an Anni Shirt
ID: 819581 · Report as offensive
Profile JDWhale
Volunteer tester
Joined: 6 Apr 99
Posts: 921
Credit: 21,935,817
RAC: 3
United States
Message 821384 - Posted: 21 Oct 2008, 4:38:33 UTC

Bump...
Maybe I'm the only user noticing this problem... but, IMO, it should be addressed...
Here are some of my WUs that seem to have been Lost in Transition to the backend.
There are ~76 more in my queue if I need to list them for resolution...

wuid=343341103
wuid=343341095
wuid=343341101
wuid=343341113
wuid=343341116
wuid=343341144
wuid=343341130
wuid=331395713 I am 3rd (missing?) wingman
wuid=343341156
wuid=343341189
wuid=343341250
wuid=343341245
wuid=343341270
wuid=343341276
wuid=343341276
wuid=343341291
wuid=343341303
wuid=343341315
wuid=343341298
wuid=343341301
wuid=343341307
wuid=343341313
wuid=343341319
wuid=343341317
wuid=343341323
wuid=343341314
wuid=343341320
wuid=343341326
wuid=343454800
wuid=343454806
wuid=343454850
wuid=343454856
wuid=343454836
wuid=343473042

Regards,
JDWhale

ID: 821384 · Report as offensive
john deneer
Volunteer tester
Joined: 16 Nov 06
Posts: 331
Credit: 20,996,606
RAC: 0
Netherlands
Message 821461 - Posted: 21 Oct 2008, 11:12:16 UTC - in response to Message 821384.  
Last modified: 21 Oct 2008, 11:13:03 UTC

Bump...
Maybe I'm the only user noticing this problem... but, IMO, it should be addressed...
Here are some of my WUs that seem to have been Lost in Transition to the backend.
There are ~76 more in my queue if I need to list them for resolution...

Regards,
JDWhale

Is there a simple way of recognizing them? I would rather not check all my pendings by hand, that might take quite some time ..... Have you seen any pattern that I could use to hunt for them? Or would it be sufficient to check for that 'refused result' message in the logs of boincmanager?

Regards,
John.
ID: 821461 · Report as offensive
Profile JDWhale
Volunteer tester
Joined: 6 Apr 99
Posts: 921
Credit: 21,935,817
RAC: 3
United States
Message 821467 - Posted: 21 Oct 2008, 11:34:47 UTC - in response to Message 821461.  

Bump...
Maybe I'm the only user noticing this problem... but, IMO, it should be addressed...
Here are some of my WUs that seem to have been Lost in Transition to the backend.
There are ~76 more in my queue if I need to list them for resolution...

Regards,
JDWhale

Is there a simple way of recognizing them? I would rather not check all my pendings by hand, that might take quite some time ..... Have you seen any pattern that I could use to hunt for them? Or would it be sufficient to check for that 'refused result' message in the logs of boincmanager?

Regards,
John.

My host's 'stats' chart displays a 10% drop in RAC because of the 100+ WUs (about 1 day's worth) lost in validation after last week's outage.

All my lost WUs were reported at 15 Oct 2008 0:01:18 UTC and seem to be limited to one host.
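
If you can copy your task list into (wuid, reported time) pairs, a rough way to hunt for them is to filter on that report time. This is only a sketch: the rows below are made-up placeholders, not real workunits, and the timestamp format is assumed to match the way times are written in this thread.

```python
from datetime import datetime

# Timestamp layout as written in this thread, e.g. "15 Oct 2008 0:01:18 UTC".
FMT = "%d %b %Y %H:%M:%S UTC"

def reported_at(tasks, when):
    """Return the wuids whose report time equals the given timestamp."""
    target = datetime.strptime(when, FMT)
    return [wuid for wuid, ts in tasks if datetime.strptime(ts, FMT) == target]

# Hypothetical rows for illustration only; replace with your own task list.
tasks = [
    (1001, "15 Oct 2008 0:01:18 UTC"),
    (1002, "15 Oct 2008 9:33:00 UTC"),
    (1003, "15 Oct 2008 0:01:18 UTC"),
]

print(reported_at(tasks, "15 Oct 2008 0:01:18 UTC"))
```

Anything this turns up that is still pending while the wingman's result was reported earlier would be a likely candidate.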
ID: 821467 · Report as offensive
john deneer
Volunteer tester
Joined: 16 Nov 06
Posts: 331
Credit: 20,996,606
RAC: 0
Netherlands
Message 821469 - Posted: 21 Oct 2008, 11:51:27 UTC - in response to Message 821467.  

All my lost WUs were reported at 15 Oct 2008 0:01:18 UTC and seem to be limited to one host.

I checked the task lists of both my hosts, but I have no such WUs.

I don't keep a very detailed history of all my WUs, but for 15 October the earliest reports I can still trace back were made at 9:33 am and 8:12 am UTC respectively. I have no idea whether I actually reported any WUs shortly after that outage, but if I did, they went through validation etc. normally.

Regards,
John.
ID: 821469 · Report as offensive
gomeyer
Volunteer tester

Joined: 21 May 99
Posts: 488
Credit: 50,370,425
RAC: 0
United States
Message 821475 - Posted: 21 Oct 2008, 12:36:10 UTC - in response to Message 821467.  

Bump...
Maybe I'm the only user noticing this problem... but, IMO, it should be addressed...
Here are some of my WUs that seem to have been Lost in Transition to the backend.
There are ~76 more in my queue if I need to list them for resolution...

Regards,
JDWhale

Is there a simple way of recognizing them? I would rather not check all my pendings by hand, that might take quite some time ..... Have you seen any pattern that I could use to hunt for them? Or would it be sufficient to check for that 'refused result' message in the logs of boincmanager?

Regards,
John.

My host's 'stats' chart displays a 10% drop in RAC because of the 100+ WUs (about 1 day's worth) lost in validation after last week's outage.

All my lost WUs were reported at 15 Oct 2008 0:01:18 UTC and seem to be limited to one host.

A random sample of the examples given shows that the wingpeople reported at different dates/times, all before the 15th. So if you're a wingperson to one of these, it would be difficult to spot.
ID: 821475 · Report as offensive
Profile JDWhale
Volunteer tester
Joined: 6 Apr 99
Posts: 921
Credit: 21,935,817
RAC: 3
United States
Message 821518 - Posted: 21 Oct 2008, 15:12:44 UTC - in response to Message 821475.  

A random sample of the examples given shows that the wingpeople reported at different dates/times, all before the 15th. So if you're a wingperson to one of these, it would be difficult to spot.


Correct... if the wingman reported after I did, then the WU validated without incident and was removed after 24 hours. It seems that the servers hiccuped when I reported these results, causing backend events to be lost.
ID: 821518 · Report as offensive
Profile JDWhale
Volunteer tester
Joined: 6 Apr 99
Posts: 921
Credit: 21,935,817
RAC: 3
United States
Message 822241 - Posted: 23 Oct 2008, 14:38:51 UTC
Last modified: 23 Oct 2008, 14:40:10 UTC

Bump... And a few more of these Lost in Transition WUs...

As near as I can tell, the deadline on most of these WUs is coming up in ~5 days...
Might that trigger validation/backend events? Only time will tell...

Since no one has offered any insight, I'm assuming that this is an undocumented bug in some server side process :-(

wuid=343473039
wuid=343473045
wuid=343473123
wuid=343473129
wuid=343473135
wuid=343473079
wuid=343473091
wuid=343473139
wuid=343473145
wuid=343473151
wuid=343473106
wuid=343473160
wuid=343473166
wuid=343473172
wuid=343473178
wuid=343473149
wuid=343473155
wuid=343473168
wuid=343473164
wuid=343473163
wuid=343473235
wuid=343473190
wuid=343473215
wuid=343473228
wuid=331551422
wuid=343473249
wuid=331551493
wuid=343473247
wuid=343473274
wuid=343473251

Regards,
JDWhale
ID: 822241 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14653
Credit: 200,643,578
RAC: 874
United Kingdom
Message 822246 - Posted: 23 Oct 2008, 14:42:59 UTC - in response to Message 822241.  

Bump... And a few more of these Lost in Transition WUs...

As near as I can tell, the deadline on most of these WUs is coming up in ~5 days...
Might that trigger validation/backend events? Only time will tell...

Since no one has offered any insight, I'm assuming that this is an undocumented bug in some server side process :-(

Regards,
JDWhale

Actually, now you mention it, I think you're right on both counts:

Yes, it is a bug - well, at least a glitch - in the back-end processes.

And yes, there's a fair chance that it will self-correct when the 'reached deadline' event is processed - I've just remembered (well, you've just reminded me) that we've seen this before.
ID: 822246 · Report as offensive
Profile JDWhale
Volunteer tester
Joined: 6 Apr 99
Posts: 921
Credit: 21,935,817
RAC: 3
United States
Message 822453 - Posted: 24 Oct 2008, 1:10:32 UTC - in response to Message 822246.  

Thanks for the info. I'll wait until after next week's outage and the deadlines have passed to post the rest of the affected WUs (~50 more).
ID: 822453 · Report as offensive
Profile JDWhale
Volunteer tester
Joined: 6 Apr 99
Posts: 921
Credit: 21,935,817
RAC: 3
United States
Message 824480 - Posted: 29 Oct 2008, 3:05:07 UTC

All but a couple of the 100+ WUs whose reports were Lost in Transition validated today when the original deadlines lapsed. The few unvalidated WUs either have MIA wingmen and were reissued, or are still awaiting their deadline.

This issue is resolved IMO, and I'm satisfied that BOINC handles the case of lost reports admirably.

Regards,
JDWhale

BTW... a nice bump in RAC when 100 WUs validate instantaneously :)
ID: 824480 · Report as offensive
