AP 505 in Server Status

Message boards : Number crunching : AP 505 in Server Status

JimHilty2
Joined: 30 Apr 03
Posts: 75
Credit: 7,199,464
RAC: 0
Germany
Message 1213871 - Posted: 4 Apr 2012, 8:32:51 UTC

This may be a stupid question, but if Server Status is only showing AP 505 and only 2,538 are still out in the field, what are the 18,325 waiting to validate going to validate against? lol
ID: 1213871
Wiggo
Joined: 24 Jan 00
Posts: 34744
Credit: 261,360,520
RAC: 489
Australia
Message 1213875 - Posted: 4 Apr 2012, 9:03:35 UTC - in response to Message 1213871.  

Most of those are likely the stuck ones that a lot of people have been complaining about (I have quite a few myself, dating back as far as last year), but hopefully once all have been returned, the database can be cleaned of them.

Cheers.
ID: 1213875
red-ray
Joined: 24 Jun 99
Posts: 308
Credit: 9,029,848
RAC: 0
United Kingdom
Message 1213885 - Posted: 4 Apr 2012, 10:44:58 UTC - in response to Message 1213871.  

I wonder how many of the 3,317,781 MB Results awaiting validation are also stuck.
ID: 1213885
Wiggo
Joined: 24 Jan 00
Posts: 34744
Credit: 261,360,520
RAC: 489
Australia
Message 1213887 - Posted: 4 Apr 2012, 11:06:18 UTC - in response to Message 1213885.  

I guess that we'll find out when v7 takes over, but I've found an awful lot in my pendings from the 17th Feb to the 22nd Feb that are stuck in limbo.

Cheers.
ID: 1213887
JimHilty2
Joined: 30 Apr 03
Posts: 75
Credit: 7,199,464
RAC: 0
Germany
Message 1213912 - Posted: 4 Apr 2012, 13:11:03 UTC - in response to Message 1213885.  

red-ray wrote:
I wonder how many of the 3,317,781 MB Results awaiting validation are also stuck.

Since the number has swung between about 3 million and as much as 5.5 million when the validators collapsed, I would think a lot of them, lol.
ID: 1213912
Josef W. Segur
Volunteer developer
Volunteer tester
Joined: 30 Oct 99
Posts: 4504
Credit: 1,414,761
RAC: 0
United States
Message 1214012 - Posted: 4 Apr 2012, 18:34:11 UTC - in response to Message 1213887.  

red-ray wrote:
I wonder how many of the 3,317,781 MB Results awaiting validation are also stuck.

Wiggo wrote:
I guess that we'll find out when v7 takes over, but I've found an awful lot in my pendings from the 17th Feb to the 22nd Feb that are stuck in limbo.

Cheers.

But none with a canonical result already chosen, AFAICT. All seem to be just delayed by timeouts and such.

The problem seen on AP associated with hosts reporting after deadline is obviously far less likely on MB. I haven't seen any actual cases, though that can either be because I haven't looked hard enough or because the intended logic is actually working for MB even though it fails for AP. Can anyone provide a link which demonstrates it actually happens on MB?
Joe
ID: 1214012
Wiggo
Joined: 24 Jan 00
Posts: 34744
Credit: 261,360,520
RAC: 489
Australia
Message 1214044 - Posted: 4 Apr 2012, 20:48:25 UTC - in response to Message 1214012.  

Josef W. Segur wrote:
But none with a canonical result already chosen, AFAICT. All seem to be just delayed by timeouts and such.

The problem seen on AP associated with hosts reporting after deadline is obviously far less likely on MB. I haven't seen any actual cases, though that can either be because I haven't looked hard enough or because the intended logic is actually working for MB even though it fails for AP. Can anyone provide a link which demonstrates it actually happens on MB?
Joe

These aren't the same problem as the AP ones, but here are some of the ones I'm talking about that are stuck for some other reason:
Workunit 933438282
Workunit 933480801
Workunit 933480813
Workunit 933333183
Workunit 933573617
Workunit 933638829
Workunit 933876854
Workunit 933967999
Workunit 933968041
Workunit 933967800
Workunit 933438278
There are many more of them.

Cheers.
ID: 1214044
Josef W. Segur
Volunteer developer
Volunteer tester
Joined: 30 Oct 99
Posts: 4504
Credit: 1,414,761
RAC: 0
United States
Message 1214086 - Posted: 4 Apr 2012, 23:30:29 UTC - in response to Message 1214044.  
Last modified: 4 Apr 2012, 23:32:43 UTC

Wiggo wrote:
These aren't the same problem as the AP ones, but here are some of the ones I'm talking about that are stuck for some other reason:
Workunit 933438282
Workunit 933480801
Workunit 933480813
Workunit 933333183
Workunit 933573617
Workunit 933638829
Workunit 933876854
Workunit 933967999
Workunit 933968041
Workunit 933967800
Workunit 933438278
There are many more of them.

Cheers.

There is supposed to be a safety factor such that if the validator doesn't get triggered when a result is reported, it will be triggered when the original deadline of the last sent task is reached. The first and last WUs in your list have just passed that point and show their tasks as validated, so they may be in the one-day wait before purging. The others don't seem to have reached their task deadlines yet.

It is surely uncomfortable to wait, and the need for that safety factor indicates a weakness in BOINC. What's puzzling is that even the safety factor doesn't get some AP WUs properly cleaned up.
Joe
ID: 1214086
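The safety factor Joe describes in this post can be sketched as a simple scheduling rule. This is a minimal, hypothetical Python illustration, not BOINC's actual transitioner or validator code: the `Task` and `Workunit` classes, the `needs_validation` flag and the `fallback_time`/`should_check` helpers are invented for the example, and it assumes the fallback is simply the original deadline of the most recently sent task, as Joe states.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Task:
    sent_time: float  # when the task was sent to a host (hypothetical field)
    deadline: float   # original report deadline of that task (hypothetical field)


@dataclass
class Workunit:
    tasks: List[Task] = field(default_factory=list)
    # Hypothetical flag: set when a reported result triggers the validator.
    needs_validation: bool = False


def fallback_time(wu: Workunit) -> Optional[float]:
    """Safety-factor time: the original deadline of the last task sent, if any."""
    if not wu.tasks:
        return None
    last_sent = max(wu.tasks, key=lambda t: t.sent_time)
    return last_sent.deadline


def should_check(wu: Workunit, now: float) -> bool:
    """True if the WU should be (re)examined now.

    Normal path: a report set needs_validation.
    Safety factor: even if that trigger was missed, check once the
    deadline of the last sent task has been reached.
    """
    if wu.needs_validation:
        return True
    fb = fallback_time(wu)
    return fb is not None and now >= fb


if __name__ == "__main__":
    # Both tasks are back, but suppose the report trigger was lost.
    wu = Workunit(tasks=[Task(0, 100), Task(10, 110)])
    print(should_check(wu, 105))  # False: still waiting on the safety factor
    print(should_check(wu, 110))  # True: last sent task's deadline reached
```

With this rule, a lost trigger only delays the check until that deadline instead of leaving the WU waiting indefinitely.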
Wiggo
Joined: 24 Jan 00
Posts: 34744
Credit: 261,360,520
RAC: 489
Australia
Message 1214101 - Posted: 5 Apr 2012, 0:40:20 UTC - in response to Message 1214086.  

Josef W. Segur wrote:
There is supposed to be a safety factor such that if the validator doesn't get triggered when a result is reported, it will be triggered when the original deadline of the last sent task is reached. The first and last WUs in your list have just passed that point and show their tasks as validated, so they may be in the one-day wait before purging. The others don't seem to have reached their task deadlines yet.

It is surely uncomfortable to wait, and the need for that safety factor indicates a weakness in BOINC. What's puzzling is that even the safety factor doesn't get some AP WUs properly cleaned up.
Joe

It just seems strange that they all fall within that 5-day period (not that I'm worried about them, as I know they'll eventually be cleaned up, just like the v505 APs will be).

Cheers.
ID: 1214101
Cosmic_Ocean
Joined: 23 Dec 00
Posts: 3027
Credit: 13,516,867
RAC: 13
United States
Message 1214109 - Posted: 5 Apr 2012, 1:11:03 UTC

Yeah, I've heard there is some master command that can be issued to reset the transitioner state back to the first step and make it run through all of the "stuck" tasks again. It's somewhere between a rumor and something buried very, very deep inside the server code where nobody would find it. There hasn't been a whole lot of mention of it, though.

I think it was either Josef or Richard who mentioned something about it a long time ago (over a year ago) in some other thread about this very subject. That's the only mention of that option.

I believe there's some danger in running it, though, if there are workunits legitimately waiting for someone to return a task, and I imagine it would put a pretty heavy load on the database. I'm not entirely sure whether it would be better to run it before or after the big spring-cleaning event Matt mentioned.
Linux laptop:
record uptime: 1511d 20h 19m (ended due to the power brick giving up)
ID: 1214109
perryjay
Volunteer tester
Joined: 20 Aug 02
Posts: 3377
Credit: 20,676,751
RAC: 0
United States
Message 1214288 - Posted: 5 Apr 2012, 16:17:39 UTC

I seem to remember that a few years back we had a validator problem something like this, and we had to post links to the stuck ones so the guys could go in, manually grant credit, and remove them. Possibly the guys are just too busy right now to do that, and will let us know if we need to do it again when they get time.


PROUD MEMBER OF Team Starfire World BOINC
ID: 1214288
kittyman
Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor
Volunteer tester
Joined: 9 Jul 00
Posts: 51468
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1214313 - Posted: 5 Apr 2012, 17:04:40 UTC - in response to Message 1214288.  

perryjay wrote:
I seem to remember that a few years back we had a validator problem something like this, and we had to post links to the stuck ones so the guys could go in, manually grant credit, and remove them. Possibly the guys are just too busy right now to do that, and will let us know if we need to do it again when they get time.

Somehow I can't imagine that a proper script couldn't be written to go through the DB and resolve these issues.
However, these are busier times than usual in the lab, and as you said, the boyz probably have more important things on their plate right now.
As the last of the AP WUs in the field start to filter in, I suspect it shall be quite apparent that there is a lot of stuck data for whatever reason.
There are still 505s being resent, so it might be a while before they are all back home.
"Freedom is just Chaos, with better lighting." Alan Dean Foster

ID: 1214313
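The kind of script kittyman has in mind could start by just flagging candidates. This is a minimal, hypothetical Python sketch, not code against the real SETI@home or BOINC database: the `Result` and `WorkunitRow` classes, the `validate_state` values and `find_stuck` are invented for illustration. It assumes, going by Joe's remark earlier in the thread, that a genuinely stuck WU is one where a canonical result has already been chosen but a result is still pending validation, and it only flags results whose deadline has passed so that, per Cosmic_Ocean's caution, work legitimately still in progress is left alone.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Result:
    result_id: int
    deadline: float
    validate_state: str  # hypothetical values: "pending", "valid", "invalid"


@dataclass
class WorkunitRow:
    wu_id: int
    canonical_result_id: Optional[int]  # None until a canonical result is chosen
    results: List[Result]


def find_stuck(wus: List[WorkunitRow], now: float) -> List[int]:
    """Return IDs of WUs that look stuck: a canonical result is already chosen,
    yet some result is still pending validation after its deadline has passed."""
    stuck = []
    for wu in wus:
        if wu.canonical_result_id is None:
            continue  # no canonical result yet: may just be delayed by timeouts
        if any(r.validate_state == "pending" and r.deadline < now
               for r in wu.results):
            stuck.append(wu.wu_id)
    return stuck


if __name__ == "__main__":
    wus = [
        WorkunitRow(1, 10, [Result(10, 50, "valid"), Result(11, 60, "pending")]),
        WorkunitRow(2, None, [Result(20, 50, "pending")]),
        WorkunitRow(3, 30, [Result(30, 50, "valid"), Result(31, 500, "pending")]),
    ]
    print(find_stuck(wus, now=100))  # [1]
```

Flagging rather than fixing is deliberate here: what to do with each flagged WU (grant credit, re-run it through the transitioner, or purge it) is still a decision for the people running the servers.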

©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.