Completed, can't validate



Message boards : Number crunching : Completed, can't validate

Author Message
OzzFan
Volunteer tester
Joined: 9 Apr 02
Posts: 13614
Credit: 30,276,534
RAC: 20,852
United States
Message 876253 - Posted: 16 Mar 2009, 20:36:46 UTC - in response to Message 876176.

I would like to see a response from one of the actual seti team about this, with a positive message that they are at least aware of the issues and they are going to try to fix it.


I have invited Project Administrator Dr. Eric Korpela to visit this thread and offer his input.
____________

Zydor
Joined: 4 Oct 03
Posts: 172
Credit: 491,111
RAC: 0
United Kingdom
Message 876297 - Posted: 16 Mar 2009, 22:30:39 UTC - in response to Message 876253.
Last modified: 16 Mar 2009, 22:45:15 UTC

I think this is a case where the validity of the science needs to bend a little to the needs of the Cruncher. Astropulse has brought in the "new" genre (to SETI) of a long(er) WU. The dynamics of the whole credit award change with the longer WU.

Before going further, let me be crystal clear - we cannot allow an unvalidated result to be assimilated as a done deal - end of story.

However..... we also can't get into the situation where a Cruncher has in good faith beavered away, sometimes for a few days with lower powered beasties, ending up with a good result that is then dumped with no credits, merely because he/she was unlucky enough to hit the above scenario. Rare it may be, however each time it happens we could lose a (rightly) disgruntled Cruncher.

With the longer AP WUs, the successful Cruncher should be given the credit on submission of the WU; validation (for credit purposes) should not be necessary. The tracking of the WU meanwhile should continue, and the Project decides what to do with it guided by its own set of agreed internal criteria. That scenario decouples the Cruncher from a situation that is not of their making, and leaves the SETI Team with the same problem as under the current system, i.e. what to do with the errant WU. This time however we do not lose a Cruncher, and the science faces no increase in problems. It's a win/win.

This has been faced by many Projects with long WUs, notably ClimatePrediction, where credit is not tied to validation. Whilst it's clear their dilemma is far worse, and the case for instant credit on submission is irrefutable after 3 months of crunching one WU, the principle is the same.

AP is a tipping point for SETI. To do this for MB makes no sense: the problem is so rare that a few minutes lost on an MB WU will not cause major grief. AP raises new "culture" questions in how credit is handled. Past "best practice" no longer applies, as we will lose Crunchers at a steady rate with no gain in the quality of the science. That is a bad trade off. There is a risk that such a system could allow credit cheats in without validation, but I don't think this would become a reality; the length of the WU will assist greatly in inhibiting that scenario. In any case the basis of most "cheats" has no bearing on this issue; they have accomplished the cheat using other means (avoiding giving publicity to the detail of any such method).

SETI should implement instant credit on submission of the longer type of WU. It's a mindset change that carries no penalty for the Science, but is a big gesture of good faith to the Cruncher. Without the Cruncher, SETI goes nowhere. It's a small change with substantial effect, and it changes nothing about the scientific validity or otherwise of the errant WUs.

Regards
Zy
____________

Eric Korpela
Project donor
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 3 Apr 99
Posts: 1088
Credit: 8,819,175
RAC: 12,942
United States
Message 876307 - Posted: 16 Mar 2009, 22:48:18 UTC - in response to Message 876253.

We used to run a script daily to grant credit for cases where a workunit errors out, but it appears that was disabled some time ago. I'll re-enable it temporarily and ask Matt and Jeff why we disabled it. Probably because of a database speed issue.

Eric
____________

Ministry of Disinformation
Joined: 19 Sep 06
Posts: 8
Credit: 16,863,028
RAC: 1
United Kingdom
Message 876313 - Posted: 16 Mar 2009, 23:07:31 UTC - in response to Message 876253.

I would like to see a response from one of the actual seti team about this, with a positive message that they are at least aware of the issues and they are going to try to fix it.


I have invited Project Administrator Dr. Eric Korpela to visit this thread and offer his input.


Thanks for that OzzFan, and thank you too to Dr. Eric Korpela for the reply.

At least if the issue is known about, a solution can hopefully be worked to suit the project and the crunchers.

Zydor
Joined: 4 Oct 03
Posts: 172
Credit: 491,111
RAC: 0
United Kingdom
Message 876314 - Posted: 16 Mar 2009, 23:09:08 UTC - in response to Message 876307.
Last modified: 16 Mar 2009, 23:24:50 UTC

I am not aware of the coding required for such a routine database task in SETI, not being familiar with the code concerned, although I can understand the potential load such a daily database search task could impose.

However, on the face of it, it does seem that when a WU arrives it goes through a series of (whatever) criteria before being allocated the (in old terminology) "Pending" status. Whilst my knowledge of the SETI code is woefully inadequate, a relatively simple and nearly load-free alternative would seem to be to insert a few lines that kick off whatever routine supplies credit, immediately after the existing checks on initial submission are completed. There would appear to be little additional performance impact from such a minor change.

It's just a question of the Policy to be adopted. I suspect the commented out code was originally inserted in the way it was to conform to a Policy decision; if the Policy changes, then the drivers behind the original design, and therefore the solution to be adopted, will also change.

Regards
Zy
____________

Cosmic_Ocean
Joined: 23 Dec 00
Posts: 2264
Credit: 8,666,051
RAC: 4,242
United States
Message 876346 - Posted: 17 Mar 2009, 0:45:08 UTC - in response to Message 876314.

I am not aware of the coding required for such a routine database task in SETI, not being familiar with the code concerned, although I can understand the potential load such a daily database search task could impose.
...

I think the original implementation of this script/code was for the AP435/500 changeover. The two would not validate against each other, and I got burned out of five or six WUs myself. I processed the tasks completely legitimately with 435 before 500 was released, my wingmate timed out, the task was resent and done with 500, then a fourth task was needed, that was done with 500, and I didn't get any credit.

The script ran through and checked for that particular situation and granted credit where credit should have been given. It was very common at first, obviously, but then it became less common, and the script stopped running at some point. I was under the impression that it was run manually either daily, or when a PM was read requesting that it be run.
____________

Linux laptop uptime: 1484d 22h 42m
Ended due to UPS failure, found 14 hours after the fact

Zydor
Joined: 4 Oct 03
Posts: 172
Credit: 491,111
RAC: 0
United Kingdom
Message 876362 - Posted: 17 Mar 2009, 1:17:15 UTC - in response to Message 876346.
Last modified: 17 Mar 2009, 1:20:40 UTC

There is I believe an additional driver connected with this: the whole question of (what was described as) "Pending" for AP. The initial rollout problems during the "great storm" a few weeks back are well known. The technical reason for the immediate effect has been taken apart and analysed to the Nth degree, and possible solutions discussed. I think however it goes further than that, and is also affected by the proposed change above.

It's certainly true that the Storm was caused by various mechanisms used in the initial fielding of AP to everyone. It has been assumed that once the block of APs that were "zapped" by the issues are timed out and reissued, the problem of massive pendings will eventually recede to manageable levels. The current trend towards the next "Millionaire" in "Pendings", whilst the subject of good natured comment, has its serious side if it remains this way.

It's true that the immediate pendings can be traced to the Storm, and naturally attention focused on that. However an "equal" number (crudely), if not more, of WUs were subsequently issued post "Storm", on top of all the ones that did eventually succeed on re-issue. It's not just a case of those that failed to go out; the total population of WUs involved in all this and causing the "queue" is vastly greater than "ex Storm".

We all await the return of the ex-storm reissued WUs, however there will be at least an equal, and likely greater, number of APs out there. As painful as it was during the Storm, the length of time taken to crunch an AP unit will equally cause a permanent long queue of "Pendings", and I don't think they will in fact diminish, as the total population of WUs out there far outweighs the numbers affected by the Storm. Need a Statistician here (!), but I strongly suspect that if this was modelled, the current growing AP queue of "millionaires" awaiting wingmen is set to grow, not reduce ...... that's bad long term news.

If the change above is approved then this additional factor will go away at a stroke. If it's not approved, I suggest a proper modelling of the AP flow is done to verify or otherwise the potential long term size of the validation queue of "Pendings" before a final decision is taken on the other issue above. I have a sneaky feeling a "gotcha" lurks out there re the AP "Pending" queue if we are not careful. The MB queue is bad enough; I have a nasty feeling the AP queue has the potential to be a far greater problem than is realised on a long term view.....

Regards
Zy
____________

Cosmic_Ocean
Joined: 23 Dec 00
Posts: 2264
Credit: 8,666,051
RAC: 4,242
United States
Message 876380 - Posted: 17 Mar 2009, 2:00:10 UTC

I'm still waiting to see the day of two of my hosts being wingmen for the same WU. I know there are measures in place to keep the same host from being a wingman (even for reissues), but I don't know if it applies to a whole user account, or just one host...

If it's the whole account, I can quit waiting to see that day. :p I've always been under the impression that it was just for a host and not a user.
____________

Linux laptop uptime: 1484d 22h 42m
Ended due to UPS failure, found 14 hours after the fact

Zydor
Joined: 4 Oct 03
Posts: 172
Credit: 491,111
RAC: 0
United Kingdom
Message 876388 - Posted: 17 Mar 2009, 2:25:22 UTC - in response to Message 876380.

An illustration of where I'm coming from - and we would need a proper Flow of WUs to define it correctly. However it illustrates the point sufficiently to model this properly. Take an "average" slice of 100 machines:

- 25 are high end taking a median 12hrs to crunch it
- 25 mid range taking 18 hrs to crunch it
- 50 low end taking 4 days to crunch it

Total crunched in a four day period:
High End: 200 APs
Mid Range: 134 APs
Low End: 50 APs

Essentially - in a crude sense - the 200 APs returned by the high end have to be matched by 184 APs from the mid range and low end combined. That's too close to call; now add three zeros and the size of the queue, in orders of magnitude, becomes potentially worrying.
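The arithmetic behind those totals can be reproduced with a short sketch. It assumes, as the figures in the post suggest, that each tier's count is rounded up to a whole workunit (so a partly finished AP is counted); the function and variable names are illustrative only and do not come from any SETI or BOINC code.

```python
import math

HOURS_IN_WINDOW = 4 * 24  # the four day period used in the post

# (label, number of machines, hours per AP workunit), figures from the post
tiers = [
    ("High end", 25, 12),
    ("Mid range", 25, 18),
    ("Low end", 50, 96),
]


def completed(machines, hours_per_wu, window=HOURS_IN_WINDOW):
    """Workunits returned by one tier in the window, rounded up (assumption)."""
    return math.ceil(machines * window / hours_per_wu)


totals = {label: completed(n, h) for label, n, h in tiers}
for label, count in totals.items():
    print(f"{label}: {count} APs")

# Compare the high end output with everything else combined
others = totals["Mid range"] + totals["Low end"]
print(f"High end returns {totals['High end']}, the rest return {others}")
```

Changing the tier sizes or crunch times in `tiers` shows how sensitive the balance is to the assumed mix of machines, which is the reason the post asks for proper modelling.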

It's obviously far more complex than this, and at this stage any set of figures can "prove" anything without too much difficulty without proper modelling. It does however raise doubt, and I believe the actual growing queue illustrates this, because when the whole Enterprise gets crunching as it now is, it's irrelevant whether they crunched a "storm" AP or not. In the physical model of crunching, an AP WU is an AP WU; it doesn't matter where it came from.

The only question is whether the output of the High End & Mid Range will be matched by the Mid Range/Low End. It hasn't been so far ...... and I don't see anything that will change the physical dynamic. The queue is set to grow, and if the change proposed in this thread is not implemented it needs modelling properly to make sure all is well.

I don't reckon it is, but over to a proper statistical model to confirm that. The downside potential for SETI's reputation is considerable if this bad case scenario is shown to be true without corrective action.

* crawls back in me box to read reaction with interest *

Regards
Zy
____________

RottenMutt
Joined: 15 Mar 01
Posts: 992
Credit: 207,654,737
RAC: 0
United States
Message 876403 - Posted: 17 Mar 2009, 2:54:03 UTC
Last modified: 17 Mar 2009, 2:59:15 UTC

http://setiathome.berkeley.edu/workunit.php?wuid=419322121
5 errors... but i'm not one of the unlucky, only 6 hours invested... my wing man is new and has never, ever, returned a result...
____________

Josef W. Segur
Project donor
Volunteer developer
Volunteer tester
Joined: 30 Oct 99
Posts: 4244
Credit: 1,047,369
RAC: 275
United States
Message 876438 - Posted: 17 Mar 2009, 3:54:34 UTC - in response to Message 876388.

An illustration of where I'm coming from - and we would need a proper Flow of WUs to define it correctly. However it illustrates the point sufficiently to model this properly. Take an "average" slice of 100 machines:

- 25 are high end taking a median 12hrs to crunch it
- 25 mid range taking 18 hrs to crunch it
- 50 low end taking 4 days to crunch it

Multiply by about 4 since most participants do not run optimized applications. A while back Eric did note that the median machine was dual core, but not whether it was a Pentium D or more recent.

Total crunched in a four day period:
High End: 200 APs
Mid Range: 134 APs
Low End: 50 APs

Essentially - in a crude sense - the 200 APs returned by the high end have to be matched by 184 APs from the mid range and low end combined. That's too close to call; now add three zeros and the size of the queue, in orders of magnitude, becomes potentially worrying.

It's obviously far more complex than this, and at this stage any set of figures can "prove" anything without too much difficulty without proper modelling. It does however raise doubt, and I believe the actual growing queue illustrates this, because when the whole Enterprise gets crunching as it now is, it's irrelevant whether they crunched a "storm" AP or not. In the physical model of crunching, an AP WU is an AP WU; it doesn't matter where it came from.

The only question is whether the output of the High End & Mid Range will be matched by the Mid Range/Low End. It hasn't been so far ...... and I don't see anything that will change the physical dynamic. The queue is set to grow, and if the change proposed in this thread is not implemented it needs modelling properly to make sure all is well.

As I read the change proposal, it just dealt with granting credit without checking results. That's certainly possible and might entice some users to remain who would otherwise quit. It would also tempt some to do stupid things in quest of more credits. But putting doubtful results in the master science database is a different issue, and I hope the project doesn't have to degrade the science to that extent. Two strongly similar results seems the minimally acceptable check to me.

I don't reckon it is, but over to a proper statistical model to confirm that. The downside potential for SETI's reputation is considerable if this bad case scenario is shown to be true without corrective action.

* crawls back in me box to read reaction with interest *

Regards
Zy

The average turnaround for Astropulse work is 165.26 hours at the last reading, slightly less than one week. Many of the top hosts run a large queue and take even longer than the average, while most participants simply install BOINC, attach to the project, and run with the small defaults. Then there are hosts which ask for 3 minutes of work and some flaw in the server code delivers 20 AP WUs. All those things affect the amount of data storage the project needs to support the amount of work in flight.
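As a rough illustration of how turnaround drives the amount of work in flight, the sketch below applies Little's law (average number in the system = arrival rate × average time in the system). Only the 165.26 hour turnaround comes from this post; the issue rates are hypothetical placeholders, since the thread gives no figure for how many AP tasks are sent out per hour.

```python
AVG_TURNAROUND_HOURS = 165.26  # average AP turnaround quoted in the post


def tasks_in_flight(tasks_issued_per_hour, turnaround_hours=AVG_TURNAROUND_HOURS):
    """Little's law: average tasks outstanding = issue rate * average turnaround."""
    return tasks_issued_per_hour * turnaround_hours


# Hypothetical issue rates, chosen only to show how the estimate scales
for rate in (100, 1000):
    print(f"{rate} tasks/hour -> about {round(tasks_in_flight(rate))} tasks in flight")
```

The estimate scales linearly, so anything that lengthens turnaround (large caches, slow hosts, over-delivery of work) raises the storage the project must hold by the same proportion.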

I don't know of anything which will resolve all the issues. I simply consider it a privilege to help reduce data which could possibly prove existence of technological extraterrestrial aliens.
Joe

Zydor
Joined: 4 Oct 03
Posts: 172
Credit: 491,111
RAC: 0
United Kingdom
Message 876447 - Posted: 17 Mar 2009, 4:28:08 UTC - in response to Message 876438.

I agree re the science, and the proposed change will not affect that at all. The matching of results, quality checks etc. can still take place, with reissuing as now where relevant.

The only change proposed is that credits are given up front. WU validation, in terms of the science, would be unaffected and remains as it is now.

Regards
Zy
____________

gomeyer
Volunteer tester
Joined: 21 May 99
Posts: 488
Credit: 50,157,953
RAC: 0
United States
Message 877446 - Posted: 20 Mar 2009, 13:34:02 UTC
Last modified: 20 Mar 2009, 13:45:28 UTC

"Completed, can't validate"
OK, anyone interested can keep an eye on this zero credit result to see if Eric's script was turned on. It was returned about 20 minutes ago as of this posting.

Workunit 423265213
Task 1183265607

[edit]
WOW !!! That was fast. It was granted credit within minutes after being returned. I guess we can consider that a confirmation. Well done.
[/edit]

perryjay
Volunteer tester
Joined: 20 Aug 02
Posts: 3377
Credit: 15,503,935
RAC: 11,160
United States
Message 877447 - Posted: 20 Mar 2009, 13:46:23 UTC - in response to Message 877446.

This one?

1183265607 | 4533413 | 11 Mar 2009 21:49:57 UTC | 20 Mar 2009 13:12:35 UTC | Completed, can't validate | 46,216.39 | 1,226.88 | 1,226.88

Looks like you were awarded credit for it.
____________


PROUD MEMBER OF Team Starfire World BOINC

Kinguni
Volunteer tester
Joined: 15 Feb 00
Posts: 239
Credit: 9,043,007
RAC: 0
Canada
Message 877484 - Posted: 20 Mar 2009, 16:27:17 UTC

Very cool.
____________
Join Team Starfire
BOINC Chat

Ministry of Disinformation
Joined: 19 Sep 06
Posts: 8
Credit: 16,863,028
RAC: 1
United Kingdom
Message 877523 - Posted: 20 Mar 2009, 18:02:06 UTC - in response to Message 877484.

At least there's been a positive outcome it would seem.

I know it doesn't help with the one I lost, but it stops it happening again, to either myself or other people. That's a good result in my opinion.

It's nice to know that we are listened to. :)

Zeus Fab3r
Joined: 17 Jan 01
Posts: 642
Credit: 95,157,291
RAC: 131,100
Serbia
Message 904129 - Posted: 5 Jun 2009, 23:13:04 UTC

Well, listen to this...
Recently, I noticed that one of my WUs was on the edge of being wasted
because of my 'trusty' wingmen. That eventually happened a few minutes ago ;(

http://setiathome.berkeley.edu/workunit.php?wuid=422266081

The big question is, how did that WU get "Completed, can't validate" status
when my last wingman is nowhere near his deadline for finishing his part?
Can this project predict that my wingy's result will end up with an error? :)

Regards, ZF
____________

Who the hell is General Failure and why is he reading my harddisk?¿

Richard Haselgrove
Project donor
Volunteer tester
Joined: 4 Jul 99
Posts: 8491
Credit: 49,775,474
RAC: 54,312
United Kingdom
Message 904296 - Posted: 6 Jun 2009, 8:55:02 UTC - in response to Message 904129.

Well, listen to this...
Recently, I noticed that one of my WUs was on the edge of being wasted
because of my 'trusty' wingmen. That eventually happened a few minutes ago ;(

http://setiathome.berkeley.edu/workunit.php?wuid=422266081

The big question is, how did that WU get "Completed, can't validate" status
when my last wingman is nowhere near his deadline for finishing his part?
Can this project predict that my wingy's result will end up with an error? :)

Regards, ZF

In this case, it's because of the two lines above the result table:

max # of error/total/success tasks 5, 10, 10
errors Too many error results

It was probably the guy that aborted it on 5 June (after almost 7 weeks on a PIII at 522 MFlops) that did the damage.

Alinator
Volunteer tester
Joined: 19 Apr 05
Posts: 4178
Credit: 4,647,982
RAC: 0
United States
Message 904365 - Posted: 6 Jun 2009, 15:33:49 UTC
Last modified: 6 Jun 2009, 15:37:21 UTC

Hmmmm...

Here's the really unfortunate part.

The remaining wingman looks like he's wasting his time running the task as well.

Also, I reviewed the thread and ingleside said 221's were disabled, but I was under the impression that 221's were enabled and it was auto resend lost work which is the DB performance killer (not that it would have made a difference here).

In any event, this looks like a case where the project should send an unconditional abort to the host the next time it contacts the project (but probably won't).

Alinator

Josef W. Segur
Project donor
Volunteer developer
Volunteer tester
Joined: 30 Oct 99
Posts: 4244
Credit: 1,047,369
RAC: 275
United States
Message 904395 - Posted: 6 Jun 2009, 17:24:03 UTC

The reason it "can't validate" is simply there isn't another successful result yet. If the last wingmate reports a success by 27 Jun 2009 6:49:09 UTC then validation will be possible.

The "Too many error results" state merely keeps the transitioner from creating any more to send out.
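The two states described here can be pictured with a small toy model. This is only a sketch of the behaviour as explained in this thread, not the BOINC server code: the class and method names are hypothetical, the quorum of two successful results is an assumption based on the remark that another successful result is needed, and the exact comparison the server uses against the error limit is not stated in the thread.

```python
from dataclasses import dataclass


@dataclass
class Workunit:
    max_error_results: int = 5  # the "5" in "max # of error/total/success tasks 5, 10, 10"
    min_quorum: int = 2         # assumption: two successful results are needed to validate
    errors: int = 0
    successes: int = 0

    def too_many_errors(self):
        # Assumption: the flag is raised once errors exceed the limit
        return self.errors > self.max_error_results

    def may_create_new_task(self):
        # "Too many error results" only stops further tasks being generated
        return not self.too_many_errors()

    def can_attempt_validation(self):
        # Validation needs enough successful results, regardless of the error flag
        return self.successes >= self.min_quorum


wu = Workunit(errors=6, successes=1)
print(wu.may_create_new_task(), wu.can_attempt_validation())

wu.successes += 1  # the outstanding wingmate reports a success before the deadline
print(wu.may_create_new_task(), wu.can_attempt_validation())
```

In this model the error flag and the ability to validate are independent, which matches the point above: no new tasks are created, but a success from the task already in progress would still allow validation.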

Joe


Copyright © 2014 University of California