Completed, can't validate

OzzFan · Crowdfunding Project Donor · Special Project $75 donor · Special Project $250 donor
Volunteer tester
Joined: 9 Apr 02
Posts: 15691
Credit: 84,761,841
RAC: 28
United States
Message 876253 - Posted: 16 Mar 2009, 20:36:46 UTC - in response to Message 876176.  

I would like to see a response from one of the actual seti team about this, with a positive message that they are at least aware of the issues and they are going to try to fix it.


I have invited Project Administrator Dr. Eric Korpela to visit this thread and offer his input.
ID: 876253 · Report as offensive
Zydor
Joined: 4 Oct 03
Posts: 172
Credit: 491,111
RAC: 0
United Kingdom
Message 876297 - Posted: 16 Mar 2009, 22:30:39 UTC - in response to Message 876253.  
Last modified: 16 Mar 2009, 22:45:15 UTC

I think this is a case where the Validity of the science needs to bend a little to the needs of the Cruncher. Astropulse has brought in the "new" genre (to SETI) of a long(er) WU. The dynamics of the whole credit award change with the longer WU.

Before going further, let me be crystal clear - we cannot allow an unvalidated result to be assimilated as a done deal - end of story.

However..... we also can't get into the situation where a Cruncher has in good faith beavered away, sometimes for a few days with lower powered beasties, ending up with a good result that is then dumped with no credits, merely because he/she was unlucky enough to hit the above scenario. Rare it may be, however each time it happens we could lose a (rightly) disgruntled Cruncher.

The successful Cruncher should be given the credit on submission of the WU. With the longer AP WUs, validation (for credit purposes) should not be necessary. The tracking of the WU meanwhile should continue, and the Project decides what to do with it guided by its own set of agreed internal criteria. That scenario decouples the Cruncher from a situation that is not of their making, and leaves the SETI Team with no greater problem than under the current system - ie what to do with the errant WU. This time however we do not lose a Cruncher, and no increase in problem is faced by the science - it's a win/win.

This has been faced by many Projects with long WUs, notably ClimatePrediction where credit is not tied to validation. Whilst it's clear their dilemma is far worse, and the case for instant credit on submission irrefutable after 3 months of crunching one WU, the principle is the same.

AP is a tipping point for SETI. To do this for MB makes no sense; it's so rare that a few minutes lost on an MB WU will not cause major grief. AP raises new "culture" questions in how credit is handled. Past "best practice" no longer applies, as we will lose Crunchers at a steady rate with no increase in the quality of the science. That is a bad trade off. There is a risk that such a system could let credit cheats in without validation, but I don't think this would be a reality; the length of the WU will assist greatly in inhibiting that scenario. In any case the basis of most "cheats" has no bearing on this issue, as they have accomplished the cheat using other means (avoiding giving publicity to the detail of any such method).

SETI should implement instant credit on submission of the longer type of WU. It's a mindset change, has no penalty for the Science, but is a big gesture of good faith to the Cruncher. Without the Cruncher, SETI goes nowhere. It's a small change with substantial effect and no change to the science validity or otherwise of the errant WUs.

Regards
Zy
ID: 876297 · Report as offensive
Eric Korpela Project Donor
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 3 Apr 99
Posts: 1382
Credit: 54,506,847
RAC: 60
United States
Message 876307 - Posted: 16 Mar 2009, 22:48:18 UTC - in response to Message 876253.  

We used to run a script daily to grant credit for cases where a workunit errors out, but it appears that was disabled some time ago. I'll re-enable it temporarily and ask Matt and Jeff why we disabled it. Probably because of a database speed issue.

Eric
@SETIEric@qoto.org (Mastodon)

ID: 876307 · Report as offensive
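
The kind of daily sweep Eric describes can be pictured roughly as follows. This is only a minimal sketch, not the project's actual script: the `Result` and `Workunit` classes, their field names, and the rule "grant the claimed credit" are all assumptions made for illustration, and the real job would run against the project database rather than in-memory objects.

```python
from dataclasses import dataclass, field


@dataclass
class Result:
    host_id: int
    outcome: str                 # "success" or "error" (hypothetical labels)
    claimed_credit: float
    granted_credit: float = 0.0


@dataclass
class Workunit:
    wu_id: int
    errored_out: bool            # True once the workunit itself has errored out
    results: list = field(default_factory=list)


def grant_credit_for_errored_workunits(workunits):
    """Give claimed credit to successful but uncredited results of errored-out workunits."""
    granted = []
    for wu in workunits:
        if not wu.errored_out:
            continue  # normal validation still handles these workunits
        for r in wu.results:
            if r.outcome == "success" and r.granted_credit == 0.0:
                # Assumption: the cruncher simply receives what was claimed.
                r.granted_credit = r.claimed_credit
                granted.append((wu.wu_id, r.host_id, r.granted_credit))
    return granted


# Hypothetical data: one errored-out workunit with a single good result.
example = [
    Workunit(1, True, [Result(10, "success", 100.0), Result(11, "error", 0.0)]),
    Workunit(2, False, [Result(12, "success", 50.0)]),
]
print(grant_credit_for_errored_workunits(example))
```

Scanning every workunit on each run is what would make a job like this expensive on a busy database, which fits Eric's guess about why it was switched off.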
Ministry of Disinformation
Joined: 19 Sep 06
Posts: 8
Credit: 17,475,791
RAC: 0
United Kingdom
Message 876313 - Posted: 16 Mar 2009, 23:07:31 UTC - in response to Message 876253.  

I would like to see a response from one of the actual seti team about this, with a positive message that they are at least aware of the issues and they are going to try to fix it.


I have invited Project Administrator Dr. Eric Korpela to visit this thread and offer his input.


Thanks for that OzzFan, and thank you too to Dr. Eric Korpela for the reply.

At least if the issue is known about, a solution can hopefully be worked out to suit the project and the crunchers.
ID: 876313 · Report as offensive
Zydor
Joined: 4 Oct 03
Posts: 172
Credit: 491,111
RAC: 0
United Kingdom
Message 876314 - Posted: 16 Mar 2009, 23:09:08 UTC - in response to Message 876307.  
Last modified: 16 Mar 2009, 23:24:50 UTC

I am not aware of the coding required for such a routine database task in SETI, not being familiar with the code concerned, although I can understand the potential load such a daily database search task could impose.

However, on the face of it, it does seem that when a WU arrives it goes through a series of (whatever) criteria before being allocated the (in old terminology) "Pending" status. Whilst my knowledge of the SETI code is woefully inadequate, a relatively simple and nearly load free alternative would seem to be to insert a few lines that kick off whatever routine supplies credit, immediately after the (existing) checks on initial submission are completed. There would appear to be little additional performance impact from such a minor change.

It's just a question of the Policy to be adopted. I suspect the commented out code was originally inserted in the way it was to conform to a Policy decision; if the Policy changes, then the drivers behind the original design, and therefore the solution to be adopted, will also change.

Regards
Zy
ID: 876314 · Report as offensive
Cosmic_Ocean
Joined: 23 Dec 00
Posts: 3027
Credit: 13,516,867
RAC: 13
United States
Message 876346 - Posted: 17 Mar 2009, 0:45:08 UTC - in response to Message 876314.  

I am not aware of the coding required for such a routine database task in SETI not being familiar with the code concerned, although I can understand the potential load such a daily database search task could impose.
...

I think the original implementation of this script/code was for the AP435/500 changeover. The two would not validate against each other, and I got burned out of five or six WUs myself. I processed the tasks completely legitimately with 435 before 500 was released, my wingmate timed out, the task was resent and done with 500, then a fourth task was needed, that was done with 500, and I didn't get any credit.

The script ran through and checked for that particular situation and granted credit where credit should have been given. It was very common at first, obviously, but then it became less common, and the script stopped running at some point. I was under the impression that it was run manually either daily, or when a PM was read requesting that it be run.
Linux laptop:
record uptime: 1511d 20h 19m (ended due to the power brick giving-up)
ID: 876346 · Report as offensive
Zydor
Joined: 4 Oct 03
Posts: 172
Credit: 491,111
RAC: 0
United Kingdom
Message 876362 - Posted: 17 Mar 2009, 1:17:15 UTC - in response to Message 876346.  
Last modified: 17 Mar 2009, 1:20:40 UTC

There is I believe an additional driver connected with this, the whole question of (what was described as) "Pending" for AP. The initial roll out problems during the "great storm" a few weeks back are well known. The technical reason for the immediate effect has been taken apart and analysed to the Nth degree, and possible solutions discussed. I think however it goes further than that, and is also affected by the proposed change above.

It's certainly true that the Storm was caused by various mechanisms used in the initial fielding of AP to everyone. It has been assumed that once the block of APs that were "zapped" by the issues are timed out and reissued, the problem of massive pendings will eventually recede to manageable levels. The current trend towards the next "Millionaire" in "Pendings", whilst the subject of good natured comment, has its serious side if it remains this way.

It's true the immediate pendings can be traced to the Storm, and naturally attention focused on that. However an "equal" number (crudely) - if not more - of WUs were subsequently issued post "Storm" - on top of all the ones that did eventually succeed on re-issue. It's not just a case of those that failed to go out, and therefore the total population of WUs involved in all this causing the "queue" is vastly greater than "ex Storm".

We all await the return of the ex-storm reissued WUs, however there will be at least an equal - likely greater - number of APs out there. As painful as it was during the Storm, the length of time taken to Crunch an AP unit will equally cause a permanent long queue of "Pendings", and I don't think they will in fact diminish, as the new dynamic of the total population of WUs out there far outweighs the numbers affected by the Storm. Need a Statistician here (!), but I strongly suspect if this was modelled, the current growing AP queue of "millionaires" awaiting wingmen is set to grow, not reduce ...... that's bad long term news.

If the change above is approved then this additional factor will go away at a stroke. If it's not approved, I suggest a proper modelling of the AP flow is done to verify or otherwise the potential long term size of the validation queue of "Pendings" before a final decision is taken on the other issue above. I have a sneaky feeling a "gotcha" lurks out there re the AP "Pending" queue if we are not careful. The MB queue is bad enough, I have a nasty feeling the AP queue has potential to be a far greater problem than is realised on a long term view.....

Regards
Zy
ID: 876362 · Report as offensive
Cosmic_Ocean
Joined: 23 Dec 00
Posts: 3027
Credit: 13,516,867
RAC: 13
United States
Message 876380 - Posted: 17 Mar 2009, 2:00:10 UTC

I'm still waiting to see the day of two of my hosts being wingmen for the same WU. I know there are measures in place to keep the same host from being a wingman (even for reissues), but I don't know if it applies to a whole user account, or just one host...

If it's the whole account, I can quit waiting to see that day. :p I've always been under the impression that it was just for a host and not a user.
Linux laptop:
record uptime: 1511d 20h 19m (ended due to the power brick giving-up)
ID: 876380 · Report as offensive
Zydor
Joined: 4 Oct 03
Posts: 172
Credit: 491,111
RAC: 0
United Kingdom
Message 876388 - Posted: 17 Mar 2009, 2:25:22 UTC - in response to Message 876380.  

An illustration of where I'm coming from - and we would need a proper Flow of WUs to define it correctly. However it illustrates the point sufficiently to model this properly. Take an "average" slice of 100 machines:

- 25 are high end taking a median 12hrs to crunch it
- 25 mid range taking 18 hrs to crunch it
- 50 low end taking 4 days to crunch it

Total crunched in a four day period:
High End: 200 APs
Mid Range: 134 APs
Low End: 50 APs

Essentially - in a crude sense - 200 APs high end are fed by 184 APs. That's too close to call. Now add three zeros and the size of the queue, in orders of magnitude, becomes potentially worrying.
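
The arithmetic behind these totals can be reproduced with a few lines of Python. This is only a sketch of the crude model in the post: the class sizes and crunch times are the ones given above, while the names `classes` and `tasks_completed` are made up for the illustration. The mid-range figure comes out at about 133.3 tasks, which the post rounds up to 134.

```python
WINDOW_HOURS = 4 * 24  # the four day period used in the post

# (number of machines, hours needed per AP task), taken from the post
classes = {
    "high": (25, 12),
    "mid": (25, 18),
    "low": (50, 96),
}


def tasks_completed(machines, hours_per_task, window_hours=WINDOW_HOURS):
    """Tasks a group of identical machines finishes within the window."""
    return machines * window_hours / hours_per_task


totals = {name: tasks_completed(n, h) for name, (n, h) in classes.items()}

for name, total in totals.items():
    print(f"{name:>4}: {total:.1f} APs")

# Compare what the fast machines return with what the rest can pair up with.
print(f"high end: {totals['high']:.1f}, mid + low: {totals['mid'] + totals['low']:.1f}")
```

The model ignores queue sizes, downtime and hosts that never return work, so it only shows how close the two sides are under these assumed proportions.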

It's obviously far more complex than this, and at this stage any set of figures can "prove" anything without too much difficulty without proper modelling. It does however raise doubt, and I believe the actual growing queue illustrates this, because when the whole Enterprise gets crunching as it now is, it's irrelevant whether they crunched a "storm" AP or not. In the physical model of crunching, an AP WU is an AP WU, doesn't matter where it came from.

The only question is whether the output of the High End & Mid Range will be matched by Mid Range/Low End. It's not so far ...... and I don't see anything that will change the physical dynamic. The queue is set to grow and, if the change proposed in this thread is not implemented, needs modelling properly to make sure all is well.

I don't reckon it is, but over to a proper statistical model to confirm that. The downside potential for SETI reputation is considerable if this bad case scenario is shown to be true without corrective action.

* crawls back in me box to read reaction with interest *

Regards
Zy
ID: 876388 · Report as offensive
RottenMutt
Joined: 15 Mar 01
Posts: 1011
Credit: 230,314,058
RAC: 0
United States
Message 876403 - Posted: 17 Mar 2009, 2:54:03 UTC
Last modified: 17 Mar 2009, 2:59:15 UTC

http://setiathome.berkeley.edu/workunit.php?wuid=419322121
5 errors... but i'm not one of the unlucky, only 6 hours invested... my wing man is new and has never, ever, returned a result...
ID: 876403 · Report as offensive
Josef W. Segur
Volunteer developer
Volunteer tester
Joined: 30 Oct 99
Posts: 4504
Credit: 1,414,761
RAC: 0
United States
Message 876438 - Posted: 17 Mar 2009, 3:54:34 UTC - in response to Message 876388.  

An illustration of where I'm coming from - and we would need a proper Flow of WUs to define it correctly. However it illustrates the point sufficiently to model this properly. Take an "average" slice of 100 machines:

- 25 are high end taking a median 12hrs to crunch it
- 25 mid range taking 18 hrs to crunch it
- 50 low end taking 4 days to crunch it

Multiply by about 4 since most participants do not run optimized applications. Awhile back Eric did note that the median machine was dual core, but not whether it was Pentium D or more recent.

Total crunched in a four day period:
High End: 200 APs
Mid Range: 134 APs
Low End: 50 APs

Essentially - in a crude sense - 200 APs high end are fed by 184 APs. That's too close to call. Now add three zeros and the size of the queue, in orders of magnitude, becomes potentially worrying.

It's obviously far more complex than this, and at this stage any set of figures can "prove" anything without too much difficulty without proper modelling. It does however raise doubt, and I believe the actual growing queue illustrates this, because when the whole Enterprise gets crunching as it now is, it's irrelevant whether they crunched a "storm" AP or not. In the physical model of crunching, an AP WU is an AP WU, doesn't matter where it came from.

The only question is whether the output of the High End & Mid Range will be matched by Mid Range/Low End. It's not so far ...... and I don't see anything that will change the physical dynamic. The queue is set to grow and, if the change proposed in this thread is not implemented, needs modelling properly to make sure all is well.

As I read the change proposal, it just dealt with granting credit without checking results. That's certainly possible and might entice some users to remain who would otherwise quit. It would also tempt some to do stupid things in quest of more credits. But putting doubtful results in the master science database is a different issue, and I hope the project doesn't have to degrade the science to that extent. Two tasks returning strongly similar results seems minimally acceptable checking to me.

I don't reckon it is, but over to a proper statistical model to confirm that. The downside potential for SETI reputation is considerable if this bad case scenario is shown to be true without corrective action.

* crawls back in me box to read reaction with interest *

Regards
Zy

The average turnaround for Astropulse work is 165.26 hours at the last reading, slightly less than one week. Many of the top hosts run a large queue and take even longer than the average; most participants simply install BOINC, attach to the project, and run with the small defaults. Then there are hosts which ask for 3 minutes of work and some flaw in the server code delivers 20 AP WUs. All those things affect the amount of data storage the project needs to support the amount of work in flight.

I don't know of anything which will resolve all the issues. I simply consider it a privilege to help reduce data which could possibly prove existence of technological extraterrestrial aliens.
                                                                Joe
ID: 876438 · Report as offensive
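
The link Joe draws between turnaround time and storage can be sketched with Little's law: in steady state, the amount of work in flight is roughly the issue rate multiplied by the average turnaround. Only the 165.26 hour turnaround comes from the post. The issue rate and the workunit size below are hypothetical placeholders, not project figures.

```python
AVG_TURNAROUND_HOURS = 165.26   # from the post
ISSUE_RATE_PER_HOUR = 100.0     # hypothetical: workunits sent out per hour
WU_SIZE_MB = 8.0                # hypothetical: data file size per workunit


def work_in_flight(issue_rate_per_hour, avg_turnaround_hours):
    """Little's law: items in the system = arrival rate * average time in the system."""
    return issue_rate_per_hour * avg_turnaround_hours


in_flight = work_in_flight(ISSUE_RATE_PER_HOUR, AVG_TURNAROUND_HOURS)
storage_gb = in_flight * WU_SIZE_MB / 1024

print(f"workunits in flight: {in_flight:.0f}")
print(f"storage needed: {storage_gb:.1f} GB")
```

Under this simple model, anything that stretches the average turnaround, such as large queues or hosts handed far more work than they asked for, raises the storage requirement in direct proportion.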
Zydor
Joined: 4 Oct 03
Posts: 172
Credit: 491,111
RAC: 0
United Kingdom
Message 876447 - Posted: 17 Mar 2009, 4:28:08 UTC - in response to Message 876438.  

I agree re the science, and the proposed change will not affect that at all. The matching of results and quality etc can still take place, reissuing as now where relevant.

The only change proposed is that credits are given up front. WU validation, in terms of the science, would be unaffected and remains as it is now.

Regards
Zy
ID: 876447 · Report as offensive
gomeyer
Volunteer tester
Joined: 21 May 99
Posts: 488
Credit: 50,370,425
RAC: 0
United States
Message 877446 - Posted: 20 Mar 2009, 13:34:02 UTC
Last modified: 20 Mar 2009, 13:45:28 UTC

"Completed, can't validate"
OK, anyone interested can keep an eye on this zero credit result to see if Eric's script was turned on. It was returned about 20 minutes ago as of this posting.

Workunit 423265213
Task 1183265607

[edit]
WOW !!! That was fast. It was granted credit within minutes after being returned. I guess we can consider that a confirmation. Well done.
[/edit]
ID: 877446 · Report as offensive
perryjay
Volunteer tester
Joined: 20 Aug 02
Posts: 3377
Credit: 20,676,751
RAC: 0
United States
Message 877447 - Posted: 20 Mar 2009, 13:46:23 UTC - in response to Message 877446.  

This one?

1183265607 | 4533413 | 11 Mar 2009 21:49:57 UTC | 20 Mar 2009 13:12:35 UTC | Completed, can't validate | 46,216.39 | 1,226.88 | 1,226.88

Looks like you were awarded credit for it.


PROUD MEMBER OF Team Starfire World BOINC
ID: 877447 · Report as offensive
Kinguni
Volunteer tester
Joined: 15 Feb 00
Posts: 239
Credit: 9,043,007
RAC: 0
Canada
Message 877484 - Posted: 20 Mar 2009, 16:27:17 UTC

ID: 877484 · Report as offensive
Ministry of Disinformation
Joined: 19 Sep 06
Posts: 8
Credit: 17,475,791
RAC: 0
United Kingdom
Message 877523 - Posted: 20 Mar 2009, 18:02:06 UTC - in response to Message 877484.  

At least there's been a positive outcome it would seem.

I know it doesn't help with the one I lost, but it stops it happening again, to either myself or other people. That's a good result in my opinion.

It's nice to know that we are listened to. :)
ID: 877523 · Report as offensive
Zeus Fab3r
Joined: 17 Jan 01
Posts: 649
Credit: 275,335,635
RAC: 597
Serbia
Message 904129 - Posted: 5 Jun 2009, 23:13:04 UTC

Well, listen to this...
Recently, I noticed that one of my WUs was on the edge of being wasted
because of my 'trusty' wingmen. That eventually happened a few minutes ago ;(

http://setiathome.berkeley.edu/workunit.php?wuid=422266081

The big question is, how did that WU get Completed, can't validate status
when my last wingman is nowhere near his deadline for finishing his part?
Can this project predict that my wingy's result will end up with an error? :)

Regards, ZF

Who the hell is General Failure and why is he reading my harddisk?¿
ID: 904129 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 904296 - Posted: 6 Jun 2009, 8:55:02 UTC - in response to Message 904129.  

Well, listen to this...
Recently, I noticed that one of my WUs was on the edge of being wasted
because of my 'trusty' wingmen. That eventually happened a few minutes ago ;(

http://setiathome.berkeley.edu/workunit.php?wuid=422266081

The big question is, how did that WU get Completed, can't validate status
when my last wingman is nowhere near his deadline for finishing his part?
Can this project predict that my wingy's result will end up with an error? :)

Regards, ZF

In this case, it's because of the two lines above the result table:

max # of error/total/success tasks: 5, 10, 10
errors: Too many error results

It was probably the guy that aborted it on 5 June (after almost 7 weeks on a PIII at 522 MFlops) that did the damage.
ID: 904296 · Report as offensive
Alinator
Volunteer tester
Joined: 19 Apr 05
Posts: 4178
Credit: 4,647,982
RAC: 0
United States
Message 904365 - Posted: 6 Jun 2009, 15:33:49 UTC
Last modified: 6 Jun 2009, 15:37:21 UTC

Hmmmm...

Here's the really unfortunate part.

The remaining wingman looks like he's wasting his time running the task as well.

Also, I reviewed the thread and ingleside said 221's were disabled, but I was under the impression that 221's were enabled and it was auto resend lost work which is the DB performance killer (not that it would have made a difference here).

In any event, this looks like a case where the project should send an unconditional abort to the host the next time it contacts the project (but probably won't).

Alinator
ID: 904365 · Report as offensive
Josef W. Segur
Volunteer developer
Volunteer tester
Joined: 30 Oct 99
Posts: 4504
Credit: 1,414,761
RAC: 0
United States
Message 904395 - Posted: 6 Jun 2009, 17:24:03 UTC

The reason it "can't validate" is simply that there isn't another successful result yet. If the last wingmate reports a success by 27 Jun 2009 6:49:09 UTC then validation will be possible.

The "Too many error results" state merely keeps the transitioner from creating any more to send out.
                                                                Joe
ID: 904395 · Report as offensive
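
Joe's distinction between "can't validate" and "Too many error results" can be modelled in a few lines. This is a simplified sketch, not the server's actual transitioner logic: the 5/10/10 limits are the ones shown on the workunit page quoted earlier in the thread, while the quorum of 2, the comparison operators, the field names and the error count in the example are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkunitLimits:
    max_error: int = 5      # from the workunit page: error/total/success = 5, 10, 10
    max_total: int = 10
    max_success: int = 10
    min_quorum: int = 2     # assumption: two matching results are needed


def workunit_status(successes, errors, in_progress, limits=WorkunitLimits()):
    """Simplified view of what the server can still do with a workunit."""
    # Assumption: the flag trips once the error count exceeds the limit.
    too_many_errors = errors > limits.max_error
    total = successes + errors + in_progress
    return {
        "too_many_errors": too_many_errors,
        # Validation only needs enough successful results to compare.
        "can_validate": successes >= limits.min_quorum,
        # The error flag only stops new tasks from being created.
        "may_create_more": (not too_many_errors) and total < limits.max_total,
    }


# Hypothetical counts resembling ZF's workunit: one success, one task still out.
print(workunit_status(successes=1, errors=6, in_progress=1))
# If the outstanding wingmate then reports a success:
print(workunit_status(successes=2, errors=6, in_progress=0))
```

In this model the error flag and validation are independent checks, which is why a late success from the remaining wingmate could still allow validation even though no further tasks will be sent out.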
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.