Completed, can't validate
Author | Message |
---|---|
OzzFan Joined: 9 Apr 02 Posts: 15691 Credit: 84,761,841 RAC: 28 |
I would like to see a response from one of the actual seti team about this, with a positive message that they are at least aware of the issues and they are going to try to fix it. I have invited Project Administrator Dr. Eric Korpela to visit this thread and offer his input. |
Zydor Joined: 4 Oct 03 Posts: 172 Credit: 491,111 RAC: 0 |
I think this is a case where the validity of the science needs to bend a little to the needs of the Cruncher. Astropulse has brought in the "new" genre (to SETI) of a long(er) WU. The dynamics of the whole credit award change with the longer WU. Before going further, let me be crystal clear - we cannot allow an unvalidated result to be assimilated as a done deal - end of story. However..... we also can't get into the situation where a Cruncher has in good faith beavered away, sometimes for a few days with lower powered beasties, ending up with a result that's good but is then dumped with no credits, merely because he/she was unlucky enough to hit the above scenario. Rare it may be, but each time it happens we could lose a (rightly) disgruntled Cruncher. With the longer AP WUs, the successful Cruncher should be given the credit on submission of the WU; validation (for credit purposes) should not be necessary. The tracking of the WU should meanwhile continue, with the Project deciding what to do with it, guided by its own set of agreed internal criteria. That scenario decouples the Cruncher from a situation that is not of their making, and leaves the SETI Team with the same problem as under the current system - i.e. what to do with the errant WU. This time, however, we do not lose a Cruncher, and the science faces no additional problem - it's a win/win. This has been faced by many Projects with long WUs, notably ClimatePrediction, where credit is not tied to validation. Whilst it's clear their dilemma is far worse, and the case for instant credit on submission irrefutable after 3 months of crunching one WU, the principle is the same. AP is a tipping point for SETI. To do this for MB makes no sense; it's so rare that a few minutes lost on an MB WU will not cause major grief. AP raises new "culture" questions in how credit is handled; past "best practice" no longer applies, as we will lose Crunchers at a steady rate with no gain in the quality of the science. 
The latter is a bad trade-off. There is a risk that such a system could let credit cheats in without validation, but I don't think this would become a reality; the length of the WU will greatly inhibit that scenario. In any case the basis of most "cheats" has no bearing on this issue; they have accomplished the cheat using other means (avoiding giving publicity to the detail of any such method). SETI should implement instant credit on submission of the longer type of WU. It's a mindset change that has no penalty for the science, but is a big gesture of good faith to the Cruncher. Without the Cruncher, SETI goes nowhere. It's a small change with substantial effect and no change to the science validity or otherwise of the errant WUs. Regards Zy |
Eric Korpela Joined: 3 Apr 99 Posts: 1382 Credit: 54,506,847 RAC: 60 |
We used to run a script daily to grant credit for cases where a workunit errors out, but it appears that was disabled some time ago. I'll re-enable it temporarily and ask Matt and Jeff why we disabled it. Probably because of a database speed issue. Eric @SETIEric@qoto.org (Mastodon) |
Ministry of Disinformation Joined: 19 Sep 06 Posts: 8 Credit: 17,475,791 RAC: 0 |
I would like to see a response from one of the actual seti team about this, with a positive message that they are at least aware of the issues and they are going to try to fix it. Thanks for that OzzFan, and thank you too to Dr. Eric Korpela for the reply. At least if the issue is known about, a solution can hopefully be worked to suit the project and the crunchers. |
Zydor Joined: 4 Oct 03 Posts: 172 Credit: 491,111 RAC: 0 |
I am not aware of the coding required for such a routine database task in SETI not being familiar with the code concerned, although I can understand the potential load such a daily database search task could impose. However, on the face of it, it does seem that when a WU arrives it goes through a series of (whatever) criteria before being allocated the (in old terminology) "Pending" status. Whilst my knowledge of the SETI code is woefully inadequate, a relatively simple and nearly load-free alternative would seem to be to insert a few lines that kick off whatever routine supplies credit, immediately after the existing checks on initial submission are completed. There would appear to be little additional performance impact from such a minor change. It's just a question of the Policy to be adopted. I suspect the commented-out code was originally inserted in the way it was to conform to a Policy decision; if the Policy changes, then the drivers behind the original design, and therefore the solution to be adopted, will also change. Regards Zy |
Cosmic_Ocean Joined: 23 Dec 00 Posts: 3027 Credit: 13,516,867 RAC: 13 |
I am not aware of the coding required for such a routine database task in SETI not being familiar with the code concerned, although I can understand the potential load such a daily database search task could impose. I think the original implementation of this script/code was for the AP435/500 changeover. The two would not validate against each other, and I got burned out of five or six WUs myself. I processed the tasks completely legitimately with 435 before 500 was released, my wingmate timed out, the task was resent and done with 500, then a fourth task was needed, that was done with 500, and I didn't get any credit. The script ran through and checked for that particular situation and granted credit where credit should have been given. It was very common at first, obviously, but then it became less common, and the script stopped running at some point. I was under the impression that it was run manually either daily, or when a PM was read requesting that it be run. Linux laptop: record uptime: 1511d 20h 19m (ended due to the power brick giving-up) |
Zydor Joined: 4 Oct 03 Posts: 172 Credit: 491,111 RAC: 0 |
There is, I believe, an additional driver connected with this: the whole question of (what was described as) "Pending" for AP. The initial roll-out problems during the "great storm" a few weeks back are well known. The technical reason for the immediate effect has been taken apart and analysed to the Nth degree, and possible solutions discussed. I think, however, it goes further than that, and is also affected by the proposed change above. It's certainly true that the Storm was caused by various mechanisms used in the initial fielding of AP to everyone. It has been assumed that once the block of APs that were "zapped" by the issues are timed out and reissued, the problem of massive pendings will eventually recede to manageable levels. The current trend towards the next "Millionaire" in "Pendings", whilst the subject of good-natured comment, has its serious side if it remains this way. It's true the immediate pendings can be traced to the Storm, and naturally attention focused on that. However, an "equal" number (crudely) - if not more - of WUs were subsequently issued post-"Storm", on top of all the ones that did eventually succeed on re-issue. It's not just a case of those that failed to go out; the total population of WUs involved in causing the "queue" is therefore vastly greater than "ex Storm". We all await the return of the ex-storm reissued WUs; however, there will be at least an equal - likely greater - number of APs out there. As painful as it was during the Storm, the length of time taken to crunch an AP unit will equally cause a permanent long queue of "Pendings", and I don't think they will in fact diminish, as the new dynamic of the total population of WUs out there far outweighs the numbers affected by the Storm. Need a Statistician here (!), but I strongly suspect that if this was modelled, the current growing AP queue of "millionaires" awaiting wingmen is set to grow, not reduce ...... that's bad long term news. 
If the change above is approved then this additional factor will go away at a stroke. If it's not approved, I suggest a proper modelling of the AP flow is done to verify or otherwise the potential long term size of the validation queue of "Pendings" before a final decision is taken on the other issue above. I have a sneaky feeling a "gotcha" lurks out there re the AP "Pending" queue if we are not careful. The MB queue is bad enough; I have a nasty feeling the AP queue has the potential to be a far greater problem than is realised on a long term view..... Regards Zy |
Cosmic_Ocean Joined: 23 Dec 00 Posts: 3027 Credit: 13,516,867 RAC: 13 |
I'm still waiting to see the day of two of my hosts being wingmen for the same WU. I know there are measures in place to keep the same host from being a wingman (even for reissues), but I don't know if it applies to a whole user account, or just one host... If it's the whole account, I can quit waiting to see that day. :p I've always been under the impression that it was just for a host and not a user. Linux laptop: record uptime: 1511d 20h 19m (ended due to the power brick giving-up) |
Zydor Joined: 4 Oct 03 Posts: 172 Credit: 491,111 RAC: 0 |
An illustration of where I'm coming from - and we would need a proper Flow of WUs to define it correctly. However it illustrates the point sufficiently to model this properly. Take an "average" slice of 100 machines: 25 are high end, taking a median 12 hrs to crunch it; 25 mid range, taking 18 hrs to crunch it; 50 low end, taking 4 days to crunch it. Total crunched in a four day period: High End: 200 APs; Mid Range: 134 APs; Low End: 50 APs. Essentially - in a crude sense - the 200 high end APs are fed by 184 APs (134 + 50) from the rest. That's too close to call; now add three zeros and the size of the queue, in orders of magnitude, becomes potentially worrying. It's obviously far more complex than this, and at this stage, without proper modelling, any set of figures can "prove" anything without too much difficulty. It does however raise doubt, and I believe the actual growing queue illustrates this, because when the whole Enterprise gets crunching as it now is, it's irrelevant whether they crunched a "storm" AP or not. In the physical model of crunching, an AP WU is an AP WU; it doesn't matter where it came from. The only question is whether the output of the High End & Mid Range will be matched by the Mid Range/Low End. It's not so far ...... and I don't see anything that will change the physical dynamic. The queue is set to grow and, if the change proposed in this thread is not implemented, needs modelling properly to make sure all is well. I don't reckon it is, but over to a proper statistical model to confirm that. The downside potential for SETI reputation is considerable if this bad case scenario is shown to be true without corrective action. * crawls back in me box to read reaction with interest * Regards Zy |
RottenMutt Joined: 15 Mar 01 Posts: 1011 Credit: 230,314,058 RAC: 0 |
http://setiathome.berkeley.edu/workunit.php?wuid=419322121 5 errors... but i'm not one of the unlucky, only 6 hours invested... my wing man is new and has never, ever, returned a result... |
Josef W. Segur Joined: 30 Oct 99 Posts: 4504 Credit: 1,414,761 RAC: 0 |
An illustration of where I'm coming from - and we would need a proper Flow of WUs to define it correctly. However it illustrates the point sufficiently to model this properly. Take an "average" slice of 100 machines: Multiply by about 4 since most participants do not run optimized applications. A while back Eric did note that the median machine was dual core, but not whether it was Pentium D or more recent. Total crunched in a four day period: As I read the change proposal, it just dealt with granting credit without checking results. That's certainly possible and might entice some users to remain who would otherwise quit. It would also tempt some to do stupid things in quest of more credits. But putting doubtful results in the master science database is a different issue; I hope the project doesn't have to degrade the science to that extent. Two results which are strongly similar seems minimally acceptable checking to me. I don't reckon it is, but over to a proper statistical model to confirm that. The downside potential for SETI reputation is considerable if this bad case scenario is shown to be true without corrective action. The average turnaround for Astropulse work is 165.26 hours at the last reading, slightly less than one week. Many of the top hosts run a large queue and take even longer than the average; most participants simply install BOINC, attach to the project, and run with the small defaults. Then there are hosts which ask for 3 minutes of work and some flaw in the server code delivers 20 AP WUs. All those things affect the amount of data storage the project needs to support the amount of work in flight. I don't know of anything which will resolve all the issues. I simply consider it a privilege to help reduce data which could possibly prove existence of technological extraterrestrial aliens. Joe |
Zydor Joined: 4 Oct 03 Posts: 172 Credit: 491,111 RAC: 0 |
I agree re the science, and the proposed change will not affect that at all. The matching of results and quality etc can still take place, reissuing as now where relevant. The only change proposed is that credits are given up front; WU validation, in terms of the science, would be unaffected and remains as it is now. Regards Zy |
gomeyer Joined: 21 May 99 Posts: 488 Credit: 50,370,425 RAC: 0 |
"Completed, can't validate" OK, anyone interested can keep an eye on this zero credit result to see if Eric's script was turned on. It was returned about 20 minutes ago as of this posting. Workunit 423265213 Task 1183265607 [edit] WOW !!! That was fast. It was granted credit within minutes after being returned. I guess we can consider that a confirmation. Well done. [/edit] |
perryjay Joined: 20 Aug 02 Posts: 3377 Credit: 20,676,751 RAC: 0 |
This one? 1183265607 4533413 11 Mar 2009 21:49:57 UTC 20 Mar 2009 13:12:35 UTC Completed, can't validate 46,216.39 1,226.88 1,226.88 Looks like you were awarded credit for it. PROUD MEMBER OF Team Starfire World BOINC |
Kinguni Joined: 15 Feb 00 Posts: 239 Credit: 9,043,007 RAC: 0 |
|
Ministry of Disinformation Joined: 19 Sep 06 Posts: 8 Credit: 17,475,791 RAC: 0 |
At least there's been a positive outcome it would seem. I know it doesn't help with the one I lost, but it stops it happening again, to either myself or other people. That's a good result in my opinion. It's nice to know that we are listened to. :) |
Zeus Fab3r Joined: 17 Jan 01 Posts: 649 Credit: 275,335,635 RAC: 597 |
Well, listen to this... Recently, I noticed that one of my WUs was on the edge of being wasted because of my 'trusty' wingmen. That eventually happened a few minutes ago ;( http://setiathome.berkeley.edu/workunit.php?wuid=422266081 The big question is, how did that WU get "Completed, can't validate" status when my last wingman is nowhere near his deadline for finishing his part? Can this project predict that my wingy's result will end up with an error? :) Regards, ZF Who the hell is General Failure and why is he reading my harddisk?¿ |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14674 Credit: 200,643,578 RAC: 874 |
Well, listen to this... In this case, it's because of the two lines above the result table: "max # of error/total/success tasks: 5, 10, 10" and "errors: Too many error results". It was probably the guy that aborted it on 5 June (after almost 7 weeks on a PIII at 522 MFlops) that did the damage. |
Alinator Joined: 19 Apr 05 Posts: 4178 Credit: 4,647,982 RAC: 0 |
Hmmmm... Here's the really unfortunate part. The remaining wingman looks like he's wasting his time running the task as well. Also, I reviewed the thread and ingleside said 221's were disabled, but I was under the impression that 221's were enabled and it was auto resend lost work which is the DB performance killer (not that it would have made a difference here). In any event, this looks like a case where the project should send an unconditional abort to the host the next time it contacts the project (but probably won't). Alinator |
Josef W. Segur Joined: 30 Oct 99 Posts: 4504 Credit: 1,414,761 RAC: 0 |
The reason it "can't validate" is simply there isn't another successful result yet. If the last wingmate reports a success by 27 Jun 2009 6:49:09 UTC then validation will be possible. The "Too many error results" state merely keeps the transitioner from creating any more to send out. Joe |
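For anyone trying to follow the last few posts, the "max # of error/total/success tasks 5, 10, 10" limits and Joe's explanation can be sketched roughly in Python. This is only an illustration under stated assumptions, not the project's actual server code: the `Workunit` class and all function names are hypothetical, the quorum of two successful results is assumed, and so is the rule that the "Too many error results" flag is raised once the error count exceeds the limit. The counts in the example are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Workunit:
    """Hypothetical bookkeeping for one workunit (not the real server schema)."""
    successes: int = 0          # tasks returned without error
    errors: int = 0             # client errors, aborts, missed deadlines
    in_progress: int = 0        # tasks still out with a host
    max_error_tasks: int = 5    # limits quoted in the thread: 5, 10, 10
    max_total_tasks: int = 10
    max_success_tasks: int = 10
    min_quorum: int = 2         # assumption: two agreeing results are needed


def too_many_errors(wu: Workunit) -> bool:
    # Assumption: the flag is raised once errors exceed the limit.
    return wu.errors > wu.max_error_tasks


def may_create_more_tasks(wu: Workunit) -> bool:
    # Once any limit is hit, no further copies are sent out.
    total = wu.successes + wu.errors + wu.in_progress
    return (not too_many_errors(wu)
            and total < wu.max_total_tasks
            and wu.successes < wu.max_success_tasks)


def can_validate_now(wu: Workunit) -> bool:
    # A lone success has nothing to be compared against.
    return wu.successes >= wu.min_quorum


def could_still_validate(wu: Workunit) -> bool:
    # Tasks already in progress may still supply the missing success.
    return wu.successes + wu.in_progress >= wu.min_quorum


if __name__ == "__main__":
    # Illustrative counts: one good result, six errors, one wingmate still crunching.
    wu = Workunit(successes=1, errors=6, in_progress=1)
    print("too many errors:     ", too_many_errors(wu))
    print("may create more:     ", may_create_more_tasks(wu))
    print("can validate now:    ", can_validate_now(wu))
    print("could still validate:", could_still_validate(wu))
```

In this sketch the error limit only stops new copies from being created; it does not by itself rule out validation, which stays possible while a task already in progress can still come back as a success.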
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.