Problems...
Author | Message |
---|---|
Jord Joined: 9 Jun 99 Posts: 15184 Credit: 4,362,181 RAC: 3 |
Now then, we're a bit further. When David tests from the internal SSL network, he gets through 100% of the time. No errors. Which would mean that it's something in SAH's internet connection. |
Lint trap Joined: 30 May 03 Posts: 871 Credit: 28,092,319 RAC: 0 |
[quote]And I see completely good result with "validate error " state.[/quote] I see that, too. Four of the five "Invalids" for my PC look OK based on the stats I can see now. But, who knows... maybe the EOY 06 and early 07 tapes were so noisy they just produce unreliable results. Martin |
Jord Joined: 9 Jun 99 Posts: 15184 Credit: 4,362,181 RAC: 3 |
And now it seems to have been cleared. I no longer get any server replies of no headers, no data. |
Raistmer Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
[quote]But, who knows...maybe the eoy 06 and early 07 tapes were so noisy they just produce unreliable results.[/quote] If they are "noisy", they end in a "-9" overflow. The outcomes for the task I posted are: 1) Completed, validation inconclusive; 2) Error while computing; 3) Validate error; 4) In progress. "Validate error" is a validator error, not a science app error. If the task had been computed incorrectly, we would see two tasks in state 1), "Completed, validation inconclusive". The validator was unable to carry out the validation procedure. It's a server problem, hence I posted it here. The quality of the data itself has no connection to the problem, IMO. |
Fred W Joined: 13 Jun 99 Posts: 2524 Credit: 11,954,210 RAC: 0 |
[quote]And I see completely good result with "validate error " state.[/quote] And I still have 22 "validate errors" in my list. About 20 have disappeared after having been re-crunched by another wingman. F. |
Lint trap Joined: 30 May 03 Posts: 871 Credit: 28,092,319 RAC: 0 |
Without knowing details of the Validation process, we are just speculating. Martin |
Fred W Joined: 13 Jun 99 Posts: 2524 Credit: 11,954,210 RAC: 0 |
We have been round this loop before. We know this message results from a Validator error - that is why Eric has a script to hoover them up. An example is this one that I returned on 2 March and was awarded a "Validate Error". Given that mine had failed to validate, the result returned on 4 March should not be "Validation Inconclusive" but "Completed, waiting for validation". Note that my result was declared an error before there was anything for it to validate against. F. |
perryjay Joined: 20 Aug 02 Posts: 3377 Credit: 20,676,751 RAC: 0 |
I've got one validate error that was turned in late last night. I show "Validate error", but my wingman shows "Completed, waiting for validation", and it has gone out to another wingman. Since it's the only one I have noticed, I'm not going to worry about it. PROUD MEMBER OF Team Starfire World BOINC |
Josef W. Segur Joined: 30 Oct 99 Posts: 4504 Credit: 1,414,761 RAC: 0 |
[quote]Without knowing details of the Validation process, we are just speculating.[/quote] Both BOINC and S@H are open source, so the details of the Validation process are known. The "Validate error" status happens when the validator cannot find the result file, though there is a legion of possible reasons for that failure. Joe |
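Joe's description can be pictured with a very small toy model. This is a sketch only, not the actual BOINC or S@H validator code: the `Result` class, the `check_one` function and the `output_file` field are hypothetical names invented for illustration, and a missing file is simply modelled as `None`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Result:
    """Hypothetical stand-in for one returned task."""
    name: str
    # None models "the validator cannot find the result file".
    output_file: Optional[str]


def check_one(result: Result) -> str:
    """Return a toy status for a single returned result."""
    if result.output_file is None:
        # The failure is on the validator side: nothing is known
        # about whether the science app computed correctly.
        return "Validate error"
    return "Completed, waiting for validation"


if __name__ == "__main__":
    print(check_one(Result("task_a", None)))
    print(check_one(Result("task_b", "task_b.out")))
```

The point of the sketch is only that the status is decided before any comparison of science results takes place, which matches Raistmer's remark that a "validate error" is not a science app error.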
Richard Haselgrove Joined: 4 Jul 99 Posts: 14679 Credit: 200,643,578 RAC: 874 |
[quote]We have been round this loop before. We know this message results from a Validator error - that is why Eric has a script to hoover them up.[/quote] Do you have evidence for that final sentence, Fred? My understanding was that the validator wouldn't even LOOK at the WU until a quorum of reports had come in. Of course, in these cases, a quorum of reports does not guarantee a quorum of results: hence the validate error. But I don't think I've ever seen a task declared VE before the second report is in. |
Fred W Joined: 13 Jun 99 Posts: 2524 Credit: 11,954,210 RAC: 0 |
[quote]We have been round this loop before. We know this message results from a Validator error - that is why Eric has a script to hoover them up.[/quote] Unfortunately, no evidence that I can produce. But I have been checking my "validate errors" daily since they started up again (out of curiosity more than anything else), and that result was definitely a "Validate error" while the wingman was still "In progress", though the tie-breaker WU was not sent out until the wingman reported (on 4 March). Looking through the results of a few others who have reported "Validate errors", the pattern seems consistent. If the "Validate error" is the second result to be reported, then the successful result is awarded a "Completed, waiting for validation". If the "Validate error" is the first to be reported, then the successful result is awarded a CBNC (as it used to be known), so the validator seems to be looking for consensus even when the other result has been declared to be in error. A flaw in the logic, methinks. F. |
Pappa Joined: 9 Jan 00 Posts: 2562 Credit: 12,301,681 RAC: 0 |
@ Pappa Armstrong With the crash there could be an issue with the donation script (or UCB has changed something again), I will let Matt know. Thank You for letting us know. Regards Please consider a Donation to the Seti Project. |
Pappa Joined: 9 Jan 00 Posts: 2562 Credit: 12,301,681 RAC: 0 |
I am back, again. There was a database crash, and the database was restored to 16 January 2010. Results are stored in the database. Now, if you return an "Active Result" that has no listing in the database, it can be added as a new record. When it goes to compare ("Validate"), it has nothing to compare against, and you end up with a Validate Error. When a second Active Result is returned, a new row is added. Because the entry in the Workunit table (a related table in the database) does not link the two Active Results, you end up with a Validate Error. When you look at both "Results", they are the same; the validator simply "could not" link them. This is because various tables in the database hold various parts of the information about each Workunit and the expected Result that will be associated with it, and that information was missing. That said, the Scheduler "has been" passing information to the Transitioner about database entries that "were" supposed to be there. It waits while the Transitioner looks for the missing information (in the meantime, BOINC on your end is also waiting). According to a comment from Eric, as we get past what everyone had on their machines during the outage and get it reported, the validate errors should stop happening. Self-healing. As you get new work, there is a correct entry that links everything together (in all the tables). In the meantime, Eric periodically runs the Credit Script that accounts for the missing credits. How they will pick the "canonical result" is up to them. Patience, and keep returning results. Regards |
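The linking problem Pappa describes can be illustrated with two toy tables. This is a rough sketch under loud assumptions: the dictionary layout, the `workunit_id` key and the `link_results` function are hypothetical and do not reflect the real BOINC database schema or validator; the "Eligible for comparison" label is likewise invented for the example.

```python
# Toy "Workunit" table after a restore from an older backup.
# Workunit 102 was created after the backup, so its row is gone.
workunits = {101: {"name": "wu_101"}}

# Toy "Result" rows as hosts report work they still had on disk.
results = [
    {"id": 1, "workunit_id": 101},
    {"id": 2, "workunit_id": 101},
    {"id": 3, "workunit_id": 102},  # re-added as a new record, no parent row
    {"id": 4, "workunit_id": 102},  # second report, still nothing to link to
]


def link_results(results, workunits):
    """Map each result id to a toy status based on whether it can be linked."""
    status = {}
    for row in results:
        if row["workunit_id"] in workunits:
            status[row["id"]] = "Eligible for comparison"
        else:
            # Both results may be perfectly good, but without the
            # workunit row there is nothing that ties them together.
            status[row["id"]] = "Validate error"
    return status


if __name__ == "__main__":
    for result_id, state in link_results(results, workunits).items():
        print(result_id, state)
```

In this picture, work issued after the restore always has a parent row, which is the "self-healing" behaviour attributed to Eric's comment.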
Fred W Joined: 13 Jun 99 Posts: 2524 Credit: 11,954,210 RAC: 0 |
[quote]In a comment from Eric, as we get past what everyone had on their machine (during the outage) and get it reported the validate errors should stop happening. Self Healing.[/quote] But I have a "validate error" on a WU that was sent to me on 10 March. Methinks there is something else afoot here - not "self-healing". And welcome back (again)... F. |
Raistmer Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
Due to an app_info misconfiguration, all my MB tasks were discarded. But even after a project reset they remain in the "in progress" state on the web page and were not resent to my host. That is, they are now in a "ghost" state and will be released to another host only when the deadline comes? It seems to be a problem with the server config, then: such tasks should either be resent to the original host or be marked as computation errors and resent to another host... |
Gundolf Jahn Joined: 19 Sep 00 Posts: 3184 Credit: 446,358 RAC: 0 |
[quote]Due to app_info misconfiguration all my MB tasks were discarded.[/quote] Resetting isn't enough! You'll have to detach and re-attach to achieve your goal. The "resend to original host" feature is turned off at SETI, due to database stress. Regards, Gundolf. Computers aren't everything in life. (Just kidding.) SETI@home classic workunits: 3,758; SETI@home classic CPU time: 66,520 hours |
Raistmer Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
[quote]Due to app_info misconfiguration all my MB tasks were discarded.[/quote] It still looks like a logical error in the server config to me. Actually, I shouldn't even have to do a reset. The tasks were discarded by the BOINC client, so why didn't it report a computation error back to the server instead of letting these tasks go into the ghost state? |
Gundolf Jahn Joined: 19 Sep 00 Posts: 3184 Credit: 446,358 RAC: 0 |
[quote]Tasks were discarded by BOINC client - why it didn't report computational error back to server and letted these tasks go in ghost state?[/quote] Did you report the aborted tasks before resetting? If yes, or if there weren't any to report, I agree that that would be an error. Regards, Gundolf |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14679 Credit: 200,643,578 RAC: 874 |
[quote]Tasks were discarded by BOINC client[/quote] I thought you said "Due to app_info misconfiguration". That's not the BOINC client, that's operator error (no criticism - we all do it). If you break it, you own both parts - until you fix it. BOINC doesn't supply its own glue - anyone who ventures down the anonymous platform route is well advised to heed the warnings 'advanced users only', and learn the manual recovery procedures from their manual experiments. |
Raistmer Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
[quote]Tasks were discarded by BOINC client[/quote] No-no-no, I didn't say (I hope) that discarding the tasks was BOINC's fault. The app_info was incomplete (BTW, is there any reason for the double mention of all files in the heading section? I mean, why should each file_ref have a corresponding file_info?), so the discarding was legitimate. What I am trying to say is: why didn't BOINC, knowing that it had just discarded the tasks, report this fact back to the server as computation errors? And yes, before resetting I did a few project updates (each time receiving no tasks; BOINC simply refused to ask for new work until I did a project reset - another strange behavior, BTW). Tasks are usually discarded due to "operator error", as Richard said, so BOINC's behavior in such a situation is probably not well tested (who would deliberately repeat such an experiment ;) )... |
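The file_ref / file_info pairing that Raistmer mentions can be checked mechanically before the client ever sees the file. The following is a minimal sketch, assuming the usual anonymous-platform layout in which each `<file_ref>` carries a `<file_name>` and each `<file_info>` carries a `<name>`; the application name, version number and file names in the sample are hypothetical, and `missing_file_infos` is a helper invented for this example, not part of BOINC.

```python
import xml.etree.ElementTree as ET

# Hypothetical, deliberately incomplete app_info.xml content:
# example_lib.dll is referenced but never declared.
SAMPLE_APP_INFO = """
<app_info>
  <app><name>example_app</name></app>
  <file_info><name>example_app.exe</name><executable/></file_info>
  <app_version>
    <app_name>example_app</app_name>
    <version_num>100</version_num>
    <file_ref><file_name>example_app.exe</file_name><main_program/></file_ref>
    <file_ref><file_name>example_lib.dll</file_name></file_ref>
  </app_version>
</app_info>
"""


def missing_file_infos(xml_text: str) -> list:
    """Return file names used in a file_ref that lack a matching file_info."""
    root = ET.fromstring(xml_text.strip())
    declared = {info.findtext("name") for info in root.iter("file_info")}
    referenced = {ref.findtext("file_name") for ref in root.iter("file_ref")}
    return sorted(name for name in referenced - declared if name)


if __name__ == "__main__":
    print(missing_file_infos(SAMPLE_APP_INFO))
```

Running a check like this before restarting the client would catch the kind of incomplete app_info described above. It does not answer why the format requires both entries, and it says nothing about how the client reports discarded tasks.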