Problems...

Profile Jord
Volunteer tester
Joined: 9 Jun 99
Posts: 15184
Credit: 4,362,181
RAC: 3
Netherlands
Message 977156 - Posted: 10 Mar 2010, 18:23:55 UTC - in response to Message 976935.  

Now then, we're a bit further. When David tests from the internal SSL network, he gets through 100% of the time. No errors.

Which would mean that it's something in SAH's internet connection.
ID: 977156 · Report as offensive
Profile Lint trap

Joined: 30 May 03
Posts: 871
Credit: 28,092,319
RAC: 0
United States
Message 977162 - Posted: 10 Mar 2010, 18:48:36 UTC - in response to Message 977145.  

And I see a completely good result in the "validate error" state.
Look: http://setiathome.berkeley.edu/result.php?resultid=1532205824



I see that, too. 4 of the 5 "Invalids" for my pc look 'ok' based on the stats I can see now. But, who knows...maybe the eoy 06 and early 07 tapes were so noisy they just produce unreliable results.

Martin
ID: 977162 · Report as offensive
Profile Jord
Volunteer tester
Joined: 9 Jun 99
Posts: 15184
Credit: 4,362,181
RAC: 3
Netherlands
Message 977166 - Posted: 10 Mar 2010, 18:55:29 UTC - in response to Message 977156.  

And now it seems to have been cleared. I no longer get any server replies of no headers, no data.
ID: 977166 · Report as offensive
Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 977172 - Posted: 10 Mar 2010, 19:06:05 UTC - in response to Message 977162.  

But, who knows...maybe the eoy 06 and early 07 tapes were so noisy they just produce unreliable results.

Martin

If they were "noisy", they would end in a "-9" overflow.
Outcomes of the tasks for the workunit I posted:
1)Completed, validation inconclusive
2)Error while computing
3)Validate error
4)In progress

"Validate error" is a validator error, not a science app error.
If the task had been computed incorrectly, we would see two tasks in state 1), "Completed, validation inconclusive".
The validator was unable to carry out the validation procedure. It's a server problem, which is why I posted it here. IMO the quality of the data itself has no connection to the problem.
ID: 977172 · Report as offensive
Fred W
Volunteer tester

Joined: 13 Jun 99
Posts: 2524
Credit: 11,954,210
RAC: 0
United Kingdom
Message 977173 - Posted: 10 Mar 2010, 19:18:16 UTC - in response to Message 977145.  

And I see a completely good result in the "validate error" state.
Look: http://setiathome.berkeley.edu/result.php?resultid=1532205824

And I still have 22 "validate errors" in my list. About 20 have disappeared after having been re-crunched by another wingman.

F.
ID: 977173 · Report as offensive
Profile Lint trap

Joined: 30 May 03
Posts: 871
Credit: 28,092,319
RAC: 0
United States
Message 977189 - Posted: 10 Mar 2010, 20:25:03 UTC - in response to Message 977172.  


Without knowing details of the Validation process, we are just speculating.

Martin
ID: 977189 · Report as offensive
Fred W
Volunteer tester

Joined: 13 Jun 99
Posts: 2524
Credit: 11,954,210
RAC: 0
United Kingdom
Message 977196 - Posted: 10 Mar 2010, 20:55:17 UTC - in response to Message 977189.  


Without knowing details of the Validation process, we are just speculating.

Martin

We have been round this loop before. We know this message results from a Validator error - that is why Eric has a script to hoover them up.
An example is this one that I returned on 2 March and was awarded a "Validate Error". Given that mine had failed to validate, the result returned on 4 March should not be "Validation Inconclusive" but "Completed, waiting for validation". Note that my result was declared an error before there was anything for it to validate against.

F.
ID: 977196 · Report as offensive
Profile perryjay
Volunteer tester
Joined: 20 Aug 02
Posts: 3377
Credit: 20,676,751
RAC: 0
United States
Message 977197 - Posted: 10 Mar 2010, 21:02:15 UTC

I've got one validate error that was turned in late last night. I show validate error but my wing man shows complete, waiting for validation and it has gone out to another wingy. Since it's the only one I have noticed I'm not going to worry about it.


PROUD MEMBER OF Team Starfire World BOINC
ID: 977197 · Report as offensive
Josef W. Segur
Volunteer developer
Volunteer tester

Joined: 30 Oct 99
Posts: 4504
Credit: 1,414,761
RAC: 0
United States
Message 977202 - Posted: 10 Mar 2010, 21:13:01 UTC - in response to Message 977189.  

Without knowing details of the Validation process, we are just speculating.

Martin

Both BOINC and S@H are open source, so the details of the Validation process are known. The "Validate error" status happens when the validator cannot find the result file, though there are a legion of possible reasons for that failure.
Joe
ID: 977202 · Report as offensive
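The failure mode Joe describes can be sketched roughly as follows. This is only an illustration, assuming a validator step that has to open each returned result file before it can compare anything. The helper name `check_result_file` and the status strings are hypothetical and are not taken from the BOINC or S@H source.

```python
import os
import tempfile

def check_result_file(path):
    """Return a status for one returned result (hypothetical helper)."""
    # Assumption: the validator must read the uploaded file before comparing.
    if not os.path.isfile(path):
        return "validate error"        # file not found: nothing to compare
    return "ready for comparison"

with tempfile.TemporaryDirectory() as upload_dir:
    # One result file that arrived, and one that never did.
    present = os.path.join(upload_dir, "result_0")
    with open(present, "w") as f:
        f.write("<result>...</result>")
    missing = os.path.join(upload_dir, "result_1")

    status_present = check_result_file(present)
    status_missing = check_result_file(missing)
    print(status_present)
    print(status_missing)
```

Note that the check says nothing about why the file is missing, which matches the point that there can be many underlying causes.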
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 977221 - Posted: 10 Mar 2010, 22:08:21 UTC - in response to Message 977196.  

We have been round this loop before. We know this message results from a Validator error - that is why Eric has a script to hoover them up.
An example is this one that I returned on 2 March and was awarded a "Validate Error". Given that mine had failed to validate, the result returned on 4 March should not be "Validation Inconclusive" but "Completed, waiting for validation". Note that my result was declared an error before there was anything for it to validate against.

Do you have evidence for that final sentence, Fred? My understanding was that the validator wouldn't even LOOK at the WU until a quorum of reports had come in. Of course, in these cases, a quorum of reports does not guarantee a quorum of results: hence the validate error.

But I don't think I've ever seen a task declared VE before the second report is in.
ID: 977221 · Report as offensive
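Richard's distinction between a quorum of reports and a quorum of results can be modelled in a few lines. This is a toy sketch under stated assumptions: a quorum of 2, a `Report` record whose `result` is `None` when the result file is unusable, and a hypothetical `validator_pass` function. None of these names come from the actual validator code.

```python
from dataclasses import dataclass
from typing import Optional

MIN_QUORUM = 2  # assumption: two results are needed before comparison

@dataclass
class Report:
    host: str
    result: Optional[str]  # None models a report with no usable result file

def validator_pass(reports):
    """Toy model of one validator pass over a workunit (hypothetical)."""
    # The validator does not look at the workunit until a quorum of reports is in.
    if len(reports) < MIN_QUORUM:
        return {"checked": False, "validate_errors": [], "needs_extra_task": False}
    # A report without a usable result becomes a validate error.
    errors = [r.host for r in reports if r.result is None]
    usable = len(reports) - len(errors)
    # A quorum of reports is not necessarily a quorum of results.
    return {"checked": True,
            "validate_errors": errors,
            "needs_extra_task": usable < MIN_QUORUM}

one_report = validator_pass([Report("host_a", None)])
two_reports = validator_pass([Report("host_a", None), Report("host_b", "spikes=3")])
print(one_report)
print(two_reports)
```

In this model a task can never be marked as a validate error before the second report arrives, which is exactly the behaviour Fred's observation would contradict.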
Fred W
Volunteer tester

Joined: 13 Jun 99
Posts: 2524
Credit: 11,954,210
RAC: 0
United Kingdom
Message 977233 - Posted: 10 Mar 2010, 22:58:04 UTC - in response to Message 977221.  

We have been round this loop before. We know this message results from a Validator error - that is why Eric has a script to hoover them up.
An example is this one that I returned on 2 March and was awarded a "Validate Error". Given that mine had failed to validate, the result returned on 4 March should not be "Validation Inconclusive" but "Completed, waiting for validation". Note that my result was declared an error before there was anything for it to validate against.

Do you have evidence for that final sentence, Fred? My understanding was that the validator wouldn't even LOOK at the WU until a quorum of reports had come in. Of course, in these cases, a quorum of reports does not guarantee a quorum of results: hence the validate error.

But I don't think I've ever seen a task declared VE before the second report is in.

Unfortunately no evidence that I can produce. But I have been checking my "validate errors" daily since they started up again (out of curiosity more than anything else) and that result was definitely a "Validate error" while the wingman was still "In progress" though the tie-breaker WU was not sent out until the wingman reported (on 4 March).
Looking through a few others' results who have reported "Validate errors" it seems to be consistent that if the "Validate error" is the second result to be reported then the successful result is awarded a "Complete, waiting for validation"; if the "Validate error" is the first to be reported then the successful result is awarded a CBNC (as it used to be known) so the validator seems to be looking for consensus even when the other result has been declared to be in error. A flaw in the logic, methinks.

F.
ID: 977233 · Report as offensive
Profile Pappa
Volunteer tester
Joined: 9 Jan 00
Posts: 2562
Credit: 12,301,681
RAC: 0
United States
Message 977251 - Posted: 10 Mar 2010, 23:59:35 UTC - in response to Message 977104.  

@ Pappa

Hello Pappa

I always have problems when I pay with my MasterCard Platinum credit card. When I click "send", it comes back with: You're not a "Präfax". What's that? I want to help, but it does not work.


Armstrong

With the crash, there could be an issue with the donation script (or UCB has changed something again). I will let Matt know.

Thank You for letting us know.

Regards

Please consider a Donation to the Seti Project.

ID: 977251 · Report as offensive
Profile Pappa
Volunteer tester
Joined: 9 Jan 00
Posts: 2562
Credit: 12,301,681
RAC: 0
United States
Message 977261 - Posted: 11 Mar 2010, 0:30:11 UTC

I am back, again

There was a database crash... The database was restored to 16 January 2010. Results are stored in the database. Now, if you return an "active result" that has no listing in the database, it can be added as a new record. When the validator goes to compare ("validate") it, it has nothing to compare against. You end up with a validate error.

So as a second active result is returned, a new row is added. Because the entry in the Workunit table (a related table in the database) does not link the two active results, you end up with a validate error.

When you look at both "results", they are the same. The validator "could not" link them. This is because various tables in the database hold various parts of the information about each workunit and the expected result that will be associated with it. That information was missing.

That said, the Scheduler "has been" passing information to the Transitioner about database entries that "were" supposed to be there. It waits while the Transitioner looks for the missing information (in the meantime, BOINC on your end is also waiting).

In a comment from Eric, as we get past what everyone had on their machine (during the outage) and get it reported the validate errors should stop happening. Self Healing.
As you get new work, then there is a correct entry that links everything together (in all the tables).

In the mean time Eric periodically runs the Credit Script that accounts for the missing Credits. How they will pick the "canonical result" is up to them.

Patience and keep returning results.

Regards


Please consider a Donation to the Seti Project.

ID: 977261 · Report as offensive
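The situation Pappa describes, returned results that no longer have a matching workunit row after the restore, can be illustrated with a small in-memory database. This is a simplified sketch, not the project's real schema: the two tables, their columns, and the sample rows are assumptions chosen only to show why such results cannot be linked.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE workunit (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE result   (id INTEGER PRIMARY KEY, workunitid INTEGER, host TEXT);
""")

# Workunit 1 existed before the backup, so its row survived the restore.
conn.execute("INSERT INTO workunit VALUES (1, 'wu_before_backup')")

# Workunit 2 was created after the backup: its row is gone after the restore,
# but hosts still return results for it and those get added as new records.
conn.executemany("INSERT INTO result VALUES (?, ?, ?)", [
    (10, 1, "host_a"), (11, 1, "host_b"),
    (20, 2, "host_c"), (21, 2, "host_d"),
])

# Results whose workunit row is missing cannot be linked to each other.
rows = conn.execute("""
    SELECT r.id FROM result AS r
    LEFT JOIN workunit AS w ON w.id = r.workunitid
    WHERE w.id IS NULL
    ORDER BY r.id
""").fetchall()
orphan_ids = [row[0] for row in rows]
print(orphan_ids)
```

Results 20 and 21 may be perfectly good and even identical, but with no workunit row to tie them together there is nothing for a comparison to start from.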
Fred W
Volunteer tester

Joined: 13 Jun 99
Posts: 2524
Credit: 11,954,210
RAC: 0
United Kingdom
Message 977380 - Posted: 11 Mar 2010, 17:37:22 UTC - in response to Message 977261.  

In a comment from Eric, as we get past what everyone had on their machine (during the outage) and get it reported the validate errors should stop happening. Self Healing.
As you get new work, then there is a correct entry that links everything together (in all the tables).

In the mean time Eric periodically runs the Credit Script that accounts for the missing Credits. How they will pick the "canonical result" is up to them.

Patience and keep returning results.

Regards


But I have a "validate error" on a WU that was sent to me on 10 March. Methinks there is something else afoot here - not "self-healing".

And welcome back (again)...

F.
ID: 977380 · Report as offensive
Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 977431 - Posted: 11 Mar 2010, 21:30:17 UTC

Due to an app_info misconfiguration, all my MB tasks were discarded.
But even after a project reset they remain in the "in progress" state on the web page and were not resent to my host.
That is, they are now in a "ghost" state and will be released to another host only when the deadline comes? It seems there is some problem with the server config, then: such tasks should either be resent to the original host or marked as computation errors and resent to another host...
ID: 977431 · Report as offensive
Profile Gundolf Jahn

Joined: 19 Sep 00
Posts: 3184
Credit: 446,358
RAC: 0
Germany
Message 977448 - Posted: 11 Mar 2010, 22:09:11 UTC - in response to Message 977431.  

Due to an app_info misconfiguration, all my MB tasks were discarded.
But even after a project reset they remain in the "in progress" state on the web page and were not resent to my host.
That is, they are now in a "ghost" state and will be released to another host only when the deadline comes? It seems there is some problem with the server config, then: such tasks should either be resent to the original host or marked as computation errors and resent to another host...

Resetting isn't enough! You'll have to detach and re-attach to achieve your goal. The "resend to original host" feature is turned off at SETI, due to database stress.

Regards,
Gundolf
Computers aren't everything in life. (Just kidding)

SETI@home classic workunits 3,758
SETI@home classic CPU time 66,520 hours
ID: 977448 · Report as offensive
Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 977468 - Posted: 11 Mar 2010, 22:52:42 UTC - in response to Message 977448.  
Last modified: 11 Mar 2010, 22:53:02 UTC

Due to an app_info misconfiguration, all my MB tasks were discarded.
But even after a project reset they remain in the "in progress" state on the web page and were not resent to my host.
That is, they are now in a "ghost" state and will be released to another host only when the deadline comes? It seems there is some problem with the server config, then: such tasks should either be resent to the original host or marked as computation errors and resent to another host...

Resetting isn't enough! You'll have to detach and re-attach to achieve your goal. The "resend to original host" feature is turned off at SETI, due to database stress.

Regards,
Gundolf

It still looks like a logic error in the server config to me.
Actually, I shouldn't even have had to do a reset. The tasks were discarded by the BOINC client - why didn't it report computation errors back to the server, instead of letting these tasks go into a ghost state?
ID: 977468 · Report as offensive
Profile Gundolf Jahn

Joined: 19 Sep 00
Posts: 3184
Credit: 446,358
RAC: 0
Germany
Message 977484 - Posted: 11 Mar 2010, 23:46:28 UTC - in response to Message 977468.  

The tasks were discarded by the BOINC client - why didn't it report computation errors back to the server, instead of letting these tasks go into a ghost state?

Did you report the aborted tasks before resetting?

If yes - or if there haven't been any to report, I agree that that would be an error.

Regards,
Gundolf
ID: 977484 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 977493 - Posted: 12 Mar 2010, 0:12:50 UTC - in response to Message 977468.  

Tasks were discarded by BOINC client

I thought you said Due to app_info misconfiguration.

That's not BOINC client, that's operator error (no criticism - we all do it).

If you break it, you own both parts - until you fix it. BOINC doesn't supply its own glue - anyone who ventures down the anonymous platform route is well advised to heed the warnings 'advanced users only', and learn the manual recovery procedures from their manual experiments.
ID: 977493 · Report as offensive
Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 977498 - Posted: 12 Mar 2010, 0:24:05 UTC - in response to Message 977493.  

Tasks were discarded by BOINC client

I thought you said Due to app_info misconfiguration.

That's not BOINC client, that's operator error (no criticism - we all do it).

If you break it, you own both parts - until you fix it. BOINC doesn't supply its own glue - anyone who ventures down the anonymous platform route is well advised to heed the warnings 'advanced users only', and learn the manual recovery procedures from their manual experiments.


No-no-no, I didn't say (I hope) that discarding the tasks was BOINC's fault.
The app_info was incomplete (btw, is there any reason for the double mention of all files in the heading section? I mean, why should each file_ref have a corresponding file_info?), so discarding was legitimate. What I'm trying to say is: why didn't BOINC, knowing that it had just discarded the tasks, report this fact back to the server as computation errors?

And yes, before resetting I did a few project updates (each time receiving no tasks; BOINC just refused to ask for new work until I did a project reset - another strange behavior, BTW).

Tasks are usually discarded due to "operator error", as Richard said, so BOINC's behavior in such a situation is probably not well tested (who would deliberately repeat such an experiment ;) )...
ID: 977498 · Report as offensive
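The file_ref / file_info pairing Raistmer mentions can be checked mechanically before restarting BOINC. The sketch below is only an illustration: the embedded app_info.xml is a made-up minimal example (the app name, version number, and file names are placeholders), and `missing_file_infos` is a hypothetical helper, not part of BOINC.

```python
import xml.etree.ElementTree as ET

# Placeholder app_info.xml: the second file_ref has no matching file_info.
APP_INFO = """<app_info>
  <app><name>example_app</name></app>
  <file_info>
    <name>example_app.exe</name>
    <executable/>
  </file_info>
  <app_version>
    <app_name>example_app</app_name>
    <version_num>100</version_num>
    <file_ref>
      <file_name>example_app.exe</file_name>
      <main_program/>
    </file_ref>
    <file_ref>
      <file_name>example_lib.dll</file_name>
    </file_ref>
  </app_version>
</app_info>"""

def missing_file_infos(xml_text):
    """List file_ref names that have no corresponding file_info entry."""
    root = ET.fromstring(xml_text)
    declared = {fi.findtext("name") for fi in root.iter("file_info")}
    referenced = [fr.findtext("file_name") for fr in root.iter("file_ref")]
    return [name for name in referenced if name not in declared]

missing = missing_file_infos(APP_INFO)
print(missing)
```

Running a check like this before a restart would flag the incomplete file list up front, rather than finding out after the tasks are gone.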


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.