Validate Errors II

Sutaru Tsureku
Volunteer tester
Joined: 6 Apr 07
Posts: 7105
Credit: 147,663,825
RAC: 5
Germany
Message 625282 - Posted: 24 Aug 2007, 2:35:18 UTC - in response to Message 625274.  
Last modified: 24 Aug 2007, 2:36:32 UTC

...
Speaking of parsing the log, check this out. This is the result that failed to validate: http://setiathome.berkeley.edu/result.php?resultid=594959001. I think this is clearly a core client error, not a SETI application error. There were only three seconds between the end of computation and the result upload. Note that I DO NOT use the '-return_results_immediately' option.

2007-08-22 08:53:50 [SETI@home] Computation for task 16fe07ac.4898.6207.11.5.20_0 finished
2007-08-22 08:53:51 [Einstein@Home] Restarting task h1_0534.55_S5R2__335_S5R2c_1 using einstein_S5R2 version 430
2007-08-22 08:53:52 [SETI@home] Sending scheduler request: To fetch work
2007-08-22 08:53:52 [SETI@home] Requesting 4638 seconds of new work, and reporting 1 completed tasks
2007-08-22 08:53:53 [SETI@home] [file_xfer] Started upload of file 16fe07ac.4898.6207.11.5.20_0_0
2007-08-22 08:53:56 [SETI@home] [file_xfer] Finished upload of file 16fe07ac.4898.6207.11.5.20_0_0
2007-08-22 08:53:56 [SETI@home] [file_xfer] Throughput 57742 bytes/sec

2007-08-22 08:53:57 [SETI@home] Scheduler RPC succeeded [server version 511]
2007-08-22 08:53:57 [SETI@home] Deferring communication for 11 sec
2007-08-22 08:53:57 [SETI@home] Reason: requested by project


As a reminder,
* BOINC is the 64-bit Windows version 5.10.13 from boinc.berkeley.edu
* Windows application is 2.4_Windows_x64_SSSE3.
* (from http://setiweb.ssl.berkeley.edu/beta/forum_thread.php?id=1002)
Connect time is set to 3.25 days.



Then I think this is the problem we discussed earlier in this thread:

Because of the 'validate errors'

Since the server software update, you need a little time between upload and report. I use 0.002 days, which is about 3 minutes.
But if it is as you posted, then..


And thank you for looking!


ID: 625282 · Report as offensive
Josef W. Segur
Volunteer developer
Volunteer tester

Joined: 30 Oct 99
Posts: 4504
Credit: 1,414,761
RAC: 0
United States
Message 625315 - Posted: 24 Aug 2007, 4:09:33 UTC - in response to Message 625274.  


Sorry, Michael, that does not reveal the problem. Your resultid=594959001 was reported at 12:55:35 UTC, or about 99 seconds after its "finished upload" message. The one being reported with the scheduler request you show was uploaded earlier. It's the "finished upload" which triggers the transition from "Uploading" to "Ready to report", so the critical time gap is between that and the next scheduler request. You might want to check that gap; if your clock and Berkeley's aren't exactly in sync, it could be more or less than the apparent 99 seconds.

For all I know, 99 seconds may sometimes not be enough when the servers are under heavy load, but it seems to me if that much lag in the backend happened often it would cause other more obvious problems.
                                                               Joe
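
For anyone who wants to check that gap in their own log, here is a minimal Python sketch. It assumes the message log uses the "YYYY-MM-DD HH:MM:SS [project] text" layout shown earlier in this thread. The function name `upload_to_report_gaps` is made up for illustration, and the last line of `SAMPLE_LOG` is a hypothetical entry (its wording is an assumption) placed 99 seconds after the upload to match the report time mentioned above.

```python
from datetime import datetime

def upload_to_report_gaps(log_lines):
    """Return (file name, gap in seconds) for every finished upload that is
    followed by a later scheduler request to the same project."""
    pending = {}  # (project, file name) -> time the upload finished
    gaps = []
    for raw in log_lines:
        line = raw.strip()
        if not line:
            continue
        # Assumes every non-empty line starts with a 19-character timestamp.
        stamp = datetime.strptime(line[:19], "%Y-%m-%d %H:%M:%S")
        project, _, text = line[20:].partition("] ")
        project = project.lstrip("[")
        if "Finished upload of file" in text:
            pending[(project, text.split()[-1])] = stamp
        elif text.startswith("Sending scheduler request"):
            # Only uploads finished before this request can be reported by it.
            for key in [k for k in pending if k[0] == project]:
                gaps.append((key[1], (stamp - pending.pop(key)).total_seconds()))
    return gaps

SAMPLE_LOG = [
    "2007-08-22 08:53:50 [SETI@home] Computation for task 16fe07ac.4898.6207.11.5.20_0 finished",
    "2007-08-22 08:53:52 [SETI@home] Sending scheduler request: To fetch work",
    "2007-08-22 08:53:53 [SETI@home] [file_xfer] Started upload of file 16fe07ac.4898.6207.11.5.20_0_0",
    "2007-08-22 08:53:56 [SETI@home] [file_xfer] Finished upload of file 16fe07ac.4898.6207.11.5.20_0_0",
    # Hypothetical line, not taken from the real log:
    "2007-08-22 08:55:35 [SETI@home] Sending scheduler request: To report completed tasks",
]

print(upload_to_report_gaps(SAMPLE_LOG))
```

The scheduler request at 08:53:52 is ignored for this file because the upload had not finished yet, which is the point made above: only the request that follows the "finished upload" line matters.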
ID: 625315 · Report as offensive
michael37
Joined: 23 Jul 99
Posts: 311
Credit: 6,955,447
RAC: 0
United States
Message 625323 - Posted: 24 Aug 2007, 4:41:59 UTC - in response to Message 625315.  



This server's time is not synchronized to anything. It really should have been! At this point in time, the server time is 52 seconds off the "standard NTP time", so (assuming the clock was always 52 seconds off and assuming Berkeley time is synchronized to "standard NTP time") the time difference is definitely less than 99 seconds.
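
A rough sketch of the arithmetic in Python, under stated assumptions: the 99 seconds is Berkeley's report time minus the local "finished upload" timestamp, Berkeley's clock is exact, and the offset is defined as local clock minus reference clock. The direction of the 52-second error is not stated in this post, so both cases are shown; `corrected_gap` is a made-up name.

```python
def corrected_gap(apparent_gap_s, client_clock_offset_s):
    """True upload-to-report gap in seconds.

    client_clock_offset_s is (client clock - reference clock): negative when
    the client clock runs behind, positive when it runs ahead.
    """
    # apparent = server_time - client_local_time, and
    # client_local_time = true_time + offset, so true = apparent + offset.
    return apparent_gap_s + client_clock_offset_s

print(corrected_gap(99, -52))  # client clock 52 s behind
print(corrected_gap(99, 52))   # client clock 52 s ahead
```

Only the "behind" case gives a gap shorter than 99 seconds, so that is the case the conclusion above relies on.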

ID: 625323 · Report as offensive
satan_shark

Joined: 4 Aug 07
Posts: 1
Credit: 127,189
RAC: 0
Italy
Message 625935 - Posted: 24 Aug 2007, 21:05:05 UTC

Task 05mr07aa.12210.24612.15.4.209_1 is not being processed. It has been running for 3 days but is still stuck at 0%. Sometimes it also restarts automatically.
Please fix it if possible.
ID: 625935 · Report as offensive
Alinator
Volunteer tester

Joined: 19 Apr 05
Posts: 4178
Credit: 4,647,982
RAC: 0
United States
Message 625948 - Posted: 24 Aug 2007, 21:21:06 UTC

If I'm not mistaken, that's from the batch of bad WUs which were sent out early in the MultiBeam (MB) rollout. Just abort it.

Alinator
ID: 625948 · Report as offensive
Keith T.
Volunteer tester
Joined: 23 Aug 99
Posts: 962
Credit: 537,293
RAC: 9
United Kingdom
Message 628790 - Posted: 29 Aug 2007, 12:03:39 UTC

I am aware that the project scientists and developers are very busy at the moment, but I put forward a possible server-side solution a few weeks ago.

There is an option in the BOINC scheduler code for a longer delay. Other BOINC projects that I run use it.

29/08/2007 12:59:09|rosetta@home|Project requested delay of 242.400000 seconds
29/08/2007 12:59:55|ralph@home|Project requested delay of 242.400000 seconds
29/08/2007 13:00:26|Leiden Classical|Project requested delay of 181.800000 seconds

If SETI and SETI Beta implemented a similar delay, I believe many more validate errors could be prevented.
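
To compare projects, the requested delays can be pulled out of the message log with a few lines of Python. This is only a sketch: it assumes the "timestamp|project|message" layout of the three lines above, and the names `DELAY_RE` and `requested_delays` are made up for illustration.

```python
import re

# Matches lines such as:
# 29/08/2007 12:59:09|rosetta@home|Project requested delay of 242.400000 seconds
DELAY_RE = re.compile(
    r"^(?P<stamp>[^|]+)\|(?P<project>[^|]+)\|"
    r"Project requested delay of (?P<seconds>[\d.]+) seconds$"
)

def requested_delays(log_lines):
    """Map each project name to the most recent delay (seconds) it requested."""
    delays = {}
    for line in log_lines:
        match = DELAY_RE.match(line.strip())
        if match:
            delays[match.group("project")] = float(match.group("seconds"))
    return delays

SAMPLE_LOG = [
    "29/08/2007 12:59:09|rosetta@home|Project requested delay of 242.400000 seconds",
    "29/08/2007 12:59:55|ralph@home|Project requested delay of 242.400000 seconds",
    "29/08/2007 13:00:26|Leiden Classical|Project requested delay of 181.800000 seconds",
]

for project, seconds in requested_delays(SAMPLE_LOG).items():
    print(f"{project}: {seconds:.1f} s")
```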


Sir Arthur C Clarke 1917-2008
ID: 628790 · Report as offensive
Pepino65
Volunteer tester
Joined: 16 Feb 04
Posts: 4
Credit: 1,031,222
RAC: 0
Czech Republic
Message 632317 - Posted: 2 Sep 2007, 20:48:21 UTC - in response to Message 625935.  



I have the same problem with this WU:
2.9.2007 22:40:11|SETI@home|Restarting task 05mr07ab.6072.368637.14.4.19_8 using setiathome_enhanced version 527

After 9:35 h: 0.008% done; before the PC restart: time 0:00:00 h, 0% done.


ID: 632317 · Report as offensive
Josef W. Segur
Volunteer developer
Volunteer tester

Joined: 30 Oct 99
Posts: 4504
Credit: 1,414,761
RAC: 0
United States
Message 632525 - Posted: 3 Sep 2007, 0:57:18 UTC - in response to Message 632317.  


Abort that task, it's one of a group of bad WUs.
                                                             Joe
ID: 632525 · Report as offensive
Pepino65
Volunteer tester
Joined: 16 Feb 04
Posts: 4
Credit: 1,031,222
RAC: 0
Czech Republic
Message 632770 - Posted: 3 Sep 2007, 12:43:32 UTC - in response to Message 632525.  



Thank you Joe
ID: 632770 · Report as offensive
The Postman
Joined: 4 Jan 03
Posts: 78
Credit: 14,960,413
RAC: 74
United States
Message 634790 - Posted: 6 Sep 2007, 16:22:31 UTC

ID: 634790 · Report as offensive
michael37
Joined: 23 Jul 99
Posts: 311
Credit: 6,955,447
RAC: 0
United States
Message 638876 - Posted: 11 Sep 2007, 15:33:26 UTC - in response to Message 634790.  

http://setiathome.berkeley.edu/workunit.php?wuid=154386050


Wow. Can we all agree now that this is a bug in the BOINC server?
Result ID  Computer ID  Sent                     Time reported            Server state  Outcome         Client state  CPU time (sec)  Claimed credit  Granted credit
605465759  1040407      3 Sep 2007 21:35:07 UTC  4 Sep 2007 1:57:33 UTC   Over          Success         Done          7,510.64        92.70           92.70
605465760  3458649      4 Sep 2007 6:53:32 UTC   6 Sep 2007 9:12:27 UTC   Over          Validate error  Done          15,633.78       ---             ---
606727540  3053701      6 Sep 2007 9:12:40 UTC   11 Sep 2007 6:40:10 UTC  Over          Success         Done          11,278.20       92.70           92.70


<stderr_txt>
setiathome_enhanced 5.27 DevC++/MinGW

Work Unit Info:
...............
WU true angle range is :  0.130374
Optimal function choices:
-----------------------------------------------------
name                
-----------------------------------------------------
              v_BaseLineSmooth (no other)
           v_vGetPowerSpectrum 0.00037 0.00000 
                   v_ChirpData 0.01987 0.00000 
           v_vTranspose4x16ntw 0.01068 0.00000 
                AK SSE folding 0.00184 0.00000 

Flopcounter: 28098176724478.195000

Spike count:    4
Pulse count:    2
Triplet count:  0
Gaussian count: 0

</stderr_txt>


Validate state	Initial
Claimed credit	92.7013931305126
Granted credit	0
application version	5.27


ID: 638876 · Report as offensive
michael37
Joined: 23 Jul 99
Posts: 311
Credit: 6,955,447
RAC: 0
United States
Message 642070 - Posted: 16 Sep 2007, 4:20:00 UTC

ID: 642070 · Report as offensive
Sutaru Tsureku
Volunteer tester
Joined: 6 Apr 07
Posts: 7105
Credit: 147,663,825
RAC: 5
Germany
Message 642077 - Posted: 16 Sep 2007, 4:57:13 UTC - in response to Message 642070.  
Last modified: 16 Sep 2007, 5:06:50 UTC

In case someone cares, I am still being plagued by the validate errors. I have switched to 5.9.0.64 Boinc by Crunch3r with latest 2.4V application. I am getting very frustrated.

http://setiathome.berkeley.edu/workunit.php?wuid=157307080
http://setiathome.berkeley.edu/workunit.php?wuid=157307030
http://setiathome.berkeley.edu/workunit.php?wuid=157396688
http://setiathome.berkeley.edu/workunit.php?wuid=157525572
http://setiathome.berkeley.edu/workunit.php?wuid=157503807



Hello again.

I sometimes get 'validate errors' too, because of the 'no command' error. :-( You remember. ;-)
One week there are no errors, then one, two or more every day. I think it's a problem with the LAN port driver.
I will install a new OS (Windows Vista 64-bit) soon; then we will see.

As for your 'validate errors': maybe the problem is that you 'report results immediately'.
I posted about it here.

I use BOINC V5.10.13.

In version 5.10.20 you can no longer report after your chosen time; it can take up to 24 hours.

Some people use Crunch3r's version too ('report results immediately') and have no problems.
So it comes down to good or bad luck.


ID: 642077 · Report as offensive
michael37
Joined: 23 Jul 99
Posts: 311
Credit: 6,955,447
RAC: 0
United States
Message 642743 - Posted: 17 Sep 2007, 0:25:39 UTC

Sutaru, I've been there before. I switched from version 5.10.13.

http://setiathome.berkeley.edu/forum_thread.php?id=39588&nowrap=true#624094

ID: 642743 · Report as offensive
John Vargason
Joined: 18 Mar 07
Posts: 7
Credit: 26,455
RAC: 0
United States
Message 644294 - Posted: 19 Sep 2007, 16:24:38 UTC

Me too, ever since my computer crashed a few weeks ago. I reformatted the hard drives and had to redownload all the files, including BOINC, and now all I get is errors. Before, when my computer was really slow with junk, I was able to download two at a time; now it does the work, and when the time comes I don't get the credit. So, so long folks, good hunting, I'm signing off. I don't need aggravation this bad. Not to mention one guy in our group went from zero points to over 55,000 in a month. It took me since March to get to 9843, so with people like him you surely don't need me! The Vargstr quits in disgust...
Life is where you believe it is, it simply is.
ID: 644294 · Report as offensive
John McLeod VII
Volunteer developer
Volunteer tester
Joined: 15 Jul 99
Posts: 24806
Credit: 790,712
RAC: 0
United States
Message 645074 - Posted: 20 Sep 2007, 20:47:12 UTC - in response to Message 644294.  


There are several possibilities. I notice two different behaviors on two different machines of yours. One of the hosts cannot find a file, and the other is erroring out using an optimized application.

It is entirely possible that you did not configure the optimized application correctly after the re-format, or that you grabbed the wrong optimized application. On the host that is failing to start, you are either missing the optimized application or unable to download the unoptimized application, probably due to a rights issue.


BOINC WIKI
ID: 645074 · Report as offensive
Simplex0
Volunteer tester

Joined: 28 May 99
Posts: 124
Credit: 205,874
RAC: 0
Message 651356 - Posted: 30 Sep 2007, 5:23:39 UTC

Work unit
162727450
&
162727446

failed after 0 sec.
ID: 651356 · Report as offensive
speedimic
Volunteer tester
Joined: 28 Sep 02
Posts: 362
Credit: 16,590,653
RAC: 0
Germany
Message 651367 - Posted: 30 Sep 2007, 6:27:51 UTC



I got lots of these on my X5355 over night... but I get "Client error" instead of "Done".


mic.


ID: 651367 · Report as offensive
Josef W. Segur
Volunteer developer
Volunteer tester

Joined: 30 Oct 99
Posts: 4504
Credit: 1,414,761
RAC: 0
United States
Message 651526 - Posted: 30 Sep 2007, 15:49:11 UTC - in response to Message 651356.  


Errors like these were recognized by the BOINC core client, and are tagged "Client error" in the result pages. They can include bad WUs like this case, errors in the science application, errors in communications, or errors in the core client itself.

When the Validator recognizes an error, specifically when it cannot find a result file it has been asked to validate, the tag is "Validate error". Those are the subject of this thread.
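
As a small illustration of that distinction, here is a Python sketch that sorts result rows by which component flagged them. The mapping only restates the two tags described above; the function names are made up, and the sample rows are the result IDs and outcomes from the workunit 154386050 listing posted earlier in this thread.

```python
# Which component recognizes each error tag, per the description above.
DETECTED_BY = {
    "Client error": "BOINC core client",
    "Validate error": "Validator",
}

def detected_by(outcome):
    """Return the component that flags this outcome, or None for other tags."""
    return DETECTED_BY.get(outcome)

def validate_errors(results):
    """Return the result IDs whose outcome is on topic for this thread."""
    return [rid for rid, outcome in results if outcome == "Validate error"]

# (result ID, outcome) pairs copied from the listing posted earlier.
RESULTS = [
    (605465759, "Success"),
    (605465760, "Validate error"),
    (606727540, "Success"),
]

print(validate_errors(RESULTS))
print(detected_by("Client error"))
```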
ID: 651526 · Report as offensive
_heinz
Volunteer tester

Joined: 25 Feb 05
Posts: 744
Credit: 5,539,270
RAC: 0
France
Message 653388 - Posted: 3 Oct 2007, 14:11:13 UTC

bad header file in wu
look
heinz
ID: 653388 · Report as offensive


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.