Because of the 'validate errors'

Dirk Sadowski Volunteer tester Send message Joined: 6 Apr 07 Posts: 7105 Credit: 147,663,825 RAC: 5	Message 614765 - Posted: 4 Aug 2007, 13:54:22 UTC Last modified: 4 Aug 2007, 14:05:05 UTC
Because of the 'validate errors' Crunch3r posted, since the update of the software of the servers at Berkeley, 'report results immediately' make sometimes 'validate errors'. I have BOINC V5.10.7 and connect every 0.001 days, this are ~ 90 seconds. (report ~ 90 seconds after upload) And sometimes I have 'validate errors'. I think because of this:
8/4/2007 3:37:41 PM|SETI@home|Computation for task 29mr00ab.25614.1169.804838.3.10_3 finished
8/4/2007 3:37:41 PM|SETI@home|Starting 19jn00aa.11827.12834.417318.3.137_2
8/4/2007 3:37:41 PM|SETI@home|Starting task 19jn00aa.11827.12834.417318.3.137_2 using setiathome_enhanced version 515
8/4/2007 3:37:41 PM|SETI@home|Sending scheduler request: To fetch work
8/4/2007 3:37:41 PM|SETI@home|Requesting 2363 seconds of new work
8/4/2007 3:37:43 PM|SETI@home|[file_xfer] Started upload of file 29mr00ab.25614.1169.804838.3.10_3_0
8/4/2007 3:37:47 PM|SETI@home|Scheduler RPC succeeded [server version 511]
8/4/2007 3:37:47 PM|SETI@home|Deferring communication for 11 sec
8/4/2007 3:37:47 PM|SETI@home|Reason: requested by project
8/4/2007 3:37:49 PM|SETI@home|[file_xfer] Started download of file 29mr00ab.25614.5729.548578.3.113
[b]8/4/2007 3:37:50 PM|SETI@home|[file_xfer] Finished upload of file 29mr00ab.25614.1169.804838.3.10_3_0[/b]
8/4/2007 3:37:50 PM|SETI@home|[file_xfer] Throughput 8315 bytes/sec
8/4/2007 3:37:54 PM|SETI@home|[file_xfer] Finished download of file 29mr00ab.25614.5729.548578.3.113
8/4/2007 3:37:54 PM|SETI@home|[file_xfer] Throughput 77539 bytes/sec
8/4/2007 3:38:02 PM|SETI@home|Sending scheduler request: To fetch work
[b]8/4/2007 3:38:02 PM|SETI@home|Requesting 874 seconds of new work, and [color=red]reporting 1 completed tasks[/color][/b]
8/4/2007 3:38:12 PM|SETI@home|Scheduler RPC succeeded [server version 511]
8/4/2007 3:38:12 PM|SETI@home|Deferring communication for 11 sec
8/4/2007 3:38:12 PM|SETI@home|Reason: requested by project
8/4/2007 3:38:14 PM|SETI@home|[file_xfer] Started download of file 13mr00aa.16419.10962.842330.3.48
8/4/2007 3:38:20 PM|SETI@home|[file_xfer] Finished download of file 13mr00aa.16419.10962.842330.3.48
8/4/2007 3:38:20 PM|SETI@home|[file_xfer] Throughput 87819 bytes/sec
This are 12 seconds and it's a good reported result. BUT sometimes the time between is shorter and then: 'validate error' :-( OR? ID: 614765 ·
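The 12-second gap described in the post above can be measured directly from the client messages. The following is a minimal sketch, assuming the pipe-separated `date time|project|message` layout shown in the log; the function names are hypothetical and the sample contains only the two relevant lines from the post.

```python
from datetime import datetime

# Two lines copied from the log in the post above.
LOG = """\
8/4/2007 3:37:50 PM|SETI@home|[file_xfer] Finished upload of file 29mr00ab.25614.1169.804838.3.10_3_0
8/4/2007 3:38:02 PM|SETI@home|Requesting 874 seconds of new work, and reporting 1 completed tasks
"""


def parse_line(line):
    """Split one message into (timestamp, message text)."""
    stamp, _project, message = line.split("|", 2)
    return datetime.strptime(stamp, "%m/%d/%Y %I:%M:%S %p"), message


def upload_to_report_gaps(log_text):
    """Seconds between the latest finished upload and each following report."""
    gaps = []
    last_upload = None
    for line in log_text.splitlines():
        if not line.strip():
            continue
        when, message = parse_line(line)
        if "Finished upload of file" in message:
            last_upload = when
        elif "reporting" in message.lower() and last_upload is not None:
            gaps.append((when - last_upload).total_seconds())
    return gaps


print(upload_to_report_gaps(LOG))  # [12.0]
```

Run over a longer log, a list like this would make it easy to check whether the reports that ended in a 'validate error' really had shorter gaps than the ones that validated.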

Josef W. Segur Volunteer developer Volunteer tester Send message Joined: 30 Oct 99 Posts: 4504 Credit: 1,414,761 RAC: 0	Message 614808 - Posted: 4 Aug 2007, 15:41:16 UTC - in response to Message 614765. Because of the 'validate errors' Crunch3r posted, since the update of the software of the servers at Berkeley, 'report results immediately' make sometimes 'validate errors'. I have BOINC V5.10.7 and connect every 0.001 days, this are ~ 90 seconds. (report ~ 90 seconds after upload) And sometimes I have 'validate errors'. ... This are 12 seconds and it's a good reported result. BUT sometimes the time between is shorter and then: 'validate error' :-( OR? In the Beta "Validate error?" thread Keith T reported a similar case with a 22 second delay which did cause a Validate error. Because work fetch calculations are distinct from cpu usage calculations it's always possible for a request to the Scheduler to occur very soon after "completion" of an upload. Those situations should be fairly rare, though. Perhaps the 0.001 day setting is not always enough, 0.002 would be even safer. Joe ID: 614808 ·
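For reference, the 'connect every' values mentioned in this thread convert to seconds as follows. This is plain arithmetic, not BOINC code, and the function name is made up for the illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86400


def connect_interval_seconds(days):
    """Convert a 'connect every X days' preference into seconds."""
    return days * SECONDS_PER_DAY


for days in (0.001, 0.002, 0.01):
    print(f"{days} days = {connect_interval_seconds(days):.1f} s")
```

So 0.001 days is 86.4 seconds (the "~ 90 seconds" above), 0.002 days is 172.8 seconds, and 0.01 days is 864 seconds, a little under 15 minutes.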

Alinator Volunteer tester Send message Joined: 19 Apr 05 Posts: 4178 Credit: 4,647,982 RAC: 0	Message 614814 - Posted: 4 Aug 2007, 15:48:24 UTC LOL... Agreed, why go looking for trouble? I've used 0.01 days for a CI for ages and have never had a result get invalidated for this reason AFAIK, and the project wasn't having problems. When you get right down to it, practically speaking, 15 minutes is as good as 80 seconds when it comes to 'immediately', especially if the alternative is losing a good result needlessly. ;-) Alinator ID: 614814 ·

Richard Haselgrove Volunteer tester Send message Joined: 4 Jul 99 Posts: 14690 Credit: 200,643,578 RAC: 874	Message 614829 - Posted: 4 Aug 2007, 16:09:57 UTC Last modified: 4 Aug 2007, 16:43:02 UTC During a discussion about 'Return Results Immediately' last year, I noticed (here) that there is a strong correlation between computation finishing on a WU, and a scheduler request for more work. This effect is quite separate from the CI: I'm sure it happens because of the re-calculation of the RDCF, and hence the total estimated crunch time of the WUs held in cache. If you've recently finished a slow WU like the dreaded 58.69, and are now working on more 'normal' WUs, then your RDCF will be decreased as each WU finishes: the total work buffer on hand will decrease pro-rata. There's a fair chance that this decrease will cross the cache size boundary, and so the request goes in for more work. Usually this happens *before* the upload has completed, and so the report of the just-finished WU has to wait until the *next* scheduler contact (the point of the discussion last year). However, now that the server back-off is only 11 seconds, sometimes two work requests happen in quick succession, and the second one can carry with it the report of the just-completed WU. This is exactly the situation shown in Sutaru Tsureku's opening post in this thread. [Edit - the second 'work-fetch' after 11 seconds is much more likely with recent clients, because of the server-abort of redundant WUs. You ask for more work - you end up with less because of an abort instruction - so you ask again. Like Oliver Twist: more! more!] There was some support last year for a client-enforced 'cooling off period' after a WU completes, to allow that upload to complete in an orderly fashion before the next scheduler contact. I think I've observed something like that happening in recent clients when WUs error out, but not when they finish normally. Is it worth re-visiting this suggestion? ID: 614829 ·
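The effect described in the post above, where a falling RDCF shrinks the estimated work on hand until it drops below the cache size, can be illustrated with a toy model. This is only a sketch under simplifying assumptions: it treats the buffer estimate as the sum of raw per-task estimates scaled by the RDCF, the real client's work-fetch logic is more involved, and all names and numbers here are hypothetical.

```python
def estimated_buffer_seconds(raw_estimates, rdcf):
    """Estimated crunch time of cached work, scaled by the correction factor."""
    return rdcf * sum(raw_estimates)


def wants_more_work(raw_estimates, rdcf, cache_seconds):
    """True when the estimated buffer has fallen below the cache size."""
    return estimated_buffer_seconds(raw_estimates, rdcf) < cache_seconds


# Hypothetical cache: three tasks with raw estimates of 3000 s each,
# and a cache size of 0.1 days (8640 s).
cached = [3000, 3000, 3000]
cache_size = 0.1 * 86400

print(wants_more_work(cached, 1.0, cache_size))  # buffer 9000 s: no request
print(wants_more_work(cached, 0.9, cache_size))  # buffer 8100 s: request goes out
```

In this model nothing about the cached tasks changes between the two calls. Only the correction factor drops when a task finishes, which is why the work request tends to coincide with the end of a computation rather than with the connect interval.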

Keith T. Volunteer tester Send message Joined: 23 Aug 99 Posts: 962 Credit: 537,293 RAC: 9	Message 614836 - Posted: 4 Aug 2007, 16:19:04 UTC I think the best solution would be for the project to set a longer Scheduler delay. Rosetta uses a 4 minute delay, would it be possible to set a similar value on the SETI and SETI Beta schedulers? Sir Arthur C Clarke 1917-2008 ID: 614836 ·

Richard Haselgrove Volunteer tester Send message Joined: 4 Jul 99 Posts: 14690 Credit: 200,643,578 RAC: 874	Message 614839 - Posted: 4 Aug 2007, 16:22:23 UTC - in response to Message 614836. I think the best solution would be for the project to set a longer Scheduler delay. Rosetta uses a 4 minute delay, would it be possible to set a similar value on the SETI and SETI Beta schedulers? When we were discussing this last year, the delay was 10-minutes-plus-a-bit. It then dropped without warning (or, as far as I can remember, any explanation) to the current 11 seconds. Maybe 4 minutes would be a happy medium.... ID: 614839 ·

KB7RZF Volunteer tester Send message Joined: 15 Aug 99 Posts: 9555 Credit: 3,308,926 RAC: 2	Message 615203 - Posted: 5 Aug 2007, 7:49:43 UTC Just to throw my 2 cents worth in, as I just saw this thread: I left the CI at 0, and have Maintain enough work for an additional: set at .1, and all last month I ran nothing but SETI, and I never, ever had a result go bad. Call me lucky I guess? Dunno, but I have yet to have a problem with it set like this. Jeremy ID: 615203 ·

Dirk Sadowski Volunteer tester Send message Joined: 6 Apr 07 Posts: 7105 Credit: 147,663,825 RAC: 5	Message 615210 - Posted: 5 Aug 2007, 8:23:08 UTC Last modified: 5 Aug 2007, 8:55:00 UTC
I have now set it to connect every 0.002 days (~ 180 seconds). BUT what about the people with dual- or quad-core CPUs, or the people with a 'V8'? Here is an example: 2 results finished and uploaded. A third result finished uploading after the report. BUT if this third result had finished its upload ~ 10 seconds earlier, it would have been reported 3 seconds later, AND then it would very surely be a 'validate error', right? :-( SO, how could we do it better? ;-)
8/5/2007 10:04:00 AM|SETI@home|Computation for task 29mr00ab.25614.4656.665884.3.235_0 finished
8/5/2007 10:04:00 AM|SETI@home|Starting 19jn00aa.11827.19456.484658.3.204_2
8/5/2007 10:04:00 AM|SETI@home|Starting task 19jn00aa.11827.19456.484658.3.204_2 using setiathome_enhanced version 515
8/5/2007 10:04:02 AM|SETI@home|[file_xfer] Started upload of file 29mr00ab.25614.4656.665884.3.235_0_0
8/5/2007 10:04:10 AM|SETI@home|[file_xfer] Finished upload of file 29mr00ab.25614.4656.665884.3.235_0_0
8/5/2007 10:04:10 AM|SETI@home|[file_xfer] Throughput 8571 bytes/sec
8/5/2007 10:04:12 AM|SETI@home|Computation for task 19jn00aa.11827.19456.484658.3.204_2 finished
8/5/2007 10:04:12 AM|SETI@home|Starting 29mr00ab.25614.4656.665884.3.198_0
8/5/2007 10:04:12 AM|SETI@home|Starting task 29mr00ab.25614.4656.665884.3.198_0 using setiathome_enhanced version 515
8/5/2007 10:04:14 AM|SETI@home|[file_xfer] Started upload of file 19jn00aa.11827.19456.484658.3.204_2_0
8/5/2007 10:04:19 AM|SETI@home|[file_xfer] Finished upload of file 19jn00aa.11827.19456.484658.3.204_2_0
8/5/2007 10:04:19 AM|SETI@home|[file_xfer] Throughput 9858 bytes/sec
8/5/2007 10:06:57 AM|SETI@home|Computation for task 29mr00ab.25614.7121.304816.3.3_1 finished
8/5/2007 10:06:57 AM|SETI@home|Starting 29mr00ab.25614.4656.665884.3.237_1
8/5/2007 10:06:57 AM|SETI@home|Starting task 29mr00ab.25614.4656.665884.3.237_1 using setiathome_enhanced version 515
8/5/2007 10:07:00 AM|SETI@home|[file_xfer] Started upload of file 29mr00ab.25614.7121.304816.3.3_1_0
8/5/2007 10:07:05 AM|SETI@home|Sending scheduler request: To report completed tasks
8/5/2007 10:07:05 AM|SETI@home|Reporting 2 tasks
8/5/2007 10:07:12 AM|SETI@home|[file_xfer] Finished upload of file 29mr00ab.25614.7121.304816.3.3_1_0
8/5/2007 10:07:12 AM|SETI@home|[file_xfer] Throughput 2641 bytes/sec
8/5/2007 10:07:15 AM|SETI@home|Scheduler RPC succeeded [server version 511]
8/5/2007 10:07:15 AM|SETI@home|Deferring communication for 11 sec
8/5/2007 10:07:15 AM|SETI@home|Reason: requested by project
8/5/2007 10:10:07 AM|SETI@home|Sending scheduler request: To report completed tasks
8/5/2007 10:10:07 AM|SETI@home|Reporting 1 tasks
8/5/2007 10:10:17 AM|SETI@home|Scheduler RPC succeeded [server version 511]
8/5/2007 10:10:17 AM|SETI@home|Deferring communication for 11 sec
8/5/2007 10:10:17 AM|SETI@home|Reason: requested by project
ID: 615210 ·

Dirk Sadowski Volunteer tester Send message Joined: 6 Apr 07 Posts: 7105 Credit: 147,663,825 RAC: 5	Message 615333 - Posted: 5 Aug 2007, 16:54:41 UTC
I was 'little interested', why my results are got 'validate errors', so I took a little time.. and looked to my online available results and in 'stdoutdae.txt' and I saw: An example for all 3 available results:
2007-08-03 04:00:43 [SETI@home] [file_xfer] Started upload of file 20jn00aa.3173.11457.542316.3.176_1_0
2007-08-03 04:00:50 [SETI@home] [error] Error on file upload: no command
2007-08-03 04:00:50 [SETI@home] [file_xfer] Permanently failed upload of 20jn00aa.3173.11457.542316.3.176_1_0
2007-08-03 04:00:50 [SETI@home] Giving up on upload of 20jn00aa.3173.11457.542316.3.176_1_0: server rejected file
So it's a server problem and not a problem from the client, OR? ID: 615333 ·
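Checking 'stdoutdae.txt' by hand for every suspect result gets tedious. A short script could list the uploads the client gave up on. This is a sketch only, assuming the `date time [project] message` layout of the lines quoted in the post above; the function name is hypothetical and other client versions may log differently.

```python
import re

# Matches lines such as:
# 2007-08-03 04:00:50 [SETI@home] [file_xfer] Permanently failed upload of <file>
FAILED_UPLOAD = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \[(.+?)\] \[file_xfer\] "
    r"Permanently failed upload of (\S+)$"
)


def failed_uploads(log_text):
    """Return (timestamp, file name) for every permanently failed upload."""
    found = []
    for line in log_text.splitlines():
        match = FAILED_UPLOAD.match(line.strip())
        if match:
            found.append((match.group(1), match.group(3)))
    return found


# Sample lines copied from the post above.
SAMPLE = """\
2007-08-03 04:00:43 [SETI@home] [file_xfer] Started upload of file 20jn00aa.3173.11457.542316.3.176_1_0
2007-08-03 04:00:50 [SETI@home] [error] Error on file upload: no command
2007-08-03 04:00:50 [SETI@home] [file_xfer] Permanently failed upload of 20jn00aa.3173.11457.542316.3.176_1_0
"""

for stamp, name in failed_uploads(SAMPLE):
    print(stamp, name)
```

Any result whose output file shows up in this list never reached the server intact, so a later 'validate error' for it would say nothing about the overclock.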

Josef W. Segur Volunteer developer Volunteer tester Send message Joined: 30 Oct 99 Posts: 4504 Credit: 1,414,761 RAC: 0	Message 615370 - Posted: 5 Aug 2007, 18:26:52 UTC - in response to Message 615333. I was 'little interested', why my results are got 'validate errors', so I took a little time.. and looked to my online available results and in 'stdoutdae.txt' and I saw: An example for all 3 available results: 2007-08-03 04:00:43 [SETI@home] [file_xfer] Started upload of file 20jn00aa.3173.11457.542316.3.176_1_0 2007-08-03 04:00:50 [SETI@home] [error] Error on file upload: no command 2007-08-03 04:00:50 [SETI@home] [file_xfer] Permanently failed upload of 20jn00aa.3173.11457.542316.3.176_1_0 2007-08-03 04:00:50 [SETI@home] Giving up on upload of 20jn00aa.3173.11457.542316.3.176_1_0: server rejected file So it's a server problem and not a problem from the client, OR? The other possibility is garbled communication. The upload uses two POSTs, in the first one the "command" is <get_file_size> and in the second it's <file_upload>. If neither is found, that gives the "no command" error. Joe ID: 615370 ·
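The check described in the post above can be pictured like this. It is a simplified illustration of the idea only, not the actual server-side upload handler; the function name and the sample request bodies are invented for the example.

```python
def upload_command(request_body):
    """Return which upload command a POST body carries, or 'no command'."""
    if "<get_file_size>" in request_body:
        return "get_file_size"  # first POST: ask how much the server already has
    if "<file_upload>" in request_body:
        return "file_upload"    # second POST: send the file data
    return "no command"         # neither tag found, e.g. a garbled request


# Hypothetical request bodies, for illustration only.
print(upload_command("<get_file_size>example_0_0</get_file_size>"))
print(upload_command("<file_upload> ...data... </file_upload>"))
print(upload_command("<fi\x00e_upl?ad> ...data..."))
```

The third call shows how a request damaged in transit ends up as "no command" even though the client sent a valid one, which is why the error alone does not prove a server fault.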

Dirk Sadowski Volunteer tester Send message Joined: 6 Apr 07 Posts: 7105 Credit: 147,663,825 RAC: 5	Message 615433 - Posted: 5 Aug 2007, 21:08:34 UTC - in response to Message 615370. Last modified: 5 Aug 2007, 21:08:47 UTC The other possibility is garbled communication. The upload uses two POSTs, in the first one the "command" is <get_file_size> and in the second it's <file_upload>. If neither is found, that gives the "no command" error. Joe And how or what we could do that this don't happen? ID: 615433 ·

1mp0£173 Volunteer tester Send message Joined: 3 Apr 99 Posts: 8423 Credit: 356,897 RAC: 0	Message 615554 - Posted: 6 Aug 2007, 4:20:44 UTC - in response to Message 615433. The other possibility is garbled communication. The upload uses two POSTs, in the first one the "command" is <get_file_size> and in the second it's <file_upload>. If neither is found, that gives the "no command" error. Joe And how or what we could do that this don't happen? The first step in solving every problem is diagnosing it. I was going to look at your computers to see if there is anything obvious, but they're hidden. ID: 615554 ·

W-K 666 Volunteer tester Send message Joined: 18 May 99 Posts: 19716 Credit: 40,757,560 RAC: 67	Message 615596 - Posted: 6 Aug 2007, 5:54:31 UTC - in response to Message 615554. The other possibility is garbled communication. The upload uses two POSTs, in the first one the "command" is <get_file_size> and in the second it's <file_upload>. If neither is found, that gives the "no command" error. Joe And how or what we could do that this don't happen? The first step in solving every problem is diagnosing it. I was going to look at your computers to see if there is anything obvious, but they're hidden. But from one of the Validation errors you posted resultid=582062180 I think you might try a bit less over-clocking, and/or checking stability with prime95 or similar. Andy ID: 615596 ·

Dirk Sadowski Volunteer tester Send message Joined: 6 Apr 07 Posts: 7105 Credit: 147,663,825 RAC: 5	Message 615915 - Posted: 6 Aug 2007, 21:18:43 UTC - in response to Message 615596. Last modified: 6 Aug 2007, 21:21:04 UTC The other possibility is garbled communication. The upload uses two POSTs, in the first one the "command" is <get_file_size> and in the second it's <file_upload>. If neither is found, that gives the "no command" error. Joe And how or what we could do that this don't happen? The first step in solving every problem is diagnosing it. I was going to look at your computers to see if there is anything obvious, but they're hidden. But from one of the Validation errors you posted resultid=582062180 I think you might try a bit less over-clocking, and/or checking stability with prime95 or similar. Andy No.. no.. the OC is O.K. .. :-) The three available results are 'server rejected file' errors! I posted it here. SETI@home is not a good test-program to look it's stable? ;-) But Prime95, what is this? This is an other BOINC project? Or now it's named PrimeGrid? I had let run memtest86+ V1.70 and it was well. ID: 615915 ·

OzzFan Volunteer tester Send message Joined: 9 Apr 02 Posts: 15691 Credit: 84,761,841 RAC: 28	Message 615920 - Posted: 6 Aug 2007, 21:22:51 UTC - in response to Message 615915. No.. no.. the OC is O.K. .. :-) SETI@home is not a good test-program to look it's stable? ;-) But Prime95, what is this? This is an other BOINC project? Or now it's named PrimeGrid? I had let run memtest86+ V1.70 and it was well. To verify that SETI is not the problem, another CPU stress tester is always a good idea to cross-verify results. Prime95 is a different, stand-alone application that stresses the CPU just like SETI@Home does. I think they have a BOINC project, but that wouldn't remove BOINC as a possible point of failure so it's best to use the stand-alone program. If you get errors with Prime95 too, then there's a good chance your overclock is too aggressive. ID: 615920 ·

Dirk Sadowski Volunteer tester Send message Joined: 6 Apr 07 Posts: 7105 Credit: 147,663,825 RAC: 5	Message 615926 - Posted: 6 Aug 2007, 21:32:29 UTC - in response to Message 615920. To verify that SETI is not the problem, another CPU stress tester is always a good idea to cross-verify results. Prime95 is a different, stand-alone application that stresses the CPU just like SETI@Home does. I think they have a BOINC project, but that wouldn't remove BOINC as a possible point of failure so it's best to use the stand-alone program. If you get errors with Prime95 too, then there's a good chance your overclock is too aggressive. I saw it like this.. If I have a 'validate error', it's because of the server.. And if I have a 'client error', it's because of too much OC.. I saw it right or wrong? I OC the Intel Core2 Extreme QX6700 from 2.66 to 3.17 GHz, so it's not so much.. You must ask msattler because of his OC! ;-) Where I can get Prime95? ID: 615926 ·

1mp0£173 Volunteer tester Send message Joined: 3 Apr 99 Posts: 8423 Credit: 356,897 RAC: 0	Message 615931 - Posted: 6 Aug 2007, 21:37:29 UTC - in response to Message 615926. To verify that SETI is not the problem, another CPU stress tester is always a good idea to cross-verify results. Prime95 is a different, stand-alone application that stresses the CPU just like SETI@Home does. I think they have a BOINC project, but that wouldn't remove BOINC as a possible point of failure so it's best to use the stand-alone program. If you get errors with Prime95 too, then there's a good chance your overclock is too aggressive. I saw it like this.. If I have a 'validate error', it's because of the server.. And if I have a 'client error', it's because of to much OC.. I saw it right or wrong? I OC the Intel Core2 Extreme QX6700 from 2.66 to 3.17 GHz, so it's not so much.. You must ask msattler because of his OC! ;-) Where I can get Prime95? You seem overly focused on finding fault, and not focused at all on diagnosing and fixing the problem. Overclocking is the process of getting more performance by reducing the "margins" -- getting closer to the 'edge' of the signal's rise and/or fall (moving away from solid, stable 1's and 0's toward 0.7's and 0.3's). How much you can overclock depends on a lot of factors, not just the CPU. We'd like to look at your computers if you'd like our help. ID: 615931 ·

Alinator Volunteer tester Send message Joined: 19 Apr 05 Posts: 4178 Credit: 4,647,982 RAC: 0	Message 615933 - Posted: 6 Aug 2007, 21:38:41 UTC Last modified: 6 Aug 2007, 21:40:08 UTC Well I think it's safe to say if it's a compute error, then in all likelihood it's due to the OC, especially if it goes away when you back off. However, you cannot say the same thing about a validate error. It might be due to a server issue losing the output files for one reason or another. OTOH, it could just as easily be due to subtle calculational errors from the OC which don't generate a 'hard' error. Alinator ID: 615933 ·

OzzFan Volunteer tester Send message Joined: 9 Apr 02 Posts: 15691 Credit: 84,761,841 RAC: 28	Message 615934 - Posted: 6 Aug 2007, 21:40:52 UTC - in response to Message 615926. I saw it like this.. If I have a 'validate error', it's because of the server.. And if I have a 'client error', it's because of to much OC.. I saw it right or wrong? Not necessarily. Always double check your work and cross reference your results. I OC the Intel Core2 Extreme QX6700 from 2.66 to 3.17 GHz, so it's not so much.. You must ask msattler because of his OC! ;-) Unless you're running the same setup MSattler is (including, most importantly, the same cooling setup he is), I don't think you can make a direct comparison. Where I can get Prime95? Here. ID: 615934 ·

Dirk Sadowski Volunteer tester Send message Joined: 6 Apr 07 Posts: 7105 Credit: 147,663,825 RAC: 5	Message 615960 - Posted: 6 Aug 2007, 21:56:58 UTC Last modified: 6 Aug 2007, 22:01:02 UTC Thanks a lot for help! I'll look in future more in 'stdoutdae.txt', that I know it's a server prob or maybe a OC prob. And maybe I'll let run Prime95. @ Ned Ludd You are funny, your PCs are hidden too! ;-) ID: 615960 ·
