Astropulse Errors II-Optimized version 5.03!

Josef W. Segur Volunteer developer Volunteer tester Joined: 30 Oct 99 Posts: 4504 Credit: 1,414,761 RAC: 0	Message 899539 - Posted: 25 May 2009, 23:22:16 UTC - in response to Message 899365. Byron, I think since the second man got in before you he got the credit. It's not a problem with your machine. If I remember correctly, you might be able to get credit for it but it has to be done manually and before the WUs disappear from the database. I think Eric has a program he has to run for that. Ok, I figured the same, but it seems strange that would be the norm if that's the case. Since I didn't even begin the task until days after he had reported his task, I'm surprised there isn't something in BOINC to catch that, and keep someone from wasting their time doing a task there's no way to get validated. Thanks for the help. Your task was issued because the second host missed deadline, but that host did complete before you did and its result was valid. The BOINC servers are supposed to wait for all sent tasks to be reported (or miss deadline) before deleting the canonical result; that way the last results reported can be checked. Perhaps the AP validator is returning a wrong status, or perhaps its code to check another result against an already chosen canonical result has a flaw. There have been multiple reports where the last task is being declared invalid if a late return has caused a reissue; see Gomeyer's post earlier in this thread and followups. Eric said he would investigate; I don't know if he was interrupted by some crisis or what. There is a BOINC option which will tell a host to abort unstarted tasks if they're no longer needed, but it isn't enabled for this project (too much database load, apparently). Joe
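A minimal Python sketch of the rule Joe describes above: the canonical result should be kept until every sent task has either been reported or has missed its deadline. The `Task` class and `can_delete_canonical` function are hypothetical names made up for illustration; this is not the BOINC server code, only a model of the intended behaviour.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Task:
    """Hypothetical model of one task (result) sent for a workunit."""
    name: str
    reported: bool = False
    missed_deadline: bool = False


def can_delete_canonical(tasks: List[Task]) -> bool:
    """True only when no sent task is still outstanding."""
    return all(t.reported or t.missed_deadline for t in tasks)


# Scenario from the thread: _1 missed deadline, so _2 was issued,
# then _1 reported late and validated while _2 was still crunching.
tasks = [
    Task("_0", reported=True),
    Task("_1", reported=True, missed_deadline=True),
    Task("_2"),  # the reissue, not yet reported
]
keep_canonical = not can_delete_canonical(tasks)
print("keep canonical result:", keep_canonical)
```

Under this model the canonical result stays available while `_2` is outstanding, so the last result reported could still be checked against it.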

Byron S Goodgame Volunteer tester Joined: 16 Jan 06 Posts: 1145 Credit: 3,936,993 RAC: 0	Message 900100 - Posted: 27 May 2009, 17:27:15 UTC - in response to Message 899539. Last modified: 27 May 2009, 17:29:38 UTC The BOINC servers are supposed to wait for all sent tasks to be reported (or miss deadline) before deleting the canonical result; that way the last results reported can be checked. Perhaps the AP validator is returning a wrong status, or perhaps its code to check another result against an already chosen canonical result has a flaw. Hi Josef, Thanks for the info. I guess my misunderstanding comes from my thinking there wouldn't be a canonical result until all the tasks were completed. Edit: or maybe it did and because of the problems lately I just was not able to see that. Thanks again.

Cosmic_Ocean Joined: 23 Dec 00 Posts: 3027 Credit: 13,516,867 RAC: 13	Message 900127 - Posted: 27 May 2009, 18:07:25 UTC Yeah, I fell victim to a good result getting no credit due to this same situation. 0 reported on time, 1 went 2 days over the deadline, I was 3. I returned mine in 4 days, but got no credit because a canonical result was chosen between 0 and 1, and there was nothing for 2 to compare to. Linux laptop: record uptime: 1511d 20h 19m (ended due to the power brick giving-up)

Byron S Goodgame Volunteer tester Joined: 16 Jan 06 Posts: 1145 Credit: 3,936,993 RAC: 0	Message 900134 - Posted: 27 May 2009, 18:28:32 UTC - in response to Message 900127. Yeah, I fell victim to a good result getting no credit due to this same situation. 0 reported on time, 1 went 2 days over the deadline, I was 3. I returned mine in 4 days, but got no credit because a canonical result was chosen between 0 and 1, and there was nothing for 2 to compare to. The frustrating part of this for me is not knowing for sure if I actually returned a bad result, or if it is because of the person being late, reporting before me. I don't care much for wondering if I've got a problem with my pc, so I've already done some tests and cleaned it out again. I'm now starting to believe it was just one of those things that my pc had little to do with, other than getting the task in later than the wingman. If nothing else I'll check a bit closer and more often on resends sent because of a "no response" wingman, and if they get it back in after it's been assigned to me I can abort it and not waste any more time on it.

Byron S Goodgame Volunteer tester Joined: 16 Jan 06 Posts: 1145 Credit: 3,936,993 RAC: 0	Message 904582 - Posted: 6 Jun 2009, 23:24:20 UTC statefile is 0'd at about 14% on resultid=1245053854, completed after that.

Questor Volunteer tester Joined: 3 Sep 04 Posts: 471 Credit: 230,506,401 RAC: 157	Message 905202 - Posted: 8 Jun 2009, 14:46:40 UTC Last modified: 8 Jun 2009, 14:59:52 UTC Don't know if this AP 5.03 workunit 441277129 has a problem or not. Completed, validation inconclusive for two returned results, third one then issued. One seems to be the stock app and mine is the R112 SSE3 opt. app. Claimed credit is identical, although CPU time is quite different. Opt app shows: single pulses: 1, repetitive pulses: 0, percent blanked: 0.00. Stock app does not show any results in the stderr out section. It's such a long time since I ran the stock app that I can't remember if this is normal. Same here: 449120228 (had changed BOINC from 6.6.28 to 6.6.31) and earlier 434094359. All report outcome=success. GPU Users Group

Josef W. Segur Volunteer developer Volunteer tester Joined: 30 Oct 99 Posts: 4504 Credit: 1,414,761 RAC: 0	Message 905228 - Posted: 8 Jun 2009, 16:01:15 UTC The stock AP app doesn't report any signal counts in stderr. It would be some help in analyzing these situations, though without actually having the details in the two result files to look at there's no way to really tell why the comparison was inconclusive. I note that all 3 you've flagged have at least one repetitive pulse found, while 3 of the 4 in your task list which validated at first try have no repetitive pulses. The 5.03 AP_v5 apps (both stock and optimized) have a slightly imprecise method of calculating the frequencies for pulse folding which has been fixed for 5.05+; it might be the cause of these inconclusive comparisons. If that's the case, the differences are scientifically meaningless; in effect the validator is insisting on a level of accuracy which can't always be met with 5.03 code. Joe
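To illustrate how a strict comparison can mark two scientifically equivalent results as inconclusive, here is a small Python sketch. The frequencies and tolerances are made-up numbers and `results_agree` is a hypothetical helper; the real AP validator's comparison rules are not shown anywhere in this thread.

```python
import math


def results_agree(freq_a: float, freq_b: float, rel_tol: float) -> bool:
    """Hypothetical check: do two folding frequencies match within rel_tol?"""
    return math.isclose(freq_a, freq_b, rel_tol=rel_tol)


# Made-up values: two hosts report nearly the same frequency for one
# repetitive pulse, differing only by small calculation imprecision.
host_a = 1000.0000
host_b = 1000.0004

strict = results_agree(host_a, host_b, rel_tol=1e-9)   # tighter than the imprecision
lenient = results_agree(host_a, host_b, rel_tol=1e-6)  # wide enough to absorb it
print("strict match:", strict, "| lenient match:", lenient)
```

With a tolerance tighter than the imprecision of the calculation, whether two good results agree depends on noise rather than on the science, which is the situation Joe suspects for 5.03.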

Questor Volunteer tester Joined: 3 Sep 04 Posts: 471 Credit: 230,506,401 RAC: 157	Message 905270 - Posted: 8 Jun 2009, 19:03:41 UTC - in response to Message 905228. The stock AP app doesn't report any signal counts in stderr. It would be some help in analyzing these situations, though without actually having the details in the two result files to look at there's no way to really tell why the comparison was inconclusive. I note that all 3 you've flagged have at least one repetitive pulse found, while 3 of the 4 in your task list which validated at first try have no repetitive pulses. The 5.03 AP_v5 apps (both stock and optimized) have a slightly imprecise method of calculating the frequencies for pulse folding which has been fixed for 5.05+; it might be the cause of these inconclusive comparisons. If that's the case, the differences are scientifically meaningless; in effect the validator is insisting on a level of accuracy which can't always be met with 5.03 code. Joe I only spotted these out of idle curiosity when looking to see the reason for some longish outstanding AP WUs, so I don't know how many more have gone unnoticed or how big a problem it is. I assume it is then pot luck which app returns the next result and whether it is a better match for mine or the other one. I haven't seen any notices about the new release - is it still a way off yet? John. GPU Users Group

Kathy Joined: 5 Jan 03 Posts: 338 Credit: 27,877,436 RAC: 0	Message 909735 - Posted: 21 Jun 2009, 1:14:51 UTC - in response to Message 875031. Last modified: 21 Jun 2009, 1:17:20 UTC I've had a few APs not accepted but don't understand why. This one was originally marked validation inconclusive and then invalid when the third one came in. It looks substantially like the original in the quorum and the outcome was success. http://setiathome.berkeley.edu/workunit.php?wuid=450431622 Thanks! Never mind, it says it can't find the WU from the link; it was deleted as I typed.

Evil Decepticon Joined: 12 Jul 09 Posts: 23 Credit: 15,847 RAC: 0	Message 919300 - Posted: 19 Jul 2009, 11:50:56 UTC Astropulses are HUGE, my computer spends a lot of time to process only one, more than 24 hours. Amazing.

Questor Volunteer tester Joined: 3 Sep 04 Posts: 471 Credit: 230,506,401 RAC: 157	Message 919320 - Posted: 19 Jul 2009, 12:59:29 UTC - in response to Message 909735. I've had a few APs not accepted but don't understand why. This one was originally marked validation inconclusive and then invalid when the third one came in. It looks substantially like the original in the quorum and the outcome was success. http://setiathome.berkeley.edu/workunit.php?wuid=450431622 Thanks! Never mind, it says it can't find the WU from the link; it was deleted as I typed. Your computers are hidden so I can't see what is occurring and, as you say, that result has now gone. Have you added V5.05 to your app_info.xml? If so, have you also added the new V5.05 app or just reused the V5.03 app? The results from the two are incompatible so won't validate. GPU Users Group

StanJaz Volunteer tester Joined: 31 Jul 99 Posts: 6 Credit: 13,299,316 RAC: 0	Message 923045 - Posted: 1 Aug 2009, 21:24:06 UTC - in response to Message 900127. I have had the same thing happen to me before. I received another AP unit with a late wingman so I bookmarked the workunit. I checked the page today and it was returned over a month late and received credit. Since I would rather lose the 3 hours I worked on it already than the full 16 to 18 hours, I aborted it. I do not usually abort workunits but in this case it was the right thing to do. It was sent to a 450 MHz computer. Why? http://setiweb.ssl.berkeley.edu/workunit.php?wuid=438140058 Stan Here is the 1 month late wingman's info. Owner boris Created 24 Nov 2005 15:16:13 UTC Total credit 43,951 Average credit 195.50 Cross project credit CPU type GenuineIntel x86 Family 6 Model 5 Stepping 2 450MHz Number of processors 2 Operating System Microsoft Windows XP Professional Edition, Service Pack 2, (05.01.2600.00) Name ap_19mr09ab_B4_P1_00021_20090430_05961.wu_3 Workunit 438140058 Created 30 May 2009 10:57:58 UTC Sent 30 May 2009 10:58:00 UTC Received 1 Aug 2009 4:10:09 UTC Server state Over Outcome Success Client state Done Exit status 0 (0x0) Computer ID 1768419 Report deadline 29 Jun 2009 10:58:00 UTC Run time 4752053

Josef W. Segur Volunteer developer Volunteer tester Joined: 30 Oct 99 Posts: 4504 Credit: 1,414,761 RAC: 0	Message 923174 - Posted: 2 Aug 2009, 15:13:26 UTC - in response to Message 923045. I have had the same thing happen to me before. I received another AP unit with a late wingman so I bookmarked the workunit. I checked the page today and it was returned over a month late and received credit. Since I would rather lose the 3 hours I worked on it already than the full 16 to 18 hours, I aborted it. I do not usually abort workunits but in this case it was the right thing to do. It was sent to a 450 MHz computer. Why? ... Based on the rsc_fpops_est in the WU and the host's benchmarks, time stats, and Duration Correction Factor (DCF), the BOINC Scheduler judged the host could do the task in 23 days or less. The combination of PIII hosts benchmarking rather high relative to actual S@H crunching capability and DCF based on MB work rather than AP work made the estimate wrong, so it actually took 55 days using Astropulse v5 5.03. That long run will have more than doubled the DCF, so that host won't get any more AP work until MB work gradually reduces it again. When it does, it'll be AP_v505 work, which has the same rsc_fpops_est but a faster application; the host might still miss deadline but by a smaller amount. The factor of 1.3 in BOINC (called hard_app) by which the raw estimate for AP work is increased before comparing to the 30 day deadline was clearly inadequate for this case. There was also a change made in the ap_splitter code to a 25 day deadline which has only been applied at SETI Beta so far. Kudos to you for allowing that host to complete and get credit and for not wasting your host's time just to get credits; as you say, it was the right thing to do. Joe
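The arithmetic in Joe's explanation can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not the BOINC scheduler code: `raw_estimate_days` and `would_send` are hypothetical names, the estimate formula leaves out the host's time stats, and only the factor of 1.3, the 30 and 25 day deadlines, and the run time of 4752053 seconds from Stan's post come from the thread.

```python
SECONDS_PER_DAY = 86400


def raw_estimate_days(rsc_fpops_est: float, benchmark_flops: float, dcf: float) -> float:
    """Hypothetical raw estimate: work / benchmark speed, scaled by DCF."""
    return rsc_fpops_est / benchmark_flops * dcf / SECONDS_PER_DAY


def would_send(estimate_days: float, deadline_days: float = 30.0,
               factor: float = 1.3) -> bool:
    """Send the task only if the inflated estimate still fits the deadline."""
    return estimate_days * factor <= deadline_days


# Largest raw estimate that still passes the check for each deadline.
limit_30 = 30.0 / 1.3   # about 23 days, matching "23 days or less"
limit_25 = 25.0 / 1.3   # about 19 days with the SETI Beta deadline

# Run time reported on the late wingman's task page, in seconds.
actual_days = 4752053 / SECONDS_PER_DAY

print(f"limit at 30 days: {limit_30:.2f}, at 25 days: {limit_25:.2f}")
print(f"actual run: {actual_days:.1f} days")
```

In this model a host estimated at 23 days passes the 30 day check but would be refused under a 25 day deadline, while the actual run of roughly 55 days shows how far off the estimate was.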

Henk Haneveld Volunteer tester Joined: 16 May 99 Posts: 154 Credit: 1,577,293 RAC: 1	Message 923180 - Posted: 2 Aug 2009, 16:24:10 UTC - in response to Message 923174. The factor of 1.3 in BOINC (called hard_app) by which the raw estimate for AP work is increased before comparing to the 30 day deadline was clearly inadequate for this case. There was also a change made in the ap_splitter code to a 25 day deadline which has only been applied at SETI Beta so far. I find it strange there is a possibility that AP deadlines will get shorter when the MB deadlines have been doubled with the recent change. There are now MB results with a deadline of something like 50 days with a runtime less than a quarter of an AP result. It would be better to cut back on the deadlines for MB.

Josef W. Segur Volunteer developer Volunteer tester Joined: 30 Oct 99 Posts: 4504 Credit: 1,414,761 RAC: 0	Message 923204 - Posted: 2 Aug 2009, 17:57:06 UTC - in response to Message 923180. The factor of 1.3 in BOINC (called hard_app) by which the raw estimate for AP work is increased before comparing to the 30 day deadline was clearly inadequate for this case. There was also a change made in the ap_splitter code to a 25 day deadline which has only been applied at SETI Beta so far. I find it strange there is a possibility that AP deadlines will get shorter when the MB deadlines have been doubled with the recent change. There are now MB results with a deadline of something like 50 days with a runtime less than a quarter of an AP result. It would be better to cut back on the deadlines for MB. The AP deadline reduction was specifically intended to ensure a slow host would not get AP work when first attaching to the project. As the scenario also involved a BOINC version change, that should be a rare occurrence but still worth avoiding. Overall, the effect would be that some hosts which are now eligible for both MB and AP could only get MB after the change. Reducing MB deadlines would eject hosts from the project. You're entitled to your own opinion on whether that's desirable. I think it isn't. My guess is the project will only move in that direction if too many users get greedy and increase their caches to the maximum which has become practical with the new minimum deadline of nearly 14 days. Joe


SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.