Astropulse Errors II-Optimized version 5.03!
Author | Message |
---|---|
Josef W. Segur Joined: 30 Oct 99 Posts: 4504 Credit: 1,414,761 RAC: 0 |
Byron, I think since the second man got in before you he got the credit. It's not a problem with your machine. If I remember correctly, you might be able to get credit for it but it has to be done manually and before the WUs disappear from the database. I think Eric has a program he has to run for that. Your task was issued because the second host missed deadline, but that host did complete before you did and its result was valid. The BOINC servers are supposed to wait for all sent tasks to be reported (or miss deadline) before deleting the canonical result, that way the last results reported can be checked. Perhaps the AP validator is returning a wrong status, perhaps its code to check another result against an already chosen canonical result has a flaw. There have been multiple reports where the last task is being declared invalid if a late return has caused a reissue; see Gomeyer's post earlier in this thread and followups. Eric said he would investigate; I don't know if he was interrupted by some crisis or what. There is a BOINC option which will tell a host to abort unstarted tasks if they're no longer needed, but it isn't enabled for this project (too much database load, apparently). Joe |
Joined: 16 Jan 06 Posts: 1145 Credit: 3,936,993 RAC: 0 |
The BOINC servers are supposed to wait for all sent tasks to be reported (or miss deadline) before deleting the canonical result, that way the last results reported can be checked. Perhaps the AP validator is returning a wrong status, perhaps its code to check another result against an already chosen canonical result has a flaw. Hi Josef, Thanks for the info. I guess my misunderstanding comes from my thinking there wouldn't be a canonical result until all the tasks were completed. Edit: or maybe it did and because of the problems lately I just was not able to see that. Thanks again. |
Cosmic_Ocean Joined: 23 Dec 00 Posts: 3027 Credit: 13,516,867 RAC: 13 |
Yeah, I fell victim to a good result getting no credit due to this same situation. 0 reported on time, 1 went 2 days over the deadline, I was 3. I returned mine in 4 days, but got no credit because a canonical result was chosen between 0 and 1, and there was nothing for 2 to compare to. Linux laptop: record uptime: 1511d 20h 19m (ended due to the power brick giving-up) |
Joined: 16 Jan 06 Posts: 1145 Credit: 3,936,993 RAC: 0 |
Yeah, I fell victim to a good result getting no credit due to this same situation. 0 reported on time, 1 went 2 days over the deadline, I was 3. I returned mine in 4 days, but got no credit because a canonical result was chosen between 0 and 1, and there was nothing for 2 to compare to. The frustrating part of this for me is not knowing for sure if I actually returned a bad result, or if it is because the person who was late reported before me. I don't care much for wondering if I've got a problem with my PC, so I've already done some tests and cleaned it out again. I'm now starting to believe it was just one of those things that my PC had little to do with, other than getting the task in later than the wingman. If nothing else, I'll check a bit closer and more often on resends sent because of a "no response" wingman, and if they get it back in after it's been assigned to me I can abort it and not waste any more time on it. |
Joined: 3 Sep 04 Posts: 471 Credit: 230,506,401 RAC: 157 |
Don't know if this AP 5.03 workunit 441277129 has a problem or not. Completed, validation inconclusive for two returned results, third one then issued. One seems to be stock app and mine is R112 SSE3 opt. app. Claimed credit is identical albeit CPU time is quite different. Opt app shows: single pulses: 1, repetitive pulses: 0, percent blanked: 0.00. Stock app does not show any results in the stderr out section. It's been such a long time since I ran the stock app that I can't remember if this is normal. Same here: 449120228 (had changed BOINC from 6.6.28 to 6.6.31), and earlier 434094359. All report outcome=success. GPU Users Group |
Josef W. Segur Joined: 30 Oct 99 Posts: 4504 Credit: 1,414,761 RAC: 0 |
The stock AP app doesn't report any signal counts in stderr. It would be some help in analyzing these situations, though without actually having the details in the two result files to look at there's no way to really tell why the comparison was inconclusive. I note that all 3 you've flagged have at least one repetitive pulse found, while 3 of the 4 in your task list which validated at first try have no repetitive pulses. The 5.03 AP_v5 apps (both stock and optimized) have a slightly imprecise method of calculating the frequencies for pulse folding which has been fixed for 5.05+; it might be the cause of these inconclusive comparisons. If that's the case, the differences are scientifically meaningless; in effect the validator is insisting on a level of accuracy which can't always be met with 5.03 code. Joe |
Joined: 3 Sep 04 Posts: 471 Credit: 230,506,401 RAC: 157 |
The stock AP app doesn't report any signal counts in stderr. It would be some help in analyzing these situations, though without actually having the details in the two result files to look at there's no way to really tell why the comparison was inconclusive. I only spotted these out of idle curiosity when looking to see the reason for some longish outstanding AP WUs so don't know how many more have gone unnoticed or how big a problem it is. I assume it is then pot luck as to which app returns the next result whether it is a better match for mine or the other one. I haven't seen any notices about the new release - is it still a way off yet? John. GPU Users Group |
Kathy Joined: 5 Jan 03 Posts: 338 Credit: 27,877,436 RAC: 0 |
I've had a few AP's not accepted but am not understanding why. This one was originally marked validation inconclusive and then invalid when the third one came in. It looks substantially like the original in the quorum and the outcome was success. http://setiathome.berkeley.edu/workunit.php?wuid=450431622 Thanks! Never mind; it says it can't find the WU from the link. It was deleted as I typed. |
Joined: 12 Jul 09 Posts: 23 Credit: 15,847 RAC: 0 |
Astropulses are HUGE; my computer spends more than 24 hours processing just one. Amazing. |
Joined: 3 Sep 04 Posts: 471 Credit: 230,506,401 RAC: 157 |
I've had a few AP's not accepted but am not understanding why. This one was originally marked validation inconclusive and then invalid when the third one came in. It looks substantially like the original in the quorum and the outcome was success. Your computers are hidden so I can't see what is occurring and, as you say, that result has now gone. Have you added V5.05 to your app_info.xml? If so, have you also added the new V5.05 app or just reused the V5.03 app? The results from the two are incompatible so won't validate. GPU Users Group |
StanJaz Joined: 31 Jul 99 Posts: 6 Credit: 13,299,316 RAC: 0 |
I have had the same thing happen to me before. I received another ap unit with a late wingman so I bookmarked the workunit. I checked the page today and it was returned over a month late and received credit. Since I would rather lose the 3 hours I worked on it already than the full 16 to 18 hours I aborted it. I do not usually abort workunits but in this case it was the right thing to do. It was sent to a 450 MHz computer. Why? http://setiweb.ssl.berkeley.edu/workunit.php?wuid=438140058 Stan Here is the 1 month late wingman's info. Host: Owner: boris; Created: 24 Nov 2005 15:16:13 UTC; Total credit: 43,951; Average credit: 195.50; Cross project credit: (blank); CPU type: GenuineIntel x86 Family 6 Model 5 Stepping 2 450MHz; Number of processors: 2; Operating System: Microsoft Windows XP Professional Edition, Service Pack 2, (05.01.2600.00). Task: Name: ap_19mr09ab_B4_P1_00021_20090430_05961.wu_3; Workunit: 438140058; Created: 30 May 2009 10:57:58 UTC; Sent: 30 May 2009 10:58:00 UTC; Received: 1 Aug 2009 4:10:09 UTC; Server state: Over; Outcome: Success; Client state: Done; Exit status: 0 (0x0); Computer ID: 1768419; Report deadline: 29 Jun 2009 10:58:00 UTC; Run time: 4752053 |
Josef W. Segur Joined: 30 Oct 99 Posts: 4504 Credit: 1,414,761 RAC: 0 |
I have had the same thing happen to me before. I received another ap unit with a late wingman so I bookmarked the workunit. I checked the page today and it was returned over a month late and received credit. Since I would rather lose the 3 hours I worked on it already than the full 16 to 18 hours I aborted it. I do not usually abort workunits but in this case it was the right thing to do. It was sent to a 450 MHz computer. Why? Based on the rsc_fpops_est in the WU and the host's benchmarks, time stats, and Duration Correction Factor (DCF), the BOINC Scheduler judged the host could do the task in 23 days or less. The combination of PIII hosts benchmarking rather high relative to actual S@H crunching capability and DCF based on MB work rather than AP work made the estimate wrong, so it actually took 55 days using Astropulse v5 5.03. That long run will have more than doubled the DCF so that host won't get any more AP work until MB work gradually reduces it again. When it does, it'll be AP_v505 work which has the same rsc_fpops_est but the application is faster; the host might still miss deadline but by a smaller amount. The factor of 1.3 in BOINC (called hard_app) by which the raw estimate for AP work is increased before comparing to the 30 day deadline was clearly inadequate for this case. There was also a change made in the ap_splitter code to 25 day deadline which has only been applied at SETI Beta so far. Kudos to you for allowing that host to complete and get credit and for not wasting your host's time just to get credits; as you say, it was the right thing to do. Joe |
Joined: 16 May 99 Posts: 154 Credit: 1,577,293 RAC: 1 |
The factor of 1.3 in BOINC (called hard_app) by which the raw estimate for AP work is increased before comparing to the 30 day deadline was clearly inadequate for this case. There was also a change made in the ap_splitter code to 25 day deadline which has only been applied at SETI Beta so far. I find it strange there is a possibility that AP deadlines will get shorter when the MB deadlines have been doubled with the recent change. There are now MB results with a deadline of something like 50 days with a runtime less than a quarter of an AP result. It would be better to cut back on the deadlines for MB. |
Josef W. Segur Joined: 30 Oct 99 Posts: 4504 Credit: 1,414,761 RAC: 0 |
The factor of 1.3 in BOINC (called hard_app) by which the raw estimate for AP work is increased before comparing to the 30 day deadline was clearly inadequate for this case. There was also a change made in the ap_splitter code to 25 day deadline which has only been applied at SETI Beta so far. The AP deadline reduction was specifically intended to ensure a slow host would not get AP work when first attaching to the project. As the scenario also involved a BOINC version change, that should be a rare occurrence but still worth avoiding. Overall, the effect would be that some hosts which are now eligible for both MB and AP could only get MB after the change. Reducing MB deadlines would eject hosts from the project. You're entitled to your own opinion on whether that's desirable. I think it isn't. My guess is the project will only move in that direction if too many users get greedy and increase their caches to the maximum which has become practical with the new minimum deadline of nearly 14 days. Joe |
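
The deadline arithmetic in Joe's last two posts can be checked with a rough sketch. This is not BOINC's actual scheduler code: it assumes the check is simply "raw runtime estimate times the 1.3 factor must not exceed the deadline", and the function and constant names are made up for illustration. Only the 1.3 factor, the 30 and 25 day deadlines, and the 23 and 55 day figures come from the thread.

```python
# Simplified illustration, NOT the real BOINC scheduler logic.
# Assumption: a task is sent only if raw_estimate * factor <= deadline.

HARD_APP_FACTOR = 1.3  # factor quoted in the thread for AP work


def max_raw_estimate_days(deadline_days: float,
                          factor: float = HARD_APP_FACTOR) -> float:
    """Largest raw runtime estimate (in days) that still passes the check."""
    return deadline_days / factor


def would_send(raw_estimate_days: float, deadline_days: float,
               factor: float = HARD_APP_FACTOR) -> bool:
    """True if the padded estimate fits within the deadline."""
    return raw_estimate_days * factor <= deadline_days


if __name__ == "__main__":
    # 30-day deadline: the cutoff is about 23 days ("23 days or less").
    print(f"30-day cutoff: {max_raw_estimate_days(30):.1f} days")
    # 25-day deadline (the SETI Beta change): the cutoff drops to about 19 days.
    print(f"25-day cutoff: {max_raw_estimate_days(25):.1f} days")
    # The 450 MHz host was estimated at roughly 23 days but needed 55.
    print(would_send(23, 30))  # the estimate passes the check
    print(would_send(55, 30))  # the real runtime would not have
    print(would_send(23, 25))  # same estimate fails with a 25-day deadline
```

Under this simplified model, a host whose raw estimate is about 23 days passes with a 30 day deadline but fails with a 25 day one, which fits Joe's point that some hosts now eligible for both MB and AP would only get MB after the change.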
©2025 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.