Astropulse Errors II-Optimized version 5.03!

Josef W. Segur
Volunteer developer
Volunteer tester

Joined: 30 Oct 99
Posts: 4504
Credit: 1,414,761
RAC: 0
United States
Message 899539 - Posted: 25 May 2009, 23:22:16 UTC - in response to Message 899365.  

Byron, I think since the second man got in before you he got the credit. It's not a problem with your machine. If I remember correctly, you might be able to get credit for it but it has to be done manually and before the WUs disappear from the database. I think Eric has a program he has to run for that.

Ok, I figured the same, but it seems strange that would be the norm if that's the case.
Since I didn't even begin the task until days after he had reported his, I'm surprised there isn't something in BOINC to catch that and keep someone from wasting their time on a task that has no way of being validated.

thanks for the help.

Your task was issued because the second host missed the deadline, but that host did complete before you did and its result was valid. The BOINC servers are supposed to wait for all sent tasks to be reported (or miss the deadline) before deleting the canonical result, so that the last results reported can be checked. Perhaps the AP validator is returning a wrong status, or perhaps its code to check another result against an already chosen canonical result has a flaw. There have been multiple reports of the last task being declared invalid when a late return has caused a reissue; see Gomeyer's post earlier in this thread and the followups. Eric said he would investigate; I don't know if he was interrupted by some crisis or what.

There is a BOINC option which will tell a host to abort unstarted tasks if they're no longer needed, but it isn't enabled for this project (too much database load apparently).
                                                                Joe
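To make the intended behaviour described above concrete, here is a minimal Python sketch. It only illustrates the rule as stated in this post and is not the actual BOINC server or AP validator code; the `Task` class, `can_delete_canonical`, and `check_against_canonical` are hypothetical names, and the plain equality test stands in for whatever comparison the real validator performs.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Task:
    """Hypothetical stand-in for one task (result) of a workunit."""
    name: str
    reported: bool = False          # host has returned the result
    missed_deadline: bool = False   # deadline passed without a report
    signals: Optional[Tuple[float, ...]] = None  # placeholder for science output


def can_delete_canonical(tasks: List[Task]) -> bool:
    """The canonical result may go only once every sent task is resolved."""
    return all(t.reported or t.missed_deadline for t in tasks)


def check_against_canonical(late: Task, canonical: Optional[Task]) -> str:
    """Compare a late-arriving result with the already chosen canonical one."""
    if canonical is None or canonical.signals is None:
        return "nothing to compare against"
    # Assumption: plain equality; the real validator uses its own comparison.
    return "valid" if late.signals == canonical.signals else "invalid"


# Scenario from this thread: _0 on time, _1 late but valid, _2 is the reissue.
task0 = Task("wu_0", reported=True, signals=(1.0, 2.0))
task1 = Task("wu_1", reported=True, missed_deadline=True, signals=(1.0, 2.0))
task2 = Task("wu_2")  # reissued task, still being crunched

print(can_delete_canonical([task0, task1, task2]))  # False: _2 is outstanding

task2.reported = True
task2.signals = (1.0, 2.0)
print(check_against_canonical(task2, canonical=task0))  # valid
```

If the canonical result is dropped while the reissued task is still out, the last comparison has nothing to work with, which is the failure mode being reported here.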
ID: 899539
Byron S Goodgame
Volunteer tester
Joined: 16 Jan 06
Posts: 1145
Credit: 3,936,993
RAC: 0
United States
Message 900100 - Posted: 27 May 2009, 17:27:15 UTC - in response to Message 899539.  
Last modified: 27 May 2009, 17:29:38 UTC

The BOINC servers are supposed to wait for all sent tasks to be reported (or miss the deadline) before deleting the canonical result, so that the last results reported can be checked. Perhaps the AP validator is returning a wrong status, or perhaps its code to check another result against an already chosen canonical result has a flaw.

Hi Josef,

Thanks for the info. I guess my misunderstanding comes from thinking there wouldn't be a canonical result until all the tasks were completed.
Edit: or maybe it did, and because of the problems lately I just wasn't able to see that.
Thanks again.
ID: 900100
Cosmic_Ocean
Joined: 23 Dec 00
Posts: 3027
Credit: 13,516,867
RAC: 13
United States
Message 900127 - Posted: 27 May 2009, 18:07:25 UTC

Yeah, I fell victim to a good result getting no credit due to this same situation. Task 0 reported on time, task 1 went 2 days over the deadline, and I was the third host (task 2). I returned mine in 4 days, but got no credit because a canonical result was chosen between 0 and 1, and there was nothing for 2 to compare to.
Linux laptop:
record uptime: 1511d 20h 19m (ended due to the power brick giving-up)
ID: 900127
Byron S Goodgame
Volunteer tester
Joined: 16 Jan 06
Posts: 1145
Credit: 3,936,993
RAC: 0
United States
Message 900134 - Posted: 27 May 2009, 18:28:32 UTC - in response to Message 900127.  

Yeah, I fell victim to a good result getting no credit due to this same situation. Task 0 reported on time, task 1 went 2 days over the deadline, and I was the third host (task 2). I returned mine in 4 days, but got no credit because a canonical result was chosen between 0 and 1, and there was nothing for 2 to compare to.

The frustrating part of this for me is not knowing for sure whether I actually returned a bad result or whether it's because the late host reported before me. I don't care much for wondering if I've got a problem with my PC, so I've already run some tests and cleaned it out again. I'm now starting to believe it was just one of those things that my PC had little to do with, other than getting the task in later than the wingman.

If nothing else, I'll check a bit closer and more often on resends sent because of a "no response" wingman, and if they get theirs back in after it's been assigned to me, I can abort it and not waste any more time on it.
ID: 900134
Byron S Goodgame
Volunteer tester
Joined: 16 Jan 06
Posts: 1145
Credit: 3,936,993
RAC: 0
United States
Message 904582 - Posted: 6 Jun 2009, 23:24:20 UTC

The state file was zeroed at about 14% on resultid=1245053854; the task completed after that.
ID: 904582
Questor
Volunteer tester
Joined: 3 Sep 04
Posts: 471
Credit: 230,506,401
RAC: 157
United Kingdom
Message 905202 - Posted: 8 Jun 2009, 14:46:40 UTC
Last modified: 8 Jun 2009, 14:59:52 UTC

Don't know if this AP 5.03 workunit 441277129 has a problem or not.

Completed, validation inconclusive for two returned results, third one then issued. One seems to be stock app and mine is R112 SSE3 opt. app.

Claimed credit is identical albeit CPU time is quite different.

Opt app shows :

single pulses: 1
repetitive pulses: 0
percent blanked: 0.00

Stock app does not show any results in the stderr out section. It's been such a long time since I ran the stock app that I can't remember whether this is normal.

Same here 449120228 - had changed BOINC from 6.6.28 to 6.6.31.

and earlier 434094359

All report outcome=success.
GPU Users Group



ID: 905202
Josef W. Segur
Volunteer developer
Volunteer tester
Joined: 30 Oct 99
Posts: 4504
Credit: 1,414,761
RAC: 0
United States
Message 905228 - Posted: 8 Jun 2009, 16:01:15 UTC

The stock AP app doesn't report any signal counts in stderr. Having them would be some help in analyzing these situations, though without the details in the two result files to look at there's no way to really tell why the comparison was inconclusive.

I note that all 3 you've flagged have at least one repetitive pulse found, while 3 of the 4 in your task list which validated at first try have no repetitive pulses. The 5.03 AP_v5 apps (both stock and optimized) have a slightly imprecise method of calculating the frequencies for pulse folding, which has been fixed for 5.05+; it might be the cause of these inconclusive comparisons. If that's the case, the differences are scientifically meaningless; in effect the validator is insisting on a level of accuracy which can't always be met with 5.03 code.
                                                               Joe
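As a rough illustration of how a too-strict tolerance can turn a harmless difference into an inconclusive comparison, here is a small Python sketch. The frequency values and both tolerances are made up for the example and the function name is hypothetical; none of it is taken from the actual AP validator.

```python
import math


def frequencies_agree(f_a: float, f_b: float, rel_tol: float) -> bool:
    """Hypothetical check: do two reported fold frequencies match within rel_tol?"""
    return math.isclose(f_a, f_b, rel_tol=rel_tol)


# Made-up values: two apps report nearly the same frequency for one pulse.
freq_stock = 1000.000
freq_opt = 1000.002  # relative difference of about 2e-6

print(frequencies_agree(freq_stock, freq_opt, rel_tol=1e-7))  # False
print(frequencies_agree(freq_stock, freq_opt, rel_tol=1e-5))  # True
```

The same pair of values passes or fails purely on the tolerance chosen, so a mismatch at the strict setting says nothing about whether either result is scientifically wrong.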
ID: 905228
Questor
Volunteer tester
Joined: 3 Sep 04
Posts: 471
Credit: 230,506,401
RAC: 157
United Kingdom
Message 905270 - Posted: 8 Jun 2009, 19:03:41 UTC - in response to Message 905228.  

The stock AP app doesn't report any signal counts in stderr. Having them would be some help in analyzing these situations, though without the details in the two result files to look at there's no way to really tell why the comparison was inconclusive.

I note that all 3 you've flagged have at least one repetitive pulse found, while 3 of the 4 in your task list which validated at first try have no repetitive pulses. The 5.03 AP_v5 apps (both stock and optimized) have a slightly imprecise method of calculating the frequencies for pulse folding, which has been fixed for 5.05+; it might be the cause of these inconclusive comparisons. If that's the case, the differences are scientifically meaningless; in effect the validator is insisting on a level of accuracy which can't always be met with 5.03 code.
                                                               Joe


I only spotted these out of idle curiosity when looking to see the reason for some longish outstanding AP WUs so don't know how many more have gone unnoticed or how big a problem it is.

I assume it is then pot luck as to which app returns the next result whether it is a better match for mine or the other one.

I haven't seen any notices about the new release - is it still a way off yet?


John.

GPU Users Group



ID: 905270
Kathy
Joined: 5 Jan 03
Posts: 338
Credit: 27,877,436
RAC: 0
United States
Message 909735 - Posted: 21 Jun 2009, 1:14:51 UTC - in response to Message 875031.  
Last modified: 21 Jun 2009, 1:17:20 UTC

I've had a few AP's not accepted but am not understanding why. This one was originally marked validation inconclusive and then invalid when the third one came in. It looks substantially like the original in the quorum and the outcome was success.

http://setiathome.berkeley.edu/workunit.php?wuid=450431622

Thanks!

Never mind, it says it can't find the WU from the link; it was deleted as I typed.
ID: 909735
Evil Decepticon
Joined: 12 Jul 09
Posts: 23
Credit: 15,847
RAC: 0
Spain
Message 919300 - Posted: 19 Jul 2009, 11:50:56 UTC

Astropulses are HUGE; my computer takes a long time to process just one, more than 24 hours. Amazing.
ID: 919300
Questor
Volunteer tester
Joined: 3 Sep 04
Posts: 471
Credit: 230,506,401
RAC: 157
United Kingdom
Message 919320 - Posted: 19 Jul 2009, 12:59:29 UTC - in response to Message 909735.  

I've had a few AP's not accepted but am not understanding why. This one was originally marked validation inconclusive and then invalid when the third one came in. It looks substantially like the original in the quorum and the outcome was success.

http://setiathome.berkeley.edu/workunit.php?wuid=450431622

Thanks!

Never mind, it says it can't find the WU from the link; it was deleted as I typed.


Your computers are hidden so I can't see what is occurring and, as you say, that result has now gone. Have you added V5.05 to your app_info.xml? If so, have you also added the new V5.05 app or just reused the V5.03 app? The results from the two are incompatible so won't validate.
GPU Users Group



ID: 919320
StanJaz
Volunteer tester
Joined: 31 Jul 99
Posts: 6
Credit: 13,299,316
RAC: 0
United States
Message 923045 - Posted: 1 Aug 2009, 21:24:06 UTC - in response to Message 900127.  

I have had the same thing happen to me before. I received another AP unit with a late wingman, so I bookmarked the workunit. I checked the page today and it had been returned over a month late and received credit. Since I would rather lose the 3 hours I had already worked on it than the full 16 to 18 hours, I aborted it. I do not usually abort workunits, but in this case it was the right thing to do. It was sent to a 450 MHz computer. Why?

http://setiweb.ssl.berkeley.edu/workunit.php?wuid=438140058

Stan

Here is the 1 month late wingman's info.

Owner: boris
Created: 24 Nov 2005 15:16:13 UTC
Total credit: 43,951
Average credit: 195.50
Cross project credit
CPU type: GenuineIntel x86 Family 6 Model 5 Stepping 2 450MHz
Number of processors: 2
Operating System: Microsoft Windows XP Professional Edition, Service Pack 2, (05.01.2600.00)


Name: ap_19mr09ab_B4_P1_00021_20090430_05961.wu_3
Workunit: 438140058
Created: 30 May 2009 10:57:58 UTC
Sent: 30 May 2009 10:58:00 UTC
Received: 1 Aug 2009 4:10:09 UTC
Server state: Over
Outcome: Success
Client state: Done
Exit status: 0 (0x0)
Computer ID: 1768419
Report deadline: 29 Jun 2009 10:58:00 UTC
Run time: 4752053
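
The figures in this task record can be checked with a few lines of Python. This is just arithmetic on the timestamps above; the only assumption is that the Run time field is in seconds.

```python
from datetime import datetime

# Timestamps copied from the task record (all UTC).
sent = datetime(2009, 5, 30, 10, 58, 0)
deadline = datetime(2009, 6, 29, 10, 58, 0)
received = datetime(2009, 8, 1, 4, 10, 9)
run_time_seconds = 4752053  # assumed to be seconds

deadline_window = deadline - sent
lateness = received - deadline
run_time_days = run_time_seconds / 86400

print(deadline_window.days)      # 30
print(lateness)                  # 32 days, 17:12:09
print(round(run_time_days, 1))   # 55.0
```

So the task had a 30 day window, came back a little under 33 days after the deadline, and the reported run time works out to about 55 days.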
ID: 923045
Josef W. Segur
Volunteer developer
Volunteer tester
Joined: 30 Oct 99
Posts: 4504
Credit: 1,414,761
RAC: 0
United States
Message 923174 - Posted: 2 Aug 2009, 15:13:26 UTC - in response to Message 923045.  

I have had the same thing happen to me before. I received another AP unit with a late wingman, so I bookmarked the workunit. I checked the page today and it had been returned over a month late and received credit. Since I would rather lose the 3 hours I had already worked on it than the full 16 to 18 hours, I aborted it. I do not usually abort workunits, but in this case it was the right thing to do. It was sent to a 450 MHz computer. Why?
...

Based on the rsc_fpops_est in the WU and the host's benchmarks, time stats, and Duration Correction Factor (DCF), the BOINC Scheduler judged the host could do the task in 23 days or less. The combination of PIII hosts benchmarking rather high relative to actual S@H crunching capability and DCF based on MB work rather than AP work made the estimate wrong, so it actually took 55 days using Astropulse v5 5.03.

That long run will have more than doubled the DCF so that host won't get any more AP work until MB work gradually reduces it again. When it does, it'll be AP_v505 work which has the same rsc_fpops_est but the application is faster; the host might still miss deadline but by a smaller amount.

The factor of 1.3 in BOINC (called hard_app) by which the raw estimate for AP work is increased before comparing to the 30 day deadline was clearly inadequate for this case. There was also a change made in the ap_splitter code to 25 day deadline which has only been applied at SETI Beta so far.

Kudos to you for allowing that host to complete and get credit, and for not wasting your host's time just to get credits; as you say, it was the right thing to do.
                                                               Joe
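The numbers in this explanation hang together if the check is a simple "raw estimate times 1.3 must fit inside the deadline" comparison. The sketch below assumes exactly that; the function name is made up, and the real scheduler logic is certainly more involved.

```python
HARD_APP_FACTOR = 1.3  # safety factor mentioned above


def fits_deadline(raw_estimate_days: float, deadline_days: float) -> bool:
    """Assumed form of the check: padded estimate must not exceed the deadline."""
    return raw_estimate_days * HARD_APP_FACTOR <= deadline_days


# Largest raw estimate that still passes for each deadline.
max_raw_30 = 30 / HARD_APP_FACTOR   # about 23.1 days
max_raw_25 = 25 / HARD_APP_FACTOR   # about 19.2 days

print(fits_deadline(23, 30))   # True: 23 * 1.3 = 29.9 days
print(fits_deadline(23, 25))   # False under the 25 day deadline
print(round(55 / 23, 2))       # 2.39: actual time vs. the 23 day estimate
```

With a 30 day deadline, anything estimated at about 23 days or less gets through, which matches the figure quoted above; the 55 day run is roughly 2.4 times that estimate.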
ID: 923174
Henk Haneveld
Volunteer tester
Joined: 16 May 99
Posts: 154
Credit: 1,577,293
RAC: 1
Netherlands
Message 923180 - Posted: 2 Aug 2009, 16:24:10 UTC - in response to Message 923174.  

The factor of 1.3 in BOINC (called hard_app) by which the raw estimate for AP work is increased before comparing to the 30 day deadline was clearly inadequate for this case. There was also a change made in the ap_splitter code to 25 day deadline which has only been applied at SETI Beta so far.


I find it strange that AP deadlines might get shorter when the MB deadlines have been doubled with the recent change.

There are now MB results with a deadline of something like 50 days and a runtime less than a quarter of an AP result's.

It would be better to cut back on the deadlines for MB.
ID: 923180
Josef W. Segur
Volunteer developer
Volunteer tester
Joined: 30 Oct 99
Posts: 4504
Credit: 1,414,761
RAC: 0
United States
Message 923204 - Posted: 2 Aug 2009, 17:57:06 UTC - in response to Message 923180.  

The factor of 1.3 in BOINC (called hard_app) by which the raw estimate for AP work is increased before comparing to the 30 day deadline was clearly inadequate for this case. There was also a change made in the ap_splitter code to 25 day deadline which has only been applied at SETI Beta so far.

I find it strange that AP deadlines might get shorter when the MB deadlines have been doubled with the recent change.

There are now MB results with a deadline of something like 50 days and a runtime less than a quarter of an AP result's.

It would be better to cut back on the deadlines for MB.

The AP deadline reduction was specifically intended to ensure a slow host would not get AP work when first attaching to the project. As the scenario also involved a BOINC version change, that should be a rare occurrence but still worth avoiding. Overall, the effect would be that some hosts which are now eligible for both MB and AP could only get MB after the change.

Reducing MB deadlines would eject hosts from the project. You're entitled to your own opinion on whether that's desirable. I think it isn't. My guess is the project will only move that direction if too many users get greedy and increase their caches to the maximum which has become practical with the new minimum deadline of nearly 14 days.
                                                                 Joe
ID: 923204


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.