Astropulse, next workunit selection queue

Jim Wright Joined: 3 Sep 99 Posts: 36 Credit: 54,761,763 RAC: 17
Message 795760 - Posted: 10 Aug 2008, 16:18:42 UTC
The addition of Astropulse workunits to work queues on machines with multiple processors has a small problem. Some adjustment to the order of selection in my local work queue would be nice, i.e., since Astropulse (AP) workunits take significantly longer to process than SETI Enhanced workunits, one AP workunit should be shuffled to the head of the queue. This would permit at least one AP workunit to be processed concurrently with the larger list of shorter SETI Enhanced workunits.
The present method permits the SETI Enhanced workunits to be exhausted before AP workunits can be selected for processing. When multiple processors are available and workunit distribution ceases (frequently on weekends), the AP workunit soon becomes the ONLY workunit being processed, and much of my available capacity remains idle until regular distribution is restored (usually on Monday).
Thanks for listening,
Jim Wright
ID: 795760
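The reordering requested above can be sketched in a few lines. This is only an illustration of the proposal, not how the BOINC client selects work; the `Task` class, the application labels, and the function name are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    app: str          # hypothetical labels: "astropulse" or "setiathome_enhanced"
    est_hours: float  # estimated run time


def promote_one_long_task(queue, long_app="astropulse"):
    """Return a new queue with the first task of `long_app` moved to the front.

    The remaining tasks keep their original relative order. If no such task
    exists, an unchanged copy of the queue is returned.
    """
    for i, task in enumerate(queue):
        if task.app == long_app:
            return [task] + queue[:i] + queue[i + 1:]
    return list(queue)


if __name__ == "__main__":
    queue = [
        Task("mb_1", "setiathome_enhanced", 1.5),
        Task("mb_2", "setiathome_enhanced", 1.5),
        Task("ap_1", "astropulse", 40.0),
    ]
    print([t.name for t in promote_one_long_task(queue)])
```

With one AP task at the head, a multi-core machine would start it alongside the shorter tasks instead of leaving it until it is the only work left.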

1mp0£173 Volunteer tester Joined: 3 Apr 99 Posts: 8423 Credit: 356,897 RAC: 0
Message 795771 - Posted: 10 Aug 2008, 16:33:06 UTC - in response to Message 795760.
> The addition of Astropulse workunits to work queues on machines with multiple processors has a small problem. Some adjustment to the order of selection in my local work queue would be nice, i.e., since Astropulse (AP) workunits take significantly longer to process than SETI Enhanced workunits, one AP workunit should be shuffled to the head of the queue. This would permit at least one AP workunit to be processed concurrently with the larger list of shorter SETI Enhanced workunits.
> The present method permits the SETI Enhanced workunits to be exhausted before AP workunits can be selected for processing. When multiple processors are available and workunit distribution ceases (frequently on weekends), the AP workunit soon becomes the ONLY workunit being processed, and much of my available capacity remains idle until regular distribution is restored (usually on Monday).
> Thanks for listening, Jim Wright
I think you'll find that, given a little time, the current BOINC logic will tend to do this -- not explicitly, but simply through how the random mix of work stacks up.
Also remember that BOINC is used by multiple projects, so if a change helps SETI but messes up some other project, it's less likely to happen.
Try (mid-week) setting your "report every" interval to 1, and your "extra days" setting to 10. That way you should always have at least 9 days of work on your machine, and a 2-day weekend outage won't be that significant.
ID: 795771

frodo Joined: 24 Jul 08 Posts: 4 Credit: 680,614 RAC: 0
Message 795779 - Posted: 10 Aug 2008, 16:48:54 UTC - in response to Message 795760.
> The addition of Astropulse workunits to work queues on machines with multiple processors has a small problem. Some adjustment to the order of selection in my local work queue would be nice, i.e., since Astropulse (AP) workunits take significantly longer to process than SETI Enhanced workunits, one AP workunit should be shuffled to the head of the queue. This would permit at least one AP workunit to be processed concurrently with the larger list of shorter SETI Enhanced workunits.
> The present method permits the SETI Enhanced workunits to be exhausted before AP workunits can be selected for processing. When multiple processors are available and workunit distribution ceases (frequently on weekends), the AP workunit soon becomes the ONLY workunit being processed, and much of my available capacity remains idle until regular distribution is restored (usually on Monday).
> Thanks for listening, Jim Wright
The method I use: if I have Astropulse WUs and several MB WUs, I right-click and drag across all the WUs and suspend them. I then resume the Astropulse WU, and then resume the rest the same way. I have never received more than one at a time so far, so they always get this treatment. Also go for the mid-week download increase.
Michael.
ID: 795779

1mp0£173 Volunteer tester Joined: 3 Apr 99 Posts: 8423 Credit: 356,897 RAC: 0
Message 795790 - Posted: 10 Aug 2008, 17:20:58 UTC - in response to Message 795779.
> The method I use: if I have Astropulse WUs and several MB WUs, I right-click and drag across all the WUs and suspend them. I then resume the Astropulse WU, and then resume the rest the same way. I have never received more than one at a time so far, so they always get this treatment. Also go for the mid-week download increase.
> Michael.
Michael,
What happens if you just leave them alone?
-- Ned
ID: 795790

Richard Haselgrove Volunteer tester Joined: 4 Jul 99 Posts: 14650 Credit: 200,643,578 RAC: 874
Message 795791 - Posted: 10 Aug 2008, 17:24:09 UTC - in response to Message 795771. Last modified: 10 Aug 2008, 17:24:28 UTC
> Try (mid-week) setting your "report every" interval to 1, and your "extra days" setting to 10. That way you should always have at least 9 days of work on your machine, and a 2 day weekend outage won't be that significant.
I would be cautious about going for the full 10 days, even if SETI is your only project.
There's still a bit of an imbalance between the task-duration estimates for MultiBeam and Astropulse, especially if you use an optimised MB application. And we don't yet know how the work-flow balance will settle down when the project (eventually) gets back on an even keel.
The worst-case scenario is a long run of VHAR 'shorty' MB tasks. With a 10-day cache setting, these will run in EDF, and drive your duration correction factor down - to about 0.1 on a Core2. When the last one passes through the system, you'll put in a massive work request. If that happens to coincide with a batch of AP at the front of the queue, BOINC will accept what it thinks is 10 days of work (the deadline allows that), but will find that it takes 40 days to crunch them (because the correction factor for AP on the same machine should be 0.4). That's a lot of deadline misses.
Please don't advise people to increase their caches too far, Ned, at least in these early days of AP. 3 or 4 days should be enough to achieve the objective, without risking problems further down the line.
ID: 795791
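The arithmetic in the worst-case scenario above can be checked with a short sketch. It assumes, as the post implies, that BOINC's run-time estimate is the raw estimate multiplied by the duration correction factor (DCF); the function name and parameters are hypothetical.

```python
def true_days_of_work(cache_days, dcf_in_effect, dcf_actual):
    """Days a fetched cache really takes when the DCF at fetch time was wrong.

    Assumption: estimated time = raw estimate * DCF.
    """
    # At fetch time BOINC believed: raw_estimate * dcf_in_effect == cache_days
    raw_estimate_days = cache_days / dcf_in_effect
    # The work really takes: raw_estimate * dcf_actual
    return raw_estimate_days * dcf_actual


if __name__ == "__main__":
    # DCF driven down to 0.1 by shorties; the right value for AP is 0.4.
    for cache in (10, 5, 3):
        days = true_days_of_work(cache, dcf_in_effect=0.1, dcf_actual=0.4)
        print(f"{cache}-day cache -> about {days:.0f} days of AP work")
```

With these figures the work is four times larger than BOINC expects, so a 10-day cache becomes about 40 days and a 3-day cache about 12 days.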

1mp0£173 Volunteer tester Joined: 3 Apr 99 Posts: 8423 Credit: 356,897 RAC: 0
Message 795793 - Posted: 10 Aug 2008, 17:40:04 UTC - in response to Message 795791.
>> Try (mid-week) setting your "report every" interval to 1, and your "extra days" setting to 10. That way you should always have at least 9 days of work on your machine, and a 2 day weekend outage won't be that significant.
> I would be cautious about going for the full 10 days, even if SETI is your only project.
> <snip>
> Please don't advise people to increase their caches too far, Ned, at least in these early days of AP. 3 or 4 days should be enough to achieve the objective, without risking problems further down the line.
I agree that this could cause the average cruncher some difficulties, but....
As an experiment, you can stop BOINC, hand-edit the duration correction factor to something very small (like 100 times smaller) and restart BOINC. BOINC will massively over-estimate the amount of work needed. When it finishes the first WU it will start adjusting DCF "up" and fairly quickly realize it has deadline issues and start in on the work in deadline order. It recovers well.
Besides, we aren't talking about the average cruncher here, we're talking about the people who are obsessing over watching what BOINC does, and who have an overwhelming need to "run" BOINC (as opposed to just letting it run). The ones who don't watch BOINC closely don't have kittens just because the CPU is idle for an hour.
So, based on some experimentation, I don't see 10 days as a huge risk. It's a fun experiment if you want to see how BOINC handles difficult situations.
ID: 795793
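Working "in deadline order", as described above, is earliest-deadline-first (EDF) scheduling. A minimal single-CPU sketch shows why it rescues short-deadline tasks that a first-come order would miss. The task names, durations, and deadlines are made up for illustration, and this is not BOINC's actual scheduler.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    hours_left: float      # remaining run time
    deadline_hours: float  # deadline, in hours from now


def late_tasks(tasks):
    """Names of tasks that finish after their deadline when the list is run
    one task after another, in the given order, on a single CPU."""
    clock = 0.0
    late = []
    for task in tasks:
        clock += task.hours_left
        if clock > task.deadline_hours:
            late.append(task.name)
    return late


def edf_order(tasks):
    """Earliest-deadline-first: sort by deadline, soonest first."""
    return sorted(tasks, key=lambda t: t.deadline_hours)


if __name__ == "__main__":
    queue = [
        Task("ap_1", 40.0, 720.0),     # long task, distant deadline
        Task("shorty_1", 0.5, 30.0),   # short tasks, near deadlines
        Task("shorty_2", 0.5, 35.0),
    ]
    print("queue order, late:", late_tasks(queue))
    print("EDF order, late:  ", late_tasks(edf_order(queue)))
```

In queue order both shorties finish behind the 40-hour task and miss their deadlines; in EDF order nothing is late.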

Richard Haselgrove Volunteer tester Joined: 4 Jul 99 Posts: 14650 Credit: 200,643,578 RAC: 874
Message 795795 - Posted: 10 Aug 2008, 17:53:40 UTC - in response to Message 795793.
> .... As an experiment, you can stop BOINC, hand-edit the duration correction factor to something very small (like 100 times smaller) and restart BOINC. BOINC will massively over-estimate the amount of work needed. When it finishes the first WU it will start adjusting DCF "up" and fairly quickly realize it has deadline issues and start in on the work in deadline order.
> .... So, based on some experimentation, I don't see 10 days as a huge risk. It's a fun experiment if you want to see how BOINC handles difficult situations.
With the caveat that it's a fun experiment, undertaken at your own risk, I have no problem with the advice. But I would still add two comments:
1) It won't "start" adjusting DCF upwards after the first WU - it'll do it in one big jump. But....
2) Where AP is concerned, that first WU won't be done for 40 hours or more - and BOINC doesn't adjust DCF until the first WU finishes.
ID: 795795
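The two comments above describe an asymmetric update: an under-estimate is corrected in one step, but only when a task finishes. A minimal sketch of that behaviour follows. The function name, the use of actual time over raw estimate as the observed ratio, and the 10% downward smoothing weight are assumptions for illustration, not taken from the BOINC source.

```python
def update_dcf(dcf, raw_estimate_hours, actual_hours, down_weight=0.1):
    """Return the new duration correction factor after one task finishes.

    Assumed model:
      * observed ratio = actual run time / raw (uncorrected) estimate
      * ratio above the current DCF: jump straight up to it
      * ratio below the current DCF: drift down by a weighted average
    """
    ratio = actual_hours / raw_estimate_hours
    if ratio > dcf:
        return ratio  # one big jump upwards
    return (1.0 - down_weight) * dcf + down_weight * ratio


if __name__ == "__main__":
    dcf = 0.1  # driven down by a long run of shorties
    # An AP task with a 100-hour raw estimate finishes after 40 hours.
    dcf = update_dcf(dcf, raw_estimate_hours=100.0, actual_hours=40.0)
    print("after the first AP task:", dcf)
    # A shorty with a 10-hour raw estimate then finishes in 1 hour.
    dcf = update_dcf(dcf, raw_estimate_hours=10.0, actual_hours=1.0)
    print("after one shorty:", round(dcf, 3))
```

Under this model nothing changes during the 40 hours the first AP task is running, which is the point of the second comment.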

arkayn Volunteer tester Joined: 14 May 99 Posts: 4438 Credit: 55,006,323 RAC: 0
Message 795797 - Posted: 10 Aug 2008, 17:58:38 UTC - in response to Message 795795.
>> .... As an experiment, you can stop BOINC, hand-edit the duration correction factor to something very small (like 100 times smaller) and restart BOINC. BOINC will massively over-estimate the amount of work needed. When it finishes the first WU it will start adjusting DCF "up" and fairly quickly realize it has deadline issues and start in on the work in deadline order.
>> .... So, based on some experimentation, I don't see 10 days as a huge risk. It's a fun experiment if you want to see how BOINC handles difficult situations.
> With the caveat that it's a fun experiment, undertaken at your own risk, I have no problem with the advice. But I would still add two comments:
> 1) It won't "start" adjusting DCF upwards after the first WU - it'll do it in one big jump. But....
> 2) Where AP is concerned, that first WU won't be done for 40 hours or more - and BOINC doesn't adjust DCF until the first WU finishes.
The worst part is that AP also messes up the MB run-time estimates. What are 20-minute shorties on my system (and still take 20 minutes) are now estimated at 30 minutes.
ID: 795797

1mp0£173 Volunteer tester Joined: 3 Apr 99 Posts: 8423 Credit: 356,897 RAC: 0
Message 795911 - Posted: 10 Aug 2008, 20:39:28 UTC - in response to Message 795795.
>> .... As an experiment, you can stop BOINC, hand-edit the duration correction factor to something very small (like 100 times smaller) and restart BOINC. BOINC will massively over-estimate the amount of work needed. When it finishes the first WU it will start adjusting DCF "up" and fairly quickly realize it has deadline issues and start in on the work in deadline order.
>> .... So, based on some experimentation, I don't see 10 days as a huge risk. It's a fun experiment if you want to see how BOINC handles difficult situations.
> With the caveat that it's a fun experiment, undertaken at your own risk, I have no problem with the advice. But I would still add two comments:
> 1) It won't "start" adjusting DCF upwards after the first WU - it'll do it in one big jump. But....
> 2) Where AP is concerned, that first WU won't be done for 40 hours or more - and BOINC doesn't adjust DCF until the first WU finishes.
Richard,
As long as you get a mix of AP and Multibeam, there is a chance that DCF won't adjust for 40 hours, but it doesn't seem very likely.
That would only be true if you had one AP for each core, and BOINC handed each core an AP, and it would only stay true if deadlines were not approaching. The shorties have short deadlines, and at some point BOINC will stop the "long" AP to give time to the shorties to meet their deadlines. Then DCF will start adjusting.
My experiments made my computer look 100 times faster -- your concern was that 10 days of cache would actually be 40 days, which is only four times.
I don't have a problem with 5 days if you think it is more prudent, but for your example it could still be 20 days' worth of work, and still force "deadline pressure".
My conclusion based on my experiments: it's really hard to get BOINC to miss deadlines.
-- Ned
ID: 795911

©2024 University of California

SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.