4.42 has been posted.
Author | Message |
---|---|
1mp0£173 Joined: 3 Apr 99 Posts: 8423 Credit: 356,897 RAC: 0 |
At this time, the stable release is probably better for modem connections, or you can keep running 4.42 and reporting so the folks working on it can see how it looks from where you are. Tony, here is the problem as I see it: Everyone who is arguing that the new scheduler is badly broken is arguing that they don't see the "proper short-term mix" in their queue. They're right. The problem is that under the old scheduler, you could easily pass deadlines, especially with projects like Einstein where they have a longish work unit and a short deadline. People get mad when work is done, but they don't get credit because they missed a deadline. So we have two incompatible goals: process everything in a strict order and strict proportion, but don't let work go over deadline. The new scheduler reconciles them by allowing one project to get priority as long as the deadlines are approaching fast, and then effectively locks that project out until the others are caught up. Your dialup issue is a little different: BOINC doesn't download to fill the cache when it is worried about deadlines, and that probably still needs tuning. If you get an official answer, it will probably be in the form of a newer release that works more smoothly in your environment. -- Ned P.S. With as much effort as folks put into BOINC, farms, etc., I wonder why the really dedicated crunchers don't use a cable/DSL router with "dial backup" and just live on the backup feature -- no DSL or cable at all. |
John McLeod VII Joined: 15 Jul 99 Posts: 24806 Credit: 790,712 RAC: 0 |
Okay, I found a quirk. There is a CPU scheduler policy (nearest deadline or highest ST debt first) and there is also a download policy (allowed or not allowed), and they are triggered for distinct reasons. Nearest deadline CPU mode occurs if: 1) A result is due within 24 hours, or 2) A result is due within 2 * the queue size, or 3) Arrange the results in deadline order and start adding the remaining processing times. If at any point the remaining processing time for the WUs with earlier deadlines is greater than 80% of the time to that deadline, the host is overcommitted. No new work is allowed if: 1) See #3 above. 2) Add the required fractions of time until the deadline of each WU. If the total is greater than 0.8, do not allow more work to be downloaded. 3) There are more than x (default 5) projects with work on the system. The default can be changed by editing the global_prefs.xml file. Work will be requested in any case if: 1) A CPU is out of work. 2) Less than the min queue amount of work total for all projects is on the system. Work will not be requested from a project if it has a negative LT debt unless a CPU is out of work. (This is the resource share balancing mechanism.) BOINC WIKI |
Jord Joined: 9 Jun 99 Posts: 15184 Credit: 4,362,181 RAC: 3 |
John, The LTD and STD amounts are fun enough and I hope you can teach everyone about them in due time, but please then take out the resource shares, as they are being overridden about 90% of the time anyway! Or give us the old BOINC as an option, where we, the users, dictate what project should crunch first, by setting the resource shares and a considerable connect to server time. |
John McLeod VII Joined: 15 Jul 99 Posts: 24806 Credit: 790,712 RAC: 0 |
"4.42 is now ignoring the resource share preferences set. If I want to run a project more than anything else, I get buffed out by the LTD and STD of other projects attached. I was to be running only Einstein, Seti & CPDN units and none from the project I have set to take most (as its LTD was -70,000 and a bit), so BOINC decides I can't download new units for this project..." The resource share is used in calculating the new debt from the old debt. If a project has a huge negative LT debt, it must have been doing some extra crunching recently (-70,000 seconds is a bit less than 20 hours). debt += wall time * resource share - project CPU time. BOINC WIKI |
Astro Joined: 16 Apr 02 Posts: 8026 Credit: 600,015 RAC: 0 |
"Your dialup issue is a little different: BOINC doesn't download to fill the cache when it is worried about deadlines, and that probably still needs tuning." Ned, I agree with everything you say. Except, the dial up issue needs more than a little tuning. I only had 3 minutes left of a WU that wasn't due for 12 more days, and it still wouldn't download new work. I don't think the scheduler was "worried about deadlines". If it was, they better get to work on the definition of "worried". Unless they know of this problem I don't see a fix making it into the next version. I won't fix the dishwasher unless my family tells me it's broke. They also need to fix the problem that lets PPAH and LHC continue to download even with HIGH negative LTD. |
Jord Joined: 9 Jun 99 Posts: 15184 Credit: 4,362,181 RAC: 3 |
"If a project has a huge negative LT debt, it must have been doing some extra crunching recently (-70,000 seconds is a bit less than 20 hours)." Uhuh... 3 Sshh units of 9 hours each (on my PC), 2.5 of them that it had to run in the past 24 hours in deadline mode... Because BOINC doesn't follow separate preferences anymore. So it didn't run the units for 2 hours against 1 hour request for all the other projects. But okay, blame me. Blame my setting for 0.2 days, so I get 3 units for Sshh and 2 units for every other project that has 8 hour units... I'll admit, it's me! |
John McLeod VII Joined: 15 Jul 99 Posts: 24806 Credit: 790,712 RAC: 0 |
When a project is processing, its LT and ST debts will be reduced. When it is not processing, its LT and ST debts will be increased. ST debt is only calculated for projects that have work to be processed on the computer. When the host enters process earliest first mode, the LT and ST debt of the project running will be reduced, and all others will be increased. The time will be made up later by intentionally not allowing download of new work - so resource share balancing is deferred, but not dropped. 4.42 is supposed to fill the cache from any project with a positive LT debt until the cache is at or above min cache for all projects. The problem is if all projects with a positive debt are not handing out work. Resetting a project will move its debt closer to 0. (It actually is set to 0, and then the LT debts are shifted so that the mean is 0 again - moving the debt for the just reset project away from 0). As to the reason for going into earliest deadline first if a deadline is within 2 * the min queue: if a result is due within the current min queue time, it is going to be reported late. If it is within 2 * the min queue, it can be reported on time only if it is completed by the next connection. The resource sharing will balance better if the min queue setting is about the same as the actual duration between connections. The only time that a dial up user should have trouble is if all of the projects with positive debt are not handing out work at the moment. Otherwise, (s)he should be able to fill the cache from the projects with positive debt. I am still thinking about how to fix the problem with modem users and no positive debt projects with work without giving up on resource shares. BOINC WIKI |
Contact Joined: 16 Jan 00 Posts: 197 Credit: 2,249,004 RAC: 0 |
"I am still thinking about how to fix the problem with modem users and no positive debt projects with work without giving up on resource shares." Well sir. Thanks for still thinking. Thanks for your previous thinking. Your contribution to the intelligence of BOINC is large. Thanks also for your forum input. Keep it up! |
The Gas Giant Joined: 22 Nov 01 Posts: 1904 Credit: 2,646,654 RAC: 0 |
Just a couple of comments about the posts below; ---------- Ned Ludd wrote: "The problem is that under the old scheduler, you could easily pass deadlines, especially with projects like Einstein where they have a longish work unit and a short deadline." I have never missed a deadline on any project (unless I have had a computer problem and it has been down for a period of time), so to me the old scheduler worked sufficiently well IF the cache size was kept to about a maximum of half the deadline time. So since I run Predictor (as well as LHC and SAH on this machine) I like to set my cache at 4 days. This also works well for weekend outages, and since they overestimate their completion times I really only get 3 days of work anyway. I run Einstein, LHC and Predictor on another machine. And yes, with changing resource share between the projects I got close to a deadline with Einstein (really only because they underestimate the WU completion time and I had too much work - but it would have been fine if I hadn't altered the resource share). ----------------------- John McLeod wrote: "Nearest deadline CPU mode occurs if: 1) A result is due within 24 hours, or 2) A result is due within 2 * the queue size, or 3) Arrange the results in deadline order and start adding the remaining processing times. If at any point the remaining processing time for the WUs with earlier deadlines is greater than 80% of the time to that deadline, the host is overcommitted." Now I see my specific problem: I set my cache size to 4 days. Since I run Predictor it ends up triggering #2 and BOINC goes into deadline mode. This is specifically a problem with projects that have shorter deadlines. So all up, if someone is attached to both a short deadline project and a longer deadline project, these constraints limit the cache size at which BOINC can still honor the resource shares the user has set up and keep a cache of WUs from each project. Otherwise it will always be in deadline mode and then LT debt mode, and the shorter deadline project will crunch first and then not get any more WUs until the LT debt of the longer deadline project has been satisfied. Wow, convoluted and not very usable. Maybe #2 can be shortened to 1.1 to 1.5 times the queue length to make it more usable for now and fine tune it later. Development, gotta love it! Live long and crunch! Paul. |
1mp0£173 Joined: 3 Apr 99 Posts: 8423 Credit: 356,897 RAC: 0 |
"Just a couple of comments about the posts below;" Things would be a whole lot easier if everyone studied and set things up reasonably -- your settings are fine. I remember a post (quite a while ago) where the user was having trouble getting and reporting work, so they kept raising what is now "connect every 'x' days" and, I think, according to their last post had it near 100 days. That was about the time we got the 10 day limit. It is entirely possible for someone to crunch Einstein and set BOINC to connect every 10 days. The new scheduler handles that. If you crunch more than one project, I don't think there is much difference between connect every 4 days and connect every 1/4 days. The long-term-debt mechanism works well with a short cache. |
Daykay Joined: 18 Dec 00 Posts: 647 Credit: 739,559 RAC: 0 |
I'd have to say with the new developments in scheduling, I have been lowering my "Connect every x days" preference in an attempt to avoid so much time spent in Panic mode. At this time I have gone from connecting every 3 days down to 1 day. This has still failed to fix my problems with the new scheduler. One of my projects (Shh) has 33% resource share, and the WUs have very short deadlines and each WU takes around 8 hours to complete. Thus I am often in Panic mode. Since I have changed to a more frequent connection time, BOINC holds fewer WUs in cache. Pirates also has a 33% resource share and is frequently requesting more work (even though I am running in Panic mode). CPDN, for which I now have 2 WUs, is the next most highly resourced project with a share of 10%. These WUs manage to get some CPU cycles after I have reported a batch of my Shh WUs. It might even be as much as 50% over the last 140 hours, though probably slightly less. Einstein, LHC and SETI, with resource shares of 7.67%, 7.33% and 8.33% respectively, have not downloaded any new WUs (as far as I can tell) during the last 140 hours (approx). So to sum up, it looks like Shh is getting slightly more than it should. CPDN is getting much more than it should. Einstein, LHC and SETI are getting ripped off. Pirates is behaving as I would like, though I'm not sure it should be asking for work while the client is in Panic mode. (edit) Oh and some further information, for the record: I have an always-on ADSL connection, so I care not how often it connects to the servers. Kolch - Crunching for the BOINC@Australia team since July 2004. Search for your own intelligence... |
ksnash Joined: 28 Nov 99 Posts: 402 Credit: 528,725 RAC: 0 |
Well I guess I will be crunching on einstein until Setiathome goes into deadline mode. I have gained back the two days for einstein but nothing else is running. |
Anthony Brixey Joined: 24 Jun 00 Posts: 102 Credit: 1,757,916 RAC: 0 |
"The resource sharing will balance better if the min queue setting is about the same as the actual duration between connections." BOINC is trying to do two things with one number. It would be better to have a ‘Cache size’ as well as a ‘Connect every’ setting. Anthony |
Martin P. Joined: 19 May 99 Posts: 294 Credit: 27,230,961 RAC: 2 |
I agree on all counts. Since 4.35 I've been asking (on the forums) if resource share was respected in "panic" mode, but have NEVER got an answer. I see Rom and JM7 respond to questions before and after mine. They leave mine alone. I'm wondering if it's because of the uproar the real answer would cause. Ned, but this is exactly the problem, especially with SETI@Home and Einstein@Home that have different deadlines (2 weeks vs. 1 week). My client finished a work unit for E@H correctly and also reported it. Then it started with SETI@Home, which has a deadline of June 1st, in "Panic-mode". With this scheduling scheme it will never again download any E@H WUs since the only WUs it has are SETI@Home and these are of course always earlier than none. The Panic mode should only be activated if the deadline comes within twice the time estimated to finish a WU. |
Scottatron Joined: 15 Jul 03 Posts: 94 Credit: 220,389 RAC: 0 |
Well, I have once again downgraded, due to a machine finishing its last PP@H WU and not requesting any more. The machine had around 5 SETI WUs left, and on that machine enough to keep it busy for around 11 hours. So, with all this talk of debt accumulating etc, how can debt not be at its maximum when the project runs out of work? |
John McLeod VII Joined: 15 Jul 99 Posts: 24806 Credit: 790,712 RAC: 0 |
"Well, I have once again downgraded, due to a machine finishing its last PP@H WU and not requesting any more." The instant a project runs out of work, it is likely to have near its minimum debt as it has just finished using the CPU. If it had a large positive LT debt to begin with, it may still have a positive LT debt. If, on the other hand, it had a low LT debt, the LT debt is now likely to be negative. Leave it alone, and it should recover. BOINC WIKI |
MrMaxx Joined: 22 Apr 99 Posts: 135 Credit: 1,645,913 RAC: 1 |
I've been running 4.41 for a few days because I wanted the "no new work" feature. Unfortunately, when that feature was enabled, it wouldn't upload any results either... is that a bug or is that by design? If it's a bug, has that been fixed in the latest Dev release? Thanks. (later) Just downloaded and upgraded to 4.42. It appears that this WAS a bug and that it was fixed in this version... Thanks to the programmers for fixing it! I needed the "no new work" feature, but also wanted to be able to report my completed WUs! :-) |
Captain Avatar Joined: 17 May 99 Posts: 15133 Credit: 529,088 RAC: 0 |
"I've been running 4.41 for a few days because I wanted the "no new work" feature. Unfortunately, when that feature was enabled, it wouldn't upload any results either... is that a bug or is that by design? If it's a bug, has that been fixed in the latest Dev release?" Not a bug. Under 4.41 it wouldn't upload until 24 hrs before it's due. I believe that is corrected in 4.42. Same old disclaimer: running development versions, you run at your own risk, etc. When you download the development versions, look at the list of new features and issues. Always download the most current version before posting a problem; if it still is a problem, then post the question. Always do a clean install, i.e. under Windows use Add and Remove Programs. This will not affect the data files and you won't lose work. |
John McLeod VII Joined: 15 Jul 99 Posts: 24806 Credit: 790,712 RAC: 0 |
"I've been running 4.41 for a few days because I wanted the "no new work" feature. Unfortunately, when that feature was enabled, it wouldn't upload any results either... is that a bug or is that by design? If it's a bug, has that been fixed in the latest Dev release?" It was a bug alright. Under the wrong circumstances, it would not contact a scheduler at all. BOINC WIKI |
Captain Avatar Joined: 17 May 99 Posts: 15133 Credit: 529,088 RAC: 0 |
"It was a bug alright. Under the wrong circumstances, it would not contact a scheduler at all." I stand corrected, it was a bug. Hi John! |
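
For readers trying to follow the rules John McLeod VII lists in this thread, here is a minimal Python sketch of them. It is not the BOINC client's code. The names (`Result`, `overcommitted`, `earliest_deadline_first`, `downloads_blocked`, `update_debt`) are hypothetical, a single CPU is assumed, and `share` is assumed to be the project's resource share as a fraction between 0 and 1. The sign of the debt update is chosen so that a running project's debt goes down and an idle project's debt goes up, as described in the thread.

```python
from dataclasses import dataclass

EDF_HORIZON_S = 24 * 3600    # "a result is due within 24 hours"
COMMIT_LIMIT = 0.8           # the 80% / 0.8 threshold quoted in the thread
MAX_PROJECTS_WITH_WORK = 5   # "x (default 5)" projects with work


@dataclass
class Result:
    project: str
    remaining_s: float    # estimated CPU seconds still needed
    deadline_in_s: float  # seconds from now until the deadline


def overcommitted(results):
    """Walk results in deadline order, summing the remaining work.

    Overcommitted if the running total ever exceeds 80% of the time
    left until the deadline currently being examined.
    """
    total = 0.0
    for r in sorted(results, key=lambda r: r.deadline_in_s):
        total += r.remaining_s
        if total > COMMIT_LIMIT * r.deadline_in_s:
            return True
    return False


def earliest_deadline_first(results, min_queue_s):
    """True if any of the three nearest-deadline triggers fires."""
    horizon = max(EDF_HORIZON_S, 2 * min_queue_s)
    if any(r.deadline_in_s <= horizon for r in results):
        return True
    return overcommitted(results)


def downloads_blocked(results):
    """True if any of the three "no work allowed" conditions holds."""
    if overcommitted(results):
        return True
    # Past-due results are treated as blocking (an assumption); this also
    # avoids dividing by a zero or negative time-to-deadline below.
    if any(r.deadline_in_s <= 0 for r in results):
        return True
    load = sum(r.remaining_s / r.deadline_in_s for r in results)
    projects = {r.project for r in results}
    return load > COMMIT_LIMIT or len(projects) > MAX_PROJECTS_WITH_WORK


def update_debt(debt, cpu_time_s, wall_time_s, share):
    """Debt grows by the project's entitled time and shrinks by time used."""
    return debt + wall_time_s * share - cpu_time_s
```

With a 4-day cache, as in Paul's post, any result due within 8 days trips the second trigger, so `earliest_deadline_first([Result("predictor", 3600, 7 * 86400)], 4 * 86400)` returns `True`, while the same result with a 1-day cache does not.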
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.