Round-robin scheduling works great (BoincLogX)
Author | Message |
---|---|
MattDavis Joined: 11 Nov 99 Posts: 919 Credit: 934,161 RAC: 0 |
I installed BoincLogX yesterday to keep track of what work units I've crunched. You know, out of idle curiosity. There's a neat feature where the history section tells you how long you've spent on each project. That gave me an idea. Some people don't think that the round-robin scheduling works. Usually those are the people that stare at Boinc for 23 hours a day, and if Boinc doesn't switch between projects when the person thinks it should, they start to monkey around with the settings trying to micromanage it so that the projects switch every 6 seconds or something. Well, here's proof that if you leave it alone it'll all work out. As you can see, each project has been going for the same amount of time (my settings are all 1:1:1:1:etc). The lesson to learn? Leave Boinc alone and trust it! Its scheduler works just fine! Micromanaging just gets in the way! |
Jason Safoutin Joined: 8 Sep 05 Posts: 1386 Credit: 200,389 RAC: 0 |
I use BoincLogX and I love it :) It's a great add-on. You should try MapView with it too, Matt. "By faith we understand that the universe was formed at God's command, so that what is seen was not made out of what was visible". Hebrews 11.3 |
Bob Guy Joined: 7 Sep 00 Posts: 126 Credit: 213,429 RAC: 0 |
I must disagree! The scheduler is impressively STUPID! It must be noted that I usually have at least 4 active science apps running that have different priorities, which may cause the scheduler to act strangely. It should also be noted that there is not a deadline to be met - I do realize that an app which is nearing its deadline will get more time - this is not what is going on. I frequently find that the scheduler will start (or restart) an app only to preempt it within 2 minutes or so. The app then sits in memory for an hour or more before it is allowed to continue running - this is STUPID and a waste of memory even if it does not cause a loss of functionality! The scheduler also more often than not preempts an app when it needs only 5 minutes or so to finish - this behavior is STUPID! Boinc can certainly determine that an app needs just a couple of minutes to finish and should give it the time to finish. I have even increased the default time to switch apps in order to give more time to finish and find that Boinc still won't observe my suggestion - Boinc still prematurely preempts the running apps. My suggestions: 1. The Boinc scheduler should give a full time slice to an app once it has been started. Don't arbitrarily preempt the apps! 2. The Boinc scheduler should look at the time to finish, and if it is 5 minutes or so then it should not preempt the app. I do realize that the time to finish may be just an estimate, but in my experience it is close enough to accurate to use for this purpose. |
KB7RZF Joined: 15 Aug 99 Posts: 9549 Credit: 3,308,926 RAC: 2 |
Hmm, I have 4 projects, and never had a problem with anything switching properly. I have my preferences set to switch every 3 hours. My connect every x days is set low, my computer never goes into EDF mode. Scheduler works pretty good to me. Course I don't micro-manage my computer, it runs in the background, doesn't cause any problems with my day to day use of my computer. No problems. |
Jim Joined: 28 Jan 00 Posts: 614 Credit: 2,031,206 RAC: 0 |
Gotta agree with Bob Guy on one point. It's always seemed odd to me that a unit is left in memory with only a very few minutes left to go. Though I'm sure it's cached to virtual memory, silly is silly. Also, starting a project and then switching to another after only a few minutes seems odd to me. Oh well. I hardly see it as a precursor to the end of the world. The science still gets done. Without love, breath is just a clock ... ticking. Equilibrium |
Alinator Joined: 19 Apr 05 Posts: 4178 Credit: 4,647,982 RAC: 0 |
The only time I've seen this effect is when a WU completes shortly before the end of its slice. The next one in the queue gets started and then gets preempted at the next switch. This is exactly what I would expect to happen. Likewise, so what if the unit swaps out with 5 minutes to go? Its time slice was over, end of story till its turn comes up again. It's not like it won't complete the next time it runs, and the next WU in the queue gets the rest of the slice. Alinator |
John McLeod VII Joined: 15 Jul 99 Posts: 24806 Credit: 790,712 RAC: 0 |
It would be much better if the CPU scheduler waited for a checkpoint. However, the current structure does not lend itself to this readily - hence the delay in implementing it. BOINC WIKI |
1mp0£173 Joined: 3 Apr 99 Posts: 8423 Credit: 356,897 RAC: 0 |
KB7RZF said: “Hmm, I have 4 projects, and never had a problem with anything switching properly. I have my preferences set to switch every 3 hours. My connect every x days is set low, my computer never goes into EDF mode. Scheduler works pretty good to me. Course I don't micro-manage my computer, it runs in the background, doesn't cause any problems with my day to day use of my computer. No problems.” It might be interesting as an experiment to raise your "connect every x days" setting so that the scheduler does go into EDF mode. When it does, the scheduler still works according to resource share; it just prioritizes the "most urgent" work until that is done, then goes back to round-robin scheduling. If a project gets more than a fair share of time, BOINC won't request more work for it. I don't micro-manage either, but I do "tickle" the scheduler from time to time to see if I can knock it off level, and short of outright cheating, it works well. |
Jason Safoutin Joined: 8 Sep 05 Posts: 1386 Credit: 200,389 RAC: 0 |
My projects switch like clockwork every hour. I use BoincLogX mostly for SetiMapView, but I have to say BoincLogX is a great tool. You have to keep it on to log the work right before the WU starts and right before it ends. If your settings are correct then BOINC will run as it should. It really is not broken. It's just doing what it's told to do. If you update the software, sometimes settings are changed, so make sure they are how you would like them :) P.S. I had to do it too when I last upgraded. |
KB7RZF Joined: 15 Aug 99 Posts: 9549 Credit: 3,308,926 RAC: 2 |
Hey Ned, I used to run my "connect every x days" setting at 2.5 days, but that was when I just crunched SETI and Einstein. When I joined more, I dropped it down, just to avoid going into EDF mode. Mainly now I just sit and let it crunch. My computer is happy, I'm happy, so that's what matters, I hope. :-) On my mom's computer, it just runs SETI, so I just leave the settings alone on it. No need to do anything. My mom doesn't even know it runs. But she does know it's installed. She just leaves it be. The way it should be. Hehe. Jeremy |
1mp0£173 Joined: 3 Apr 99 Posts: 8423 Credit: 356,897 RAC: 0 |
KB7RZF said: “I used to run my "connect every x settings" at 2.5 days, but that was when I just crunched SETI and Einstein. When I joined more, I dropped it down, just to avoid going into the EDF mode. Mainly now I just sit and let it crunch. My computer is happy, I'm happy, so thats what matters I hope. :-)” Sure. If you're happy, don't worry about it. It is kind of interesting to crank your numbers around and force BOINC into EDF mode, just to see what it does. EDF is actually your friend. :-) -- Ned |
Tern Joined: 4 Dec 03 Posts: 1122 Credit: 13,376,822 RAC: 44 |
Bob Guy said: “I must disagree! The scheduler is impressively STUPID! It must be noted that I usually have at least 4 active science apps running that have different priorities which may cause the scheduler to act strangely.” If you have a multiple-CPU system, there are problems with the scheduler switching out "too frequently". This is being discussed to determine how to solve it. If you have a single-core system, it works just fine. |
Paul D. Buck Joined: 19 Jul 00 Posts: 3898 Credit: 1,158,042 RAC: 0 |
Tern said: “If you have a multiple-CPU system, there are problems with the scheduler switching out "too frequently". This is being discussed to determine how to solve it. If you have a single-core system, it works just fine.” The more CPUs you have, the worse the problem, unfortunately. At the moment, the only "cure" I have found counters my usual advice, and that is to: a) limit the number of active projects on the system; b) set the "switch" time to some number larger than 60 minutes (I am using 240 on one system right now); c) maintain "balanced" resource shares (no 99/1). Note that this does not "cure" all of the observed symptoms, but does limit them somewhat. There is more discussion on the developers' mailing list if you want my thoughts on the subject. *MY* current feeling is that a fairly simple change would alleviate the most significant of the issues without increasing the complexity significantly. |
Pooh Bear 27 Joined: 14 Jul 03 Posts: 3224 Credit: 4,603,826 RAC: 0 |
Bill Michael said: “If you have a multiple-CPU system, there are problems with the scheduler switching out "too frequently". This is being discussed to determine how to solve it. If you have a single-core system, it works just fine.” Paul D. Buck said: “The more CPUs you have the worse the problem, unfortunately.” I have seen this behavior, but at the time attributed it to different things. Since I do not micromanage, I only noticed it after, say, a reboot or me fiddling with things. So I didn't know an issue existed. I want to thank you both for finally explaining why, although I have it set to 180 minutes, the 4-processor unit still likes to switch hourly. I might be under the impression it's doing some dividing through where 60 is a minimum almost no matter what. So my 2-processor does 2 hours (since it seems to like the even-hour stuff). The 4-processor does a single hour. So if I changed it to 240 minutes, they would probably stay the same; if I went to 360 minutes it might change the 2-processor to 3 and the 4-processor to 2? Just an observation, and wondering if I am not too far off the mark? It's nice to know I am not going crazy thinking that every time I reboot and/or fiddle with the machines, I am seeing a known anomaly. My movie https://vimeo.com/manage/videos/502242 |
Tern Joined: 4 Dec 03 Posts: 1122 Credit: 13,376,822 RAC: 44 |
Pooh Bear 27 said: “Just an observation, and wondering if I am not too far off the mark?” The basic problem is that any time _any_ CPU completes a result, ALL the CPUs get rescheduled. So if you're running SETI results in about an hour, it's going to reschedule all CPUs every hour, regardless of the "switch-interval". Making the switch interval longer makes it more likely that a result finishes before the switch interval is up, which leaves fewer preempted, which means fewer that will finish shortly into their next turn, causing a very short switch time. (In other words, if SETI takes 1 hour and Einstein takes 3.9, and that's all you're running, your ideal switch interval is going to be 4 hours. Or 2, forcing Einstein to take two "turns". The worst would be 3.) Paul has a proposed fix, I have a proposed fix, John Keck has a proposed fix, JM7 has half-rewritten the scheduler, it seems, to verify _his_ proposed fix, a few others have chimed in... I think it's safe to say that if one of the "simple" fixes is chosen, it won't be a perfect fix but it'll be "soon". If one of the "perfect" fixes is chosen, it'll take a little longer. But either way, it's currently being worked on. |
John McLeod VII Joined: 15 Jul 99 Posts: 24806 Credit: 790,712 RAC: 0 |
Pooh Bear 27 said: “Just an observation, and wondering if I am not too far off the mark?” The simple fixes all have a tendency to break something else rather badly - hence the caution about implementation. |
Bob Guy Joined: 7 Sep 00 Posts: 126 Credit: 213,429 RAC: 0 |
Yes, the 'problem' is evidently a multiple CPU thing. I don't micromanage my projects either - I've just noticed this odd behavior recently. I may try setting the time slice higher than the 90 minutes I have it set to now. I have to say that I'm a lazy programmer and if I wanted a solution really bad then I could look at the code and I haven't. Maybe I'll get annoyed enough and bored enough and take a look at the code some day. |
John McLeod VII Joined: 15 Jul 99 Posts: 24806 Credit: 790,712 RAC: 0 |
Bob Guy said: “Yes, the 'problem' is evidently a multiple CPU thing. I don't micromanage my projects either - I've just noticed this odd behavior recently. I may try setting the time slice higher than the 90 minutes I have it set to now. I have to say that I'm a lazy programmer and if I wanted a solution really bad then I could look at the code and I haven't. Maybe I'll get annoyed enough and bored enough and take a look at the code some day.” I'm looking at the code. It will take me a while to fix. |
Bob Guy Joined: 7 Sep 00 Posts: 126 Credit: 213,429 RAC: 0 |
Thank you very much for your attention to this 'problem'. I'm sure most of us sometimes wonder if there is anybody out there doing anything useful. A short example that just occurred: 1/2/2006 7:07:09 PM|Einstein@Home|Resuming result l1_1139.5__1139.6_0.1_T10_S4lD_1 using einstein version 479 1/2/2006 7:07:09 PM|rosetta@home|Pausing result NEW_SOFT_CENTROID_PACKING_1dtj_225_4048_0 (left in memory) 1/2/2006 7:07:09 PM|rosetta@home|Pausing result NO_MORE_RELAX_CYCLES_1r69_224_4198_0 (left in memory) 1/2/2006 7:07:09 PM|SETI@home|Starting result 18mr05aa.14866.29360.840898.1.30_3 using setiathome version 418 1/2/2006 7:07:10 PM||request_reschedule_cpus: files downloaded 1/2/2006 7:07:10 PM|Einstein@Home|Pausing result l1_1139.5__1139.6_0.1_T10_S4lD_1 (left in memory) 1/2/2006 7:07:10 PM|SETI@home|Starting result 18fe05ab.2322.29122.411056.1.75_2 using setiathome version 418 I believe the (2nd) Seti app preempted the Einstein app because of a higher priority setting - but I think that the Einstein app should have been allowed to keep running unless the Seti app had a nearing deadline, which it did not. The apparent causal event was the download and reschedule. |
Paul D. Buck Joined: 19 Jul 00 Posts: 3898 Credit: 1,158,042 RAC: 0 |
Pooh Bear 27 said: “Just an observation, and wondering if I am not too far off the mark?” I am still "fiddling" and DON'T have good answers. There is more hard data in this Rosetta@Home thread. SO FAR, on the 4 CPU systems the "best" result I have gotten is to run only one project. :( Unfortunately, to see things clearly you have to let the systems stabilize and run stable for a couple of days to be confident of the usage pattern. I still have some "clean-up" to do on a couple of systems before I want to launch the next trial. I am hoping that running long work units (CPDN, Einstein@Home, etc.) with a long "switch" time (4 hours) may improve the instability I see. But I don't actually hold much hope that all the symptoms will be "cured". For example, I am doing two projects on a dual processor system, and earlier tonight it ran 2 SETI@Home work units; it completed one and shelved the other with 36 seconds to go. Now, what do you think is going to happen when it runs that work unit? ... probably not what I would want. And before people panic, this is mostly a significant problem only with 4 CPU systems (and larger) with multiple projects. And I agree with John, we have to be somewhat wary; simplistic solutions, including mine, usually have some "gotchas" buried within. The proposal I made was deliberately kept as simple as possible with the intent of allowing quicker implementation (one structure and 4-10 lines of code estimated; of course, I don't do C++, so what do I know?) and of seeing what "breaks" when we try it (if anything). |
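
Tern's switch-interval example earlier in the thread (SETI results taking about 1 hour, Einstein about 3.9 hours) comes down to simple arithmetic, so here is a rough Python sketch of it. This is not BOINC's scheduler code: the function name `final_turn` and the idea of judging an interval by how much of the resumed turn gets used are assumptions made purely for illustration, and the toy model ignores multiple CPUs, deadlines, and EDF mode.

```python
def final_turn(run_minutes: int, switch_minutes: int) -> tuple[int, int]:
    """Return (turns, minutes used in the last turn) for one task.

    Toy model (an assumption, not BOINC's algorithm): the task is
    preempted every switch_minutes until run_minutes of work are done.
    """
    if run_minutes <= 0 or switch_minutes <= 0:
        raise ValueError("both durations must be positive")
    turns = -(-run_minutes // switch_minutes)  # ceiling division on ints
    last = run_minutes - (turns - 1) * switch_minutes
    return turns, last


EINSTEIN_MINUTES = 234  # 3.9 hours, the figure from Tern's post

# Compare the three switch intervals Tern mentions: 4, 2, and 3 hours.
for switch in (240, 120, 180):
    turns, last = final_turn(EINSTEIN_MINUTES, switch)
    print(f"switch every {switch} min: {turns - 1} preemption(s), "
          f"last turn uses {last} of {switch} min")
```

Under these assumptions, a 4-hour interval never preempts Einstein, a 2-hour interval preempts it once and the resumed turn runs 114 of its 120 minutes, and a 3-hour interval preempts it once and the resumed turn ends after only 54 of 180 minutes. That last case is the "finish shortly into their next turn" situation Tern describes.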