Round-robin scheduling works great (BoincLogX)

Message boards : Number crunching : Round-robin scheduling works great (BoincLogX)

Profile MattDavis
Volunteer tester
Joined: 11 Nov 99
Posts: 919
Credit: 934,161
RAC: 0
United States
Message 224389 - Posted: 1 Jan 2006, 23:23:58 UTC
Last modified: 1 Jan 2006, 23:24:17 UTC

I installed BoincLogX yesterday to keep track of what work units I've crunched. You know, out of idle curiosity.

There's a neat feature where the history section tells you how long you've spent on each project. That gave me an idea.

Some people don't think that the round-robin scheduling works. Usually those are the people that stare at Boinc for 23 hours a day, and if Boinc doesn't switch between projects when the person thinks they should, they start to monkey around with the settings trying to micromanage it so that the projects switch every 6 seconds or something. Well, here's proof that if you leave it alone it'll all work out.

As you can see, each project has been going for the same amount of time (my settings are all 1:1:1:1:etc).

The lesson to learn? Leave Boinc alone and trust it! Its scheduler works just fine! Micromanaging just gets in the way!
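For anyone without BoincLogX, the "same amount of time per project" result can be reproduced with a toy model. This is only a minimal sketch in Python, not BOINC's actual scheduler code: the function name, the "debt" bookkeeping, and the slice length are all assumptions made up for illustration.

```python
def simulate_round_robin(shares, slice_minutes=60, total_slices=400):
    """Hand out fixed time slices, always to the project furthest behind its share.

    `shares` maps a project name to its resource share (hypothetical values).
    Returns the minutes of CPU time each project received.
    """
    total_share = sum(shares.values())
    run_minutes = {name: 0 for name in shares}
    for n in range(1, total_slices + 1):
        elapsed = n * slice_minutes

        def debt(name):
            # Minutes the project is owed so far, minus minutes it has received.
            return elapsed * shares[name] / total_share - run_minutes[name]

        chosen = max(shares, key=debt)  # ties go to the first project listed
        run_minutes[chosen] += slice_minutes
    return run_minutes


if __name__ == "__main__":
    # Four projects at 1:1:1:1, as in the post above.
    print(simulate_round_robin({"SETI": 1, "Einstein": 1, "Rosetta": 1, "CPDN": 1}))
```

With equal shares every project ends up with the same total, and with unequal shares the totals follow the ratio, as long as nobody interferes with the rotation.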


-----
ID: 224389 · Report as offensive
Profile Jason Safoutin
Volunteer tester
Joined: 8 Sep 05
Posts: 1386
Credit: 200,389
RAC: 0
United States
Message 224396 - Posted: 1 Jan 2006, 23:29:45 UTC

I use BoincLogX and I love it :) It's a great add-on. You should try MapView with it too, Matt.
"By faith we understand that the universe was formed at God's command, so that what is seen was not made out of what was visible". Hebrews 11.3

ID: 224396 · Report as offensive
Bob Guy
Volunteer tester

Joined: 7 Sep 00
Posts: 126
Credit: 213,429
RAC: 0
United States
Message 224420 - Posted: 2 Jan 2006, 1:00:08 UTC

I must disagree! The scheduler is impressively STUPID! It must be noted that I usually have at least 4 active science apps running that have different priorities which may cause the scheduler to act strangely. It should also be noted that there is not a deadline to be met - I do realize that an app which is nearing its deadline will get more time - this is not what is going on.

I frequently find that the scheduler will start (or restart) an app only to preempt it within 2 minutes or so. The app then sits in memory for an hour or more before it is allowed to continue running - this is STUPID and a waste of memory even if it does not cause a loss of functionality!

The scheduler also more often than not preempts an app when it needs only 5 minutes or so to finish - this behavior is STUPID! Boinc can certainly determine that an app needs just a couple of minutes to finish and should give it the time to finish. I have even increased the default time to switch apps in order to give more time to finish and find that Boinc still won't observe my suggestion - Boinc still prematurely preempts the running apps.

My suggestion:
1. The Boinc scheduler should give a full time slice to an app once it has been started. Don't arbitrarily preempt the apps!
2. The Boinc scheduler should look at time to finish and if it is 5 minutes or so then it should not preempt the app. I do realize that the time to finish may be just an estimate but in my experience it is close enough to accurate to use for this purpose.
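
The two suggestions above can be written down as a single preemption check. This is a minimal sketch in Python, not code from the BOINC client: `should_preempt`, its parameters, and the 5 minute grace period are hypothetical names and values, and the `deadline_at_risk` override is an added assumption so that urgent work can still jump the queue.

```python
def should_preempt(ran_minutes, slice_minutes, est_remaining_minutes,
                   deadline_at_risk=False, grace_minutes=5):
    """Return True if the running app may be swapped out right now."""
    if deadline_at_risk:
        # Assumption: another result is about to miss its deadline, so urgency wins.
        return True
    if est_remaining_minutes <= grace_minutes:
        # Suggestion 2: the app is nearly finished, so let it complete.
        return False
    # Suggestion 1: otherwise preempt only after a full time slice has been used.
    return ran_minutes >= slice_minutes


if __name__ == "__main__":
    print(should_preempt(ran_minutes=2, slice_minutes=90, est_remaining_minutes=120))
    print(should_preempt(ran_minutes=90, slice_minutes=90, est_remaining_minutes=4))
```

Both calls in the example return False: the first app has not used its slice yet, and the second is within the grace period of finishing.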
ID: 224420 · Report as offensive
KB7RZF
Volunteer tester
Joined: 15 Aug 99
Posts: 9549
Credit: 3,308,926
RAC: 2
United States
Message 224430 - Posted: 2 Jan 2006, 1:28:01 UTC
Last modified: 2 Jan 2006, 1:28:13 UTC

Hmm, I have 4 projects, and never had a problem with anything switching properly. I have my preferences set to switch every 3 hours. My connect every x days is set low, my computer never goes into EDF mode. Scheduler works pretty good to me. Course I don't micro-manage my computer, it runs in the background, doesn't cause any problems with my day to day use of my computer. No problems.
ID: 224430 · Report as offensive
Jim
Joined: 28 Jan 00
Posts: 614
Credit: 2,031,206
RAC: 0
United States
Message 224433 - Posted: 2 Jan 2006, 1:47:47 UTC

Gotta agree with Bob Guy on one point. It's always seemed odd to me that a unit is left in memory with a very few minutes left to go. Though I'm sure it's cached to vm, silly is silly. Also starting a project and then switching to another after only a few minutes seems odd to me.

Oh well. I hardly see it as a precursor to the end of the world. The science still gets done.

Without love, breath is just a clock ... ticking.
Equilibrium
ID: 224433 · Report as offensive
Alinator
Volunteer tester

Joined: 19 Apr 05
Posts: 4178
Credit: 4,647,982
RAC: 0
United States
Message 224453 - Posted: 2 Jan 2006, 2:43:04 UTC
Last modified: 2 Jan 2006, 2:45:16 UTC

The only time I've seen this effect is when a WU completes shortly before the end of its slice. The next one in the queue gets started and then gets preempted at the next switch. This is exactly what I would expect to happen.

Likewise, so what if the unit swaps out with 5 minutes to go? Its time slice was over, end of story till its turn comes up again. It's not like it won't complete the next time it runs, and the next WU in the queue gets the rest of the slice.

Alinator
ID: 224453 · Report as offensive
John McLeod VII
Volunteer developer
Volunteer tester
Joined: 15 Jul 99
Posts: 24806
Credit: 790,712
RAC: 0
United States
Message 224475 - Posted: 2 Jan 2006, 4:01:50 UTC

It would be much better if the CPU scheduler waited for a checkpoint. However, the current structure does not lend itself to this readily - hence the delay in implementation for this.


BOINC WIKI
ID: 224475 · Report as offensive
1mp0£173
Volunteer tester

Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 224499 - Posted: 2 Jan 2006, 4:55:39 UTC - in response to Message 224430.  

Hmm, I have 4 projects, and never had a problem with anything switching properly. I have my preferences set to switch every 3 hours. My connect every x days is set low, my computer never goes into EDF mode. Scheduler works pretty good to me. Course I don't micro-manage my computer, it runs in the background, doesn't cause any problems with my day to day use of my computer. No problems.

It might be interesting as an experiment to raise your "connect every 'x' setting" so that the scheduler does go into EDF mode.

... when it does, the scheduler still does work according to resource share, it just prioritizes until the "most urgent" work is done, then goes back to round-robin scheduling. If a project gets more than a fair share of time, BOINC won't request more work.
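
As a rough illustration of that behaviour, here is a minimal sketch in Python of "EDF when something is urgent, round-robin otherwise". It is not the BOINC client's logic: the `Task` fields, the `pick_next` function, and the 0.9 safety margin used to decide what counts as "most urgent" are all assumptions, and resource-share accounting is left out to keep it short.

```python
from dataclasses import dataclass


@dataclass
class Task:
    project: str
    hours_left: float         # estimated CPU hours still needed
    hours_to_deadline: float  # wall-clock hours until the report deadline


def pick_next(tasks, last_project=None, safety_margin=0.9):
    """Pick the next task to run from a non-empty list of tasks."""
    # A task is treated as urgent when its remaining work nearly fills
    # the time left before its deadline (hypothetical rule).
    urgent = [t for t in tasks if t.hours_left > safety_margin * t.hours_to_deadline]
    if urgent:
        # EDF mode: earliest deadline first among the urgent tasks.
        return min(urgent, key=lambda t: t.hours_to_deadline)
    # Round-robin mode: move on to the project after the one that ran last.
    projects = sorted({t.project for t in tasks})
    if last_project in projects:
        start = (projects.index(last_project) + 1) % len(projects)
    else:
        start = 0
    return next(t for t in tasks if t.project == projects[start])


if __name__ == "__main__":
    queue = [Task("Einstein", 4, 100), Task("SETI", 1, 100), Task("CPDN", 50, 52)]
    print(pick_next(queue, last_project="Einstein").project)
```

In the example the CPDN task needs 50 of its remaining 52 hours, so it is picked ahead of the normal rotation; once it is gone from the queue, the same call falls back to alternating between the other projects.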

I don't micro-manage either, but I do "tickle" the scheduler from time to time to see if I can knock it off level, and short of outright cheating, it works well.
ID: 224499 · Report as offensive
Profile Jason Safoutin
Volunteer tester
Joined: 8 Sep 05
Posts: 1386
Credit: 200,389
RAC: 0
United States
Message 224504 - Posted: 2 Jan 2006, 5:00:28 UTC

My projects switch like clockwork every 1 hour. I use BoincLogX mostly for SetiMapView. But I have to say BoincLogX is a great tool. You have to keep it on to log the work right before the WU starts and right before it ends. If your settings are correct then BOINC will run as it should. But it really is not broken. It's just doing what it's told to do. If you update the software, sometimes settings are changed. So make sure they are how you would like them :)

P.S. I had to do it too when I last upgraded.
"By faith we understand that the universe was formed at God's command, so that what is seen was not made out of what was visible". Hebrews 11.3

ID: 224504 · Report as offensive
KB7RZF
Volunteer tester
Joined: 15 Aug 99
Posts: 9549
Credit: 3,308,926
RAC: 2
United States
Message 224546 - Posted: 2 Jan 2006, 6:02:53 UTC - in response to Message 224499.  


It might be interesting as an experiment to raise your "connect every 'x' setting" so that the scheduler does go into EDF mode.

... when it does, the scheduler still does work according to resource share, it just prioritizes until the "most urgent" work is done, then goes back to round-robin scheduling. If a project gets more than a fair share of time, BOINC won't request more work.

I don't micro-manage either, but I do "tickle" the scheduler from time to time to see if I can knock it off level, and short of outright cheating, it works well.


Hey Ned,

I used to run my "connect every x settings" at 2.5 days, but that was when I just crunched SETI and Einstein. When I joined more, I dropped it down, just to avoid going into the EDF mode. Mainly now I just sit and let it crunch. My computer is happy, I'm happy, so that's what matters I hope. :-)

On my mom's computer, it just runs SETI, so I just leave the settings alone on it. No need to do anything. My mom doesn't even know it runs. But she does know it's installed. She just leaves it be. The way it should be. Hehe.

Jeremy
ID: 224546 · Report as offensive
1mp0£173
Volunteer tester

Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 224567 - Posted: 2 Jan 2006, 6:38:35 UTC - in response to Message 224546.  

I used to run my "connect every x settings" at 2.5 days, but that was when I just crunched SETI and Einstein. When I joined more, I dropped it down, just to avoid going into the EDF mode. Mainly now I just sit and let it crunch. My computer is happy, I'm happy, so that's what matters I hope. :-)

Sure. If you're happy, don't worry about it.

It is kind of interesting to crank your numbers around and force BOINC into EDF mode, just to see what it does.

EDF is actually your friend. :-)

-- Ned

ID: 224567 · Report as offensive
Profile Tern
Volunteer tester
Joined: 4 Dec 03
Posts: 1122
Credit: 13,376,822
RAC: 44
United States
Message 224605 - Posted: 2 Jan 2006, 9:51:07 UTC - in response to Message 224420.  

I must disagree! The scheduler is impressively STUPID! It must be noted that I usually have at least 4 active science apps running that have different priorities which may cause the scheduler to act strangely.


If you have a multiple-CPU system, there are problems with the scheduler switching out "too frequently". This is being discussed to determine how to solve it. If you have a single-core system, it works just fine.
ID: 224605 · Report as offensive
Profile Paul D. Buck
Volunteer tester

Joined: 19 Jul 00
Posts: 3898
Credit: 1,158,042
RAC: 0
United States
Message 224627 - Posted: 2 Jan 2006, 11:58:25 UTC - in response to Message 224605.  

If you have a multiple-CPU system, there are problems with the scheduler switching out "too frequently". This is being discussed to determine how to solve it. If you have a single-core system, it works just fine.

The more CPUs you have the worse the problem, unfortunately.

At the moment, the only "cure" I have found counters my usual advice and that is to:

a) limit the number of active projects on the system
b) Set the "switch" time to some number larger than 60 minutes (I am using 240 on one system right now).
c) Maintain "balanced" resource shares (no 99/1)

Note that this does not "cure" all of the observed symptoms, but does limit them somewhat. There is more discussion on the developer's mailing list if you want my thoughts on the subject.

*MY* current feeling is that a fairly simple change would alleviate the most significant of the issues without increasing the complexity significantly.
ID: 224627 · Report as offensive
Profile Pooh Bear 27
Volunteer tester
Joined: 14 Jul 03
Posts: 3224
Credit: 4,603,826
RAC: 0
United States
Message 224651 - Posted: 2 Jan 2006, 13:40:43 UTC - in response to Message 224605.  

Bill Michael said:
If you have a multiple-CPU system, there are problems with the scheduler switching out "too frequently". This is being discussed to determine how to solve it. If you have a single-core system, it works just fine.


Paul D. Buck Said:
The more CPUs you have the worse the problem, unfortunately.


I have seen this behavior, but at the time attributed it to different things. Since I do not micromanage, I only noticed it after say a reboot, or me fiddling with things. So I didn't know an issue existed. I want to thank you both for finally explaining why I have it set to 180 minutes, but the 4 processor unit likes to still switch hourly. I might be under the impression it's doing some dividing through where 60 is a minimum almost no matter what. So my 2 processor does 2 hours (since it seems to like the even hour stuff). The 4 processor does single hour. So if I changed it to 240 minutes, they would probably stay the same, if I went 360 minutes it might change the 2 processor to 3 and the 4 processor to 2?

Just an observation, and wondering if I am not too far off the mark?

It's nice to know I am not going crazy thinking that every time I reboot and/or fiddle with the machines, I am seeing a known anomaly.



My movie https://vimeo.com/manage/videos/502242
ID: 224651 · Report as offensive
Profile Tern
Volunteer tester
Joined: 4 Dec 03
Posts: 1122
Credit: 13,376,822
RAC: 44
United States
Message 224865 - Posted: 3 Jan 2006, 1:50:59 UTC - in response to Message 224651.  

Just an observation, and wondering if I am not too far off the mark?


The basic problem is that any time _any_ CPU completes a result, ALL the CPUs get rescheduled. So if you're running SETI results in about an hour, it's going to reschedule all CPUs every hour, regardless of the "switch-interval". Making the switch interval longer makes it more likely to finish a result before the switch interval is up, which leaves fewer preempted, which means fewer that will finish shortly into their next turn, causing a very short switch time.

(In other words, if SETI takes 1 hour and Einstein takes 3.9, and that's all you're running, your ideal switch interval is going to be 4. Or 2, forcing Einstein to take two "turns". The worst would be 3.)
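
The arithmetic behind that example can be checked with a few lines of Python. This is a minimal sketch under a simplifying assumption: it looks at one result on one CPU and ignores the "reschedule everything when any CPU finishes" effect described above. The helper name `turns` is made up, and times are in whole minutes (3.9 hours = 234 minutes) to avoid rounding noise.

```python
def turns(run_minutes, switch_minutes):
    """Split one result's CPU time into the turns it needs at a given switch interval."""
    full, rest = divmod(run_minutes, switch_minutes)
    return [switch_minutes] * full + ([rest] if rest else [])


if __name__ == "__main__":
    einstein = 234  # 3.9 hours, as in the example above
    for switch in (240, 120, 180):
        print(switch, turns(einstein, switch))
```

With a 240 minute interval the Einstein result finishes in one turn of 234 minutes. At 120 minutes it takes two turns of 120 and 114 minutes, so both are nearly full. At 180 minutes it gets one full turn and then a stub of only 54 minutes, which is the "very short switch time" case.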

Paul has a proposed fix, I have a proposed fix, John Keck has a proposed fix, JM7 has half-rewritten the scheduler it seems to verify _his_ proposed fix, a few others have chimed in... I think it's safe to say that if one of the "simple" fixes is chosen, it won't be a perfect fix but it'll be "soon". If one of the "perfect" fixes is chosen, it'll take a little longer. But either way, it's currently being worked on.
ID: 224865 · Report as offensive
John McLeod VII
Volunteer developer
Volunteer tester
Joined: 15 Jul 99
Posts: 24806
Credit: 790,712
RAC: 0
United States
Message 224869 - Posted: 3 Jan 2006, 1:56:02 UTC - in response to Message 224865.  

Just an observation, and wondering if I am not too far off the mark?


The basic problem is that any time _any_ CPU completes a result, ALL the CPUs get rescheduled. So if you're running SETI results in about an hour, it's going to reschedule all CPUs every hour, regardless of the "switch-interval". Making the switch interval longer makes it more likely to finish a result before the switch interval is up, which leaves fewer preempted, which means fewer that will finish shortly into their next turn, causing a very short switch time.

(In other words, if SETI takes 1 hour and Einstein takes 3.9, and that's all you're running, your ideal switch interval is going to be 4. Or 2, forcing Einstein to take two "turns". The worst would be 3.)

Paul has a proposed fix, I have a proposed fix, John Keck has a proposed fix, JM7 has half-rewritten the scheduler it seems to verify _his_ proposed fix, a few others have chimed in... I think it's safe to say that if one of the "simple" fixes is chosen, it won't be a perfect fix but it'll be "soon". If one of the "perfect" fixes is chosen, it'll take a little longer. But either way, it's currently being worked on.

The simple fixes all have a tendency to break something else rather badly - hence the caution about implementation.


BOINC WIKI
ID: 224869 · Report as offensive
Bob Guy
Volunteer tester

Joined: 7 Sep 00
Posts: 126
Credit: 213,429
RAC: 0
United States
Message 224967 - Posted: 3 Jan 2006, 4:55:14 UTC

Yes, the 'problem' is evidently a multiple CPU thing. I don't micromanage my projects either - I've just noticed this odd behavior recently. I may try setting the time slice higher than the 90 minutes I have it set to now. I have to say that I'm a lazy programmer and if I wanted a solution really bad then I could look at the code and I haven't. Maybe I'll get annoyed enough and bored enough and take a look at the code some day.

ID: 224967 · Report as offensive
John McLeod VII
Volunteer developer
Volunteer tester
Joined: 15 Jul 99
Posts: 24806
Credit: 790,712
RAC: 0
United States
Message 224969 - Posted: 3 Jan 2006, 4:57:29 UTC - in response to Message 224967.  

Yes, the 'problem' is evidently a multiple CPU thing. I don't micromanage my projects either - I've just noticed this odd behavior recently. I may try setting the time slice higher than the 90 minutes I have it set to now. I have to say that I'm a lazy programmer and if I wanted a solution really bad then I could look at the code and I haven't. Maybe I'll get annoyed enough and bored enough and take a look at the code some day.

I'm looking at the code. It will take me a while to fix.


BOINC WIKI
ID: 224969 · Report as offensive
Bob Guy
Volunteer tester

Joined: 7 Sep 00
Posts: 126
Credit: 213,429
RAC: 0
United States
Message 224989 - Posted: 3 Jan 2006, 6:02:54 UTC

Thank you very much for your attention to this 'problem'. I'm sure most of us sometimes wonder if there is anybody out there doing anything useful.

A short example that just occurred:
1/2/2006 7:07:09 PM|Einstein@Home|Resuming result l1_1139.5__1139.6_0.1_T10_S4lD_1 using einstein version 479
1/2/2006 7:07:09 PM|rosetta@home|Pausing result NEW_SOFT_CENTROID_PACKING_1dtj_225_4048_0 (left in memory)
1/2/2006 7:07:09 PM|rosetta@home|Pausing result NO_MORE_RELAX_CYCLES_1r69_224_4198_0 (left in memory)
1/2/2006 7:07:09 PM|SETI@home|Starting result 18mr05aa.14866.29360.840898.1.30_3 using setiathome version 418
1/2/2006 7:07:10 PM||request_reschedule_cpus: files downloaded
1/2/2006 7:07:10 PM|Einstein@Home|Pausing result l1_1139.5__1139.6_0.1_T10_S4lD_1 (left in memory)
1/2/2006 7:07:10 PM|SETI@home|Starting result 18fe05ab.2322.29122.411056.1.75_2 using setiathome version 418

I believe the (2nd) Seti app preempted the Einstein app because of a higher priority setting - but I think that the Einstein app should have been allowed to run except if the Seti app had a nearing deadline, which it did not. The apparent causal event was the download and reschedule.
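
Cases like this can be pulled out of a message log automatically. The following is a minimal sketch in Python that assumes the log lines look exactly like the ones pasted above (timestamp|project|message); the function name `short_runs` and the 120 second threshold are hypothetical choices, not anything defined by BOINC.

```python
import re
from datetime import datetime

START = re.compile(r"^(?:Starting|Resuming) result (\S+)")
PAUSE = re.compile(r"^Pausing result (\S+)")


def short_runs(log_lines, max_seconds=120):
    """Return (project, result, seconds) for results paused soon after they started."""
    started = {}
    found = []
    for line in log_lines:
        parts = line.strip().split("|", 2)
        if len(parts) != 3:
            continue  # not in the timestamp|project|message shape
        stamp, project, message = parts
        when = datetime.strptime(stamp, "%m/%d/%Y %I:%M:%S %p")
        begin = START.match(message)
        pause = PAUSE.match(message)
        if begin:
            started[begin.group(1)] = when
        elif pause:
            began = started.pop(pause.group(1), None)
            if began is not None:
                ran = (when - began).total_seconds()
                if ran <= max_seconds:
                    found.append((project, pause.group(1), ran))
    return found


SAMPLE = [
    "1/2/2006 7:07:09 PM|Einstein@Home|Resuming result l1_1139.5__1139.6_0.1_T10_S4lD_1 using einstein version 479",
    "1/2/2006 7:07:09 PM|rosetta@home|Pausing result NEW_SOFT_CENTROID_PACKING_1dtj_225_4048_0 (left in memory)",
    "1/2/2006 7:07:10 PM||request_reschedule_cpus: files downloaded",
    "1/2/2006 7:07:10 PM|Einstein@Home|Pausing result l1_1139.5__1139.6_0.1_T10_S4lD_1 (left in memory)",
]

if __name__ == "__main__":
    for project, result, seconds in short_runs(SAMPLE):
        print(project, result, seconds)
```

Run against the excerpt above, it flags the Einstein@Home result, which was resumed and then paused again one second later.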

ID: 224989 · Report as offensive
Profile Paul D. Buck
Volunteer tester

Joined: 19 Jul 00
Posts: 3898
Credit: 1,158,042
RAC: 0
United States
Message 225041 - Posted: 3 Jan 2006, 9:39:19 UTC - in response to Message 224651.  

Just an observation, and wondering if I am not too far off the mark?

It's nice to know I am not going crazy thinking that every time I reboot and/or fiddle with the machines, I am seeing a known anomaly.

I am still "fiddling" and DON'T have good answers.

There is more hard data in this Rosetta@Home thread.

SO FAR, on the 4 CPU systems the "best" result I have gotten is to run only one project. :(

Unfortunately, to see things clearly you have to let the systems stabilize and run stable for a couple days to be confident of the usage pattern. I still have some "clean-up" to do on a couple systems before I want to launch the next trial.

I am hoping that running long work units CPDN, Einstein@Home, etc. with a long "switch" time (4 hours) may improve the instability I see. But, don't actually hold much hope that all the symptoms will be "cured". For example, I am doing two projects on a dual processor system, and earlier tonight it ran 2 SETI@Home work units, it completed one, shelved another with 36 seconds to go. Now, what do you think is going to happen when it runs that work unit? ... probably not what I would want.

And before people panic, this is mostly a significant problem only with 4 CPU systems (and larger) with multiple projects.

And I agree with John, we have to be somewhat wary, simplistic solutions, including mine, usually have some "gotchas" buried within. The proposal I made was deliberately kept as simple as possible with the intent of allowing quicker implementation (one structure and 4-10 lines of code estimated, of course, I don't do C++, so what do I know?) with the intent of seeing what "breaks" when we try it (if anything).
ID: 225041 · Report as offensive



 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.