Message boards :
Number crunching :
Question about EDF behavior
Message board moderation
Previous · 1 · 2 · 3 · Next
Author | Message |
---|---|
Brian Silvers Send message Joined: 11 Jun 99 Posts: 1681 Credit: 492,052 RAC: 0 |
Also overshooting the RDCF correction for a single short task. I grabbed one in my last task-fetching connection that inflated the estimated runtime on some 2.7 hour tasks all the way up to 5 hours. Now it is having to gradually work itself down from that one little incident, so BOINC thinks I have a lot more work than I actually have... By my current estimate, it thinks I have about 24 hours more work than what I really have. Sure, it started out comming down quickly, but now it is going down more gradually. All that for one task that was underestimated. Perhaps logic should be added that searches through the work queue to find if there are any similar results in the queue and only adjust RDCF on a weighted percentage instead of what it appears to do now (adjust based on the raw percentage amount of the miss)? |
jason_gee Send message Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
Good one, the possible solution you propose is a crisp logic one it might work very well. A simplistic fuzzy approach might look at delta RDCF too (rate of change of RDCF, or 1st derivative of the RDCF (wrt Time), the very same factor your intelligence spotted), allowing spurious 'environmental noise' [chaotic factors] like errant tasks, to be completely ignored, while still allowing rapid change when required, depending on the weightings of other factors like 'on time'. This is embedding similar logic in the algorithms as you might use [under expert manual control]. example, a rice cooker would use deltaTemp as an Input, allowing the system to compensate for environmental noise (ambient temperature changes, someone dumping extra cold water in etc...). Though much of the research I've read [and implemented some simple things] on these kinds of control methods is US in origin, I understand it is used mostly elsewhere in the world. Relates to efficient control of consumer goods, through industrial [process control] to auto-piloting and navigation systems. Jason "Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions. |
Brian Silvers Send message Joined: 11 Jun 99 Posts: 1681 Credit: 492,052 RAC: 0 |
A simplistic fuzzy approach might look at delta RDCF too (rate of change of RDCF, or 1st derivative of the RDCF, the very same factor your intelligence spotted), allowing spurious 'environmental noise' [chaotic factors] like errant tasks, to be completely ignored, while still allowing rapid change when required, depending on the weightings of other factors like 'on time'. This is embedding similar logic in the algorithms as you might use. Blah... "derivatives"!!!!! I have an HP 20S calculator that does Simpson's-rule Integrals... I wish I had stayed with school the first time around. My life would be so different... Anyway, now we pretty much wait for someone to look at the code and say what a "tie" means...or if indeed that code is missing/broken... |
jason_gee Send message Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
Anyway, now we pretty much wait for someone to look at the code and say what a "tie" means...or if indeed that code is missing/broken... Hahahaha, my more cynical side guesses a tie means 'randomly switch tasks at random intervals', I suppose in about a fortnight , out of curiosity and interest I might take a look at the code if this is still an issue then :D. [ I think Joe Segur knows more about the boinc internals, and may chime in any time...Though I think RDCF may be under science app control? ] Jason "Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions. |
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14655 Credit: 200,643,578 RAC: 874 |
I think we'll have to go back and read the full posting history of John McLeod VII (who wrote the relevant bit of code) to get the final answer, but here are some snippets I remember seeing. In earlier versions of BOINC, EDF was absolute. Once a deadline problem was noticed, BOINC switched mode, and posted in the message log to that effect. This was crazy on a multicore: one short deadline, and everything stopped to pick up early-ish work for other cores. Then John modified it to work on a core-by-core basis: if one task needs a hurry-up to meet deadline, then BOINC will free up just one core by preempting another task, but leave the other work to carry on as normal. I think this change was in the v5.8.xx sequence. Then John changed it again, probably for the v5.10.xx sequence (I haven't found the reference yet), and actually did away with the concept of 'Earliest Deadline First' entirely. Yup: EDF doesn't exist - official. Individual tasks are now given 'high priority' when needed, but EDF isn't an absolute rule. Unfortunately, I haven't been able to think up a TLA (three letter acronym) for high priority, so I'm as guilty as anyone of continuing to promulgate EDF - under most circumstances, it's a useful approximation to the concept we're discussing, but as The Holy Hand Grenade has noticed, the concept breaks down when you look at it too closely. Under the High Priority rules, I think BOINC periodically looks at the entire queue, and runs a simulation on each task - taking into account deadline, estimated total time, estimated time remaining if part-crunched, etc., etc. - to see if, at that moment, any task needs extra priority. My suspicion is that a short-deadline task which has not even been started is more likely to be assigned high priority than a similar deadline task which has already been part-processed. That would account for the task switching that THHG is seeing: effectively, BOINC is saying "phew, broken the back of that one - should finish it in time, no problem, I can relax a bit. Oh no, there's another one....". I don't usually see this behaviour myself, because SETI short tasks finish in under my 1-hour task switch interval, but because on THHG's AMD they take more than one hour, BOINC will do the re-estimation at least once during each short run, and he's seeing the outcome. If anyone wants to work through the full logic, there are some threads on the BOINC Dev boards, e.g. here - different topic, but it gets you into BOINC Client scheduler issues. And something that came up here - the rule "if two tasks have the same priority, don't switch, but stick with the one that's running" is listed in the change log for BOINC v5.10.24, so it wouldn't apply on THHG's v5.10.20 - well, it's written there as "if a project has 2 jobs with same deadline, give preference to jobs already started", but I think it amounts to the same thing for running tasks. |
jason_gee Send message Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
I think we'll have to go back and read the full posting history of John McLeod VII (who wrote the relevant bit of code) to get the final answer, but here are some snippets I remember seeing. ... Ah Hah, that'll help give me some background, thanks for the links Richard. [I see the CPU scheduler changes referred to in the changelog, time for another update I guess :D] Jason "Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions. |
1mp0£173 Send message Joined: 3 Apr 99 Posts: 8423 Credit: 356,897 RAC: 0 |
Scheduling work is a complex process that tries to predict the future based on information that may be wrong. We've all seen, for example, what happens when DCF is wildly wrong. It's a particularly interesting problem when you have two different kinds of jobs running, like we have right now in SETI Beta. It's an issue when someone sets their "connect every" interval to something inappropriate, and BOINC can't report when promised. I suspect this odd behaviour happens when a lot of work is assigned with short deadlines. BOINC wants to start the unit that is most at-risk first. Because they all arrive on the machine at different times, the one BOINC wants to start may not have been downloaded, so it goes to the next best choice until it finds one that it can do, and that is actually present. Then, another WU arrives and that one is more pressing, so it stops the one it has and moves to the next one. I don't see that this causes a huge technical problem. The work does get done, it just looks goofy. ... and it's really easy for us to sit here with our "Mark I mod 0" brains, complete with intuition and say "that looks really wrong." If we could code the same intuition into BOINC, the scheduler would be easy. |
KWSN THE Holy Hand Grenade! Send message Joined: 20 Dec 05 Posts: 3187 Credit: 57,163,290 RAC: 0 |
[snip] NO! WU's were downloaded in order! (I've switched this machine to download/upload only 1 file at a time, due to download problems I've noticed in the past...) The suggestion (further down-thread from your reply) of "ties" and "Fuzzy logic" problems(?) appears to have (to me, at least) merit. I was sitting in front of the monitor watching this... All the WU's that were involved (8 total) had EXACTLY the same deadline. (to the second!) . Hello, from Albany, CA!... |
KWSN THE Holy Hand Grenade! Send message Joined: 20 Dec 05 Posts: 3187 Credit: 57,163,290 RAC: 0 |
[snip] YES, I can state with complete certainty that all the WU's had EXACTLY the same deadline - to the second! this only happened with the first 8 of the 20, all of which had the exact same deadline. The other 12 WU's had a deadline one second later - and behaved normally, once reached. (Behaved normally for EDF WU's, that is!) . Hello, from Albany, CA!... |
Geek@Play Send message Joined: 31 Jul 01 Posts: 2467 Credit: 86,146,931 RAC: 0 |
I just got a bunch (~20) of very short duration WU's, (the kind that go into immediate EDF) and I'm noticing that as each new WU downloads, the prior WU will go into "waiting to run", and the newer WU starts to run, until the next WU finishes downloading... (this with the x64 5.10.20) I remember JM7 describing this behavior in a different thread many months ago. He stated at the time it was an extremely rare occurence but that they were aware of it and were seeking a solution. |
KWSN THE Holy Hand Grenade! Send message Joined: 20 Dec 05 Posts: 3187 Credit: 57,163,290 RAC: 0 |
I think we'll have to go back and read the full posting history of John McLeod VII (who wrote the relevant bit of code) to get the final answer, but here are some snippets I remember seeing. Interesting. Your scenario breaks down on two fronts - I've change the task switching preference to 20 minutes; and these tasks weren't switching at that 20 minute interval, anyway! (again, they switched at seemingly random intervals!) [sarcasm] Who says we must absolutely have a three letter acronym? We can start calling the behaviour "HP" instead of EDF - but we better keep using EDF for a while, 'cause many BOINCers won't get it... [/sarcasm] . Hello, from Albany, CA!... |
UBT - NaRyan Send message Joined: 20 Oct 07 Posts: 89 Credit: 165,614 RAC: 0 |
I just got a bunch (~20) of very short duration WU's, (the kind that go into immediate EDF) and I'm noticing that as each new WU downloads, the prior WU will go into "waiting to run", and the newer WU starts to run, until the next WU finishes downloading... (this with the x64 5.10.20) I had the same thing happen to me on 2 of my systems (but it was a lot more than 20 of them!), they took about 30min and 40min on each system. they all had the same deadline (sometime on 8th December, I got them 29th November), and as soon as I got them Boin went into EDF and started to work on them. I had one task that it stoped working on sitting at 99.994% completed for nearly 2 days, the other task was at 96.xxx% for the same time. |
Brian Silvers Send message Joined: 11 Jun 99 Posts: 1681 Credit: 492,052 RAC: 0 |
[snip] OK, then looking at your message log might be of help. Also you probably saw my discussion with Jason about what a "tie" really means. If there are fractional seconds being used under the covers, but they're only displaying whole rounded seconds to us in the manager, then just eyeballing the manager isn't going to provide enough detail to say whether or not there is "a problem". Also, I agree with Ned in principle. I don't know if this causes any technical issue other than some minor overhead. That said, the more cores you have and more tasks that are being downloaded, the greater that overhead becomes. |
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14655 Credit: 200,643,578 RAC: 874 |
...[sarcasm] I like EDF because it's quick and easy to type, and - unless you get your volts from Electricité de France - relatively unambiguous. HP gets a bit messy - it's a food complement (sauce) here in the UK, as well as other more computerish things. Doesn't have the same recognition value in BOINC-land. |
KWSN THE Holy Hand Grenade! Send message Joined: 20 Dec 05 Posts: 3187 Credit: 57,163,290 RAC: 0 |
...[sarcasm] Ok, how about Hi Pri? - it won't get mistaken for a computer company ;-) (I have spent quite a bit of time staring at various Hewlett-Packard devices, too) . Hello, from Albany, CA!... |
John McLeod VII Send message Joined: 15 Jul 99 Posts: 24806 Credit: 790,712 RAC: 0 |
Mostly I just call it EDF. It is actually per project rather than per task (to try to prevent ping pong with which task is running as is seen here). There is code that is supposed to prevent the ping pong, but it appears to have one case that was missed. BOINC WIKI |
Brian Silvers Send message Joined: 11 Jun 99 Posts: 1681 Credit: 492,052 RAC: 0 |
Mostly I just call it EDF. Could you elaborate? |
John McLeod VII Send message Joined: 15 Jul 99 Posts: 24806 Credit: 790,712 RAC: 0 |
Mostly I just call it EDF. On which? BOINC WIKI |
Brian Silvers Send message Joined: 11 Jun 99 Posts: 1681 Credit: 492,052 RAC: 0 |
Mostly I just call it EDF. Ya know, it's kinda funny that you're asking me to elaborate on my request for you to elaborate... Who is on first anyway? :) Could you explain the case that you think was missed? Would it apply in this situation? Etc, etc, etc... |
ML1 Send message Joined: 25 Nov 01 Posts: 20443 Credit: 7,508,002 RAC: 20 |
... I haven't been able to think up a TLA (three letter acronym) for high priority, ... OK, so, easy: HPR: High Priority Rules Happy crunchin', Martin See new freedom: Mageia Linux Take a look for yourself: Linux Format The Future is what We all make IT (GPLv3) |
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.