Message boards :
Number crunching :
Why not in order of date?
Message board moderation
Author | Message |
---|---|
Steven Gaber Send message Joined: 19 Jan 13 Posts: 111 Credit: 2,834,186 RAC: 11 |
Right now, my computer has 155 S@H tasks ready to start, with due dates of 9 October, 14 October and 21 October. The computer is working on three tasks due on 21 October. One would think that BOINC would have been programmed to work on the tasks with earlier due dates before the ones with later dates. Why is this not the case? Cheers, Steven Gaber Oldsmar, FL |
Wiggo Send message Joined: 24 Jan 00 Posts: 36654 Credit: 261,360,520 RAC: 489 |
BOINC works on the "First In First Out" (or FIFO) order unless deadlines become a problem. ;-) Cheers. |
rob smith Send message Joined: 7 Mar 03 Posts: 22508 Credit: 416,307,556 RAC: 380 |
Because BOINC works on a first-in, first-out priority - if you look at the "sent" dates you will see that those being processed first are slightly older (maybe by only a few minutes) than those sitting waiting to be processed. Bob Smith Member of Seti PIPPS (Pluto is a Planet Protest Society) Somewhere in the (un)known Universe? |
Jord Send message Joined: 9 Jun 99 Posts: 15184 Credit: 4,362,181 RAC: 3 |
Consider this: You have two projects added. Project one has tasks with deadlines of 3 weeks away. Project two has tasks with deadlines of 3 days away. If BOINC were to run tasks in order of deadline, it would run the tasks of the 3 days deadline project first, very possibly all the way until the tasks for the 3 week deadline project were past the deadline. So it's better to just run them in the order they were received in. |
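Jord's point can be seen in a toy simulation (my own sketch, not BOINC's actual scheduler, which weighs many more factors): with a steady stream of short-deadline tasks, strict earliest-deadline-first never gets around to the long-deadline project, while first-in-first-out serves both.

```python
# Toy comparison of earliest-deadline-first (EDF) vs FIFO task selection.
# Assumption: each task takes 1 hour of CPU, and a fresh 3-day-deadline
# task arrives every hour while one 3-week-deadline task waits in the queue.
def simulate(policy, hours=100):
    # tasks are (project, arrival_hour, deadline_hour)
    queue = [("long", 0, 21 * 24)]   # one long-deadline task already waiting
    done = []
    for hour in range(hours):
        queue.append(("short", hour, hour + 3 * 24))  # short tasks keep arriving
        if policy == "edf":
            task = min(queue, key=lambda t: t[2])     # earliest deadline first
        else:
            task = min(queue, key=lambda t: t[1])     # FIFO: oldest arrival first
        queue.remove(task)
        done.append(task[0])
    return done

print("EDF  ever runs the long-deadline task:", "long" in simulate("edf"))   # False
print("FIFO ever runs the long-deadline task:", "long" in simulate("fifo"))  # True
```

Under EDF the fresh short tasks always have the nearer deadline, so the long-deadline task starves exactly as Jord describes.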
Steven Gaber Send message Joined: 19 Jan 13 Posts: 111 Credit: 2,834,186 RAC: 11 |
Thanks for those insights. I understand the first-in, first-out principle. But in addition to the aforementioned 155 S@H tasks due in October, I have 65 Asteroids tasks with deadlines of 3 September, only four days away. I will probably abort most of them. I formerly also had 75 Rosetta tasks with short deadlines, but had to abort 45 of them. Steven Gaber Oldsmar, FL |
Steven Gaber Send message Joined: 19 Jan 13 Posts: 111 Credit: 2,834,186 RAC: 11 |
BOINC works on the "First In First Out" (or FIFO) order unless deadlines become a problem. ;-)
For me, those long task lists and short deadlines have become a problem. Salud. Steven Gaber Oldsmar, FL |
W-K 666 Send message Joined: 18 May 99 Posts: 19376 Credit: 40,757,560 RAC: 67 |
I would decrease your cache size. You shouldn't run a 10-day cache when you have tasks with a 3-day deadline. A one-day cache should be more than sufficient. And let's hope Dorian doesn't cause you problems. |
Grant (SSSF) Send message Joined: 19 Aug 99 Posts: 13848 Credit: 208,696,464 RAC: 304 |
I have 65 Asteroids tasks with deadlines of 3 September, only four days away. I will probably abort most of them.
Why??? The manager will take into account how long your computer is on, how much time it is able to process work while it is on, how long it takes to process each project's work, and what your resource share settings are, and then work to meet those requirements. Downloading work, aborting it, and suspending & re-enabling crunching just makes it more difficult for the manager to do what you have set it to do. If you run more than one project, set a small cache (no more than a day), and let the manager work things out. It will still work things out with a larger cache, but it will take ages (several weeks to a month or more) to do so. The smaller the cache, the sooner it can resolve your settings & the work it has to do. Grant Darwin NT |
rob smith Send message Joined: 7 Mar 03 Posts: 22508 Credit: 416,307,556 RAC: 380 |
Not only a small "primary cache" ("store at least x days"), but a really small "secondary cache" (the "store x additional days"). Something like 1 + 0.01. Further, some projects will send you a very inflated amount of work and over-fill the caches, which is not good for cache management within BOINC. Bob Smith Member of Seti PIPPS (Pluto is a Planet Protest Society) Somewhere in the (un)known Universe? |
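For anyone wondering how those two cache settings turn into an actual work request, here is a rough sketch (my own simplification - the real BOINC client also factors in resource share, per-device availability statistics, and more):

```python
# Simplified model of how a BOINC-style client sizes a work request.
# Assumption: the function name and the bare formula are illustrative only.
SECONDS_PER_DAY = 86400

def work_request_seconds(store_at_least_days, store_additional_days,
                         on_fraction, queued_runtime_seconds):
    """Seconds of new work to ask for, given cache settings and queued work."""
    target = (store_at_least_days + store_additional_days) \
             * SECONDS_PER_DAY * on_fraction
    return max(0.0, target - queued_runtime_seconds)

# rob smith's suggested 1 + 0.01 day settings, machine always on, empty cache:
print(work_request_seconds(1, 0.01, 1.0, 0))     # about 87,264 s of work
# a 0.01-day primary cache with no secondary cache:
print(work_request_seconds(0.01, 0.0, 1.0, 0))   # about 864 s of work
```

The point of the tiny secondary cache is that it barely raises the target, so the client tops up little and often instead of fetching a big batch in one go.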
Tom M Send message Joined: 28 Nov 02 Posts: 5126 Credit: 276,046,078 RAC: 462 |
Not only a small "primary cache" ("store at least x days"), but a really small "secondary cache" (the "store x additional days"). Something like 1 + 0.01
Yup, I finally figured out settings on the World Community Grid website to get that to stop. But I am not sure I ever got Einstein@Home to cooperate. Tom A proud member of the OFA (Old Farts Association). |
Keith Myers Send message Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
Not only a small "primary cache" ("store at least x days"), but a really small "secondary cache" (the "store x additional days"). Something like 1 + 0.01
+1 The only solution for Einstein is GET/NNT/EMPTY/GET. Seti@Home classic workunits: 20,676 CPU time: 74,226 hours A proud member of the OFA (Old Farts Association) |
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14679 Credit: 200,643,578 RAC: 874 |
Not only a small "primary cache" ("store at least x days"), but a really small "secondary cache" (the "store x additional days"). Something like 1 + 0.01
I'm running Einstein too. I get what I ask for - no more, sometimes less. The Event Log records both request and allocation. Edit - most recent this machine:
30/08/2019 21:04:28 | Einstein@Home | Sending scheduler request: To fetch work.
30/08/2019 21:04:28 | Einstein@Home | Reporting 1 completed tasks
30/08/2019 21:04:28 | Einstein@Home | Requesting new tasks for Intel GPU
30/08/2019 21:04:28 | Einstein@Home | [sched_op] CPU work request: 0.00 seconds; 0.00 devices
30/08/2019 21:04:28 | Einstein@Home | [sched_op] NVIDIA GPU work request: 0.00 seconds; 0.00 devices
30/08/2019 21:04:28 | Einstein@Home | [sched_op] Intel GPU work request: 4412.18 seconds; 0.00 devices
30/08/2019 21:04:32 | Einstein@Home | Scheduler request completed: got 1 new tasks
30/08/2019 21:04:32 | Einstein@Home | [sched_op] Server version 611
30/08/2019 21:04:32 | Einstein@Home | Project requested delay of 60 seconds
30/08/2019 21:04:32 | Einstein@Home | [sched_op] estimated total CPU task duration: 0 seconds
30/08/2019 21:04:32 | Einstein@Home | [sched_op] estimated total NVIDIA GPU task duration: 0 seconds
30/08/2019 21:04:32 | Einstein@Home | [sched_op] estimated total Intel GPU task duration: 18527 seconds
30/08/2019 21:04:32 | Einstein@Home | [sched_op] handle_scheduler_reply(): got ack for task LATeah1049ad_196.0_0_0.0_49327964_1
30/08/2019 21:04:32 | Einstein@Home | [sched_op] Deferring communication for 00:01:00
30/08/2019 21:04:32 | Einstein@Home | [sched_op] Reason: requested by project
30/08/2019 21:04:34 | Einstein@Home | Started download of templates_LATeah1049ae_0100_7551530.dat
30/08/2019 21:04:36 | Einstein@Home | Finished download of templates_LATeah1049ae_0100_7551530.dat |
Keith Myers Send message Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
Sure, if you have an Einstein-only project machine. Seti@Home classic workunits: 20,676 CPU time: 74,226 hours A proud member of the OFA (Old Farts Association) |
Grant (SSSF) Send message Joined: 19 Aug 99 Posts: 13848 Credit: 208,696,464 RAC: 304 |
30/08/2019 21:04:28 | Einstein@Home | [sched_op] Intel GPU work request: 4412.18 seconds; 0.00 devices
Looks like about 4 times as much as was requested was delivered. I guess it's a juggling act between "Store at least" and "Store up to an additional", along with system up time & BOINC crunching time while the system is up. Either the scheduler waits until the time needed to process the requested work is equal to or greater than the runtime of the smallest work unit available before issuing more (even to the point of running out of that project's work before getting more, or not getting any more until the resource share debt grows to the point that more than is requested needs to be processed in order to honour the resource share settings), or it issues more work as soon as what the system has is less than the set minimum, even if that greatly exceeds the requested amount of processing time. Adding projects, some with longer-running WUs, others with much shorter-running WUs, and a mixture of all three, all with different deadlines, would make the Manager's job of honouring Resource Share settings possible only over the long term. Unfortunately people tend to expect to see their resource share settings honoured immediately, or at least on an hour-by-hour basis. But with multiple projects, limited system up time, limited processing time while the system is up, and limited system resources (ie low clock speeds/low core count/low performance GPUs (if any)), the resource share can only be met over the long term - ie 4 or more weeks. Grant Darwin NT |
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14679 Credit: 200,643,578 RAC: 874 |
30/08/2019 21:04:28 | Einstein@Home | [sched_op] Intel GPU work request: 4412.18 seconds; 0.00 devices
Looks like about 4 times as much as was requested was delivered.
30/08/2019 21:04:32 | Einstein@Home | Scheduler request completed: got 1 new tasks
You left that line out of the quote - can't send a fractional task. It sent the "smallest amount possible".
I guess it's a juggling act between "Store at least" and "Store up to an additional" along with system up time & BOINC crunching time while the system is up as to whether it waits until the time needed to process work requested is equal to or greater than the time to process the smallest runtime work unit that is available before issuing more, or if it issues more work as soon as what the system has is less than what has been set as a minimum, even if it greatly exceeds the requested amount of processing time.
The machine in question is running four separate projects. Given that they all have very different deadlines (from 5 days to 7 weeks, typically) and different scientific requirements, my cache settings are 0.25 days + 0.05 days. So yes, there are multiple issues at play, and it's the overall balance of the machine and the various resources in play that matters. Did you notice the 'intel_gpu'?
There is one particular bug, which I've only seen very, very rarely - maybe four or five times in 15 years of crunching at Einstein (and it's only happened at Einstein). This is where the client requests a modest amount of work (as in my example), and gets a reasonable reply (again, as here). But a minute later, the client asks for the same amount of work again, and gets it. And a minute later, it asks again. And again. And again. And again.... But that's a client bug.
It ends up asking for far too much. And I've never been able to track it down, because (a) you can't trigger it deliberately, and (b) anything you do to investigate it, like setting new logging flags, stops the behaviour. It'll have to remain a mystery, unless it happens spontaneously while I have the logging flags set for another reason. But the critical point remains: it's the client that requests work - no project can send work unsolicited. |
Grant (SSSF) Send message Joined: 19 Aug 99 Posts: 13848 Credit: 208,696,464 RAC: 304 |
You left that line out of the quote - can't send a fractional task. It sent "smallest amount possible".
Which I was fully aware of; I guess I just didn't make it as obvious as I thought I did when I actually discussed the Scheduler supplying work even if the smallest amount available is considerably larger than the requested amount, along with the alternative option. The point you were addressing was people saying that the project was giving them significantly more work than they needed to meet their cache & resource share settings. To me 50% more of something is significant; 400% more qualifies as a huge amount more (even if the absolute amount of that something is very small).
Did you notice the 'intel_gpu'?
Yes, although I'm not sure why that is of particular significance in this case?
There is one particular bug, which I've only seen very, very rarely - maybe four or five times in 15 years of crunching at Einstein (and it's only happened at Einstein). This is where the client requests a modest amount of work (as in my example), and gets a reasonable reply (again, as here). But a minute later, the client asks for the same amount of work again, and gets it. And a minute later, it asks again. And again. And again. And again....
Which may be what was happening with the people that were talking about getting more work than they should have according to their settings, although I'd have thought they'd have noticed if that was what was actually occurring. Grant Darwin NT |
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14679 Credit: 200,643,578 RAC: 874 |
And a minute later, it asks again. And again. And again. And again....
Which may be what was happening with the people that were talking about getting more work than they should have according to their settings, although I'd have thought they'd have noticed if that was what was actually occurring.
Which was my primary reason for butting into this conversation. The client bug I've seen is, well, bugging me: but it's very elusive. Fixing it will rely on people making precise, formal observations - and reporting them accurately. Sloppy thinking like "Einstein sends...", well, bugs me again. |
Keith Myers Send message Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
I guess I am volunteering again. My last experiment on an Einstein work request was with these parameters:
Einstein resource share = 25
Seti resource share = 1000
Primary work cache = 0.01 days
Secondary work cache = 0.0 days
I unset NNT and received 260 Einstein Gamma-Ray Pulsar tasks on the first connection. The second connection, 60 seconds later, netted another 225 tasks. I then set NNT again and started aborting all the work I couldn't finish in time. An Einstein task on my host crunches in between 465-585 seconds depending on the card. I should have received only one task at a request of 864 seconds of work. So why did I get sent 130,000 seconds of work? Seti@Home classic workunits: 20,676 CPU time: 74,226 hours A proud member of the OFA (Old Farts Association) |
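Keith's figures can be sanity-checked with a quick back-of-the-envelope calculation (assumption: ~525 s average per task, the midpoint of his quoted 465-585 s range):

```python
# Checking the numbers in the post above.
cache_days = 0.01
request_s = cache_days * 86400            # 0.01 days -> 864 seconds requested
tasks_received = 260                      # first connection only
avg_task_s = (465 + 585) / 2              # 525.0 s per task, assumed average
received_s = tasks_received * avg_task_s  # 136,500 s, close to the ~130,000 s quoted
print(received_s / request_s)             # roughly 158x the requested amount
```

So the first connection alone delivered on the order of 150 times the work asked for, which is far beyond any "smallest deliverable unit" rounding.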
rob smith Send message Joined: 7 Mar 03 Posts: 22508 Credit: 416,307,556 RAC: 380 |
Unlike many of the settings within BOINC, the cache settings do not treat zero as "use a default value" (or even as "I really mean zero"), but try to interpret zero as a real value, and probably in so doing end up with something very strange - try a setting of 0.01 or even 0.001. I'm just about to set this machine up on Einstein to see what it does with a "virgin" machine with similar settings to those used by Keith, but a non-zero secondary cache. Bob Smith Member of Seti PIPPS (Pluto is a Planet Protest Society) Somewhere in the (un)known Universe? |
rob smith Send message Joined: 7 Mar 03 Posts: 22508 Credit: 416,307,556 RAC: 380 |
A question before I do the actual test - roughly how long does a "typical" Einstein job take to run on a CPU? Bob Smith Member of Seti PIPPS (Pluto is a Planet Protest Society) Somewhere in the (un)known Universe? |
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.