How does the system schedule work.

Message boards : Number crunching : How does the system schedule work.

Ed_Anderson
Joined: 20 Jan 02
Posts: 34
Credit: 4,480,322
RAC: 38
United States
Message 1113168 - Posted: 4 Jun 2011, 21:03:10 UTC

I have opened my system to hold 5 days of work. So I have a bunch of work units waiting to be processed. The odd thing is that the ones with the soonest dates to be returned are not getting processed first.

There are a bunch that have to be returned by 6/15. Then a bunch that have to be returned by 7/20. The 7/20 units seem to be getting processed first.

I figured the ones with the closest due dates would go first.

Is it random or is there a method to the processing sequence?

Ed
ID: 1113168

Geek@Play
Volunteer tester
Joined: 31 Jul 01
Posts: 2467
Credit: 86,146,931
RAC: 0
United States
Message 1113179 - Posted: 4 Jun 2011, 21:31:42 UTC
Last modified: 4 Jun 2011, 21:34:46 UTC

It works on a FIFO (First In, First Out) basis. If BOINC determines that work is in danger of being returned late, it switches to EDF (Earliest Deadline First), also known as panic mode.
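A minimal Python sketch of the two orderings just described. The `Task` fields, the times in hours and the `deadline_at_risk` flag are assumptions made for illustration, not BOINC client code; in the real client the decision to switch comes from its own prediction of missed deadlines.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    received: float  # hypothetical: hours since a reference point (arrival order)
    deadline: float  # hypothetical: hours since the same reference point


def run_order(tasks, deadline_at_risk):
    """Return tasks in the order they would be started.

    Normal mode: FIFO, oldest arrival first.
    Panic mode (some deadline at risk): EDF, earliest deadline first.
    """
    if deadline_at_risk:
        return sorted(tasks, key=lambda t: t.deadline)
    return sorted(tasks, key=lambda t: t.received)


# Hypothetical queue: the July task happened to arrive before the June one.
queue = [
    Task("july_wu", received=0.0, deadline=1100.0),
    Task("june_wu", received=5.0, deadline=260.0),
]

print([t.name for t in run_order(queue, deadline_at_risk=False)])  # ['july_wu', 'june_wu']
print([t.name for t in run_order(queue, deadline_at_risk=True)])   # ['june_wu', 'july_wu']
```

In normal mode the later-deadline task runs first simply because it arrived first, which matches the behaviour described in the opening post.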

Edit: Welcome to (or back to) SETI!!
Boinc....Boinc....Boinc....Boinc....
ID: 1113179

OzzFan
Volunteer tester
Joined: 9 Apr 02
Posts: 15691
Credit: 84,761,841
RAC: 28
United States
Message 1113184 - Posted: 4 Jun 2011, 21:36:01 UTC - in response to Message 1113168.  

To add to what Geek@Play said, BOINC works in First In, First Out mode instead of by deadlines because it supports every project connected to it, and each project has different deadline lengths and times to complete a workunit.

For example, ClimatePrediction's workunits often have a deadline a year away. If BOINC worked strictly by deadlines, ClimatePrediction's workunits would never get any CPU time until they neared their deadline.

Why is this a problem? Because BOINC can only make a best guess at how long a workunit will take to crunch, based on several variables. If BOINC guesses wrong and a workunit takes twice as long as anticipated, that workunit would be in danger of missing its deadline. BOINC is designed to meet deadlines with 99.99% accuracy, so long as you let it do its thing and keep your caches reasonable.
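A toy model of that argument, under stated assumptions: one CPDN-style task with a 365-day deadline and a 42-day runtime estimate (roughly 1000 hours), whose real runtime turns out to be twice the estimate. Starting it early, which FIFO tends to do, leaves room for the error; starting it as late as the estimate allows, which is where strict deadline ordering drifts when other projects always have earlier deadlines, does not. The numbers and the function name are illustrative only.

```python
def slack_days(start_day, deadline_day, estimated_days, error_factor):
    """Days to spare at completion (negative means late) when the real
    runtime is error_factor times the estimate."""
    actual_days = estimated_days * error_factor
    return deadline_day - (start_day + actual_days)


DEADLINE = 365.0  # roughly a year away
ESTIMATE = 42.0   # hypothetical estimate, about 1000 hours

early_start = 0.0                 # started soon after download
late_start = DEADLINE - ESTIMATE  # started "just in time" according to the estimate

print(slack_days(early_start, DEADLINE, ESTIMATE, 2.0))  # 281.0
print(slack_days(late_start, DEADLINE, ESTIMATE, 2.0))   # -42.0
```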
ID: 1113184

Ed_Anderson
Joined: 20 Jan 02
Posts: 34
Credit: 4,480,322
RAC: 38
United States
Message 1115751 - Posted: 11 Jun 2011, 3:32:18 UTC - in response to Message 1113184.  

Thanks!
ID: 1115751

Ed_Anderson
Joined: 20 Jan 02
Posts: 34
Credit: 4,480,322
RAC: 38
United States
Message 1116732 - Posted: 13 Jun 2011, 16:51:04 UTC - in response to Message 1115751.  

When does it go into panic mode? I have a bunch of work units due 6/15. Today is 6/13 and it continues to start up work units with July due dates.

Not an issue, just a curiosity.
ID: 1116732

Richard Haselgrove
Volunteer tester
Joined: 4 Jul 99
Posts: 14654
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1116776 - Posted: 13 Jun 2011, 18:30:47 UTC - in response to Message 1116732.  

[quote]When does it go into panic mode? I have a bunch of work units due 6/15. Today is 6/13 and it continues to start up work units with July due dates.

Not an issue, just a curiosity.[/quote]

It looks at all the work in your cache (not just from SETI, from your other projects as well), and it looks at all the deadlines and all the estimated running times, and tries to predict the future. If it predicts that it will reach, and complete, the tasks before deadline, then it just carries on as normal. It's only if the prediction says that the work won't be completed in time that panic mode kicks in.
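The prediction can be sketched as a small simulation: walk the queue in order, hand each task to whichever CPU frees up first, and compare each projected finish time with its deadline. This is a deliberately simplified, hypothetical model of a single project's queue with runtime estimates taken at face value; the real client's simulation is more elaborate (for example, it also has to share CPUs between projects), and the task names and numbers are made up.

```python
import heapq


def projected_finishes(tasks, n_cpus, now=0.0):
    """tasks: list of (name, estimated_hours, deadline_hours) in queue order.
    Returns {name: projected finish time}, assuming each task runs to
    completion on the first CPU that becomes free."""
    free_at = [now] * n_cpus  # time at which each CPU becomes free
    heapq.heapify(free_at)
    finishes = {}
    for name, est_hours, _deadline in tasks:
        start = heapq.heappop(free_at)  # earliest-free CPU
        finish = start + est_hours
        finishes[name] = finish
        heapq.heappush(free_at, finish)
    return finishes


def tasks_at_risk(tasks, n_cpus, now=0.0):
    """Names of tasks predicted to finish after their deadline."""
    finishes = projected_finishes(tasks, n_cpus, now)
    return [name for name, _est, deadline in tasks if finishes[name] > deadline]


queue = [
    ("july_1", 10.0, 900.0),
    ("july_2", 10.0, 900.0),
    ("june_1", 10.0, 25.0),
    ("june_2", 10.0, 25.0),
    ("june_3", 10.0, 25.0),
]
print(tasks_at_risk(queue, n_cpus=2))  # ['june_3']
```

An empty list means "carry on as normal"; a non-empty one is the cue for panic mode.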

Note that I said it looks at when the work will be completed. There is no allowance for any time spent trying to upload the result, or getting through to the servers to report it.

One thing that the calculation does take into account is your 'Connect Interval', the figure shown in the 'Network Usage' section of your Computing Preferences as:

Computer is connected to the Internet about every
Leave blank or 0 if always connected.
BOINC will try to maintain at least this much work.

The more I look at this, the sillier it is. I assume that, with work due in two days' time but still unstarted, you've set that figure to the recommended '0' and increased "Maintain enough work for an additional..." to some significant number of days. BOINC will then assume that it can wait until the very last possible moment before completing the work, on the assumption that your always-on internet connection will take care of the rest.

But consider what would happen if your work were due this time tomorrow, instead of Wednesday. You might very well be connected to the internet at that time, but it would do no good at all, because the servers will be down for weekly maintenance. You can predict that you won't be able to report the work instantly, and I can predict it, but BOINC doesn't.

Now, looking at it purely from a selfish, personal-gain point of view, that doesn't matter at all. You would still be able to report the work after the maintenance period was over, and even though it was late, you'd still get credit, because there's no way anybody else could complete and report the task before you had a chance to.

But the servers don't think that way. They think a deadline is a deadline, even if it happens in the middle of maintenance. So they'll mark your task in red, create a replacement, and send it out to somebody else. That's an utter waste of bandwidth - the commodity that seems to be in shortest supply at the moment.

I do urge people who want to run large caches, and who run themselves close to missing deadlines (you all know my views on that already), to set a Connect Interval of at least 12 hours, or even 24. Then panic mode will start 12 or 24 hours earlier, and you stand a much better chance of sending in the work before the deadline, even if maintenance or one of these little mini-outages gets in the way. And that'll get the work done with fewer wasteful resends.
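A sketch of why a longer Connect Interval moves panic mode earlier, assuming (as described above) that a task counts as at risk when its projected finish falls within one connect interval of its deadline. The numbers are hypothetical: a task due in 48 hours that is projected to finish at hour 43 because other work is queued ahead of it.

```python
def at_risk(projected_finish_h, deadline_h, connect_interval_h):
    """True if the task would not be done at least one connect interval
    before its deadline (the margin assumed in this sketch)."""
    return projected_finish_h > deadline_h - connect_interval_h


PROJECTED_FINISH = 43.0  # hours from now (hypothetical)
DEADLINE = 48.0          # hours from now (hypothetical)

for interval in (0.0, 12.0, 24.0):
    mode = "panic (EDF)" if at_risk(PROJECTED_FINISH, DEADLINE, interval) else "normal (FIFO)"
    print(f"connect interval {interval:>4.0f} h -> {mode}")
```

With a zero interval the task looks safe even though its result could land in a maintenance window; with 12 or 24 hours it is flagged and run first.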
ID: 1116776

Ed_Anderson
Joined: 20 Jan 02
Posts: 34
Credit: 4,480,322
RAC: 38
United States
Message 1116794 - Posted: 13 Jun 2011, 19:13:34 UTC - in response to Message 1116776.  
Last modified: 13 Jun 2011, 19:29:59 UTC

Thanks. I don't want to have the deadlines missed. That is why I was asking.

I changed the Connect Interval to 1 day and reduced the cache from 4 days to 3. That should give me the same total cache to cover outages of the servers or network.

Right?

If I understand what you are saying, that should get the older stuff worked sooner, so they don't miss the deadline.

Right?
ID: 1116794

Ed_Anderson
Joined: 20 Jan 02
Posts: 34
Credit: 4,480,322
RAC: 38
United States
Message 1116799 - Posted: 13 Jun 2011, 19:39:37 UTC - in response to Message 1116776.  
Last modified: 13 Jun 2011, 19:40:18 UTC

That did it!

I had two units working that are due in July. I changed the connection time to 1 day from zero and cache to 3 days from 4.

Immediately the system suspended the July WUs and started the June 15 WUs. They are running at high priority.

Bingo!

Thanks!
ID: 1116799

Gary Charpentier
Volunteer tester
Joined: 25 Dec 00
Posts: 30692
Credit: 53,134,872
RAC: 32
United States
Message 1116811 - Posted: 13 Jun 2011, 20:43:21 UTC - in response to Message 1116776.  

Deadlines & Maintenance: a smart project would not create a deadline in a scheduled maintenance period. The smart thing might be to have no deadlines from an hour before the window until a couple of hours after it.

ID: 1116811

Fred J. Verster
Volunteer tester
Joined: 21 Apr 04
Posts: 3252
Credit: 31,903,643
RAC: 0
Netherlands
Message 1116843 - Posted: 13 Jun 2011, 22:36:50 UTC - in response to Message 1116811.  

Spent the long weekend cleaning 4 rigs (cases, radiators, fans), and also running checkdisk and defragmentation.

Started having 'troubles' on my i7-2600; it's still not clear what causes these faults. The system suddenly stops, completely hung, and only a hard RESET solves it.
The crash occurs when only the GPUs have work, including MB (SETI Beta test).
Because of the first crash, the only option was to go back to an earlier restore date.
An install of BOINC 6.12.26 didn't work, so I went back to 6.10.60 (64-bit).
That was also 2 hours before a new incremental backup was scheduled.
A new Seagate 250 GB drive is giving trouble, the 3rd one in a year, which is still far too many.
Both Samsung SpinPoint 1000 GB HDDs have been running for more than 2 years.
And 2 Western Digital 320 GB drives, used as USB 2.0 backup drives, have run for 3 years now with no problems. (An older, 4-year-old USB drive (Philips/Western Digital) doesn't do anything, so it's probably dead.)

These problems, BOINC crashing (FREEZING is the appropriate word), only happen when the CPU isn't heavily used, because only MB tasks with AR > 0.35 are available on this host, and the GPUs are 'fully' loaded (~35-55% on both cards, running 2 at a time).
(Because I went back to a past restore point, the CPDN WU I had been running, with 1004:33:15 expected compute time ;), at 78% and 31 hours from completion, with a deadline about a year away (May 2012), is clearly lost.)

I have 2 spare 500 GB drives (also Seagate), so a RAID array comes to mind.
Sometimes it's easier to reinstall Windows 7 Pro 64-bit on a RAID configuration.
(Although SATA I and II have been available for a few years, in Europe most PCs sold in shops use the old (P)ATA IDE compatibility mode. "Normal users" don't notice this until, for whatever reason, they look at how the HDD mode (IDE-compatible, AHCI, or RAID) is set in the BIOS.)

Sorry for my rude interruption; of course it's VERY important ;^), but OFF TOPIC...
ID: 1116843

Ianab
Volunteer tester
Joined: 11 Jun 08
Posts: 732
Credit: 20,635,586
RAC: 5
New Zealand
Message 1116888 - Posted: 14 Jun 2011, 1:29:38 UTC - in response to Message 1116732.  

[quote]When does it go into panic mode? I have a bunch of work units due 6/15. Today is 6/13 and it continues to start up work units with July due dates.

Not an issue, just a curiosity.[/quote]

A common reason would be if you went on holiday and left your machine off for a week.

When it came back online, it might see that it had a week's work to do, but that some units were due in 5 days and others in 30 days. It would go into "panic mode" to get the early-deadline ones completed, and would not request any more work until it had the deadlines back under control.

Once the system is satisfied nothing is going to miss a deadline, things go back to normal.

It can also happen if the time estimates get messed up, for example if it thinks AP (Astropulse) workunits are going to take 500 hours or something. It starts them first, until the system gets the estimates right.
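One way estimates can be "messed up" for a while and then settle is a per-project correction factor multiplied into every runtime estimate. The sketch below is hypothetical and only loosely inspired by the duration correction factor that BOINC clients of this era keep per project; the update rule (jump up at once, drift down by 10% of the gap per completed task) and the function names are assumptions. It shows why inflated estimates, such as AP tasks predicted at 500 hours, can keep pulling those tasks forward for a while before things go back to normal.

```python
def update_correction(factor, raw_estimate_h, actual_h, decay=0.1):
    """Return the correction factor after one completed task.

    If the task ran longer than the corrected estimate predicted, jump
    straight to the observed ratio (become pessimistic at once).
    Otherwise move only a fraction `decay` of the way toward the ratio.
    """
    ratio = actual_h / raw_estimate_h
    if ratio > factor:
        return ratio
    return factor + decay * (ratio - factor)


def corrected_estimate(raw_estimate_h, factor):
    return raw_estimate_h * factor


factor = 10.0  # hypothetical: estimates 10x too high (500 h for a 50 h task)
for _ in range(10):
    factor = update_correction(factor, raw_estimate_h=50.0, actual_h=50.0)
print(round(corrected_estimate(50.0, factor), 1))  # 206.9: still about 4x too high
```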

Ian
ID: 1116888

Josef W. Segur
Volunteer developer
Volunteer tester
Joined: 30 Oct 99
Posts: 4504
Credit: 1,414,761
RAC: 0
United States
Message 1116897 - Posted: 14 Jun 2011, 2:00:41 UTC - in response to Message 1116811.  

[quote]Deadlines & Maintenance. A smart project would not create a deadline in a scheduled maintenance period. Smart might be to not have any for an hour before and for a couple of hours after the window.[/quote]

Back when we had a scheduled 3 day maintenance/science effort I proposed a change to BOINC which would allow a project to avoid deadline misses for down time. Basically the idea was that the project would set a configuration flag during down time (either planned or not) and the Transitioner would simply extend the deadline when it saw a miss rather than create new tasks. That was of course ignored.
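The proposal, as described, can be sketched as a small decision rule. This is a hypothetical illustration only (per the post, the idea was not adopted), and the `Task` type, the `downtime_flag` argument and the 24-hour extension are assumptions, not BOINC's Transitioner code.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    deadline_h: float
    reported: bool = False


def check_deadline(task, now_h, downtime_flag, extension_h=24.0):
    """Return what a transitioner following the proposal would do.

    'ok'       - result already reported, or deadline not yet reached
    'extended' - deadline missed while the project flagged downtime,
                 so push the deadline out instead of resending
    'reissue'  - deadline missed normally: create a replacement task
    """
    if task.reported or now_h <= task.deadline_h:
        return "ok"
    if downtime_flag:
        task.deadline_h = now_h + extension_h
        return "extended"
    return "reissue"


during_outage = Task("wu_a", deadline_h=100.0)
action = check_deadline(during_outage, now_h=101.0, downtime_flag=True)
print(action, during_outage.deadline_h)  # extended 125.0

normal_day = Task("wu_b", deadline_h=100.0)
print(check_deadline(normal_day, now_h=101.0, downtime_flag=False))  # reissue
```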

The project work generation (splitters here) merely sets a max "in progress" duration for a task. The BOINC Scheduler adds that amount of time to "now" when sending a task. So it would be nearly impossible to avoid having some deadlines in maintenance periods, given various durations, reissued tasks, etc. But there are certainly other ways BOINC could be improved to help the project with the issue. Perhaps someone can come up with an idea which would be acceptable to Dr. Anderson.
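The deadline rule described here, plus the window-avoidance idea raised earlier in the thread, can be sketched as follows. Only "deadline = now + max in-progress duration" comes from the post (BOINC stores that duration as the workunit's `delay_bound`); the maintenance-window adjustment, its one-hour and two-hour buffers, and the function name are hypothetical, and the sketch simply reapplies the same rule whenever a task, including a reissue, is sent.

```python
def assign_deadline(now_h, delay_bound_h, windows, before_h=1.0, after_h=2.0):
    """Deadline for a task sent at now_h.

    Base rule (from the post): deadline = send time + max in-progress duration.
    Hypothetical refinement: if that lands in a maintenance window, or within
    before_h hours before / after_h hours after it, move it to after_h hours
    past the window's end. `windows` is a sorted list of (start_h, end_h).
    """
    deadline = now_h + delay_bound_h
    for start_h, end_h in windows:
        if start_h - before_h <= deadline <= end_h + after_h:
            deadline = end_h + after_h
    return deadline


maintenance = [(168.0, 173.0)]  # hypothetical weekly outage, hours from now
print(assign_deadline(0.0, 170.0, maintenance))  # 175.0, pushed past the window
print(assign_deadline(0.0, 100.0, maintenance))  # 100.0, unaffected
```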
                                                                  Joe
ID: 1116897

Ed_Anderson
Posts: 34
Credit: 4,480,322
RAC: 38
United States
Message 1118657 - Posted: 18 Jun 2011, 13:43:24 UTC - in response to Message 1116897.  

[quote]But there are certainly other ways BOINC could be improved to help the project with the issue. Perhaps someone can come up with an idea which would be acceptable to Dr. Anderson.
Joe[/quote]


It ain't perfect but it seems it works most of the time.

I would say that if it isn't sending out a LOT of extra work units, then focus your attention elsewhere. But if you are sending out a lot of extra work units, say more than 5%, then this needs to be addressed.

But if it ain't broke, don't fix it.
ID: 1118657

Gary Charpentier
Volunteer tester
Joined: 25 Dec 00
Posts: 30692
Credit: 53,134,872
RAC: 32
United States
Message 1118877 - Posted: 19 Jun 2011, 2:57:52 UTC - in response to Message 1116897.  

[quote]Perhaps someone can come up with an idea which would be acceptable to Dr. Anderson.[/quote]

Dr. A does not listen to users. Dr. A listens to project admins. This is as it should be. Convince half a dozen project admins of your idea, and Dr. A will do it.

ID: 1118877


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.