Advance (Jun 25 2009)


Message boards : Technical News : Advance (Jun 25 2009)

1mp0£173
Volunteer tester
Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 912390 - Posted: 28 Jun 2009, 18:04:30 UTC - in response to Message 912295.

I can't understand why BOINC doesn't always use EDF per project,
without interrupting the current WU?

EDIT: By EDF here I mean EDF without panic mode.

We're going to keep coming back to multiple projects, because while you may not care, a lot of people do, and it is one of the requirements that BOINC must meet.

Deadlines can range from less than half an hour to more than a year, and processing time is generally proportional to the deadline (a task with a half-hour deadline may take 3 minutes of CPU, while one with a one-year deadline could take 3 or 4 months of CPU).

If we did strict EDF, most long work units (CPDN units) would be in big trouble before they even start processing. Any work downloaded from any other project would have a shorter deadline, and stop CPDN.

The normal scheduler mode is "round robin" -- work inside a project is done in "downloaded" order, and the projects get time based on resource share (and managed through short-term debt).

The scheduler runs a simulation and checks whether all work will meet its deadlines if crunched in round-robin order. If not, it uses EDF to "get rid" of some of the work that is at risk.

That's why, if you carry a big cache and split time evenly between two projects, there will always be times when you are devoting all of your resources to just one: the debts became imbalanced due to outages or EDF, and are being rebalanced.
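
To make the "simulate round-robin, then fall back to EDF" idea concrete, here is a toy Python sketch. It is not BOINC's scheduler code: the `Task` fields, the function names, and the single-CPU "fluid" model (each project with work gets CPU time in proportion to its resource share) are all assumptions made for illustration, and short-term debt is left out entirely.

```python
from dataclasses import dataclass

@dataclass
class Task:
    name: str
    project: str
    remaining: float  # CPU hours still needed
    deadline: float   # hours from now

def round_robin_finish_times(tasks, shares):
    """Simulate one CPU shared between projects by resource share.
    Within a project, tasks run in list ("downloaded") order."""
    queues = {}
    for t in tasks:
        queues.setdefault(t.project, []).append([t.name, t.remaining])
    now, finish = 0.0, {}
    while queues:
        total = sum(shares[p] for p in queues)
        # Fraction of the CPU given to the head task of each project.
        rates = {p: shares[p] / total for p in queues}
        # Jump ahead to the moment the next head task completes.
        step = min(q[0][1] / rates[p] for p, q in queues.items())
        now += step
        for p in list(queues):
            head = queues[p][0]
            head[1] -= rates[p] * step
            if head[1] <= 1e-9:
                finish[head[0]] = now
                queues[p].pop(0)
                if not queues[p]:
                    del queues[p]
    return finish

def split_at_risk(tasks, shares):
    """Return (at_risk, normal): at-risk tasks are sorted earliest deadline first."""
    finish = round_robin_finish_times(tasks, shares)
    at_risk = sorted((t for t in tasks if finish[t.name] > t.deadline),
                     key=lambda t: t.deadline)
    normal = [t for t in tasks if finish[t.name] <= t.deadline]
    return at_risk, normal

# Made-up example: a long CPDN-style task and a short SETI-style task.
tasks = [Task("cpdn_1", "cpdn", 100.0, 1000.0),
         Task("seti_1", "seti", 10.0, 15.0)]
at_risk, normal = split_at_risk(tasks, {"cpdn": 1, "seti": 1})
print([t.name for t in at_risk], [t.name for t in normal])
```

In this example the short task would only get half the CPU under round-robin and finish after 20 hours, past its 15-hour deadline, so it is the one that would be pulled forward and run in EDF order; the long task is untouched.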
____________

EPG
Joined: 3 Apr 99
Posts: 110
Credit: 10,405,863
RAC: 0
Hungary
Message 912425 - Posted: 28 Jun 2009, 19:16:57 UTC - in response to Message 912390.


We're going to keep coming back to multiple projects, because while you may not care, a lot of people do, and it is one of the requirements that BOINC must meet.

I do. If you check my sig, you can see that I run (or have run) multiple projects.



If we did strict EDF, most long work units (CPDN units) would be in big trouble before they even start processing. Any work downloaded from any other project would have a shorter deadline, and stop CPDN.
The normal scheduler mode is "round robin" -- work inside a project is done in "downloaded" order, and the projects get time based on resource share (and managed through short-term debt).

But I advise EDF only for choosing the next work unit inside a project. All I am asking is why BOINC uses the "downloaded" order inside a project; I have no problem with the round-robin project selection.

Example: 1 CPU, 1 project with WUs that take 2 days to compute, cache 5 days.
We have WU A with a deadline 10 days from now and WU B with a deadline 5 days from now. WU A arrived first, and now we have to choose the next WU. Current BOINC would choose A then B, and the computer would normally finish both on time.
But if there is a 3-day power outage after the first day, it can't: B would finish on day 7, after its deadline. Using EDF in this scenario would solve it, since B would run first and finish on day 5.
If there are multiple projects on this computer, then "FIFO goes into EDF" can maybe solve it with debt, but EDF would solve it without debt.
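
The arithmetic of this example can be checked with a few lines of Python. This is only a sketch of the scenario as stated (one CPU, two 2-day WUs, a 3-day outage starting after day 1); the function and variable names are made up for the illustration.

```python
def finish_times(order, work_days, outage_start=1.0, outage_len=3.0):
    """Run tasks back to back on one CPU; no progress is made during the outage."""
    cpu_used, done = 0.0, {}
    for name in order:
        cpu_used += work_days[name]
        # Work completed before the outage is unaffected;
        # anything finishing later slips by the outage length.
        if cpu_used <= outage_start:
            done[name] = cpu_used
        else:
            done[name] = cpu_used + outage_len
    return done

def missed(done, deadline):
    """Names of tasks that finish after their deadline."""
    return [name for name in done if done[name] > deadline[name]]

work = {"A": 2.0, "B": 2.0}        # CPU days needed
deadline = {"A": 10.0, "B": 5.0}   # days from now

fifo = finish_times(["A", "B"], work)                     # download order
edf = finish_times(sorted(work, key=deadline.get), work)  # earliest deadline first

print("FIFO:", fifo, "missed:", missed(fifo, deadline))
print("EDF: ", edf, "missed:", missed(edf, deadline))
```

With FIFO, A finishes on day 5 and B on day 7, so B misses its deadline. With EDF, B finishes exactly on day 5 and A on day 7, so both are on time, though B has no slack left.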

____________

John McLeod VII
Volunteer developer
Volunteer tester
Joined: 15 Jul 99
Posts: 24391
Credit: 519,750
RAC: 26
United States
Message 912436 - Posted: 28 Jun 2009, 19:58:41 UTC - in response to Message 912425.


EDF only within a project does not work well either, as some projects, including SETI, have tasks with varying computation requirements and deadlines. If you keep the queue topped off, the queue is longer than the time for a single long-duration task, and you are unlucky enough to get a bunch of short-duration, short-deadline tasks, then it is entirely possible that computation will not start on the long-duration task until it is too late for it to complete. Using FIFO when possible tends to clear out the long-duration tasks when EDF is not absolutely required in order to meet deadlines.
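
The starvation effect described above can be reproduced in a toy model. The sketch below is an illustration with made-up numbers, not BOINC code: one CPU, one project, tasks run to completion, and every time a task finishes one new short task with a 30-hour deadline is fetched to keep the queue topped off.

```python
def simulate(policy, horizon=200.0):
    """Return the names of tasks that miss their deadline under the given policy.
    Each task is [name, cpu_hours, absolute_deadline_hours]."""
    queue = [["long", 50.0, 70.0],
             ["s1", 2.0, 60.0], ["s2", 2.0, 60.0], ["s3", 2.0, 60.0]]
    now, fetched, missed = 0.0, 0, []
    while queue and now < horizon:
        if policy == "edf":
            task = min(queue, key=lambda t: t[2])  # earliest deadline first
        else:
            task = queue[0]                        # FIFO: download order
        queue.remove(task)
        now += task[1]                             # run the task to completion
        if now > task[2]:
            missed.append(task[0])
        # Top the queue off with one short-duration, short-deadline task.
        fetched += 1
        queue.append([f"r{fetched}", 2.0, now + 30.0])
    return missed

print("FIFO missed:", simulate("fifo"))
print("EDF missed: ", simulate("edf"))
```

Under FIFO the long task runs first and everything completes on time in this model. Under EDF the freshly fetched short tasks keep jumping ahead of the long task because their deadlines are earlier, and by the time the long task has the earliest deadline it no longer has enough time left to finish.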

There is a case for GPU tasks to run to completion, if possible, before a task switch as the cost of the task switch is very high. This does have a detrimental effect on the interesting assortment of work done on the computer, but in the case of GPU tasks, there are likely to be CPU tasks running as well, so this argument is less valid. If using FIFO only, even between projects, then the work fetch needs to be modified such that it does not fetch work from anywhere if there is enough work in the queue to fill it. The work would not be started for a long time anyway, so there is no need to download it.
____________


BOINC WIKI

1mp0£173
Volunteer tester
Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 912457 - Posted: 28 Jun 2009, 20:44:34 UTC - in response to Message 912425.


But I advise EDF only for choosing the next work unit inside a project. All I am asking is why BOINC uses the "downloaded" order inside a project; I have no problem with the round-robin project selection.

If you really feel strongly about this, you can download the BOINC source code, compile it, and test it out.

I went through the same process you're doing now.

As a general rule, round-robin works best. The oldest work units make progress toward completion, and even if you download work with shorter deadlines, BOINC continues to work on tasks that are started.

If BOINC finds that round-robin scheduling won't complete work on time, then throwing everything at the shortest deadlines will generally avoid trouble.

This works because it handles the normal cases normally, and the exceptional cases exceptionally.
____________

Chelski
Joined: 3 Jan 00
Posts: 121
Credit: 8,816,929
RAC: 901
Malaysia
Message 912550 - Posted: 29 Jun 2009, 7:18:58 UTC - in response to Message 912193.

To get around BOINC's limit on the number of WUs, one workaround is to let your PC live a double life with 2 separate BOINC instances, each linked to a unique computer name. It takes some babysitting to maintain, but it can be done. Unfortunately, it will effectively halve the RAC. I did it for a while to feed BOINC on a PC that had a dead LAN connection, but in the end I gave up because of the trouble required to change the PC name, reboot, and only then let BOINC connect, etc.
____________

1mp0£173
Volunteer tester
Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 912643 - Posted: 29 Jun 2009, 15:57:30 UTC - in response to Message 912550.

If you run a computer half-time, BOINC will figure out how much time it actually runs (remember, BOINC works on computers that are used a few hours per day and then turned off when they aren't being used) and reduce the queue appropriately.

So, if you take one computer and make it look like two computers running a half-day, the cache on each will be reduced by half.
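
As a back-of-the-envelope check, assuming the queue simply scales linearly with the fraction of time the host is seen running (an assumption for this sketch, not BOINC's documented formula; the function name is made up):

```python
def effective_cache_days(cache_days, on_fraction):
    """Assumed model: the work queue shrinks in proportion to uptime."""
    return cache_days * on_fraction

one_host = effective_cache_days(5.0, 1.0)            # one host, always on
two_half_hosts = 2 * effective_cache_days(5.0, 0.5)  # two "hosts", each on half the time

print(one_host, two_half_hosts)
```

Under that assumption, two half-time identities together hold the same amount of queued work as the single full-time host.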
____________

[seti.international] Dirk Sadowski
Project donor
Volunteer tester
Joined: 6 Apr 07
Posts: 7062
Credit: 60,022,278
RAC: 21,402
Germany
Message 912764 - Posted: 30 Jun 2009, 0:53:59 UTC - in response to Message 912261.

...
I first realised this policy wasn't working in 6.6.31, but had seen it in earlier versions as well.
Anyway, I posted to the BOINC Alpha list about this, and the result is:

Changeset 18503

Author: davea
Message: - client: when suspending a GPU job,
always remove it from memory, even if it hasn't checkpointed.
Otherwise we'll typically run another GPU job right away,
and it will bomb out or revert to CPU mode because it
can't allocate video RAM


I think this will be a partial fix. It won't stop the switching of WUs, but it should stop CUDA apps being left in memory and causing CPU fallback mode on later WUs, with all hell then breaking loose.

Claggy
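
The rule in that commit message can be pictured as a small decision function. This is a hypothetical Python sketch, not the actual BOINC client code: the function name and parameters are invented, and the behaviour shown for non-GPU jobs (keep the job in memory if the user asked for that or if it has not checkpointed yet) is an assumption added only to give the GPU rule some context.

```python
def remove_from_memory_on_suspend(uses_gpu, has_checkpointed, leave_in_memory):
    """Decide whether a suspended job should be removed from memory.

    uses_gpu         -- the job runs on the GPU (e.g. a CUDA app)
    has_checkpointed -- the job has saved its progress at least once
    leave_in_memory  -- hypothetical user preference to keep suspended jobs loaded
    """
    if uses_gpu:
        # The change described in the commit message: always free video RAM,
        # even if the job has not checkpointed, so the next GPU job can allocate it.
        return True
    if leave_in_memory:
        return False          # assumed CPU-job behaviour
    return has_checkpointed   # assumed: avoid losing work that was never saved

print(remove_from_memory_on_suspend(True, False, True))
print(remove_from_memory_on_suspend(False, False, False))
```

The trade-off is that a GPU job which has not checkpointed loses its progress when removed, in exchange for the next GPU job not failing or falling back to CPU mode.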


Thanks a lot! :-)

I don't know anything about changesets.
Does this mean the devs are now informed and will change BOINC?

____________
BR



>Das Deutsche Cafe. The German Cafe.<

John McLeod VII
Volunteer developer
Volunteer tester
Joined: 15 Jul 99
Posts: 24391
Credit: 519,750
RAC: 26
United States
Message 913043 - Posted: 1 Jul 2009, 21:55:25 UTC - in response to Message 912764.

A changeset is a change to the program. In other words, the code has been changed; now it has to be tested and released.

DaveA is the lead developer.
____________



Copyright © 2014 University of California