New rescheduler


Message boards : Number crunching : New rescheduler

Profile S@NL - eFMer - www.efmer.eu/boinc
Volunteer tester
Joined: 7 Jun 99
Posts: 512
Credit: 122,604,114
RAC: 244
United States
Message 1014652 - Posted: 11 Jul 2010, 14:01:59 UTC

A new BOINC rescheduler can be found here: http://www.efmer.eu/forum_tt/index.php?topic=428.0

V 0.6

Added user settable VLAR and VHAR limits.
Automatic duration_correction_factor correction. When the factor goes beyond the min and max values it is reset.

Any suggestions are welcome.
____________
TThrottle Control your temperatures. BoincTasks The best way to view BOINC. Anza Borrego Desert hiking.
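The automatic duration_correction_factor reset described in the V 0.6 notes can be pictured roughly as below. This is only a sketch: the function name, the limits `dcf_min` and `dcf_max`, and the reset value of 1.0 are all assumptions for illustration, since the post does not say which values the rescheduler actually uses.

```python
def corrected_dcf(dcf, dcf_min=0.1, dcf_max=10.0, reset_to=1.0):
    """Return reset_to when dcf lies outside [dcf_min, dcf_max], else dcf unchanged.

    dcf_min, dcf_max and reset_to are hypothetical defaults, not values
    taken from the rescheduler itself.
    """
    if dcf < dcf_min or dcf > dcf_max:
        return reset_to
    return dcf


if __name__ == "__main__":
    for sample in (0.5, 25.0, 0.01):
        print(sample, "->", corrected_dcf(sample))
```

A factor inside the window is left alone, so a host whose DCF is behaving normally is not touched.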

Profile MadMaC
Volunteer tester
Joined: 4 Apr 01
Posts: 201
Credit: 47,158,217
RAC: 3
United Kingdom
Message 1014656 - Posted: 11 Jul 2010, 14:29:38 UTC - in response to Message 1014652.

Nice one - just out of interest - does it work with fermi?
I only ask as the current rescheduler 1.9 doesn't support fermi, and takes some hacking of the app info to get it to work - this might put some people off so if you could get it working with fermi that would be a big win imho..

Thanx for your efforts in this..
____________

Profile MadMaC
Volunteer tester
Joined: 4 Apr 01
Posts: 201
Credit: 47,158,217
RAC: 3
United Kingdom
Message 1014657 - Posted: 11 Jul 2010, 14:30:33 UTC - in response to Message 1014652.

Forgot to mention

Good luck tonight :-)
____________

Profile S@NL - eFMer - www.efmer.eu/boinc
Volunteer tester
Joined: 7 Jun 99
Posts: 512
Credit: 122,604,114
RAC: 244
United States
Message 1014664 - Posted: 11 Jul 2010, 14:50:08 UTC - in response to Message 1014656.

I haven't tested it but it should.
Check the "Simulation mode" in the expert tab.
Press "Run"
Check the C:\Users\fred\AppData\Roaming\eFMer\BoincScheduler\capture\client_state.xml
If the client state looks OK, remove the "Simulation mode" check.

1) Stop the BOINC client
2) Unplug any Internet connections.
3) Make a complete copy of the BOINC folder.
4) Press Run

Check if no tasks are removed.
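Step 3 above (a complete copy of the BOINC folder) could be scripted along these lines. This is a sketch under assumptions: the function name and the timestamped destination naming are made up here, and the only thing it checks is that the folder contains a client_state.xml before copying. It should be run only after the BOINC client has been stopped.

```python
import shutil
from datetime import datetime
from pathlib import Path


def backup_boinc_folder(data_dir, backup_root):
    """Copy the whole BOINC data folder to a timestamped folder under backup_root."""
    src = Path(data_dir)
    # Refuse to copy a folder that does not look like a BOINC data folder.
    if not (src / "client_state.xml").is_file():
        raise FileNotFoundError(f"no client_state.xml found in {src}")
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    dest = Path(backup_root) / f"{src.name}-backup-{stamp}"
    shutil.copytree(src, dest)  # also creates missing parent folders
    return dest
```

If a real run does remove tasks, the copy can be put back by hand while the client is still stopped.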





sarmitage
Joined: 2 Dec 09
Posts: 56
Credit: 1,123,857
RAC: 0
Canada
Message 1014680 - Posted: 11 Jul 2010, 15:36:45 UTC

Any chance of getting the source code for this? I've got a Fermi card and can do the testing when I get a chance, but I am generally much more confident when I can dig in and see exactly what the tool is /supposed/ to do, and compare it to what it actually did..

-Scott

Bearcat
Joined: 10 Sep 99
Posts: 106
Credit: 10,778,506
RAC: 0
United States
Message 1014687 - Posted: 11 Jul 2010, 15:59:37 UTC - in response to Message 1014664.

Thanks for taking the time to do this. This will be a valuable tool for the foreseeable future while they sort things out with the DCF etc.
____________

Terror Australis
Volunteer tester
Joined: 14 Feb 04
Posts: 1627
Credit: 200,945,558
RAC: 52,292
Australia
Message 1014697 - Posted: 11 Jul 2010, 16:29:45 UTC

Seems to work OK for me, but it has not had to do any actual rescheduling yet so I won't know till I get some VLAR's to try it out.

A very minor grizzle: in the "Expert" tab, it should be "BOINC_DATA" folder, not "BOINC" folder.

Thanks for your efforts.

Brodo

Josef W. Segur
Volunteer developer
Volunteer tester
Joined: 30 Oct 99
Posts: 4141
Credit: 1,005,254
RAC: 266
United States
Message 1014848 - Posted: 12 Jul 2010, 0:51:52 UTC - in response to Message 1014652.

How about an option to simply set all rsc_fpops_bound values to 5e+17 if they are lower? That would be an alternative to applying the scaling factors to the bound while shifting work, and applied to all workunits for the SETI@home project (not just those being shifted). The value is much larger than any bound the splitters are now producing for either MB or AP, to allow for future 'Moore's Law' growth as well as any changes the project is likely to make. The "if they are lower" is also a safety factor in case the server-side adjustments get really whacky or a really revolutionary processing breakthrough happens.

Basically the idea is to protect against the -177 "Maximum Elapsed time exceeded" error even when doing VLAR tasks on a GPU which is extremely fast at midrange but extremely crippled by VLARs, and assuming the server-side estimates and bounds adjustments won't be compatible with running at a low DCF.
Joe
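The "raise to 5e+17 only if lower" idea can be sketched as a text pass over client_state.xml. Everything specific here is an assumption for illustration: that each task sits in a `<workunit>` block carrying an `<app_name>` element, and that the SETI@home applications are called `setiathome_enhanced` and `astropulse_v505`. It is not the rescheduler's own code, and it should only ever be tried on a copy of the file with the client stopped.

```python
import re

MIN_BOUND = 5e17  # value suggested in the post above

WU_RE = re.compile(r"<workunit>.*?</workunit>", re.DOTALL)
APP_RE = re.compile(r"<app_name>([^<]+)</app_name>")
BOUND_RE = re.compile(r"(<rsc_fpops_bound>)([^<]+)(</rsc_fpops_bound>)")


def raise_bounds(state_text,
                 app_names=("setiathome_enhanced", "astropulse_v505"),
                 min_bound=MIN_BOUND):
    """Return (new_text, changed_count); bounds below min_bound are raised to it."""
    changed = 0

    def fix_bound(match):
        nonlocal changed
        if float(match.group(2)) < min_bound:
            changed += 1
            return f"{match.group(1)}{min_bound:f}{match.group(3)}"
        return match.group(0)  # already large enough, leave untouched

    def fix_workunit(match):
        block = match.group(0)
        app = APP_RE.search(block)
        # Workunits of other projects/apps are returned unchanged.
        if app is None or app.group(1).strip() not in app_names:
            return block
        return BOUND_RE.sub(fix_bound, block)

    return WU_RE.sub(fix_workunit, state_text), changed
```

Because the value is set rather than multiplied, running the pass repeatedly changes nothing after the first time.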

Profile S@NL - eFMer - www.efmer.eu/boinc
Volunteer tester
Joined: 7 Jun 99
Posts: 512
Credit: 122,604,114
RAC: 244
United States
Message 1014929 - Posted: 12 Jul 2010, 7:28:38 UTC - in response to Message 1014848.

I have no idea what is going to happen. The rsc_fpops_bound and rsc_fpops_est are set to the calculated value, not the server value,
but only when a task is moved one way or the other.

For now I wait and see and tackle a problem when it's there.


Profile S@NL - eFMer - www.efmer.eu/boinc
Volunteer tester
Joined: 7 Jun 99
Posts: 512
Credit: 122,604,114
RAC: 244
United States
Message 1014933 - Posted: 12 Jul 2010, 7:41:07 UTC - in response to Message 1014848.


How about an option to simply set all rsc_fpops_bound values to 5e+17 if they are lower? Joe

Why, oh why, can't you do that on the server side until things stabilize?
That's way easier.

Profile S@NL - eFMer - www.efmer.eu/boinc
Volunteer tester
Joined: 7 Jun 99
Posts: 512
Credit: 122,604,114
RAC: 244
United States
Message 1014940 - Posted: 12 Jul 2010, 9:39:58 UTC - in response to Message 1014848.


How about an option to simply set all rsc_fpops_bound values to 5e+17

I did some checking and implemented some code.
From my 860 tasks about 600 fall in this range.
So that means changing almost all the WUs from the value set by the server.


Josef W. Segur
Volunteer developer
Volunteer tester
Joined: 30 Oct 99
Posts: 4141
Credit: 1,005,254
RAC: 266
United States
Message 1015017 - Posted: 12 Jul 2010, 15:52:50 UTC - in response to Message 1014940.


All 860 tasks should be considerably less than 5e17, the largest possible value produced by the mb_splitter is 2.2275734e15 and all AP tasks have 1.821...e16. I deliberately chose a higher value to cover possible future changes such as S@H Enhanced moving to WUs with more data, etc.

I was thinking in terms of providing a better alternative to the technique of replacing all <rsc_fpops_bound> strings in a client_state.xml file with <rsc_fpops_bound>3 which some are using. That technique grows the bounds for all projects and tasks, and if used hourly on a host with a ten day cache the bound values would end up greater than 3.333e240 . If the host were also doing CPDN work the bound would overflow the range of a double long before a task completed. If used only once every two days the technique is probably safe, though.
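The growth from the "prepend a 3" replacement can be checked with a few lines. The sketch assumes the smallest possible case, a bound written with a single digit before the decimal point ("1.0"); real bounds have more digits and would grow further. The helper names are made up for the illustration.

```python
def prepend_three(bound_text, runs):
    """Text of the bound after `runs` passes of '<rsc_fpops_bound>' -> '<rsc_fpops_bound>3'."""
    # Each pass puts one more "3" in front of whatever number is already there.
    return "3" * runs + bound_text


def bound_after(bound_text, runs):
    """Numeric value a client would read back after `runs` passes."""
    return float(prepend_three(bound_text, runs))


if __name__ == "__main__":
    print(bound_after("1.0", 24 * 10))  # hourly for ten days: 240 passes
    print(bound_after("1.0", 5))        # once every two days for ten days
    print(bound_after("1.0", 400))      # a task that stays on the host much longer
```

Each pass multiplies the value by at least ten, so 240 passes already give more than 3.333e240, and a task that stays around for a few hundred passes goes past what a double can hold (about 1.8e308).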

Restricting changes to SETI@home, using a technique which doesn't grow the value indefinitely, and including it in a program which already has automatic periodic run capability are all improvements. But it wouldn't be sensible for anyone running stock builds on a multicore AMD host to use the option, since the stock S@H Enhanced builds have a tendency to hang in the "Choosing optimal functions" testing on those systems.

For that matter, there is some risk that any program may occasionally hang on any kind of system and that's why rsc_fpops_bound exists. Because BOINC must support many existing projects and should be prepared to support many more in future, I don't think Dr. Anderson would consider doing other than scaling the bound the same amount as the estimate. Just possibly he'd consider a sanity check approach which would establish a minimum bound value based on an estimated ten minutes or so. Outside that, he'd consider it a duty of the project to produce a sensible bound. For both S@H Enhanced and Astropulse, the splitter produces bounds which are exactly ten times the estimate if the host has a DCF of 1.0 but in practice many hosts have been running with lower DCF so the effective bound has been larger. I think Eric Korpela might consider raising the multiple moderately, but probably less than 50 times the estimate. I doubt it would be a crash priority change, though, so could take awhile.
Joe

Profile S@NL - eFMer - www.efmer.eu/boinc
Volunteer tester
Joined: 7 Jun 99
Posts: 512
Credit: 122,604,114
RAC: 244
United States
Message 1015021 - Posted: 12 Jul 2010, 16:03:45 UTC - in response to Message 1015017.



For now I have made an expert checkbox to raise all bounds to at least the value you suggested,
and only for the SETI enhanced bounds.
That will at least eliminate all -177 errors.

Profile jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 4814
Credit: 71,640,663
RAC: 9,507
Australia
Message 1015042 - Posted: 12 Jul 2010, 17:28:44 UTC

Hi Fred,
Am using v0.6 on both my machines at the moment & works great so far. One small usability suggestion: When you run the program, and it is already running, it complains with a dialog. Could you open the first instance instead? (bringing it to open foreground window state). That would make things easier on my p4, where the tray icons don't always appear at startup for some time (even though the tasks are running properly).

Cheers, Jason
____________
"It is not the strongest of the species that survives, nor the most intelligent that survives. It is the one that is the most adaptable to change."
Charles Darwin

Profile S@NL - eFMer - www.efmer.eu/boinc
Volunteer tester
Joined: 7 Jun 99
Posts: 512
Credit: 122,604,114
RAC: 244
United States
Message 1015069 - Posted: 12 Jul 2010, 18:44:09 UTC - in response to Message 1015042.

That's an interesting problem and a good idea. It will not be in the next version, though, which is nearly ready.

Profile Tim Norton
Volunteer tester
Joined: 2 Jun 99
Posts: 835
Credit: 33,540,164
RAC: 0
United Kingdom
Message 1015141 - Posted: 12 Jul 2010, 20:41:28 UTC

Fred

i am also using your 0.6 app and its working fine (win7 64bit) (also using your BoincTasks - also works well)

0.6 even allows you to move part run wu back and forth between cpu and gpu without problem - cool - not that you would do this a lot!

One thing i have noticed is that after you run the reschedule your app runs up boinc in the background - i.e. it's "minimised" and not shown in the task tray - could this be changed or put as an option - boinc runs up fine when you manually run it :)

from looking at the log it appears your app would shut down boinc if it was live before the changes are made - not tried this as i manually shut boinc down before i reschedule

One other thing that would be nice is to be able to set the app to run up with windows even if you are not going to put it in automatic mode in the settings options
____________
Tim

Profile S@NL - eFMer - www.efmer.eu/boinc
Volunteer tester
Joined: 7 Jun 99
Posts: 512
Credit: 122,604,114
RAC: 244
United States
Message 1015286 - Posted: 13 Jul 2010, 5:57:48 UTC - in response to Message 1015141.

Fred

i am also using your 0.6 app and its working fine (win7 64bit) (also using your BoincTasks - also works well)

0.6 even allows you to move part run wu back and forth between cpu and gpu without problem - cool - not that you would do this a lot!

1) One thing i have noticed is that after you run the reschedule your app runs up boinc in the background - i.e. it's "minimised" and not shown in the task tray - could this be changed or put as an option - boinc runs up fine when you manually run it :)

2)from looking at the log it appears your app would shut down boinc if it was live before the changes are made - not tried this as i manually shut boinc down before i reschedule

One other thing that would be nice is to be able to set the app to run up with windows even if you are not going to put it in automatic mode in the settings options

1) I don't exactly get what you're asking, but there is an option in the settings tab that allows you to show or hide the rescheduler when starting.
2) It shuts down BOINC before the rescheduling starts. It needs a stable capture.
3) I will add an auto start at login, in one of the next versions.

Profile S@NL - eFMer - www.efmer.eu/boinc
Volunteer tester
Joined: 7 Jun 99
Posts: 512
Credit: 122,604,114
RAC: 244
United States
Message 1015292 - Posted: 13 Jul 2010, 6:48:48 UTC - in response to Message 1015286.

New version 0.7

Added: Expert tab: A checkbox to raise every rsc_fpops_bound value that is below 5e17 up to 5e17, to avoid -177 errors.
Added: Expert tab: Option to include / exclude active tasks from rescheduling.

[DPC] hansR
Volunteer tester
Joined: 14 Jul 00
Posts: 39
Credit: 99,011,800
RAC: 72,827
Netherlands
Message 1015314 - Posted: 13 Jul 2010, 11:42:46 UTC - in response to Message 1015292.

After checking the expert tab the following messages start appearing:

13 July 2010 - 13:31:29 Invalid rsc_fpops_bound < 500000000000000000.000000, total: 509
13 July 2010 - 13:31:29 Found: CPU: 452, VLAR: 74, VHAR: 47
13 July 2010 - 13:31:29 Found: GPU: 57, VLAR: 0, VHAR: 1
13 July 2010 - 13:31:29 Rescheduling needed

13 July 2010 - 13:31:29 Shutting down BOINC client
13 July 2010 - 13:31:31 Shutdown of BOINC client completed
13 July 2010 - 13:31:32 Invalid rsc_fpops_bound < 500000000000000000.000000, total: 509
13 July 2010 - 13:31:32 Found: CPU: 452, VLAR: 74, VHAR: 47
13 July 2010 - 13:31:32 Found: GPU: 57, VLAR: 0, VHAR: 1
13 July 2010 - 13:31:33 Rescheduling CPU version: 603 ,Gpu version: 608 planclass: cuda
13 July 2010 - 13:31:34 Copied rescheduled client_state.xml
13 July 2010 - 13:31:34 Starting the BOINC client
13 July 2010 - 13:31:34 Moved to Cpu: 0, VLAR 0, VHAR 0 -- Moved to Gpu: 0, VLAR 0, VHAR 0

The next run reports the same total, so it seems nothing was changed ?
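One thing that can be read off the log above is that the reported total (509) equals the CPU count plus the GPU count (452 + 57), i.e. every task was flagged as being below the 5e17 limit. A small sketch for pulling those numbers out of the log lines follows; the regular expressions are written against the lines exactly as pasted here, and the function name is made up, so treat it as an illustration rather than a description of the rescheduler's log format in general.

```python
import re

FOUND_RE = re.compile(r"Found: (CPU|GPU): (\d+), VLAR: (\d+), VHAR: (\d+)")
TOTAL_RE = re.compile(r"Invalid rsc_fpops_bound < [\d.]+, total: (\d+)")


def summarize(log_lines):
    """Return (flagged_total, {"CPU": (tasks, vlar, vhar), "GPU": (...)}) from log lines."""
    total = None
    counts = {}
    for line in log_lines:
        found = FOUND_RE.search(line)
        if found:
            counts[found.group(1)] = tuple(int(n) for n in found.group(2, 3, 4))
        flagged = TOTAL_RE.search(line)
        if flagged:
            total = int(flagged.group(1))
    return total, counts


if __name__ == "__main__":
    lines = [
        "13 July 2010 - 13:31:29 Invalid rsc_fpops_bound < 500000000000000000.000000, total: 509",
        "13 July 2010 - 13:31:29 Found: CPU: 452, VLAR: 74, VHAR: 47",
        "13 July 2010 - 13:31:29 Found: GPU: 57, VLAR: 0, VHAR: 1",
    ]
    total, counts = summarize(lines)
    print(total, counts)
    print("all tasks flagged:", total == counts["CPU"][0] + counts["GPU"][0])
```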
____________

Profile S@NL - eFMer - www.efmer.eu/boinc
Volunteer tester
Joined: 7 Jun 99
Posts: 512
Credit: 122,604,114
RAC: 244
United States
Message 1015326 - Posted: 13 Jul 2010, 13:06:05 UTC - in response to Message 1015314.

The problem is that the in-memory copy was not updated; the BOINC state file itself is updated correctly, so nothing is wrong there.
Still, that is not as it should be and is a bit confusing. I will correct this in V 0.8 ... expect it soon.


Copyright © 2014 University of California