A new BOINC rescheduler can be found here: http://www.efmer.eu/forum_tt/index.php?topic=428.0
V 0.6
Added user-settable VLAR and VHAR limits.
Added automatic duration_correction_factor (DCF) correction: when the factor goes outside the min and max values, it is reset.
Any suggestions are welcome.
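The automatic duration_correction_factor correction described in this post can be pictured roughly as follows. This is only a sketch: the function name `correct_dcf`, the default limits, and the reset value of 1.0 are assumptions for illustration, not the rescheduler's actual settings.

```python
def correct_dcf(dcf, dcf_min=0.02, dcf_max=10.0, reset_to=1.0):
    """Return dcf unchanged while it lies within [dcf_min, dcf_max].

    Outside that window the factor is reset. The limits and the reset
    value are hypothetical defaults chosen only for this example.
    """
    if dcf < dcf_min or dcf > dcf_max:
        return reset_to
    return dcf


if __name__ == "__main__":
    for value in (0.5, 50.0, 0.001):
        print(value, "->", correct_dcf(value))
```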
____________
TThrottle Control your temperatures. BoincTasks The best way to view BOINC. Anza Borrego Desert hiking.
MadMaC, Volunteer tester
Joined: 4 Apr 01 Posts: 201 Credit: 45,139,205 RAC: 23,999
Nice one. Just out of interest, does it work with Fermi?
I only ask because the current rescheduler 1.9 doesn't support Fermi and takes some hacking of the app info to get it to work. This might put some people off, so if you could get it working with Fermi that would be a big win, imho.
Thanks for your efforts on this.
____________
MadMaC, Volunteer tester
Forgot to mention
Good luck tonight :-)
____________
I haven't tested it with Fermi, but it should work.
To check first:
Tick "Simulation mode" in the Expert tab.
Press "Run".
Inspect C:\Users\fred\AppData\Roaming\eFMer\BoincScheduler\capture\client_state.xml
If the client state looks OK, untick the simulation check box, then:
1) Stop the BOINC client.
2) Unplug any Internet connections.
3) Make a complete copy of the BOINC folder.
4) Press "Run".
Check that no tasks were removed.
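The backup step and the final "no tasks removed" check can be scripted. The sketch below is an illustration only: the helper names are hypothetical, the folder paths are whatever applies on your host, and counting `<workunit>` opening tags in client_state.xml is an assumed stand-in for "number of tasks".

```python
import shutil
from pathlib import Path


def backup_boinc_folder(boinc_dir, backup_dir):
    """Copy the whole BOINC folder to backup_dir (step 3 of the procedure)."""
    src, dst = Path(boinc_dir), Path(backup_dir)
    if not src.is_dir():
        raise FileNotFoundError(f"BOINC folder not found: {src}")
    if dst.exists():
        raise FileExistsError(f"Refusing to overwrite existing backup: {dst}")
    shutil.copytree(src, dst)
    return dst


def count_workunits(state_file):
    """Count <workunit> opening tags in a client_state.xml file."""
    text = Path(state_file).read_text(encoding="utf-8", errors="replace")
    return text.count("<workunit>")
```

Comparing `count_workunits` on the backup copy and on the live file after pressing "Run" gives a quick indication of whether anything disappeared.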
____________
Any chance of getting the source code for this? I've got a Fermi card and can do the testing when I get a chance, but I am generally much more confident when I can dig in and see exactly what the tool is /supposed/ to do, and compare it to what it actually did..
-Scott
____________
Thanks for taking the time to do this. This will be a valuable tool for the foreseeable future while they sort things out with the DCF etc.
____________
Seems to work OK for me, but it has not had to do any actual rescheduling yet, so I won't know until I get some VLARs to try it out.
A very minor grizzle: in the "Expert" tab, it should be "BOINC_DATA" folder, not "BOINC" folder.
Thanks for your efforts.
Brodo
____________
How about an option to simply set all rsc_fpops_bound values to 5e+17 if they are lower? That would be an alternative to applying the scaling factors to the bound while shifting work, and it would apply to all workunits for the SETI@home project (not just those being shifted). The value is much larger than any bound the splitters are now producing for either MB or AP, to allow for future 'Moore's Law' growth as well as any changes the project is likely to make. The "if they are lower" is also a safety factor in case the server-side adjustments get really wacky or a really revolutionary processing breakthrough happens.
Basically the idea is to protect against the -177 "Maximum Elapsed time exceeded" error even when doing VLAR tasks on a GPU which is extremely fast at midrange but extremely crippled by VLARs, and assuming the server-side estimates and bounds adjustments won't be compatible with running at a low DCF.
Joe
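A minimal sketch of the "raise to 5e17 only if lower" idea, working on a piece of client_state.xml text. The function name and the regular expression are illustrative assumptions, not code from the rescheduler, and the sketch does not restrict itself to SETI@home workunits; selecting one project's workunits is left out here.

```python
import re

MIN_BOUND = 5e17  # floor suggested in the post above

# Matches one <rsc_fpops_bound>value</rsc_fpops_bound> element.
_BOUND_RE = re.compile(r"(<rsc_fpops_bound>)\s*([^<\s]+)\s*(</rsc_fpops_bound>)")


def raise_low_bounds(xml_text, min_bound=MIN_BOUND):
    """Return (new_text, changed_count); bounds below min_bound are raised to it.

    Values at or above min_bound are left untouched, so repeated runs
    do not keep growing the bound.
    """
    changed = 0

    def fix(match):
        nonlocal changed
        if float(match.group(2)) < min_bound:
            changed += 1
            return f"{match.group(1)}{min_bound:f}{match.group(3)}"
        return match.group(0)

    return _BOUND_RE.sub(fix, xml_text), changed
```

Because the replacement is a fixed floor rather than a multiplier, running it a second time on the same text changes nothing.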
____________
I have no idea what is going to happen. The rsc_fpops_bound and rsc_fpops_est values are set to the calculated value, not the server value, but only when a task is moved one way or the other.
For now I will wait and see, and tackle a problem when it arises.
____________
How about an option to simply set all rsc_fpops_bound values to 5e+17 if they are lower? Joe
Why, oh why, can't that be done on the server side until things stabilize?
That would be far easier.
____________
How about an option to simply set all rsc_fpops_bound values to 5e+17
I did some checking and implemented some code.
Of my 860 tasks, about 600 fall in this range.
So that means changing nearly all the WUs from the value set by the server.
____________
All 860 tasks should be considerably less than 5e17; the largest possible value produced by the mb_splitter is 2.2275734e15, and all AP tasks have 1.821...e16. I deliberately chose a higher value to cover possible future changes such as S@H Enhanced moving to WUs with more data, etc.
I was thinking in terms of providing a better alternative to the technique, which some are using, of replacing all <rsc_fpops_bound> strings in a client_state.xml file with <rsc_fpops_bound>3. That technique grows the bounds for all projects and tasks, and if used hourly on a host with a ten-day cache the bound values would end up greater than 3.333e240. If the host were also doing CPDN work, the bound would overflow the range of a double long before a task completed. If used only once every two days the technique is probably safe, though.
Restricting changes to SETI@home, using a technique which doesn't grow the value indefinitely, and including it in a program which already has automatic periodic run capability are all improvements. But it wouldn't be sensible for anyone running stock builds on a multicore AMD host to use the option, since the stock S@H Enhanced builds have a tendency to hang in the "Choosing optimal functions" testing on those systems.
For that matter, there is some risk that any program may occasionally hang on any kind of system, and that's why rsc_fpops_bound exists. Because BOINC must support many existing projects and should be prepared to support many more in future, I don't think Dr. Anderson would consider doing other than scaling the bound by the same amount as the estimate. Just possibly he'd consider a sanity check approach which would establish a minimum bound value based on an estimated ten minutes or so. Outside that, he'd consider it a duty of the project to produce a sensible bound. For both S@H Enhanced and Astropulse, the splitter produces bounds which are exactly ten times the estimate if the host has a DCF of 1.0, but in practice many hosts have been running with a lower DCF, so the effective bound has been larger. I think Eric Korpela might consider raising the multiple moderately, but probably to less than 50 times the estimate. I doubt it would be a crash priority change, though, so it could take a while.
Joe
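The growth of the "prepend a 3" technique mentioned in this post can be checked with a few lines of Python. This is only a sketch: the starting text assumes the mb_splitter maximum is written in the state file as a plain decimal with six places, and both function names are hypothetical.

```python
import math


def prepend_three(value_text, runs):
    """Text of one bound after `runs` replacements of
    '<rsc_fpops_bound>' by '<rsc_fpops_bound>3'; each run puts one
    more '3' in front of the digits already there."""
    return "3" * runs + value_text


def runs_until_overflow(value_text, max_runs=1000):
    """Number of runs after which the value no longer fits in a double
    (float() returns inf), or None if that does not happen in max_runs."""
    for runs in range(1, max_runs + 1):
        if math.isinf(float(prepend_three(value_text, runs))):
            return runs
    return None


if __name__ == "__main__":
    start = "2227573400000000.000000"  # 2.2275734e15, assumed file format
    ten_days_hourly = float(prepend_three(start, 240))
    print(ten_days_hourly > 3.333e240)   # the figure quoted in the post
    print(runs_until_overflow(start))    # runs needed to overflow a double
```

Every run multiplies the bound by more than ten, so a task that stays on the host for weeks, as CPDN work can, will see the value leave the range of a double.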
____________
For now I have made an Expert-tab check box that raises all bounds to at least the value you suggested, and only for the SETI@home Enhanced bounds.
That should at least eliminate the -177 errors.
____________
jason_gee, Volunteer developer, Volunteer tester
Joined: 24 Nov 06 Posts: 4189 Credit: 61,379,744 RAC: 33,666
Hi Fred,
I am using v0.6 on both my machines at the moment and it works great so far. One small usability suggestion: when you run the program while it is already running, it complains with a dialog. Could you bring the first instance's window to the foreground instead? That would make things easier on my P4, where the tray icons don't always appear for some time after startup (even though the tasks are running properly).
Cheers, Jason
____________
"It is not the strongest of the species that survives, nor the most intelligent that survives. It is the one that is the most adaptable to change."
Charles Darwin
That's an interesting problem and a good idea. It will not be in the next version, though, which is nearly ready.
____________
Fred
I am also using your 0.6 app and it's working fine (Win7 64-bit); I'm also using your BoincTasks, which works well too.
0.6 even lets you move partly run WUs back and forth between CPU and GPU without problems. Cool, not that you would do this a lot!
One thing I have noticed is that after a reschedule your app starts BOINC in the background, i.e. it's "minimised" and not shown in the task tray. Could this be changed or made an option? BOINC starts fine when you run it manually :)
From the log it appears your app would shut down BOINC if it was running before the changes are made. I have not tried this, as I manually shut BOINC down before I reschedule.
One other thing that would be nice is being able to set the app to start with Windows even if you are not going to put it in automatic mode in the settings options.
____________
Tim
1) One thing I have noticed is that after a reschedule your app starts BOINC in the background, i.e. it's "minimised" and not shown in the task tray. Could this be changed or made an option? BOINC starts fine when you run it manually :)
2) From the log it appears your app would shut down BOINC if it was running before the changes are made. I have not tried this, as I manually shut BOINC down before I reschedule.
3) One other thing that would be nice is being able to set the app to start with Windows even if you are not going to put it in automatic mode in the settings options.
1) I don't quite get what you're asking, but there is an option in the settings tab that lets you show or hide the rescheduler when starting.
2) It shuts down BOINC before the rescheduling starts; it needs a stable capture.
3) I will add an auto start at login in one of the next versions.
____________
New version 0.7
Added: Expert tab: A check box to enable raising the rsc_fpops_bound value to at least 5e17, to avoid -177 errors.
Added: Expert tab: Option to include / exclude active tasks from rescheduling.
____________
After checking the expert tab the following messages start appearing:
13 July 2010 - 13:31:29 Invalid rsc_fpops_bound < 500000000000000000.000000, total: 509
13 July 2010 - 13:31:29 Found: CPU: 452, VLAR: 74, VHAR: 47
13 July 2010 - 13:31:29 Found: GPU: 57, VLAR: 0, VHAR: 1
13 July 2010 - 13:31:29 Rescheduling needed
13 July 2010 - 13:31:29 Shutting down BOINC client
13 July 2010 - 13:31:31 Shutdown of BOINC client completed
13 July 2010 - 13:31:32 Invalid rsc_fpops_bound < 500000000000000000.000000, total: 509
13 July 2010 - 13:31:32 Found: CPU: 452, VLAR: 74, VHAR: 47
13 July 2010 - 13:31:32 Found: GPU: 57, VLAR: 0, VHAR: 1
13 July 2010 - 13:31:33 Rescheduling CPU version: 603 ,Gpu version: 608 planclass: cuda
13 July 2010 - 13:31:34 Copied rescheduled client_state.xml
13 July 2010 - 13:31:34 Starting the BOINC client
13 July 2010 - 13:31:34 Moved to Cpu: 0, VLAR 0, VHAR 0 -- Moved to Gpu: 0, VLAR 0, VHAR 0
The next run reports the same total, so it seems nothing was changed?
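For anyone wanting to compare runs without reading the log by eye, the totals can be pulled out with a small script. This is a sketch whose patterns are based only on the log lines quoted in this post; the function names are hypothetical and other rescheduler versions may word the lines differently.

```python
import re

INVALID_RE = re.compile(r"Invalid rsc_fpops_bound < [\d.]+, total: (\d+)")
FOUND_RE = re.compile(r"Found: (CPU|GPU): (\d+), VLAR: (\d+), VHAR: (\d+)")


def invalid_bound_totals(log_lines):
    """Return every 'Invalid rsc_fpops_bound' total, in log order."""
    totals = []
    for line in log_lines:
        match = INVALID_RE.search(line)
        if match:
            totals.append(int(match.group(1)))
    return totals


def found_counts(log_lines):
    """Return (device, tasks, vlar, vhar) tuples from the 'Found:' lines."""
    counts = []
    for line in log_lines:
        match = FOUND_RE.search(line)
        if match:
            device, tasks, vlar, vhar = match.groups()
            counts.append((device, int(tasks), int(vlar), int(vhar)))
    return counts
```

In the log above the reported total of 509 equals the 452 CPU tasks plus the 57 GPU tasks, so every task found was flagged as below the limit.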
____________
The problem is that the in-memory copy was not updated; the BOINC state file is updated correctly, so nothing is wrong there.
But that is not how it should be, and it is a bit confusing. I will correct this in V 0.8; expect it soon.
____________