New rescheduler

Message boards : Number crunching : New rescheduler
Profile S@NL - eFMer - efmer.com/boinc
Volunteer tester
Joined: 7 Jun 99
Posts: 512
Credit: 148,746,305
RAC: 0
United States
Message 1014652 - Posted: 11 Jul 2010, 14:01:59 UTC

A new BOINC rescheduler can be found here: http://www.efmer.eu/forum_tt/index.php?topic=428.0

V 0.6

Added user settable VLAR and VHAR limits.
Automatic duration_correction_factor correction. When the factor goes beyond the min and max values it is reset.
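As a rough illustration of the two V 0.6 changes listed above, here is a minimal Python sketch. It is not the rescheduler's own code: the function names, the default VLAR/VHAR limits, the DCF min and max, and the choice to reset to 1.0 are all placeholder assumptions made for the example.

```python
# Hypothetical sketch: names, limits and the reset value are assumptions,
# not taken from the rescheduler itself.

def classify_angle_range(angle_range, vlar_limit=0.12, vhar_limit=1.1):
    """Label a task VLAR, VHAR or MID from its angle range and the user limits."""
    if angle_range < vlar_limit:
        return "VLAR"
    if angle_range > vhar_limit:
        return "VHAR"
    return "MID"


def corrected_dcf(dcf, dcf_min=0.01, dcf_max=10.0, reset_value=1.0):
    """Return reset_value when dcf falls outside [dcf_min, dcf_max]."""
    if dcf < dcf_min or dcf > dcf_max:
        return reset_value
    return dcf


print(classify_angle_range(0.01))  # below the assumed VLAR limit
print(corrected_dcf(25.0))         # above the assumed maximum, so it is reset
```

Both limits are plain parameters, so user-set values can be passed in instead of the defaults.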

Any suggestions are welcome.
TThrottle Control your temperatures. BoincTasks The best way to view BOINC. Anza Borrego Desert hiking.
Profile MadMaC
Volunteer tester
Joined: 4 Apr 01
Posts: 201
Credit: 47,158,217
RAC: 0
United Kingdom
Message 1014656 - Posted: 11 Jul 2010, 14:29:38 UTC - in response to Message 1014652.  

Nice one - just out of interest - does it work with fermi?
I only ask as the current rescheduler 1.9 doesn't support fermi, and takes some hacking of the app info to get it to work - this might put some people off so if you could get it working with fermi that would be a big win imho..

Thanx for your efforts in this..
Profile MadMaC
Volunteer tester
Joined: 4 Apr 01
Posts: 201
Credit: 47,158,217
RAC: 0
United Kingdom
Message 1014657 - Posted: 11 Jul 2010, 14:30:33 UTC - in response to Message 1014652.  

Forgot to mention

Good luck tonight :-)
Profile S@NL - eFMer - efmer.com/boinc
Volunteer tester
Joined: 7 Jun 99
Posts: 512
Credit: 148,746,305
RAC: 0
United States
Message 1014664 - Posted: 11 Jul 2010, 14:50:08 UTC - in response to Message 1014656.  


I haven't tested it but it should.
Check the "Simulation mode" in the expert tab.
Press "Run"
Check the C:\Users\fred\AppData\Roaming\eFMer\BoincScheduler\capture\client_state.xml
If the client state looks OK, remove the simulation check.

1) Stop the BOINC client
2) Unplug any Internet connections.
3) Make a complete copy of the BOINC folder.
4) Press Run

Check that no tasks have been removed.
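Step 3 above (the complete copy of the BOINC folder) can be scripted. The following is a minimal Python sketch, assuming the BOINC client is already stopped; the function name and the timestamped backup naming are made up for this example and are not part of the rescheduler.

```python
import shutil
import time
from pathlib import Path


def backup_boinc_folder(data_dir, backup_root):
    """Copy the whole BOINC folder to a timestamped directory and return its path."""
    data_dir = Path(data_dir)
    # Refuse to run on a folder that does not look like a BOINC data folder.
    if not (data_dir / "client_state.xml").is_file():
        raise FileNotFoundError(f"no client_state.xml in {data_dir}")
    stamp = time.strftime("%Y%m%d-%H%M%S")
    target = Path(backup_root) / f"{data_dir.name}-backup-{stamp}"
    shutil.copytree(data_dir, target)  # fails if target already exists
    return target
```

Call it with the path to your own BOINC folder and a destination with enough free space, then compare the task list after the run against the copy.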




sarmitage

Joined: 2 Dec 09
Posts: 56
Credit: 1,123,857
RAC: 0
Canada
Message 1014680 - Posted: 11 Jul 2010, 15:36:45 UTC

Any chance of getting the source code for this? I've got a Fermi card and can do the testing when I get a chance, but I am generally much more confident when I can dig in and see exactly what the tool is /supposed/ to do, and compare it to what it actually did..

-Scott
Bearcat

Joined: 10 Sep 99
Posts: 106
Credit: 10,778,506
RAC: 0
United States
Message 1014687 - Posted: 11 Jul 2010, 15:59:37 UTC - in response to Message 1014664.  

Thanks for taking the time to do this. This will be a valuable tool for the foreseeable future while they sort things out with the DCF etc.
Terror Australis
Volunteer tester

Joined: 14 Feb 04
Posts: 1817
Credit: 262,693,308
RAC: 44
Australia
Message 1014697 - Posted: 11 Jul 2010, 16:29:45 UTC

Seems to work OK for me, but it has not had to do any actual rescheduling yet, so I won't know till I get some VLARs to try it out.

A very minor grizzle: in the "Expert" tab, it should be "BOINC_DATA" folder, not "BOINC" folder.

Thanks for your efforts.

Brodo
Josef W. Segur
Volunteer developer
Volunteer tester

Joined: 30 Oct 99
Posts: 4504
Credit: 1,414,761
RAC: 0
United States
Message 1014848 - Posted: 12 Jul 2010, 0:51:52 UTC - in response to Message 1014652.  


How about an option to simply set all rsc_fpops_bound values to 5e+17 if they are lower? That would be an alternative to applying the scaling factors to the bound while shifting work, and applied to all workunits for the SETI@home project (not just those being shifted). The value is much larger than any bound the splitters are now producing for either MB or AP, to allow for future 'Moore's Law' growth as well as any changes the project is likely to make. The "if they are lower" is also a safety factor in case the server-side adjustments get really whacky or a really revolutionary processing breakthrough happens.

Basically the idea is to protect against the -177 "Maximum Elapsed time exceeded" error even when doing VLAR tasks on a GPU which is extremely fast at midrange but extremely crippled by VLARs, and assuming the server-side estimates and bounds adjustments won't be compatible with running at a low DCF.
Joe
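The "raise to 5e+17 if lower" idea can be sketched in a few lines of Python. This is only an illustration: the workunit layout in the sample, the app names used to pick out SETI@home work, and the function name are assumptions, and a real client_state.xml carries many more fields per workunit.

```python
import re

FLOOR = 5e17  # the value suggested in the post above

# Assumed layout: each <workunit> block holds an <app_name> and a bound.
WORKUNIT_RE = re.compile(r"<workunit>.*?</workunit>", re.DOTALL)
APP_RE = re.compile(r"<app_name>([^<]+)</app_name>")
BOUND_RE = re.compile(r"(<rsc_fpops_bound>)([^<]+)(</rsc_fpops_bound>)")


def raise_bounds(state_text,
                 app_names=("setiathome_enhanced", "astropulse_v505"),
                 floor=FLOOR):
    """Raise rsc_fpops_bound to `floor` for the listed apps, only if it is lower."""

    def fix_bound(m):
        if float(m.group(2)) >= floor:
            return m.group(0)  # already high enough: leave untouched
        return f"{m.group(1)}{floor:f}{m.group(3)}"

    def fix_workunit(m):
        block = m.group(0)
        app = APP_RE.search(block)
        if app is None or app.group(1).strip() not in app_names:
            return block  # other projects and apps are not changed
        return BOUND_RE.sub(fix_bound, block)

    return WORKUNIT_RE.sub(fix_workunit, state_text)


sample = """<workunit>
    <name>wu_a</name>
    <app_name>setiathome_enhanced</app_name>
    <rsc_fpops_bound>2227573400000000.000000</rsc_fpops_bound>
</workunit>
<workunit>
    <name>wu_b</name>
    <app_name>other_app</app_name>
    <rsc_fpops_bound>1000000000000.000000</rsc_fpops_bound>
</workunit>"""

print(raise_bounds(sample))
```

Because the value is only ever raised to a fixed floor, running this repeatedly leaves the file unchanged after the first pass.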
Profile S@NL - eFMer - efmer.com/boinc
Volunteer tester
Joined: 7 Jun 99
Posts: 512
Credit: 148,746,305
RAC: 0
United States
Message 1014929 - Posted: 12 Jul 2010, 7:28:38 UTC - in response to Message 1014848.  


As I have no idea what is going to happen, the rsc_fpops_bound and rsc_fpops_est are set to the calculated value, not the server value, but only when a task is moved one way or the other.

For now I'll wait and see, and tackle a problem when it's there.

Profile S@NL - eFMer - efmer.com/boinc
Volunteer tester
Joined: 7 Jun 99
Posts: 512
Credit: 148,746,305
RAC: 0
United States
Message 1014933 - Posted: 12 Jul 2010, 7:41:07 UTC - in response to Message 1014848.  


How about an option to simply set all rsc_fpops_bound values to 5e+17 if they are lower? Joe

Why oh why can't you do that on the server side, until things stabilize?
That's way easier.
Profile S@NL - eFMer - efmer.com/boinc
Volunteer tester
Joined: 7 Jun 99
Posts: 512
Credit: 148,746,305
RAC: 0
United States
Message 1014940 - Posted: 12 Jul 2010, 9:39:58 UTC - in response to Message 1014848.  


How about an option to simply set all rsc_fpops_bound values to 5e+17

I did some checking and implemented some code.
Of my 860 tasks, about 600 fall in this range.
So that means changing nearly all the WUs from the value set by the server.

Josef W. Segur
Volunteer developer
Volunteer tester

Joined: 30 Oct 99
Posts: 4504
Credit: 1,414,761
RAC: 0
United States
Message 1015017 - Posted: 12 Jul 2010, 15:52:50 UTC - in response to Message 1014940.  



All 860 tasks should be considerably less than 5e17; the largest possible value produced by the mb_splitter is 2.2275734e15 and all AP tasks have 1.821...e16. I deliberately chose a higher value to cover possible future changes such as S@H Enhanced moving to WUs with more data, etc.

I was thinking in terms of providing a better alternative to the technique of replacing all <rsc_fpops_bound> strings in a client_state.xml file with <rsc_fpops_bound>3, which some are using. That technique puts another 3 in front of the bounds for all projects and tasks each time it is run, and if used hourly on a host with a ten day cache the bound values would end up greater than 3.333e240. If the host were also doing CPDN work, the bound would overflow the range of a double-precision value long before a task completed. If used only once every two days the technique is probably safe, though.
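To make the growth concrete, here is a small Python sketch of that search-and-replace trick applied over and over to a single bound. The helper names and the starting value (the mb_splitter maximum quoted above, written in fixed-point form) are chosen for illustration only.

```python
OPEN_TAG = "<rsc_fpops_bound>"
CLOSE_TAG = "</rsc_fpops_bound>"


def prepend_three(state_text):
    """One run of the trick: every bound gets a '3' put in front of it."""
    return state_text.replace(OPEN_TAG, OPEN_TAG + "3")


def bound_after_runs(initial, runs):
    """Return the bound string after `runs` applications of the trick."""
    text = OPEN_TAG + initial + CLOSE_TAG
    for _ in range(runs):
        text = prepend_three(text)
    return text[len(OPEN_TAG):-len(CLOSE_TAG)]


start = "2227573400000000.000000"  # illustrative starting bound

# Hourly for ten days is 240 runs, so 240 extra leading digits.
hourly = bound_after_runs(start, 24 * 10)
print(len(hourly.split(".")[0]), "digits before the decimal point")
print(float(hourly) > 3.333e240)

# Once every two days over the same ten days is only five runs.
print(float(bound_after_runs(start, 5)))
```

Each run multiplies the bound by at least ten, which is why the interval between runs matters so much more than the starting value.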

Restricting changes to SETI@home, using a technique which doesn't grow the value indefinitely, and including it in a program which already has automatic periodic run capability are all improvements. But it wouldn't be sensible for anyone running stock builds on a multicore AMD host to use the option, since the stock S@H Enhanced builds have a tendency to hang in the "Choosing optimal functions" testing on those systems.

For that matter, there is some risk that any program may occasionally hang on any kind of system and that's why rsc_fpops_bound exists. Because BOINC must support many existing projects and should be prepared to support many more in future, I don't think Dr. Anderson would consider doing other than scaling the bound the same amount as the estimate. Just possibly he'd consider a sanity check approach which would establish a minimum bound value based on an estimated ten minutes or so. Outside that, he'd consider it a duty of the project to produce a sensible bound. For both S@H Enhanced and Astropulse, the splitter produces bounds which are exactly ten times the estimate if the host has a DCF of 1.0, but in practice many hosts have been running with a lower DCF, so the effective bound has been larger. I think Eric Korpela might consider raising the multiple moderately, but probably to less than 50 times the estimate. I doubt it would be a crash priority change, though, so it could take a while.
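The "effective bound" point can be put as simple arithmetic. The sketch below assumes the client's time limit is rsc_fpops_bound divided by the device speed and its runtime estimate is rsc_fpops_est divided by the device speed, multiplied by DCF; that model and the function names are assumptions for illustration, not a statement of how any particular BOINC version computes them.

```python
def effective_bound_multiple(dcf, bound_multiple=10.0):
    """Time limit divided by the DCF-corrected runtime estimate."""
    return bound_multiple / dcf


def exceeds_time_limit(elapsed_seconds, rsc_fpops_bound, flops):
    """True when a task has run longer than bound / speed (the -177 case)."""
    return elapsed_seconds > rsc_fpops_bound / flops


for dcf in (1.0, 0.5, 0.25):
    print(dcf, effective_bound_multiple(dcf))
```

Under this model a host at a DCF of 0.25 gets four times the headroom of a host at 1.0, so pushing hosts toward a DCF of 1.0 without raising the multiple shrinks the margin.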
Joe

Profile S@NL - eFMer - efmer.com/boinc
Volunteer tester
Joined: 7 Jun 99
Posts: 512
Credit: 148,746,305
RAC: 0
United States
Message 1015021 - Posted: 12 Jul 2010, 16:03:45 UTC - in response to Message 1015017.  




For now I made an expert check box to raise all bounds to at least the value you suggested,
and only for the SETI enhanced bounds.
That will at least eliminate all -177 errors.
Profile jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1015042 - Posted: 12 Jul 2010, 17:28:44 UTC

Hi Fred,
Am using v0.6 on both my machines at the moment & works great so far. One small usability suggestion: When you run the program, and it is already running, it complains with a dialog. Could you open the first instance instead? (bringing it to open foreground window state). That would make things easier on my p4, where the tray icons don't always appear at startup for some time (even though the tasks are running properly).

Cheers, Jason
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
Profile S@NL - eFMer - efmer.com/boinc
Volunteer tester
Joined: 7 Jun 99
Posts: 512
Credit: 148,746,305
RAC: 0
United States
Message 1015069 - Posted: 12 Jul 2010, 18:44:09 UTC - in response to Message 1015042.  


That's an interesting problem and a good idea. It will not be in the next version, which is nearly ready.
Profile Tim Norton
Volunteer tester
Joined: 2 Jun 99
Posts: 835
Credit: 33,540,164
RAC: 0
United Kingdom
Message 1015141 - Posted: 12 Jul 2010, 20:41:28 UTC

Fred

I am also using your 0.6 app and it's working fine (Win7 64-bit) (also using your BoincTasks - also works well).

0.6 even allows you to move partly run WUs back and forth between CPU and GPU without problem - cool - not that you would do this a lot!

One thing I have noticed is that after you run the reschedule, your app starts BOINC in the background - i.e. it's "minimised" and not shown in the task tray - could this be changed or put as an option? BOINC starts up fine when you manually run it :)

From looking at the log it appears your app would shut down BOINC if it was live before the changes are made - not tried this, as I manually shut BOINC down before I reschedule.

One other thing that would be nice is to be able to set the app to start with Windows even if you are not going to put it in automatic mode in the settings options.
Tim

Profile S@NL - eFMer - efmer.com/boinc
Volunteer tester
Joined: 7 Jun 99
Posts: 512
Credit: 148,746,305
RAC: 0
United States
Message 1015286 - Posted: 13 Jul 2010, 5:57:48 UTC - in response to Message 1015141.  

Fred

I am also using your 0.6 app and it's working fine (Win7 64-bit) (also using your BoincTasks - also works well).

0.6 even allows you to move partly run WUs back and forth between CPU and GPU without problem - cool - not that you would do this a lot!

1) One thing I have noticed is that after you run the reschedule, your app starts BOINC in the background - i.e. it's "minimised" and not shown in the task tray - could this be changed or put as an option? BOINC starts up fine when you manually run it :)

2) From looking at the log it appears your app would shut down BOINC if it was live before the changes are made - not tried this, as I manually shut BOINC down before I reschedule.

3) One other thing that would be nice is to be able to set the app to start with Windows even if you are not going to put it in automatic mode in the settings options.

1) I don't exactly get what you're asking, but there is one option in the settings tab that allows you to show or hide the rescheduler when starting.
2) It shuts down BOINC before the rescheduling starts. It needs a stable capture.
3) I will add an auto start at login, in one of the next versions.
Profile S@NL - eFMer - efmer.com/boinc
Volunteer tester
Joined: 7 Jun 99
Posts: 512
Credit: 148,746,305
RAC: 0
United States
Message 1015292 - Posted: 13 Jul 2010, 6:48:48 UTC - in response to Message 1015286.  

New version 0.7

Added: Expert tab: A check box to raise rsc_fpops_bound values that are below 5e17 up to 5e17, to avoid -177 errors.
Added: Expert tab: Option to include / exclude active tasks from rescheduling.
Profile [DPC] hansR Project Donor
Volunteer tester
Joined: 14 Jul 00
Posts: 47
Credit: 235,829,569
RAC: 8
Netherlands
Message 1015314 - Posted: 13 Jul 2010, 11:42:46 UTC - in response to Message 1015292.  

After checking the expert tab the following messages start appearing:

13 July 2010 - 13:31:29 Invalid rsc_fpops_bound < 500000000000000000.000000, total: 509
13 July 2010 - 13:31:29 Found: CPU: 452, VLAR: 74, VHAR: 47
13 July 2010 - 13:31:29 Found: GPU: 57, VLAR: 0, VHAR: 1
13 July 2010 - 13:31:29 Rescheduling needed

13 July 2010 - 13:31:29 Shutting down BOINC client
13 July 2010 - 13:31:31 Shutdown of BOINC client completed
13 July 2010 - 13:31:32 Invalid rsc_fpops_bound < 500000000000000000.000000, total: 509
13 July 2010 - 13:31:32 Found: CPU: 452, VLAR: 74, VHAR: 47
13 July 2010 - 13:31:32 Found: GPU: 57, VLAR: 0, VHAR: 1
13 July 2010 - 13:31:33 Rescheduling CPU version: 603 ,Gpu version: 608 planclass: cuda
13 July 2010 - 13:31:34 Copied rescheduled client_state.xml
13 July 2010 - 13:31:34 Starting the BOINC client
13 July 2010 - 13:31:34 Moved to Cpu: 0, VLAR 0, VHAR 0 -- Moved to Gpu: 0, VLAR 0, VHAR 0

The next run reports the same total, so it seems nothing was changed?
Profile S@NL - eFMer - efmer.com/boinc
Volunteer tester
Joined: 7 Jun 99
Posts: 512
Credit: 148,746,305
RAC: 0
United States
Message 1015326 - Posted: 13 Jul 2010, 13:06:05 UTC - in response to Message 1015314.  


The problem is that the in-memory copy was not updated; the BOINC state file is updated correctly. Nothing wrong there.
But that is not as it should be and it is a bit confusing. I will correct this in V 0.8 ... expect it soon.
 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.