New rescheduler
Author | Message |
---|---|
S@NL - eFMer - efmer.com/boinc Send message Joined: 7 Jun 99 Posts: 512 Credit: 148,746,305 RAC: 0 |
A new BOINC rescheduler can be found here: http://www.efmer.eu/forum_tt/index.php?topic=428.0 V 0.6: Added user-settable VLAR and VHAR limits. Added automatic duration_correction_factor correction: when the factor goes beyond the min and max values, it is reset. Any suggestions are welcome. TThrottle Control your temperatures. BoincTasks The best way to view BOINC. Anza Borrego Desert hiking. |
MadMaC Send message Joined: 4 Apr 01 Posts: 201 Credit: 47,158,217 RAC: 0 |
Nice one - just out of interest - does it work with fermi? I only ask as the current rescheduler 1.9 doesn't support fermi, and takes some hacking of the app info to get it to work - this might put some people off so if you could get it working with fermi that would be a big win imho.. Thanx for your efforts in this.. |
MadMaC Send message Joined: 4 Apr 01 Posts: 201 Credit: 47,158,217 RAC: 0 |
Forgot to mention Good luck tonight :-) |
S@NL - eFMer - efmer.com/boinc Send message Joined: 7 Jun 99 Posts: 512 Credit: 148,746,305 RAC: 0 |
Nice one - just out of interest - does it work with fermi? I haven't tested it, but it should. Check "Simulation mode" in the Expert tab and press "Run". Then check C:\Users\fred\AppData\Roaming\eFMer\BoincScheduler\capture\client_state.xml. If the client state looks OK, uncheck simulation mode, then: 1) Stop the BOINC client. 2) Unplug any Internet connections. 3) Make a complete copy of the BOINC folder. 4) Press "Run". Check that no tasks have been removed. |
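The "check that no tasks have been removed" step can be done by eye, but a small script makes it less error-prone. The following is a minimal sketch, not part of the rescheduler: it assumes each task appears in client_state.xml as a `<result>` element with a `<name>` child and that the file parses as well-formed XML. The function names `result_names` and `removed_tasks` are made up for illustration.

```python
import xml.etree.ElementTree as ET


def result_names(state_xml):
    """Collect the <name> of every <result> element in a client_state.xml document.

    Assumption: tasks are stored as <result><name>...</name>...</result>.
    """
    root = ET.fromstring(state_xml)
    return {r.findtext("name") for r in root.iter("result")}


def removed_tasks(original_xml, simulated_xml):
    """Return the task names present in the original state but missing afterwards."""
    return result_names(original_xml) - result_names(simulated_xml)


# Tiny hand-written stand-ins for the real files (hypothetical task names).
ORIGINAL = """<client_state>
<result><name>wu_a_0</name></result>
<result><name>wu_b_1</name></result>
</client_state>"""

SIMULATED = """<client_state>
<result><name>wu_a_0</name></result>
</client_state>"""

print(removed_tasks(ORIGINAL, SIMULATED))
```

In real use the two strings would be read from the backup copy of the BOINC folder and from the captured client_state.xml; an empty set means no task went missing.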
sarmitage Send message Joined: 2 Dec 09 Posts: 56 Credit: 1,123,857 RAC: 0 |
Any chance of getting the source code for this? I've got a Fermi card and can do the testing when I get a chance, but I am generally much more confident when I can dig in and see exactly what the tool is /supposed/ to do, and compare it to what it actually did.. -Scott |
Bearcat Send message Joined: 10 Sep 99 Posts: 106 Credit: 10,778,506 RAC: 0 |
Thanks for taking the time to do this. This will be a valuable tool for the foreseeable future while they sort things out with the DCF etc. |
Terror Australis Send message Joined: 14 Feb 04 Posts: 1817 Credit: 262,693,308 RAC: 44 |
Seems to work OK for me, but it has not had to do any actual rescheduling yet, so I won't know until I get some VLARs to try it out. A very minor grizzle: in the "Expert" tab, it should be "BOINC_DATA" folder, not "BOINC" folder. Thanks for your efforts. Brodo |
Josef W. Segur Send message Joined: 30 Oct 99 Posts: 4504 Credit: 1,414,761 RAC: 0 |
A new BOINC rescheduler can be found here: http://www.efmer.eu/forum_tt/index.php?topic=428.0 How about an option to simply set all rsc_fpops_bound values to 5e+17 if they are lower? That would be an alternative to applying the scaling factors to the bound while shifting work, and applied to all workunits for the SETI@home project (not just those being shifted). The value is much larger than any bound the splitters are now producing for either MB or AP, to allow for future 'Moore's Law' growth as well as any changes the project is likely to make. The "if they are lower" is also a safety factor in case the server-side adjustments get really whacky or a really revolutionary processing breakthrough happens. Basically the idea is to protect against the -177 "Maximum Elapsed time exceeded" error even when doing VLAR tasks on a GPU which is extremely fast at midrange but extremely crippled by VLARs, and assuming the server-side estimates and bounds adjustments won't be compatible with running at a low DCF. Joe |
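The "raise to 5e17 only if lower" idea can be sketched in a few lines. This is an illustration under stated assumptions, not the rescheduler's code: it treats client_state.xml as plain text, assumes bounds appear as `<rsc_fpops_bound>value</rsc_fpops_bound>`, and does no filtering by project, which a real tool would need in order to touch only SETI@home workunits. The name `raise_bounds` is hypothetical.

```python
import re

FLOOR = 5e17  # the minimum bound suggested in the post above

# Assumption: each bound is written as <rsc_fpops_bound>number</rsc_fpops_bound>.
_BOUND = re.compile(r"(<rsc_fpops_bound>)([^<]+)(</rsc_fpops_bound>)")


def raise_bounds(state_text, floor=FLOOR):
    """Raise every rsc_fpops_bound below `floor` up to `floor`; leave larger ones alone."""

    def fix(match):
        value = float(match.group(2))
        if value < floor:
            # Fixed-point formatting with six decimals.
            return f"{match.group(1)}{floor:f}{match.group(3)}"
        return match.group(0)

    return _BOUND.sub(fix, state_text)


sample = (
    "<workunit><rsc_fpops_est>1.0e14</rsc_fpops_est>"
    "<rsc_fpops_bound>2227573400000000.000000</rsc_fpops_bound></workunit>"
)
print(raise_bounds(sample))
```

Because the value is set to a fixed floor rather than scaled, running the function again on its own output changes nothing, so it is safe to apply on every periodic run.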
S@NL - eFMer - efmer.com/boinc Send message Joined: 7 Jun 99 Posts: 512 Credit: 148,746,305 RAC: 0 |
A new BOINC rescheduler can be found here: http://www.efmer.eu/forum_tt/index.php?topic=428.0 I have no idea what is going to happen. The rsc_fpops_bound and rsc_fpops_est are set to the calculated value, not the server value, but only when a task is moved one way or the other. For now I'll wait and see, and tackle a problem when it arises. |
S@NL - eFMer - efmer.com/boinc Send message Joined: 7 Jun 99 Posts: 512 Credit: 148,746,305 RAC: 0 |
Why, oh why, can't you do that on the server side until things stabilize? That's way easier. |
S@NL - eFMer - efmer.com/boinc Send message Joined: 7 Jun 99 Posts: 512 Credit: 148,746,305 RAC: 0 |
I did some checking and implemented some code. Of my 860 tasks, about 600 fall in this range. So that means changing nearly all the WUs from the value set by the server. |
Josef W. Segur Send message Joined: 30 Oct 99 Posts: 4504 Credit: 1,414,761 RAC: 0 |
All 860 tasks should be considerably less than 5e17, the largest possible value produced by the mb_splitter is 2.2275734e15 and all AP tasks have 1.821...e16. I deliberately chose a higher value to cover possible future changes such as S@H Enhanced moving to WUs with more data, etc. I was thinking in terms of providing a better alternative to the technique of replacing all <rsc_fpops_bound> strings in a client_state.xml file with <rsc_fpops_bound>3 which some are using. That technique grows the bounds for all projects and tasks, and if used hourly on a host with a ten day cache the bound values would end up greater than 3.333e240 . If the host were also doing CPDN work the bound would overflow the range of a double long before a task completed. If used only once every two days the technique is probably safe, though. Restricting changes to SETI@home, using a technique which doesn't grow the value indefinitely, and including it in a program which already has automatic periodic run capability are all improvements. But it wouldn't be sensible for anyone running stock builds on a multicore AMD host to use the option, since the stock S@H Enhanced builds have a tendency to hang in the "Choosing optimal functions" testing on those systems. For that matter, there is some risk that any program may occasionally hang on any kind of system and that's why rsc_fpops_bound exists. Because BOINC must support many existing projects and should be prepared to support many more in future, I don't think Dr. Anderson would consider doing other than scaling the bound the same amount as the estimate. Just possibly he'd consider a sanity check approach which would establish a minimum bound value based on an estimated ten minutes or so. Outside that, he'd consider it a duty of the project to produce a sensible bound. 
For both S@H Enhanced and Astropulse, the splitter produces bounds which are exactly ten times the estimate if the host has a DCF of 1.0, but in practice many hosts have been running with a lower DCF, so the effective bound has been larger. I think Eric Korpela might consider raising the multiple moderately, but probably to less than 50 times the estimate. I doubt it would be a crash priority change, though, so it could take a while. Joe |
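The growth from the `<rsc_fpops_bound>3` search-and-replace is easy to check with integer arithmetic. The sketch below assumes the bound is stored as a plain decimal number, so each pass simply puts one more "3" in front of its digits, and it ignores the fractional part. The starting value is the 2.2275734e15 mb_splitter maximum quoted above; the function names are made up for illustration.

```python
import sys


def prepend_three(bound_digits, runs):
    """Integer value of the bound after `runs` passes, each prepending one '3'."""
    return int("3" * runs + bound_digits)


def runs_until_double_overflow(bound_digits):
    """Smallest number of passes after which the value no longer fits in a double."""
    runs = 0
    while prepend_three(bound_digits, runs) <= sys.float_info.max:
        runs += 1
    return runs


START = "2227573400000000"  # 2.2275734e15, integer part only

ten_days = prepend_three(START, 10 * 24)  # hourly runs over a ten day cache
print(len(str(ten_days)))                 # number of decimal digits by then
print(ten_days > 3.333e240)               # compare with the figure in the post
print(runs_until_double_overflow(START))  # hourly runs until a double overflows
```

Under these assumptions the value passes 3.333e240 well within ten days of hourly runs and exceeds the range of a double after a little over twelve days, which is why a long-running task such as a CPDN one would be affected.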
S@NL - eFMer - efmer.com/boinc Send message Joined: 7 Jun 99 Posts: 512 Credit: 148,746,305 RAC: 0 |
For now I have made an expert check to raise all bounds to at least the value you suggested, and only for the SETI Enhanced bounds. That will at least eliminate all -177 errors. |
jason_gee Send message Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
Hi Fred, Am using v0.6 on both my machines at the moment & works great so far. One small usability suggestion: When you run the program, and it is already running, it complains with a dialog. Could you open the first instance instead? (bringing it to open foreground window state). That would make things easier on my p4, where the tray icons don't always appear at startup for some time (even though the tasks are running properly). Cheers, Jason "Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions. |
S@NL - eFMer - efmer.com/boinc Send message Joined: 7 Jun 99 Posts: 512 Credit: 148,746,305 RAC: 0 |
Hi Fred, That's an interesting problem and a good idea. It will not be in the next version, which is nearly ready. |
Tim Norton Send message Joined: 2 Jun 99 Posts: 835 Credit: 33,540,164 RAC: 0 |
Fred, I am also using your 0.6 app and it's working fine (Win7 64-bit). I'm also using your BoincTasks, which works well too. 0.6 even allows you to move part-run WUs back and forth between CPU and GPU without problem - cool - not that you would do this a lot! One thing I have noticed is that after you run the reschedule, your app starts BOINC in the background, i.e. it's "minimised" and not shown in the task tray. Could this be changed or made an option? BOINC starts up fine when you run it manually :) From looking at the log, it appears your app would shut down BOINC if it was live before the changes are made. I have not tried this, as I manually shut BOINC down before I reschedule. One other thing that would be nice is to be able to set the app to start with Windows even if you are not going to put it in automatic mode in the settings options. Tim |
S@NL - eFMer - efmer.com/boinc Send message Joined: 7 Jun 99 Posts: 512 Credit: 148,746,305 RAC: 0 |
Fred 1) I don't exactly get what you're asking, but there is an option in the settings tab that lets you show or hide the rescheduler when starting. 2) It shuts down BOINC before the rescheduling starts; it needs a stable capture. 3) I will add an auto-start at login in one of the next versions. |
S@NL - eFMer - efmer.com/boinc Send message Joined: 7 Jun 99 Posts: 512 Credit: 148,746,305 RAC: 0 |
New version 0.7. Added: Expert tab: A check to raise any rsc_fpops_bound value below 5e17 up to 5e17, to avoid -177 errors. Added: Expert tab: Option to include / exclude active tasks from rescheduling. |
[DPC] hansR Send message Joined: 14 Jul 00 Posts: 47 Credit: 235,829,569 RAC: 8 |
After checking the expert tab the following messages start appearing:
13 July 2010 - 13:31:29 Invalid rsc_fpops_bound < 500000000000000000.000000, total: 509
13 July 2010 - 13:31:29 Found: CPU: 452, VLAR: 74, VHAR: 47
13 July 2010 - 13:31:29 Found: GPU: 57, VLAR: 0, VHAR: 1
13 July 2010 - 13:31:29 Rescheduling needed
13 July 2010 - 13:31:29 Shutting down BOINC client
13 July 2010 - 13:31:31 Shutdown of BOINC client completed
13 July 2010 - 13:31:32 Invalid rsc_fpops_bound < 500000000000000000.000000, total: 509
13 July 2010 - 13:31:32 Found: CPU: 452, VLAR: 74, VHAR: 47
13 July 2010 - 13:31:32 Found: GPU: 57, VLAR: 0, VHAR: 1
13 July 2010 - 13:31:33 Rescheduling CPU version: 603 ,Gpu version: 608 planclass: cuda
13 July 2010 - 13:31:34 Copied rescheduled client_state.xml
13 July 2010 - 13:31:34 Starting the BOINC client
13 July 2010 - 13:31:34 Moved to Cpu: 0, VLAR 0, VHAR 0 -- Moved to Gpu: 0, VLAR 0, VHAR 0
The next run reports the same total, so it seems nothing was changed? |
S@NL - eFMer - efmer.com/boinc Send message Joined: 7 Jun 99 Posts: 512 Credit: 148,746,305 RAC: 0 |
After checking the expert tab the following messages start appearing: The problem is that the in-memory copy was not updated; the BOINC state file is updated correctly, so nothing is wrong there. But that is not as it should be and is a bit confusing. I will correct this in V 0.8 ... expect it soon. |