GUPPI Rescheduler for Linux and Windows - Move GUPPI work to CPU and non-GUPPI to GPU
Author | Message |
---|---|
BilBg Joined: 27 May 07 Posts: 3720 Credit: 9,385,827 RAC: 0 |
Just to report that BOINC Rescheduler from eFMer also has a problem with SETI@home Beta:
01 October 2016 - 18:23:13 BoincRescheduler V: 2.7
01 October 2016 - 18:23:18 SETI@home v8
01 October 2016 - 18:23:18 Application setiathome_v8 found, Version: 812 , Plan class: opencl_ati5_sah
01 October 2016 - 18:23:18 Application setiathome_v8 found, Version: 812 , Plan class: opencl_ati5_cat132
01 October 2016 - 18:23:18 Application setiathome_v8 found, Version: 812 , Plan class: opencl_ati5_nocal
01 October 2016 - 18:23:18 Application setiathome_v8 found, Version: 812 , Plan class: opencl_ati5_SoG
01 October 2016 - 18:23:18 Application setiathome_v8 found, Version: 812 , Plan class: opencl_ati5_SoG_cat132
01 October 2016 - 18:23:18 Application setiathome_v8 found, Version: 812 , Plan class: opencl_ati5_SoG_nocal
01 October 2016 - 18:23:18 Application setiathome_v8 found, Version: 812 , Plan class: opencl_atiapu_sah
01 October 2016 - 18:23:18 Application setiathome_v8 found, Version: 812 , Plan class: opencl_atiapu_SoG
01 October 2016 - 18:23:18 Application setiathome_v8 found, Version: 812 , Plan class: opencl_ati_sah
01 October 2016 - 18:23:18 Application setiathome_v8 found, Version: 812 , Plan class: opencl_ati_cat132
01 October 2016 - 18:23:18 Application setiathome_v8 found, Version: 812 , Plan class: opencl_ati_nocal
01 October 2016 - 18:23:18 Application setiathome_v8 found, Version: 800
01 October 2016 - 18:23:18 Application setiathome_v8 found, Version: 819 , Plan class: opencl_ati5_sah
01 October 2016 - 18:23:18 Found: CPU: 0, VLAR: 0, VHAR: 0
01 October 2016 - 18:23:18 Found: GPU: 87, VLAR: 58, VHAR: 0
01 October 2016 - 18:23:18 Average: Ratio Gpu: 9.190836, count: 87, invalid 0
01 October 2016 - 18:23:18 Ratio CPU: min: -1.000000 max: -1.000000 - Ratio Gpu: min: 8.553368 max 9.820253
01 October 2016 - 18:23:18 Rescheduling needed
01 October 2016 - 18:24:51 Other
01 October 2016 - 18:24:52 Application setiathome_v8 found, Version: 812 , Plan class: opencl_ati5_sah
01 October 2016 - 18:24:52 Application setiathome_v8 found, Version: 812 , Plan class: opencl_ati5_cat132
01 October 2016 - 18:24:52 Application setiathome_v8 found, Version: 812 , Plan class: opencl_ati5_nocal
01 October 2016 - 18:24:52 Application setiathome_v8 found, Version: 812 , Plan class: opencl_ati5_SoG
01 October 2016 - 18:24:52 Application setiathome_v8 found, Version: 812 , Plan class: opencl_ati5_SoG_cat132
01 October 2016 - 18:24:52 Application setiathome_v8 found, Version: 812 , Plan class: opencl_ati5_SoG_nocal
01 October 2016 - 18:24:52 Application setiathome_v8 found, Version: 812 , Plan class: opencl_atiapu_sah
01 October 2016 - 18:24:52 Application setiathome_v8 found, Version: 812 , Plan class: opencl_atiapu_SoG
01 October 2016 - 18:24:52 Application setiathome_v8 found, Version: 812 , Plan class: opencl_ati_sah
01 October 2016 - 18:24:52 Application setiathome_v8 found, Version: 812 , Plan class: opencl_ati_cat132
01 October 2016 - 18:24:52 Application setiathome_v8 found, Version: 812 , Plan class: opencl_ati_nocal
01 October 2016 - 18:24:52 Application setiathome_v8 found, Version: 800
01 October 2016 - 18:24:52 Application setiathome_v8 found, Version: 819 , Plan class: opencl_ati5_sah
01 October 2016 - 18:24:52 Average: Ratio Gpu: 9.190836, count: 87, invalid 0
01 October 2016 - 18:24:52 Ratio CPU: min: -1.000000 max: -1.000000 - Ratio Gpu: min: 8.553368 max 9.820253
01 October 2016 - 18:24:52 No rescheduling needed
The [Other] tab has these entries in its two boxes: http://setiweb.ssl.berkeley.edu/beta/ and setiathome_v8. At that moment BoincTasks showed 87 tasks from SETI@home plus 2 tasks from SETI@home Beta. As you can see from the BOINC Rescheduler log, it didn't find the 2 Beta tasks and instead "found" the same info for both projects, "mixing" in Version: 819 (which is only for Beta). - ALF - "Find out what you don't do well ..... then don't do it!" :) |
Jimbocous Joined: 1 Apr 13 Posts: 1855 Credit: 268,616,081 RAC: 1,349 |
Hi! I check for BOINC V7, and for the presence of GUPPIRescheduler, but do not specifically check for any Windows version. If you are successfully running Kevvy's 0.51, I would expect that QOpt will work just fine for you. Can't say for sure, but I have high confidence you'll be fine. I'm pretty sure that even if it doesn't work, it can't do any damage to try. Please let me know here how it goes! [edit] After a quick check, I would expect that Server 2008 R2 or later would work just fine. So Server 2012 should be no problem at all. And yes, that means that if someone is running Kevvy 0.51 successfully on WinXP, the possibility exists that QOpt will function there as well. Not saying I'll support it if it fails, especially as I killed my last XP cruncher about a year ago, but the possibility does exist. I haven't seen any reason to explore whether all the things I am doing are supported that far back. [/edit] |
Jimbocous Joined: 1 Apr 13 Posts: 1855 Credit: 268,616,081 RAC: 1,349 |
"Just to report that BOINC Rescheduler from eFMer also has a problem with SETI@home Beta:" Haven't looked at Fred's Rescheduler, as I thought it was a dead product. I wonder, would there be any value in adding support for it to QOpt, or does it already include the features I add? |
BilBg Joined: 27 May 07 Posts: 3720 Credit: 9,385,827 RAC: 0 |
Fred's Rescheduler is a GUI, so there is no need for a front-end. "I thought it was a dead product." I posted how to "revive" it here: http://setiathome.berkeley.edu/forum_thread.php?id=79954&postid=1810090#1810090 - ALF - "Find out what you don't do well ..... then don't do it!" :) |
I3APR Joined: 23 Apr 16 Posts: 99 Credit: 70,717,488 RAC: 0 |
Jim, something went wrong. I tested QOpt on another cruncher with Windows 10 and everything was OK, in both attended and unattended mode. I then ran qopt -u -l on the main cruncher, and something went wrong: in BoincManager I got a connection error popup: "The password you have provided is incorrect, please try again". Now BoincMgr shows no projects and no tasks. I thought that adding the SETI project again would fix it, but with Tools -> Add project the popup shows again. Here are the logs:
Scratch Log:
Mr. Kevvy's GUPPI Rescheduler v0.51 - (c)2016 Kevin Dorner
Reading configuration files...
Found sched_request GPU platform=windows_intelx86 app_version=8 version_num=812 CPU platform=windows_x86_64 app_version=6 version_num=800 and GPU plan_class=opencl_nvidia_SoG
Searching for and moving workunits in client state...
Writing updated configuration client_state.xml...
Done: 15 non-GUPPI workunits moved to GPU and 15 GUPPI workunits moved to CPU. 7 workunits not moved as they were in progress.
QOpt log:
QOpt Version 1.02g_x64 05/10/2016 13:53:06,16 on WIN2012ST
Microsoft Windows [Version 6.3.9600]
Program command line was: C:\ProgramData\BOINC\QOpt.exe -u -l
1.a. Checking the following processes:
- boinccmd.exe is NOT running
- boincmgr.exe is running
- boincmgr.exe shutdown NOT requested
- boinctray.exe is running
- boinctray.exe shutdown requested
- boinctray.exe shutdown confirmed
- boinctasks64.exe is running
- boinctasks64.exe shutdown NOT requested
b. Boinc client shutdown requested
- Boinc client shutdown confirmed
c. Creating backup file: C:\ProgramData\BOINC\client_state_backup.xml
- backup of: C:\ProgramData\BOINC\client_state.xml completed
d. Comparing client_state_backup.xml to client_state.xml in C:\ProgramData\BOINC\
- client_state.xml is identical to client_state_backup.xml.
2. Starting GUPPIRescheduler.exe --------------------------------------------------------
GUPPIRescheduler.exe -b invoked
Mr. Kevvy's GUPPI Rescheduler v0.51 - (c)2016 Kevin Dorner
Reading configuration files...
Found sched_request GPU platform=windows_intelx86 app_version=8 version_num=812 CPU platform=windows_x86_64 app_version=6 version_num=800 and GPU plan_class=opencl_nvidia_SoG
Searching for and moving workunits in client state...
Writing updated configuration client_state.xml...
Done: 15 non-GUPPI workunits moved to GPU and 15 GUPPI workunits moved to CPU. 7 workunits not moved as they were in progress. --------------------------------------------------------
- GUPPIRescheduler.exe completed
3. Restarting Boinc processes
- Boinc Client startup requested
- boinc.exe startup confirmed
QOpt completed 05/10/2016 13:53:21,57 on WIN2012ST ================================================================
Please advise.... :-( |
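The Scratch Log above reports equal numbers of workunits moved in each direction, plus a count of workunits left alone because they were in progress. A minimal Python sketch of a one-for-one swap with those properties follows. It is not Kevvy's actual code: the `Task` record, its `is_guppi` flag, and `swap_one_for_one` are hypothetical names, and a real tool would read and rewrite client_state.xml rather than an in-memory list.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    device: str               # "cpu" or "gpu" (hypothetical field)
    is_guppi: bool            # hypothetical flag; a real tool would derive it from the workunit
    in_progress: bool = False


def swap_one_for_one(tasks):
    """Swap idle non-GUPPI CPU tasks with an equal number of idle GUPPI GPU tasks.

    Returns (moved_each_way, skipped_in_progress).
    """
    # Tasks sitting on the device this scheme would rather not use for them.
    misplaced = [t for t in tasks
                 if (t.device == "cpu" and not t.is_guppi)
                 or (t.device == "gpu" and t.is_guppi)]
    skipped = sum(1 for t in misplaced if t.in_progress)
    to_gpu = [t for t in misplaced if t.device == "cpu" and not t.in_progress]
    to_cpu = [t for t in misplaced if t.device == "gpu" and not t.in_progress]
    n = min(len(to_gpu), len(to_cpu))   # assumption: strictly one-for-one
    for t in to_gpu[:n]:
        t.device = "gpu"
    for t in to_cpu[:n]:
        t.device = "cpu"
    return n, skipped
```

Keeping the swap one-for-one leaves the size of each queue unchanged, so the per-device task counts the server sees stay the same.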
I3APR Joined: 23 Apr 16 Posts: 99 Credit: 70,717,488 RAC: 0 |
After retrying again to restart BoincManager, it is now working again. Could it be a "timeout" error of some sort? I'm currently running 15 CPU tasks and 15 GPU tasks... have you ever tested it on a machine with such a high number of tasks? I never had this popup before, so it must be something related to QOpt... but anyway, at least something went wrong, but not horribly wrong... :-D A. |
Stephen "Heretic" Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
But in life you either have too little time or too little money. . . Funny about that :) . . Gives a person lots of time. Stephen . |
Jimbocous Joined: 1 Apr 13 Posts: 1855 Credit: 268,616,081 RAC: 1,349 |
I have seen this error a time or two, and keep it on my outstanding issues log. The fix is simple. Open BOINC Manager, then -> File ---> Select Computer -----> Hostname: localhost -----> Password: (should autocomplete when you enter the host name). It will reconnect and all is good. - or - Just shutting down and then restarting BOINC Manager will usually solve it. Just to clarify, BOINC.exe is the client that does the actual crunching; BOINC Manager is just a control shell that allows users to control and manage the client. It uses the same interface into the client that BoincTasks and other similar programs use. I have had this happen 2-3 times on a couple of Win10 machines, but it's not common and I can't yet explain what the deal is. However, I think you will find the BOINC client has been running and crunching, just not with the Manager connected and watching. If you end up with this happening each time QOpt runs, or frequently enough that it becomes annoying, try adding a -m to the command line. This will shut down and then restart the Manager on each run, though this is not usually needed or recommended. Please let me know how you do. Thanks! |
Jimbocous Joined: 1 Apr 13 Posts: 1855 Credit: 268,616,081 RAC: 1,349 |
"After retrying again to restart BoincManager, it is now working again." Number of tasks shouldn't really matter, given Manager's nature. As I mentioned, it's just a command shell. As for being related to QOpt, yes and no, I would say. I'm sure you've noticed that when exiting Manager, you get asked if you want to stop running tasks as well. Also, there's a place where you set whether Manager should start the client when it loads (Options, Other Options). A lot of folks just leave it set so that Manager starts and stops the client as it is started and stopped, but there's nothing that requires you to do so. I don't actually use Manager that much, as I run BoincTasks to manage the 5 crunchers I have here. I do have Manager auto-start and load the client on boot-up, but do not normally let it shut the client down. As I mentioned, I have found this problem to be very intermittent. If it is consistent for you, that might give me an avenue to further troubleshoot. Jim ... |
I3APR Joined: 23 Apr 16 Posts: 99 Credit: 70,717,488 RAC: 0 |
Jim, thank you. I've started QOpt a couple of times again this evening and everything went smoothly, so the intermittent nature of the problem is confirmed on Win 2012 as well. I did install QOpt on two other Windows 10 crunchers, and had a rough time with Windows Defender regularly deleting it, though... Anyway, my main cruncher is at your disposal for testing and troubleshooting, Jim. Thank you A. |
Jimbocous Joined: 1 Apr 13 Posts: 1855 Credit: 268,616,081 RAC: 1,349 |
Thanks, glad it's working for you, and I hope you find a performance improvement and a reduction in hassle. Defender was seriously irritating me as well. I finally realized I had entered the directories into the exclusion list using their network names, but on the machine where the drive lives I had forgotten the local drive name. I did open a report with M$ on the false positive, but we know that will go nowhere. On the machines that only crunch, I think I'll just shut it all off. No real sense running an AV on a machine that lives behind a firewall, does nothing except run SETI, and on which I never even load the browser. If you didn't notice it, there's a log file clean-up script called bak_log.bat in the package that's been working nicely for me as well. I put that in Task Scheduler at 12:01 daily, with a run-once. Then I have QOpt in the scheduler as described in the read_me. For me at least, every 4 hours seems to be the sweet spot. Once I got the file path handling squared away, the only schedule failure I had was when Defender nuked QOpt.exe, as I described above. Jim ... |
Jimbocous Joined: 1 Apr 13 Posts: 1855 Credit: 268,616,081 RAC: 1,349 |
"I am using I believe the latest version of the rescheduler, running every 30 minutes, is this the one that you are referring to?" Don't know what capabilities eFMer's had. I really need to check it out. Kevvy's current RS is a nice start, but only addresses one of several possible needs. Specifically (doing this from memory) it will move non-VLAR work from the CPU queue to the GPU queue, and move up to but no more than the same number of VLAR/GUPPIs from the GPU queue back to the CPU. As fast as things move through the GPUs, any imbalance that results is essentially self-healing. Another thing that could be worth addressing would be relocating APs from the CPU queue to the GPU queue. Going back to the message that started this discussion, none of this addresses a case where there's enough of an outage that a CPU or GPU queue runs dry while its counterpart still has work. That might well be worthwhile to develop. I'm going to step back onto the soapbox I was on a while ago with this. My traffic engineering background tells me one solution is to move the decision-making process for these matters from the servers to the client, and let folks configure that as they will, within certain limits. We have some hard limits we deal with, like 100 tasks per CPU max and 100 tasks per GPU max, reduced from those maximums by user-configurable settings such as max days of work, etc. These limits do not take into consideration such things as the number of cores/threads per CPU, nor the number of compute units per GPU. Perhaps they should, but who is to do that type of coding work at the server level, even assuming that approval could be granted? The current RS programs can do some things without blowing up the world, and perhaps could do more. Currently the server looks at compute assets (for lack of a better term, including both CPU and GPU assets) as a queue and stacks work into them.
What would be more effective would be to logically stack that incoming work into queues by work type at the client, then allow the client to determine, based on its locally defined and enforced rules, which compute assets should be primary, secondary, or backup choices for each work type. The queues would remain balanced by the RS "promoting" tasks as needed to maintain a reasonable balance that wouldn't give the SETI servers heartburn. The best balance would include using the provided estimated time remaining on tasks to keep CPU and GPU queues balanced in terms of work left to do or, at a minimum, start moving work if either queue is approaching depletion. Checks and balances need to exist to prevent such things as bypassing limits on maximum number of tasks per client or wholesale aborting of "low-paying" work in favor of the "sexy" stuff. I won't argue the "morality" or "ethics" of those choices, as the argument is irrelevant. If the ability to RS work at the client makes the "powers that be" excessively concerned, the whole effort will get squashed like a bug, as it should. |
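The balancing idea in the post above (use estimated time remaining, and start moving work when a queue approaches depletion) can be sketched roughly as follows. This is an illustration under stated assumptions, not a description of any existing rescheduler: the `Task` fields, `drain_time`, `rebalance`, and the `low_water` threshold are all hypothetical, and it assumes a per-device run-time estimate is available for every task.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    cpu_seconds: float   # estimated run time on the CPU (hypothetical field)
    gpu_seconds: float   # estimated run time on the GPU (hypothetical field)


def drain_time(queue, device, slots):
    """Estimated seconds until `queue` is empty when `slots` tasks run at once."""
    return sum(getattr(t, device + "_seconds") for t in queue) / slots


def rebalance(cpu_q, gpu_q, cpu_slots, gpu_slots, low_water):
    """Move tasks toward a queue that is close to running dry.

    Nothing moves unless one queue's drain time is below `low_water` seconds.
    Tasks are then taken from the back of the fuller queue until one more
    move would leave the receiving queue with more work than the donor.
    Returns a list of (task_name, destination_device).
    """
    cpu_t = drain_time(cpu_q, "cpu", cpu_slots)
    gpu_t = drain_time(gpu_q, "gpu", gpu_slots)
    if cpu_t < low_water and gpu_t > cpu_t:
        src, src_dev, src_slots = gpu_q, "gpu", gpu_slots
        dst, dst_dev, dst_slots = cpu_q, "cpu", cpu_slots
    elif gpu_t < low_water and cpu_t > gpu_t:
        src, src_dev, src_slots = cpu_q, "cpu", cpu_slots
        dst, dst_dev, dst_slots = gpu_q, "gpu", gpu_slots
    else:
        return []

    moves = []
    while src:
        task = src[-1]
        donor_after = drain_time(src[:-1], src_dev, src_slots)
        receiver_after = drain_time(dst + [task], dst_dev, dst_slots)
        if receiver_after > donor_after:
            break                      # one more move would overshoot the balance point
        dst.append(src.pop())
        moves.append((task.name, dst_dev))
    return moves
```

The sketch only moves tasks between the two local queues, so the total number of tasks on the host never changes; it does nothing to fetch, abort, or hide work.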
Brent Norman Joined: 1 Dec 99 Posts: 2786 Credit: 685,657,289 RAC: 835 |
I forget what the implications are from running GUPPI Rescheduler IF AP tasks are present. I remember something being said about this and Jim put in a flag (Qopt -a) to halt processing. What is the issue? Just trying to remember, and not wanting to blow away my cache. |
Jimbocous Joined: 1 Apr 13 Posts: 1855 Credit: 268,616,081 RAC: 1,349 |
"I forget what the implications are from running GUPPI Rescheduler IF AP tasks are present. I remember something being said about this and Jim put in a flag (Qopt -a) to halt processing." If I recall correctly, Brent, there was an issue with APs that Kevvy identified on GR 0.50 and resolved in his GR 0.51 release (one reason I list that as the minimum supported release for QOpt). I put the flag in out of an excess of caution, and in case future issues came up and a quick troubleshooting tool was needed. To my knowledge, there is no issue at this point. I know I've been sucking up and running as many APs as I can get, and QOpt is running here every 4 hours regardless. Haven't lost a single WU to anything like that (or anything else, for that matter :). Jim ... |
Stephen "Heretic" Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
. . Funny thing, I just received 4 AP tasks today for the CPU and was thinking that very thing. They would process so much faster on the GPU and free up lots of time to crunch Guppis on the CPU. Stephen . |
Jeff Buck Joined: 11 Feb 00 Posts: 1441 Credit: 148,764,870 RAC: 0 |
"Checks and balances need to exist to prevent such things as bypassing limits on maximum number of tasks per client or wholesale aborting of "low-paying" work in favor of the "sexy" stuff. I won't argue the "morality" or "ethics" of those choices, as the argument is irrelevant. If the ability to RS work at the client makes the "powers that be" excessively concerned, the whole effort will get squashed like a bug, as it should." That's really the crux of the matter right there. I think that rescheduling had gotten something of a bad rep a while back due to a small number of folks using it to manipulate their queues in order to stockpile excessive quantities of APs, since those tasks "paid" more. Then the topic pretty much faded into the background until guppi VLARs came around. Now, the issue for some of us is primarily about performance, processing whatever is sent to us while trying to make the most efficient use of our devices. For others, however, it's still all about the RAC, and guppi VLARs simply don't "pay" as well as the non-VLARs from Arecibo. This leads to the sort of mass aborting, forced "ghosting" and other irresponsible behavior that we've seen from certain individuals. Once again, that sort of thing threatens to give the whole rescheduling concept, in whatever form, a bad rep. That's unfortunate, given that the sort of "rescheduling" those folks are engaged in doesn't actually need a rescheduler program, just some technical knowledge about how the task queue is structured and maintained. I honestly don't have a clue as to how any sort of client-side controls could be established to block irresponsible queue manipulation as long as the primary repository for the task info is an easily editable pseudo-XML file. Changing that mechanism would probably have to be the first order of business for any queue management changes, and I really can't imagine how that would ever come about. |
Jimbocous Joined: 1 Apr 13 Posts: 1855 Credit: 268,616,081 RAC: 1,349 |
"I honestly don't have a clue as to how any sort of client-side controls could be established to block irresponsible queue manipulation as long as the primary repository for the task info is an easily editable pseudo-XML file. Changing that mechanism would probably have to be the first order of business for any queue management changes, and I really can't imagine how that would ever come about." Yeah, don't see it happening. I mean, this is all open source. Anything one can put in, another can take out, absent checks and balances that would take scarce resources away from more important work. Perhaps a little social shaming is in order when you see someone's tasks page and there are consistently more aborted than completed WUs? Or 5000 WUs on hardware that should be limited to 1000 ... Dunno ... One reason I got involved in this, aside from the technical challenge I've enjoyed, was to perhaps see if I could stand up and make the case that this can be done responsibly rather than just throwing things around in all directions looking to see what sticks to the wall. Chaos and Crusades don't do much for me :) |
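The two warning signs mentioned above (more aborted than completed WUs, or far more queued WUs than the hardware should hold) are simple arithmetic. A small Python sketch follows, assuming the 100-per-CPU and 100-per-GPU limits quoted earlier in the thread; `task_cap` and `red_flags` are hypothetical names, and the counts would have to be read off a host's tasks page by some other means.

```python
PER_DEVICE_LIMIT = 100   # per-CPU and per-GPU task limit as quoted in the thread


def task_cap(n_cpus, n_gpus, per_device=PER_DEVICE_LIMIT):
    """Largest queue the quoted limits would allow for this hardware."""
    return per_device * (n_cpus + n_gpus)


def red_flags(queued, completed, aborted, n_cpus, n_gpus):
    """Return a list of reasons a host's task page looks questionable."""
    flags = []
    cap = task_cap(n_cpus, n_gpus)
    if queued > cap:
        flags.append(f"{queued} tasks queued, cap is {cap}")
    if aborted > completed:
        flags.append(f"{aborted} aborted vs {completed} completed")
    return flags
```

A check like this only reports; it cannot stop anyone from editing their own queue, which is the point made in the post above.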
Jeff Buck Joined: 11 Feb 00 Posts: 1441 Credit: 148,764,870 RAC: 0 |
Did somebody say....Crusades? |
Jimbocous Joined: 1 Apr 13 Posts: 1855 Credit: 268,616,081 RAC: 1,349 |
"Did somebody say....Crusades?" Got one cruising the front yard daily. One of these days, I'll get the pellet gun ... :) |
Stephen "Heretic" Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
"Did somebody say....Crusades?" . . Hey man, . . I used to love that cartoon :) Stephen Showing our age though. . |