GUPPI Rescheduler for Linux and Windows - Move GUPPI work to CPU and non-GUPPI to GPU
Author | Message |
---|---|
Keith Myers Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
This is puzzling me also. I have been having similar issues on one of my machines, which discards phantom AP CPU tasks after I run the rescheduler. I do have the proper AP CPU app installed in my app_info. I also currently have 2 real AP CPU tasks on board, and I expect to process them normally with no issues, as I have done in the past. This is what my error log shows when I restart after a reschedule.

Pipsqueek

    11 SETI@home 8/17/2016 2:09:50 PM [error] No application found for task: windows_x86_64 703 sse2; discarding
    12 SETI@home 8/17/2016 2:09:50 PM [error] No application found for task: windows_x86_64 703 sse2; discarding
    13 SETI@home 8/17/2016 2:09:50 PM [error] No application found for task: windows_x86_64 703 sse2; discarding
    14 SETI@home 8/17/2016 2:09:50 PM [error] No application found for task: windows_x86_64 703 sse2; discarding
    15 SETI@home 8/17/2016 2:09:50 PM [error] No application found for task: windows_x86_64 703 sse2; discarding
    16 SETI@home 8/17/2016 2:09:50 PM [error] No application found for task: windows_x86_64 703 sse2; discarding
    17 SETI@home 8/17/2016 2:09:50 PM [error] No application found for task: windows_x86_64 703 sse2; discarding
    18 SETI@home 8/17/2016 2:09:50 PM [error] No application found for task: windows_x86_64 703 sse2; discarding
    19 SETI@home 8/17/2016 2:09:50 PM [error] No application found for task: windows_x86_64 703 sse2; discarding
    20 SETI@home 8/17/2016 2:09:50 PM [error] No application found for task: windows_x86_64 703 sse2; discarding
    21 SETI@home 8/17/2016 2:09:50 PM [error] No application found for task: windows_x86_64 703 sse2; discarding
    22 SETI@home 8/17/2016 2:09:50 PM [error] No application found for task: windows_x86_64 703 sse2; discarding
    23 SETI@home 8/17/2016 2:09:50 PM [error] No application found for task: windows_x86_64 703 sse2; discarding
    24 SETI@home 8/17/2016 2:09:50 PM [error] No application found for task: windows_x86_64 703 sse2; discarding
    25 SETI@home 8/17/2016 2:09:50 PM [error] No application found for task: windows_x86_64 703 sse2; discarding
    26 SETI@home 8/17/2016 2:09:50 PM [error] No application found for task: windows_x86_64 703 sse2; discarding
    27 SETI@home 8/17/2016 2:09:50 PM [error] No application found for task: windows_x86_64 703 sse2; discarding
    28 SETI@home 8/17/2016 2:09:50 PM [error] No application found for task: windows_x86_64 703 sse2; discarding
    29 SETI@home 8/17/2016 2:09:50 PM [error] No application found for task: windows_x86_64 703 sse2; discarding
    30 SETI@home 8/17/2016 2:09:50 PM [error] No application found for task: windows_x86_64 703 sse2; discarding

I have mentioned this to Mr. Kevvy and Stubbles already, but they haven't yet had any insight into why this happens occasionally. Anyone have ideas? Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
Al Joined: 3 Apr 99 Posts: 1682 Credit: 477,343,364 RAC: 482 |
Whew, at least I'm not the only one! ;-) |
Stephen "Heretic" Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
-high_prec_timer quite experimental feature for now. . . I tried Zalster's suggestion and disabled -high_prec_timer and reduced -tt to 500, but the lockups are still bad, maybe worse. So for now I am returning to r3430; better the devil you know :). Lockups were a daily problem there but not an hourly one :) |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14650 Credit: 200,643,578 RAC: 874 |
OK, here's what I think has happened. Your app_info.xml file is absolutely fine, and looks like it has been assembled from my installer. It will handle the following types of task:

    Astropulse for CPU ... v703 ... v701 ... v700
    Astropulse for NVidia ... v710 ... v705
    Multibeam for CPU ... v800
    Multibeam for NVidia ... v812 ... v800

All of which match - as they should - the version numbers on offer from the server, as shown on the applications page for Windows. Plus a couple of older versions (701 and 705 for AP) designed to cover the case of people upgrading from older versions of the installer. So you're able to cope with anything sent by the server as stock, and anything you download yourself. So far, so good.

But wait... Look at All tasks for computer 8012837. You have just 7 AP tasks listed, and filtering that down, only four are active - all assigned by the server as "AstroPulse v7 Anonymous platform (NVIDIA GPU)". But a little while ago, in message 1809987, you posted a list of 67 tasks with "No application found for task: windows_x86_64 710 opencl_nvidia_100". That's an AP version number and plan class, but no way do you have 67 Astropulse tasks loaded.

So I looked at the source code for Mr. Kevvy's rescheduler. He properly swaps CPU and GPU plan_classes, version numbers and platforms using these variables:

    string app_versionGPU = ""; // The app number (in text form) of the GPU plan_class
    string app_versionCPU = ""; // The app number (in text form) of CPU app
    string platformGPU = ""; // Platform name of the GPU app
    string platformCPU = ""; // Platform name of the CPU app
    string version_numGPU = ""; // Version number (in text form) of the GPU app
    string version_numCPU = ""; // Version number (in text form) of the CPU

But note that there is only room for one of each - and the word "Astropulse" appears nowhere in his source code or comments.
So, my strong suspicion is that the rescheduler scans the input files for version numbers, and latches on to the first one it finds - without noticing whether it belongs to an <app_name>setiathome_v8 or an <app_name>astropulse_v7. He pays some attention to "<app_version>n" in sched_request_setiathome.berkeley.edu.xml, but appears not to retrieve the full app_version structure from earlier in the file, where the full detail is available, as

    <app_version>
        <app_name>setiathome_v8</app_name>
        <version_num>800</version_num>
        <platform>windows_x86_64</platform>
        <avg_ncpus>1.000000</avg_ncpus>
        <max_ncpus>1.000000</max_ncpus>
        <flops>4706468289.867001</flops>
        <api_version>7.5.0</api_version>
    </app_version>

- the 'n' in <other_results> is simply an index into a list of those structures, and they need to be checked too.

Altogether, it looks as if the rescheduler is currently capable of assigning an AP <version_num> to an <app_name>setiathome_v8</app_name> task, with the results that Al and Keith have reported. I'd suggest that you don't use the rescheduler if you have AP tasks in your cache, until Mr. Kevvy has had a chance to consider and respond to these comments. |
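For anyone who wants to see what "retrieve the full app_version structure" might look like, here is a minimal C++ sketch. It is not Mr. Kevvy's code: `AppVersion`, `tag_value`, `parse_app_versions` and `find_version` are hypothetical names, and it assumes that full blocks always carry an <app_name>, that bare index entries such as <app_version>0</app_version> do not, and that a <plan_class> tag, when present, sits inside the block. The point is only that a version number is accepted when the app name matches as well.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// One full <app_version> block from the scheduler request file.
struct AppVersion {
    std::string app_name;
    std::string version_num;
    std::string platform;
    std::string plan_class;  // empty when the block has no <plan_class>
};

// Text between <tag> and </tag> inside `block`, or "" if the tag is absent.
std::string tag_value(const std::string& block, const std::string& tag) {
    const std::string open = "<" + tag + ">";
    const std::string close = "</" + tag + ">";
    const std::size_t start = block.find(open);
    if (start == std::string::npos) return "";
    const std::size_t from = start + open.size();
    const std::size_t end = block.find(close, from);
    if (end == std::string::npos) return "";
    return block.substr(from, end - from);
}

// Collect the full <app_version> blocks in file order. Bare index entries
// (assumed to look like <app_version>0</app_version>) have no <app_name>
// and are skipped.
std::vector<AppVersion> parse_app_versions(const std::string& xml) {
    std::vector<AppVersion> versions;
    const std::string open = "<app_version>";
    const std::string close = "</app_version>";
    std::size_t pos = 0;
    while ((pos = xml.find(open, pos)) != std::string::npos) {
        const std::size_t end = xml.find(close, pos);
        if (end == std::string::npos) break;
        const std::string block = xml.substr(pos, end - pos);
        const std::string name = tag_value(block, "app_name");
        if (!name.empty()) {
            versions.push_back({name, tag_value(block, "version_num"),
                                tag_value(block, "platform"),
                                tag_value(block, "plan_class")});
        }
        pos = end + close.size();
    }
    return versions;
}

// First version whose app_name AND plan_class both match, or nullptr.
const AppVersion* find_version(const std::vector<AppVersion>& versions,
                               const std::string& app_name,
                               const std::string& plan_class) {
    for (const AppVersion& v : versions) {
        if (v.app_name == app_name && v.plan_class == plan_class) return &v;
    }
    return nullptr;
}
```

With something like this, a lookup such as `find_version(versions, "setiathome_v8", "")` cannot return the 703 that belongs to astropulse_v7, whichever block happens to come first in the file.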
Stephen "Heretic" Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
Not normal for me since I have chosen to not run CPU work units. I have run some in the past. . . Umm, you need both a GPU cache and a CPU cache to "swap" tasks between them. It is a rescheduler, not a task remover. |
Stephen "Heretic" Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
Then how do you expect to Reschedule tasks GPU <--> CPU if you "have chosen to not run CPU work units"?? . . That is the function but you have to be running CPU tasks to have a CPU cache to move them to. |
Stephen "Heretic" Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
Then how do you expect to Reschedule tasks GPU <--> CPU if you "have chosen to not run CPU work units"?? . . I take it that requires "manually" installing the CPU handler app while your SETI preferences say to not send CPU work. I need to learn how to do that :) |
Stephen "Heretic" Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
Wouldn't that require that the CPU app already be installed, either as stock (which would seem unlikely if CPU work wasn't selected), or through Anonymous Platform and the app_info.xml? . . Lunatics, that would be worth a try. That would install the CPU handler app and then you could move to the CPU cache. |
Keith Myers Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
Richard, thanks so much for your insight and concise analysis. I think you explained it very well. It doesn't quite explain for me why I have seen this problem ONLY on Pipsqueek. I have never seen the issue on Numbskull or Keith-Windows7, which also have both CPU and GPU AP tasks on board currently. On both machines there were AP tasks before and after I ran the rescheduler, within ten minutes of rescheduling on Pipsqueek. Either they have been lucky or there are some obscure differences in client_state or sched_request between the three machines. |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14650 Credit: 200,643,578 RAC: 874 |
Does "latches on to the first one it finds" (or that might be the last one - I didn't check the search logic) cover it? In other words, it grabs an AP number only if AP is the oldest WU in the cache, or alternatively if it's the most recent download? Either case would be pretty rare in the current state of AP splitting. |
Keith Myers Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
Well, my previous statement is no longer valid. I had one AP CPU task left on Keith-Windows7 just now, having finished the previous balance of GPU AP tasks. I just ran the rescheduler again and dumped 17 phantom AP CPU tasks on that machine. So you are probably correct that it depends on how old a task is in client_state, or something like that. I now agree with your assessment that you should not run the rescheduler on any machine with AP work on board. We will have to wait it out or dump work. Let's hope Mr. Kevvy can make some adjustments in the app to accommodate AP work as well as MB work. |
Mr. Kevvy Joined: 15 May 99 Posts: 3776 Credit: 1,114,826,392 RAC: 3,319 |
Haven't been watching this thread much but was advised of issues... The app. doesn't check for AP tasks but does the opposite: the MB mover checks for the task name starting with ##xx (a two-digit day and a two-letter month):

    if (isdigit(client_state[currentposition])) {
        if (isdigit(client_state[currentposition + 1])) {
            if (isalpha(client_state[currentposition + 2])) {
                if (isalpha(client_state[currentposition + 3])) {

And the GUPPI mover checks for the task name starting with "blc":

    if ( client_state.substr(currentposition, 3) == "blc" ) {

So it should never touch AP work, as neither check would match: AP task names start with "ap_".

I apologize if the app. doesn't work for anyone (especially if it drops work units), but I am unsure why it does this, as it worked for me on the Windows machine and the several Linux machines I tested it on and continue to use it on. I think this is due to BOINC using multiple platforms simultaneously for the same type of work in some people's builds. (I even ran it on the client_state files sent to me by people who had issues, and it worked on them, so I'm still in the dark as to the cause.) As noted in the readme, back up your client_state.xml before first use; hopefully, if it works the first time, it will keep working as long as you don't change platforms, i.e. by installing a third-party client. |
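The prefix checks described in the post above can be collected into a single classifier, sketched here in C++. This is only an illustration, not the rescheduler's actual code: `TaskKind`, `classify_task` and the two small helpers are hypothetical names, and the rule that "ap_" is tested first is an assumption made for clarity.

```cpp
#include <cctype>
#include <string>

enum class TaskKind { Multibeam, Guppi, Astropulse, Unknown };

// Wrappers that cast to unsigned char first, as the <cctype> functions
// require for values that may be negative when char is signed.
static bool is_digit(char c) { return std::isdigit(static_cast<unsigned char>(c)) != 0; }
static bool is_alpha(char c) { return std::isalpha(static_cast<unsigned char>(c)) != 0; }

// Classify a task by the start of its name:
//   "ap_"                      -> Astropulse
//   "blc"                      -> GUPPI
//   two digits + two letters   -> Multibeam (the ##xx pattern)
TaskKind classify_task(const std::string& name) {
    if (name.compare(0, 3, "ap_") == 0) return TaskKind::Astropulse;
    if (name.compare(0, 3, "blc") == 0) return TaskKind::Guppi;
    if (name.size() >= 4 && is_digit(name[0]) && is_digit(name[1]) &&
        is_alpha(name[2]) && is_alpha(name[3])) {
        return TaskKind::Multibeam;
    }
    return TaskKind::Unknown;
}
```

A classifier like this only decides which tasks get moved. It says nothing about which version number or plan class is written into a moved task, which is the part questioned elsewhere in this thread.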
Keith Myers Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
I can supply my backed-up client_state files from two machines, which contained the tasks that were discarded, if that helps. Not running any third-party apps, only the official ones that install with the Lunatics installers. |
Jeff Buck Joined: 11 Feb 00 Posts: 1441 Credit: 148,764,870 RAC: 0 |
Haven't been watching this thread much but was advised of issues... I think that perhaps what Richard was getting at was not so much that the actual identification of tasks to be rescheduled has a problem, but rather that the initial identification of app version and plan class might have a problem when the first task that is found in the scheduler request file happens to be an AP. In other words, could:

    currentposition = sched_request.find("</app_version>\n <plan_class>");
    // Now do the same as the block above but for app_versionGPU

result in the extraction of the app_version (and, subsequently, the plan class) for an AP instead of for an MB? I don't know that you're checking for APs at that point. Just a thought. |
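To make the concern concrete: `std::string::find` returns the first match in the file, whatever task it belongs to. One possible guard, sketched below in C++, is to walk the <other_result> entries and take the app_version index and plan class from the first one whose name does not start with "ap_". `ResultInfo`, `between` and `first_non_ap_result` are hypothetical names, and the layout of <other_result> (a <name>, an <app_version> index and an optional <plan_class>) is an assumption about the scheduler request file rather than something confirmed in this thread.

```cpp
#include <cstddef>
#include <string>

struct ResultInfo {
    bool found;
    std::string name;
    std::string app_version_index;  // the 'n' in <app_version>n</app_version>
    std::string plan_class;         // empty if the entry has none
};

// Text between `open` and `close` inside `s`, or "" if either is missing.
static std::string between(const std::string& s, const std::string& open,
                           const std::string& close) {
    const std::size_t start = s.find(open);
    if (start == std::string::npos) return "";
    const std::size_t from = start + open.size();
    const std::size_t end = s.find(close, from);
    if (end == std::string::npos) return "";
    return s.substr(from, end - from);
}

// Scan <other_result> entries in order and return the first one that is
// not an Astropulse task (name not starting with "ap_").
ResultInfo first_non_ap_result(const std::string& sched_request) {
    const std::string open = "<other_result>";
    const std::string close = "</other_result>";
    std::size_t pos = 0;
    while ((pos = sched_request.find(open, pos)) != std::string::npos) {
        const std::size_t end = sched_request.find(close, pos);
        if (end == std::string::npos) break;
        const std::string entry = sched_request.substr(pos, end - pos);
        const std::string name = between(entry, "<name>", "</name>");
        if (!name.empty() && name.compare(0, 3, "ap_") != 0) {
            return ResultInfo{true, name,
                              between(entry, "<app_version>", "</app_version>"),
                              between(entry, "<plan_class>", "</plan_class>")};
        }
        pos = end + close.size();
    }
    return ResultInfo{false, "", "", ""};
}
```

If the first entry in the file is an AP task, this skips it instead of latching on to its plan class. The same idea would have to be applied separately for the CPU and GPU lookups.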
Mr. Kevvy Joined: 15 May 99 Posts: 3776 Credit: 1,114,826,392 RAC: 3,319 |
I think that perhaps what Richard was getting at was not so much that the actual identification of tasks to be rescheduled has a problem, but rather that the initial identification of app version and plan class might have a problem when the first task that is found in the scheduler request file happens to be an AP. In other words, could: Thanks... I will have a look at that. Will have time this weekend to check it over in detail (thankfully no plans for a change.) |
BilBg Joined: 27 May 07 Posts: 3720 Credit: 9,385,827 RAC: 0 |
Until this is fixed:

boinc_rescheduler_2_7.zip still exists: http://www.efmer.eu/download/boinc/scheduler/boinc_rescheduler_2_7.zip

How to patch for SETI@home v8:
http://setiathome.berkeley.edu/forum_thread.php?id=77586&postid=1763581#1763581
http://setiathome.berkeley.edu/forum_thread.php?id=77586&postid=1763938#1763938

Tool to use: HxD - Freeware Hex Editor https://mh-nexus.de/en/hxd/

On this Downloads page: https://mh-nexus.de/en/downloads.php?product=HxD
Find (Ctrl+F): 94e57a52e4d3eca6576bc15a99e884b6cdd5b03a
... to easily find the links for "HxD, English - portable 1.7.7.0"

- ALF - "Find out what you don't do well ..... then don't do it!" :) |
Raistmer Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
Thanks for such clear instructions. WARNING to others who will try this way:

    18 August 2016 - 10:17:37 Seti MB v8
    18 August 2016 - 10:17:54 ERROR: Cpu and GPU count: More than one Cpu version number: 804 ,802

I think it's because 8.02 is the CPU version in the SETI main project and 8.04 is the beta CPU version. Apparently one needs to keep both identical (I have no 8.04 in app_info.xml for the main project at all). SETI apps news We're not gonna fight them. We're gonna transcend them. |
BilBg Joined: 27 May 07 Posts: 3720 Credit: 9,385,827 RAC: 0 |
To check/compare your original and patched files with mine:

MD5:
    a072664a7063bb1a231a2aae967cc427 *BOINC Rescheduler.exe
    07bd6714b0817e6faf1941575e56e0b5 *BOINC Rescheduler v7.exe
    cf29f5efe0eb427b37364a3432246762 *BOINC Rescheduler v8.exe

I didn't edit BOINC Rescheduler64 - I can't test it and don't really know why it exists (why there is a need for a 64-bit Rescheduler). Is it impossible for a 32-bit program to access and edit client_state.xml on 64-bit Windows? |
Raistmer Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
To check/compare your original and patched files with mine: Of course the MD5 will only match if exactly the same changes were made. In my case I changed the "human readable" name to Seti MB v8, so the MD5 would hardly match. P.S. Interesting that the "Other" tab allows moving CPU <-> GPU for a particular project but doesn't understand VLAR/VHAR, while the main SETI MB tab understands VLAR/VHAR but doesn't distinguish between the main and beta projects... EDIT: and more on the version mismatch issue: when I suspended all beta CPU tasks the error went away. And it seems to work OK for GBT tasks too. |
Mr. Kevvy Joined: 15 May 99 Posts: 3776 Credit: 1,114,826,392 RAC: 3,319 |
|
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.