Postponed: Waiting to acquire lock

Profile Jeff Buck Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester

Joined: 11 Feb 00
Posts: 1441
Credit: 148,764,870
RAC: 0
United States
Message 1911215 - Posted: 6 Jan 2018, 19:41:54 UTC - in response to Message 1911211.  

The rescheduler program could just make that more common because it shuts down and restarts the BOINC process too.
Yes, I'd say that's accurate. The rescheduler tells the BOINC Manager to shut down and, from there on, it's the Manager (and OS) that control the client shutdown, just as it would any other time you Exit the Manager from the menu.
ID: 1911215 · Report as offensive
TBar
Volunteer tester

Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1911216 - Posted: 6 Jan 2018, 19:44:27 UTC - in response to Message 1911212.  
Last modified: 6 Jan 2018, 19:49:14 UTC

If you say so. My recommendation stands. There are reasons I don't use scripts to edit my state file, on any platform.

Oh, and I don't use a third-party app to control BOINC tasks either; I use the BOINC Manager.
ID: 1911216 · Report as offensive
juan BFP Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester

Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 1911221 - Posted: 6 Jan 2018, 19:49:03 UTC - in response to Message 1911215.  

What we need to find is why, when the exit happens, the GPU-related slots close the lock file but the CPU-related slots don't. And that only happens when BOINC is running the last WU of the cache... Weird... I'll do like you and go for a new beer.
ID: 1911221 · Report as offensive
Profile Jeff Buck Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester

Joined: 11 Feb 00
Posts: 1441
Credit: 148,764,870
RAC: 0
United States
Message 1911225 - Posted: 6 Jan 2018, 19:58:34 UTC - in response to Message 1911216.  

It's irrelevant if the client_state file gets edited while BOINC is running. BOINC only reads that file at startup, then maintains and updates it in memory. The only thing that will happen is that BOINC periodically overwrites the file on disc, thus negating any changes that might have been made on disc while BOINC is running.
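As a rough illustration of that write-back behaviour (this is a sketch, not BOINC's actual code, and the file name and interval are only illustrative): the state is parsed once at startup, held in memory, and periodically written back wholesale, so anything edited on disc in the meantime is lost at the next flush.

import time
import xml.etree.ElementTree as ET

STATE_FILE = "client_state.xml"   # illustrative path

def load_state():
    # Read once, at startup; this is the only time the on-disc file is consulted.
    return ET.parse(STATE_FILE)

def flush_state(state):
    # Periodic write: the on-disc file is replaced wholesale with the in-memory copy.
    state.write(STATE_FILE, encoding="utf-8", xml_declaration=True)

def client_main_loop(flush_interval=60):
    state = load_state()          # edits made on disc after this point are ignored...
    while True:
        time.sleep(flush_interval)
        flush_state(state)        # ...and overwritten here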
ID: 1911225 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14653
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1911227 - Posted: 6 Jan 2018, 20:00:35 UTC - in response to Message 1911225.  

It's irrelevant if the client_state file gets edited while BOINC is running. BOINC only reads that file at startup, then maintains and updates it in memory. The only thing that will happen is that BOINC periodically overwrites the file on disc, thus negating any changes that might have been made on disc while BOINC is running.
Correct.
ID: 1911227 · Report as offensive
Profile Jeff Buck Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester

Joined: 11 Feb 00
Posts: 1441
Credit: 148,764,870
RAC: 0
United States
Message 1911229 - Posted: 6 Jan 2018, 20:08:35 UTC - in response to Message 1911221.  
Last modified: 6 Jan 2018, 20:20:06 UTC

What we need to find is why, when the exit happens, the GPU-related slots close the lock file but the CPU-related slots don't. And that only happens when BOINC is running the last WU of the cache... Weird...
I wonder if it's possible that the BOINC client actually finishes shutting down before the individual science apps have completely terminated. On the other hand, perhaps the client is forcing app termination prematurely. Off the top of my head, I'm not sure if it's the science app or the client that maintains the lockfile.

EDIT: Ah, here's a snippet from one of my old Process Monitor logs. This isn't from a client shutdown, but does seem to show that it's the science app that deletes the lockfile.
5:45:17.7139951 PM	Lunatics_x41zc_win32_cuda50.exe	4020	CloseFile	C:\Documents and Settings\All Users\Application Data\BOINC\slots\3\boinc_lockfile	SUCCESS	
5:45:17.7142202 PM	Lunatics_x41zc_win32_cuda50.exe	4020	CreateFile	C:\Documents and Settings\All Users\Application Data\BOINC\slots\3	SUCCESS	Desired Access: Read Data/List Directory, Synchronize, Disposition: Open, Options: Directory, Synchronous IO Non-Alert, Attributes: n/a, ShareMode: Read, Write, AllocationSize: n/a, OpenResult: Opened
5:45:17.7142423 PM	Lunatics_x41zc_win32_cuda50.exe	4020	QueryDirectory	C:\Documents and Settings\All Users\Application Data\BOINC\slots\3\boinc_lockfile	SUCCESS	Filter: boinc_lockfile, 1: boinc_lockfile
5:45:17.7142763 PM	Lunatics_x41zc_win32_cuda50.exe	4020	CloseFile	C:\Documents and Settings\All Users\Application Data\BOINC\slots\3	SUCCESS	
5:45:17.7144252 PM	Lunatics_x41zc_win32_cuda50.exe	4020	CreateFile	C:\Documents and Settings\All Users\Application Data\BOINC\slots\3\boinc_lockfile	SUCCESS	Desired Access: Read Attributes, Delete, Disposition: Open, Options: Non-Directory File, Open Reparse Point, Attributes: n/a, ShareMode: Read, Write, Delete, AllocationSize: n/a, OpenResult: Opened
5:45:17.7144544 PM	Lunatics_x41zc_win32_cuda50.exe	4020	QueryAttributeTagFile	C:\Documents and Settings\All Users\Application Data\BOINC\slots\3\boinc_lockfile	SUCCESS	Attributes: A, ReparseTag: 0x0
5:45:17.7144668 PM	Lunatics_x41zc_win32_cuda50.exe	4020	SetDispositionInformationFile	C:\Documents and Settings\All Users\Application Data\BOINC\slots\3\boinc_lockfile	SUCCESS	Delete: True
5:45:17.7144813 PM	Lunatics_x41zc_win32_cuda50.exe	4020	CloseFile	C:\Documents and Settings\All Users\Application Data\BOINC\slots\3\boinc_lockfile	SUCCESS	
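For reference, the usual slot-lockfile pattern looks roughly like this on Linux (a sketch only, not the actual BOINC or science-app source; the function names are made up): the app creates the file, takes an exclusive advisory lock, and deletes it on clean exit. If the process dies between locking and cleanup, a stale boinc_lockfile stays behind in the slot.

import fcntl
import os

def acquire_slot_lock(slot_dir):
    path = os.path.join(slot_dir, "boinc_lockfile")
    fd = os.open(path, os.O_CREAT | os.O_RDWR, 0o644)
    try:
        # Non-blocking exclusive lock: fail immediately if another process holds it.
        fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:
        os.close(fd)
        raise RuntimeError("waiting to acquire lock: slot already in use")
    return fd, path

def release_slot_lock(fd, path):
    fcntl.lockf(fd, fcntl.LOCK_UN)
    os.close(fd)
    os.remove(path)   # the clean-exit delete, mirroring the SetDispositionInformationFile entry above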
ID: 1911229 · Report as offensive
Profile Brent Norman Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester

Joined: 1 Dec 99
Posts: 2786
Credit: 685,657,289
RAC: 835
Canada
Message 1911230 - Posted: 6 Jan 2018, 20:10:32 UTC

Is this happening ONLY with tasks that you have rescheduled?
ID: 1911230 · Report as offensive
TBar
Volunteer tester

Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1911232 - Posted: 6 Jan 2018, 20:15:28 UTC - in response to Message 1911227.  
Last modified: 6 Jan 2018, 20:15:49 UTC

You people act as though Juan never restarts BOINC. The last log file contains 23 restarts, and all it takes is one to preserve edits made while it was running. 23 is a lot...
ID: 1911232 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14653
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1911235 - Posted: 6 Jan 2018, 20:21:23 UTC - in response to Message 1911232.  

That would only make a difference if it followed the exact sequence:

Edit file
Stop BOINC
Save file
Start BOINC

If the file is saved at any other moment - either before BOINC stops, or after it restarts - nothing will be preserved from the editing. I agree with Jeff: this line of thought is a red herring. So I'll butt out again.
ID: 1911235 · Report as offensive
Profile Jeff Buck Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester

Joined: 11 Feb 00
Posts: 1441
Credit: 148,764,870
RAC: 0
United States
Message 1911239 - Posted: 6 Jan 2018, 20:27:38 UTC - in response to Message 1911232.  

You people act as though Juan never restarts BOINC. The last log file contains 23 restarts, and all it takes is one to preserve edits made while it was running. 23 is a lot...
Sure, and every time the client shuts down it will overwrite any client_state.xml file on disc with the contents held in memory, thus wiping out any disc edits. Now, if you can find a reference to a client_state_prev file somewhere in the log following a restart, then there might be a very slim chance of a manual edit sneaking in.
ID: 1911239 · Report as offensive
TBar
Volunteer tester

Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1911240 - Posted: 6 Jan 2018, 20:30:07 UTC - in response to Message 1911235.  

Run it dry, and nuke it from orbit. It's the only way to be sure. Something is obviously borked; best to start with a new file and empty slots.
ID: 1911240 · Report as offensive
juan BFP Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester

Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 1911252 - Posted: 6 Jan 2018, 20:46:19 UTC - in response to Message 1911230.  
Last modified: 6 Jan 2018, 20:52:16 UTC

Is this happening ONLY with tasks that you have rescheduled?

Can't say yes or no for sure. I believe no, because those are CPU tasks and I always reschedule CPU to GPU.
What I can say is that it only happens when the crunching of that set of WUs starts. And only on the CPU.
ID: 1911252 · Report as offensive
juan BFP Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester

Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 1911255 - Posted: 6 Jan 2018, 20:50:35 UTC - in response to Message 1911232.  

You people act as though Juan never restarts BOINC. The last log file contains 23 restarts, and all it takes is one to preserve edits made while it was running. 23 is a lot...

Normally I don't do that. Those stops and restarts are mainly from rescheduling (we have had a lot of trouble getting new WUs in the last few days) and some tests I'm doing these days to try to understand the problem.
But today I made just one adjustment, changing the 5 CPU WUs to 6, with no rescheduling or other tests; you can see that in my latest log file.
ID: 1911255 · Report as offensive
juan BFP Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester

Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 1911258 - Posted: 6 Jan 2018, 20:57:20 UTC
Last modified: 6 Jan 2018, 20:59:36 UTC

Before taking extreme measures like killing the client file (I know how to preserve the host ID),
I was thinking of doing this:

Stop the AVX2 builds and put the stock Linux app on to crunch, and see if anything changes.
Or whatever other build is more commonly used.

Something is not working right in the locking/unlocking of the file, or maybe my host takes too long to exit and something messes with the lock/unlock process.

Open to suggestions.
ID: 1911258 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14653
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1911260 - Posted: 6 Jan 2018, 21:02:16 UTC - in response to Message 1911258.  

You deleted all the slot folders and with them all the lockfiles, right?

Let it run like that until empty. See what effect that deletion has made.

While it runs, LOOK but DON'T TOUCH. Do you have CPU tasks? Do you have GPU tasks? Are both types running? Are any tasks postponed? Gather evidence.
ID: 1911260 · Report as offensive
juan BFP Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester

Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 1911268 - Posted: 6 Jan 2018, 21:22:06 UTC - in response to Message 1911260.  
Last modified: 6 Jan 2018, 21:23:02 UTC

You deleted all the slot folders and with them all the lockfiles, right?
Let it run like that until empty. See what effect that deletion has made.

I did that before, when I first saw the problem. It kills all active WUs (including the postponed ones).
When I do that, everything returns to normal after the postponed WUs are killed.
BTW, the GPU WUs continue to crunch normally while the postponed error is still happening on the CPU WUs.

While it runs, LOOK but DON'T TOUCH. Do you have CPU tasks? Do you have GPU tasks? Are both types running? Are any tasks postponed? Gather evidence.

I don't clearly understand what you're asking for, but let me try to explain, since I've done a lot of tries.

Both types of WU are running.
It runs for hours normally.
For some reason my host stops receiving new WUs (server crash, like yesterday).
The work continues.
Normally the GPU WU cache empties first, as expected.
Then, when the CPU WU cache is near the end, the last WU is the one that apparently starts the problem.
The host can start receiving GPU WUs again and they resume crunching as normal;
only the CPU WU crunching stops working.


Set to NNT to run the cache dry and try cleaning the host file. But that will take a few hours, so most of you will probably be sleeping when that happens.
ID: 1911268 · Report as offensive
Profile Brent Norman Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester

Joined: 1 Dec 99
Posts: 2786
Credit: 685,657,289
RAC: 835
Canada
Message 1911272 - Posted: 6 Jan 2018, 21:25:08 UTC - in response to Message 1911252.  

Is this happening ONLY with tasks that you have rescheduled?
Can't say yes or no for sure. I believe no, because those are CPU tasks and I always reschedule CPU to GPU.
What I can say is that it only happens when the crunching of that set of WUs starts. And only on the CPU.
The reason I asked is that there is no cmdline or api_version string in the rescheduling script ... so it is not specifically looking for those when modifying the client state.

You have added/removed lines of the app_info from what has been 'normal', so if cpu2gpu is looking to remove/add a fixed number of lines, it could very well be reformatting it incorrectly. You would have to move a single file and do a comparison of the output to know for sure whether that is the case.
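If you want to do that comparison, something along these lines would work (the paths are only illustrative): copy client_state.xml aside before running the rescheduling script, then diff the snapshot against the file the script leaves behind.

import difflib

BEFORE = "client_state.before.xml"   # snapshot made before running the script
AFTER = "client_state.xml"           # file as the script left it

with open(BEFORE) as f:
    before = f.readlines()
with open(AFTER) as f:
    after = f.readlines()

for line in difflib.unified_diff(before, after, fromfile=BEFORE, tofile=AFTER):
    print(line, end="")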
ID: 1911272 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13746
Credit: 208,696,464
RAC: 304
Australia
Message 1911277 - Posted: 6 Jan 2018, 21:35:17 UTC - in response to Message 1911268.  

For some reason my host stops receiving new WUs (server crash, like yesterday).

There has been an issue with the Scheduler for the last 12 months where it will randomly stop allocating work to certain systems, even though they've just reported work. And then it will start allocating work again, when it feels like it. You may or may not run out of work in the meantime.

The Scheduler response is usually "Project has no tasks available", and very occasionally it'll say there is no work available for your selected application, but there is work available for others.
Generally Tbar's triple update gets things going again.
In the BOINC Manager, click on Update. Once the Scheduler request is in progress, click on Update again. When that Scheduler request has completed, click on update again. On the next automatic update, work should start flowing again.
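The same triple update can be scripted with boinccmd, assuming boinccmd is installed and allowed to talk to the client; the pauses below are a rough stand-in for waiting on each scheduler request, and the project URL should be whatever your host is actually attached with.

import subprocess
import time

PROJECT_URL = "http://setiathome.berkeley.edu/"   # adjust if your attached URL differs

for attempt in range(3):
    # Ask the client to contact the project's scheduler now.
    subprocess.run(["boinccmd", "--project", PROJECT_URL, "update"], check=True)
    time.sleep(15)   # rough pause between requests; watch the Event Log to fine-tune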
If you're having problems downloading work once it's allocated, then it's time to edit your Hosts file again...
Grant
Darwin NT
ID: 1911277 · Report as offensive
juan BFP Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester

Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 1911278 - Posted: 6 Jan 2018, 21:35:28 UTC - in response to Message 1911272.  

You have added/removed lines of the app_info from what has been 'normal'

I use app_config to pass the commands; I don't mess with app_info.
The only change to the app_info file was to enable the AVX2 builds, and I took extreme care not to touch anything else, since I know that if I do, it's a time bomb.

This is my file; it's extremely clean:

<app_info>
  <app>
     <name>setiathome_v8</name>
  </app>
    <file_info>
      <name>setiathome_x41p_zi3v_x86_64-pc-linux-gnu_cuda90</name>
      <executable/>
    </file_info>
    <file_info>
      <name>libcudart.so.9.0</name>
    </file_info>
    <file_info>
      <name>libcufft.so.9.0</name>
    </file_info>
    <app_version>
      <app_name>setiathome_v8</app_name>
      <platform>x86_64-pc-linux-gnu</platform>
      <version_num>801</version_num>
      <plan_class>cuda90</plan_class>
      <coproc>
        <type>NVIDIA</type>
        <count>1</count>
      </coproc>
      <avg_ncpus>1</avg_ncpus>
      <max_ncpus>1</max_ncpus>
      <file_ref>
         <file_name>setiathome_x41p_zi3v_x86_64-pc-linux-gnu_cuda90</file_name>
          <main_program/>
      </file_ref>
      <file_ref>
         <file_name>libcudart.so.9.0</file_name>
      </file_ref>
      <file_ref>
         <file_name>libcufft.so.9.0</file_name>
      </file_ref>
    </app_version>
  <app>
     <name>astropulse_v7</name>
  </app>
     <file_info>
       <name>astropulse_7.08_x86_64-pc-linux-gnu__opencl_nvidia_100</name>
        <executable/>
     </file_info>
     <file_info>
       <name>AstroPulse_Kernels_r2751.cl</name>
     </file_info>
     <file_info>
       <name>ap_cmdline_7.08_x86_64-pc-linux-gnu__opencl_nvidia_100.txt</name>
     </file_info>
    <app_version>
      <app_name>astropulse_v7</app_name>
      <platform>x86_64-pc-linux-gnu</platform>
      <version_num>708</version_num>
      <plan_class>opencl_nvidia_100</plan_class>
      <coproc>
        <type>NVIDIA</type>
        <count>1</count>
      </coproc>
      <avg_ncpus>1</avg_ncpus>
      <max_ncpus>1</max_ncpus>
      <file_ref>
         <file_name>astropulse_7.08_x86_64-pc-linux-gnu__opencl_nvidia_100</file_name>
          <main_program/>
      </file_ref>
      <file_ref>
         <file_name>AstroPulse_Kernels_r2751.cl</file_name>
      </file_ref>
      <file_ref>
         <file_name>ap_cmdline_7.08_x86_64-pc-linux-gnu__opencl_nvidia_100.txt</file_name>
         <open_name>ap_cmdline.txt</open_name>
      </file_ref>
    </app_version>
   <app>
      <name>setiathome_v8</name>
   </app>
      <file_info>
         <name>MBv8_8.22r3712_avx2_x86_64-pc-linux-gnu</name>
         <executable/>
      </file_info>
     <app_version>
     <app_name>setiathome_v8</app_name>
     <platform>x86_64-pc-linux-gnu</platform>
     <version_num>800</version_num>   
     <api_version>6.1.0</api_version>
      <file_ref>
        <file_name>MBv8_8.22r3712_avx2_x86_64-pc-linux-gnu</file_name>
        <main_program/>
      </file_ref>
    </app_version>
   <app>
      <name>astropulse_v7</name>
   </app>
     <file_info>
       <name>ap_7.05r2728_sse3_linux64</name>
        <executable/>
     </file_info>
    <app_version>
       <app_name>astropulse_v7</app_name>
       <version_num>704</version_num>
       <platform>x86_64-pc-linux-gnu</platform>
       <plan_class></plan_class>
       <file_ref>
         <file_name>ap_7.05r2728_sse3_linux64</file_name>
          <main_program/>
       </file_ref>
    </app_version>
</app_info>
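One quick sanity check on a file like that (just an illustration, nothing BOINC requires) is to parse it and list every declared app_version, so a missing tag or a mismatched plan class stands out:

import xml.etree.ElementTree as ET

tree = ET.parse("app_info.xml")   # run from the project directory
for av in tree.getroot().iter("app_version"):
    name = av.findtext("app_name")
    ver = av.findtext("version_num")
    plan = av.findtext("plan_class") or "(none)"
    platform = av.findtext("platform")
    print(f"{name}  v{ver}  plan_class={plan}  platform={platform}")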

ID: 1911278 · Report as offensive
Profile Jeff Buck Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester

Joined: 11 Feb 00
Posts: 1441
Credit: 148,764,870
RAC: 0
United States
Message 1911292 - Posted: 6 Jan 2018, 22:22:35 UTC

Juan, here's a little experiment you could try. While BOINC is running, use gedit to open a boinc_lockfile from one of the slots where a CPU app is running. It's an empty file, but it still should open okay. Then shut down BOINC. Once BOINC is completely shut down, simply hit Save in gedit. That should recreate the file in that same slot folder. Restart BOINC and see if the task that was running in that slot gets postponed. You could also try doing the same thing with one of the GPU tasks.

I tried that test with CPU tasks on both my daily driver (Win 7) and one of my Linux boxes. Neither of those apps cared that there was already a lockfile present in the slot. They both restarted fine. So, if your CPU app has a problem with a pre-existing lockfile, then there might be an app-specific issue. On the other hand, if yours restart smoothly even with the lockfile present, then it would seem as if there's some other factor involved besides just the lockfile. At least that would be a bit more info worth knowing, I think.
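If it helps, the same experiment can be scripted. The slot path below is just an example based on the default Linux data directory; pick the slot you actually saw the CPU task running in. Run it only after BOINC has fully shut down, then restart BOINC and watch the task's status.

import os

SLOT = "/var/lib/boinc-client/slots/3"            # example path; use your own slot
lockfile = os.path.join(SLOT, "boinc_lockfile")

with open(lockfile, "w"):
    pass                                          # creates an empty file, same effect as saving in gedit

print("Created", lockfile, "- restart BOINC and check whether the task gets postponed.")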
ID: 1911292 · Report as offensive