Postponed: Waiting to acquire lock
Author | Message |
---|---|
Jeff Buck Send message Joined: 11 Feb 00 Posts: 1441 Credit: 148,764,870 RAC: 0 |
Here's another bit of hard evidence to add to the pile. Here on my daily driver, running Win7, I just experimented with two BOINC restarts, running 1 GPU and 5 CPU tasks. The first run had boinc_lockfiles in the GPU and one of the CPU slots. The second run had all slots clear of lockfiles. Before each restart I turned on Process Monitor, just long enough to capture all events until all tasks were in a "Running" state. I then tried to determine what might be going on with the lockfiles and/or with the slots themselves. Here are the events I extracted from the first trial. The first thing to note is that, even before the science apps start trying to create lockfiles, the BOINC client polls all the existing slots to see what files are already present. You can see that, in this case, slots 2 and 5 already had lockfiles present. Secondly, when the science apps then try to allocate lockfiles in each slot, all CreateFile attempts end with SUCCESS, but....if you look to the end of each of those lines, you'll find "OpenResult: Created" for the 4 slots that were free of lockfiles, while the two with pre-existing lockfiles both got "OpenResult: Opened". The apps didn't care if the lockfiles needed to be created first, only that they could open them as non-shared objects. A third thing to note, though I have no idea what it means, is that a subsequent polling of the lockfiles by Explorer only looked at the 4 slots that had newly created lockfiles. 
2:21:21.0503030 PM boinc.exe 7564 QueryDirectory C:\ProgramData\BOINC\slots\4 SUCCESS 0: .., 1: boinc_task_state.xml, 2: init_data.xml, 3: libfftw3f-3-3-4_x86.dll, 4: MB8_win_x86_SSE3_VS2008_r3330.exe, 5: mb_cmdline.txt, 6: result.sah, 7: state.sah, 8: stderr.txt, 9: work_unit.sah 2:21:21.0504632 PM boinc.exe 7564 QueryDirectory C:\ProgramData\BOINC\slots\1 SUCCESS 0: .., 1: boinc_task_state.xml, 2: init_data.xml, 3: libfftw3f-3-3-4_x86.dll, 4: MB8_win_x86_SSE3_VS2008_r3330.exe, 5: mb_cmdline.txt, 6: result.sah, 7: state.sah, 8: stderr.txt, 9: work_unit.sah 2:21:21.0506212 PM boinc.exe 7564 QueryDirectory C:\ProgramData\BOINC\slots\0 SUCCESS 0: .., 1: boinc_task_state.xml, 2: init_data.xml, 3: libfftw3f-3-3-4_x86.dll, 4: MB8_win_x86_SSE3_VS2008_r3330.exe, 5: mb_cmdline.txt, 6: result.sah, 7: state.sah, 8: stderr.txt, 9: work_unit.sah 2:21:21.0507793 PM boinc.exe 7564 QueryDirectory C:\ProgramData\BOINC\slots\2 SUCCESS 0: .., 1: boinc_lockfile, 2: boinc_task_state.xml, 3: init_data.xml, 4: libfftw3f-3-3-4_x86.dll, 5: MB8_win_x86_SSE3_VS2008_r3330.exe, 6: mb_cmdline.txt, 7: result.sah, 8: state.sah, 9: stderr.txt, 10: work_unit.sah 2:21:21.0509374 PM boinc.exe 7564 QueryDirectory C:\ProgramData\BOINC\slots\3 SUCCESS 0: .., 1: boinc_task_state.xml, 2: init_data.xml, 3: libfftw3f-3-3-4_x86.dll, 4: MB8_win_x86_SSE3_VS2008_r3330.exe, 5: mb_cmdline.txt, 6: result.sah, 7: state.sah, 8: stderr.txt, 9: work_unit.sah 2:21:21.0510946 PM boinc.exe 7564 QueryDirectory C:\ProgramData\BOINC\slots\5 SUCCESS 0: .., 1: boinc_lockfile, 2: boinc_task_state.xml, 3: cudart32_50_35.dll, 4: cufft32_50_35.dll, 5: init_data.xml, 6: Lunatics_x41zi_win32_cuda50.exe, 7: mbcuda.cfg, 8: result.sah, 9: state.sah, 10: stderr.txt, 11: work_unit.sah 2:21:21.4824767 PM MB8_win_x86_SSE3_VS2008_r3330.exe 3052 CreateFile C:\ProgramData\BOINC\slots\4\boinc_lockfile SUCCESS Desired Access: Generic Write, Read Attributes, Disposition: OpenIf, Options: Synchronous IO Non-Alert, Non-Directory File, 
Attributes: N, ShareMode: None, AllocationSize: 0, OpenResult: Created 2:21:21.4836711 PM MB8_win_x86_SSE3_VS2008_r3330.exe 2108 CreateFile C:\ProgramData\BOINC\slots\0\boinc_lockfile SUCCESS Desired Access: Generic Write, Read Attributes, Disposition: OpenIf, Options: Synchronous IO Non-Alert, Non-Directory File, Attributes: N, ShareMode: None, AllocationSize: 0, OpenResult: Created 2:21:21.4908853 PM MB8_win_x86_SSE3_VS2008_r3330.exe 2788 CreateFile C:\ProgramData\BOINC\slots\1\boinc_lockfile SUCCESS Desired Access: Generic Write, Read Attributes, Disposition: OpenIf, Options: Synchronous IO Non-Alert, Non-Directory File, Attributes: N, ShareMode: None, AllocationSize: 0, OpenResult: Created 2:21:21.5190641 PM MB8_win_x86_SSE3_VS2008_r3330.exe 6396 CreateFile C:\ProgramData\BOINC\slots\3\boinc_lockfile SUCCESS Desired Access: Generic Write, Read Attributes, Disposition: OpenIf, Options: Synchronous IO Non-Alert, Non-Directory File, Attributes: N, ShareMode: None, AllocationSize: 0, OpenResult: Created 2:21:21.5321052 PM MB8_win_x86_SSE3_VS2008_r3330.exe 5884 CreateFile C:\ProgramData\BOINC\slots\2\boinc_lockfile SUCCESS Desired Access: Generic Write, Read Attributes, Disposition: OpenIf, Options: Synchronous IO Non-Alert, Non-Directory File, Attributes: N, ShareMode: None, AllocationSize: 0, OpenResult: Opened 2:21:21.6638521 PM Lunatics_x41zi_win32_cuda50.exe 7312 CreateFile C:\ProgramData\BOINC\slots\5\boinc_lockfile SUCCESS Desired Access: Generic Write, Read Attributes, Disposition: OpenIf, Options: Synchronous IO Non-Alert, Non-Directory File, Attributes: N, ShareMode: None, AllocationSize: 0, OpenResult: Opened 2:21:21.7998308 PM explorer.exe 2572 QueryDirectory C:\ProgramData\BOINC\slots\0\boinc_lockfile SUCCESS Filter: boinc_lockfile, 1: boinc_lockfile 2:21:21.8063409 PM explorer.exe 2572 QueryDirectory C:\ProgramData\BOINC\slots\1\boinc_lockfile SUCCESS Filter: boinc_lockfile, 1: boinc_lockfile 2:21:21.8075796 PM explorer.exe 2572 QueryDirectory 
C:\ProgramData\BOINC\slots\3\boinc_lockfile SUCCESS Filter: boinc_lockfile, 1: boinc_lockfile 2:21:21.8089144 PM explorer.exe 2572 QueryDirectory C:\ProgramData\BOINC\slots\4\boinc_lockfile SUCCESS Filter: boinc_lockfile, 1: boinc_lockfile 2:21:21.8339625 PM explorer.exe 2572 QueryDirectory C:\ProgramData\BOINC\slots\5 SUCCESS 0: ., 1: .., 2: boinc_lockfile, 3: boinc_task_state.xml, 4: cudart32_50_35.dll, 5: cufft32_50_35.dll, 6: init_data.xml, 7: Lunatics_x41zi_win32_cuda50.exe, 8: mbcuda.cfg, 9: result.sah, 10: state.sah, 11: stderr.txt, 12: work_unit.sah 2:21:22.1674186 PM explorer.exe 2572 QueryDirectory C:\ProgramData\BOINC\slots\0 SUCCESS 0: ., 1: .., 2: boinc_lockfile, 3: boinc_task_state.xml, 4: init_data.xml, 5: libfftw3f-3-3-4_x86.dll, 6: MB8_win_x86_SSE3_VS2008_r3330.exe, 7: mb_cmdline.txt, 8: result.sah, 9: state.sah, 10: stderr.txt, 11: work_unit.sah 2:21:22.1677302 PM explorer.exe 2572 QueryDirectory C:\ProgramData\BOINC\slots\1 SUCCESS 0: ., 1: .., 2: boinc_lockfile, 3: boinc_task_state.xml, 4: init_data.xml, 5: libfftw3f-3-3-4_x86.dll, 6: MB8_win_x86_SSE3_VS2008_r3330.exe, 7: mb_cmdline.txt, 8: result.sah, 9: state.sah, 10: stderr.txt, 11: work_unit.sah 2:21:22.1680209 PM explorer.exe 2572 QueryDirectory C:\ProgramData\BOINC\slots\2 SUCCESS 0: ., 1: .., 2: boinc_lockfile, 3: boinc_task_state.xml, 4: init_data.xml, 5: libfftw3f-3-3-4_x86.dll, 6: MB8_win_x86_SSE3_VS2008_r3330.exe, 7: mb_cmdline.txt, 8: result.sah, 9: state.sah, 10: stderr.txt, 11: work_unit.sah 2:21:22.1683021 PM explorer.exe 2572 QueryDirectory C:\ProgramData\BOINC\slots\3 SUCCESS 0: ., 1: .., 2: boinc_lockfile, 3: boinc_task_state.xml, 4: init_data.xml, 5: libfftw3f-3-3-4_x86.dll, 6: MB8_win_x86_SSE3_VS2008_r3330.exe, 7: mb_cmdline.txt, 8: result.sah, 9: state.sah, 10: stderr.txt, 11: work_unit.sah 2:21:22.1685834 PM explorer.exe 2572 QueryDirectory C:\ProgramData\BOINC\slots\4 SUCCESS 0: ., 1: .., 2: boinc_lockfile, 3: boinc_task_state.xml, 4: init_data.xml, 5: 
libfftw3f-3-3-4_x86.dll, 6: MB8_win_x86_SSE3_VS2008_r3330.exe, 7: mb_cmdline.txt, 8: result.sah, 9: state.sah, 10: stderr.txt, 11: work_unit.sah The second run, with no lockfiles in any of the slots, doesn't appear to hold any surprises. All 6 slots had new lockfiles created and got "OpenResult: Created". 2:42:22.7528413 PM boinc.exe 3236 QueryDirectory C:\ProgramData\BOINC\slots\0 SUCCESS 0: .., 1: boinc_task_state.xml, 2: init_data.xml, 3: libfftw3f-3-3-4_x86.dll, 4: MB8_win_x86_SSE3_VS2008_r3330.exe, 5: mb_cmdline.txt, 6: result.sah, 7: state.sah, 8: stderr.txt, 9: work_unit.sah 2:42:22.7530051 PM boinc.exe 3236 QueryDirectory C:\ProgramData\BOINC\slots\2 SUCCESS 0: .., 1: boinc_task_state.xml, 2: init_data.xml, 3: libfftw3f-3-3-4_x86.dll, 4: MB8_win_x86_SSE3_VS2008_r3330.exe, 5: mb_cmdline.txt, 6: result.sah, 7: state.sah, 8: stderr.txt, 9: work_unit.sah 2:42:22.7531944 PM boinc.exe 3236 QueryDirectory C:\ProgramData\BOINC\slots\3 SUCCESS 0: .., 1: boinc_task_state.xml, 2: init_data.xml, 3: libfftw3f-3-3-4_x86.dll, 4: MB8_win_x86_SSE3_VS2008_r3330.exe, 5: mb_cmdline.txt, 6: result.sah, 7: state.sah, 8: stderr.txt, 9: work_unit.sah 2:42:22.7533824 PM boinc.exe 3236 QueryDirectory C:\ProgramData\BOINC\slots\5 SUCCESS 0: .., 1: boinc_task_state.xml, 2: cudart32_50_35.dll, 3: cufft32_50_35.dll, 4: init_data.xml, 5: Lunatics_x41zi_win32_cuda50.exe, 6: mbcuda.cfg, 7: result.sah, 8: state.sah, 9: stderr.txt, 10: work_unit.sah 2:42:22.7535422 PM boinc.exe 3236 QueryDirectory C:\ProgramData\BOINC\slots\4 SUCCESS 0: .., 1: boinc_task_state.xml, 2: init_data.xml, 3: libfftw3f-3-3-4_x86.dll, 4: MB8_win_x86_SSE3_VS2008_r3330.exe, 5: mb_cmdline.txt, 6: result.sah, 7: state.sah, 8: stderr.txt, 9: work_unit.sah 2:42:22.7537047 PM boinc.exe 3236 QueryDirectory C:\ProgramData\BOINC\slots\1 SUCCESS 0: .., 1: boinc_task_state.xml, 2: init_data.xml, 3: libfftw3f-3-3-4_x86.dll, 4: MB8_win_x86_SSE3_VS2008_r3330.exe, 5: mb_cmdline.txt, 6: result.sah, 7: state.sah, 8: stderr.txt, 9: 
work_unit.sah 2:42:23.1579723 PM MB8_win_x86_SSE3_VS2008_r3330.exe 9068 CreateFile C:\ProgramData\BOINC\slots\0\boinc_lockfile SUCCESS Desired Access: Generic Write, Read Attributes, Disposition: OpenIf, Options: Synchronous IO Non-Alert, Non-Directory File, Attributes: N, ShareMode: None, AllocationSize: 0, OpenResult: Created 2:42:23.1801118 PM MB8_win_x86_SSE3_VS2008_r3330.exe 8440 CreateFile C:\ProgramData\BOINC\slots\2\boinc_lockfile SUCCESS Desired Access: Generic Write, Read Attributes, Disposition: OpenIf, Options: Synchronous IO Non-Alert, Non-Directory File, Attributes: N, ShareMode: None, AllocationSize: 0, OpenResult: Created 2:42:23.2202281 PM MB8_win_x86_SSE3_VS2008_r3330.exe 9280 CreateFile C:\ProgramData\BOINC\slots\3\boinc_lockfile SUCCESS Desired Access: Generic Write, Read Attributes, Disposition: OpenIf, Options: Synchronous IO Non-Alert, Non-Directory File, Attributes: N, ShareMode: None, AllocationSize: 0, OpenResult: Created 2:42:23.2249198 PM MB8_win_x86_SSE3_VS2008_r3330.exe 5504 CreateFile C:\ProgramData\BOINC\slots\4\boinc_lockfile SUCCESS Desired Access: Generic Write, Read Attributes, Disposition: OpenIf, Options: Synchronous IO Non-Alert, Non-Directory File, Attributes: N, ShareMode: None, AllocationSize: 0, OpenResult: Created 2:42:23.2515987 PM MB8_win_x86_SSE3_VS2008_r3330.exe 9232 CreateFile C:\ProgramData\BOINC\slots\1\boinc_lockfile SUCCESS Desired Access: Generic Write, Read Attributes, Disposition: OpenIf, Options: Synchronous IO Non-Alert, Non-Directory File, Attributes: N, ShareMode: None, AllocationSize: 0, OpenResult: Created 2:42:23.3393588 PM Lunatics_x41zi_win32_cuda50.exe 10144 CreateFile C:\ProgramData\BOINC\slots\5\boinc_lockfile SUCCESS Desired Access: Generic Write, Read Attributes, Disposition: OpenIf, Options: Synchronous IO Non-Alert, Non-Directory File, Attributes: N, ShareMode: None, AllocationSize: 0, OpenResult: Created 2:42:23.5946561 PM explorer.exe 2572 QueryDirectory 
C:\ProgramData\BOINC\slots\0\boinc_lockfile SUCCESS Filter: boinc_lockfile, 1: boinc_lockfile 2:42:23.5969254 PM explorer.exe 2572 QueryDirectory C:\ProgramData\BOINC\slots\1\boinc_lockfile SUCCESS Filter: boinc_lockfile, 1: boinc_lockfile 2:42:23.5981805 PM explorer.exe 2572 QueryDirectory C:\ProgramData\BOINC\slots\2\boinc_lockfile SUCCESS Filter: boinc_lockfile, 1: boinc_lockfile 2:42:23.6025992 PM explorer.exe 2572 QueryDirectory C:\ProgramData\BOINC\slots\3\boinc_lockfile SUCCESS Filter: boinc_lockfile, 1: boinc_lockfile 2:42:23.6031871 PM explorer.exe 2572 QueryDirectory C:\ProgramData\BOINC\slots\4\boinc_lockfile SUCCESS Filter: boinc_lockfile, 1: boinc_lockfile 2:42:23.6054408 PM explorer.exe 2572 QueryDirectory C:\ProgramData\BOINC\slots\5\boinc_lockfile SUCCESS Filter: boinc_lockfile, 1: boinc_lockfile 2:42:23.6062870 PM explorer.exe 2572 QueryDirectory C:\ProgramData\BOINC\slots\0\boinc_lockfile SUCCESS Filter: boinc_lockfile, 1: boinc_lockfile 2:42:23.7187158 PM explorer.exe 2572 QueryDirectory C:\ProgramData\BOINC\slots\0 SUCCESS 0: ., 1: .., 2: boinc_lockfile, 3: boinc_task_state.xml, 4: init_data.xml, 5: libfftw3f-3-3-4_x86.dll, 6: MB8_win_x86_SSE3_VS2008_r3330.exe, 7: mb_cmdline.txt, 8: result.sah, 9: state.sah, 10: stderr.txt, 11: work_unit.sah 2:42:23.7863756 PM explorer.exe 2572 QueryDirectory C:\ProgramData\BOINC\slots\0 SUCCESS 0: ., 1: .., 2: boinc_lockfile, 3: boinc_task_state.xml, 4: init_data.xml, 5: libfftw3f-3-3-4_x86.dll, 6: MB8_win_x86_SSE3_VS2008_r3330.exe, 7: mb_cmdline.txt, 8: result.sah, 9: state.sah, 10: stderr.txt, 11: work_unit.sah 2:42:23.7864861 PM explorer.exe 2572 QueryDirectory C:\ProgramData\BOINC\slots\1 SUCCESS 0: ., 1: .., 2: boinc_lockfile, 3: boinc_task_state.xml, 4: init_data.xml, 5: libfftw3f-3-3-4_x86.dll, 6: MB8_win_x86_SSE3_VS2008_r3330.exe, 7: mb_cmdline.txt, 8: result.sah, 9: state.sah, 10: stderr.txt, 11: work_unit.sah 2:42:23.7866737 PM explorer.exe 2572 QueryDirectory C:\ProgramData\BOINC\slots\2 SUCCESS 0: 
., 1: .., 2: boinc_lockfile, 3: boinc_task_state.xml, 4: init_data.xml, 5: libfftw3f-3-3-4_x86.dll, 6: MB8_win_x86_SSE3_VS2008_r3330.exe, 7: mb_cmdline.txt, 8: result.sah, 9: state.sah, 10: stderr.txt, 11: work_unit.sah 2:42:23.7867948 PM explorer.exe 2572 QueryDirectory C:\ProgramData\BOINC\slots\3 SUCCESS 0: ., 1: .., 2: boinc_lockfile, 3: boinc_task_state.xml, 4: init_data.xml, 5: libfftw3f-3-3-4_x86.dll, 6: MB8_win_x86_SSE3_VS2008_r3330.exe, 7: mb_cmdline.txt, 8: result.sah, 9: state.sah, 10: stderr.txt, 11: work_unit.sah 2:42:23.7869701 PM explorer.exe 2572 QueryDirectory C:\ProgramData\BOINC\slots\4 SUCCESS 0: ., 1: .., 2: boinc_lockfile, 3: boinc_task_state.xml, 4: init_data.xml, 5: libfftw3f-3-3-4_x86.dll, 6: MB8_win_x86_SSE3_VS2008_r3330.exe, 7: mb_cmdline.txt, 8: result.sah, 9: state.sah, 10: stderr.txt, 11: work_unit.sah 2:42:23.7870933 PM explorer.exe 2572 QueryDirectory C:\ProgramData\BOINC\slots\5 SUCCESS 0: ., 1: .., 2: boinc_lockfile, 3: boinc_task_state.xml, 4: cudart32_50_35.dll, 5: cufft32_50_35.dll, 6: init_data.xml, 7: Lunatics_x41zi_win32_cuda50.exe, 8: mbcuda.cfg, 9: result.sah, 10: state.sah, 11: stderr.txt, 12: work_unit.sah 2:42:26.2022608 PM explorer.exe 2572 QueryDirectory C:\ProgramData\BOINC\slots\0 SUCCESS 0: ., 1: .., 2: boinc_lockfile, 3: boinc_task_state.xml, 4: init_data.xml, 5: libfftw3f-3-3-4_x86.dll, 6: MB8_win_x86_SSE3_VS2008_r3330.exe, 7: mb_cmdline.txt, 8: result.sah, 9: state.sah, 10: stderr.txt, 11: work_unit.sah 2:42:26.2024415 PM explorer.exe 2572 QueryDirectory C:\ProgramData\BOINC\slots\1 SUCCESS 0: ., 1: .., 2: boinc_lockfile, 3: boinc_task_state.xml, 4: init_data.xml, 5: libfftw3f-3-3-4_x86.dll, 6: MB8_win_x86_SSE3_VS2008_r3330.exe, 7: mb_cmdline.txt, 8: result.sah, 9: state.sah, 10: stderr.txt, 11: work_unit.sah Of course, as I reported before, none of my restarted tasks in Windows resulted in "Task postponed" messages. 
What this test shows me is that it's not just the presence or absence of a lockfile that these apps care about so much as it is the ability to take ownership of that lockfile as a non-shared resource. I suspect that the apps running into a problem may be taking a slightly different approach. Unfortunately, I don't know that there's a Linux equivalent to Process Monitor to get such a detailed view of exactly what's going on at the application level. EDIT: BTW, on this machine I'm running BOINC 7.6.33. |
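A rough illustration of the point in the post above: what matters is not whether the lockfile exists, but whether the process can take exclusive hold of it. The Windows trace shows this as `CreateFile` with `ShareMode: None` and `Disposition: OpenIf`. The sketch below is a Linux-side analogue using `flock`; it is not taken from BOINC or the science apps, which may well use a different locking mechanism, and the helper names `try_lock` and `release` are made up for the example.

```python
import fcntl
import os


def try_lock(path):
    """Try to take an exclusive, non-blocking lock on `path`.

    Returns an open file descriptor on success, or None if another holder
    already has the lock. The file is created if missing and simply reused
    if it is already there (comparable to "OpenIf" in the trace).
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o644)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        # Someone else holds the lock; give up without waiting.
        os.close(fd)
        return None
    return fd


def release(fd):
    """Drop the lock by closing the descriptor; the file stays on disk."""
    os.close(fd)
```

With this approach a stale lockfile left behind by an earlier run does no harm: the next `try_lock` opens the existing file and locks it, and only a lock that is still held by a live process makes the call fail.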
RueiKe Send message Joined: 14 Feb 16 Posts: 492 Credit: 378,512,430 RAC: 785 |
Then stop rescheduling, so that that can be taken out of the equation to see if BOINC and the apps operate normally without changing the client_state file on restarts. I don't do any rescheduling. GitHub: Ricks-Lab Instagram: ricks_labs |
juan BFP Send message Joined: 16 Mar 07 Posts: 9786 Credit: 572,710,851 RAC: 3,799 |
I don't do any rescheduling. Thanks for the answer. Did you follow the thread? Apparently only the two of us have the issue. I believe we found a way to bypass it: just delete the lock file in the slot of the "postponed" WU. Try it the next time you get the issue and let us know if that works for you too. So that answers the question... the rescheduler is not the source of the problem. And I believe that rules out the client file as a source of the problem too. My guess is that the rescheduler just made it worse because it stops and starts BOINC more often, so the "timing error" has more chances to happen. That could explain why I see the issue more often. Now it's over to you guys. |
RueiKe Send message Joined: 14 Feb 16 Posts: 492 Credit: 378,512,430 RAC: 785 |
Not sure if this observation is relevant, but on my Linux system I have always had an issue where if I started boincmgr too soon after exiting it, it would not connect to the project and I would have to terminate and try again. To avoid this issue I would always monitor MB processes in system monitor and wait for all to finish before starting boincmgr again. It usually takes a long time (~1 min) of some processes being idle before they stop running. I did this just now and found 8 processes still listed after 3min: Now it has been over 10min and 3 of those 8 are still listed in the system monitor. I just restarted boincmgr and after more than 30min, those 3 processes still show up as active: 18836 1696 0 Jan06 pts/18 00:00:17 ../../projects/setiathome.berkeley.edu/MBv8_8.05r3345_avx_linux64 18837 1696 0 Jan06 pts/18 00:00:17 ../../projects/setiathome.berkeley.edu/MBv8_8.05r3345_avx_linux64 18838 1696 0 Jan06 pts/18 00:00:17 ../../projects/setiathome.berkeley.edu/MBv8_8.05r3345_avx_linux64 59633 59606 96 17:45 pts/18 00:21:42 ../../projects/setiathome.berkeley.edu/MBv8_8.05r3345_avx_linux64 59635 59606 98 17:45 pts/18 00:22:07 ../../projects/setiathome.berkeley.edu/MBv8_8.05r3345_avx_linux64 59637 59606 95 17:45 pts/18 00:21:36 ../../projects/setiathome.berkeley.edu/MBv8_8.05r3345_avx_linux64 59638 59606 96 17:45 pts/18 00:21:47 ../../projects/setiathome.berkeley.edu/MBv8_8.05r3345_avx_linux64 59640 59606 97 17:45 pts/18 00:22:00 ../../projects/setiathome.berkeley.edu/MBv8_8.05r3345_avx_linux64 59642 59606 96 17:45 pts/18 00:21:50 ../../projects/setiathome.berkeley.edu/MBv8_8.05r3345_avx_linux64 59645 59606 96 17:45 pts/18 00:21:48 ../../projects/setiathome.berkeley.edu/MBv8_8.05r3345_avx_linux64 59646 59606 96 17:45 pts/18 00:21:50 ../../projects/setiathome.berkeley.edu/MBv8_8.05r3345_avx_linux64 59648 59606 97 17:45 pts/18 00:21:54 ../../projects/setiathome.berkeley.edu/MBv8_8.05r3345_avx_linux64 59650 59606 96 17:45 pts/18 00:21:48 
../../projects/setiathome.berkeley.edu/MBv8_8.05r3345_avx_linux64 59652 59606 96 17:45 pts/18 00:21:43 ../../projects/setiathome.berkeley.edu/MBv8_8.05r3345_avx_linux64 59654 59606 96 17:45 pts/18 00:21:40 ../../projects/setiathome.berkeley.edu/MBv8_8.05r3345_avx_linux64 59656 59606 95 17:45 pts/18 00:21:33 ../../projects/setiathome.berkeley.edu/MBv8_8.05r3345_avx_linux64 59659 59606 96 17:45 pts/18 00:21:40 ../../projects/setiathome.berkeley.edu/MBv8_8.05r3345_avx_linux64 59661 59606 97 17:45 pts/18 00:21:58 ../../projects/setiathome.berkeley.edu/MBv8_8.05r3345_avx_linux64 59663 59606 96 17:45 pts/18 00:21:38 ../../projects/setiathome.berkeley.edu/MBv8_8.05r3345_avx_linux64 59665 59606 97 17:45 pts/18 00:22:02 ../../projects/setiathome.berkeley.edu/MBv8_8.05r3345_avx_linux64 59667 59606 96 17:45 pts/18 00:21:44 ../../projects/setiathome.berkeley.edu/MBv8_8.05r3345_avx_linux64 59669 59606 96 17:45 pts/18 00:21:41 ../../projects/setiathome.berkeley.edu/MBv8_8.05r3345_avx_linux64 59670 59606 97 17:45 pts/18 00:21:58 ../../projects/setiathome.berkeley.edu/MBv8_8.05r3345_avx_linux64 59671 59606 95 17:45 pts/18 00:21:25 ../../projects/setiathome.berkeley.edu/MBv8_8.05r3345_avx_linux64 59673 59606 95 17:45 pts/18 00:21:37 ../../projects/setiathome.berkeley.edu/MBv8_8.05r3345_avx_linux64 59681 59606 97 17:45 pts/18 00:22:01 ../../projects/setiathome.berkeley.edu/MBv8_8.05r3345_avx_linux64 59685 59606 97 17:45 pts/18 00:22:00 ../../projects/setiathome.berkeley.edu/MBv8_8.05r3345_avx_linux64 59687 59606 96 17:45 pts/18 00:21:45 ../../projects/setiathome.berkeley.edu/MBv8_8.05r3345_avx_linux64 59825 59606 97 17:54 pts/18 00:13:40 ../../projects/setiathome.berkeley.edu/MBv8_8.05r3345_avx_linux64 59954 59606 94 18:03 pts/18 00:04:39 ../../projects/setiathome.berkeley.edu/MBv8_8.05r3345_avx_linux64 59973 59606 95 18:04 pts/18 00:03:29 ../../projects/setiathome.berkeley.edu/MBv8_8.05r3345_avx_linux64 59977 59606 97 18:04 pts/18 00:03:27 
../../projects/setiathome.berkeley.edu/MBv8_8.05r3345_avx_linux64 GitHub: Ricks-Lab Instagram: ricks_labs |
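One thing that may help when reading a listing like the one above: the three long-lived processes share the second-column value 1696, while every other task shows 59606. Assuming the first two columns are PID and parent PID (as in `ps -f` output with the user column dropped), grouping by parent makes the leftovers stand out. This is only a sketch; `group_by_parent` is a made-up helper, and the sample lines are copied from the listing.

```python
from collections import defaultdict


def group_by_parent(ps_lines):
    """Map parent PID -> list of child PIDs.

    Assumes each line starts with PID and PPID columns; anything else
    (headers, blank lines) is skipped.
    """
    groups = defaultdict(list)
    for line in ps_lines:
        fields = line.split()
        if len(fields) < 2 or not (fields[0].isdigit() and fields[1].isdigit()):
            continue
        pid, ppid = int(fields[0]), int(fields[1])
        groups[ppid].append(pid)
    return dict(groups)


app = "../../projects/setiathome.berkeley.edu/MBv8_8.05r3345_avx_linux64"
sample = [
    "18836 1696 0 Jan06 pts/18 00:00:17 " + app,
    "18837 1696 0 Jan06 pts/18 00:00:17 " + app,
    "18838 1696 0 Jan06 pts/18 00:00:17 " + app,
    "59633 59606 96 17:45 pts/18 00:21:42 " + app,
    "59635 59606 98 17:45 pts/18 00:22:07 " + app,
]
print(group_by_parent(sample))
# {1696: [18836, 18837, 18838], 59606: [59633, 59635]}
```

If that reading of the columns is right, the three old tasks belong to a different parent than the freshly started ones, so it would be worth checking what process 1696 is.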
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14649 Credit: 200,643,578 RAC: 874 |
Thanks - that's very clear about which app to focus on, too. Just to be absolutely clear, you are aware that BOINC Manager (boincmgr) doesn't need to be running for the BOINC Client (boinc) to do its work? There is an option "Stop running tasks when exiting the BOINC Manager": if that option is unchecked, it will behave - deliberately - as you are describing. The option is contained in the Exit Confirmation dialog: if that doesn't appear, enable it from the Options --> Other options... menu in BOINC Manager. |
RueiKe Send message Joined: 14 Feb 16 Posts: 492 Credit: 378,512,430 RAC: 785 |
Thanks - that's very clear about which app to focus on, too. Yes, I am aware of that option. I checked it and told it to remember my choice, so it should stop all tasks each time I quit. Plus, all but 3 MB processes did exit. GitHub: Ricks-Lab Instagram: ricks_labs |
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14649 Credit: 200,643,578 RAC: 874 |
Thanks again. Just wanted to be certain. So, MBv8_8.05r3345_avx_linux64 needs to go under the microscope. |
RueiKe Send message Joined: 14 Feb 16 Posts: 492 Credit: 378,512,430 RAC: 785 |
Thanks again. Just wanted to be certain. So, MBv8_8.05r3345_avx_linux64 needs to go under the microscope. One more item to point out. Even though those 3 tasks were still active, I did not observe the "Waiting to acquire lock" error. Actually, I have only observed that error the one time I posted here. I was only raising these observations as being potentially relevant. GitHub: Ricks-Lab Instagram: ricks_labs |
juan BFP Send message Joined: 16 Mar 07 Posts: 9786 Credit: 572,710,851 RAC: 3,799 |
Not sure if this observation is relevant, but on my Linux system I have always had an issue where if I started boincmgr too soon after exiting it, it would not connect to the project and I would have to terminate and try again. To avoid this issue I would always monitor MB processes in system monitor and wait for all to finish before starting boincmgr again. The same behaviour happens on my Linux box. Sometimes after I stop BOINC (yes, my "Stop running tasks ..." option is checked), when I try to restart, it restarts with a completely empty screen (like when we start with no projects attached; no projects or WUs are displayed). To fix that I need to exit BOINC, wait a few seconds and restart. Most of the time the second try restarts it normally; when it doesn't, I repeat the cycle. The next time it happens I will look at the process monitor and check if something was left behind, as posted. I only start BOINC through the BOINC Manager. |
Keith Myers Send message Joined: 29 Apr 01 Posts: 13161 Credit: 1,160,866,277 RAC: 1,873 |
If you are using System Monitor to see those zombie tasks still running, then you can also check at the top of the System Monitor list for the two processes, boinc and boincmgr. Unless you check the option to exit the client as well when exiting or shutting down the Manager, the Client can still be running and supporting all your processes. It was explained earlier in the thread, and code was posted, that after the Manager and the Client have been shut down, if there are any zombie processes (in other words, tasks left running), there is a 60 second timer plus 5 seconds before the zombie tasks are "bopped on the head" with a "kill" command. I have seen this in action many times. The blank Manager after a fast restart is caused by these zombie tasks. The question now is ..... why were 3 zombie tasks running 3 minutes after the supposed client and manager shutdown? I would doubly make sure to check whether the client is still running when tasks don't ever disappear from the System Monitor. If the client is still running, then I wouldn't expect it to issue the kill command. Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
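The "ask nicely, wait, then kill" pattern described in the post above can be sketched roughly as follows. This is not the BOINC code that was posted earlier in the thread; `stop_with_grace` is a hypothetical helper, `terminate()` merely stands in for whatever quit request the client really sends to its tasks, and the 65 second default just mirrors the 60 + 5 seconds mentioned in the post.

```python
import subprocess
import time


def stop_with_grace(procs, grace_seconds=65.0, poll_interval=0.1):
    """Ask processes to quit, wait up to grace_seconds, then kill stragglers.

    `procs` is a list of subprocess.Popen objects.
    Returns the list of processes that had to be killed.
    """
    for p in procs:
        if p.poll() is None:
            p.terminate()  # polite request (SIGTERM on POSIX)

    # Wait until everything has exited or the grace period runs out.
    deadline = time.monotonic() + grace_seconds
    while time.monotonic() < deadline and any(p.poll() is None for p in procs):
        time.sleep(poll_interval)

    killed = []
    for p in procs:
        if p.poll() is None:
            p.kill()  # forced stop (SIGKILL on POSIX)
            p.wait()
            killed.append(p)
    return killed
```

Note that in this sketch the kill pass runs exactly once, after the deadline. If whatever is doing the waiting goes away before that point, nothing is left to clean up the stragglers, which may be one way tasks could outlive a shutdown.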
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14649 Credit: 200,643,578 RAC: 874 |
The blank Manager after a fast restart is caused by these zombie tasks. I'm not sure about that - or maybe I'm just thinking of a slightly different way of describing it. When you close down the Manager (with the 'stop tasks' box ticked), the Manager will tell the client to initiate closedown, and the client will tell the tasks to close down. When the tasks have all finished, the client will close, and all will be clean. When you re-start the Manager, it will try to start a new Client. But only one client is allowed to run at the same time (without setting special switches). So, if the tasks have gone zombie, the old client will still be running, and the new client won't start - it'll exit again immediately. That's why you don't see client data in the new Manager. |
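Following on from the explanation above, one way to check on Linux whether an old client is still around before restarting the Manager is to scan `/proc` for a process named `boinc`. This is a minimal sketch under that assumption; `pids_named` is a made-up helper, and the `proc_root` parameter exists only so the function can be pointed at a different directory for testing.

```python
import os


def pids_named(name, proc_root="/proc"):
    """Return sorted PIDs whose command name equals `name` (Linux /proc layout)."""
    found = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a process directory
        try:
            with open(os.path.join(proc_root, entry, "comm")) as f:
                if f.read().strip() == name:
                    found.append(int(entry))
        except OSError:
            continue  # process exited while we were scanning
    return sorted(found)


if __name__ == "__main__":
    old_clients = pids_named("boinc")
    if old_clients:
        print("A client is still running, PIDs:", old_clients)
    else:
        print("No running client found.")
```

If this reports a client PID after the Manager has been closed with the 'stop tasks' box ticked, the old client has not finished shutting down yet, and a freshly started Manager would be expected to come up blank.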
TBar Send message Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
Back when I was testing the older versions of BOINC in Linux I found it was common for tasks to take up to a minute to quit on some systems after the Manager was exited. I found another method that quit the tasks within seconds. When you wish to stop all running tasks go to the File Menu and select Shut down connected client... Answer OK to the first dialogue, then answer Cancel to the second. That should stop all tasks quickly, then exit the manager. I don't know why it takes so long on some systems, but that method will speed up the process. |
Keith Myers Send message Joined: 29 Apr 01 Posts: 13161 Credit: 1,160,866,277 RAC: 1,873 |
Thanks Richard for better explaining it is not the zombie tasks directly preventing restart but the old client still running. Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
Keith Myers Send message Joined: 29 Apr 01 Posts: 13161 Credit: 1,160,866,277 RAC: 1,873 |
I have seen that too, on the newer BOINC versions. I would bet that somewhere in the code there is a "kill" exit when you do the Cancel in the second step. Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
juan BFP Send message Joined: 16 Mar 07 Posts: 9786 Credit: 572,710,851 RAC: 3,799 |
I don't know why it takes so long on some systems A long shot: could that be the origin of the issue I have? If the task takes too long to close and the client ends before that, could that leave the lock file behind? Maybe it is the way Linux kills the task that does it. I know I ask too many questions. LOL All is working fine for now. I made a few reschedules. I know that's not needed with only bls05 WUs available; I did it just to make the test more realistic. I kept the 6 CPU WUs running + AVX2 builds. My caches are full. Let's wait for tomorrow's outage to see if something changes. |
juan BFP Send message Joined: 16 Mar 07 Posts: 9786 Credit: 572,710,851 RAC: 3,799 |
When the tasks have all finished, the client will close, and all will be clean. It's expected to work this way, but that is not what really happens. For some reason, sometimes on my host the client closes but the tasks remain in memory. I will try to post an example when I see that. I look in the System Monitor. |
Jeff Buck Send message Joined: 11 Feb 00 Posts: 1441 Credit: 148,764,870 RAC: 0 |
Is it possible that the BOINC version enters into these Zombie task situations? I just tried a normal BOINC Manager Exit on each of my 3 Linux systems. All of mine are set to automatically shut down the client and running tasks. On all 3, System Monitor showed the longest shutdown delay for the last of the running tasks was no more than 4 seconds. All three of my Linux boxes are running BOINC 7.2.42. One other observation. I notice from RueiKe's screenshot that his tasks are running with "Normal" priority, while on all 3 of my boxes the tasks were set to "Very Low" priority. (boinc and boincmgr show as Normal priority.) Could that be a factor? |
Keith Myers Send message Joined: 29 Apr 01 Posts: 13161 Credit: 1,160,866,277 RAC: 1,873 |
It could be, all the systems that have reported the problem are running 7.8.3. I don't regularly read the BOINC GitHub Manager feeds but I do drop in occasionally to see what is brewing in changes and look over past commits in earlier versions. I don't see anything in the areas that theoretically could be a vector. I am not a code writer so someone more expert than me would have to comment. I have only had cpu zombie tasks hang around until killed and they were always running in "low" priority because I run a script to make that so. I also use the script to elevate any gpu task to high priority mode. I have never seen a gpu task take more than a couple of seconds to drop off the System Monitor. So, low priority processes might be the clue here, they might be so low that the system takes too long in polling to get around to looking for the kill command and misses them. And once the 65 seconds has timed out, the process won't be revisited. At least that is my suspicion, I would have to crawl through that API code that was posted to see if I could find if the kill process is reentered. Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
Jeff Buck Send message Joined: 11 Feb 00 Posts: 1441 Credit: 148,764,870 RAC: 0 |
Actually, I was noting that my own tasks, all running with "Very Low" priority, were the ones that took no more than 4 seconds to kill, whereas the Zombie examples that RueiKe posted were ones running in "Normal" priority. EDIT: It may also be worth noting that my machines all have much slower CPUs than yours and RueiKe's. |
juan BFP Send message Joined: 16 Mar 07 Posts: 9786 Credit: 572,710,851 RAC: 3,799 |
FYI, all my tasks run with Very Low priority. But why do I have all these processes running on my host if I only have 4 GPU + 6 CPU tasks actually running? Since yesterday I don't even run SSE4.1 anymore!!! Something is not clearing the old processes from memory. https://1drv.ms/i/s!Asjkc9Jyluh3zxCec5AdKTaWh7Ll |
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.