Message boards :
Number crunching :
Postponed: Waiting to acquire lock
RueiKe — Joined: 14 Feb 16, Posts: 492, Credit: 378,512,430, RAC: 785
Recently I have had an issue on my new Threadripper/VegaFE system where each new GPU task stops 34s after starting with the message "Postponed: Waiting to acquire lock", and another task immediately starts with no issues. When that task finishes, the postponed task starts over and runs normally, and as soon as it makes progress another task starts and is postponed. My app_config file specifies 1 GPU and 1 CPU per task. I had run some MB_bench_13 tests recently, so that could be a factor. Any ideas on what could be the cause? GitHub: Ricks-Lab Instagram: ricks_labs
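For reference, a "1 GPU and 1 CPU per task" app_config is a minimal sketch like the one below. The app name `setiathome_v8` is an assumption here; check client_state.xml (or the project directory) for the exact app name on your host.

```xml
<!-- app_config.xml in the project directory, e.g. projects/setiathome.berkeley.edu/ -->
<app_config>
  <app>
    <name>setiathome_v8</name>  <!-- assumed app name; verify in client_state.xml -->
    <gpu_versions>
      <gpu_usage>1.0</gpu_usage>  <!-- one full GPU per task -->
      <cpu_usage>1.0</cpu_usage>  <!-- one full CPU core reserved per GPU task -->
    </gpu_versions>
  </app>
</app_config>
```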
Zalster — Joined: 27 May 99, Posts: 5517, Credit: 528,817,460, RAC: 242
What does your command line look like?
RueiKe — Joined: 14 Feb 16, Posts: 492, Credit: 378,512,430, RAC: 785
> what does your commandline look like?

-v 1 -instances_per_device 1 -sbs 1024 -period_iterations_num 1 -tt 300 -no_defaults_scaling -spike_fft_thresh 4096 -tune 1 64 1 4 -oclfft_tune_gr 256 -oclfft_tune_lr 16 -oclfft_tune_wg 256 -oclfft_tune_ls 512 -oclfft_tune_bn 64 -oclfft_tune_cw 64 -hp -high_perf -no_use_sleep GitHub: Ricks-Lab Instagram: ricks_labs
Mike — Joined: 17 Feb 01, Posts: 34258, Credit: 79,922,639, RAC: 80
Remove the following parms: -tt 300 and -no_defaults_scaling. -no_use_sleep is also pointless because it only affects NVIDIA setups. See if this changes anything. With each crime and every kindness we birth our future.
RueiKe — Joined: 14 Feb 16, Posts: 492, Credit: 378,512,430, RAC: 785
> Remove the following parms.

After a reboot, the Postponed messages are gone. I have also removed the command line options you suggested, to see if that fixes the problems I am having with tasks hanging. GitHub: Ricks-Lab Instagram: ricks_labs
Zalster — Joined: 27 May 99, Posts: 5517, Credit: 528,817,460, RAC: 242
I'm seeing this in his stderr report: Info: CPU affinity mask used: 4; system mask is ffffffff. That particular message, I thought we saw it when we used -cpu_lock and there were more tasks than CPU cores. For example, with 16 work units on 12 cores, even though they weren't using a full core, 1 core was reserved for each work unit, so the "extra" work units would crunch, stop, wait for a core, and then, when they got one, start all over again. Not sure how or why the ATI app is doing that; I don't have familiarity with ATI cards.
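For readers unfamiliar with affinity masks: the value in that stderr line is a bitmask, where bit i set means the process may run on logical CPU i. A small Python sketch to decode it (the helper name `cpus_in_mask` is ours, just for illustration):

```python
# Decode an affinity bitmask like the stderr line
# "Info: CPU affinity mask used: 4; system mask is ffffffff".

def cpus_in_mask(mask: int) -> list[int]:
    """Return the logical CPU indices enabled in an affinity bitmask."""
    return [i for i in range(mask.bit_length()) if mask & (1 << i)]

print(cpus_in_mask(0x4))              # [2] -> task pinned to logical CPU 2
print(len(cpus_in_mask(0xFFFFFFFF)))  # 32 -> system mask allows all 32 logical CPUs
```

So "mask used: 4" means the task was pinned to a single logical CPU (CPU 2), which is consistent with CPU affinity being active rather than with an error.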
Mike — Joined: 17 Feb 01, Posts: 34258, Credit: 79,922,639, RAC: 80
Run an offline bench using bench.cfg with the following parm sequence. It's in the KnaBench main folder. Add these lines:
MB8_win_x86_SSE2_OpenCL_ATi_HD5_r3500.exe -device 0 -tt 10
MB8_win_x86_SSE2_OpenCL_ATi_HD5_r3500.exe -device 0 -tt 30
MB8_win_x86_SSE2_OpenCL_ATi_HD5_r3500.exe -device 0 -tt 45
MB8_win_x86_SSE2_OpenCL_ATi_HD5_r3500.exe -device 0 -tt 60
MB8_win_x86_SSE2_OpenCL_ATi_HD5_r3500.exe -device 0 -tt 75
MB8_win_x86_SSE2_OpenCL_ATi_HD5_r3500.exe -device 0 -tt 90
MB8_win_x86_SSE2_OpenCL_ATi_HD5_r3500.exe -device 0 -tt 180
MB8_win_x86_SSE2_OpenCL_ATi_HD5_r3500.exe -device 0 -tt 180
Replace with the correct app version, of course. That way you can see whether -tt values can benefit your setup. With each crime and every kindness we birth our future.
Mike — Joined: 17 Feb 01, Posts: 34258, Credit: 79,922,639, RAC: 80
> I'm seeing this in his stderr report

It's not an error, Zalster, it's an informational message that CPU affinity is active. On ATI hosts it's enabled by default. He can try -no_cpu_lock to check it out, of course. But we fixed CPU affinity in recent versions, and the Threadripper has 16 physical cores, so I don't think it matters here anyway. With each crime and every kindness we birth our future.
RueiKe — Joined: 14 Feb 16, Posts: 492, Credit: 378,512,430, RAC: 785
> Run a offline bench using bench.cfg with following parm sequence.

Here are the DOE results I have collected per your suggestion: DOE Results. It looks like the parameters have no effect on processing speed in this test case. The full arguments are as follows: MB8_win_x86_SSE2_OpenCL_ATi_HD5_r3584.exe -v 1 -instances_per_device 1 -sbs 1024 -period_iterations_num 1 -tt 500 -no_defaults_scaling -spike_fft_thresh 4096 -tune 1 64 1 4 -oclfft_tune_gr 256 -oclfft_tune_lr 16 -oclfft_tune_wg 256 -oclfft_tune_ls 512 -oclfft_tune_bn 64 -oclfft_tune_cw 64 -hp -high_perf -no_use_sleep GitHub: Ricks-Lab Instagram: ricks_labs
RueiKe — Joined: 14 Feb 16, Posts: 492, Credit: 378,512,430, RAC: 785
I have rebooted the system and the error mentioned is now gone. I suspect it was caused by work I was doing with bench testing. Additional info: the CPU is used only for LHC and the GPU only for SETI. GitHub: Ricks-Lab Instagram: ricks_labs
RueiKe — Joined: 14 Feb 16, Posts: 492, Credit: 378,512,430, RAC: 785
Here are the DOE results. The original embedded image didn't work. GitHub: Ricks-Lab Instagram: ricks_labs
Mike — Joined: 17 Feb 01, Posts: 34258, Credit: 79,922,639, RAC: 80
What task did you use? Did you use any other app args besides -tt and -sbs? On this bench it looks like kernel target time has no effect on your card; all best results are within one second. Did you delete the cached binaries before the bench? With each crime and every kindness we birth our future.
RueiKe — Joined: 14 Feb 16, Posts: 492, Credit: 378,512,430, RAC: 785
> What task did you use ?

I am using this task: blc4_2bit_guppi_57424_82736_HIP9598_0011.19766.416.18.27.99.vlar. It is the older, smaller kind. I am using r3584 with the following arguments: MB8_win_x86_SSE2_OpenCL_ATi_HD5_r3584.exe -v 1 -instances_per_device 1 -sbs 512 -period_iterations_num 1 -tt 300 -spike_fft_thresh 4096 -tune 1 64 1 4 -oclfft_tune_gr 256 -oclfft_tune_lr 16 -oclfft_tune_wg 256 -oclfft_tune_ls 512 -oclfft_tune_bn 64 -oclfft_tune_cw 64 -hp -high_perf. I did not delete cached binaries before the bench; I thought it would just use the binary in the science app directory. Let me know the implications and whether I should approach bench testing differently. Thanks! GitHub: Ricks-Lab Instagram: ricks_labs
Mike — Joined: 17 Feb 01, Posts: 34258, Credit: 79,922,639, RAC: 80
First of all, on each new bench you should delete the cached binaries. Also, don't use any app args other than the one you want to test. A fixed -sbs value is possible; I suggest -sbs 384. Only this way can you be sure to get clean results. The next step would be to use the best values from test 1 together with the app args you are using live. Beware: you have -tt 300 in your command line, so the test was pointless, because the command line overrides the preselected values. With each crime and every kindness we birth our future.
RueiKe — Joined: 14 Feb 16, Posts: 492, Credit: 378,512,430, RAC: 785
> First of all on each new bench you should delete the cached binaries.

That was just one sample command line from BenchCFG. I have modified the arguments in each entry of BenchCFG to execute the DOE. In the Testdatas log file, I can see each execution using the arguments from the BenchCFG file. Can I email it over to you to check whether it is working as expected? GitHub: Ricks-Lab Instagram: ricks_labs
RueiKe — Joined: 14 Feb 16, Posts: 492, Credit: 378,512,430, RAC: 785
Hi Mike, I found the problem. The script was using my arguments from the config file plus those specified in the command line txt file, so all runs got the same arguments. I am trying it again. Thanks for getting me on the right track. GitHub: Ricks-Lab Instagram: ricks_labs
Mike — Joined: 17 Feb 01, Posts: 34258, Credit: 79,922,639, RAC: 80
> Can I email it over to you to check if it is working as expected?

Sure, I will PM you my email. With each crime and every kindness we birth our future.
RueiKe — Joined: 14 Feb 16, Posts: 492, Credit: 378,512,430, RAC: 785
I found an issue in my setup of the MB_bench_213 script and have corrected it for this run of the tt vs. sbs DOE. Here are the updated results, using r3584 on a VegaFE running an older guppi: GitHub: Ricks-Lab Instagram: ricks_labs
Mike — Joined: 17 Feb 01, Posts: 34258, Credit: 79,922,639, RAC: 80
Very interesting, Rick. As you can see, the benefit from a bigger -sbs value is greater than from kernel target time: from 512 to 2048 it's almost 25 seconds. I'm surprised -sbs 2048 gives so much better performance on your GPU; probably the HBM memory. Next is to test the best -tt settings together with your additional app args. With each crime and every kindness we birth our future.
RueiKe — Joined: 14 Feb 16, Posts: 492, Credit: 378,512,430, RAC: 785
> Very interesting Rick.

Let me know of any recommendations. I plan to rerun this case with a newer, longer guppi. GitHub: Ricks-Lab Instagram: ricks_labs
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.