Message boards :
Number crunching :
Postponed: Waiting to acquire lock
RueiKe — Joined: 14 Feb 16, Posts: 492, Credit: 378,512,430, RAC: 785
Recently I have had an issue on my new Threadripper/VegaFE system where each new GPU task stops 34s after starting with the message "Postponed: Waiting to acquire lock", and another task immediately starts with no issues. When that task finishes, the postponed task starts over and runs normally, and as soon as it makes progress another task starts and is postponed. My app_config file specifies 1 GPU and 1 CPU per task. I had run some MB_bench_13 tests recently, so that could be a factor. Any ideas on what could be the cause? GitHub: Ricks-Lab Instagram: ricks_labs
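For reference, a "1 GPU and 1 CPU per task" app_config is a minimal sketch like the one below. The app name `setiathome_v8` is an assumption here; check client_state.xml (or the project directory) for the exact app name on your host.

```xml
<!-- app_config.xml in the project directory, e.g. projects/setiathome.berkeley.edu/ -->
<app_config>
  <app>
    <name>setiathome_v8</name>  <!-- assumed app name; verify in client_state.xml -->
    <gpu_versions>
      <gpu_usage>1.0</gpu_usage>  <!-- one full GPU per task -->
      <cpu_usage>1.0</cpu_usage>  <!-- one full CPU core reserved per GPU task -->
    </gpu_versions>
  </app>
</app_config>
```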
Zalster — Joined: 27 May 99, Posts: 5517, Credit: 528,817,460, RAC: 242
What does your command line look like?
RueiKe — Joined: 14 Feb 16, Posts: 492, Credit: 378,512,430, RAC: 785
> what does your commandline look like?

-v 1 -instances_per_device 1 -sbs 1024 -period_iterations_num 1 -tt 300 -no_defaults_scaling -spike_fft_thresh 4096 -tune 1 64 1 4 -oclfft_tune_gr 256 -oclfft_tune_lr 16 -oclfft_tune_wg 256 -oclfft_tune_ls 512 -oclfft_tune_bn 64 -oclfft_tune_cw 64 -hp -high_perf -no_use_sleep GitHub: Ricks-Lab Instagram: ricks_labs
Mike — Joined: 17 Feb 01, Posts: 34258, Credit: 79,922,639, RAC: 80
Remove the following parms: -tt 300 and -no_defaults_scaling. -no_use_sleep is also pointless because it only affects NVIDIA setups. See if this changes anything. With each crime and every kindness we birth our future.
RueiKe — Joined: 14 Feb 16, Posts: 492, Credit: 378,512,430, RAC: 785
> Remove the following parms.

After a reboot, the Postponed messages are gone. I have also removed the command line options you suggested, to see if that fixes the problems I am having with tasks hanging. GitHub: Ricks-Lab Instagram: ricks_labs
Zalster — Joined: 27 May 99, Posts: 5517, Credit: 528,817,460, RAC: 242
I'm seeing this in his stderr report: Info: CPU affinity mask used: 4; system mask is ffffffff. That particular message, I thought we saw it when we used -cpu_lock and there were more tasks than CPU cores. For example, with 16 work units on 12 cores, even though they weren't using a full core, 1 core was reserved for each work unit, so the "extra" work units would crunch, stop, wait for a core, and then, when they got one, start all over again. Not sure how or why the ATI app is doing that; I don't have familiarity with ATI cards.
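For readers unfamiliar with affinity masks: the value in that stderr line is a bitmask, where bit i set means the process may run on logical CPU i. A small Python sketch to decode it (the helper name `cpus_in_mask` is ours, just for illustration):

```python
# Decode an affinity bitmask like the stderr line
# "Info: CPU affinity mask used: 4; system mask is ffffffff".

def cpus_in_mask(mask: int) -> list[int]:
    """Return the logical CPU indices enabled in an affinity bitmask."""
    return [i for i in range(mask.bit_length()) if mask & (1 << i)]

print(cpus_in_mask(0x4))              # [2] -> task pinned to logical CPU 2
print(len(cpus_in_mask(0xFFFFFFFF)))  # 32 -> system mask allows all 32 logical CPUs
```

So "mask used: 4" means the task was pinned to a single logical CPU (CPU 2), which is consistent with CPU affinity being active rather than with an error.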
Mike — Joined: 17 Feb 01, Posts: 34258, Credit: 79,922,639, RAC: 80
Run an offline bench using bench.cfg with the following parm sequence. It's in the KnaBench main folder. Add these lines:
MB8_win_x86_SSE2_OpenCL_ATi_HD5_r3500.exe -device 0 -tt 10
MB8_win_x86_SSE2_OpenCL_ATi_HD5_r3500.exe -device 0 -tt 30
MB8_win_x86_SSE2_OpenCL_ATi_HD5_r3500.exe -device 0 -tt 45
MB8_win_x86_SSE2_OpenCL_ATi_HD5_r3500.exe -device 0 -tt 60
MB8_win_x86_SSE2_OpenCL_ATi_HD5_r3500.exe -device 0 -tt 75
MB8_win_x86_SSE2_OpenCL_ATi_HD5_r3500.exe -device 0 -tt 90
MB8_win_x86_SSE2_OpenCL_ATi_HD5_r3500.exe -device 0 -tt 180
MB8_win_x86_SSE2_OpenCL_ATi_HD5_r3500.exe -device 0 -tt 180
Replace with the correct app version, of course. That way you can see whether -tt values can benefit your setup. With each crime and every kindness we birth our future.
Mike — Joined: 17 Feb 01, Posts: 34258, Credit: 79,922,639, RAC: 80
> I'm seeing this in his stderr report

It's not an error, Zalster, it's an informational message that CPU affinity is active. On ATI hosts it's enabled by default. He can try -no_cpu_lock to check it out, of course. But we fixed CPU affinity in recent versions, and the Threadripper has 16 physical cores, so I don't think it matters here anyway. With each crime and every kindness we birth our future.
RueiKe — Joined: 14 Feb 16, Posts: 492, Credit: 378,512,430, RAC: 785
> Run a offline bench using bench.cfg with following parm sequence.

Here are the DOE results I have collected per your suggestion: DOE Results. It looks like the parameters have no effect on processing speed in this test case. The full arguments are as follows: MB8_win_x86_SSE2_OpenCL_ATi_HD5_r3584.exe -v 1 -instances_per_device 1 -sbs 1024 -period_iterations_num 1 -tt 500 -no_defaults_scaling -spike_fft_thresh 4096 -tune 1 64 1 4 -oclfft_tune_gr 256 -oclfft_tune_lr 16 -oclfft_tune_wg 256 -oclfft_tune_ls 512 -oclfft_tune_bn 64 -oclfft_tune_cw 64 -hp -high_perf -no_use_sleep GitHub: Ricks-Lab Instagram: ricks_labs
RueiKe — Joined: 14 Feb 16, Posts: 492, Credit: 378,512,430, RAC: 785
I have rebooted the system and the error mentioned is now gone. I suspect it was caused by work I was doing with bench testing. Additional info: the CPU is used only for LHC and the GPU only for SETI. GitHub: Ricks-Lab Instagram: ricks_labs
RueiKe — Joined: 14 Feb 16, Posts: 492, Credit: 378,512,430, RAC: 785
Here are the DOE results. The original embedded image didn't work. GitHub: Ricks-Lab Instagram: ricks_labs
Mike — Joined: 17 Feb 01, Posts: 34258, Credit: 79,922,639, RAC: 80
What task did you use? Did you use any other app args besides -tt and -sbs? On this bench it looks like kernel target time has no effect on your card; all best results are within one second. Did you delete the cached binaries before the bench? With each crime and every kindness we birth our future.
RueiKe — Joined: 14 Feb 16, Posts: 492, Credit: 378,512,430, RAC: 785
> What task did you use ?

I am using this task: blc4_2bit_guppi_57424_82736_HIP9598_0011.19766.416.18.27.99.vlar. It is the older, smaller kind. I am using r3584 with the following arguments: MB8_win_x86_SSE2_OpenCL_ATi_HD5_r3584.exe -v 1 -instances_per_device 1 -sbs 512 -period_iterations_num 1 -tt 300 -spike_fft_thresh 4096 -tune 1 64 1 4 -oclfft_tune_gr 256 -oclfft_tune_lr 16 -oclfft_tune_wg 256 -oclfft_tune_ls 512 -oclfft_tune_bn 64 -oclfft_tune_cw 64 -hp -high_perf. I did not delete cached binaries before the bench; I thought it would just use the binary in the science app directory. Let me know the implications and whether I should approach bench testing differently. Thanks! GitHub: Ricks-Lab Instagram: ricks_labs
Mike — Joined: 17 Feb 01, Posts: 34258, Credit: 79,922,639, RAC: 80
First of all, on each new bench you should delete the cached binaries. Also, don't use any app args other than the one you want to test. A fixed -sbs value is possible; I suggest -sbs 384. Only this way can you be sure to get clean results. The next step would be to use the best values from test 1 together with the app args you are using live. Beware: you have -tt 300 in your command line, so the test was pointless, because the command line overrides the preselected values. With each crime and every kindness we birth our future.
RueiKe — Joined: 14 Feb 16, Posts: 492, Credit: 378,512,430, RAC: 785
> First of all on each new bench you should delete the cached binaries.

That was just one sample command line from BenchCFG. I have modified the arguments in each entry of BenchCFG to execute the DOE. In the Testdatas log file, I can see each execution using the arguments from the BenchCFG file. Can I email it over to you to check whether it is working as expected? GitHub: Ricks-Lab Instagram: ricks_labs
RueiKe — Joined: 14 Feb 16, Posts: 492, Credit: 378,512,430, RAC: 785
Hi Mike, I found the problem. The script was using my arguments from the config file plus those specified in the command line txt file, so all runs got the same arguments. I am trying it again. Thanks for getting me on the right track. GitHub: Ricks-Lab Instagram: ricks_labs
Mike — Joined: 17 Feb 01, Posts: 34258, Credit: 79,922,639, RAC: 80
> Can I email it over to you to check if it is working as expected?

Sure, I will PM you my email. With each crime and every kindness we birth our future.
RueiKe — Joined: 14 Feb 16, Posts: 492, Credit: 378,512,430, RAC: 785
I found an issue in my setup of the MB_bench_213 script and have corrected it for this run of the tt vs. sbs DOE. Here are the updated results, using r3584 on a VegaFE running an older guppi: GitHub: Ricks-Lab Instagram: ricks_labs
Mike — Joined: 17 Feb 01, Posts: 34258, Credit: 79,922,639, RAC: 80
Very interesting, Rick. As you can see, the benefit from a bigger -sbs value is greater than from kernel target time: from 512 to 2048 it's almost 25 seconds. I'm surprised -sbs 2048 gives so much better performance on your GPU; probably the HBM memory. Next is to test the best -tt settings together with your additional app args. With each crime and every kindness we birth our future.
RueiKe — Joined: 14 Feb 16, Posts: 492, Credit: 378,512,430, RAC: 785
> Very interesting Rick.

Let me know of any recommendations. I plan to rerun this case with a newer, longer guppi. GitHub: Ricks-Lab Instagram: ricks_labs
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.