Postponed: Waiting to acquire lock

Message boards : Number crunching : Postponed: Waiting to acquire lock

RueiKe (Project Donor, Volunteer tester)
Joined: 14 Feb 16
Posts: 270
Credit: 104,254,183
RAC: 236,406
Taiwan
Message 1894051 - Posted: 8 Oct 2017, 7:49:28 UTC

Recently I have had an issue on my new Threadripper/VegaFE system: each new GPU task stops 34 s after starting with the message "Postponed: Waiting to acquire lock", and another task immediately starts with no issues. When that task finishes, the postponed task starts over and runs normally, and as soon as it makes progress another task starts and is postponed. My app_config file specifies 1 GPU and 1 CPU per task. I had run some MB_bench_13 tests recently, so that could be a factor. Any ideas on what could be the cause?
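For reference, BOINC reads per-app resource settings from an app_config.xml file in the project folder. The sketch below writes one that reserves 1 GPU and 1 CPU per task, as described above. The app name setiathome_v8 is an assumed placeholder, not taken from this thread; use the name listed in your client_state.xml.

```python
import xml.etree.ElementTree as ET


def build_app_config(app_name: str, gpu_usage: float, cpu_usage: float) -> str:
    """Return app_config.xml text giving each GPU task the stated GPU/CPU share."""
    root = ET.Element("app_config")
    app = ET.SubElement(root, "app")
    ET.SubElement(app, "name").text = app_name
    gpu = ET.SubElement(app, "gpu_versions")
    ET.SubElement(gpu, "gpu_usage").text = str(gpu_usage)  # 1.0 = one whole GPU per task
    ET.SubElement(gpu, "cpu_usage").text = str(cpu_usage)  # 1.0 = one CPU reserved per task
    return ET.tostring(root, encoding="unicode")


# "setiathome_v8" is an assumed app name (hypothetical placeholder).
print(build_app_config("setiathome_v8", gpu_usage=1.0, cpu_usage=1.0))
```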
YouTube Channel: Rick's Performance Computing
Zalster (Project Donor, Volunteer tester)
Joined: 27 May 99
Posts: 3992
Credit: 208,944,766
RAC: 48,793
United States
Message 1894080 - Posted: 8 Oct 2017, 12:49:37 UTC - in response to Message 1894051.  

What does your command line look like?
RueiKe
Message 1894082 - Posted: 8 Oct 2017, 13:02:05 UTC - in response to Message 1894080.  

-v 1 -instances_per_device 1 -sbs 1024 -period_iterations_num 1 -tt 300 -no_defaults_scaling -spike_fft_thresh 4096 -tune 1 64 1 4 -oclfft_tune_gr 256 -oclfft_tune_lr 16 -oclfft_tune_wg 256 -oclfft_tune_ls 512 -oclfft_tune_bn 64 -oclfft_tune_cw 64 -hp -high_perf -no_use_sleep

Mike (Project Donor, Volunteer tester)
Joined: 17 Feb 01
Posts: 30604
Credit: 57,614,045
RAC: 30,334
Germany
Message 1894085 - Posted: 8 Oct 2017, 13:30:41 UTC

Remove the following parameters:

-tt 300 and -no_defaults_scaling.
-no_use_sleep is also pointless because it only affects the Nvidia setup.

See if this changes anything.
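The change suggested above amounts to deleting a few flags, plus any values that follow them, from the command line. A small sketch of that edit, assuming every token that starts with "-" followed by a letter is a flag and the tokens after it, up to the next flag, are its values; the real app's argument parser may behave differently.

```python
def parse_args(cmdline: str) -> list[tuple[str, list[str]]]:
    """Split a command line into (flag, values) pairs, preserving order.

    Tokens before the first flag (e.g. an .exe name) are ignored.
    """
    pairs: list[tuple[str, list[str]]] = []
    for token in cmdline.split():
        if token.startswith("-") and token[1:2].isalpha():
            pairs.append((token, []))        # a new flag such as -tt or -hp
        elif pairs:
            pairs[-1][1].append(token)       # a value for the most recent flag
    return pairs


def remove_flags(cmdline: str, unwanted: set[str]) -> str:
    """Rebuild the command line without the unwanted flags and their values."""
    kept = [(flag, values) for flag, values in parse_args(cmdline) if flag not in unwanted]
    return " ".join(" ".join([flag, *values]) for flag, values in kept)


current = (
    "-v 1 -instances_per_device 1 -sbs 1024 -period_iterations_num 1 -tt 300 "
    "-no_defaults_scaling -spike_fft_thresh 4096 -tune 1 64 1 4 -oclfft_tune_gr 256 "
    "-oclfft_tune_lr 16 -oclfft_tune_wg 256 -oclfft_tune_ls 512 -oclfft_tune_bn 64 "
    "-oclfft_tune_cw 64 -hp -high_perf -no_use_sleep"
)
print(remove_flags(current, {"-tt", "-no_defaults_scaling", "-no_use_sleep"}))
```

Multi-value flags such as -tune 1 64 1 4 stay intact because all values up to the next flag are kept together.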
With each crime and every kindness we birth our future.
RueiKe
Message 1894086 - Posted: 8 Oct 2017, 13:38:53 UTC - in response to Message 1894085.  

After a reboot, the "Postponed" messages are gone. I have also removed the command-line options you suggested, to see if that fixes the problems I am having with tasks hanging.

Zalster
Message 1894087 - Posted: 8 Oct 2017, 13:41:33 UTC - in response to Message 1894085.  

I'm seeing this in his stderr report:

Info: CPU affinity mask used: 4; system mask is ffffffff

I thought we saw that particular error when we used -cpu_lock and there were more tasks than CPU cores.
For example, with 16 work units on 12 cores, even though the tasks weren't using a full core, one core was reserved for each work unit, so the "extra" work units would crunch, stop, wait for a core, and, once they got one, start all over again.

I'm not sure how or why the ATI app is doing that. I'm not familiar with ATI cards.
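The scenario Zalster describes can be illustrated with a toy model: if each task must hold an exclusive lock on one core and there are more tasks than cores, the surplus tasks have nothing to lock and sit waiting. This is a simplified sketch of that description only, not how the app's -cpu_lock is actually implemented.

```python
def assign_core_locks(num_tasks: int, num_cores: int) -> tuple[dict[int, int], list[int]]:
    """Toy model: each task needs an exclusive lock on one core before it may run.

    Returns a mapping of running task -> locked core and the list of tasks
    left waiting because every core lock is already held.
    """
    running: dict[int, int] = {}
    waiting: list[int] = []
    free_cores = list(range(num_cores))
    for task in range(num_tasks):
        if free_cores:
            running[task] = free_cores.pop(0)  # lock the lowest free core
        else:
            waiting.append(task)               # no core left: task is postponed
    return running, waiting


running, waiting = assign_core_locks(num_tasks=16, num_cores=12)
print(len(running), waiting)  # 12 [12, 13, 14, 15]
```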
Mike
Message 1894089 - Posted: 8 Oct 2017, 13:44:36 UTC

Run an offline bench using bench.cfg with the following parameter sequence.
It's in the Knabench main folder.
Add these lines:

MB8_win_x86_SSE2_OpenCL_ATi_HD5_r3500.exe -device 0 -tt 10
MB8_win_x86_SSE2_OpenCL_ATi_HD5_r3500.exe -device 0 -tt 30
MB8_win_x86_SSE2_OpenCL_ATi_HD5_r3500.exe -device 0 -tt 45
MB8_win_x86_SSE2_OpenCL_ATi_HD5_r3500.exe -device 0 -tt 60
MB8_win_x86_SSE2_OpenCL_ATi_HD5_r3500.exe -device 0 -tt 75
MB8_win_x86_SSE2_OpenCL_ATi_HD5_r3500.exe -device 0 -tt 90
MB8_win_x86_SSE2_OpenCL_ATi_HD5_r3500.exe -device 0 -tt 180
MB8_win_x86_SSE2_OpenCL_ATi_HD5_r3500.exe -device 0 -tt 180

Substitute the correct app version, of course.
That way you can see whether -tt values benefit your setup.
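Typing such a sweep by hand is error-prone, so here is a small generator for bench.cfg lines in the same one-command-per-line form as above, one line per -tt value. The executable name and the value list are examples to adjust (r3584 is the version used later in this thread, and the duplicated 180 is dropped).

```python
def tt_sweep_lines(exe: str, device: int, tt_values: list[int]) -> list[str]:
    """One bench.cfg line per kernel target time (-tt), varying nothing else."""
    return [f"{exe} -device {device} -tt {tt}" for tt in tt_values]


lines = tt_sweep_lines(
    "MB8_win_x86_SSE2_OpenCL_ATi_HD5_r3584.exe", device=0,
    tt_values=[10, 30, 45, 60, 75, 90, 180],
)
print("\n".join(lines))
```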
Mike
Message 1894092 - Posted: 8 Oct 2017, 13:53:14 UTC - in response to Message 1894087.  
Last modified: 8 Oct 2017, 13:56:31 UTC

It's not an error, Zalster; it's an informational message that CPU affinity is active.
On ATI hosts it's enabled by default.

He can try -no_cpu_lock to check, of course.
But we fixed CPU affinity in recent versions.
And the Threadripper has 16 physical cores, so I don't think it matters here anyway.
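The affinity line in stderr is a hexadecimal bit mask in which, assuming the usual convention, bit n stands for logical processor n. Decoding it shows what "affinity is active" means here: mask 4 pins the task to a single logical CPU (index 2), while the system mask ffffffff covers 32 logical CPUs, consistent with a 16-core Threadripper running two threads per core.

```python
def cpus_in_mask(mask_hex: str) -> list[int]:
    """Return the logical CPU indices whose bits are set in a hex affinity mask."""
    mask = int(mask_hex, 16)
    return [cpu for cpu in range(mask.bit_length()) if (mask >> cpu) & 1]


print(cpus_in_mask("4"))              # [2]  -> task pinned to logical CPU 2
print(len(cpus_in_mask("ffffffff")))  # 32   -> logical CPUs 0..31 available
```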
RueiKe
Message 1894200 - Posted: 9 Oct 2017, 1:11:45 UTC - in response to Message 1894089.  

Here are the DOE results I have collected per your suggestion:

DOE Results

Looks like the parameters have no effect on processing speed in this test case. Full arguments are as follows:
MB8_win_x86_SSE2_OpenCL_ATi_HD5_r3584.exe -v 1 -instances_per_device 1 -sbs 1024 -period_iterations_num 1 -tt 500 -no_defaults_scaling -spike_fft_thresh 4096 -tune 1 64 1 4 -oclfft_tune_gr 256 -oclfft_tune_lr 16 -oclfft_tune_wg 256 -oclfft_tune_ls 512 -oclfft_tune_bn 64 -oclfft_tune_cw 64 -hp -high_perf -no_use_sleep


RueiKe
Message 1894202 - Posted: 9 Oct 2017, 1:13:37 UTC
Last modified: 9 Oct 2017, 1:22:26 UTC

I have rebooted the system and the error mentioned is now gone. I suspect it was caused by my bench testing. Additional info: the CPU is used only for LHC and the GPU only for SETI.
RueiKe
Message 1894220 - Posted: 9 Oct 2017, 2:39:32 UTC

Here are the DOE results; the originally embedded image didn't work.

Mike
Message 1894247 - Posted: 9 Oct 2017, 10:41:28 UTC
Last modified: 9 Oct 2017, 10:56:33 UTC

Which task did you use?
Did you use any other app args besides -tt and -sbs?

On this bench it looks like kernel target time has no effect on your card.
All the best results are within one second of each other.
Did you delete the cached binaries before the bench?
RueiKe
Message 1894499 - Posted: 10 Oct 2017, 11:32:55 UTC - in response to Message 1894247.  

I am using this task: blc4_2bit_guppi_57424_82736_HIP9598_0011.19766.416.18.27.99.vlar. It is the older, smaller version.
I am using r3584 with the following arguments:
MB8_win_x86_SSE2_OpenCL_ATi_HD5_r3584.exe -v 1 -instances_per_device 1 -sbs 512 -period_iterations_num 1 -tt 300 -spike_fft_thresh 4096 -tune 1 64 1 4 -oclfft_tune_gr 256 -oclfft_tune_lr 16 -oclfft_tune_wg 256 -oclfft_tune_ls 512 -oclfft_tune_bn 64 -oclfft_tune_cw 64 -hp -high_perf


I did not delete the cached binaries before the bench; I thought it would just use the binary in the science app directory. Let me know the implications and whether I should approach bench testing differently. Thanks!
Mike
Message 1894509 - Posted: 10 Oct 2017, 12:13:12 UTC
Last modified: 10 Oct 2017, 12:16:04 UTC

First of all, you should delete the cached binaries before each new bench.
Also, don't use any app args other than the one you want to test.
A fixed -sbs value is fine; I suggest -sbs 384.
Only this way can you be sure to get clean results.

The next step would be to use the best values from test 1 together with the app args you are using live.

Beware: you have -tt 300 in your command line, so the test was pointless, because the command line overrides the preselected values.
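One way to catch this kind of contamination before spending hours on a bench is to parse every bench line and check that the flag under test is the only one whose effective value changes. The sketch assumes that when a flag appears twice, the later occurrence wins (the override behavior described above); app.exe is a hypothetical executable name.

```python
def parse_flags(line: str) -> dict[str, tuple[str, ...]]:
    """Map each flag in a bench line to its values; the executable token is skipped.

    A flag that appears twice keeps only its last values (assumed override rule).
    """
    flags: dict[str, list[str]] = {}
    current = None
    for token in line.split()[1:]:
        if token.startswith("-") and token[1:2].isalpha():
            current = token
            flags[current] = []          # a repeat resets the earlier values
        elif current is not None:
            flags[current].append(token)
    return {flag: tuple(values) for flag, values in flags.items()}


def varied_flags(bench_lines: list[str]) -> set[str]:
    """Return the flags whose effective value (or presence) differs between lines."""
    parsed = [parse_flags(line) for line in bench_lines]
    all_flags = set().union(*parsed)
    return {flag for flag in all_flags if len({p.get(flag) for p in parsed}) > 1}


bench = [f"app.exe -device 0 -sbs 384 -tt {tt}" for tt in (30, 60, 90)]
print(varied_flags(bench))  # {'-tt'}: a clean single-factor test

# Extra arguments appended to every run (e.g. from a shared command-line file)
contaminated = [line + " -tt 300" for line in bench]
print(varied_flags(contaminated))  # set(): every run effectively used -tt 300
```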
RueiKe
Message 1894511 - Posted: 10 Oct 2017, 12:22:29 UTC - in response to Message 1894509.  

This was just one sample command line from BenchCFG. I modified the arguments in each entry of BenchCFG to execute the DOE, and in the Testdatas log file I can see each run using the arguments from the BenchCFG file. Can I email it to you so you can check whether it is working as expected?
RueiKe
Message 1894566 - Posted: 11 Oct 2017, 4:24:39 UTC

Hi Mike, I found the problem. The script was using my arguments from the config file plus those specified in the command-line txt file, so all runs got the same arguments. I am trying it again. Thanks for getting me on the right track.
Mike
Message 1894581 - Posted: 11 Oct 2017, 8:12:40 UTC - in response to Message 1894511.  

Sure, I will PM you my email.
RueiKe
Message 1894588 - Posted: 11 Oct 2017, 10:05:32 UTC
Last modified: 11 Oct 2017, 10:16:07 UTC

I found an issue in my setup of the MB_bench_213 script and have corrected it for this run of the tt vs. sbs DOE. Here are the updated results using r3584 on a VegaFE running an older guppi:


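A tt vs. sbs DOE like the one described above is a two-factor full-factorial design: every -tt value is run with every -sbs value. A sketch of generating such a grid with itertools.product follows; the levels are example values (the exact ones used are not listed in the thread), and -device 0 mirrors the earlier bench lines.

```python
from itertools import product


def doe_grid(exe: str, tt_values: list[int], sbs_values: list[int]) -> list[str]:
    """Full-factorial bench lines: every -sbs value paired with every -tt value."""
    return [f"{exe} -device 0 -sbs {sbs} -tt {tt}"
            for sbs, tt in product(sbs_values, tt_values)]


grid = doe_grid("MB8_win_x86_SSE2_OpenCL_ATi_HD5_r3584.exe",
                tt_values=[30, 60, 90, 180], sbs_values=[384, 512, 1024, 2048])
print(len(grid))  # 16 runs = 4 -tt levels x 4 -sbs levels
```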
Mike
Message 1894591 - Posted: 11 Oct 2017, 10:16:50 UTC

Very interesting, Rick.
As you can see, the benefit from a bigger -sbs value is greater than from kernel target time.
From 512 to 2048 it's almost 25 seconds.
I'm surprised -sbs 2048 gives so much better performance on your GPU.
Probably the HBM memory.
Now to test the best -tt settings with additional app args.
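Concluding that -sbs matters more than -tt amounts to comparing main effects: average the run time at each level of one factor over all levels of the other, then compare the spread between levels. A minimal sketch, assuming each bench run has already been collected into a dict with tt, sbs and seconds keys (parsing the Testdatas log is not shown).

```python
from collections import defaultdict
from statistics import mean


def main_effects(runs: list[dict], factors=("tt", "sbs")) -> dict[str, dict[int, float]]:
    """Average run time (seconds) at each level of each factor."""
    effects: dict[str, dict[int, float]] = {}
    for factor in factors:
        times_by_level: dict[int, list[float]] = defaultdict(list)
        for run in runs:
            times_by_level[run[factor]].append(run["seconds"])
        effects[factor] = {level: mean(times)
                           for level, times in sorted(times_by_level.items())}
    return effects


def effect_size(level_means: dict[int, float]) -> float:
    """Gap between slowest and fastest level: a rough measure of a factor's impact."""
    return max(level_means.values()) - min(level_means.values())
```

Comparing effect_size(main_effects(runs)["sbs"]) with the same figure for "tt" gives the kind of comparison made above; with a balanced grid, each level mean averages over the same set of levels of the other factor.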
RueiKe
Message 1894592 - Posted: 11 Oct 2017, 10:18:10 UTC - in response to Message 1894591.  

Let me know of any recommendations. I plan to rerun this case with a newer, longer guppi.


 
©2017 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.