OpenCL NV MultiBeam v8 SoG edition for Windows

Grumpy Swede (I stand with Ukraine)
Volunteer tester
Joined: 1 Nov 08
Posts: 8923
Credit: 49,849,242
RAC: 65
Sweden
Message 1767248 - Posted: 24 Feb 2016, 11:10:09 UTC - in response to Message 1767247.  

Well, -period_iterations_num N splits a single call into N separate calls, so it should "make life easier" for the GPU.

With -sbs N the situation is more complex.
It defines the maximum size of the memory buffer used to unroll periods in PulseFind. The more unrolling, the more items can be used in a single workgroup (especially important for low FFT values, which occur most often in VLAR tasks). On the other hand, too much unrolling can reduce the number of separate workgroups, leaving some CUs idle. To investigate real configs more deeply, I added a -v 8 switch in the new builds (I hope Eric will deploy them on Beta soon, maybe even today).

Running with -v 8 prints the PulseFind configs to stderr, so one can see how -sbs N affects the PulseFind geometry and whether the current heuristic works well.
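
To make the "idle CUs" trade-off concrete, here is a rough sketch in Python. It assumes a deliberately simplified model in which each CU runs exactly one workgroup at a time, which real GPUs do not strictly follow; the function name and the sample numbers are illustrative only and are not taken from the app's source.

```python
from math import ceil


def scheduling_rounds(wg_num, cu_num):
    """Return (rounds, idle_cus_in_last_round) for a toy scheduling model.

    Assumption (not from the app): every CU executes one workgroup at a
    time, and workgroups are handed out in full rounds across all CUs.
    """
    if wg_num <= 0 or cu_num <= 0:
        raise ValueError("wg_num and cu_num must be positive")
    rounds = ceil(wg_num / cu_num)
    # Workgroups left over for the final round.
    last_round = wg_num - (rounds - 1) * cu_num
    return rounds, cu_num - last_round


if __name__ == "__main__":
    # Fewer workgroups than CUs: half of a 16-CU device has nothing to do.
    print(scheduling_rounds(8, 16))
    # More workgroups than CUs: only the last round leaves CUs idle.
    print(scheduling_rounds(51, 16))
```

Under this model, a larger unroll that shrinks the workgroup count below the CU count leaves CUs idle for the whole call, while a larger workgroup count only wastes part of the final round.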

New builds are available as standalone packs here: https://cloud.mail.ru/public/DMkN/x4BRCYuAV

Are there any big differences/changes in this new build that would make it necessary for me to upgrade from SoG Build 3366?
ID: 1767248 · Report as offensive
Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6324
Credit: 106,370,077
RAC: 121
Russia
Message 1767249 - Posted: 24 Feb 2016, 12:03:50 UTC - in response to Message 1767248.  
Last modified: 24 Feb 2016, 12:04:13 UTC

Are there any big differences/changes in this new build that would make it necessary for me to upgrade from SoG Build 3366?


No. It is an attempt at better auto-handling of low-end devices, plus a -v 8 option to help with PulseFind tuning.
ID: 1767249 · Report as offensive
Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6324
Credit: 106,370,077
RAC: 121
Russia
Message 1767250 - Posted: 24 Feb 2016, 12:08:13 UTC - in response to Message 1767003.  
Last modified: 24 Feb 2016, 12:10:29 UTC


When I used -sbs 384 I got this: WARNING: can't open binary kernel file for oclFFT plan: C:\SETI\Data/projects/setiathome.berkeley.edu\MB_clFFTplan_GeForceGTX780_1024_gr256_lr16_wg256_tw0_ls512_bn16_cw16_r3366.bin_35330, continue with recompile...

Workunit 2069914747,
http://setiathome.berkeley.edu/result.php?resultid=4744742557

I have the Stderr output file saved. If you need it just ask. All my slots had this warning in the stderr.txt files. Changing -sbs to either 256 or 192 stopped the warnings. I'm using -sbs 192 for now.


"Together with" doesn't mean "because of".
-sbs 384 was used for one of your first tasks with this app, so that was when the binary cache of kernels was built.
By the time you switched to another -sbs value, the binary cache already existed, so no new warnings appeared.

mb_cmdline_win_x86_SSE3_OpenCL_NV.txt “current”
-no_cpu_lock -sbs 192 -instances_per_device 3 -period_iterations_num 40 -spike_fft_thresh 2048 -tune 1 64 1 4 -oclfft_tune_gr 256 -oclfft_tune_lr 16 -oclfft_tune_wg 256 -oclfft_tune_ls 512 -oclfft_tune_bn 16 -oclfft_tune_cw 16

Looks OK. The NV build doesn't have CPU lock enabled by default, so -no_cpu_lock is redundant in this case, but it should do no harm.
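
For anyone who wants to compare command-line files between hosts, a small Python sketch like the one below can split such a line into options and values. The option names come from the command line quoted above; the parsing rule (a token that starts with "-" followed by a non-digit is an option, and every other token is an integer value of the preceding option) is my own assumption and not how the app itself reads the file.

```python
def parse_cmdline(text):
    """Split an app command line into {option: [integer values]}.

    Assumption: values are plain non-negative integers, as in the
    command line quoted in this thread. Flags without values map to [].
    """
    options = {}
    current = None
    for token in text.split():
        if token.startswith("-") and not token[1:].isdigit():
            current = token
            options[current] = []
        elif current is not None:
            options[current].append(int(token))
    return options


if __name__ == "__main__":
    line = ("-no_cpu_lock -sbs 192 -instances_per_device 3 "
            "-period_iterations_num 40 -spike_fft_thresh 2048 "
            "-tune 1 64 1 4")
    for name, values in parse_cmdline(line).items():
        print(name, values)
```

With the options in a dictionary it is easy to spot a switch such as -no_cpu_lock that is present on one host and absent on another.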
ID: 1767250 · Report as offensive
Grumpy Swede (I stand with Ukraine)
Volunteer tester
Joined: 1 Nov 08
Posts: 8923
Credit: 49,849,242
RAC: 65
Sweden
Message 1767279 - Posted: 24 Feb 2016, 16:00:59 UTC - in response to Message 1767249.  
Last modified: 24 Feb 2016, 16:01:11 UTC

Are there any big differences/changes in this new build that would make it necessary for me to upgrade from SoG Build 3366?


No. It is an attempt at better auto-handling of low-end devices, plus a -v 8 option to help with PulseFind tuning.

Thank you Raistmer. Then I'll just continue as if nothing happened. :-)
ID: 1767279 · Report as offensive
Sleepy
Volunteer tester
Avatar

Send message
Joined: 21 May 99
Posts: 219
Credit: 98,947,784
RAC: 28,360
Italy
Message 1767308 - Posted: 24 Feb 2016, 18:53:40 UTC - in response to Message 1767247.  
Last modified: 24 Feb 2016, 18:53:52 UTC

Thank you Raistmer.
It really seems there is no fixed rule to get the right recipe.
I will then experiment a bit more to find the right balance!

Thank you again!

Stefano
ID: 1767308 · Report as offensive
Joe Januzzi
Volunteer tester
Joined: 13 Apr 03
Posts: 54
Credit: 307,134,110
RAC: 492
United States
Message 1767335 - Posted: 24 Feb 2016, 21:45:45 UTC

Raistmer,
Thanks so much for all the great work and support that you do for SETI.


The NV build doesn't have CPU lock enabled by default, so -no_cpu_lock is redundant in this case, but it should do no harm.


I removed the -no_cpu_lock from the file.

Thanks
Joe

Real Join Date:
Joe Januzzi (ID 253343) 29 Sep 1999, 22:30:36 UTC
Try to learn something new everyday.
ID: 1767335 · Report as offensive
Joe Januzzi
Volunteer tester
Joined: 13 Apr 03
Posts: 54
Credit: 307,134,110
RAC: 492
United States
Message 1767408 - Posted: 25 Feb 2016, 4:44:55 UTC - in response to Message 1767247.  

Well, -period_iterations_num N splits a single call into N separate calls, so it should "make life easier" for the GPU.

With -sbs N the situation is more complex.
It defines the maximum size of the memory buffer used to unroll periods in PulseFind. The more unrolling, the more items can be used in a single workgroup (especially important for low FFT values, which occur most often in VLAR tasks). On the other hand, too much unrolling can reduce the number of separate workgroups, leaving some CUs idle. To investigate real configs more deeply, I added a -v 8 switch in the new builds (I hope Eric will deploy them on Beta soon, maybe even today).

Running with -v 8 prints the PulseFind configs to stderr, so one can see how -sbs N affects the PulseFind geometry and whether the current heuristic works well.

New builds are available as standalone packs here: https://cloud.mail.ru/public/DMkN/x4BRCYuAV


FYI
I ran with the -v 8 switch. CPU usage is higher both with and without the switch. Here are some WUs from before I turned off the -v 8 switch. I think I will go back to Rev. #3366 if I can't tune Rev. #3381. I hope the data will give you some good info, because this is over my head. As always, I'm very grateful for any help.

Thanks
Joe

http://setiathome.berkeley.edu/workunit.php?wuid=2074646207
http://setiathome.berkeley.edu/workunit.php?wuid=2074646670
http://setiathome.berkeley.edu/workunit.php?wuid=2074646844

without switch:

http://setiathome.berkeley.edu/workunit.php?wuid=2074646604
http://setiathome.berkeley.edu/workunit.php?wuid=2074646641
http://setiathome.berkeley.edu/workunit.php?wuid=2074646616

mb_cmdline_win_x86_SSE3_OpenCL_NV.txt
-sbs 256 -instances_per_device 3 -period_iterations_num 40 -spike_fft_thresh 2048 -tune 1 64 1 4 -oclfft_tune_gr 256 -oclfft_tune_lr 16 -oclfft_tune_wg 256 -oclfft_tune_ls 512 -oclfft_tune_bn 16 -oclfft_tune_cw 16 -v 8
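
For reading -v 8 output like the stderr lines in this post, a short Python sketch such as the following can pull the geometry out of each "starting ..." line. The regular expression is written against the line format shown in this thread only. The check that WG num equals NDRange divided by WG size per dimension is my own reading of the posted lines, not something confirmed by the developer.

```python
import re
from math import prod

# Matches the geometry fields of a "-v 8" PulseFind line as posted here.
CONFIG_RE = re.compile(
    r"NDRange=\{(\d+),(\d+),(\d+)\}, WG=\{(\d+),(\d+),(\d+)\}"
    r".*?WG num=(\d+), CU num=(\d+)"
)


def parse_config(line):
    """Return the geometry fields of one stderr line, or None if absent."""
    match = CONFIG_RE.search(line)
    if match is None:
        return None
    nums = [int(group) for group in match.groups()]
    return {
        "ndrange": tuple(nums[0:3]),
        "wg": tuple(nums[3:6]),
        "wg_num": nums[6],
        "cu_num": nums[7],
    }


def expected_wg_num(cfg):
    """Workgroup count implied by NDRange / WG size (assumed relation)."""
    return prod(n // w for n, w in zip(cfg["ndrange"], cfg["wg"]))


if __name__ == "__main__":
    sample = ("starting PC_find_pulse_kernel_cl, pass 3; PulsePoTLen=480 "
              "NDRange={64,17,1}, WG={32,1,1},single_period_size=2.7MB, "
              "WG num=34, CU num=16")
    cfg = parse_config(sample)
    print(cfg, expected_wg_num(cfg) == cfg["wg_num"])
```

Running this over a whole stderr file and counting the distinct geometries makes it easier to compare two -sbs settings than scrolling through the raw output.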






17mr10af.25361.3343.4.31.218_1
http://setiathome.berkeley.edu/result.php?resultid=4751543823

Stderr output

<core_client_version>7.4.42</core_client_version>
<![CDATA[
<stderr_txt>
, pass 5; PulsePoTLen=960 NDRange={32,17,3}, WG={32,1,1},single_period_size=1.6MB, WG num=51, CU num=16
WARNING: suboptimal config for PC_find_pulse_partial_kernel_cl pass 5; WG size doesn't fill wave
starting PC_find_pulse_partial_kernel_cl, pass 5; PulsePoTLen=960 NDRange={32,17,3}, WG={32,1,1},single_period_size=1.6MB, WG num=51, CU num=16
WARNING: suboptimal config for PC_find_pulse_partial_kernel_cl pass 5; WG size doesn't fill wave
starting PC_find_pulse_partial_kernel_cl, pass 5; PulsePoTLen=960 NDRange={32,17,3}, WG={32,1,1},single_period_size=1.6MB, WG num=51, CU num=16
WARNING: suboptimal config for PC_find_pulse_kernel_cl pass 3; WG size doesn't fill wave
starting PC_find_pulse_kernel_cl, pass 3; PulsePoTLen=480 NDRange={64,17,1}, WG={32,1,1},single_period_size=2.7MB, WG num=34, CU num=16
WARNING: suboptimal config for PC_find_pulse_kernel_cl pass 4; WG size doesn't fill wave
starting PC_find_pulse_kernel_cl, pass 4; PulsePoTLen=480 NDRange={64,17,1}, WG={32,1,1},single_period_size=2MB, WG num=34, CU num=16
WARNING: suboptimal config for PC_find_pulse_kernel_cl pass 5; WG size doesn't fill wave
starting PC_find_pulse_kernel_cl, pass 5; PulsePoTLen=480 NDRange={64,17,1}, WG={32,1,1},single_period_size=1.6MB, WG num=34, CU num=16
WARNING: suboptimal config for PC_find_pulse_kernel_cl pass 3; WG size doesn't fill wave
starting PC_find_pulse_kernel_cl, pass 3; PulsePoTLen=240 NDRange={128,17,1}, WG={32,1,1},single_period_size=2.7MB, WG num=68, CU num=16
WARNING: suboptimal config for PC_find_pulse_kernel_cl pass 4; WG size doesn't fill wave
starting PC_find_pulse_kernel_cl, pass 4; PulsePoTLen=240 NDRange={128,17,1}, WG={32,1,1},single_period_size=2MB, WG num=68, CU num=16
WARNING: suboptimal config for PC_find_pulse_kernel_cl pass 5; WG size doesn't fill wave
starting PC_find_pulse_kernel_cl, pass 5; PulsePoTLen=240 NDRange={128,17,1}, WG={32,1,1},single_period_size=1.6MB, WG num=68, CU num=16
WARNING: suboptimal config for PC_find_pulse_kernel_cl pass 3; WG size doesn't fill wave
starting PC_find_pulse_kernel_cl, pass 3; PulsePoTLen=240 NDRange={128,17,1}, WG={32,1,1},single_period_size=2.7MB, WG num=68, CU num=16
WARNING: suboptimal config for PC_find_pulse_kernel_cl pass 4; WG size doesn't fill wave
starting PC_find_pulse_kernel_cl, pass 4; PulsePoTLen=240 NDRange={128,17,1}, WG={32,1,1},single_period_size=2MB, WG num=68, CU num=16
WARNING: suboptimal config for PC_find_pulse_kernel_cl pass 5; WG size doesn't fill wave
starting PC_find_pulse_kernel_cl, pass 5; PulsePoTLen=240 NDRange={128,17,1}, WG={32,1,1},single_period_size=1.6MB, WG num=68, CU num=16
WARNING: suboptimal config for PC_find_pulse_kernel_cl pass 3; WG size doesn't fill wave
starting PC_find_pulse_kernel_cl, pass 3; PulsePoTLen=240 NDRange={128,17,1}, WG={32,1,1},single_period_size=2.7MB, WG num=68, CU num=16
WARNING: suboptimal config for PC_find_pulse_kernel_cl pass 4; WG size doesn't fill wave
starting PC_find_pulse_kernel_cl, pass 4; PulsePoTLen=240 NDRange={128,17,1}, WG={32,1,1},single_period_size=2MB, WG num=68, CU num=16
WARNING: suboptimal config for PC_find_pulse_kernel_cl pass 5; WG size doesn't fill wave
starting PC_find_pulse_kernel_cl, pass 5; PulsePoTLen=240 NDRange={128,17,1}, WG={32,1,1},single_period_size=1.6MB, WG num=68, CU num=16
WARNING: suboptimal config for PC_find_pulse_kernel_cl pass 3; WG size doesn't fill wave
starting PC_find_pulse_kernel_cl, pass 3; PulsePoTLen=480 NDRange={64,17,1}, WG={32,1,1},single_period_size=2.7MB, WG num=34, CU num=16
WARNING: suboptimal config for PC_find_pulse_kernel_cl pass 4; WG size doesn't fill wave
starting PC_find_pulse_kernel_cl, pass 4; PulsePoTLen=480 NDRange={64,17,1}, WG={32,1,1},single_period_size=2MB, WG num=34, CU num=16
WARNING: suboptimal config for PC_find_pulse_kernel_cl pass 5; WG size doesn't fill wave
starting PC_find_pulse_kernel_cl, pass 5; PulsePoTLen=480 NDRange={64,17,1}, WG={32,1,1},single_period_size=1.6MB, WG num=34, CU num=16
WARNING: suboptimal config for PC_find_pulse_kernel_cl pass 3; WG size doesn't fill wave
starting PC_find_pulse_kernel_cl, pass 3; PulsePoTLen=240 NDRange={128,17,1}, WG={32,1,1},single_period_size=2.7MB, WG num=68, CU num=16
WARNING: suboptimal config for PC_find_pulse_kernel_cl pass 4; WG size doesn't fill wave
starting PC_find_pulse_kernel_cl, pass 4; PulsePoTLen=240 NDRange={128,17,1}, WG={32,1,1},single_period_size=2MB, WG num=68, CU num=16
WARNING: suboptimal config for PC_find_pulse_kernel_cl pass 5; WG size doesn't fill wave
starting PC_find_pulse_kernel_cl, pass 5; PulsePoTLen=240 NDRange={128,17,1}, WG={32,1,1},single_period_size=1.6MB, WG num=68, CU num=16
WARNING: suboptimal config for PC_find_pulse_kernel_cl pass 3; WG size doesn't fill wave
starting PC_find_pulse_kernel_cl, pass 3; PulsePoTLen=480 NDRange={64,17,1}, WG={32,1,1},single_period_size=2.7MB, WG num=34, CU num=16
WARNING: suboptimal config for PC_find_pulse_kernel_cl pass 4; WG size doesn't fill wave
starting PC_find_pulse_kernel_cl, pass 4; PulsePoTLen=480 NDRange={64,17,1}, WG={32,1,1},single_period_size=2MB, WG num=34, CU num=16
WARNING: suboptimal config for PC_find_pulse_kernel_cl pass 5; WG size doesn't fill wave
starting PC_find_pulse_kernel_cl, pass 5; PulsePoTLen=480 NDRange={64,17,1}, WG={32,1,1},single_period_size=1.6MB, WG num=34, CU num=16
WARNING: suboptimal config for PC_find_pulse_kernel_cl pass 3; WG size doesn't fill wave
starting PC_find_pulse_kernel_cl, pass 3; PulsePoTLen=240 NDRange={128,17,1}, WG={32,1,1},single_period_size=2.7MB, WG num=68, CU num=16
WARNING: suboptimal config for PC_find_pulse_kernel_cl pass 4; WG size doesn't fill wave
starting PC_find_pulse_kernel_cl, pass 4; PulsePoTLen=240 NDRange={128,17,1}, WG={32,1,1},single_period_size=2MB, WG num=68, CU num=16
WARNING: suboptimal config for PC_find_pulse_kernel_cl pass 5; WG size doesn't fill wave
starting PC_find_pulse_kernel_cl, pass 5; PulsePoTLen=240 NDRange={128,17,1}, WG={32,1,1},single_period_size=1.6MB, WG num=68, CU num=16
WARNING: suboptimal config for PC_find_pulse_kernel_cl pass 3; WG size doesn't fill wave
starting PC_find_pulse_kernel_cl, pass 3; PulsePoTLen=240 NDRange={128,17,1}, WG={32,1,1},single_period_size=2.7MB, WG num=68, CU num=16
WARNING: suboptimal config for PC_find_pulse_kernel_cl pass 4; WG size doesn't fill wave
starting PC_find_pulse_kernel_cl, pass 4; PulsePoTLen=240 NDRange={128,17,1}, WG={32,1,1},single_period_size=2MB, WG num=68, CU num=16
WARNING: suboptimal config for PC_find_pulse_kernel_cl pass 5; WG size doesn't fill wave
starting PC_find_pulse_kernel_cl, pass 5; PulsePoTLen=240 NDRange={128,17,1}, WG={32,1,1},single_period_size=1.6MB, WG num=68, CU num=16
WARNING: suboptimal config for PC_find_pulse_kernel_cl pass 3; WG size doesn't fill wave
starting PC_find_pulse_kernel_cl, pass 3; PulsePoTLen=240 NDRange={128,17,1}, WG={32,1,1},single_period_size=2.7MB, WG num=68, CU num=16
WARNING: suboptimal config for PC_find_pulse_kernel_cl pass 4; WG size doesn't fill wave
starting PC_find_pulse_kernel_cl, pass 4; PulsePoTLen=240 NDRange={128,17,1}, WG={32,1,1},single_period_size=2MB, WG num=68, CU num=16
WARNING: suboptimal config for PC_find_pulse_kernel_cl pass 5; WG size doesn't fill wave
starting PC_find_pulse_kernel_cl, pass 5; PulsePoTLen=240 NDRange={128,17,1}, WG={32,1,1},single_period_size=1.6MB, WG num=68, CU num=16
WARNING: suboptimal config for PC_find_pulse_partial_kernel_cl pass 3; WG size doesn't fill wave
starting PC_find_pulse_partial_kernel_cl, pass 3; PulsePoTLen=960 NDRange={32,17,3}, WG={32,1,1},single_period_size=2.7MB, WG num=51, CU num=16
WARNING: suboptimal config for PC_find_pulse_partial_kernel_cl pass 3; WG size doesn't fill wave
starting PC_find_pulse_partial_kernel_cl, pass 3; PulsePoTLen=960 NDRange={32,17,3}, WG={32,1,1},single_period_size=2.7MB, WG num=51, CU num=16
WARNING: suboptimal config for PC_find_pulse_partial_kernel_cl pass 3; WG size doesn't fill wave
starting PC_find_pulse_partial_kernel_cl, pass 3; PulsePoTLen=960 NDRange={32,17,3}, WG={32,1,1},single_period_size=2.7MB, WG num=51, CU num=16
WARNING: suboptimal config for PC_find_pulse_partial_kernel_cl pass 3; WG size doesn't fill wave
starting PC_find_pulse_partial_kernel_cl, pass 3; PulsePoTLen=960 NDRange={32,17,3}, WG={32,1,1},single_period_size=2.7MB, WG num=51, CU num=16
WARNING: suboptimal config for PC_find_pulse_partial_kernel_cl pass 3; WG size doesn't fill wave
starting PC_find_pulse_partial_kernel_cl, pass 3; PulsePoTLen=960 NDRange={32,17,3}, WG={32,1,1},single_period_size=2.7MB, WG num=51, CU num=16
WARNING: suboptimal config for PC_find_pulse_partial_kernel_cl pass 3; WG size doesn't fill wave
starting PC_find_pulse_partial_kernel_cl, pass 3; PulsePoTLen=960 NDRange={32,17,3}, WG={32,1,1},single_period_size=2.7MB, WG num=51, CU num=16
WARNING: suboptimal config for PC_find_pulse_partial_kernel_cl pass 3; WG size doesn't fill wave
starting PC_find_pulse_partial_kernel_cl, pass 3; PulsePoTLen=960 NDRange={32,17,3}, WG={32,1,1},single_period_size=2.7MB, WG num=51, CU num=16
WARNING: suboptimal config for PC_find_pulse_partial_kernel_cl pass 3; WG size doesn't fill wave
starting PC_find_pulse_partial_kernel_cl, pass 3; PulsePoTLen=960 NDRange={32,17,3}, WG={32,1,1},single_period_size=2.7MB, WG num=51, CU num=16
WARNING: suboptimal config for PC_find_pulse_partial_kernel_cl pass 3; WG size doesn't fill wave
starting PC_find_pulse_partial_kernel_cl, pass 3; PulsePoTLen=960 NDRange={32,17,3}, WG={32,1,1},single_period_size=2.7MB, WG num=51, CU num=16
WARNING: suboptimal config for PC_find_pulse_partial_kernel_cl pass 3; WG size doesn't fill wave
starting PC_find_pulse_partial_kernel_cl, pass 3; PulsePoTLen=960 NDRange={32,17,3}, WG={32,1,1},single_period_size=2.7MB, WG num=51, CU num=16
WARNING: suboptimal config for PC_find_pulse_partial_kernel_cl pass 3; WG size doesn't fill wave
starting PC_find_pulse_partial_kernel_cl, pass 3; PulsePoTLen=960 NDRange={32,17,3}, WG={32,1,1},single_period_size=2.7MB, WG num=51, CU num=16
WARNING: suboptimal config for PC_find_pulse_partial_kernel_cl pass 3; WG size doesn't fill wave
starting PC_find_pulse_partial_kernel_cl, pass 3; PulsePoTLen=960 NDRange={32,17,3}, WG={32,1,1},single_period_size=2.7MB, WG num=51, CU num=16
WARNING: suboptimal config for PC_find_pulse_partial_kernel_cl pass 3; WG size doesn't fill wave
starting PC_find_pulse_partial_kernel_cl, pass 3; PulsePoTLen=960 NDRange={32,17,3}, WG={32,1,1},single_period_size=2.7MB, WG num=51, CU num=16
WARNING: suboptimal config for PC_find_pulse_partial_kernel_cl pass 3; WG size doesn't fill wave
starting PC_find_pulse_partial_kernel_cl, pass 3; PulsePoTLen=960 NDRange={32,17,3}, WG={32,1,1},single_period_size=2.7MB, WG num=51, CU num=16
WARNING: suboptimal config for PC_find_pulse_partial_kernel_cl pass 3; WG size doesn't fill wave
starting PC_find_pulse_partial_kernel_cl, pass 3; PulsePoTLen=960 NDRange={32,17,3}, WG={32,1,1},single_period_size=2.7MB, WG num=51, CU num=16
WARNING: suboptimal config for PC_find_pulse_partial_kernel_cl pass 3; WG size doesn't fill wave
starting PC_find_pulse_partial_kernel_cl, pass 3; PulsePoTLen=960 NDRange={32,17,3}, WG={32,1,1},single_period_size=2.7MB, WG num=51, CU num=16
WARNING: suboptimal config for PC_find_pulse_partial_kernel_cl pass 3; WG size doesn't fill wave
starting PC_find_pulse_partial_kernel_cl, pass 3; PulsePoTLen=960 NDRange={32,17,3}, WG={32,1,1},single_period_size=2.7MB, WG num=51, CU num=16
WARNING: suboptimal config for PC_find_pulse_partial_kernel_cl pass 3; WG size doesn't fill wave
starting PC_find_pulse_partial_kernel_cl, pass 3; PulsePoTLen=960 NDRange={32,17,3}, WG={32,1,1},single_period_size=2.7MB, WG num=51, CU num=16
WARNING: suboptimal config for PC_find_pulse_partial_kernel_cl pass 3; WG size doesn't fill wave
starting PC_find_pulse_partial_kernel_cl, pass 3; PulsePoTLen=960 NDRange={32,17,3}, WG={32,1,1},single_period_size=2.7MB, WG num=51, CU num=16
WARNING: suboptimal config for PC_find_pulse_partial_kernel_cl pass 3; WG size doesn't fill wave
starting PC_find_pulse_partial_kernel_cl, pass 3; PulsePoTLen=960 NDRange={32,17,3}, WG={32,1,1},single_period_size=2.7MB, WG num=51, CU num=16
WARNING: suboptimal config for PC_find_pulse_partial_kernel_cl pass 3; WG size doesn't fill wave
starting PC_find_pulse_partial_kernel_cl, pass 3; PulsePoTLen=960 NDRange={32,17,3}, WG={32,1,1},single_period_size=2.7MB, WG num=51, CU num=16
WARNING: suboptimal config for PC_find_pulse_partial_kernel_cl pass 3; WG size doesn't fill wave
starting PC_find_pulse_partial_kernel_cl, pass 3; PulsePoTLen=960 NDRange={32,17,3}, WG={32,1,1},single_period_size=2.7MB, WG num=51, CU num=16
WARNING: suboptimal config for PC_find_pulse_partial_kernel_cl pass 3; WG size doesn't fill wave
starting PC_find_pulse_partial_kernel_cl, pass 3; PulsePoTLen=960 NDRange={32,17,3}, WG={32,1,1},single_period_size=2.7MB, WG num=51, CU num=16
WARNING: suboptimal config for PC_find_pulse_partial_kernel_cl pass 3; WG size doesn't fill wave
starting PC_find_pulse_partial_kernel_cl, pass 3; PulsePoTLen=960 NDRange={32,17,3}, WG={32,1,1},single_period_size=2.7MB, WG num=51, CU num=16
WARNING: suboptimal config for PC_find_pulse_partial_kernel_cl pass 3; WG size doesn't fill wave
starting PC_find_pulse_partial_kernel_cl, pass 3; PulsePoTLen=960 NDRange={32,17,3}, WG={32,1,1},single_period_size=2.7MB, WG num=51, CU num=16
WARNING: suboptimal config for PC_find_pulse_partial_kernel_cl pass 3; WG size doesn't fill wave
starting PC_find_pulse_partial_kernel_cl, pass 3; PulsePoTLen=960 NDRange={32,17,3}, WG={32,1,1},single_period_size=2.7MB, WG num=51, CU num=16
WARNING: suboptimal config for PC_find_pulse_partial_kernel_cl pass 3; WG size doesn't fill wave
starting PC_find_pulse_partial_kernel_cl, pass 3; PulsePoTLen=960 NDRange={32,17,3}, WG={32,1,1},single_period_size=2.7MB, WG num=51, CU num=16
WARNING: suboptimal config for PC_find_pulse_partial_kernel_cl pass 3; WG size doesn't fill wave
starting PC_find_pulse_partial_kernel_cl, pass 3; PulsePoTLen=960 NDRange={32,17,3}, WG={32,1,1},single_period_size=2.7MB, WG num=51, CU num=16
WARNING: suboptimal config for PC_find_pulse_partial_kernel_cl pass 3; WG size doesn't fill wave
starting PC_find_pulse_partial_kernel_cl, pass 3; PulsePoTLen=960 NDRange={32,17,3}, WG={32,1,1},single_period_size=2.7MB, WG num=51, CU num=16
WARNING: suboptimal config for PC_find_pulse_partial_kernel_cl pass 3; WG size doesn't fill wave
starting PC_find_pulse_partial_kernel_cl, pass 3; PulsePoTLen=960 NDRange={32,17,3}, WG={32,1,1},single_period_size=2.7MB, WG num=51, CU num=16
WARNING: suboptimal config for PC_find_pulse_partial_kernel_cl pass 3; WG size doesn't fill wave
starting PC_find_pulse_partial_kernel_cl, pass 3; PulsePoTLen=960 NDRange={32,17,3}, WG={32,1,1},single_period_size=2.7MB, WG num=51, CU num=16
WARNING: suboptimal config for PC_find_pulse_partial_kernel_cl pass 3; WG size doesn't fill wave
starting PC_find_pulse_partial_kernel_cl, pass 3; PulsePoTLen=960 NDRange={32,17,3}, WG={32,1,1},single_period_size=2.7MB, WG num=51, CU num=16
WARNING: suboptimal config for PC_find_pulse_partial_kernel_cl pass 3; WG size doesn't fill wave
starting PC_find_pulse_partial_kernel_cl, pass 3; PulsePoTLen=960 NDRange={32,17,3}, WG={32,1,1},single_period_size=2.7MB, WG num=51, CU num=16
WARNING: suboptimal config for PC_find_pulse_partial_kernel_cl pass 3; WG size doesn't fill wave
starting PC_find_pulse_partial_kernel_cl, pass 3; PulsePoTLen=960 NDRange={32,17,3}, WG={32,1,1},single_period_size=2.7MB, WG num=51, CU num=16
WARNING: suboptimal config for PC_find_pulse_partial_kernel_cl pass 3; WG size doesn't fill wave
starting PC_find_pulse_partial_kernel_cl, pass 3; PulsePoTLen=960 NDRange={32,17,3}, WG={32,1,1},single_period_size=2.7MB, WG num=51, CU num=16
WARNING: suboptimal config for PC_find_pulse_partial_kernel_cl pass 3; WG size doesn't fill wave
starting PC_find_pulse_partial_kernel_cl, pass 3; PulsePoTLen=960 NDRange={32,17,3}, WG={32,1,1},single_period_size=2.7MB, WG num=51, CU num=16
WARNING: suboptimal config for PC_find_pulse_partial_kernel_cl pass 3; WG size doesn't fill wave
starting PC_find_pulse_partial_kernel_cl, pass 3; PulsePoTLen=960 NDRange={32,17,3}, WG={32,1,1},single_period_size=2.7MB, WG num=51, CU num=16
WARNING: suboptimal config for PC_find_pulse_partial_kernel_cl pass 3; WG size doesn't fill wave
starting PC_find_pulse_partial_kernel_cl, pass 3; PulsePoTLen=960 NDRange={32,17,3}, WG={32,1,1},single_period_size=2.7MB, WG num=51, CU num=16
WARNING: suboptimal config for PC_find_pulse_partial_kernel_cl pass 3; WG size doesn't fill wave
starting PC_find_pulse_partial_kernel_cl, pass 3; PulsePoTLen=960 NDRange={32,17,3}, WG={32,1,1},single_period_size=2.7MB, WG num=51, CU num=16
WARNING: suboptimal config for PC_find_pulse_partial_kernel_cl pass 3; WG size doesn't fill wave
starting PC_find_pulse_partial_kernel_cl, pass 3; PulsePoTLen=960 NDRange={32,17,3}, WG={32,1,1},single_period_size=2.7MB, WG num=51, CU num=16
WARNING: suboptimal config for PC_find_pulse_partial_kernel_cl pass 4; WG size doesn't fill wave
starting PC_find_pulse_partial_kernel_cl, pass 4; PulsePoTLen=960 NDRange={32,17,3}, WG={32,1,1},single_period_size=2MB, WG num=51, CU num=16
WARNING: suboptimal config for PC_find_pulse_partial_kernel_cl pass 4; WG size doesn't fill wave
starting PC_find_pulse_partial_kernel_cl, pass 4; PulsePoTLen=960 NDRange={32,17,3}, WG={32,1,1},single_period_size=2MB, WG num=51, CU num=16
WARNING: suboptimal config for PC_find_pulse_partial_kernel_cl pass 4; WG size doesn't fill wave
starting PC_find_pulse_partial_kernel_cl, pass 4; PulsePoTLen=960 NDRange={32,17,3}, WG={32,1,1},single_period_size=2MB, WG num=51, CU num=16
WARNING: suboptimal config for PC_find_pulse_partial_kernel_cl pass 4; WG size doesn't fill wave
starting PC_find_pulse_partial_kernel_cl, pass 4; PulsePoTLen=960 NDRange={32,17,3}, WG={32,1,1},single_period_size=2MB, WG num=51, CU num=16
WARNING: suboptimal config for PC_find_pulse_partial_kernel_cl pass 4; WG size doesn't fill wave
starting PC_find_pulse_partial_kernel_cl, pass 4; PulsePoTLen=960 NDRange={32,17,3}, WG={32,1,1},single_period_size=2MB, WG num=51, CU num=16
WARNING: suboptimal config for PC_find_pulse_partial_kernel_cl pass 4; WG size doesn't fill wave
starting PC_find_pulse_partial_kernel_cl, pass 4; PulsePoTLen=960 NDRange={32,17,3}, WG={32,1,1},single_period_size=2MB, WG num=51, CU num=16
WARNING: suboptimal config for PC_find_pulse_partial_kernel_cl pass 4; WG size doesn't fill wave
starting PC_find_pulse_partial_kernel_cl, pass 4; PulsePoTLen=960 NDRange={32,17,3}, WG={32,1,1},single_period_size=2MB, WG num=51, CU num=16
WARNING: suboptimal config for PC_find_pulse_partial_kernel_cl pass 4; WG size doesn't fill wave
starting PC_find_pulse_partial_kernel_cl, pass 4; PulsePoTLen=960 NDRange={32,17,3}, WG={32,1,1},single_period_size=2MB, WG num=51, CU num=16
WARNING: suboptimal config for PC_find_pulse_partial_kernel_cl pass 4; WG size doesn't fill wave
starting PC_find_pulse_partial_kernel_cl, pass 4; PulsePoTLen=960 NDRange={32,17,3}, WG={32,1,1},single_period_size=2MB, WG num=51, CU num=16
WARNING: suboptimal config for PC_find_pulse_partial_kernel_cl pass 4; WG size doesn't fill wave
starting PC_find_pulse_partial_kernel_cl, pass 4; PulsePoTLen=960 NDRange={32,17,3}, WG={32,1,1},single_period_size=2MB, WG num=51, CU num=16
WARNING: suboptimal config for PC_find_pulse_partial_kernel_cl pass 4; WG size doesn't fill wave
starting PC_find_pulse_partial_kernel_cl, pass 4; PulsePoTLen=960 NDRange={32,17,3}, WG={32,1,1},single_period_size=2MB, WG num=51, CU num=16
WARNING: suboptimal config for PC_find_pulse_partial_kernel_cl pass 4; WG size doesn't fill wave
starting PC_find_pulse_partial_kernel_cl, pass 4; PulsePoTLen=960 NDRange={32,17,3}, WG={32,1,1},single_period_size=2MB, WG num=51, CU num=16
WARNING: suboptimal config for PC_find_pulse_partial_kernel_cl pass 4; WG size doesn't fill wave
starting PC_find_pulse_partial_kernel_cl, pass 4; PulsePoTLen=960 NDRange={32,17,3}, WG={32,1,1},single_period_size=2MB, WG num=51, CU num=16
WARNING: suboptimal config for PC_find_pulse_partial_kernel_cl pass 4; WG size doesn't fill wave
starting PC_find_pulse_partial_kernel_cl, pass 4; PulsePoTLen=960 NDRange={32,17,3}, WG={32,1,1},single_period_size=2MB, WG num=51, CU num=16
WARNING: suboptimal config for PC_find_pulse_partial_kernel_cl pass 4; WG size doesn't fill wave
starting PC_find_pulse_partial_kernel_cl, pass 4; PulsePoTLen=960 NDRange={32,17,3}, WG={32,1,1},single_period_size=2MB, WG num=51, CU num=16
WARNING: suboptimal config for PC_find_pulse_partial_kernel_cl pass 4; WG size doesn't fill wave
starting PC_find_pulse_partial_kernel_cl, pass 4; PulsePoTLen=960 NDRange={32,17,3}, WG={32,1,1},single_period_size=2MB, WG num=51, CU num=16
WARNING: suboptimal config for PC_find_pulse_partial_kernel_cl pass 4; WG size doesn't fill wave
starting PC_find_pulse_partial_kernel_cl, pass 4; PulsePoTLen=960 NDRange={32,17,3}, WG={32,1,1},single_period_size=2MB, WG num=51, CU num=16
WARNING: suboptimal config for PC_find_pulse_partial_kernel_cl pass 4; WG size doesn't fill wave
starting PC_find_pulse_partial_kernel_cl, pass 4; PulsePoTLen=960 NDRange={32,17,3}, WG={32,1,1},single_period_size=2MB, WG num=51, CU num=16
WARNING: suboptimal config for PC_find_pulse_partial_kernel_cl pass 4; WG size doesn't fill wave
starting PC_find_pulse_partial_kernel_cl, pass 4; PulsePoTLen=960 NDRange={32,17,3}, WG={32,1,1},single_period_size=2MB, WG num=51, CU num=16
WARNING: suboptimal config for PC_find_pulse_partial_kernel_cl pass 4; WG size doesn't fill wave
starting PC_find_pulse_partial_kernel_cl, pass 4; PulsePoTLen=960 NDRange={32,17,3}, WG={32,1,1},single_period_size=2MB, WG num=51, CU num=16
WARNING: suboptimal config for PC_find_pulse_partial_kernel_cl pass 4; WG size doesn't fill wave
starting PC_find_pulse_partial_kernel_cl, pass 4; PulsePoTLen=960 NDRange={32,17,3}, WG={32,1,1},single_period_size=2MB, WG num=51, CU num=16
WARNING: suboptimal config for PC_find_pulse_partial_kernel_cl pass 4; WG size doesn't fill wave
starting PC_find_pulse_partial_kernel_cl, pass 4; PulsePoTLen=960 NDRange={32,17,3}, WG={32,1,1},single_period_size=2MB, WG num=51, CU num=16
WARNING: suboptimal config for PC_find_pulse_partial_kernel_cl pass 4; WG size doesn't fill wave
starting PC_find_pulse_partial_kernel_cl, pass 4; PulsePoTLen=960 NDRange={32,17,3}, WG={32,1,1},single_period_size=2MB, WG num=51, CU num=16
WARNING: suboptimal config for PC_find_pulse_partial_kernel_cl pass 4; WG size doesn't fill wave
starting PC_find_pulse_partial_kernel_cl, pass 4; PulsePoTLen=960 NDRange={32,17,3}, WG={32,1,1},single_period_size=2MB, WG num=51, CU num=16
WARNING: suboptimal config for PC_find_pulse_partial_kernel_cl pass 4; WG size doesn't fill wave
starting PC_find_pulse_partial_kernel_cl, pass 4; PulsePoTLen=960 NDRange={32,17,3}, WG={32,1,1},single_period_size=2MB, WG num=51, CU num=16
WARNING: suboptimal config for PC_find_pulse_partial_kernel_cl pass 4; WG size doesn't fill wave
starting PC_find_pulse_partial_kernel_cl, pass 4; PulsePoTLen=960 NDRange={32,17,3}, WG={32,1,1},single_period_size=2MB, WG num=51, CU num=16
WARNING: suboptimal config for PC_find_pulse_partial_kernel_cl pass 4; WG size doesn't fill wave
starting PC_find_pulse_partial_kernel_cl, pass 4; PulsePoTLen=960 NDRange={32,17,3}, WG={32,1,1},single_period_size=2MB, WG num=51, CU num=16
WARNING: suboptimal config for PC_find_pulse_partial_kernel_cl pass 4; WG size doesn't fill wave
starting PC_find_pulse_partial_kernel_cl, pass 4; PulsePoTLen=960 NDRange={32,17,3}, WG={32,1,1},single_period_size=2MB, WG num=51, CU num=16
WARNING: suboptimal config for PC_find_pulse_partial_kernel_cl pass 4; WG size doesn't fill wave
starting PC_find_pulse_partial_kernel_cl, pass 4; PulsePoTLen=960 NDRange={32,17,3}, WG={32,1,1},single_period_size=2MB, WG num=51, CU num=16
WARNING: suboptimal config for PC_find_pulse_partial_kernel_cl pass 4; WG size doesn't fill wave
starting PC_find_pulse_partial_kernel_cl, pass 4; PulsePoTLen=960 NDRange={32,17,3}, WG={32,1,1},single_period_size=2MB, WG num=51, CU num=16
WARNING: suboptimal config for PC_find_pulse_partial_kernel_cl pass 4; WG size doesn't fill wave
starting PC_find_pulse_partial_kernel_cl, pass 4; PulsePoTLen=960 NDRange={32,17,3}, WG={32,1,1},single_period_size=2MB, WG num=51, CU num=16
WARNING: suboptimal config for PC_find_pulse_partial_kernel_cl pass 4; WG size doesn't fill wave
starting PC_find_pulse_partial_kernel_cl, pass 4; PulsePoTLen=960 NDRange={32,17,3}, WG={32,1,1},single_period_size=2MB, WG num=51, CU num=16
WARNING: suboptimal config for PC_find_pulse_partial_kernel_cl pass 4; WG size doesn't fill wave
starting PC_find_pulse_partial_kernel_cl, pass 4; PulsePoTLen=960 NDRange={32,17,3}, WG={32,1,1},single_period_size=2MB, WG num=51, CU num=16
WARNING: suboptimal config for PC_find_pulse_partial_kernel_cl pass 4; WG size doesn't fill wave
starting PC_find_pulse_partial_kernel_cl, pass 4; PulsePoTLen=960 NDRange={32,17,3}, WG={32,1,1},single_period_size=2MB, WG num=51, CU num=16
WARNING: suboptimal config for PC_find_pulse_partial_kernel_cl pass 4; WG size doesn't fill wave
starting PC_find_pulse_partial_kernel_cl, pass 4; PulsePoTLen=960 NDRange={32,17,3}, WG={32,1,1},single_period_size=2MB, WG num=51, CU num=16
WARNING: suboptimal config for PC_find_pulse_partial_kernel_cl pass 4; WG size doesn't fill wave
starting PC_find_pulse_partial_kernel_cl, pass 4; PulsePoTLen=960 NDRange={32,17,3}, WG={32,1,1},single_period_size=2MB, WG num=51, CU num=16
WARNING: suboptimal config for PC_find_pulse_partial_kernel_cl pass 5; WG size doesn't fill wave
starting PC_find_pulse_partial_kernel_cl, pass 5; PulsePoTLen=960 NDRange={32,17,3}, WG={32,1,1},single_period_size=1.6MB, WG num=51, CU num=16
WARNING: suboptimal config for PC_find_pulse_partial_kernel_cl pass 5; WG size doesn't fill wave
starting PC_find_pulse_partial_kernel_cl, pass 5; PulsePoTLen=960 NDRange={32,17,3}, WG={32,1,1},single_period_size=1.6MB, WG num=51, CU num=16
WARNING: suboptimal config for PC_find_pulse_partial_kernel_cl pass 5; WG size doesn't fill wave
starting PC_find_pulse_partial_kernel_cl, pass 5; PulsePoTLen=960 NDRange={32,17,3}, WG={32,1,1},single_period_size=1.6MB, WG num=51, CU num=16
WARNING: suboptimal config for PC_find_pulse_partial_kernel_cl pass 5; WG size doesn't fill wave
starting PC_find_pulse_partial_kernel_cl, pass 5; PulsePoTLen=960 NDRange={32,17,3}, WG={32,1,1},single_period_size=1.6MB, WG num=51, CU num=16
WARNING: suboptimal config for PC_find_pulse_partial_kernel_cl pass 5; WG size doesn't fill wave
starting PC_find_pulse_partial_kernel_cl, pass 5; PulsePoTLen=960 NDRange={32,17,3}, WG={32,1,1},single_period_size=1.6MB, WG num=51, CU num=16
WARNING: suboptimal config for PC_find_pulse_partial_kernel_cl pass 5; WG size doesn't fill wave
starting PC_find_pulse_partial_kernel_cl, pass 5; PulsePoTLen=960 NDRange={32,17,3}, WG={32,1,1},single_period_size=1.6MB, WG num=51, CU num=16
WARNING: suboptimal config for PC_find_pulse_partial_kernel_cl pass 5; WG size doesn't fill wave
starting PC_find_pulse_partial_kernel_cl, pass 5; PulsePoTLen=960 NDRange={32,17,3}, WG={32,1,1},single_period_size=1.6MB, WG num=51, CU num=16
WARNING: suboptimal config for PC_find_pulse_partial_kernel_cl pass 5; WG size doesn't fill wave
starting PC_find_pulse_partial_kernel_cl, pass 5; PulsePoTLen=960 NDRange={32,17,3}, WG={32,1,1},single_period_size=1.6MB, WG num=51, CU num=16
WARNING: suboptimal config for PC_find_pulse_partial_kernel_cl pass 5; WG size doesn't fill wave
starting PC_find_pulse_partial_kernel_cl, pass 5; PulsePoTLen=960 NDRange={32,17,3}, WG={32,1,1},single_period_size=1.6MB, WG num=51, CU num=16
WARNING: suboptimal config for PC_find_pulse_partial_kernel_cl pass 5; WG size doesn't fill wave
starting PC_find_pulse_partial_kernel_cl, pass 5; PulsePoTLen=960 NDRange={32,17,3}, WG={32,1,1},single_period_size=1.6MB, WG num=51, CU num=16
WARNING: suboptimal config for PC_find_pulse_partial_kernel_cl pass 5; WG size doesn't fill wave
starting PC_find_pulse_partial_kernel_cl, pass 5; PulsePoTLen=960 NDRange={32,17,3}, WG={32,1,1},single_period_size=1.6MB, WG num=51, CU num=16
WARNING: suboptimal config for PC_find_pulse_partial_kernel_cl pass 5; WG size doesn't fill wave
starting PC_find_pulse_partial_kernel_cl, pass 5; PulsePoTLen=960 NDRange={32,17,3}, WG={32,1,1},single_period_size=1.6MB, WG num=51, CU num=16
WARNING: suboptimal config for PC_find_pulse_partial_kernel_cl pass 5; WG size doesn't fill wave
starting PC_find_pulse_partial_kernel_cl, pass 5; PulsePoTLen=960 NDRange={32,17,3}, WG={32,1,1},single_period_size=1.6MB, WG num=51, CU num=16
WARNING: suboptimal config for PC_find_pulse_partial_kernel_cl pass 5; WG size doesn't fill wave
starting PC_find_pulse_partial_kernel_cl, pass 5; PulsePoTLen=960 NDRange={32,17,3}, WG={32,1,1},single_period_size=1.6MB, WG num=51, CU num=16
WARNING: suboptimal config for PC_find_pulse_partial_kernel_cl pass 5; WG size doesn't fill wave
starting PC_find_pulse_partial_kernel_cl, pass 5; PulsePoTLen=960 NDRange={32,17,3}, WG={32,1,1},single_period_size=1.6MB, WG num=51, CU num=16
WARNING: suboptimal config for PC_find_pulse_partial_kernel_cl pass 5; WG size doesn't fill wave
starting PC_find_pulse_partial_kernel_cl, pass 5; PulsePoTLen=960 NDRange={32,17,3}, WG={32,1,1},single_period_size=1.6MB, WG num=51, CU num=16
WARNING: suboptimal config for PC_find_pulse_partial_kernel_cl pass 5; WG size doesn't fill wave
starting PC_find_pulse_partial_kernel_cl, pass 5; PulsePoTLen=960 NDRange={32,17,3}, WG={32,1,1},single_period_size=1.6MB, WG num=51, CU num=16
WARNING: suboptimal config for PC_find_pulse_partial_kernel_cl pass 5; WG size doesn't fill wave
starting PC_find_pulse_partial_kernel_cl, pass 5; PulsePoTLen=960 NDRange={32,17,3}, WG={32,1,1},single_period_size=1.6MB, WG num=51, CU num=16
WARNING: suboptimal config for PC_find_pulse_partial_kernel_cl pass 5; WG size doesn't fill wave
starting PC_find_pulse_partial_kernel_cl, pass 5; PulsePoTLen=960 NDRange={32,17,3}, WG={32,1,1},single_period_size=1.6MB, WG num=51, CU num=16
WARNING: suboptimal config for PC_find_pulse_partial_kernel_cl pass 5; WG size doesn't fill wave
starting PC_find_pulse_partial_kernel_cl, pass 5; PulsePoTLen=960 NDRange={32,17,3}, WG={32,1,1},single_period_size=1.6MB, WG num=51, CU num=16
WARNING: suboptimal config for PC_find_pulse_partial_kernel_cl pass 5; WG size doesn't fill wave
starting PC_find_pulse_partial_kernel_cl, pass 5; PulsePoTLen=960 NDRange={32,17,3}, WG={32,1,1},single_period_size=1.6MB, WG num=51, CU num=16
WARNING: suboptimal config for PC_find_pulse_partial_kernel_cl pass 5; WG size doesn't fill wave
starting PC_find_pulse_partial_kernel_cl, pass 5; PulsePoTLen=960 NDRange={32,17,3}, WG={32,1,1},single_period_size=1.6MB, WG num=51, CU num=16
WARNING: suboptimal config for PC_find_pulse_partial_kernel_cl pass 5; WG size doesn't fill wave
starting PC_find_pulse_partial_kernel_cl, pass 5; PulsePoTLen=960 NDRange={32,17,3}, WG={32,1,1},single_period_size=1.6MB, WG num=51, CU num=16
WARNING: suboptimal config for PC_find_pulse_partial_kernel_cl pass 5; WG size doesn't fill wave
starting PC_find_pulse_partial_kernel_cl, pass 5; PulsePoTLen=960 NDRange={32,17,3}, WG={32,1,1},single_period_size=1.6MB, WG num=51, CU num=16
WARNING: suboptimal config for PC_find_pulse_partial_kernel_cl pass 5; WG size doesn't fill wave
starting PC_find_pulse_partial_kernel_cl, pass 5; PulsePoTLen=960 NDRange={32,17,3}, WG={32,1,1},single_period_size=1.6MB, WG num=51, CU num=16
WARNING: suboptimal config for PC_find_pulse_partial_kernel_cl pass 5; WG size doesn't fill wave
starting PC_find_pulse_partial_kernel_cl, pass 5; PulsePoTLen=960 NDRange={32,17,3}, WG={32,1,1},single_period_size=1.6MB, WG num=51, CU num=16
WARNING: suboptimal config for PC_find_pulse_partial_kernel_cl pass 5; WG size doesn't fill wave
starting PC_find_pulse_partial_kernel_cl, pass 5; PulsePoTLen=960 NDRange={32,17,3}, WG={32,1,1},single_period_size=1.6MB, WG num=51, CU num=16
WARNING: suboptimal config for PC_find_pulse_partial_kernel_cl pass 5; WG size doesn't fill wave
starting PC_find_pulse_partial_kernel_cl, pass 5; PulsePoTLen=960 NDRange={32,17,3}, WG={32,1,1},single_period_size=1.6MB, WG num=51, CU num=16
WARNING: suboptimal config for PC_find_pulse_partial_kernel_cl pass 5; WG size doesn't fill wave
starting PC_find_pulse_partial_kernel_cl, pass 5; PulsePoTLen=960 NDRange={32,17,3}, WG={32,1,1},single_period_size=1.6MB, WG num=51, CU num=16
WARNING: suboptimal config for PC_find_pulse_partial_kernel_cl pass 5; WG size doesn't fill wave
starting PC_find_pulse_partial_kernel_cl, pass 5; PulsePoTLen=960 NDRange={32,17,3}, WG={32,1,1},single_period_size=1.6MB, WG num=51, CU num=16
WARNING: suboptimal config for PC_find_pulse_partial_kernel_cl pass 5; WG size doesn't fill wave
starting PC_find_pulse_partial_kernel_cl, pass 5; PulsePoTLen=960 NDRange={32,17,3}, WG={32,1,1},single_period_size=1.6MB, WG num=51, CU num=16
WARNING: suboptimal config for PC_find_pulse_partial_kernel_cl pass 5; WG size doesn't fill wave
starting PC_find_pulse_partial_kernel_cl, pass 5; PulsePoTLen=960 NDRange={32,17,3}, WG={32,1,1},single_period_size=1.6MB, WG num=51, CU num=16
WARNING: suboptimal config for PC_find_pulse_kernel_cl pass 3; WG size doesn't fill wave
starting PC_find_pulse_kernel_cl, pass 3; PulsePoTLen=480 NDRange={64,17,1}, WG={32,1,1},single_period_size=2.7MB, WG num=34, CU num=16
WARNING: suboptimal config for PC_find_pulse_kernel_cl pass 4; WG size doesn't fill wave
starting PC_find_pulse_kernel_cl, pass 4; PulsePoTLen=480 NDRange={64,17,1}, WG={32,1,1},single_period_size=2MB, WG num=34, CU num=16
WARNING: suboptimal config for PC_find_pulse_kernel_cl pass 5; WG size doesn't fill wave
starting PC_find_pulse_kernel_cl, pass 5; PulsePoTLen=480 NDRange={64,17,1}, WG={32,1,1},single_period_size=1.6MB, WG num=34, CU num=16
WARNING: suboptimal config for PC_find_pulse_kernel_cl pass 3; WG size doesn't fill wave
starting PC_find_pulse_kernel_cl, pass 3; PulsePoTLen=240 NDRange={128,17,1}, WG={32,1,1},single_period_size=2.7MB, WG num=68, CU num=16
WARNING: suboptimal config for PC_find_pulse_kernel_cl pass 4; WG size doesn't fill wave
starting PC_find_pulse_kernel_cl, pass 4; PulsePoTLen=240 NDRange={128,17,1}, WG={32,1,1},single_period_size=2MB, WG num=68, CU num=16
WARNING: suboptimal config for PC_find_pulse_kernel_cl pass 5; WG size doesn't fill wave
starting PC_find_pulse_kernel_cl, pass 5; PulsePoTLen=240 NDRange={128,17,1}, WG={32,1,1},single_period_size=1.6MB, WG num=68, CU num=16
WARNING: suboptimal config for PC_find_pulse_partial_kernel_cl pass 3; WG size doesn't fill wave
starting PC_find_pulse_partial_kernel_cl, pass 3; PulsePoTLen=960 NDRange={32,17,3}, WG={32,1,1},single_period_size=2.7MB, WG num=51, CU num=16
WARNING: suboptimal config for PC_find_pulse_partial_kernel_cl pass 3; WG size doesn't fill wave
starting PC_find_pulse_partial_kernel_cl, pass 3; PulsePoTLen=960 NDRange={32,17,3}, WG={32,1,1},single_period_size=2.7MB, WG num=51, CU num=16
WARNING: suboptimal config for PC_find_pulse_partial_kernel_cl pass 3; WG size doesn't fill wave
starting PC_find_pulse_partial_kernel_cl, pass 3; PulsePoTLen=960 NDRange={32,17,3}, WG={32,1,1},single_period_size=2.7MB, WG num=51, CU num=16
WARNING: suboptimal config for PC_find_pulse_partial_kernel_cl pass 3; WG size doesn't fill wave
starting PC_find_pulse_partial_kernel_cl, pass 3; PulsePoTLen=960 NDRange={32,17,3}, WG={32,1,1},single_period_size=2.7MB, WG num=51, CU num=16
WARNING: suboptimal config for PC_find_pulse_partial_kernel_cl pass 3; WG size doesn't fill wave
starting PC_find_pulse_partial_kernel_cl, pass 3; PulsePoTLen=960 NDRange={32,17,3}, WG={32,1,1},single_period_size=2.7MB, WG num=51, CU num=16
WARNING: suboptimal config for PC_find_pulse_partial_kernel_cl pass 3; WG size doesn't fill wave
starting PC_find_pulse_partial_kernel_cl, pass 3; PulsePoTLen=960 NDRange={32,17,3}, WG={32,1,1},single_period_size=2.7MB, WG num=51, CU num=16
WARNING: suboptimal config for PC_find_pulse_partial_kernel_cl pass 3; WG size doesn't fill wave
starting PC_find_pulse_partial_kernel_cl, pass 3; PulsePoTLen=960 NDRange={32,17,3}, WG={32,1,1},single_period_size=2.7MB, WG num=51, CU num=16
WARNING: suboptimal config for PC_find_pulse_partial_kernel_cl pass 3; WG size doesn't fill wave
starting PC_find_pulse_partial_kernel_cl, pass 3; PulsePoTLen=960 NDRange={32,17,3}, WG={32,1,1},single_period_size=2.7MB, WG num=51, CU num=16
WARNING: suboptimal config for PC_find_pulse_partial_kernel_cl pass 3; WG size doesn't fill wave
starting PC_find_pulse_partial_kernel_cl, pass 3; PulsePoTLen=960 NDRange={32,17,3}, WG={32,1,1},single_period_size=2.7MB, WG num=51, CU num=16
WARNING: suboptimal config for PC_find_pulse_partial_kernel_cl pass 3; WG size doesn't fill wave
starting PC_find_pulse_partial_kernel_cl, pass 3; PulsePoTLen=960 NDRange={32,17,3}, WG={32,1,1},single_period_size=2.7MB, WG num=51, CU num=16
WARNING: suboptimal config for PC_find_pulse_partial_kernel_cl pass 3; WG size doesn't fill wave
starting PC_find_pulse_partial_kernel_cl, pass 3; PulsePoTLen=960 NDRange={32,17,3}, WG={32,1,1},single_period_size=2.7MB, WG num=51, CU num=16
WARNING: suboptimal config for PC_find_pulse_partial_kernel_cl pass 3; WG size doesn't fill wave
starting PC_find_pulse_partial_kernel_cl, pass 3; PulsePoTLen=960 NDRange={32,17,3}, WG={32,1,1},single_period_size=2.7MB, WG num=51, CU num=16
WARNING: suboptimal config for PC_find_pulse_partial_kernel_cl pass 3; WG size doesn't fill wave
starting PC_find_pulse_partial_kernel_cl, pass 3; PulsePoTLen=960 NDRange={32,17,3}, WG={32,1,1},single_period_size=2.7MB, WG num=51, CU num=16
WARNING: suboptimal config for PC_find_pulse_partial_kernel_cl pass 3; WG size doesn't fill wave
starting PC_find_pulse_partial_kernel_cl, pass 3; PulsePoTLen=960 NDRange={32,17,3}, WG={32,1,1},single_period_size=2.7MB, WG num=51, CU num=16
WARNING: suboptimal config for PC_find_pulse_partial_kernel_cl pass 3; WG size doesn't fill wave
starting PC_find_pulse_partial_kernel_cl, pass 3; PulsePoTLen=960 NDRange={32,17,3}, WG={32,1,1},single_period_size=2.7MB, WG num=51, CU num=16
WARNING: suboptimal config for PC_find_pulse_partial_kernel_cl pass 3; WG size doesn't fill wave
starting PC_find_pulse_partial_kernel_cl, pass 3; PulsePoTLen=960 NDRange={32,17,3}, WG={32,1,1},single_period_size=2.7MB, WG num=51, CU num=16
WARNING: suboptimal config for PC_find_pulse_partial_kernel_cl pass 3; WG size doesn't fill wave
starting PC_find_pulse_partial_kernel_cl, pass 3; PulsePoTLen=960 NDRange={32,17,3}, WG={32,1,1},single_period_size=2.7MB, WG num=51, CU num=16
WARNING: suboptimal config for PC_find_pulse_partial_kernel_cl pass 3; WG size doesn't fill wave
starting PC_find_pulse_partial_kernel_cl, pass 3; PulsePoTLen=960 NDRange={32,17,3}, WG={32,1,1},single_period_size=2.7MB, WG num=51, CU num=16
WARNING: suboptimal config for PC_find_pulse_partial_kernel_cl pass 3; WG size doesn't fill wave
starting PC_find_pulse_partial_kernel_cl, pass 3; PulsePoTLen=960 NDRange={32,17,3}, WG={32,1,1},single_period_size=2.7MB, WG num=51, CU num=16
WARNING: suboptimal config for PC_find_pulse_partial_kernel_cl pass 3; WG size doesn't fill wave
starting PC_find_pulse_partial_kernel_cl, pass 3; PulsePoTLen=960 NDRange={32,17,3}, WG={32,1,1},single_period_size=2.7MB, WG num=51, CU num=16
WARNING: suboptimal config for PC_find_pulse_partial_kernel_cl pass 3; WG size doesn't fill wave
starting PC_find_pulse_partial_kernel_cl, pass 3; PulsePoTLen=960 NDRange={32,17,3}, WG={32,1,1},single_period_size=2.7MB, WG num=51, CU num=16
WARNING: suboptimal config for PC_find_pulse_partial_kernel_cl pass 3; WG size doesn't fill wave
starting PC_find_pulse_partial_kernel_cl, pass 3; PulsePoTLen=960 NDRange={32,17,3}, WG={32,1,1},single_period_size=2.7MB, WG num=51, CU num=16
WARNING: suboptimal config for PC_find_pulse_partial_kernel_cl pass 3; WG size doesn't fill wave
starting PC_find_pulse_partial_kernel_cl, pass 3; PulsePoTLen=960 NDRange={32,17,3}, WG={32,1,1},single_period_size=2.7MB, WG num=51, CU num=16
WARNING: suboptimal config for PC_find_pulse_partial_kernel_cl pass 3; WG size doesn't fill wave
starting PC_find_pulse_partial_kernel_cl, pass 3; PulsePoTLen=960 NDRange={32,17,3}, WG={32,1,1},single_period_size=2.7MB, WG num=51, CU num=16
WARNING: suboptimal config for PC_find_pulse_partial_kernel_cl pass 3; WG size doesn't fill wave
starting PC_find_pulse_partial_kernel_cl, pass 3; PulsePoTLen=960 NDRange={32,17,3}, WG={32,1,1},single_period_size=2.7MB, WG num=51, CU num=16
WARNING: suboptimal config for PC_find_pulse_partial_kernel_cl pass 3; WG size doesn't fill wave
starting PC_find_pulse_partial_kernel_cl, pass 3; PulsePoTLen=960 NDRange={32,17,3}, WG={32,1,1},single_period_size=2.7MB, WG num=51, CU num=16
WARNING: suboptimal config for PC_find_pulse_partial_kernel_cl pass 3; WG size doesn't fill wave
starting PC_find_pulse_partial_kernel_cl, pass 3; PulsePoTLen=960 NDRange={32,17,3}, WG={32,1,1},single_period_size=2.7MB, WG num=51, CU num=16
WARNING: suboptimal config for PC_find_pulse_partial_kernel_cl pass 3; WG size doesn't fill wave
starting PC_find_pulse_partial_kernel_cl, pass 3; PulsePoTLen=960 NDRange={32,17,3}, WG={32,1,1},single_period_size=2.7MB, WG num=51, CU num=16
WARNING: suboptimal config for PC_find_pulse_partial_kernel_cl pass 3; WG size doesn't fill wave
starting PC_find_pulse_partial_kernel_cl, pass 3; PulsePoTLen=960 NDRange={32,17,3}, WG={32,1,1},single_period_size=2.7MB, WG num=51, CU num=16
WARNING: suboptimal config for PC_find_pulse_partial_kernel_cl pass 3; WG size doesn't fill wave
starting PC_find_pulse_partial_kernel_cl, pass 3; PulsePoTLen=960 NDRange={32,17,3}, WG={32,1,1},single_period_size=2.7MB, WG num=51, CU num=16
WARNING: suboptimal config for PC_find_pulse_partial_kernel_cl pass 3; WG size doesn't fill wave
starting PC_find_pulse_partial_kernel_cl, pass 3; PulsePoTLen=960 NDRange={32,17,3}, WG={32,1,1},single_period_size=2.7MB, WG num=51, CU num=16
WARNING: suboptimal config for PC_find_pulse_partial_kernel_cl pass 3; WG size doesn't fill wave
starting PC_find_pulse_partial_kernel_cl, pass 3; PulsePoTLen=960 NDRange={32,17,3}, WG={32,1,1},single_period_size=2.7MB, WG num=51, CU num=16
WARNING: suboptimal config for PC_find_pulse_partial_kernel_cl pass 3; WG size doesn't fill wave
starting PC_find_pulse_partial_kernel_cl, pass 3; PulsePoTLen=960 NDRange={32,17,3}, WG={32,1,1},single_period_size=2.7MB, WG num=51, CU num=16
WARNING: suboptimal config for PC_find_pulse_partial_kernel_cl pass 3; WG size doesn't fill wave
starting PC_find_pulse_partial_kernel_cl, pass 3; PulsePoTLen=960 NDRange={32,17,3}, WG={32,1,1},single_period_size=2.7MB, WG num=51, CU num=16
WARNING: suboptimal config for PC_find_pulse_partial_kernel_cl pass 3; WG size doesn't fill wave
starting PC_find_pulse_partial_kernel_cl, pass 3; PulsePoTLen=960 NDRange={32,17,3}, WG={32,1,1},single_period_size=2.7MB, WG num=51, CU num=16
WARNING: suboptimal config for PC_find_pulse_partial_kernel_cl pass 3; WG size doesn't fill wave
starting PC_find_pulse_partial_kernel_cl, pass 3; PulsePoTLen=960 NDRange={32,17,3}, WG={32,1,1},single_period_size=2.7MB, WG num=51, CU num=16
WARNING: suboptimal config for PC_find_pulse_partial_kernel_cl pass 3; WG size doesn't fill wave
starting PC_find_pulse_partial_kernel_cl, pass 3; PulsePoTLen=960 NDRange={32,17,3}, WG={32,1,1},single_period_size=2.7MB, WG num=51, CU num=16
WARNING: suboptimal config for PC_find_pulse_partial_kernel_cl pass 3; WG size doesn't fill wave
starting PC_find_pulse_partial_kernel_cl, pass 3; PulsePoTLen=960 NDRange={32,17,3}, WG={32,1,1},single_period_size=2.7MB, WG num=51, CU num=16
WARNING: suboptimal config for PC_find_pulse_partial_kernel_cl pass 3; WG size doesn't fill wave
starting PC_find_pulse_partial_kernel_cl, pass 3; PulsePoTLen=960 NDRange={32,17,3}, WG={32,1,1},single_period_size=2.7MB, WG num=51, CU num=16
WARNING: suboptimal config for PC_find_pulse_partial_kernel_cl pass 3; WG size doesn't fill wave
starting PC_find_pulse_partial_kernel_cl, pass 3; PulsePoTLen=960 NDRange={32,17,3}, WG={32,1,1},single_period_size=2.7MB, WG num=51, CU num=16
WARNING: suboptimal config for PC_find_pulse_partial_kernel_cl pass 4; WG size doesn't fill wave
starting PC_find_pulse_partial_kernel_cl, pass 4; PulsePoTLen=960 NDRange={32,17,3}, WG={32,1,1},single_period_size=2MB, WG num=51, CU num=16
WARNING: suboptimal config for PC_find_pulse_partial_kernel_cl pass 4; WG size doesn't fill wave
starting PC_find_pulse_partial_kernel_cl, pass 4; PulsePoTLen=960 NDRange={32,17,3}, WG={32,1,1},single_period_size=2MB, WG num=51, CU num=16
WARNING: suboptimal config for PC_find_pulse_partial_kernel_cl pass 4; WG size doesn't fill wave
starting PC_find_pulse_partial_kernel_cl, pass 4; PulsePoTLen=960 NDRange={32,17,3}, WG={32,1,1},single_period_size=2MB, WG num=51, CU num=16
WARNING: suboptimal config for PC_find_pulse_partial_kernel_cl pass 4; WG size doesn't fill wave
starting PC_find_pulse_partial_kernel_cl, pass 4; PulsePoTLen=960 NDRange={32,17,3}, WG={32,1,1},single_period_size=2MB, WG num=51, CU num=16
WARNING: suboptimal config for PC_find_pulse_partial_kernel_cl pass 4; WG size doesn't fill wave
starting PC_find_pulse_partial_kernel_cl, pass 4; PulsePoTLen=960 NDRange={32,17,3}, WG={32,1,1},single_period_size=2MB, WG num=51, CU num=16
WARNING: suboptimal config for PC_find_pulse_partial_kernel_cl pass 4; WG size doesn't fill wave
starting PC_find_pulse_partial_kernel_cl, pass 4; PulsePoTLen=960 NDRange={32,17,3}, WG={32,1,1},single_period_size=2MB, WG num=51, CU num=16
WARNING: suboptimal config for PC_find_pulse_partial_kernel_cl pass 4; WG size doesn't fill wave
starting PC_find_pulse_partial_kernel_cl, pass 4; PulsePoTLen=960 NDRange={32,17,3}, WG={32,1,1},single_period_size=2MB, WG num=51, CU num=16
WARNING: suboptimal config for PC_find_pulse_partial_kernel_cl pass 4; WG size doesn't fill wave
starting PC_find_pulse_partial_kernel_cl, pass 4; PulsePoTLen=960 NDRange={32,17,3}, WG={32,1,1},single_period_size=2MB, WG num=51, CU num=16
WARNING: suboptimal config for PC_find_pulse_partial_kernel_cl pass 4; WG size doesn't fill wave
starting PC_find_pulse_partial_kernel_cl, pass 4; PulsePoTLen=960 NDRange={32,17,3}, WG={32,1,1},single_period_size=2MB, WG num=51, CU num=16
WARNING: suboptimal config for PC_find_pulse_partial_kernel_cl pass 4; WG size doesn't fill wave
starting PC_find_pulse_partial_kernel_cl, pass 4; PulsePoTLen=960 NDRange={32,17,3}, WG={32,1,1},single_period_size=2MB, WG num=51, CU num=16
WARNING: suboptimal config for PC_find_pulse_partial_kernel_cl pass 4; WG size doesn't fill wave
starting PC_find_pulse_partial_kernel_cl, pass 4; PulsePoTLen=960 NDRange={32,17,3}, WG={32,1,1},single_period_size=2MB, WG num=51, CU num=16
WARNING: suboptimal config for PC_find_pulse_partial_kernel_cl pass 4; WG size doesn't fill wave
starting PC_find_pulse_partial_kernel_cl, pass 4; PulsePoTLen=960 NDRange={32,17,3}, WG={32,1,1},single_period_size=2MB, WG num=51, CU num=16
WARNING: suboptimal config for PC_find_pulse_partial_kernel_cl pass 4; WG size doesn't fill wave
starting PC_find_pulse_partial_kernel_cl, pass 4; PulsePoTLen=960 NDRange={32,17,3}, WG={32,1,1},single_period_size=2MB, WG num=51, CU num=16
WARNING: suboptimal config for PC_find_pulse_partial_kernel_cl pass 4; WG size doesn't fill wave
starting PC_find_pulse_partial_kernel_cl, pass 4; PulsePoTLen=960 NDRange={32,17,3}, WG={32,1,1},single_period_size=2MB, WG num=51, CU num=16
WARNING: suboptimal config for PC_find_pulse_partial_kernel_cl pass 4; WG size doesn't fill wave
starting PC_find_pulse_partial_kernel_cl, pass 4; PulsePoTLen=960 NDRange={32,17,3}, WG={32,1,1},single_period_size=2MB, WG num=51, CU num=16
WARNING: suboptimal config for PC_find_pulse_partial_kernel_cl pass 4; WG size doesn't fill wave
starting PC_find_pulse_partial_kernel_cl, pass 4; PulsePoTLen=960 NDRange={32,17,3}, WG={32,1,1},single_period_size=2MB, WG num=51, CU num=16
WARNING: suboptimal config for PC_find_pulse_partial_kernel_cl pass 4; WG size doesn't fill wave
starting PC_find_pulse_partial_kernel_cl, pass 4; PulsePoTLen=960 NDRange={32,17,3}, WG={32,1,1},single_period_size=2MB, WG num=51, CU num=16
WARNING: suboptimal config for PC_find_pulse_partial_kernel_cl pass 4; WG size doesn't fill wave
starting PC_find_pulse_partial_kernel_cl, pass 4; PulsePoTLen=960 NDRange={32,17,3}, WG={32,1,1},single_period_size=2MB, WG num=51, CU num=16
WARNING: suboptimal config for PC_find_pulse_partial_kernel_cl pass 4; WG size doesn't fill wave
starting PC_find_pulse_partial_kernel_cl, pass 4; PulsePoTLen=960 NDRange={32,17,3}, WG={32,1,1},single_period_size=2MB, WG num=51, CU num=16
WARNING: suboptimal config for PC_find_pulse_partial_kernel_cl pass 4; WG size doesn't fill wave
starting PC_find_pulse_partial_kernel_cl, pass 4; PulsePoTLen=960 NDRange={32,17,3}, WG={32,1,1},single_period_size=2MB, WG num=51, CU num=16
WARNING: suboptimal config for PC_find_pulse_partial_kernel_cl pass 4; WG size doesn't fill wave
starting PC_find_pulse_partial_kernel_cl, pass 4; PulsePoTLen=960 NDRange={32,17,3}, WG={32,1,1},single_period_size=2MB, WG num=51, CU num=16
WARNING: suboptimal config for PC_find_pulse_partial_kernel_cl pass 4; WG size doesn't fill wave
starting PC_find_pulse_partial_kernel_cl, pass 4; PulsePoTLen=960 NDRange={32,17,3}, WG={32,1,1},single_period_size=2MB, WG num=51, CU num=16
WARNING: suboptimal config for PC_find_pulse_partial_kernel_cl pass 4; WG size doesn't fill wave
starting PC_find_pulse_partial_kernel_cl, pass 4; PulsePoTLen=960 NDRange={32,17,3}, WG={32,1,1},single_period_size=2MB, WG num=51, CU num=16
WARNING: suboptimal config for PC_find_pulse_partial_kernel_cl pass 4; WG size doesn't fill wave
starting PC_find_pulse_partial_kernel_cl, pass 4; PulsePoTLen=960 NDRange={32,17,3}, WG={32,1,1},single_period_size=2MB, WG num=51, CU num=16
WARNING: suboptimal config for PC_find_pulse_partial_kernel_cl pass 4; WG size doesn't fill wave
starting PC_find_pulse_partial_kernel_cl, pass 4; PulsePoTLen=960 NDRange={32,17,3}, WG={32,1,1},single_period_size=2MB, WG num=51, CU num=16
WARNING: suboptimal config for PC_find_pulse_partial_kernel_cl pass 4; WG size doesn't fill wave
starting PC_find_pulse_partial_kernel_cl, pass 4; PulsePoTLen=960 NDRange={32,17,3}, WG={32,1,1},single_period_size=2MB, WG num=51, CU num=16
WARNING: suboptimal config for PC_find_pulse_partial_kernel_cl pass 4; WG size doesn't fill wave
starting PC_find_pulse_partial_kernel_cl, pass 4; PulsePoTLen=960 NDRange={32,17,3}, WG={32,1,1},single_period_size=2MB, WG num=51, CU num=16
WARNING: suboptimal config for PC_find_pulse_partial_kernel_cl pass 4; WG size doesn't fill wave
starting PC_find_pulse_partial_kernel_cl, pass 4; PulsePoTLen=960 NDRange={32,17,3}, WG={32,1,1},single_period_size=2MB, WG num=51, CU num=16
WARNING: suboptimal config for PC_find_pulse_partial_kernel_cl pass 4; WG size doesn't fill wave
starting PC_find_pulse_partial_kernel_cl, pass 4; PulsePoTLen=960 NDRange={32,17,3}, WG={32,1,1},single_period_size=2MB, WG num=51, CU num=16
WARNING: suboptimal config for PC_find_pulse_partial_kernel_cl pass 4; WG size doesn't fill wave
starting PC_find_pulse_partial_kernel_cl, pass 4; PulsePoTLen=960 NDRange={32,17,3}, WG={32,1,1},single_period_size=2MB, WG num=51, CU num=16
WARNING: suboptimal config for PC_find_pulse_partial_kernel_cl pass 4; WG size doesn't fill wave
starting PC_find_pulse_partial_kernel_cl, pass 4; PulsePoTLen=960 NDRange={32,17,3}, WG={32,1,1},single_period_size=2MB, WG num=51, CU num=16
WARNING: suboptimal config for PC_find_pulse_partial_kernel_cl pass 4; WG size doesn't fill wave
starting PC_find_pulse_partial_kernel_cl, pass 4; PulsePoTLen=960 NDRange={32,17,3}, WG={32,1,1},single_period_size=2MB, WG num=51, CU num=16
WARNING: suboptimal config for PC_find_pulse_partial_kernel_cl pass 4; WG size doesn't fill wave
starting PC_find_pulse_partial_kernel_cl, pass 4; PulsePoTLen=960 NDRange={32,17,3}, WG={32,1,1},single_period_size=2MB, WG num=51, CU num=16
WARNING: suboptimal config for PC_find_pulse_partial_kernel_cl pass 4; WG size doesn't fill wave
starting PC_find_pulse_partial_kernel_cl, pass 4; PulsePoTLen=960 NDRange={32,17,3}, WG={32,1,1},single_period_size=2MB, WG num=51, CU num=16
WARNING: suboptimal config for PC_find_pulse_partial_kernel_cl pass 4; WG size doesn't fill wave
starting PC_find_pulse_partial_kernel_cl, pass 4; PulsePoTLen=960 NDRange={32,17,3}, WG={32,1,1},single_period_size=2MB, WG num=51, CU num=16
WARNING: suboptimal config for PC_find_pulse_partial_kernel_cl pass 4; WG size doesn't fill wave
starting PC_find_pulse_partial_kernel_cl, pass 4; PulsePoTLen=960 NDRange={32,17,3}, WG={32,1,1},single_period_size=2MB, WG num=51, CU num=16
WARNING: suboptimal config for PC_find_pulse_partial_kernel_cl pass 5; WG size doesn't fill wave
starting PC_find_pulse_partial_kernel_cl, pass 5; PulsePoTLen=960 NDRange={32,17,3}, WG={32,1,1},single_period_size=1.6MB, WG num=51, CU num=16
WARNING: suboptimal config for PC_find_pulse_partial_kernel_cl pass 5; WG size doesn't fill wave
starting PC_find_pulse_partial_kernel_cl, pass 5; PulsePoTLen=960 NDRange={32,17,3}, WG={32,1,1},single_period_size=1.6MB, WG num=51, CU num=16
WARNING: suboptimal config for PC_find_pulse_partial_kernel_cl pass 5; WG size doesn't fill wave
starting PC_find_pulse_partial_kernel_cl, pass 5; PulsePoTLen=960 NDRange={32,17,3}, WG={32,1,1},single_period_size=1.6MB, WG num=51, CU num=16
WARNING: suboptimal config for PC_find_pulse_partial_kernel_cl pass 5; WG size doesn't fill wave
starting PC_find_pulse_partial_kernel_cl, pass 5; PulsePoTLen=960 NDRange={32,17,3}, WG={32,1,1},single_period_size=1.6MB, WG num=51, CU num=16
WARNING: suboptimal config for PC_find_pulse_partial_kernel_cl pass 5; WG size doesn't fill wave
starting PC_find_pulse_partial_kernel_cl, pass 5; PulsePoTLen=960 NDRange={32,17,3}, WG={32,1,1},single_period_size=1.6MB, WG num=51, CU num=16
WARNING: suboptimal config for PC_find_pulse_partial_kernel_cl pass 5; WG size doesn't fill wave
starting PC_find_pulse_partial_kernel_cl, pass 5; PulsePoTLen=960 NDRange={32,17,3}, WG={32,1,1},single_period_size=1.6MB, WG num=51, CU num=16
WARNING: suboptimal config for PC_find_pulse_partial_kernel_cl pass 5; WG size doesn't fill wave
starting PC_find_pulse_partial_kernel_cl, pass 5; PulsePoTLen=960 NDRange={32,17,3}, WG={32,1,1},single_period_size=1.6MB, WG num=51, CU num=16
WARNING: suboptimal config for PC_find_pulse_partial_kernel_cl pass 5; WG size doesn't fill wave
starting PC_find_pulse_partial_kernel_cl, pass 5; PulsePoTLen=960 NDRange={32,17,3}, WG={32,1,1},single_period_size=1.6MB, WG num=51, CU num=16
WARNING: suboptimal config for PC_find_pulse_partial_kernel_cl pass 5; WG size doesn't fill wave
starting PC_find_pulse_partial_kernel_cl, pass 5; PulsePoTLen=960 NDRange={32,17,3}, WG={32,1,1},single_period_size=1.6MB, WG num=51, CU num=16
WARNING: suboptimal config for PC_find_pulse_partial_kernel_cl pass 5; WG size doesn't fill wave
starting PC_find_pulse_partial_kernel_cl, pass 5; PulsePoTLen=960 NDRange={32,17,3}, WG={32,1,1},single_period_size=1.6MB, WG num=51, CU num=16
WARNING: suboptimal config for PC_find_pulse_partial_kernel_cl pass 5; WG size doesn't fill wave
starting PC_find_pulse_partial_kernel_cl, pass 5; PulsePoTLen=960 NDRange={32,17,3}, WG={32,1,1},single_period_size=1.6MB, WG num=51, CU num=16
WARNING: suboptimal config for PC_find_pulse_partial_kernel_cl pass 5; WG size doesn't fill wave
starting PC_find_pulse_partial_kernel_cl, pass 5; PulsePoTLen=960 NDRange={32,17,3}, WG={32,1,1},single_period_size=1.6MB, WG num=51, CU num=16
WARNING: suboptimal config for PC_find_pulse_partial_kernel_cl pass 5; WG size doesn't fill wave
starting PC_find_pulse_partial_kernel_cl, pass 5; PulsePoTLen=960 NDRange={32,17,3}, WG={32,1,1},single_period_size=1.6MB, WG num=51, CU num=16
WARNING: suboptimal config for PC_find_pulse_partial_kernel_cl pass 5; WG size doesn't fill wave
starting PC_find_pulse_partial_kernel_cl, pass 5; PulsePoTLen=960 NDRange={32,17,3}, WG={32,1,1},single_period_size=1.6MB, WG num=51, CU num=16
WARNING: suboptimal config for PC_find_pulse_partial_kernel_cl pass 5; WG size doesn't fill wave
starting PC_find_pulse_partial_kernel_cl, pass 5; PulsePoTLen=960 NDRange={32,17,3}, WG={32,1,1},single_period_size=1.6MB, WG num=51, CU num=16
WARNING: suboptimal config for PC_find_pulse_partial_kernel_cl pass 5; WG size doesn't fill wave
starting PC_find_pulse_partial_kernel_cl, pass 5; PulsePoTLen=960 NDRange={32,17,3}, WG={32,1,1},single_period_size=1.6MB, WG num=51, CU num=16
WARNING: suboptimal config for PC_find_pulse_partial_kernel_cl pass 5; WG size doesn't fill wave
starting PC_find_pulse_partial_kernel_cl, pass 5; PulsePoTLen=960 NDRange={32,17,3}, WG={32,1,1},single_period_size=1.6MB, WG num=51, CU num=16
WARNING: suboptimal config for PC_find_pulse_partial_kernel_cl pass 5; WG size doesn't fill wave
starting PC_find_pulse_partial_kernel_cl, pass 5; PulsePoTLen=960 NDRange={32,17,3}, WG={32,1,1},single_period_size=1.6MB, WG num=51, CU num=16
WARNING: suboptimal config for PC_find_pulse_partial_kernel_cl pass 5; WG size doesn't fill wave
starting PC_find_pulse_partial_kernel_cl, pass 5; PulsePoTLen=960 NDRange={32,17,3}, WG={32,1,1},single_period_size=1.6MB, WG num=51, CU num=16
WARNING: suboptimal config for PC_find_pulse_partial_kernel_cl pass 5; WG size doesn't fill wave
starting PC_find_pulse_partial_kernel_cl, pass 5; PulsePoTLen=960 NDRange={32,17,3}, WG={32,1,1},single_period_size=1.6MB, WG num=51, CU num=16
WARNING: suboptimal config for PC_find_pulse_partial_kernel_cl pass 5; WG size doesn't fill wave
starting PC_find_pulse_partial_kernel_cl, pass 5; PulsePoTLen=960 NDRange={32,17,3}, WG={32,1,1},single_period_size=1.6MB, WG num=51, CU num=16
WARNING: suboptimal config for PC_find_pulse_partial_kernel_cl pass 5; WG size doesn't fill wave
starting PC_find_pulse_partial_kernel_cl, pass 5; PulsePoTLen=960 NDRange={32,17,3}, WG={32,1,1},single_period_size=1.6MB, WG num=51, CU num=16
WARNING: suboptimal config for PC_find_pulse_partial_kernel_cl pass 5; WG size doesn't fill wave
starting PC_find_pulse_partial_kernel_cl, pass 5; PulsePoTLen=960 NDRange={32,17,3}, WG={32,1,1},single_period_size=1.6MB, WG num=51, CU num=16
WARNING: suboptimal config for PC_find_pulse_partial_kernel_cl pass 5; WG size doesn't fill wave
starting PC_find_pulse_partial_kernel_cl, pass 5; PulsePoTLen=960 NDRange={32,17,3}, WG={32,1,1},single_period_size=1.6MB, WG num=51, CU num=16
WARNING: suboptimal config for PC_find_pulse_partial_kernel_cl pass 5; WG size doesn't fill wave
starting PC_find_pulse_partial_kernel_cl, pass 5; PulsePoTLen=960 NDRange={32,17,3}, WG={32,1,1},single_period_size=1.6MB, WG num=51, CU num=16
WARNING: suboptimal config for PC_find_pulse_partial_kernel_cl pass 5; WG size doesn't fill wave
starting PC_find_pulse_partial_kernel_cl, pass 5; PulsePoTLen=960 NDRange={32,17,3}, WG={32,1,1},single_period_size=1.6MB, WG num=51, CU num=16
WARNING: suboptimal config for PC_find_pulse_partial_kernel_cl pass 5; WG size doesn't fill wave
starting PC_find_pulse_partial_kernel_cl, pass 5; PulsePoTLen=960 NDRange={32,17,3}, WG={32,1,1},single_period_size=1.6MB, WG num=51, CU num=16
WARNING: suboptimal config for PC_find_pulse_partial_kernel_cl pass 5; WG size doesn't fill wave
starting PC_find_pulse_partial_kernel_cl, pass 5; PulsePoTLen=960 NDRange={32,17,3}, WG={32,1,1},single_period_size=1.6MB, WG num=51, CU num=16
WARNING: suboptimal config for PC_find_pulse_partial_kernel_cl pass 5; WG size doesn't fill wave
starting PC_find_pulse_partial_kernel_cl, pass 5; PulsePoTLen=960 NDRange={32,17,3}, WG={32,1,1},single_period_size=1.6MB, WG num=51, CU num=16
WARNING: suboptimal config for PC_find_pulse_partial_kernel_cl pass 5; WG size doesn't fill wave
starting PC_find_pulse_partial_kernel_cl, pass 5; PulsePoTLen=960 NDRange={32,17,3}, WG={32,1,1},single_period_size=1.6MB, WG num=51, CU num=16
WARNING: suboptimal config for PC_find_pulse_partial_kernel_cl pass 5; WG size doesn't fill wave
starting PC_find_pulse_partial_kernel_cl, pass 5; PulsePoTLen=960 NDRange={32,17,3}, WG={32,1,1},single_period_size=1.6MB, WG num=51, CU num=16
WARNING: suboptimal config for PC_find_pulse_partial_kernel_cl pass 5; WG size doesn't fill wave
starting PC_find_pulse_partial_kernel_cl, pass 5; PulsePoTLen=960 NDRange={32,17,3}, WG={32,1,1},single_period_size=1.6MB, WG num=51, CU num=16
WARNING: suboptimal config for PC_find_pulse_kernel_cl pass 3; WG size doesn't fill wave
starting PC_find_pulse_kernel_cl, pass 3; PulsePoTLen=480 NDRange={64,17,1}, WG={32,1,1},single_period_size=2.7MB, WG num=34, CU num=16
WARNING: suboptimal config for PC_find_pulse_kernel_cl pass 4; WG size doesn't fill wave
starting PC_find_pulse_kernel_cl, pass 4; PulsePoTLen=480 NDRange={64,17,1}, WG={32,1,1},single_period_size=2MB, WG num=34, CU num=16
WARNING: suboptimal config for PC_find_pulse_kernel_cl pass 5; WG size doesn't fill wave
starting PC_find_pulse_kernel_cl, pass 5; PulsePoTLen=480 NDRange={64,17,1}, WG={32,1,1},single_period_size=1.6MB, WG num=34, CU num=16
WARNING: suboptimal config f

Real Join Date:
Joe Januzzi (ID 253343) 29 Sep 1999, 22:30:36 UTC
Try to learn something new everyday.
ID: 1767408 · Report as offensive
Profile Zalster Special Project $250 donor
Volunteer tester
Joined: 27 May 99
Posts: 5516
Credit: 528,817,460
RAC: 242
United States
Message 1767410 - Posted: 25 Feb 2016, 4:53:22 UTC - in response to Message 1767408.  
Last modified: 25 Feb 2016, 5:09:38 UTC

Joe, is there a new kernel in this version?

Edit..

Raistmer

What's the difference between your non-SoG version and the SoG version?
ID: 1767410 · Report as offensive
Joe Januzzi
Volunteer tester
Joined: 13 Apr 03
Posts: 54
Credit: 307,134,110
RAC: 492
United States
Message 1767416 - Posted: 25 Feb 2016, 5:29:03 UTC - in response to Message 1767410.  

Joe, is there a new kernel in this version?

Edit..

Raistmer

What's the difference between your non-SoG version and the SoG version?


Zalster,
Thanks for pointing that out, it should be MultiBeam_Kernels_r3381.cl. I'll update my file.
Joe

Real Join Date:
Joe Januzzi (ID 253343) 29 Sep 1999, 22:30:36 UTC
Try to learn something new everyday.
ID: 1767416 · Report as offensive
Joe Januzzi
Volunteer tester
Joined: 13 Apr 03
Posts: 54
Credit: 307,134,110
RAC: 492
United States
Message 1767432 - Posted: 25 Feb 2016, 6:27:02 UTC - in response to Message 1767416.  

The right kernel (MultiBeam_Kernels_r3381.cl) seems to lower my CPU usage :)
I think I'll redo the -v 8 switch test.

Thanks again Zalster.
Joe

app_info.xml

<app_info>
<app>
<name>setiathome_v7</name>
</app>
<file_info>
<name>AKv8c_r2549_winx86-64_AVXxjfs.exe</name>
<executable/>
</file_info>
<file_info>
<name>libfftw3f-3-3-4_x64.dll</name>
<executable/>
</file_info>
<file_info>
<name>cmdline_AKv8c_r2549_winx86-64_AVXxjfs.txt</name>
</file_info>
<app_version>
<app_name>setiathome_v7</app_name>
<version_num>700</version_num>
<platform>windows_intelx86</platform>
<file_ref>
<file_name>AKv8c_r2549_winx86-64_AVXxjfs.exe</file_name>
<main_program/>
</file_ref>
<file_ref>
<file_name>libfftw3f-3-3-4_x64.dll</file_name>
</file_ref>
<file_ref>
<file_name>cmdline_AKv8c_r2549_winx86-64_AVXxjfs.txt</file_name>
<open_name>mb_cmdline.txt</open_name>
</file_ref>
</app_version>
<app_version>
<app_name>setiathome_v7</app_name>
<version_num>700</version_num>
<platform>windows_x86_64</platform>
<file_ref>
<file_name>AKv8c_r2549_winx86-64_AVXxjfs.exe</file_name>
<main_program/>
</file_ref>
<file_ref>
<file_name>libfftw3f-3-3-4_x64.dll</file_name>
</file_ref>
<file_ref>
<file_name>cmdline_AKv8c_r2549_winx86-64_AVXxjfs.txt</file_name>
<open_name>mb_cmdline.txt</open_name>
</file_ref>
</app_version>
<app>
<name>astropulse_v7</name>
</app>
<file_info>
<name>AP7_win_x64_AVX_CPU_r2692.exe</name>
<executable/>
</file_info>
<file_info>
<name>libfftw3f-3-3-4_x64.dll</name>
<executable/>
</file_info>
<file_info>
<name>ap_cmdline_win_x64_AVX_CPU.txt</name>
</file_info>
<app_version>
<app_name>astropulse_v7</app_name>
<version_num>703</version_num>
<platform>windows_x86_64</platform>
<plan_class>sse2</plan_class>
<cmdline></cmdline>
<file_ref>
<file_name>AP7_win_x64_AVX_CPU_r2692.exe</file_name>
<main_program/>
</file_ref>
<file_ref>
<file_name>libfftw3f-3-3-4_x64.dll</file_name>
</file_ref>
<file_ref>
<file_name>ap_cmdline_win_x64_AVX_CPU.txt</file_name>
<open_name>ap_cmdline.txt</open_name>
</file_ref>
</app_version>
<app_version>
<app_name>astropulse_v7</app_name>
<version_num>703</version_num>
<platform>windows_intelx86</platform>
<plan_class>sse2</plan_class>
<cmdline></cmdline>
<file_ref>
<file_name>AP7_win_x64_AVX_CPU_r2692.exe</file_name>
<main_program/>
</file_ref>
<file_ref>
<file_name>libfftw3f-3-3-4_x64.dll</file_name>
</file_ref>
<file_ref>
<file_name>ap_cmdline_win_x64_AVX_CPU.txt</file_name>
<open_name>ap_cmdline.txt</open_name>
</file_ref>
</app_version>
<app_version>
<app_name>astropulse_v7</app_name>
<version_num>703</version_num>
<platform>windows_x86_64</platform>
<plan_class>sse</plan_class>
<cmdline></cmdline>
<file_ref>
<file_name>AP7_win_x64_AVX_CPU_r2692.exe</file_name>
<main_program/>
</file_ref>
<file_ref>
<file_name>libfftw3f-3-3-4_x64.dll</file_name>
</file_ref>
<file_ref>
<file_name>ap_cmdline_win_x64_AVX_CPU.txt</file_name>
<open_name>ap_cmdline.txt</open_name>
</file_ref>
</app_version>
<app_version>
<app_name>astropulse_v7</app_name>
<version_num>703</version_num>
<platform>windows_intelx86</platform>
<plan_class>sse</plan_class>
<cmdline></cmdline>
<file_ref>
<file_name>AP7_win_x64_AVX_CPU_r2692.exe</file_name>
<main_program/>
</file_ref>
<file_ref>
<file_name>libfftw3f-3-3-4_x64.dll</file_name>
</file_ref>
<file_ref>
<file_name>ap_cmdline_win_x64_AVX_CPU.txt</file_name>
<open_name>ap_cmdline.txt</open_name>
</file_ref>
</app_version>
<app_version>
<app_name>astropulse_v7</app_name>
<version_num>701</version_num>
<platform>windows_x86_64</platform>
<cmdline></cmdline>
<file_ref>
<file_name>AP7_win_x64_AVX_CPU_r2692.exe</file_name>
<main_program/>
</file_ref>
<file_ref>
<file_name>libfftw3f-3-3-4_x64.dll</file_name>
</file_ref>
<file_ref>
<file_name>ap_cmdline_win_x64_AVX_CPU.txt</file_name>
<open_name>ap_cmdline.txt</open_name>
</file_ref>
</app_version>
<app_version>
<app_name>astropulse_v7</app_name>
<version_num>701</version_num>
<platform>windows_intelx86</platform>
<cmdline></cmdline>
<file_ref>
<file_name>AP7_win_x64_AVX_CPU_r2692.exe</file_name>
<main_program/>
</file_ref>
<file_ref>
<file_name>libfftw3f-3-3-4_x64.dll</file_name>
</file_ref>
<file_ref>
<file_name>ap_cmdline_win_x64_AVX_CPU.txt</file_name>
<open_name>ap_cmdline.txt</open_name>
</file_ref>
</app_version>
<app_version>
<app_name>astropulse_v7</app_name>
<version_num>700</version_num>
<platform>windows_x86_64</platform>
<cmdline></cmdline>
<file_ref>
<file_name>AP7_win_x64_AVX_CPU_r2692.exe</file_name>
<main_program/>
</file_ref>
<file_ref>
<file_name>libfftw3f-3-3-4_x64.dll</file_name>
</file_ref>
<file_ref>
<file_name>ap_cmdline_win_x64_AVX_CPU.txt</file_name>
<open_name>ap_cmdline.txt</open_name>
</file_ref>
</app_version>
<app_version>
<app_name>astropulse_v7</app_name>
<version_num>700</version_num>
<platform>windows_intelx86</platform>
<cmdline></cmdline>
<file_ref>
<file_name>AP7_win_x64_AVX_CPU_r2692.exe</file_name>
<main_program/>
</file_ref>
<file_ref>
<file_name>libfftw3f-3-3-4_x64.dll</file_name>
</file_ref>
<file_ref>
<file_name>ap_cmdline_win_x64_AVX_CPU.txt</file_name>
<open_name>ap_cmdline.txt</open_name>
</file_ref>
</app_version>
<app>
<name>astropulse_v7</name>
</app>
<file_info>
<name>AP7_win_x86_SSE2_OpenCL_NV_r2887.exe</name>
<executable/>
</file_info>
<file_info>
<name>libfftw3f-3-3-4_x86.dll</name>
<executable/>
</file_info>
<file_info>
<name>AstroPulse_Kernels_r2887.cl</name>
</file_info>
<file_info>
<name>ap_cmdline_win_x86_SSE2_OpenCL_NV.txt</name>
</file_info>
<app_version>
<app_name>astropulse_v7</app_name>
<version_num>710</version_num>
<platform>windows_x86_64</platform>
<avg_ncpus>0.04</avg_ncpus>
<max_ncpus>0.2</max_ncpus>
<plan_class>opencl_nvidia_100</plan_class>
<cmdline></cmdline>
<coproc>
<type>CUDA</type>
<count>1</count>
</coproc>
<file_ref>
<file_name>AP7_win_x86_SSE2_OpenCL_NV_r2887.exe</file_name>
<main_program/>
</file_ref>
<file_ref>
<file_name>libfftw3f-3-3-4_x86.dll</file_name>
</file_ref>
<file_ref>
<file_name>AstroPulse_Kernels_r2887.cl</file_name>
</file_ref>
<file_ref>
<file_name>ap_cmdline_win_x86_SSE2_OpenCL_NV.txt</file_name>
<open_name>ap_cmdline.txt</open_name>
</file_ref>
</app_version>
<app_version>
<app_name>astropulse_v7</app_name>
<version_num>710</version_num>
<platform>windows_x86_64</platform>
<avg_ncpus>0.04</avg_ncpus>
<max_ncpus>0.2</max_ncpus>
<plan_class>cuda_opencl_100</plan_class>
<cmdline></cmdline>
<coproc>
<type>CUDA</type>
<count>1</count>
</coproc>
<file_ref>
<file_name>AP7_win_x86_SSE2_OpenCL_NV_r2887.exe</file_name>
<main_program/>
</file_ref>
<file_ref>
<file_name>libfftw3f-3-3-4_x86.dll</file_name>
</file_ref>
<file_ref>
<file_name>AstroPulse_Kernels_r2887.cl</file_name>
</file_ref>
<file_ref>
<file_name>ap_cmdline_win_x86_SSE2_OpenCL_NV.txt</file_name>
<open_name>ap_cmdline.txt</open_name>
</file_ref>
</app_version>
<app_version>
<app_name>astropulse_v7</app_name>
<version_num>710</version_num>
<platform>windows_x86_64</platform>
<avg_ncpus>0.04</avg_ncpus>
<max_ncpus>0.2</max_ncpus>
<plan_class>opencl_nvidia_cc1</plan_class>
<cmdline></cmdline>
<coproc>
<type>CUDA</type>
<count>1</count>
</coproc>
<file_ref>
<file_name>AP7_win_x86_SSE2_OpenCL_NV_r2887.exe</file_name>
<main_program/>
</file_ref>
<file_ref>
<file_name>libfftw3f-3-3-4_x86.dll</file_name>
</file_ref>
<file_ref>
<file_name>AstroPulse_Kernels_r2887.cl</file_name>
</file_ref>
<file_ref>
<file_name>ap_cmdline_win_x86_SSE2_OpenCL_NV.txt</file_name>
<open_name>ap_cmdline.txt</open_name>
</file_ref>
</app_version>
<app_version>
<app_name>astropulse_v7</app_name>
<version_num>710</version_num>
<platform>windows_x86_64</platform>
<avg_ncpus>0.04</avg_ncpus>
<max_ncpus>0.2</max_ncpus>
<plan_class>cuda_opencl_cc1</plan_class>
<cmdline></cmdline>
<coproc>
<type>CUDA</type>
<count>1</count>
</coproc>
<file_ref>
<file_name>AP7_win_x86_SSE2_OpenCL_NV_r2887.exe</file_name>
<main_program/>
</file_ref>
<file_ref>
<file_name>libfftw3f-3-3-4_x86.dll</file_name>
</file_ref>
<file_ref>
<file_name>AstroPulse_Kernels_r2887.cl</file_name>
</file_ref>
<file_ref>
<file_name>ap_cmdline_win_x86_SSE2_OpenCL_NV.txt</file_name>
<open_name>ap_cmdline.txt</open_name>
</file_ref>
</app_version>
<app_version>
<app_name>astropulse_v7</app_name>
<version_num>705</version_num>
<platform>windows_x86_64</platform>
<avg_ncpus>0.04</avg_ncpus>
<max_ncpus>0.2</max_ncpus>
<plan_class>opencl_nvidia_100</plan_class>
<cmdline></cmdline>
<coproc>
<type>CUDA</type>
<count>1</count>
</coproc>
<file_ref>
<file_name>AP7_win_x86_SSE2_OpenCL_NV_r2887.exe</file_name>
<main_program/>
</file_ref>
<file_ref>
<file_name>libfftw3f-3-3-4_x86.dll</file_name>
</file_ref>
<file_ref>
<file_name>AstroPulse_Kernels_r2887.cl</file_name>
</file_ref>
<file_ref>
<file_name>ap_cmdline_win_x86_SSE2_OpenCL_NV.txt</file_name>
<open_name>ap_cmdline.txt</open_name>
</file_ref>
</app_version>
<app_version>
<app_name>astropulse_v7</app_name>
<version_num>705</version_num>
<platform>windows_x86_64</platform>
<avg_ncpus>0.04</avg_ncpus>
<max_ncpus>0.2</max_ncpus>
<plan_class>cuda_opencl_100</plan_class>
<cmdline></cmdline>
<coproc>
<type>CUDA</type>
<count>1</count>
</coproc>
<file_ref>
<file_name>AP7_win_x86_SSE2_OpenCL_NV_r2887.exe</file_name>
<main_program/>
</file_ref>
<file_ref>
<file_name>libfftw3f-3-3-4_x86.dll</file_name>
</file_ref>
<file_ref>
<file_name>AstroPulse_Kernels_r2887.cl</file_name>
</file_ref>
<file_ref>
<file_name>ap_cmdline_win_x86_SSE2_OpenCL_NV.txt</file_name>
<open_name>ap_cmdline.txt</open_name>
</file_ref>
</app_version>
<app_version>
<app_name>astropulse_v7</app_name>
<version_num>705</version_num>
<platform>windows_x86_64</platform>
<avg_ncpus>0.04</avg_ncpus>
<max_ncpus>0.2</max_ncpus>
<plan_class>opencl_nvidia_cc1</plan_class>
<cmdline></cmdline>
<coproc>
<type>CUDA</type>
<count>1</count>
</coproc>
<file_ref>
<file_name>AP7_win_x86_SSE2_OpenCL_NV_r2887.exe</file_name>
<main_program/>
</file_ref>
<file_ref>
<file_name>libfftw3f-3-3-4_x86.dll</file_name>
</file_ref>
<file_ref>
<file_name>AstroPulse_Kernels_r2887.cl</file_name>
</file_ref>
<file_ref>
<file_name>ap_cmdline_win_x86_SSE2_OpenCL_NV.txt</file_name>
<open_name>ap_cmdline.txt</open_name>
</file_ref>
</app_version>
<app_version>
<app_name>astropulse_v7</app_name>
<version_num>705</version_num>
<platform>windows_x86_64</platform>
<avg_ncpus>0.04</avg_ncpus>
<max_ncpus>0.2</max_ncpus>
<plan_class>cuda_opencl_cc1</plan_class>
<cmdline></cmdline>
<coproc>
<type>CUDA</type>
<count>1</count>
</coproc>
<file_ref>
<file_name>AP7_win_x86_SSE2_OpenCL_NV_r2887.exe</file_name>
<main_program/>
</file_ref>
<file_ref>
<file_name>libfftw3f-3-3-4_x86.dll</file_name>
</file_ref>
<file_ref>
<file_name>AstroPulse_Kernels_r2887.cl</file_name>
</file_ref>
<file_ref>
<file_name>ap_cmdline_win_x86_SSE2_OpenCL_NV.txt</file_name>
<open_name>ap_cmdline.txt</open_name>
</file_ref>
</app_version>
<app_version>
<app_name>astropulse_v7</app_name>
<version_num>710</version_num>
<platform>windows_intelx86</platform>
<avg_ncpus>0.04</avg_ncpus>
<max_ncpus>0.2</max_ncpus>
<plan_class>opencl_nvidia_100</plan_class>
<cmdline></cmdline>
<coproc>
<type>CUDA</type>
<count>1</count>
</coproc>
<file_ref>
<file_name>AP7_win_x86_SSE2_OpenCL_NV_r2887.exe</file_name>
<main_program/>
</file_ref>
<file_ref>
<file_name>libfftw3f-3-3-4_x86.dll</file_name>
</file_ref>
<file_ref>
<file_name>AstroPulse_Kernels_r2887.cl</file_name>
</file_ref>
<file_ref>
<file_name>ap_cmdline_win_x86_SSE2_OpenCL_NV.txt</file_name>
<open_name>ap_cmdline.txt</open_name>
</file_ref>
</app_version>
<app_version>
<app_name>astropulse_v7</app_name>
<version_num>710</version_num>
<platform>windows_intelx86</platform>
<avg_ncpus>0.04</avg_ncpus>
<max_ncpus>0.2</max_ncpus>
<plan_class>cuda_opencl_100</plan_class>
<cmdline></cmdline>
<coproc>
<type>CUDA</type>
<count>1</count>
</coproc>
<file_ref>
<file_name>AP7_win_x86_SSE2_OpenCL_NV_r2887.exe</file_name>
<main_program/>
</file_ref>
<file_ref>
<file_name>libfftw3f-3-3-4_x86.dll</file_name>
</file_ref>
<file_ref>
<file_name>AstroPulse_Kernels_r2887.cl</file_name>
</file_ref>
<file_ref>
<file_name>ap_cmdline_win_x86_SSE2_OpenCL_NV.txt</file_name>
<open_name>ap_cmdline.txt</open_name>
</file_ref>
</app_version>
<app_version>
<app_name>astropulse_v7</app_name>
<version_num>710</version_num>
<platform>windows_intelx86</platform>
<avg_ncpus>0.04</avg_ncpus>
<max_ncpus>0.2</max_ncpus>
<plan_class>opencl_nvidia_cc1</plan_class>
<cmdline></cmdline>
<coproc>
<type>CUDA</type>
<count>1</count>
</coproc>
<file_ref>
<file_name>AP7_win_x86_SSE2_OpenCL_NV_r2887.exe</file_name>
<main_program/>
</file_ref>
<file_ref>
<file_name>libfftw3f-3-3-4_x86.dll</file_name>
</file_ref>
<file_ref>
<file_name>AstroPulse_Kernels_r2887.cl</file_name>
</file_ref>
<file_ref>
<file_name>ap_cmdline_win_x86_SSE2_OpenCL_NV.txt</file_name>
<open_name>ap_cmdline.txt</open_name>
</file_ref>
</app_version>
<app_version>
<app_name>astropulse_v7</app_name>
<version_num>710</version_num>
<platform>windows_intelx86</platform>
<avg_ncpus>0.04</avg_ncpus>
<max_ncpus>0.2</max_ncpus>
<plan_class>cuda_opencl_cc1</plan_class>
<cmdline></cmdline>
<coproc>
<type>CUDA</type>
<count>1</count>
</coproc>
<file_ref>
<file_name>AP7_win_x86_SSE2_OpenCL_NV_r2887.exe</file_name>
<main_program/>
</file_ref>
<file_ref>
<file_name>libfftw3f-3-3-4_x86.dll</file_name>
</file_ref>
<file_ref>
<file_name>AstroPulse_Kernels_r2887.cl</file_name>
</file_ref>
<file_ref>
<file_name>ap_cmdline_win_x86_SSE2_OpenCL_NV.txt</file_name>
<open_name>ap_cmdline.txt</open_name>
</file_ref>
</app_version>
<app_version>
<app_name>astropulse_v7</app_name>
<version_num>705</version_num>
<platform>windows_intelx86</platform>
<avg_ncpus>0.04</avg_ncpus>
<max_ncpus>0.2</max_ncpus>
<plan_class>opencl_nvidia_100</plan_class>
<cmdline></cmdline>
<coproc>
<type>CUDA</type>
<count>1</count>
</coproc>
<file_ref>
<file_name>AP7_win_x86_SSE2_OpenCL_NV_r2887.exe</file_name>
<main_program/>
</file_ref>
<file_ref>
<file_name>libfftw3f-3-3-4_x86.dll</file_name>
</file_ref>
<file_ref>
<file_name>AstroPulse_Kernels_r2887.cl</file_name>
</file_ref>
<file_ref>
<file_name>ap_cmdline_win_x86_SSE2_OpenCL_NV.txt</file_name>
<open_name>ap_cmdline.txt</open_name>
</file_ref>
</app_version>
<app_version>
<app_name>astropulse_v7</app_name>
<version_num>705</version_num>
<platform>windows_intelx86</platform>
<avg_ncpus>0.04</avg_ncpus>
<max_ncpus>0.2</max_ncpus>
<plan_class>cuda_opencl_100</plan_class>
<cmdline></cmdline>
<coproc>
<type>CUDA</type>
<count>1</count>
</coproc>
<file_ref>
<file_name>AP7_win_x86_SSE2_OpenCL_NV_r2887.exe</file_name>
<main_program/>
</file_ref>
<file_ref>
<file_name>libfftw3f-3-3-4_x86.dll</file_name>
</file_ref>
<file_ref>
<file_name>AstroPulse_Kernels_r2887.cl</file_name>
</file_ref>
<file_ref>
<file_name>ap_cmdline_win_x86_SSE2_OpenCL_NV.txt</file_name>
<open_name>ap_cmdline.txt</open_name>
</file_ref>
</app_version>
<app_version>
<app_name>astropulse_v7</app_name>
<version_num>705</version_num>
<platform>windows_intelx86</platform>
<avg_ncpus>0.04</avg_ncpus>
<max_ncpus>0.2</max_ncpus>
<plan_class>opencl_nvidia_cc1</plan_class>
<cmdline></cmdline>
<coproc>
<type>CUDA</type>
<count>1</count>
</coproc>
<file_ref>
<file_name>AP7_win_x86_SSE2_OpenCL_NV_r2887.exe</file_name>
<main_program/>
</file_ref>
<file_ref>
<file_name>libfftw3f-3-3-4_x86.dll</file_name>
</file_ref>
<file_ref>
<file_name>AstroPulse_Kernels_r2887.cl</file_name>
</file_ref>
<file_ref>
<file_name>ap_cmdline_win_x86_SSE2_OpenCL_NV.txt</file_name>
<open_name>ap_cmdline.txt</open_name>
</file_ref>
</app_version>
<app_version>
<app_name>astropulse_v7</app_name>
<version_num>705</version_num>
<platform>windows_intelx86</platform>
<avg_ncpus>0.04</avg_ncpus>
<max_ncpus>0.2</max_ncpus>
<plan_class>cuda_opencl_cc1</plan_class>
<cmdline></cmdline>
<coproc>
<type>CUDA</type>
<count>1</count>
</coproc>
<file_ref>
<file_name>AP7_win_x86_SSE2_OpenCL_NV_r2887.exe</file_name>
<main_program/>
</file_ref>
<file_ref>
<file_name>libfftw3f-3-3-4_x86.dll</file_name>
</file_ref>
<file_ref>
<file_name>AstroPulse_Kernels_r2887.cl</file_name>
</file_ref>
<file_ref>
<file_name>ap_cmdline_win_x86_SSE2_OpenCL_NV.txt</file_name>
<open_name>ap_cmdline.txt</open_name>
</file_ref>
</app_version>
<app>
<name>setiathome_v8</name>
</app>
<file_info>
<name>MB8_win_x64_AVX_VS2010_r3330.exe</name>
<executable/>
</file_info>
<file_info>
<name>libfftw3f-3-3-4_x64.dll</name>
<executable/>
</file_info>
<file_info>
<name>mb_cmdline_win_x64_AVX_VS2010.txt</name>
</file_info>
<app_version>
<app_name>setiathome_v8</app_name>
<version_num>800</version_num>
<platform>windows_x86_64</platform>
<api_version>7.5.0</api_version>
<file_ref>
<file_name>MB8_win_x64_AVX_VS2010_r3330.exe</file_name>
<main_program/>
</file_ref>
<file_ref>
<file_name>libfftw3f-3-3-4_x64.dll</file_name>
</file_ref>
<file_ref>
<file_name>mb_cmdline_win_x64_AVX_VS2010.txt</file_name>
<open_name>mb_cmdline.txt</open_name>
</file_ref>
</app_version>
<app_version>
<app_name>setiathome_v8</app_name>
<version_num>800</version_num>
<platform>windows_intelx86</platform>
<api_version>7.5.0</api_version>
<file_ref>
<file_name>MB8_win_x64_AVX_VS2010_r3330.exe</file_name>
<main_program/>
</file_ref>
<file_ref>
<file_name>libfftw3f-3-3-4_x64.dll</file_name>
</file_ref>
<file_ref>
<file_name>mb_cmdline_win_x64_AVX_VS2010.txt</file_name>
<open_name>mb_cmdline.txt</open_name>
</file_ref>
</app_version>
<app>
<name>setiathome_v8</name>
</app>
<file_info>
<name>MB8_win_x86_SSE3_OpenCL_NV_r3381_SoG.exe</name>
<executable/>
</file_info>
<file_info>
<name>libfftw3f-3-3-4_x86.dll</name>
<executable/>
</file_info>
<file_info>
<name>MultiBeam_Kernels_r3381.cl</name>
</file_info>
<file_info>
<name>mb_cmdline_win_x86_SSE3_OpenCL_NV.txt</name>
</file_info>
<app_version>
<app_name>setiathome_v8</app_name>
<version_num>800</version_num>
<platform>windows_intelx86</platform>
<avg_ncpus>0.04</avg_ncpus>
<max_ncpus>0.2</max_ncpus>
<plan_class>opencl_nvidia_SoG</plan_class>
<cmdline></cmdline>
<coproc>
<type>CUDA</type>
<count>1</count>
</coproc>
<file_ref>
<file_name>MB8_win_x86_SSE3_OpenCL_NV_r3381_SoG.exe</file_name>
<main_program/>
</file_ref>
<file_ref>
<file_name>libfftw3f-3-3-4_x86.dll</file_name>
</file_ref>
<file_ref>
<file_name>mb_cmdline_win_x86_SSE3_OpenCL_NV.txt</file_name>
<open_name>mb_cmdline.txt</open_name>
</file_ref>
</app_version>
</app_info>
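
With a hand-edited app_info.xml this long, a quick consistency check can catch slips such as a kernel file named in the wrong place. Below is a minimal sketch, assuming two rules: every file_ref sits inside an app_version, and every file_ref names a file declared in a top-level file_info. The function name check_app_info and the sample file names (app.exe, kernels.cl, demo) are made up for illustration and are not part of BOINC.

```python
import xml.etree.ElementTree as ET


def check_app_info(xml_text):
    """Return a list of human-readable problems found in an app_info.xml string."""
    root = ET.fromstring(xml_text)
    problems = []

    # Names declared through top-level <file_info><name>...</name></file_info>.
    declared = {fi.findtext("name", "").strip() for fi in root.findall("file_info")}

    # Assumed rule 1: a <file_ref> directly under <app_info> is misplaced.
    for child in root:
        if child.tag == "file_ref":
            name = child.findtext("file_name", "").strip()
            problems.append("file_ref outside app_version: " + name)

    # Assumed rule 2: each <file_ref> in an <app_version> needs a matching <file_info>.
    for app_version in root.findall("app_version"):
        for ref in app_version.findall("file_ref"):
            name = ref.findtext("file_name", "").strip()
            if name not in declared:
                problems.append("file_ref to undeclared file: " + name)

    return problems


# Hypothetical miniature files, not taken from any real installation.
GOOD = """<app_info>
<file_info><name>app.exe</name><executable/></file_info>
<file_info><name>kernels.cl</name></file_info>
<app_version>
<app_name>demo</app_name>
<file_ref><file_name>app.exe</file_name><main_program/></file_ref>
<file_ref><file_name>kernels.cl</file_name></file_ref>
</app_version>
</app_info>"""

BAD = """<app_info>
<file_info><name>app.exe</name><executable/></file_info>
<file_ref><file_name>kernels.cl</file_name></file_ref>
<app_version>
<app_name>demo</app_name>
<file_ref><file_name>app.exe</file_name><main_program/></file_ref>
<file_ref><file_name>kernels.cl</file_name></file_ref>
</app_version>
</app_info>"""

if __name__ == "__main__":
    print(check_app_info(GOOD))
    for problem in check_app_info(BAD):
        print(problem)
```

To check a real file, read it with open("app_info.xml").read() and pass the text to check_app_info. An empty list only means these two assumed rules hold, not that the client will accept the file.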

Real Join Date:
Joe Januzzi (ID 253343) 29 Sep 1999, 22:30:36 UTC
Try to learn something new everyday.
ID: 1767432 · Report as offensive
Profile Zalster Special Project $250 donor
Volunteer tester
Joined: 27 May 99
Posts: 5516
Credit: 528,817,460
RAC: 242
United States
Message 1767435 - Posted: 25 Feb 2016, 6:31:27 UTC - in response to Message 1767416.  
Last modified: 25 Feb 2016, 6:37:39 UTC

Joe,

I'm seeing an interesting (or concerning) pattern developing. Let me know if you see it too.

The good part first: the amount of CPU required is similar to CUDA to begin with.

Now the concerning part: after about 75% of the computation on a work unit is done, I begin to see an increase in the total number of CPU cores used, which lasts until the work unit ends.

After the work unit ends, the total CPU core usage drops down again.

Not a big thing unless you start all the work units at the same time; if they are similar in run time, they all finish at about the same time, causing a high level of total CPU usage.

In my case, after I restarted the work units and they finished, the system went to 100% of all cores for about 3 minutes. As the work units finished and dropped off, it backed off to about 35% of all cores. This made the computer sluggish and unresponsive.

Given time, the individual work units will not all finish together, so completion will be staggered and this spike in total CPU core usage should lessen.

However, when first starting the system back up, there might well be lock-ups for a few minutes (for example, coming back from Tuesday maintenance once new work is downloaded).

If I could stagger the start times of the work units, this occurrence could be lessened, though the law of averages says that eventually several will reach completion at the same time (even a broken clock is right twice a day).

I'll keep an eye on them and see how it goes. Hopefully I won't get a lock up or crash.

APs seem to do ok with them.

Edit..

Not sure yet, but it looks like they might be 2-3 minutes slower than the 3366 build.
ID: 1767435 · Report as offensive
Grumpy Swede (I stand with Ukraine)
Volunteer tester
Joined: 1 Nov 08
Posts: 8923
Credit: 49,849,242
RAC: 65
Sweden
Message 1767439 - Posted: 25 Feb 2016, 6:44:48 UTC
Last modified: 25 Feb 2016, 6:58:13 UTC

Heh, it took over 6000 valid WUs before I got the first invalid with SoG. However, I doubt it is a real invalid; it's just a difference between the apps in the order they find the signals:

http://setiathome.berkeley.edu/workunit.php?wuid=2073365509


Onwards and upwards :-)
ID: 1767439 · Report as offensive
Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6324
Credit: 106,370,077
RAC: 121
Russia
Message 1767441 - Posted: 25 Feb 2016, 6:51:03 UTC - in response to Message 1767408.  


I ran with the -v 8 switch. Higher CPU usage with and without the switch.


Yes. Because -v 8 means more verbose output, it slows down processing (and increases CPU usage, because the CPU is responsible for processing fprintf()).
This option is only for debugging and tuning the -sbs N value. When tuning is finished, it should be switched off/removed.


Here are some Wu's before I turned off the -v 8 switch. ...
I hope the data will give you some good info, because this is over my head.

It did, thanks. More work is required in this area, and your logs show that.
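
For anyone who wants to sift through their own -v 8 output, here is a minimal sketch of a parser for the "starting ... kernel" lines posted earlier in the thread. It recomputes the workgroup count from NDRange and WG and compares it with the number of compute units. The regular expression is inferred only from the log lines quoted in this thread, so other builds may print a different format. The names parse_config and wgs_per_cu are made up for illustration.

```python
import re
from math import prod

# Pattern inferred from the -v 8 lines quoted in this thread (an assumption,
# not an official format description).
LINE_RE = re.compile(
    r"starting (?P<kernel>\w+), pass (?P<pass_no>\d+); "
    r"PulsePoTLen=(?P<potlen>\d+) "
    r"NDRange=\{(?P<ndrange>[\d,]+)\}, WG=\{(?P<wg>[\d,]+)\},"
    r"single_period_size=(?P<size_mb>[\d.]+)MB, "
    r"WG num=(?P<wg_num>\d+), CU num=(?P<cu_num>\d+)"
)


def parse_config(line):
    """Parse one PulseFind config line; return a dict, or None if it does not match."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    ndrange = [int(x) for x in m["ndrange"].split(",")]
    wg = [int(x) for x in m["wg"].split(",")]
    # Workgroup count = product over dimensions of (global size / local size).
    computed_wg_num = prod(g // l for g, l in zip(ndrange, wg))
    cu_num = int(m["cu_num"])
    return {
        "kernel": m["kernel"],
        "pass": int(m["pass_no"]),
        "wg_size": prod(wg),                   # work-items per workgroup
        "reported_wg_num": int(m["wg_num"]),
        "computed_wg_num": computed_wg_num,
        "cu_num": cu_num,
        "wgs_per_cu": computed_wg_num / cu_num,
    }


SAMPLE = (
    "starting PC_find_pulse_partial_kernel_cl, pass 5; PulsePoTLen=960 "
    "NDRange={32,17,3}, WG={32,1,1},single_period_size=1.6MB, WG num=51, CU num=16"
)

if __name__ == "__main__":
    cfg = parse_config(SAMPLE)
    print(cfg["computed_wg_num"], cfg["reported_wg_num"], cfg["wgs_per_cu"])
```

For the sample line the recomputed count (1 x 17 x 3 = 51) agrees with the reported "WG num=51", which gives a little over three workgroups per compute unit on a 16-CU device. Running the same function over a whole stderr file and counting distinct configs should show how a given -sbs N changes the PulseFind geometry.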
ID: 1767441 · Report as offensive
Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6324
Credit: 106,370,077
RAC: 121
Russia
Message 1767443 - Posted: 25 Feb 2016, 6:55:53 UTC - in response to Message 1767410.  


Raistmer

What's the difference between your non-SoG version and the SoG version?

The difference is the same as before: SoG uses the SoG path, non-SoG doesn't.
The reason I still provide both is to allow a speed comparison on a wider range of hosts (on beta) before release. For now it seems SoG is faster, especially on NV, but I just want to be more sure. If SoG is slower on some configs, then we need to find those configs and devise a plan class that would exclude such hosts from app distribution. It's better to do that before release if possible, because BOINC's natural selection mechanism doesn't always work right.
ID: 1767443 · Report as offensive
Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6324
Credit: 106,370,077
RAC: 121
Russia
Message 1767446 - Posted: 25 Feb 2016, 7:04:42 UTC - in response to Message 1767435.  



Now the concerning part: after about 75% of the computation on a work unit is done, I begin to see an increase in the total number of CPU cores used, which lasts until the work unit ends.
....
In my case, after I restarted the work units and they finished, the system went to 100% of all cores for about 3 minutes.


There is some uncertainty about when the CPU usage increases. "They finished" followed by a CPU load increase could mean that the increase happens at the start of a new task, not at the end of the old one. At task start there is indeed a CPU-only period while the app's state is initialized.
But as the task progresses (and this holds true after 75% of progress), non-SoG searches make up less and less of the workload (which is why the SoG path is so non-linear in its % done reporting), so the SoG app communicates with the CPU less and less. Usually that's when CPU load in the SoG build can drop a lot compared with the non-SoG version.
But you report a CPU usage increase in that area? It's worth checking more carefully how the CPU usage increase correlates with task progress stages, because that would be valuable info.
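
One simple way to look at that correlation is to record (progress %, CPU %) pairs for a task while it runs and then average the CPU usage per progress stage. The sketch below does only the averaging step. How the samples are collected is left open, the numbers in it are invented purely to show the output shape, and the name mean_cpu_by_stage is hypothetical.

```python
from statistics import mean


def mean_cpu_by_stage(samples, edges=(25, 50, 75, 100)):
    """Group (progress_percent, cpu_percent) samples into progress stages.

    Each sample goes into the first stage whose upper edge is >= its progress.
    Returns {upper_edge: mean CPU usage}, leaving out stages with no samples.
    """
    buckets = {edge: [] for edge in edges}
    for progress, cpu in samples:
        for edge in edges:
            if progress <= edge:
                buckets[edge].append(cpu)
                break
    return {edge: mean(values) for edge, values in buckets.items() if values}


# Invented illustration values, NOT measurements from any host.
samples = [(10, 12.0), (20, 14.0), (40, 11.0), (60, 9.0), (80, 30.0), (95, 34.0)]

if __name__ == "__main__":
    for edge, cpu in mean_cpu_by_stage(samples).items():
        print(f"up to {edge}% done: mean CPU {cpu:.1f}%")
```

If the last stage consistently shows a higher mean than the earlier ones across several tasks, that would support the "increase after 75%" observation. If the spike instead lines up with the first stage of the next task, it points to start-up initialization.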
ID: 1767446 · Report as offensive
Profile Raistmer
Volunteer developer
Volunteer tester
Avatar

Send message
Joined: 16 Jun 01
Posts: 6324
Credit: 106,370,077
RAC: 121
Russia
Message 1767449 - Posted: 25 Feb 2016, 7:10:54 UTC - in response to Message 1767439.  
Last modified: 25 Feb 2016, 7:19:15 UTC

Heh, it took over 6000 valid WUs before I got the first invalid with SoG. However, I doubt it is a real invalid; it's just a difference between the apps in the order they find the signals:

http://setiathome.berkeley.edu/workunit.php?wuid=2073365509


Onwards and upwards :-)


To preserve available info on this case:

SoG:
Pulse: peak=3.108338, time=13.45, period=0.9503, d_freq=1420006799.68, score=1.022, chirp=-3.5473, fft_len=512
Pulse: peak=6.418058, time=73.96, period=2.602, d_freq=1420005226.29, score=1.009, chirp=5.6759, fft_len=128
Pulse: peak=5.558758, time=80.68, period=1.733, d_freq=1420005264.46, score=1.021, chirp=5.6759, fft_len=128
Pulse: peak=5.660017, time=80.68, period=1.737, d_freq=1420005264.45, score=1.04, chirp=6.6214, fft_len=128
Pulse: peak=5.608606, time=80.68, period=1.733, d_freq=1420005264.51, score=1.03, chirp=7.5678, fft_len=128
Pulse: peak=5.448968, time=100.7, period=1.783, d_freq=1420005341.76, score=1, chirp=-7.5678, fft_len=128
Pulse: peak=5.766086, time=100.7, period=1.783, d_freq=1420005342.35, score=1.058, chirp=-11.352, fft_len=128
Pulse: peak=5.745099, time=73.96, period=1.786, d_freq=1420005213.07, score=1.054, chirp=-15.135, fft_len=128
Pulse: peak=5.683627, time=100.7, period=1.783, d_freq=1420005343.04, score=1.043, chirp=-15.135, fft_len=128
Pulse: peak=5.83024, time=73.96, period=1.786, d_freq=1420005219.37, score=1.07, chirp=-16.081, fft_len=128
Pulse: peak=6.33339, time=73.96, period=2.317, d_freq=1420005226.54, score=1, chirp=17.027, fft_len=128
Pulse: peak=10.06626, time=80.68, period=3.493, d_freq=1420005341.03, score=1.04, chirp=17.027, fft_len=128
Pulse: peak=5.553938, time=73.96, period=1.786, d_freq=1420005225.73, score=1.019, chirp=-17.027, fft_len=128
Pulse: peak=6.593595, time=73.96, period=2.317, d_freq=1420005220.24, score=1.041, chirp=17.973, fft_len=128
Pulse: peak=10.49988, time=80.68, period=3.493, d_freq=1420005341.09, score=1.085, chirp=17.973, fft_len=128
Pulse: peak=6.643026, time=73.96, period=2.382, d_freq=1420005213.87, score=1.048, chirp=18.919, fft_len=128
Pulse: peak=10.59672, time=80.68, period=3.493, d_freq=1420005341.08, score=1.095, chirp=18.919, fft_len=128
Pulse: peak=10.10738, time=94.13, period=4.175, d_freq=1420005303.12, score=1.037, chirp=19.865, fft_len=128
Pulse: peak=6.532632, time=73.96, period=2.382, d_freq=1420005207.58, score=1.031, chirp=19.865, fft_len=128
Pulse: peak=6.212343, time=80.68, period=1.747, d_freq=1420005341.15, score=1.141, chirp=19.865, fft_len=128
Pulse: peak=6.00212, time=80.68, period=1.747, d_freq=1420005341.14, score=1.102, chirp=20.811, fft_len=128
Pulse: peak=5.650387, time=80.68, period=1.747, d_freq=1420005341.21, score=1.038, chirp=21.757, fft_len=128
Spike: peak=25.8012, time=56.41, d_freq=1420005916.71, chirp=-0.014788, fft_len=4k
Spike: peak=25.89606, time=56.41, d_freq=1420005916.83, chirp=0.029576, fft_len=4k
Spike: peak=25.35256, time=56.41, d_freq=1420005916.6, chirp=-0.059153, fft_len=4k
Spike: peak=25.63649, time=56.41, d_freq=1420005916.95, chirp=0.073941, fft_len=4k
Spike: peak=24.56402, time=56.41, d_freq=1420005916.48, chirp=-0.10352, fft_len=4k
Spike: peak=25.03482, time=56.41, d_freq=1420005917.07, chirp=0.11831, fft_len=4k
Spike: peak=24.11559, time=56.41, d_freq=1420005917.19, chirp=0.16267, fft_len=4k
Spike: peak=24.89538, time=56.41, d_freq=1420005917.07, chirp=-0.17746, fft_len=4k
OpenCL queue synchronized
SETI@Home Informational message -9 result_overflow
NOTE: The number of results detected equals the storage space allocated.

Best spike: peak=27.01501, time=60.4, d_freq=1420013971.7, chirp=-11.073, fft_len=128k
Best autocorr: peak=16.52522, time=100.7, delay=0.025293, d_freq=1420010420.34, chirp=6.504, fft_len=128k
Best gaussian: peak=3.146213, mean=0.6692209, ChiSq=1.396932, time=39.43, d_freq=1420007177.9,
score=-7.616322, null_hyp=1.8053, chirp=-0.0073941, fft_len=16k
Best pulse: peak=6.25204, time=80.68, period=1.747, d_freq=1420005341.08, score=1.148, chirp=18.919, fft_len=128
Best triplet: peak=0, time=-2.121e+011, period=0, d_freq=0, chirp=0, fft_len=0


Flopcounter: 1869834161008.562700

Spike count: 8
Autocorr count: 0
Pulse count: 22
Triplet count: 0
Gaussian count: 0

CPU:
Spike count: 30
Autocorr count: 0
Pulse count: 0
Triplet count: 0
Gaussian count: 0

CUDA:
Spike count: 30
Autocorr count: 0
Pulse count: 0
Triplet count: 0
Gaussian count: 0

Unfortunately, the 2 other apps provide too little info about the found signals in their logs. And the task has definitely been purged already, so an offline re-run is impossible.
I added more verbose signal output to stock too, but it will probably not appear in the live app soon, only when the current CPU stock build is replaced...
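
As a side note, the verbose signal lines in the SoG log above are regular enough to be tallied by a script rather than by eye. The following is a minimal sketch, assuming only the `Spike:` and `Pulse:` line shapes visible in that log; the function name `parse_signal` is made up for illustration, and other signal types would need their prefixes added if they print in the same format.

```python
import re
from collections import Counter

# Only the two prefixes that actually appear in the log above are matched.
# "Best spike:" style summary lines do not match because of the ^ anchor.
SIGNAL_LINE = re.compile(r"^(Spike|Pulse): (.*)$")

def parse_signal(line):
    """Return (kind, fields) for a signal line, or None for any other line."""
    match = SIGNAL_LINE.match(line.strip())
    if match is None:
        return None
    kind, rest = match.groups()
    # Fields are "name=value" pairs separated by ", "; values stay as strings
    # because some of them are not plain numbers (e.g. fft_len=4k).
    fields = dict(item.split("=", 1) for item in rest.split(", "))
    return kind, fields

sample = [
    "Pulse: peak=3.108338, time=13.45, period=0.9503, d_freq=1420006799.68, score=1.022, chirp=-3.5473, fft_len=512",
    "Spike: peak=25.8012, time=56.41, d_freq=1420005916.71, chirp=-0.014788, fft_len=4k",
    "OpenCL queue synchronized",
    "Best spike: peak=27.01501, time=60.4, d_freq=1420013971.7, chirp=-11.073, fft_len=128k",
]

parsed = [p for p in map(parse_signal, sample) if p is not None]
counts = Counter(kind for kind, _ in parsed)
print(counts)
```

Run over the full stderr text, the same tally should agree with the "Spike count" and "Pulse count" lines the app prints at the end.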

And yes, this discrepancy is most probably caused by a different processing order.
Though I tried to emulate the serial processing order, such discrepancies will happen from time to time until the validator is changed to deal with overflows in a more correct way. An overflowed result is just a subset of all signals, and the selection of one such subset or another is quite arbitrary.
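
To make the subset argument concrete, here is a toy model (not the app's real search order). It assumes a result limit of 30, which is where both logs above stop (8 + 22 for SoG, 30 + 0 for CPU and CUDA); the signal pool and the two orderings are invented purely for illustration.

```python
from collections import Counter

MAX_SIGNALS = 30  # assumed limit, inferred from the counts in the logs above

def report(signals_in_found_order, limit=MAX_SIGNALS):
    """Keep signals in the order they were found and stop at the limit."""
    return signals_in_found_order[:limit]

# Hypothetical pool: the task contains more signals than can be reported.
spikes = [("spike", i) for i in range(40)]
pulses = [("pulse", i) for i in range(25)]

# Ordering A: every spike is reported before any pulse.
order_a = spikes + pulses
# Ordering B: pulse results arrive after only 8 spikes have been reported.
order_b = spikes[:8] + pulses + spikes[8:]

counts_a = Counter(kind for kind, _ in report(order_a))
counts_b = Counter(kind for kind, _ in report(order_b))
print(dict(counts_a))
print(dict(counts_b))
```

Both orderings draw from the same pool of signals, yet the truncated reports share only 8 entries, so a validator that compares the two lists directly sees a mismatch even though neither app is wrong.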

It is good that the rate of such invalids is low enough not to pose any real problem for SoG, even with the current validator implementation.
ID: 1767449 · Report as offensive
Profile Zalster Special Project $250 donor
Volunteer tester
Joined: 27 May 99
Posts: 5516
Credit: 528,817,460
RAC: 242
United States
Message 1767451 - Posted: 25 Feb 2016, 7:24:16 UTC - in response to Message 1767446.  


There is some uncertainty about when the CPU usage increases. "they finished" and CPU load increases could mean that the increase happens when a new task starts, not when the old one ends. At task start there is indeed some CPU-only time while the app's state is initialized.
But as the task progresses (and this holds true after 75% of progress), non-SoG searches make up less and less of the workload (that is why the SoG path is so non-linear in its % done reporting), so the SoG app communicates with the CPU less and less. Usually that is the time when CPU load in the SoG build can drop a lot compared with the CPU usage of the non-SoG version.
But you report a CPU usage increase in that area? It is worth checking more carefully how the CPU usage increase correlates with task progress stages, because that would be valuable info.



The very first time it happened was just after I installed the new build.

All work units on 4 GPUs started at exactly the same time.

When the first units approached 60% complete, I began to see total CPU usage across all cores increasing. When they approached the mid-80% range, the system was at 96% of all cores. As they passed 90% complete, 100% of all cores were used. This lasted roughly 3 minutes.

Currently, I have all the work units spread out evenly from 1% up to 76% and the total CPU work is at 69%.

I think a lot of that has to do with the work units being spread out at different percentages of completion.

On the rare occasion that I get several work units with similar completion times, I begin to see total CPU usage increase, but not to the level I saw when I first installed the build.
ID: 1767451 · Report as offensive
Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6324
Credit: 106,370,077
RAC: 121
Russia
Message 1767458 - Posted: 25 Feb 2016, 8:11:56 UTC - in response to Message 1767451.  
Last modified: 25 Feb 2016, 8:12:53 UTC

Next time you see such coherence, please also record the AR of the tasks involved.
What if you load the same number of tasks per GPU in an offline bench on just a single GPU and let the other 3 remain idle? Do you see the same CPU behavior?

It's an important observation because it can indicate a possible issue with SoG on some systems.
With an ATi card I saw a CPU usage increase in SoG (just the opposite of the "usual" NV response), and all of that increase was system time. It appeared that the ATi OpenCL runtime switches to busy-wait after some amount of time. The SoG build allows very long periods of decoupling between CPU and GPU, and such a long period triggered the busy-wait loop inside the ATi runtime. The solution was to force (unneeded!) syncing between CPU and GPU just to keep the runtime from getting into the busy-wait loop. And CPU usage did indeed drop back. Of course, such senseless arbitrary syncing just decreases app performance, but it saves CPU, so a total performance win is possible (provided we can't improve the badly implemented runtime itself). Is this the case with the NV OpenCL runtime too? It uses busy-wait even with frequent syncing, and before your report it seemed that (unlike ATi's) long periods of decoupling switched it out of busy-wait, hence CPU usage drops a lot for VHARs. That's why it is important to record at which ARs you see the CPU load increase and to reproduce this in an offline benchmark.
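
The forced-sync workaround described above amounts to capping how many kernels may be queued between two blocking syncs. A minimal sketch of that idea follows; `StubQueue`, `run_with_bounded_decoupling` and `sync_every` are hypothetical names, the stub only records calls, and in real OpenCL host code the blocking step would be something like `clFinish()`. This is not the SoG app's actual code.

```python
class StubQueue:
    """Hypothetical stand-in for an OpenCL command queue; it only logs calls."""

    def __init__(self):
        self.log = []

    def enqueue(self, kernel_name):
        self.log.append(("enqueue", kernel_name))

    def finish(self):
        # In real host code this would be a blocking call such as clFinish().
        self.log.append(("finish", None))


def run_with_bounded_decoupling(queue, kernels, sync_every):
    """Enqueue kernels, forcing a sync after every `sync_every` enqueues."""
    pending = 0
    for name in kernels:
        queue.enqueue(name)
        pending += 1
        if pending == sync_every:
            queue.finish()  # extra sync: costs GPU throughput, bounds decoupling
            pending = 0
    if pending:
        queue.finish()  # final sync so results can be read back


queue = StubQueue()
run_with_bounded_decoupling(queue, [f"k{i}" for i in range(10)], sync_every=4)
finish_count = sum(1 for op, _ in queue.log if op == "finish")
print(finish_count)
```

A smaller `sync_every` keeps the decoupled period short at the price of more stalls, so the value would have to be tuned per runtime; whether the NV runtime needs it at all is exactly the open question above.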
ID: 1767458 · Report as offensive
Chris Adamek
Volunteer tester

Joined: 15 May 99
Posts: 251
Credit: 434,772,072
RAC: 236
United States
Message 1767485 - Posted: 25 Feb 2016, 13:12:35 UTC - in response to Message 1767458.  

Here's a link to my computer that has been running SoG since you posted it over on main. Basically I see the same as everyone else. I only run 1 WU at a time because GPU utilization is at 96-98% all the time. Shorties use practically no CPU, and the CPU time goes up to more or less equal the run time for anything in the 0.44 AR range or below. Overall, though, throughput for the machine has increased.

Do you need to look at the ATI variant on a wider range of cards? I've got a good variety, but they are all in Macs, so I'd have to see about convincing Xcode to spit something useful out of your sources in the repository, unless there is already a Mac version out there somewhere.

http://setiathome.berkeley.edu/results.php?hostid=7251681

Thanks,

Chris
ID: 1767485 · Report as offensive
Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6324
Credit: 106,370,077
RAC: 121
Russia
Message 1767489 - Posted: 25 Feb 2016, 13:47:58 UTC - in response to Message 1767485.  


Do you need to look at the ATI variant on a wider range of cards? I've got a good variety, but they are all in Macs, so I'd have to see about convincing Xcode to spit something useful out of your sources in the repository, unless there is already a Mac version out there somewhere.


Yes, that would be interesting, especially in comparison with the corresponding non-SoG ones.
AFAIK TBar and Urs did some ATi builds for OS X. It is worth looking at them (they are perhaps non-SoG). Maybe they could do SoG ones too (-D SIGNALS_ON_GPU in compiler options)...
ID: 1767489 · Report as offensive


 
©2022 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.