Astropulse v7 on ATi, with max workgroup size 128, not running

HAL9000
Volunteer tester
Joined: 11 Dec 09
Posts: 74
Credit: 1,248,766
RAC: 0
United States
Message 51873 - Posted: 11 Aug 2014, 22:45:28 UTC

The HD6370m in my notebook with Cat 14.4 does not seem to care for the AP v7 app.

It is restarting every 4-5 seconds with the same error.

Running on device number: 0
Priority of worker thread raised successfully
Priority of process adjusted successfully, below normal priority class used
OpenCL platform detected: Advanced Micro Devices, Inc.
BOINC assigns device 0
Info: BOINC provided OpenCL device ID used
Used GPU device parameters are:
	Number of compute units: 2
	Single buffer allocation size: 256MB
	Total device global memory: 1024MB
	max WG size: 128
	-unroll default value used: 2
	-ffa_block default value used: 512
	-ffa_block_fetch default value used: 256

Build features: Non-graphics	BLANKIT	OpenCL	TWIN_FFA	OCL_ZERO_COPY	COMBINED_DECHIRP_KERNEL	FFTW	USE_INCREASED_PRECISION	USE_SSE2	x86	
     CPUID: Intel(R) Core(TM) i3 CPU       M 390  @ 2.67GHz 

     Cache: L1=64K L2=256K

CPU features: FPU TSC PAE CMPXCHG8B APIC SYSENTER MTRR CMOV/CCMP MMX FXSAVE/FXRSTOR SSE SSE2 HT SSE3 SSSE3 SSE4.1 SSE4.2 
AstroPulse v7 Windows x86 rev 2601, V7 match, by Raistmer with support of Lunatics.kwsn.net team.	SSE2

OpenCL version by Raistmer

oclFFT fix for ATI GPUs by Urs Echternacht
ffa threshold mods by Joe Segur
SSE3 dechirping by JDWhale
Combined dechirp kernel by Frizz
Number of OpenCL platforms:				 1


 OpenCL Platform Name:					 AMD Accelerated Parallel Processing
Number of devices:				 1
  Max compute units:				 2
  Max work group size:				 128
  Max clock frequency:				 750Mhz
  Max memory allocation:			 536870912
  Cache type:					 None
  Cache line size:				 0
  Cache size:					 0
  Global memory size:				 1073741824
  Constant buffer size:				 65536
  Max number of constant args:			 8
  Local memory type:				 Scratchpad
  Local memory size:				 32768
  Queue properties:				 
    Out-of-Order:				 No
  Name:						 Cedar
  Vendor:					 Advanced Micro Devices, Inc.
  Driver version:				 1445.5 (VM)
  Version:					 OpenCL 1.2 AMD-APP (1445.5)
  Extensions:					 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_ext_atomic_counters_32 cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_media_ops2 cl_amd_popcnt cl_khr_d3d10_sharing cl_khr_d3d11_sharing cl_khr_dx9_media_sharing cl_amd_image2d_from_buffer_read_only cl_khr_spir cl_khr_gl_event 


state.fold_buf_size_short=65536; state.fold_buf_size_long=262144
WARNING: can't open binary kernel file for oclFFT plan: S:\BOINC/projects/setiweb.ssl.berkeley.edu_beta\AP_clFFTplan_Cedar_32768_gr64_lr8_wg256_tw0_r2601.bin_14455VM, continue with recompile...
ERROR: clFFT_CreatePlan failed: -46


Modifying ap_cmdline_7.02_windows_intelx86__opencl_ati_100.txt with parameters such as [b]-tune 1 32 4 1[/b] does not change the error or the warning, but it does add a TUNE: line to the output.
TUNE: kernel 1 now has workgroup size of (32,4,1)

I would have expected the _wg256_ portion of the plan file name to change when modifying with the tune setting.
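
To make the mismatch concrete, here is a minimal Python sketch that pulls the radix and workgroup fields out of the plan file name from the warning and compares the workgroup value with the device's reported max WG size. The field layout (gr = global radix, lr = local radix, wg = workgroup size) is my reading of the file name, not a documented format, and the function names are hypothetical.

```python
import re

# Layout inferred from the file name in the warning line; this is an
# assumption, not a documented naming scheme. Device names that contain
# underscores would not match this simple pattern.
PLAN_NAME = re.compile(
    r"AP_clFFTplan_(?P<device>[^_]+)_(?P<fft_size>\d+)"
    r"_gr(?P<global_radix>\d+)_lr(?P<local_radix>\d+)_wg(?P<wg_size>\d+)"
)


def parse_plan_name(path):
    """Return the device name and numeric fields encoded in a plan file name."""
    match = PLAN_NAME.search(path)
    if match is None:
        raise ValueError("not an AP_clFFTplan file name: %r" % path)
    fields = match.groupdict()
    return {k: (v if k == "device" else int(v)) for k, v in fields.items()}


def plan_fits_device(path, device_max_wg):
    """True if the workgroup size in the plan name is within the device limit."""
    return parse_plan_name(path)["wg_size"] <= device_max_wg
```

With the file name from the warning and the reported max WG size of 128, plan_fits_device() returns False, because the plan was generated for a workgroup of 256.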
ID: 51873
Raistmer
Volunteer tester
Joined: 18 Aug 05
Posts: 2423
Credit: 15,878,738
RAC: 0
Russia
Message 51874 - Posted: 11 Aug 2014, 22:49:36 UTC - in response to Message 51873.  

What does -oclFFT 64 8 128 give?
News about SETI opt app releases: https://twitter.com/Raistmer
ID: 51874
HAL9000
Volunteer tester
Joined: 11 Dec 09
Posts: 74
Credit: 1,248,766
RAC: 0
United States
Message 51875 - Posted: 11 Aug 2014, 23:04:14 UTC
Last modified: 12 Aug 2014, 0:01:33 UTC

Raistmer wrote: "What does -oclFFT 64 8 128 give?"


Running on device number: 0
Priority of worker thread raised successfully
Priority of process adjusted successfully, below normal priority class used
OpenCL platform detected: Advanced Micro Devices, Inc.
BOINC assigns device 0
Info: BOINC provided OpenCL device ID used
Used GPU device parameters are:
	Number of compute units: 2
	Single buffer allocation size: 256MB
	Total device global memory: 1024MB
	max WG size: 128
	-unroll default value used: 2
	-ffa_block default value used: 512
	-ffa_block_fetch default value used: 256

Build features: Non-graphics	BLANKIT	OpenCL	TWIN_FFA	OCL_ZERO_COPY	COMBINED_DECHIRP_KERNEL	FFTW	USE_INCREASED_PRECISION	USE_SSE2	x86	
     CPUID: Intel(R) Core(TM) i3 CPU       M 390  @ 2.67GHz 

     Cache: L1=64K L2=256K

CPU features: FPU TSC PAE CMPXCHG8B APIC SYSENTER MTRR CMOV/CCMP MMX FXSAVE/FXRSTOR SSE SSE2 HT SSE3 SSSE3 SSE4.1 SSE4.2 
AstroPulse v7 Windows x86 rev 2601, V7 match, by Raistmer with support of Lunatics.kwsn.net team.	SSE2

OpenCL version by Raistmer

oclFFT fix for ATI GPUs by Urs Echternacht
ffa threshold mods by Joe Segur
SSE3 dechirping by JDWhale
Combined dechirp kernel by Frizz
Number of OpenCL platforms:				 1


 OpenCL Platform Name:					 AMD Accelerated Parallel Processing
Number of devices:				 1
  Max compute units:				 2
  Max work group size:				 128
  Max clock frequency:				 750Mhz
  Max memory allocation:			 536870912
  Cache type:					 None
  Cache line size:				 0
  Cache size:					 0
  Global memory size:				 1073741824
  Constant buffer size:				 65536
  Max number of constant args:			 8
  Local memory type:				 Scratchpad
  Local memory size:				 32768
  Queue properties:				 
    Out-of-Order:				 No
  Name:						 Cedar
  Vendor:					 Advanced Micro Devices, Inc.
  Driver version:				 1445.5 (VM)
  Version:					 OpenCL 1.2 AMD-APP (1445.5)
  Extensions:					 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_ext_atomic_counters_32 cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_media_ops2 cl_amd_popcnt cl_khr_d3d10_sharing cl_khr_d3d11_sharing cl_khr_dx9_media_sharing cl_amd_image2d_from_buffer_read_only cl_khr_spir cl_khr_gl_event 


state.fold_buf_size_short=65536; state.fold_buf_size_long=262144
WARNING: can't open binary kernel file for oclFFT plan: S:\BOINC/projects/setiweb.ssl.berkeley.edu_beta\AP_clFFTplan_Cedar_32768_gr64_lr8_wg256_tw0_r2601.bin_14455VM, continue with recompile...
ERROR: clFFT_CreatePlan failed: -46
ID: 51875
Raistmer
Volunteer tester
Joined: 18 Aug 05
Posts: 2423
Credit: 15,878,738
RAC: 0
Russia
Message 51881 - Posted: 12 Aug 2014, 6:57:59 UTC - in response to Message 51875.  

Nope, the option is inactive. There should be an indication of the option at the very beginning of the stderr output.

Sorry, I named it a little differently than in the code :)

-oclFFT_plan A B C : to override defaults for FFT 32k plan generation. Read oclFFT code and explanations in comments before any tweaking.
AP_clFFTplan* binary cache should be deleted after change in this option.
A - global radix
B - local radix
C - max size of workgroup used by oclFFT kernel generation algorithm
Usage example: -oclFFT_plan 64 8 256 (this corresponds to old defaults);
-oclFFT_plan 0 0 0 (this effectively means this option not used, hardwired defaults in play).
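
Since the cached plan binaries have to be removed after changing this option, a small helper can do the cleanup. This is a minimal sketch in Python, not part of the app: the function name and the dry_run flag are hypothetical, and it only assumes that the cache files sit in the project directory and start with AP_clFFTplan, as the stderr output shows.

```python
from pathlib import Path


def clear_plan_cache(project_dir, dry_run=True):
    """List AP_clFFTplan* files in project_dir; delete them unless dry_run.

    Returns the matching paths so the caller can see what was (or would be)
    removed. Only regular files directly inside project_dir are considered.
    """
    matches = sorted(
        p for p in Path(project_dir).glob("AP_clFFTplan*") if p.is_file()
    )
    if not dry_run:
        for path in matches:
            path.unlink()
    return matches
```

Run it once with the default dry_run=True to check the list, then again with dry_run=False while the task is suspended, for example on the project directory from the log (S:\BOINC\projects\setiweb.ssl.berkeley.edu_beta on that machine).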


Please try again.
ID: 51881
HAL9000
Volunteer tester
Joined: 11 Dec 09
Posts: 74
Credit: 1,248,766
RAC: 0
United States
Message 51885 - Posted: 12 Aug 2014, 12:26:47 UTC - in response to Message 51881.  

OK, it looks like it might be running. It is now up to 00:04:00 with no restarts.

Output from stderr.txt looks like:

[pre]Running on device number: 0
oclFFT plan class overrides requested: global radix 64; local radix 8; max workgroup size 128
Priority of worker thread raised successfully
Priority of process adjusted successfully, below normal priority class used
OpenCL platform detected: Advanced Micro Devices, Inc.
BOINC assigns device 0
Info: BOINC provided OpenCL device ID used
Used GPU device parameters are:
Number of compute units: 2
Single buffer allocation size: 256MB
Total device global memory: 1024MB
max WG size: 128
-unroll default value used: 2
-ffa_block default value used: 512
-ffa_block_fetch default value used: 256

Build features: Non-graphics BLANKIT OpenCL TWIN_FFA OCL_ZERO_COPY COMBINED_DECHIRP_KERNEL FFTW USE_INCREASED_PRECISION USE_SSE2 x86
CPUID: Intel(R) Core(TM) i3 CPU M 390 @ 2.67GHz

Cache: L1=64K L2=256K

CPU features: FPU TSC PAE CMPXCHG8B APIC SYSENTER MTRR CMOV/CCMP MMX FXSAVE/FXRSTOR SSE SSE2 HT SSE3 SSSE3 SSE4.1 SSE4.2
AstroPulse v7 Windows x86 rev 2601, V7 match, by Raistmer with support of Lunatics.kwsn.net team. SSE2

OpenCL version by Raistmer

oclFFT fix for ATI GPUs by Urs Echternacht
ffa threshold mods by Joe Segur
SSE3 dechirping by JDWhale
Combined dechirp kernel by Frizz
Number of OpenCL platforms: 1


OpenCL Platform Name: AMD Accelerated Parallel Processing
Number of devices: 1
Max compute units: 2
Max work group size: 128
Max clock frequency: 750Mhz
Max memory allocation: 536870912
Cache type: None
Cache line size: 0
Cache size: 0
Global memory size: 1073741824
Constant buffer size: 65536
Max number of constant args: 8
Local memory type: Scratchpad
Local memory size: 32768
Queue properties:
Out-of-Order: No
Name: Cedar
Vendor: Advanced Micro Devices, Inc.
Driver version: 1445.5 (VM)
Version: OpenCL 1.2 AMD-APP (1445.5)
Extensions: cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_ext_atomic_counters_32 cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_media_ops2 cl_amd_popcnt cl_khr_d3d10_sharing cl_khr_d3d11_sharing cl_khr_dx9_media_sharing cl_amd_image2d_from_buffer_read_only cl_khr_spir cl_khr_gl_event


state.fold_buf_size_short=65536; state.fold_buf_size_long=262144
WARNING: can't open binary kernel file for oclFFT plan: S:\BOINC/projects/setiweb.ssl.berkeley.edu_beta\AP_clFFTplan_Cedar_32768_gr64_lr8_wg128_tw0_r2601.bin_14455VM, continue with recompile...[/pre]
ID: 51885
Raistmer
Volunteer tester
Joined: 18 Aug 05
Posts: 2423
Credit: 15,878,738
RAC: 0
Russia
Message 51888 - Posted: 12 Aug 2014, 16:03:23 UTC - in response to Message 51885.  

Fine, could you also try this build: https://www.dropbox.com/s/nl2ualzqzaf9acp/AP7_win_x86_SSE2_OpenCL_ATI_r2605.7z without any additional options, please? I need to check that the stock config can work on your device too.
ID: 51888
HAL9000
Volunteer tester
Avatar

Send message
Joined: 11 Dec 09
Posts: 74
Credit: 1,248,766
RAC: 0
United States
Message 51897 - Posted: 13 Aug 2014, 4:39:42 UTC - in response to Message 51888.  

So far so good. Here is stderr while waiting for task to complete.

Running on device number: 0
Priority of worker thread raised successfully
Priority of process adjusted successfully, below normal priority class used
OpenCL platform detected: Advanced Micro Devices, Inc.
BOINC assigns device 0
Info: BOINC provided OpenCL device ID used
Used GPU device parameters are:
	Number of compute units: 2
	Single buffer allocation size: 256MB
	Total device global memory: 1024MB
	max WG size: 128
	-unroll default value used: 2
	-ffa_block default value used: 512
	-ffa_block_fetch default value used: 256

Build features: Non-graphics	BLANKIT	OpenCL	TWIN_FFA	OCL_ZERO_COPY	COMBINED_DECHIRP_KERNEL	FFTW	USE_INCREASED_PRECISION	USE_SSE2	x86	
     CPUID: Intel(R) Core(TM) i3 CPU       M 390  @ 2.67GHz 

     Cache: L1=64K L2=256K

CPU features: FPU TSC PAE CMPXCHG8B APIC SYSENTER MTRR CMOV/CCMP MMX FXSAVE/FXRSTOR SSE SSE2 HT SSE3 SSSE3 SSE4.1 SSE4.2 
AstroPulse v7 Windows x86 rev 2605, V7 match, by Raistmer with support of Lunatics.kwsn.net team.	SSE2

OpenCL version by Raistmer

oclFFT fix for ATI GPUs by Urs Echternacht
ffa threshold mods by Joe Segur
SSE3 dechirping by JDWhale
Combined dechirp kernel by Frizz
Built with uncommitted modifications
Number of OpenCL platforms:				 1


 OpenCL Platform Name:					 AMD Accelerated Parallel Processing
Number of devices:				 1
  Max compute units:				 2
  Max work group size:				 128
  Max clock frequency:				 750Mhz
  Max memory allocation:			 536870912
  Cache type:					 None
  Cache line size:				 0
  Cache size:					 0
  Global memory size:				 1073741824
  Constant buffer size:				 65536
  Max number of constant args:			 8
  Local memory type:				 Scratchpad
  Local memory size:				 32768
  Queue properties:				 
    Out-of-Order:				 No
  Name:						 Cedar
  Vendor:					 Advanced Micro Devices, Inc.
  Driver version:				 1445.5 (VM)
  Version:					 OpenCL 1.2 AMD-APP (1445.5)
  Extensions:					 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_ext_atomic_counters_32 cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_media_ops2 cl_amd_popcnt cl_khr_d3d10_sharing cl_khr_d3d11_sharing cl_khr_dx9_media_sharing cl_amd_image2d_from_buffer_read_only cl_khr_spir cl_khr_gl_event 


state.fold_buf_size_short=65536; state.fold_buf_size_long=262144
INFO: can't open binary kernel file: S:\BOINC/projects/setiweb.ssl.berkeley.edu_beta\AstroPulse_Kernels_r2605.cl_Cedar.bin_V7_TWIN_FFA_14455VM, continue with recompile...
INFO: binary kernel file created
ID: 51897
Raistmer
Volunteer tester
Joined: 18 Aug 05
Posts: 2423
Credit: 15,878,738
RAC: 0
Russia
Message 51898 - Posted: 13 Aug 2014, 6:26:10 UTC

Fine, thanks.
7.03 could be expected soon then.
ID: 51898
HAL9000
Volunteer tester
Joined: 11 Dec 09
Posts: 74
Credit: 1,248,766
RAC: 0
United States
Message 51901 - Posted: 13 Aug 2014, 12:18:50 UTC

Finished & currently pending.
http://setiweb.ssl.berkeley.edu/beta/result.php?resultid=17490657

Seems I always have troublesome hardware. All the way back from HD3850 & hybrid app development.
ID: 51901
Raistmer
Volunteer tester
Joined: 18 Aug 05
Posts: 2423
Credit: 15,878,738
RAC: 0
Russia
Message 51902 - Posted: 13 Aug 2014, 12:48:24 UTC - in response to Message 51901.  

A WG of 128 is a big limitation indeed. Even my entry-level C-60 APU has a WG of 256...
ID: 51902
