Debugging oclFFT's -34 failure in the latest AP builds

Raistmer
Volunteer tester
Joined: 18 Aug 05
Posts: 2423
Credit: 15,878,738
RAC: 0
Russia
Message 51659 - Posted: 20 Jul 2014, 22:41:30 UTC

Please try this debug build:
https://www.dropbox.com/s/6to8zalyo6silhc/AP7_win_x86_SSE2_OpenCL_ATI_r2559_debug.7z

It will generate a very long stderr; I'm interested in the first few dozen lines, plus the separately generated clFFT_dumpPLAN.txt file, if any.

Example of stderr from my Loveland (offline run):

Not using ap_cmdline.txt-file, using commandline options.
02:31:44 (5840): Can't set up shared mem: -1. Will run in standalone mode.
Priority of worker thread raised successfully
Priority of process adjusted successfully, below normal priority class used
GPU device # not found in init_data.xml
WARNING: BOINC was unable to find GPU device, using own enumeration
OpenCL platform detected: Advanced Micro Devices, Inc.
WARNING: BOINC supplied wrong platform!
call 'clGetDeviceIDs' is finished OK in file ..\..\..\..\src\GPU_lock.cpp near line 752
BOINC assigns device 0
WARNING: BOINC failed to provide OpenCL device, using own enumeration abilities
call 'clGetDeviceIDs (second call)' is finished OK in file ..\..\..\..\src\GPU_lock.cpp near line 919
call 'clCreateContext' is finished OK in file ..\..\..\..\src\GPU_lock.cpp near line 936
call 'Creating Command Queue. (clCreateCommandQueue)' is finished OK in file ..\..\..\..\src\GPU_lock.cpp near line 961
call 'Creating Command Queue for writing' is finished OK in file ..\..\..\..\src\GPU_lock.cpp near line 966
call 'Quering device abilities' is finished OK in file ..\..\..\..\src\GPU_lock.cpp near line 287
call 'Quering device abilities' is finished OK in file ..\..\..\..\src\GPU_lock.cpp near line 317
Used GPU device parameters are:
Number of compute units: 2
Single buffer allocation size: 167MB
Total device global memory: 669MB
max WG size: 256
-unroll default value used: 2
-ffa_block default value used: 512
-ffa_block_fetch default value used: 256

Build features: Non-graphics BLANKIT OpenCL TWIN_FFA OCL_VERBOSE OCL_ZERO_COPY COMBINED_DECHIRP_KERNEL FFTW USE_INCREASED_PRECISION USE_SSE2 x86
CPUID: AMD C-60 APU with Radeon(tm) HD Graphics

Cache: L1=64K L2=512K

CPU features: FPU TSC PAE CMPXCHG8B APIC SYSENTER MTRR CMOV/CCMP MMX FXSAVE/FXRSTOR SSE SSE2 HT SSE3 SSSE3 SSE4A
AstroPulse v7 Windows x86 rev 2559, V7 match, by Raistmer with support of Lunatics.kwsn.net team. SSE2

OpenCL version by Raistmer

oclFFT fix for ATI GPUs by Urs Echternacht
ffa threshold mods by Joe Segur
SSE3 dechirping by JDWhale
Combined dechirp kernel by Frizz
Built with uncommitted modifications
Number of OpenCL platforms: 1


OpenCL Platform Name: AMD Accelerated Parallel Processing
Number of devices: 1
Max compute units: 2
Max work group size: 256
Max clock frequency: 275Mhz
Max memory allocation: 175374336
Cache type: None
Cache line size: 0
Cache size: 0
Global memory size: 701497344
Constant buffer size: 65536
Max number of constant args: 8
Local memory type: Scratchpad
Local memory size: 32768
Queue properties:
Out-of-Order: No
Name: Loveland
Vendor: Advanced Micro Devices, Inc.
Driver version: 1268.1 (VM)
Version: OpenCL 1.2 AMD-APP (1268.1)
Extensions: cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_ext_atomic_counters_32 cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_media_ops2 cl_amd_popcnt cl_khr_d3d10_sharing cl_khr_dx9_media_sharing cl_amd_image2d_from_buffer_read_only


state.fold_buf_size_short=65536; state.fold_buf_size_long=262144
INFO: can't open binary kernel file: .\\AstroPulse_Kernels_r2559.cl_Loveland.bin_V7_TWIN_FFA_12681VM, continue with recompile...
call 'clGetProgramInfo' is finished OK in file ..\..\..\..\src\GPU_lock.cpp near line 585
INFO: binary kernel file created
call 'clCreateBuffer (ocl_global_buf1)' is finished OK in file ..\..\ap_science.cpp near line 130
call 'clCreateBuffer (ocl_global_buf2)' is finished OK in file ..\..\ap_science.cpp near line 139
call 'clCreateBuffer: gpu_need_blanking' is finished OK in file ..\..\ap_science.cpp near line 459
call 'clCreateBuffer (buf_periods_df64)' is finished OK in file ..\..\ap_science.cpp near line 474
call 'clCreateBuffer (buf_freqs)' is finished OK in file ..\..\ap_science.cpp near line 480
call 'clCreateBuffer (buf_per_int)' is finished OK in file ..\..\ap_science.cpp near line 486
call 'Creating dechirp_range1_kernel' is finished OK in file ..\..\ap_science.cpp near line 516
call 'Creating PC_single_pulse_kernel_FFA_update_reduce0 from program.' is finished OK in file ..\..\ap_science.cpp near line 549
call 'Creating PC_single_pulse_kernel_FFA_update_reduce1_BLANKIT from program.' is finished OK in file ..\..\ap_science.cpp near line 551
call 'GPU_change_array_sizes_kernel_cl' is finished OK in file ..\..\ap_science.cpp near line 576
call 'GPU_change_array_sizes_kernel_cl' is finished OK in file ..\..\ap_science.cpp near line 577
call 'Creating GPU_compare_with_threshold_kernel_twin_cl' is finished OK in file ..\..\ap_science.cpp near line 580
call 'Creating GPU_PC_compare_with_threshold_kernel_twin_cl' is finished OK in file ..\..\ap_science.cpp near line 582
call 'Creating GPU_coadd_kernel_twin_cl' is finished OK in file ..\..\ap_science.cpp near line 583
call 'PopulateTresholdTable_kernel9t_df64_cl' is finished OK in file ..\..\ap_science.cpp near line 593
call 'PopulateFetchOffsets_kernel_df64_cl' is finished OK in file ..\..\ap_science.cpp near line 595
call 'create GPU_fetch_array_kernel_twin_cl' is finished OK in file ..\..\ap_science.cpp near line 611
call 'Creating GPU_coadd_with_stride_kernel_twin_cl' is finished OK in file ..\..\ap_science.cpp near line 621
WARNING: can't open binary kernel file for oclFFT plan: .\\AP_clFFTplan_Loveland_32768_r2559.bin_12681VM, continue with recompile...
call 'clGetProgramInfo' is finished OK in file ..\..\..\..\src\OpenCL_FFT\fft_setup.cpp near line 723
oclFFT context=6d1b58
AP main context=6d1b58
Dumping clFFT Plan to file
call 'clCreateBuffer (gpu_thresholds)' is finished OK in file ..\..\ap_client_main.cpp near line 1719
call 'splitter_bits_to_float_range_kernel' is finished OK in file ..\..\ap_science.cpp near line 2580
in ap oclFFT_1 ok.
in ap oclFFT_1 ok.
in ap oclFFT_1 ok.
call 'clFFT_ExecuteInterleaved_ap' is finished OK in file ..\..\ap_science.cpp near line 2680
call 'Setting kernel argument: dechirp_range1_kernel' is finished OK in file ..\..\ap_science.cpp near line 2117
News about SETI opt app releases: https://twitter.com/Raistmer
ID: 51659
TBar
Volunteer tester

Joined: 2 Jul 13
Posts: 505
Credit: 4,612,930
RAC: 23,787
United States
Message 51663 - Posted: 21 Jul 2014, 0:53:26 UTC - in response to Message 51659.  
Last modified: 21 Jul 2014, 1:34:43 UTC

The nVidia app is working:
INFO: can't open binary kernel file: C:\Documents and Settings\All Users\Application Data\BOINC/projects/setiweb.ssl.berkeley.edu_beta\AstroPulse_Kernels_r2559.cl_GeForce8800GT.bin_V7_TWIN_FFA_26658, continue with recompile...
INFO: binary kernel file created
WARNING: can't open binary kernel file for oclFFT plan: C:\Documents and Settings\All Users\Application Data\BOINC/projects/setiweb.ssl.berkeley.edu_beta\AP_clFFTplan_GeForce8800GT_32768_r2559.bin_26658, continue with recompile...

The two ATI cards aren't; they both say:
Error in ap oclFFT_1: -34
ERROR: OpenCL kernel/call 'clFFT_ExecuteInterleaved_ap' call failed (-34) in file ..\..\ap_science.cpp near line 2680.
Waiting 30 sec before restart...
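
Since the debug stderr is very long, the failure line can be picked out mechanically. Below is a minimal sketch in C, assuming every failure line has the exact shape of the `ERROR: OpenCL kernel/call ...` line quoted above; the struct and function names are hypothetical and are not part of the AP application.

```c
#include <stdio.h>

/* One parsed failure line from the app's stderr (hypothetical helper type). */
struct ocl_failure {
    char call[64];   /* text between the single quotes */
    int  code;       /* OpenCL status as printed, e.g. -34 */
    char file[128];  /* source file path as printed by the app */
    int  line;       /* value after "near line" */
};

/* Returns 1 and fills *out when `text` matches the
 * "ERROR: OpenCL kernel/call '<name>' call failed (<code>) in file <path> near line <n>"
 * shape seen in this thread; returns 0 for any other line.
 * Assumption: the file path contains no spaces. */
int parse_ocl_failure(const char *text, struct ocl_failure *out)
{
    int n = sscanf(text,
                   "ERROR: OpenCL kernel/call '%63[^']' call failed (%d) in file %127s near line %d",
                   out->call, &out->code, out->file, &out->line);
    return n == 4;
}
```

Feeding each stderr line to `parse_ocl_failure` and printing only the matches gives the failing call, status code and source location without posting the whole log.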


The first few lines from the 6770 say:
Priority of process adjusted successfully, high priority class used
OpenCL platform detected: NVIDIA Corporation
OpenCL platform detected: Advanced Micro Devices, Inc.
call 'clGetDeviceIDs' is finished OK in file ..\..\..\..\src\GPU_lock.cpp near line 752
BOINC assigns device 0, slots 0 to 0 (including) will be checked
Used slot is 0;	Info: BOINC provided OpenCL device ID used
call 'clCreateContext' is finished OK in file ..\..\..\..\src\GPU_lock.cpp near line 936
call 'Creating Command Queue. (clCreateCommandQueue)' is finished OK in file ..\..\..\..\src\GPU_lock.cpp near line 961
call 'Creating Command Queue for writing' is finished OK in file ..\..\..\..\src\GPU_lock.cpp near line 966
call 'Quering device abilities' is finished OK in file ..\..\..\..\src\GPU_lock.cpp near line 287
call 'Quering device abilities' is finished OK in file ..\..\..\..\src\GPU_lock.cpp near line 317
Info: CPU affinity mask used: 1

clFFT_dumpPLAN.txt:
Run kernel fft0 with global dim = {4096*BatchSize}, local dim={128}
Run kernel fft1 with global dim = {4096*BatchSize}, local dim={128}
Run kernel fft2 with global dim = {4096*BatchSize}, local dim={256}


The first few lines from the 4670 say:
Priority of process adjusted successfully, high priority class used
OpenCL platform detected: NVIDIA Corporation
OpenCL platform detected: Advanced Micro Devices, Inc.
call 'clGetDeviceIDs' is finished OK in file ..\..\..\..\src\GPU_lock.cpp near line 752
BOINC assigns device 1, slots 1 to 1 (including) will be checked
Used slot is 1;	Info: BOINC provided OpenCL device ID used
call 'clCreateContext' is finished OK in file ..\..\..\..\src\GPU_lock.cpp near line 936
call 'Creating Command Queue. (clCreateCommandQueue)' is finished OK in file ..\..\..\..\src\GPU_lock.cpp near line 961
call 'Creating Command Queue for writing' is finished OK in file ..\..\..\..\src\GPU_lock.cpp near line 966
call 'Quering device abilities' is finished OK in file ..\..\..\..\src\GPU_lock.cpp near line 287
call 'Quering device abilities' is finished OK in file ..\..\..\..\src\GPU_lock.cpp near line 317
Info: CPU affinity mask used: 2

clFFT_dumpPLAN.txt:
Run kernel fft0 with global dim = {4096*BatchSize}, local dim={32}
Run kernel fft1 with global dim = {4096*BatchSize}, local dim={32}
Run kernel fft2 with global dim = {4096*BatchSize}, local dim={32}
#ifndef M_PI
#define M_PI 0x1.921fb54442d18p+1
#endif
#define complexMul(a,b) ((float2)(mad(-(a).y, (b).y, (a).x * (b).x), mad((a).y, (b).x, (a).x * (b).y)))

#define cos_sinLUT1(res,dir,i,cossinLUT)\
{\
(res)=(float2)((cossinLUT)[i].x , (dir)*(cossinLUT)[i].y);\
}

#define cos_sinLUT2(res,dir,_i,_k,cossinLUT1,cossinLUT2) \
{   float _sin_1= (cossinLUT1)[_i].y;    \
    float _sin_2= (cossinLUT2)[_k].y;    \
    float _cos_1= (cossinLUT1)[_i].x;    \
    float _cos_2= (cossinLUT2)[_k].x;    \
    float _cos_res = _cos_1 * _cos_2 - _sin_1 * _sin_2; \
    float _sin_res = (dir) * (_sin_1 * _cos_2 + _cos_1 * _sin_2); \
    (res)=(float2)(_cos_res,_sin_res);    \
}

#define conj(a) ((float2)((a).x, -(a).y))
#define conjTransp(a) ((float2)(-(a).y, (a).x))

#define fftKernel2(a,dir) \
{ \
    float2 c = (a)[0];    \
    (a)[0] = c + (a)[1];  \
    (a)[1] = c - (a)[1];  \
}

#define fftKernel2S(d1,d2,dir) \
{ \
    float2 c = (d1);   \
    (d1) = c + (d2);   \
    (d2) = c - (d2);   \
}

#define fftKernel4(a,dir) \
{ \
    fftKernel2S((a)[0], (a)[2], dir); \
    fftKernel2S((a)[1], (a)[3], dir); \
    fftKernel2S((a)[0], (a)[1], dir); \
    (a)[3] = (float2)(dir)*(conjTransp((a)[3])); \
    fftKernel2S((a)[2], (a)[3], dir); \
    float2 c = (a)[1]; \
    (a)[1] = (a)[2]; \
    (a)[2] = c; \
}

#define fftKernel4s(a0,a1,a2,a3,dir) \
{ \
    fftKernel2S((a0), (a2), dir); \
    fftKernel2S((a1), (a3), dir); \
    fftKernel2S((a0), (a1), dir); \
    (a3) = (float2)(dir)*(conjTransp((a3))); \
    fftKernel2S((a2), (a3), dir); \
    float2 c = (a1); \
    (a1) = (a2); \
    (a2) = c; \
}

#define bitreverse8(a) \
{ \
    float2 c; \
    c = (a)[1]; \
    (a)[1] = (a)[4]; \
    (a)[4] = c; \
    c = (a)[3]; \
    (a)[3] = (a)[6]; \
    (a)[6] = c; \
}

#define fftKernel8(a,dir) \
{ \
	const float2 w1  = (float2)(0x1.6a09e6p-1f,  dir*0x1.6a09e6p-1f);  \
	const float2 w3  = (float2)(-0x1.6a09e6p-1f, dir*0x1.6a09e6p-1f);  \
	float2 c; \
	fftKernel2S((a)[0], (a)[4], dir); \
	fftKernel2S((a)[1], (a)[5], dir); \
	fftKernel2S((a)[2], (a)[6], dir); \
	fftKernel2S((a)[3], (a)[7], dir); \
	(a)[5] = complexMul(w1, (a)[5]); \
	(a)[6] = (float2)(dir)*(conjTransp((a)[6])); \
	(a)[7] = complexMul(w3, (a)[7]); \
	fftKernel2S((a)[0], (a)[2], dir); \
	fftKernel2S((a)[1], (a)[3], dir); \
	fftKernel2S((a)[4], (a)[6], dir); \
	fftKernel2S((a)[5], (a)[7], dir); \
	(a)[3] = (float2)(dir)*(conjTransp((a)[3])); \
	(a)[7] = (float2)(dir)*(conjTransp((a)[7])); \
	fftKernel2S((a)[0], (a)[1], dir); \
	fftKernel2S((a)[2], (a)[3], dir); \
	fftKernel2S((a)[4], (a)[5], dir); \
	fftKernel2S((a)[6], (a)[7], dir); \
	bitreverse8((a)); \
}

#define bitreverse4x4(a) \
{ \
	float2 c; \
	c = (a)[1];  (a)[1]  = (a)[4];  (a)[4]  = c; \
	c = (a)[2];  (a)[2]  = (a)[8];  (a)[8]  = c; \
	c = (a)[3];  (a)[3]  = (a)[12]; (a)[12] = c; \
	c = (a)[6];  (a)[6]  = (a)[9];  (a)[9]  = c; \
	c = (a)[7];  (a)[7]  = (a)[13]; (a)[13] = c; \
	c = (a)[11]; (a)[11] = (a)[14]; (a)[14] = c; \
}

#define fftKernel16(a,dir) \
{ \
    const float w0 = 0x1.d906bcp-1f; \
    const float w1 = 0x1.87de2ap-2f; \
    const float w2 = 0x1.6a09e6p-1f; \
    fftKernel4s((a)[0], (a)[4], (a)[8],  (a)[12], dir); \
    fftKernel4s((a)[1], (a)[5], (a)[9],  (a)[13], dir); \
    fftKernel4s((a)[2], (a)[6], (a)[10], (a)[14], dir); \
    fftKernel4s((a)[3], (a)[7], (a)[11], (a)[15], dir); \
    (a)[5]  = complexMul((a)[5], (float2)(w0, dir*w1)); \
    (a)[6]  = complexMul((a)[6], (float2)(w2, dir*w2)); \
    (a)[7]  = complexMul((a)[7], (float2)(w1, dir*w0)); \
    (a)[9]  = complexMul((a)[9], (float2)(w2, dir*w2)); \
    (a)[10] = (float2)(dir)*(conjTransp((a)[10])); \
    (a)[11] = complexMul((a)[11], (float2)(-w2, dir*w2)); \
    (a)[13] = complexMul((a)[13], (float2)(w1, dir*w0)); \
    (a)[14] = complexMul((a)[14], (float2)(-w2, dir*w2)); \
    (a)[15] = complexMul((a)[15], (float2)(-w0, dir*-w1)); \
    fftKernel4((a), dir); \
    fftKernel4((a) + 4, dir); \
    fftKernel4((a) + 8, dir); \
    fftKernel4((a) + 12, dir); \
    bitreverse4x4((a)); \
}

#define bitreverse32(a) \
{ \
    float2 c1, c2; \
    c1 = (a)[2];   (a)[2] = (a)[1];   c2 = (a)[4];   (a)[4] = c1;   c1 = (a)[8];   (a)[8] = c2;    c2 = (a)[16];  (a)[16] = c1;   (a)[1] = c2; \
    c1 = (a)[6];   (a)[6] = (a)[3];   c2 = (a)[12];  (a)[12] = c1;  c1 = (a)[24];  (a)[24] = c2;   c2 = (a)[17];  (a)[17] = c1;   (a)[3] = c2; \
    c1 = (a)[10];  (a)[10] = (a)[5];  c2 = (a)[20];  (a)[20] = c1;  c1 = (a)[9];   (a)[9] = c2;    c2 = (a)[18];  (a)[18] = c1;   (a)[5] = c2; \
    c1 = (a)[14];  (a)[14] = (a)[7];  c2 = (a)[28];  (a)[28] = c1;  c1 = (a)[25];  (a)[25] = c2;   c2 = (a)[19];  (a)[19] = c1;   (a)[7] = c2; \
    c1 = (a)[22];  (a)[22] = (a)[11]; c2 = (a)[13];  (a)[13] = c1;  c1 = (a)[26];  (a)[26] = c2;   c2 = (a)[21];  (a)[21] = c1;   (a)[11] = c2; \
    c1 = (a)[30];  (a)[30] = (a)[15]; c2 = (a)[29];  (a)[29] = c1;  c1 = (a)[27];  (a)[27] = c2;   c2 = (a)[23];  (a)[23] = c1;   (a)[15] = c2; \
}

#define fftKernel32(a,dir) \
{ \
    fftKernel2S((a)[0],  (a)[16], dir); \
    fftKernel2S((a)[1],  (a)[17], dir); \
    fftKernel2S((a)[2],  (a)[18], dir); \
    fftKernel2S((a)[3],  (a)[19], dir); \
    fftKernel2S((a)[4],  (a)[20], dir); \
    fftKernel2S((a)[5],  (a)[21], dir); \
    fftKernel2S((a)[6],  (a)[22], dir); \
    fftKernel2S((a)[7],  (a)[23], dir); \
    fftKernel2S((a)[8],  (a)[24], dir); \
    fftKernel2S((a)[9],  (a)[25], dir); \
    fftKernel2S((a)[10], (a)[26], dir); \
    fftKernel2S((a)[11], (a)[27], dir); \
    fftKernel2S((a)[12], (a)[28], dir); \
    fftKernel2S((a)[13], (a)[29], dir); \
    fftKernel2S((a)[14], (a)[30], dir); \
    fftKernel2S((a)[15], (a)[31], dir); \
    (a)[17] = complexMul((a)[17], (float2)(0x1.f6297cp-1f, dir*0x1.8f8b84p-3f)); \
    (a)[18] = complexMul((a)[18], (float2)(0x1.d906bcp-1f, dir*0x1.87de2ap-2f)); \
    (a)[19] = complexMul((a)[19], (float2)(0x1.a9b662p-1f, dir*0x1.1c73b4p-1f)); \
    (a)[20] = complexMul((a)[20], (float2)(0x1.6a09e6p-1f, dir*0x1.6a09e6p-1f)); \
    (a)[21] = complexMul((a)[21], (float2)(0x1.1c73b4p-1f, dir*0x1.a9b662p-1f)); \
    (a)[22] = complexMul((a)[22], (float2)(0x1.87de2ap-2f, dir*0x1.d906bcp-1f)); \
    (a)[23] = complexMul((a)[23], (float2)(0x1.8f8b84p-3f, dir*0x1.f6297cp-1f)); \
    (a)[24] = complexMul((a)[24], (float2)(0x0p+0f, dir*0x1p+0f)); \
    (a)[25] = complexMul((a)[25], (float2)(-0x1.8f8b84p-3f, dir*0x1.f6297cp-1f)); \
    (a)[26] = complexMul((a)[26], (float2)(-0x1.87de2ap-2f, dir*0x1.d906bcp-1f)); \
    (a)[27] = complexMul((a)[27], (float2)(-0x1.1c73b4p-1f, dir*0x1.a9b662p-1f)); \
    (a)[28] = complexMul((a)[28], (float2)(-0x1.6a09e6p-1f, dir*0x1.6a09e6p-1f)); \
    (a)[29] = complexMul((a)[29], (float2)(-0x1.a9b662p-1f, dir*0x1.1c73b4p-1f)); \
    (a)[30] = complexMul((a)[30], (float2)(-0x1.d906bcp-1f, dir*0x1.87de2ap-2f)); \
    (a)[31] = complexMul((a)[31], (float2)(-0x1.f6297cp-1f, dir*0x1.8f8b84p-3f)); \
    fftKernel16((a), dir); \
    fftKernel16((a) + 16, dir); \
    bitreverse32((a)); \
}

__kernel void \
clFFT_1DTwistInterleaved(__global float2 *in, unsigned int startRow, unsigned int numCols, unsigned int N, unsigned int numRowsToProcess, int dir) \
{ \
   float2 a, w; \
   float ang; \
   unsigned int j; \
	unsigned int i = get_global_id(0); \
	unsigned int startIndex = mad24(startRow, numCols, i); \
	 \
	if(i < numCols) \
	{ \
	    for(j = 0; j < numRowsToProcess; j++) \
	    { \
	        a = in[startIndex]; \
	        ang = 2.0f * M_PI * dir * i * (startRow + j) / N; \
	        w = (float2)(native_cos(ang), native_sin(ang)); \
	        a = complexMul(a, w); \
	        in[startIndex] = a; \
	        startIndex += numCols; \
	    } \
	}	 \
} \
__kernel void fft0(__global float2 *in, __global float2 *out, int dir, int S, __global float2 * cossinLUT1, __global float2 * cossinLUT2 )
{
    __local float sMem[260];
    int i, j, r, indexIn, indexOut, index, tid, bNum, xNum, k, l;
    int s, ii, jj, offset;
    float2 w;
    float ang, angf, ang1;
    __local float *lMemStore, *lMemLoad;
    float2 a[8];
    int lId = get_local_id( 0 );
    int groupId = get_group_id( 0 );
bNum = groupId & 127;
xNum = groupId >> 7;
indexIn = mul24(bNum, 4);
tid = indexIn;
i = tid >> 0;
j = tid & 0;
indexOut = mad24(i, 64, j);
indexIn += (xNum << 15);
indexOut += (xNum << 15);
tid = lId;
i = tid & 3;
j = tid >> 2;
indexIn += mad24(j, 512, i);
in += indexIn;
a[0] = in[0];
a[1] = in[4096];
a[2] = in[8192];
a[3] = in[12288];
a[4] = in[16384];
a[5] = in[20480];
a[6] = in[24576];
a[7] = in[28672];
fftKernel8(a, dir);
ang = dir*(0x1.921fb54442d18p-4) * (j);
w = (float2)(native_cos(ang), native_sin(ang));
a[1] = complexMul(a[1], w);
ang = dir*(1.9634954631e-001) * (j);
w = (float2)(native_cos(ang), native_sin(ang));
a[2] = complexMul(a[2], w);
ang = dir*(2.9452431947e-001) * (j);
w = (float2)(native_cos(ang), native_sin(ang));
a[3] = complexMul(a[3], w);
ang = dir*(3.9269909263e-001) * (j);
w = (float2)(native_cos(ang), native_sin(ang));
a[4] = complexMul(a[4], w);
ang = dir*(4.9087386578e-001) * (j);
w = (float2)(native_cos(ang), native_sin(ang));
a[5] = complexMul(a[5], w);
ang = dir*(5.8904863894e-001) * (j);
w = (float2)(native_cos(ang), native_sin(ang));
a[6] = complexMul(a[6], w);
ang = dir*(6.8722341210e-001) * (j);
w = (float2)(native_cos(ang), native_sin(ang));
a[7] = complexMul(a[7], w);
indexIn = mad24(j, 32, i);
lMemStore = sMem + tid;
lMemLoad = sMem + indexIn;
lMemStore[0] = a[0].x;
lMemStore[32] = a[1].x;
lMemStore[64] = a[2].x;
lMemStore[96] = a[3].x;
lMemStore[128] = a[4].x;
lMemStore[160] = a[5].x;
lMemStore[192] = a[6].x;
lMemStore[224] = a[7].x;
barrier(CLK_LOCAL_MEM_FENCE);
a[0].x = lMemLoad[0];
a[1].x = lMemLoad[4];
a[2].x = lMemLoad[8];
a[3].x = lMemLoad[12];
a[4].x = lMemLoad[16];
a[5].x = lMemLoad[20];
a[6].x = lMemLoad[24];
a[7].x = lMemLoad[28];
barrier(CLK_LOCAL_MEM_FENCE);
lMemStore[0] = a[0].y;
lMemStore[32] = a[1].y;
lMemStore[64] = a[2].y;
lMemStore[96] = a[3].y;
lMemStore[128] = a[4].y;
lMemStore[160] = a[5].y;
lMemStore[192] = a[6].y;
lMemStore[224] = a[7].y;
barrier(CLK_LOCAL_MEM_FENCE);
a[0].y = lMemLoad[0];
a[1].y = lMemLoad[4];
a[2].y = lMemLoad[8];
a[3].y = lMemLoad[12];
a[4].y = lMemLoad[16];
a[5].y = lMemLoad[20];
a[6].y = lMemLoad[24];
a[7].y = lMemLoad[28];
barrier(CLK_LOCAL_MEM_FENCE);
fftKernel8(a + 0, dir);
l = ((bNum << 2) + i) >> 0;
k = j << 0;
ang = dir*(0x1.921fb54442d18p-13) * (l * (k + 0));
w = (float2)(native_cos(ang), native_sin(ang));
a[0] = complexMul(a[0], w);
ang = dir*(0x1.921fb54442d18p-13) * (l * (k + 8));
w = (float2)(native_cos(ang), native_sin(ang));
a[1] = complexMul(a[1], w);
ang = dir*(0x1.921fb54442d18p-13) * (l * (k + 16));
w = (float2)(native_cos(ang), native_sin(ang));
a[2] = complexMul(a[2], w);
ang = dir*(0x1.921fb54442d18p-13) * (l * (k + 24));
w = (float2)(native_cos(ang), native_sin(ang));
a[3] = complexMul(a[3], w);
ang = dir*(0x1.921fb54442d18p-13) * (l * (k + 32));
w = (float2)(native_cos(ang), native_sin(ang));
a[4] = complexMul(a[4], w);
ang = dir*(0x1.921fb54442d18p-13) * (l * (k + 40));
w = (float2)(native_cos(ang), native_sin(ang));
a[5] = complexMul(a[5], w);
ang = dir*(0x1.921fb54442d18p-13) * (l * (k + 48));
w = (float2)(native_cos(ang), native_sin(ang));
a[6] = complexMul(a[6], w);
ang = dir*(0x1.921fb54442d18p-13) * (l * (k + 56));
w = (float2)(native_cos(ang), native_sin(ang));
a[7] = complexMul(a[7], w);
lMemStore = sMem + mad24(i, 65, j << 0);
lMemLoad = sMem + mad24(tid >> 6, 65, tid & 63);
lMemStore[ 0] = a[0].x;
lMemStore[ 8] = a[1].x;
lMemStore[ 16] = a[2].x;
lMemStore[ 24] = a[3].x;
lMemStore[ 32] = a[4].x;
lMemStore[ 40] = a[5].x;
lMemStore[ 48] = a[6].x;
lMemStore[ 56] = a[7].x;
barrier(CLK_LOCAL_MEM_FENCE);
a[0].x = lMemLoad[0];
a[1].x = lMemLoad[32];
a[2].x = lMemLoad[65];
a[3].x = lMemLoad[97];
a[4].x = lMemLoad[130];
a[5].x = lMemLoad[162];
a[6].x = lMemLoad[195];
a[7].x = lMemLoad[227];
barrier(CLK_LOCAL_MEM_FENCE);
lMemStore[ 0] = a[0].y;
lMemStore[ 8] = a[1].y;
lMemStore[ 16] = a[2].y;
lMemStore[ 24] = a[3].y;
lMemStore[ 32] = a[4].y;
lMemStore[ 40] = a[5].y;
lMemStore[ 48] = a[6].y;
lMemStore[ 56] = a[7].y;
barrier(CLK_LOCAL_MEM_FENCE);
a[0].y = lMemLoad[0];
a[1].y = lMemLoad[32];
a[2].y = lMemLoad[65];
a[3].y = lMemLoad[97];
a[4].y = lMemLoad[130];
a[5].y = lMemLoad[162];
a[6].y = lMemLoad[195];
a[7].y = lMemLoad[227];
barrier(CLK_LOCAL_MEM_FENCE);
indexOut += tid;
out += indexOut;
out[0] = a[0];
out[32] = a[1];
out[64] = a[2];
out[96] = a[3];
out[128] = a[4];
out[160] = a[5];
out[192] = a[6];
out[224] = a[7];
}
__kernel void fft1(__global float2 *in, __global float2 *out, int dir, int S, __global float2 * cossinLUT1, __global float2 * cossinLUT2 )
{
    __local float sMem[256];
    int i, j, r, indexIn, indexOut, index, tid, bNum, xNum, k, l;
    int s, ii, jj, offset;
    float2 w;
    float ang, angf, ang1;
    __local float *lMemStore, *lMemLoad;
    float2 a[8];
    int lId = get_local_id( 0 );
    int groupId = get_group_id( 0 );
bNum = groupId & 127;
xNum = groupId >> 7;
indexIn = mul24(bNum, 4);
tid = indexIn;
i = tid >> 6;
j = tid & 63;
indexOut = mad24(i, 4096, j);
indexIn += (xNum << 15);
indexOut += (xNum << 15);
tid = lId;
i = tid & 3;
j = tid >> 2;
indexIn += mad24(j, 512, i);
in += indexIn;
a[0] = in[0];
a[1] = in[4096];
a[2] = in[8192];
a[3] = in[12288];
a[4] = in[16384];
a[5] = in[20480];
a[6] = in[24576];
a[7] = in[28672];
fftKernel8(a, dir);
ang = dir*(0x1.921fb54442d18p-4) * (j);
w = (float2)(native_cos(ang), native_sin(ang));
a[1] = complexMul(a[1], w);
ang = dir*(1.9634954631e-001) * (j);
w = (float2)(native_cos(ang), native_sin(ang));
a[2] = complexMul(a[2], w);
ang = dir*(2.9452431947e-001) * (j);
w = (float2)(native_cos(ang), native_sin(ang));
a[3] = complexMul(a[3], w);
ang = dir*(3.9269909263e-001) * (j);
w = (float2)(native_cos(ang), native_sin(ang));
a[4] = complexMul(a[4], w);
ang = dir*(4.9087386578e-001) * (j);
w = (float2)(native_cos(ang), native_sin(ang));
a[5] = complexMul(a[5], w);
ang = dir*(5.8904863894e-001) * (j);
w = (float2)(native_cos(ang), native_sin(ang));
a[6] = complexMul(a[6], w);
ang = dir*(6.8722341210e-001) * (j);
w = (float2)(native_cos(ang), native_sin(ang));
a[7] = complexMul(a[7], w);
indexIn = mad24(j, 32, i);
lMemStore = sMem + tid;
lMemLoad = sMem + indexIn;
lMemStore[0] = a[0].x;
lMemStore[32] = a[1].x;
lMemStore[64] = a[2].x;
lMemStore[96] = a[3].x;
lMemStore[128] = a[4].x;
lMemStore[160] = a[5].x;
lMemStore[192] = a[6].x;
lMemStore[224] = a[7].x;
barrier(CLK_LOCAL_MEM_FENCE);
a[0].x = lMemLoad[0];
a[1].x = lMemLoad[4];
a[2].x = lMemLoad[8];
a[3].x = lMemLoad[12];
a[4].x = lMemLoad[16];
a[5].x = lMemLoad[20];
a[6].x = lMemLoad[24];
a[7].x = lMemLoad[28];
barrier(CLK_LOCAL_MEM_FENCE);
lMemStore[0] = a[0].y;
lMemStore[32] = a[1].y;
lMemStore[64] = a[2].y;
lMemStore[96] = a[3].y;
lMemStore[128] = a[4].y;
lMemStore[160] = a[5].y;
lMemStore[192] = a[6].y;
lMemStore[224] = a[7].y;
barrier(CLK_LOCAL_MEM_FENCE);
a[0].y = lMemLoad[0];
a[1].y = lMemLoad[4];
a[2].y = lMemLoad[8];
a[3].y = lMemLoad[12];
a[4].y = lMemLoad[16];
a[5].y = lMemLoad[20];
a[6].y = lMemLoad[24];
a[7].y = lMemLoad[28];
barrier(CLK_LOCAL_MEM_FENCE);
fftKernel8(a + 0, dir);
l = ((bNum << 2) + i) >> 6;
k = j << 0;
ang = dir*(0x1.921fb54442d18p-7) * (l * (k + 0));
w = (float2)(native_cos(ang), native_sin(ang));
a[0] = complexMul(a[0], w);
ang = dir*(0x1.921fb54442d18p-7) * (l * (k + 8));
w = (float2)(native_cos(ang), native_sin(ang));
a[1] = complexMul(a[1], w);
ang = dir*(0x1.921fb54442d18p-7) * (l * (k + 16));
w = (float2)(native_cos(ang), native_sin(ang));
a[2] = complexMul(a[2], w);
ang = dir*(0x1.921fb54442d18p-7) * (l * (k + 24));
w = (float2)(native_cos(ang), native_sin(ang));
a[3] = complexMul(a[3], w);
ang = dir*(0x1.921fb54442d18p-7) * (l * (k + 32));
w = (float2)(native_cos(ang), native_sin(ang));
a[4] = complexMul(a[4], w);
ang = dir*(0x1.921fb54442d18p-7) * (l * (k + 40));
w = (float2)(native_cos(ang), native_sin(ang));
a[5] = complexMul(a[5], w);
ang = dir*(0x1.921fb54442d18p-7) * (l * (k + 48));
w = (float2)(native_cos(ang), native_sin(ang));
a[6] = complexMul(a[6], w);
ang = dir*(0x1.921fb54442d18p-7) * (l * (k + 56));
w = (float2)(native_cos(ang), native_sin(ang));
a[7] = complexMul(a[7], w);
indexOut += mad24(j, 64, i);
out += indexOut;
out[0] = a[0];
out[512] = a[1];
out[1024] = a[2];
out[1536] = a[3];
out[2048] = a[4];
out[2560] = a[5];
out[3072] = a[6];
out[3584] = a[7];
}
__kernel void fft2(__global float2 *in, __global float2 *out, int dir, int S, __global float2 * cossinLUT1, __global float2 * cossinLUT2 )
{
    int i, j, r, indexIn, indexOut, index, tid, bNum, xNum, k, l;
    int s, ii, jj, offset;
    float2 w;
    float ang, angf, ang1;
    __local float *lMemStore, *lMemLoad;
    float2 a[8];
    int lId = get_local_id( 0 );
    int groupId = get_group_id( 0 );
bNum = groupId & 127;
xNum = groupId >> 7;
indexIn = mul24(bNum, 32);
tid = indexIn;
i = tid >> 12;
j = tid & 4095;
indexOut = mad24(i, 32768, j);
indexIn += (xNum << 15);
indexOut += (xNum << 15);
tid = lId;
i = tid & 31;
j = tid >> 5;
indexIn += mad24(j, 4096, i);
in += indexIn;
a[0] = in[0];
a[1] = in[4096];
a[2] = in[8192];
a[3] = in[12288];
a[4] = in[16384];
a[5] = in[20480];
a[6] = in[24576];
a[7] = in[28672];
fftKernel8(a, dir);
indexOut += mad24(j, 32768, i);
out += indexOut;
out[0] = a[0];
out[4096] = a[1];
out[8192] = a[2];
out[12288] = a[3];
out[16384] = a[4];
out[20480] = a[5];
out[24576] = a[6];
out[28672] = a[7];
}
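
The launch geometry in the plan dump can be cross-checked against the index arithmetic in the kernels. The sketch below is a hedged illustration in C, not code from the application: it assumes a 32768-point transform (the size in the plan file name, and the `xNum << 15` stride) with 8 complex points per work-item (`float2 a[8]`). The function names are hypothetical.

```c
#include <stddef.h>

/* Assumed from the dump: 32768-point FFT, 8 complex points per work-item. */
enum { FFT_POINTS = 32768, POINTS_PER_ITEM = 8 };

/* Work-items needed for one transform: 32768 / 8 = 4096,
 * which matches "global dim = {4096*BatchSize}" in the plan dump. */
size_t items_per_fft(void)
{
    return FFT_POINTS / POINTS_PER_ITEM;
}

/* Work-groups per transform for a given local size;
 * returns 0 when the local size is zero or does not divide 4096. */
size_t groups_per_fft(size_t local_size)
{
    size_t items = items_per_fft();
    if (local_size == 0 || items % local_size != 0)
        return 0;
    return items / local_size;
}

/* Basic launch sanity check: the local size must be non-zero, must not
 * exceed the device's maximum work-group size, and must divide the
 * global size. */
int launch_is_plausible(size_t batch, size_t local_size, size_t max_wg_size)
{
    size_t global = items_per_fft() * batch;
    return local_size != 0
        && local_size <= max_wg_size
        && global % local_size == 0;
}
```

With local dim = 32, `groups_per_fft(32)` is 128, which agrees with `bNum = groupId & 127` and `xNum = groupId >> 7` in the kernels above, so this source appears to belong to the 4670 plan. The 6770 plan's local dims of 128 and 256 would give 32 and 16 groups per transform instead. This check only covers work-group sizing; it does not by itself explain the -34 status.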

:-(
ID: 51663
TBar
Volunteer tester

Joined: 2 Jul 13
Posts: 505
Credit: 4,612,930
RAC: 23,787
United States
Message 51664 - Posted: 21 Jul 2014, 2:38:45 UTC

I suppose it's getting late over there. But...
I got tired of watching the ATI cards sit there with suspended tasks, so I installed Cat 12.1. I'm not getting any ATI errors, but I'm also not getting any ATI activity. It's been over 10 minutes and SIV still shows no ATI GPU load; GPU-Z says the same. The nVidia task is about a third of the way finished, or else I'd pull the app_info and download some stock apps. Is there a link to the stock Win ATI app so I can just swap out the apps and change the app_info?

No GPU load on the 6770 or 4670...
ID: 51664
Raistmer
Volunteer tester
Joined: 18 Aug 05
Posts: 2423
Credit: 15,878,738
RAC: 0
Russia
Message 51666 - Posted: 21 Jul 2014, 16:19:02 UTC

The stderr logs are too short; please post the next 20-30 lines too.

And it is better to put long files such as the fftPlan dump on pastebin or somewhere similar.
ID: 51666
Raistmer
Volunteer tester
Joined: 18 Aug 05
Posts: 2423
Credit: 15,878,738
RAC: 0
Russia
Message 51667 - Posted: 21 Jul 2014, 16:29:39 UTC
Last modified: 21 Jul 2014, 16:30:04 UTC

Well, here is the log from another failing device:

call 'PopulateTresholdTable_kernel9t_df64_cl' is finished OK in file ..\..\ap_science.cpp near line 593
call 'PopulateFetchOffsets_kernel_df64_cl' is finished OK in file ..\..\ap_science.cpp near line 595
call 'create GPU_fetch_array_kernel_twin_cl' is finished OK in file ..\..\ap_science.cpp near line 611
call 'Creating GPU_coadd_with_stride_kernel_twin_cl' is finished OK in file ..\..\ap_science.cpp near line 621
oclFFT context=1051930
AP main context=1051930

Dumping clFFT Plan to file
call 'clCreateBuffer (gpu_thresholds)' is finished OK in file ..\..\ap_client_main.cpp near line 1719
call 'splitter_bits_to_float_range_kernel' is finished OK in file ..\..\ap_science.cpp near line 2580
Error in ap oclFFT_1: -34
ERROR: OpenCL kernel/call 'clFFT_ExecuteInterleaved_ap' call failed (-34) in file ..\..\ap_science.cpp near line 2680.
Waiting 30 sec before restart...

So there seems to be no problem with the context as such: the main loop and oclFFT really do use exactly the same context.
Need to dig further.
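
For reference, -34 in the standard OpenCL headers is CL_INVALID_CONTEXT, which is why the two context values printed in the log are the first thing to compare. A minimal lookup sketch in C follows. It assumes that `clFFT_ExecuteInterleaved_ap` passes the raw OpenCL status through unchanged; the table is deliberately partial, and the helper name is hypothetical.

```c
#include <stddef.h>

/* Partial table of OpenCL status codes; values as defined in CL/cl.h. */
struct status_name {
    int code;
    const char *name;
};

static const struct status_name status_names[] = {
    {   0, "CL_SUCCESS" },
    {  -4, "CL_MEM_OBJECT_ALLOCATION_FAILURE" },
    {  -5, "CL_OUT_OF_RESOURCES" },
    {  -6, "CL_OUT_OF_HOST_MEMORY" },
    { -11, "CL_BUILD_PROGRAM_FAILURE" },
    { -30, "CL_INVALID_VALUE" },
    { -33, "CL_INVALID_DEVICE" },
    { -34, "CL_INVALID_CONTEXT" },
    { -36, "CL_INVALID_COMMAND_QUEUE" },
    { -38, "CL_INVALID_MEM_OBJECT" },
    { -48, "CL_INVALID_KERNEL" },
    { -52, "CL_INVALID_KERNEL_ARGS" },
    { -54, "CL_INVALID_WORK_GROUP_SIZE" }
};

/* Returns the symbolic name for a status code, or a placeholder string
 * when the code is not in this partial table. */
const char *cl_status_to_name(int code)
{
    size_t i;
    for (i = 0; i < sizeof status_names / sizeof status_names[0]; ++i) {
        if (status_names[i].code == code)
            return status_names[i].name;
    }
    return "unknown (not in this partial table)";
}
```

Here `cl_status_to_name(-34)` gives `CL_INVALID_CONTEXT`. An object such as a buffer or command queue created in a different context than the one used for the call is one way to get this status even when the context handles themselves match, but nothing in the logs so far confirms that this is the cause.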
ID: 51667
TBar
Volunteer tester

Joined: 2 Jul 13
Posts: 505
Credit: 4,612,930
RAC: 23,787
United States
Message 51669 - Posted: 21 Jul 2014, 17:43:56 UTC - in response to Message 51666.  

The stderr logs are too short; please post the next 20-30 lines too.

And it is better to put long files such as the fftPlan dump on pastebin or somewhere similar.

I saved the files before I updated to Catalyst 12.1. Everything seems to be working with Cat 12.1. Strange that one driver version would make such a difference.
Tasks for Computer 72229
More of the stderr.txt from the 6770 with Cat 11.12:

INFO: can't open binary kernel file: C:\Documents and Settings\All Users\Application Data\BOINC/projects/setiweb.ssl.berkeley.edu_beta\AstroPulse_Kernels_r2559.cl_Juniper.bin_V7_TWIN_FFA_CAL141646, continue with recompile...
call 'clGetProgramInfo' is finished OK in file ..\..\..\..\src\GPU_lock.cpp near line 585
INFO: binary kernel file created
call 'clCreateBuffer (ocl_global_buf1)' is finished OK in file ..\..\ap_science.cpp near line 130
call 'clCreateBuffer (ocl_global_buf2)' is finished OK in file ..\..\ap_science.cpp near line 139
call 'clCreateBuffer: gpu_need_blanking' is finished OK in file ..\..\ap_science.cpp near line 459
call 'clCreateBuffer (buf_periods_df64)' is finished OK in file ..\..\ap_science.cpp near line 474
call 'clCreateBuffer (buf_freqs)' is finished OK in file ..\..\ap_science.cpp near line 480
call 'clCreateBuffer (buf_per_int)' is finished OK in file ..\..\ap_science.cpp near line 486
call 'Creating dechirp_range1_kernel' is finished OK in file ..\..\ap_science.cpp near line 516
call 'Creating PC_single_pulse_kernel_FFA_update_reduce0 from program.' is finished OK in file ..\..\ap_science.cpp near line 549
call 'Creating PC_single_pulse_kernel_FFA_update_reduce1_BLANKIT from program.' is finished OK in file ..\..\ap_science.cpp near line 551
call 'GPU_change_array_sizes_kernel_cl' is finished OK in file ..\..\ap_science.cpp near line 576
call 'GPU_change_array_sizes_kernel_cl' is finished OK in file ..\..\ap_science.cpp near line 577
call 'Creating GPU_compare_with_threshold_kernel_twin_cl' is finished OK in file ..\..\ap_science.cpp near line 580
call 'Creating GPU_PC_compare_with_threshold_kernel_twin_cl' is finished OK in file ..\..\ap_science.cpp near line 582
call 'Creating GPU_coadd_kernel_twin_cl' is finished OK in file ..\..\ap_science.cpp near line 583
call 'PopulateTresholdTable_kernel9t_df64_cl' is finished OK in file ..\..\ap_science.cpp near line 593
call 'PopulateFetchOffsets_kernel_df64_cl' is finished OK in file ..\..\ap_science.cpp near line 595
call 'create GPU_fetch_array_kernel_twin_cl' is finished OK in file ..\..\ap_science.cpp near line 611
call 'Creating GPU_coadd_with_stride_kernel_twin_cl' is finished OK in file ..\..\ap_science.cpp near line 621
WARNING: can't open binary kernel file for oclFFT plan: C:\Documents and Settings\All Users\Application Data\BOINC/projects/setiweb.ssl.berkeley.edu_beta\AP_clFFTplan_Juniper_32768_r2559.bin_CAL141646, continue with recompile...
call 'clGetProgramInfo' is finished OK in file ..\..\..\..\src\OpenCL_FFT\fft_setup.cpp near line 723
oclFFT context=27bb8a0
AP main context=27bb8a0
Dumping clFFT Plan to file
call 'clCreateBuffer (gpu_thresholds)' is finished OK in file ..\..\ap_client_main.cpp near line 1719
call 'splitter_bits_to_float_range_kernel' is finished OK in file ..\..\ap_science.cpp near line 2580
Error in ap oclFFT_1: -34
ERROR: OpenCL kernel/call 'clFFT_ExecuteInterleaved_ap' call failed (-34) in file ..\..\ap_science.cpp near line 2680.
Waiting 30 sec before restart...
Running on device number: 0
DATA_CHUNK_UNROLL set to:4
FFA thread block override value:4096
FFA thread fetchblock override value:2048
CPU affinity adjustment enabled
GPUlock enabled. Use -instances_per_device N switch to provide number of instances to run if BOINC is configured to launch few tasks per device.
Maximum single buffer size set to:256MB
Priority of worker thread raised successfully
Priority of process adjusted successfully, high priority class used
OpenCL platform detected: NVIDIA Corporation
OpenCL platform detected: Advanced Micro Devices, Inc.
call 'clGetDeviceIDs' is finished OK in file ..\..\..\..\src\GPU_lock.cpp near line 752
BOINC assigns device 0, slots 0 to 0 (including) will be checked
Used slot is 0;	Info: BOINC provided OpenCL device ID used
call 'clCreateContext' is finished OK in file ..\..\..\..\src\GPU_lock.cpp near line 936
call 'Creating Command Queue. (clCreateCommandQueue)' is finished OK in file ..\..\..\..\src\GPU_lock.cpp near line 961
call 'Creating Command Queue for writing' is finished OK in file ..\..\..\..\src\GPU_lock.cpp near line 966
call 'Quering device abilities' is finished OK in file ..\..\..\..\src\GPU_lock.cpp near line 287
call 'Quering device abilities' is finished OK in file ..\..\..\..\src\GPU_lock.cpp near line 317
Info: CPU affinity mask used: 1
Used GPU device parameters are:
	Number of compute units: 10
	Single buffer allocation size: 128MB
	Total device global memory: 512MB
	max WG size: 256
ID: 51669
TBar
Volunteer tester
Joined: 2 Jul 13
Posts: 505
Credit: 4,612,930
RAC: 23,787
United States
Message 51670 - Posted: 21 Jul 2014, 18:28:35 UTC
Last modified: 21 Jul 2014, 18:31:59 UTC

More from the 4670 with Cat 11.12:

INFO: can't open binary kernel file: C:\Documents and Settings\All Users\Application Data\BOINC/projects/setiweb.ssl.berkeley.edu_beta\AstroPulse_Kernels_r2559.cl_ATIRV730.bin_V7_TWIN_FFA_CAL141646, continue with recompile...
call 'clGetProgramInfo' is finished OK in file ..\..\..\..\src\GPU_lock.cpp near line 585
INFO: binary kernel file created
call 'clCreateBuffer (ocl_global_buf1)' is finished OK in file ..\..\ap_science.cpp near line 130
call 'clCreateBuffer (ocl_global_buf2)' is finished OK in file ..\..\ap_science.cpp near line 139
call 'clCreateBuffer: gpu_need_blanking' is finished OK in file ..\..\ap_science.cpp near line 459
call 'clCreateBuffer (buf_periods_df64)' is finished OK in file ..\..\ap_science.cpp near line 474
call 'clCreateBuffer (buf_freqs)' is finished OK in file ..\..\ap_science.cpp near line 480
call 'clCreateBuffer (buf_per_int)' is finished OK in file ..\..\ap_science.cpp near line 486
call 'Creating dechirp_range1_kernel' is finished OK in file ..\..\ap_science.cpp near line 516
call 'Creating PC_single_pulse_kernel_FFA_update_reduce0 from program.' is finished OK in file ..\..\ap_science.cpp near line 549
call 'Creating PC_single_pulse_kernel_FFA_update_reduce1_BLANKIT from program.' is finished OK in file ..\..\ap_science.cpp near line 551
call 'GPU_change_array_sizes_kernel_cl' is finished OK in file ..\..\ap_science.cpp near line 576
call 'GPU_change_array_sizes_kernel_cl' is finished OK in file ..\..\ap_science.cpp near line 577
call 'Creating GPU_compare_with_threshold_kernel_twin_cl' is finished OK in file ..\..\ap_science.cpp near line 580
call 'Creating GPU_PC_compare_with_threshold_kernel_twin_cl' is finished OK in file ..\..\ap_science.cpp near line 582
call 'Creating GPU_coadd_kernel_twin_cl' is finished OK in file ..\..\ap_science.cpp near line 583
call 'PopulateTresholdTable_kernel9t_df64_cl' is finished OK in file ..\..\ap_science.cpp near line 593
call 'PopulateFetchOffsets_kernel_df64_cl' is finished OK in file ..\..\ap_science.cpp near line 595
call 'create GPU_fetch_array_kernel_twin_cl' is finished OK in file ..\..\ap_science.cpp near line 611
call 'Creating GPU_coadd_with_stride_kernel_twin_cl' is finished OK in file ..\..\ap_science.cpp near line 621
WARNING: can't open binary kernel file for oclFFT plan: C:\Documents and Settings\All Users\Application Data\BOINC/projects/setiweb.ssl.berkeley.edu_beta\AP_clFFTplan_ATIRV730_32768_r2559.bin_CAL141646, continue with recompile...
WARNING: patching required max_kernel_wg_size=32
oclFFT context=27bb8a0
AP main context=27bb8a0
Dumping clFFT Plan to file
call 'clCreateBuffer (gpu_thresholds)' is finished OK in file ..\..\ap_client_main.cpp near line 1719
call 'splitter_bits_to_float_range_kernel' is finished OK in file ..\..\ap_science.cpp near line 2580
Error in ap oclFFT_1: -34
ERROR: OpenCL kernel/call 'clFFT_ExecuteInterleaved_ap' call failed (-34) in file ..\..\ap_science.cpp near line 2680.
Waiting 30 sec before restart...
Running on device number: 1
DATA_CHUNK_UNROLL set to:4
FFA thread block override value:4096
FFA thread fetchblock override value:2048
CPU affinity adjustment enabled
GPUlock enabled. Use -instances_per_device N switch to provide number of instances to run if BOINC is configured to launch few tasks per device.
Maximum single buffer size set to:256MB
Priority of worker thread raised successfully
Priority of process adjusted successfully, high priority class used
OpenCL platform detected: NVIDIA Corporation
OpenCL platform detected: Advanced Micro Devices, Inc.
call 'clGetDeviceIDs' is finished OK in file ..\..\..\..\src\GPU_lock.cpp near line 752
BOINC assigns device 1, slots 1 to 1 (including) will be checked
Used slot is 1;	Info: BOINC provided OpenCL device ID used
call 'clCreateContext' is finished OK in file ..\..\..\..\src\GPU_lock.cpp near line 936
call 'Creating Command Queue. (clCreateCommandQueue)' is finished OK in file ..\..\..\..\src\GPU_lock.cpp near line 961
call 'Creating Command Queue for writing' is finished OK in file ..\..\..\..\src\GPU_lock.cpp near line 966
call 'Quering device abilities' is finished OK in file ..\..\..\..\src\GPU_lock.cpp near line 287
call 'Quering device abilities' is finished OK in file ..\..\..\..\src\GPU_lock.cpp near line 317
Info: CPU affinity mask used: 2
Used GPU device parameters are:
	Number of compute units: 8
	Single buffer allocation size: 128MB
	Total device global memory: 512MB
	max WG size: 128
ID: 51670
Raistmer
Volunteer tester
Joined: 18 Aug 05
Posts: 2423
Credit: 15,878,738
RAC: 0
Russia
Message 51710 - Posted: 25 Jul 2014, 6:20:24 UTC
Last modified: 25 Jul 2014, 6:21:04 UTC

I'm glad that this issue shows up only with a limited number of AMD driver versions.

Please try to run this one: https://www.dropbox.com/s/6g2s6f09y928aqd/AP7_win_x86_SSE2_OpenCL_ATI_r2559_oclFFT_debug_2.7z under a driver that exhibits this issue.

Again, I need the stderr log.
News about SETI opt app releases: https://twitter.com/Raistmer
ID: 51710
TBar
Volunteer tester
Joined: 2 Jul 13
Posts: 505
Credit: 4,612,930
RAC: 23,787
United States
Message 51714 - Posted: 26 Jul 2014, 4:13:30 UTC - in response to Message 51710.  

I'm glad that this issue shows up only with a limited number of AMD driver versions.

Please try to run this one: https://www.dropbox.com/s/6g2s6f09y928aqd/AP7_win_x86_SSE2_OpenCL_ATI_r2559_oclFFT_debug_2.7z under a driver that exhibits this issue.

Again, I need the stderr log.

I can't open the file. I've downloaded it twice. Something wrong with the file?
ID: 51714
Claggy
Volunteer tester
Joined: 29 May 06
Posts: 1037
Credit: 8,440,339
RAC: 33
United Kingdom
Message 51716 - Posted: 26 Jul 2014, 7:52:53 UTC - in response to Message 51714.  

I'm glad that this issue shows up only with a limited number of AMD driver versions.

Please try to run this one: https://www.dropbox.com/s/6g2s6f09y928aqd/AP7_win_x86_SSE2_OpenCL_ATI_r2559_oclFFT_debug_2.7z under a driver that exhibits this issue.

Again, I need the stderr log.

I can't open the file. I've downloaded it twice. Something wrong with the file?

Are you using 7zip to extract it? It extracted OK here.

Claggy
ID: 51716
TBar
Volunteer tester
Joined: 2 Jul 13
Posts: 505
Credit: 4,612,930
RAC: 23,787
United States
Message 51717 - Posted: 26 Jul 2014, 10:36:51 UTC - in response to Message 51716.  

I suppose I forgot you have to click on the link instead of doing a Save As.

So, I installed it while running Cat 12.1 and I get the same thing as before. BOINC Manager says it's running, but SIV and GPU-Z don't show any GPU load. Before I go through changing drivers, is that normal? Is it supposed to show no load with Catalyst 12.1?
ID: 51717
Raistmer
Volunteer tester
Joined: 18 Aug 05
Posts: 2423
Credit: 15,878,738
RAC: 0
Russia
Message 51724 - Posted: 26 Jul 2014, 19:23:48 UTC - in response to Message 51717.  

I suppose I forgot you have to click on the link instead of doing a Save As.

So, I installed it while running Cat 12.1 and I get the same thing as before. BOINC Manager says it's running, but SIV and GPU-Z don't show any GPU load. Before I go through changing drivers, is that normal? Is it supposed to show no load with Catalyst 12.1?


It's a debug build whose single purpose is to generate a log under 11.12.
News about SETI opt app releases: https://twitter.com/Raistmer
ID: 51724
TBar
Volunteer tester
Joined: 2 Jul 13
Posts: 505
Credit: 4,612,930
RAC: 23,787
United States
Message 51746 - Posted: 29 Jul 2014, 20:50:41 UTC - in response to Message 51724.  

Here's the stderr.txt for the 6770 with Cat 11.12:
call 'clGetDeviceIDs' is finished OK in file ..\..\..\..\src\GPU_lock.cpp near line 752
BOINC assigns device 0
Info: BOINC provided OpenCL device ID used
call 'clCreateContext' is finished OK in file ..\..\..\..\src\GPU_lock.cpp near line 936
call 'Creating Command Queue. (clCreateCommandQueue)' is finished OK in file ..\..\..\..\src\GPU_lock.cpp near line 961
call 'Creating Command Queue for writing' is finished OK in file ..\..\..\..\src\GPU_lock.cpp near line 966
call 'Quering device abilities' is finished OK in file ..\..\..\..\src\GPU_lock.cpp near line 287
call 'Quering device abilities' is finished OK in file ..\..\..\..\src\GPU_lock.cpp near line 317
...
INFO: can't open binary kernel file: C:\Documents and Settings\All Users\Application Data\BOINC/projects/setiweb.ssl.berkeley.edu_beta\AstroPulse_Kernels_r2559.cl_Juniper.bin_V7_TWIN_FFA_CAL141646, continue with recompile...
call 'clGetProgramInfo' is finished OK in file ..\..\..\..\src\GPU_lock.cpp near line 585
INFO: binary kernel file created
call 'clCreateBuffer (ocl_global_buf1)' is finished OK in file ..\..\ap_science.cpp near line 130
call 'clCreateBuffer (ocl_global_buf2)' is finished OK in file ..\..\ap_science.cpp near line 139
call 'clCreateBuffer: gpu_need_blanking' is finished OK in file ..\..\ap_science.cpp near line 459
call 'clCreateBuffer (buf_periods_df64)' is finished OK in file ..\..\ap_science.cpp near line 474
call 'clCreateBuffer (buf_freqs)' is finished OK in file ..\..\ap_science.cpp near line 480
call 'clCreateBuffer (buf_per_int)' is finished OK in file ..\..\ap_science.cpp near line 486
call 'Creating dechirp_range1_kernel' is finished OK in file ..\..\ap_science.cpp near line 516
call 'Creating PC_single_pulse_kernel_FFA_update_reduce0 from program.' is finished OK in file ..\..\ap_science.cpp near line 549
call 'Creating PC_single_pulse_kernel_FFA_update_reduce1_BLANKIT from program.' is finished OK in file ..\..\ap_science.cpp near line 551
call 'GPU_change_array_sizes_kernel_cl' is finished OK in file ..\..\ap_science.cpp near line 576
call 'GPU_change_array_sizes_kernel_cl' is finished OK in file ..\..\ap_science.cpp near line 577
call 'Creating GPU_compare_with_threshold_kernel_twin_cl' is finished OK in file ..\..\ap_science.cpp near line 580
call 'Creating GPU_PC_compare_with_threshold_kernel_twin_cl' is finished OK in file ..\..\ap_science.cpp near line 582
call 'Creating GPU_coadd_kernel_twin_cl' is finished OK in file ..\..\ap_science.cpp near line 583
call 'PopulateTresholdTable_kernel9t_df64_cl' is finished OK in file ..\..\ap_science.cpp near line 593
call 'PopulateFetchOffsets_kernel_df64_cl' is finished OK in file ..\..\ap_science.cpp near line 595
call 'create GPU_fetch_array_kernel_twin_cl' is finished OK in file ..\..\ap_science.cpp near line 611
call 'Creating GPU_coadd_with_stride_kernel_twin_cl' is finished OK in file ..\..\ap_science.cpp near line 621
WARNING: can't open binary kernel file for oclFFT plan: C:\Documents and Settings\All Users\Application Data\BOINC/projects/setiweb.ssl.berkeley.edu_beta\AP_clFFTplan_Juniper_32768_r2559.bin_CAL141646, continue with recompile...
call 'clGetProgramInfo' is finished OK in file ..\..\..\..\src\OpenCL_FFT\fft_setup.cpp near line 723
oclFFT context=7641178
AP main context=7641178
Dumping clFFT Plan to file
call 'clCreateBuffer (gpu_thresholds)' is finished OK in file ..\..\ap_client_main.cpp near line 1719
call 'splitter_bits_to_float_range_kernel' is finished OK in file ..\..\ap_science.cpp near line 2580
ERROR: OpenCL kernel/call 'oclFFT_1 params' call failed (-38) in file ..\..\..\..\src\OpenCL_FFT\fft_execute.cpp near line 375.
Waiting 30 sec before restart...
Running on device number: 0
DATA_CHUNK_UNROLL set to:6
FFA thread block override value:2280
FFA thread fetchblock override value:1140
Maximum single buffer size set to:256MB
Priority of worker thread raised successfully
Priority of process adjusted successfully, high priority class used
OpenCL platform detected: NVIDIA Corporation
OpenCL platform detected: Advanced Micro Devices, Inc.
call 'clGetDeviceIDs' is finished OK in file ..\..\..\..\src\GPU_lock.cpp near line 752
BOINC assigns device 0
Info: BOINC provided OpenCL device ID used
call 'clCreateContext' is finished OK in file ..\..\..\..\src\GPU_lock.cpp near line 936
call 'Creating Command Queue. (clCreateCommandQueue)' is finished OK in file ..\..\..\..\src\GPU_lock.cpp near line 961
call 'Creating Command Queue for writing' is finished OK in file ..\..\..\..\src\GPU_lock.cpp near line 966
call 'Quering device abilities' is finished OK in file ..\..\..\..\src\GPU_lock.cpp near line 287
call 'Quering device abilities' is finished OK in file ..\..\..\..\src\GPU_lock.cpp near line 317
...
### Restart at 0.00 percent.
state.fold_buf_size_short=65536; state.fold_buf_size_long=262144
call 'clCreateBuffer (ocl_global_buf1)' is finished OK in file ..\..\ap_science.cpp near line 130
call 'clCreateBuffer (ocl_global_buf2)' is finished OK in file ..\..\ap_science.cpp near line 139
call 'clCreateBuffer: gpu_need_blanking' is finished OK in file ..\..\ap_science.cpp near line 459
call 'clCreateBuffer (buf_periods_df64)' is finished OK in file ..\..\ap_science.cpp near line 474
call 'clCreateBuffer (buf_freqs)' is finished OK in file ..\..\ap_science.cpp near line 480
call 'clCreateBuffer (buf_per_int)' is finished OK in file ..\..\ap_science.cpp near line 486
call 'Creating dechirp_range1_kernel' is finished OK in file ..\..\ap_science.cpp near line 516
call 'Creating PC_single_pulse_kernel_FFA_update_reduce0 from program.' is finished OK in file ..\..\ap_science.cpp near line 549
call 'Creating PC_single_pulse_kernel_FFA_update_reduce1_BLANKIT from program.' is finished OK in file ..\..\ap_science.cpp near line 551
call 'GPU_change_array_sizes_kernel_cl' is finished OK in file ..\..\ap_science.cpp near line 576
call 'GPU_change_array_sizes_kernel_cl' is finished OK in file ..\..\ap_science.cpp near line 577
call 'Creating GPU_compare_with_threshold_kernel_twin_cl' is finished OK in file ..\..\ap_science.cpp near line 580
call 'Creating GPU_PC_compare_with_threshold_kernel_twin_cl' is finished OK in file ..\..\ap_science.cpp near line 582
call 'Creating GPU_coadd_kernel_twin_cl' is finished OK in file ..\..\ap_science.cpp near line 583
call 'PopulateTresholdTable_kernel9t_df64_cl' is finished OK in file ..\..\ap_science.cpp near line 593
call 'PopulateFetchOffsets_kernel_df64_cl' is finished OK in file ..\..\ap_science.cpp near line 595
call 'create GPU_fetch_array_kernel_twin_cl' is finished OK in file ..\..\ap_science.cpp near line 611
call 'Creating GPU_coadd_with_stride_kernel_twin_cl' is finished OK in file ..\..\ap_science.cpp near line 621
oclFFT context=7641178
AP main context=7641178
Dumping clFFT Plan to file
call 'clCreateBuffer (gpu_thresholds)' is finished OK in file ..\..\ap_client_main.cpp near line 1719
call 'splitter_bits_to_float_range_kernel' is finished OK in file ..\..\ap_science.cpp near line 2580
ERROR: OpenCL kernel/call 'oclFFT_1 params' call failed (-38) in file ..\..\..\..\src\OpenCL_FFT\fft_execute.cpp near line 375.
Waiting 30 sec before restart...


The 4670:
call 'clGetDeviceIDs' is finished OK in file ..\..\..\..\src\GPU_lock.cpp near line 752
BOINC assigns device 1
Info: BOINC provided OpenCL device ID used
call 'clCreateContext' is finished OK in file ..\..\..\..\src\GPU_lock.cpp near line 936
call 'Creating Command Queue. (clCreateCommandQueue)' is finished OK in file ..\..\..\..\src\GPU_lock.cpp near line 961
call 'Creating Command Queue for writing' is finished OK in file ..\..\..\..\src\GPU_lock.cpp near line 966
call 'Quering device abilities' is finished OK in file ..\..\..\..\src\GPU_lock.cpp near line 287
call 'Quering device abilities' is finished OK in file ..\..\..\..\src\GPU_lock.cpp near line 317
...
INFO: can't open binary kernel file: C:\Documents and Settings\All Users\Application Data\BOINC/projects/setiweb.ssl.berkeley.edu_beta\AstroPulse_Kernels_r2559.cl_ATIRV730.bin_V7_TWIN_FFA_CAL141646, continue with recompile...
call 'clGetProgramInfo' is finished OK in file ..\..\..\..\src\GPU_lock.cpp near line 585
INFO: binary kernel file created
call 'clCreateBuffer (ocl_global_buf1)' is finished OK in file ..\..\ap_science.cpp near line 130
call 'clCreateBuffer (ocl_global_buf2)' is finished OK in file ..\..\ap_science.cpp near line 139
call 'clCreateBuffer: gpu_need_blanking' is finished OK in file ..\..\ap_science.cpp near line 459
call 'clCreateBuffer (buf_periods_df64)' is finished OK in file ..\..\ap_science.cpp near line 474
call 'clCreateBuffer (buf_freqs)' is finished OK in file ..\..\ap_science.cpp near line 480
call 'clCreateBuffer (buf_per_int)' is finished OK in file ..\..\ap_science.cpp near line 486
call 'Creating dechirp_range1_kernel' is finished OK in file ..\..\ap_science.cpp near line 516
call 'Creating PC_single_pulse_kernel_FFA_update_reduce0 from program.' is finished OK in file ..\..\ap_science.cpp near line 549
call 'Creating PC_single_pulse_kernel_FFA_update_reduce1_BLANKIT from program.' is finished OK in file ..\..\ap_science.cpp near line 551
call 'GPU_change_array_sizes_kernel_cl' is finished OK in file ..\..\ap_science.cpp near line 576
call 'GPU_change_array_sizes_kernel_cl' is finished OK in file ..\..\ap_science.cpp near line 577
call 'Creating GPU_compare_with_threshold_kernel_twin_cl' is finished OK in file ..\..\ap_science.cpp near line 580
call 'Creating GPU_PC_compare_with_threshold_kernel_twin_cl' is finished OK in file ..\..\ap_science.cpp near line 582
call 'Creating GPU_coadd_kernel_twin_cl' is finished OK in file ..\..\ap_science.cpp near line 583
call 'PopulateTresholdTable_kernel9t_df64_cl' is finished OK in file ..\..\ap_science.cpp near line 593
call 'PopulateFetchOffsets_kernel_df64_cl' is finished OK in file ..\..\ap_science.cpp near line 595
call 'create GPU_fetch_array_kernel_twin_cl' is finished OK in file ..\..\ap_science.cpp near line 611
call 'Creating GPU_coadd_with_stride_kernel_twin_cl' is finished OK in file ..\..\ap_science.cpp near line 621
WARNING: can't open binary kernel file for oclFFT plan: C:\Documents and Settings\All Users\Application Data\BOINC/projects/setiweb.ssl.berkeley.edu_beta\AP_clFFTplan_ATIRV730_32768_r2559.bin_CAL141646, continue with recompile...
WARNING: patching required max_kernel_wg_size=32
oclFFT context=7641178
AP main context=7641178
Dumping clFFT Plan to file
call 'clCreateBuffer (gpu_thresholds)' is finished OK in file ..\..\ap_client_main.cpp near line 1719
call 'splitter_bits_to_float_range_kernel' is finished OK in file ..\..\ap_science.cpp near line 2580
ERROR: OpenCL kernel/call 'oclFFT_1 params' call failed (-38) in file ..\..\..\..\src\OpenCL_FFT\fft_execute.cpp near line 375.
Waiting 30 sec before restart...
Running on device number: 1
DATA_CHUNK_UNROLL set to:6
FFA thread block override value:2280
FFA thread fetchblock override value:1140
Maximum single buffer size set to:256MB
Priority of worker thread raised successfully
Priority of process adjusted successfully, high priority class used
OpenCL platform detected: NVIDIA Corporation
OpenCL platform detected: Advanced Micro Devices, Inc.
call 'clGetDeviceIDs' is finished OK in file ..\..\..\..\src\GPU_lock.cpp near line 752
BOINC assigns device 1
Info: BOINC provided OpenCL device ID used
call 'clCreateContext' is finished OK in file ..\..\..\..\src\GPU_lock.cpp near line 936
call 'Creating Command Queue. (clCreateCommandQueue)' is finished OK in file ..\..\..\..\src\GPU_lock.cpp near line 961
call 'Creating Command Queue for writing' is finished OK in file ..\..\..\..\src\GPU_lock.cpp near line 966
call 'Quering device abilities' is finished OK in file ..\..\..\..\src\GPU_lock.cpp near line 287
call 'Quering device abilities' is finished OK in file ..\..\..\..\src\GPU_lock.cpp near line 317
...
### Restart at 0.00 percent.
state.fold_buf_size_short=65536; state.fold_buf_size_long=262144
call 'clCreateBuffer (ocl_global_buf1)' is finished OK in file ..\..\ap_science.cpp near line 130
call 'clCreateBuffer (ocl_global_buf2)' is finished OK in file ..\..\ap_science.cpp near line 139
call 'clCreateBuffer: gpu_need_blanking' is finished OK in file ..\..\ap_science.cpp near line 459
call 'clCreateBuffer (buf_periods_df64)' is finished OK in file ..\..\ap_science.cpp near line 474
call 'clCreateBuffer (buf_freqs)' is finished OK in file ..\..\ap_science.cpp near line 480
call 'clCreateBuffer (buf_per_int)' is finished OK in file ..\..\ap_science.cpp near line 486
call 'Creating dechirp_range1_kernel' is finished OK in file ..\..\ap_science.cpp near line 516
call 'Creating PC_single_pulse_kernel_FFA_update_reduce0 from program.' is finished OK in file ..\..\ap_science.cpp near line 549
call 'Creating PC_single_pulse_kernel_FFA_update_reduce1_BLANKIT from program.' is finished OK in file ..\..\ap_science.cpp near line 551
call 'GPU_change_array_sizes_kernel_cl' is finished OK in file ..\..\ap_science.cpp near line 576
call 'GPU_change_array_sizes_kernel_cl' is finished OK in file ..\..\ap_science.cpp near line 577
call 'Creating GPU_compare_with_threshold_kernel_twin_cl' is finished OK in file ..\..\ap_science.cpp near line 580
call 'Creating GPU_PC_compare_with_threshold_kernel_twin_cl' is finished OK in file ..\..\ap_science.cpp near line 582
call 'Creating GPU_coadd_kernel_twin_cl' is finished OK in file ..\..\ap_science.cpp near line 583
call 'PopulateTresholdTable_kernel9t_df64_cl' is finished OK in file ..\..\ap_science.cpp near line 593
call 'PopulateFetchOffsets_kernel_df64_cl' is finished OK in file ..\..\ap_science.cpp near line 595
call 'create GPU_fetch_array_kernel_twin_cl' is finished OK in file ..\..\ap_science.cpp near line 611
call 'Creating GPU_coadd_with_stride_kernel_twin_cl' is finished OK in file ..\..\ap_science.cpp near line 621
WARNING: can't open binary kernel file for oclFFT plan: C:\Documents and Settings\All Users\Application Data\BOINC/projects/setiweb.ssl.berkeley.edu_beta\AP_clFFTplan_ATIRV730_32768_r2559.bin_CAL141646, continue with recompile...
WARNING: patching required max_kernel_wg_size=32
oclFFT context=7641178
AP main context=7641178
Dumping clFFT Plan to file
call 'clCreateBuffer (gpu_thresholds)' is finished OK in file ..\..\ap_client_main.cpp near line 1719
call 'splitter_bits_to_float_range_kernel' is finished OK in file ..\..\ap_science.cpp near line 2580
ERROR: OpenCL kernel/call 'oclFFT_1 params' call failed (-38) in file ..\..\..\..\src\OpenCL_FFT\fft_execute.cpp near line 375.
Waiting 30 sec before restart...
ID: 51746
Raistmer
Volunteer tester
Joined: 18 Aug 05
Posts: 2423
Credit: 15,878,738
RAC: 0
Russia
Message 51753 - Posted: 30 Jul 2014, 21:53:08 UTC - in response to Message 51746.  

Thanks, that's the line I expected to see:

ERROR: OpenCL kernel/call 'oclFFT_1 params' call failed (-38) in file ..\..\..\..\src\OpenCL_FFT\fft_execute.cpp near line 375.

So it's not a kernel call failure but a parameter setup failure. That allows further debugging. I will post the next attempt soon.
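
For reference, the two codes seen so far decode as follows in the standard Khronos cl.h header: -34 is CL_INVALID_CONTEXT and -38 is CL_INVALID_MEM_OBJECT. A small lookup sketch; the table is only a subset of the header, and the helper name `cl_error_name` is made up for illustration.

```python
# Subset of the OpenCL error codes defined in the Khronos cl.h header.
CL_ERRORS = {
    0: "CL_SUCCESS",
    -4: "CL_MEM_OBJECT_ALLOCATION_FAILURE",
    -5: "CL_OUT_OF_RESOURCES",
    -30: "CL_INVALID_VALUE",
    -34: "CL_INVALID_CONTEXT",
    -36: "CL_INVALID_COMMAND_QUEUE",
    -38: "CL_INVALID_MEM_OBJECT",
    -48: "CL_INVALID_KERNEL",
    -49: "CL_INVALID_ARG_INDEX",
    -50: "CL_INVALID_ARG_VALUE",
    -51: "CL_INVALID_ARG_SIZE",
    -52: "CL_INVALID_KERNEL_ARGS",
    -54: "CL_INVALID_WORK_GROUP_SIZE",
}


def cl_error_name(code):
    """Translate a numeric OpenCL status into its symbolic name."""
    return CL_ERRORS.get(code, "unknown OpenCL error %d" % code)


# The two codes reported in this thread.
for code in (-34, -38):
    print(code, cl_error_name(code))
```

A CL_INVALID_MEM_OBJECT status is what one would expect when a memory-object argument is rejected, which fits the reading that the failure is in parameter setup rather than in the kernel launch itself.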
News about SETI opt app releases: https://twitter.com/Raistmer
ID: 51753
Urs Echternacht
Volunteer tester
Joined: 18 Jan 06
Posts: 1038
Credit: 18,734,730
RAC: 0
Germany
Message 51762 - Posted: 31 Jul 2014, 1:40:43 UTC - in response to Message 51753.  

Thanks, that's the line I expected to see:

ERROR: OpenCL kernel/call 'oclFFT_1 params' call failed (-38) in file ..\..\..\..\src\OpenCL_FFT\fft_execute.cpp near line 375.

So it's not a kernel call failure but a parameter setup failure. That allows further debugging. I will post the next attempt soon.

With TWIN_FFA these should be multiples of 128, I thought!
But good values from the old APv6 might not work anymore:
FFA thread block override value:2280
FFA thread fetchblock override value:1140
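
A quick arithmetic check of these values against 128. The multiple-of-128 requirement itself is only an assumption here, not something confirmed by the logs; the helper name `remainder_report` is made up for illustration, and the 4096/2048 pair is taken from the earlier run in this thread for comparison.

```python
def remainder_report(value, base=128):
    """Return (is_multiple, remainder) for value checked against base."""
    rem = value % base
    return rem == 0, rem


# Override values as printed in the stderr logs in this thread.
overrides = [
    ("FFA thread block override value", 2280),
    ("FFA thread fetchblock override value", 1140),
    ("FFA thread block override value", 4096),
    ("FFA thread fetchblock override value", 2048),
]

for label, value in overrides:
    ok, rem = remainder_report(value)
    status = "multiple of 128" if ok else "not a multiple of 128 (remainder %d)" % rem
    print("%s: %d is %s" % (label, value, status))
```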

_\|/_
U r s
ID: 51762
TBar
Volunteer tester
Joined: 2 Jul 13
Posts: 505
Credit: 4,612,930
RAC: 23,787
United States
Message 51763 - Posted: 31 Jul 2014, 3:06:14 UTC - in response to Message 51762.  
Last modified: 31 Jul 2014, 3:06:36 UTC

....might not work anymore :
FFA thread block override value:2280
FFA thread fetchblock override value:1140

Well, how would it not work? It seems to be working?
http://setiweb.ssl.berkeley.edu/beta/result.php?resultid=17413256
http://setiweb.ssl.berkeley.edu/beta/result.php?resultid=17411937
ID: 51763
Raistmer
Volunteer tester
Joined: 18 Aug 05
Posts: 2423
Credit: 15,878,738
RAC: 0
Russia
Message 51764 - Posted: 31 Jul 2014, 9:20:12 UTC - in response to Message 51762.  
Last modified: 31 Jul 2014, 9:38:12 UTC

[quote]Using TWIN_FFA these should be multiples of 128 i thought![/quote]


I hope not. No additional restrictions were added.

EDIT: also, the failure is in oclFFT. TWIN_FFA is an FFA modification, while the FFT runs in the main loop, so the two should not be connected (that is, even if those settings fail inside FFA, they should not prevent normal FFT work; only -unroll N influences the FFT calls).
News about SETI opt app releases: https://twitter.com/Raistmer
ID: 51764
Raistmer
Volunteer tester
Joined: 18 Aug 05
Posts: 2423
Credit: 15,878,738
RAC: 0
Russia
Message 51766 - Posted: 31 Jul 2014, 10:23:18 UTC

New build to try: https://www.dropbox.com/s/fo7cda1sz7pg9se/AP7_win_x86_SSE2_OpenCL_ATI_r2567_oclFFT_debug_3.7z

Example of valid output:

....
oclFFT context=33074b8
AP main context=33074b8
Dumping clFFT Plan to file
call 'clCreateBuffer (gpu_thresholds)' is finished OK in file ..\..\ap_client_main.cpp near line 1719
call 'splitter_bits_to_float_range_kernel' is finished OK in file ..\..\ap_science.cpp near line 2580
oclFFT1 param0 value (read buf): 3d573b0
call 'oclFFT1, param 0' is finished OK in file ..\..\..\..\src\OpenCL_FFT\fft_execute.cpp near line 383
oclFFT1 param1 value (write buf): 3d572c0
call 'oclFFT1, param 1' is finished OK in file ..\..\..\..\src\OpenCL_FFT\fft_execute.cpp near line 386
oclFFT1 param2 value (direction): -1
call 'oclFFT1, param 2' is finished OK in file ..\..\..\..\src\OpenCL_FFT\fft_execute.cpp near line 389
oclFFT1 param3 value (s,batch size): 2
call 'oclFFT1, param 3' is finished OK in file ..\..\..\..\src\OpenCL_FFT\fft_execute.cpp near line 392
oclFFT1 param4 value (cossin_LUT_d1): 0
call 'oclFFT1, param 4' is finished OK in file ..\..\..\..\src\OpenCL_FFT\fft_execute.cpp near line 395
oclFFT1 param5 value (cossin_LUT_d2): 0
call 'oclFFT1, param 5' is finished OK in file ..\..\..\..\src\OpenCL_FFT\fft_execute.cpp near line 398
call 'oclFFT_1 params' is finished OK in file ..\..\..\..\src\OpenCL_FFT\fft_execute.cpp near line 399
in ap oclFFT_1 ok.
....

And I have a suspicion about what fails with the 11.12 driver.
In the latest versions I updated oclFFT, merging improvements made by Einstein@Home's developers (Oliver and others). They added an additional path (via lookup tables), but currently the non-iGPU AP uses the old native trigonometry (it has enough precision on everything except iGPUs and is the fastest option), hence the LUT buffers are NULL (param4 and param5, shown as 0 in the output above).
Older versions just didn't have those params at all.
So it's possible that 11.12 doesn't accept a NULL pointer as a valid value for a kernel parameter of cl_mem type.

We'll see when the logs are available.
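
To pick the relevant line out of a long stderr quickly, something like the following sketch can be used. It assumes the `call '...' is finished OK` and `ERROR: OpenCL kernel/call '...' call failed (N)` line formats seen in this thread; the helper name `first_failure` is made up, and the sample lines are copied from the 11.12 log posted earlier.

```python
import re

OK_RE = re.compile(
    r"^call '(?P<name>.+)' is finished OK in file (?P<file>.+) near line (?P<line>\d+)"
)
ERR_RE = re.compile(
    r"^ERROR: OpenCL kernel/call '(?P<name>.+)' call failed \((?P<code>-?\d+)\)"
    r" in file (?P<file>.+) near line (?P<line>\d+)"
)


def first_failure(stderr_text):
    """Return details of the first failed call, or None if no call failed."""
    last_ok = None
    for line in stderr_text.splitlines():
        line = line.strip()
        m = OK_RE.match(line)
        if m:
            last_ok = m.group("name")
            continue
        m = ERR_RE.match(line)
        if m:
            return {
                "failed_call": m.group("name"),
                "code": int(m.group("code")),
                "source_line": int(m.group("line")),
                "last_ok_call": last_ok,
            }
    return None


# Two lines taken verbatim from the Cat 11.12 log in this thread.
sample = r"""call 'splitter_bits_to_float_range_kernel' is finished OK in file ..\..\ap_science.cpp near line 2580
ERROR: OpenCL kernel/call 'oclFFT_1 params' call failed (-38) in file ..\..\..\..\src\OpenCL_FFT\fft_execute.cpp near line 375.
"""
print(first_failure(sample))
```

With the per-parameter output of the new build, `last_ok_call` would name the last parameter that was set successfully, so the one after it is the suspect.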
News about SETI opt app releases: https://twitter.com/Raistmer
ID: 51766
TBar
Volunteer tester
Joined: 2 Jul 13
Posts: 505
Credit: 4,612,930
RAC: 23,787
United States
Message 51767 - Posted: 31 Jul 2014, 12:56:45 UTC - in response to Message 51766.  

Still getting the restart.
6770:
call 'clGetDeviceIDs' is finished OK in file ..\..\..\..\src\GPU_lock.cpp near line 752
BOINC assigns device 0
Info: BOINC provided OpenCL device ID used
call 'clCreateContext' is finished OK in file ..\..\..\..\src\GPU_lock.cpp near line 936
call 'Creating Command Queue. (clCreateCommandQueue)' is finished OK in file ..\..\..\..\src\GPU_lock.cpp near line 961
call 'Creating Command Queue for writing' is finished OK in file ..\..\..\..\src\GPU_lock.cpp near line 966
call 'Quering device abilities' is finished OK in file ..\..\..\..\src\GPU_lock.cpp near line 287
call 'Quering device abilities' is finished OK in file ..\..\..\..\src\GPU_lock.cpp near line 317
...
state.fold_buf_size_short=65536; state.fold_buf_size_long=262144
INFO: can't open binary kernel file: C:\Documents and Settings\All Users\Application Data\BOINC/projects/setiweb.ssl.berkeley.edu_beta\AstroPulse_Kernels_r2567.cl_Juniper.bin_V7_TWIN_FFA_CAL141646, continue with recompile...
call 'clGetProgramInfo' is finished OK in file ..\..\..\..\src\GPU_lock.cpp near line 585
INFO: binary kernel file created
call 'clCreateBuffer (ocl_global_buf1)' is finished OK in file ..\..\ap_science.cpp near line 130
call 'clCreateBuffer (ocl_global_buf2)' is finished OK in file ..\..\ap_science.cpp near line 139
call 'clCreateBuffer: gpu_need_blanking' is finished OK in file ..\..\ap_science.cpp near line 459
call 'clCreateBuffer (buf_periods_df64)' is finished OK in file ..\..\ap_science.cpp near line 474
call 'clCreateBuffer (buf_freqs)' is finished OK in file ..\..\ap_science.cpp near line 480
call 'clCreateBuffer (buf_per_int)' is finished OK in file ..\..\ap_science.cpp near line 486
call 'Creating dechirp_range1_kernel' is finished OK in file ..\..\ap_science.cpp near line 516
call 'Creating PC_single_pulse_kernel_FFA_update_reduce0 from program.' is finished OK in file ..\..\ap_science.cpp near line 549
call 'Creating PC_single_pulse_kernel_FFA_update_reduce1_BLANKIT from program.' is finished OK in file ..\..\ap_science.cpp near line 551
call 'GPU_change_array_sizes_kernel_cl' is finished OK in file ..\..\ap_science.cpp near line 576
call 'GPU_change_array_sizes_kernel_cl' is finished OK in file ..\..\ap_science.cpp near line 577
call 'Creating GPU_compare_with_threshold_kernel_twin_cl' is finished OK in file ..\..\ap_science.cpp near line 580
call 'Creating GPU_PC_compare_with_threshold_kernel_twin_cl' is finished OK in file ..\..\ap_science.cpp near line 582
call 'Creating GPU_coadd_kernel_twin_cl' is finished OK in file ..\..\ap_science.cpp near line 583
call 'PopulateTresholdTable_kernel9t_df64_cl' is finished OK in file ..\..\ap_science.cpp near line 593
call 'PopulateFetchOffsets_kernel_df64_cl' is finished OK in file ..\..\ap_science.cpp near line 595
call 'create GPU_fetch_array_kernel_twin_cl' is finished OK in file ..\..\ap_science.cpp near line 611
call 'Creating GPU_coadd_with_stride_kernel_twin_cl' is finished OK in file ..\..\ap_science.cpp near line 621
WARNING: can't open binary kernel file for oclFFT plan: C:\Documents and Settings\All Users\Application Data\BOINC/projects/setiweb.ssl.berkeley.edu_beta\AP_clFFTplan_Juniper_32768_r2567.bin_CAL141646, continue with recompile...
call 'clGetProgramInfo' is finished OK in file ..\..\..\..\src\OpenCL_FFT\fft_setup.cpp near line 723
oclFFT context=27ba158
AP main context=27ba158
Dumping clFFT Plan to file
call 'clCreateBuffer (gpu_thresholds)' is finished OK in file ..\..\ap_client_main.cpp near line 1719
call 'splitter_bits_to_float_range_kernel' is finished OK in file ..\..\ap_science.cpp near line 2580
oclFFT1 param0 value (read buf): 27daba8
call 'oclFFT1, param 0' is finished OK in file ..\..\..\..\src\OpenCL_FFT\fft_execute.cpp near line 383
oclFFT1 param1 value (write buf): 281a118
call 'oclFFT1, param 1' is finished OK in file ..\..\..\..\src\OpenCL_FFT\fft_execute.cpp near line 386
oclFFT1 param2 value (direction): -1
call 'oclFFT1, param 2' is finished OK in file ..\..\..\..\src\OpenCL_FFT\fft_execute.cpp near line 389
oclFFT1 param3 value (s,batch size): 6
call 'oclFFT1, param 3' is finished OK in file ..\..\..\..\src\OpenCL_FFT\fft_execute.cpp near line 392
oclFFT1 param4 value (cossin_LUT_d1): 0
ERROR: OpenCL kernel/call 'oclFFT1, param 4' call failed (-38) in file ..\..\..\..\src\OpenCL_FFT\fft_execute.cpp near line 395.
Waiting 30 sec before restart...

4670;
call 'clGetDeviceIDs' is finished OK in file ..\..\..\..\src\GPU_lock.cpp near line 752
BOINC assigns device 1
Info: BOINC provided OpenCL device ID used
call 'clCreateContext' is finished OK in file ..\..\..\..\src\GPU_lock.cpp near line 936
call 'Creating Command Queue. (clCreateCommandQueue)' is finished OK in file ..\..\..\..\src\GPU_lock.cpp near line 961
call 'Creating Command Queue for writing' is finished OK in file ..\..\..\..\src\GPU_lock.cpp near line 966
call 'Quering device abilities' is finished OK in file ..\..\..\..\src\GPU_lock.cpp near line 287
call 'Quering device abilities' is finished OK in file ..\..\..\..\src\GPU_lock.cpp near line 317
...
state.fold_buf_size_short=65536; state.fold_buf_size_long=262144
INFO: can't open binary kernel file: C:\Documents and Settings\All Users\Application Data\BOINC/projects/setiweb.ssl.berkeley.edu_beta\AstroPulse_Kernels_r2567.cl_ATIRV730.bin_V7_TWIN_FFA_CAL141646, continue with recompile...
call 'clGetProgramInfo' is finished OK in file ..\..\..\..\src\GPU_lock.cpp near line 585
INFO: binary kernel file created
call 'clCreateBuffer (ocl_global_buf1)' is finished OK in file ..\..\ap_science.cpp near line 130
call 'clCreateBuffer (ocl_global_buf2)' is finished OK in file ..\..\ap_science.cpp near line 139
call 'clCreateBuffer: gpu_need_blanking' is finished OK in file ..\..\ap_science.cpp near line 459
call 'clCreateBuffer (buf_periods_df64)' is finished OK in file ..\..\ap_science.cpp near line 474
call 'clCreateBuffer (buf_freqs)' is finished OK in file ..\..\ap_science.cpp near line 480
call 'clCreateBuffer (buf_per_int)' is finished OK in file ..\..\ap_science.cpp near line 486
call 'Creating dechirp_range1_kernel' is finished OK in file ..\..\ap_science.cpp near line 516
call 'Creating PC_single_pulse_kernel_FFA_update_reduce0 from program.' is finished OK in file ..\..\ap_science.cpp near line 549
call 'Creating PC_single_pulse_kernel_FFA_update_reduce1_BLANKIT from program.' is finished OK in file ..\..\ap_science.cpp near line 551
call 'GPU_change_array_sizes_kernel_cl' is finished OK in file ..\..\ap_science.cpp near line 576
call 'GPU_change_array_sizes_kernel_cl' is finished OK in file ..\..\ap_science.cpp near line 577
call 'Creating GPU_compare_with_threshold_kernel_twin_cl' is finished OK in file ..\..\ap_science.cpp near line 580
call 'Creating GPU_PC_compare_with_threshold_kernel_twin_cl' is finished OK in file ..\..\ap_science.cpp near line 582
call 'Creating GPU_coadd_kernel_twin_cl' is finished OK in file ..\..\ap_science.cpp near line 583
call 'PopulateTresholdTable_kernel9t_df64_cl' is finished OK in file ..\..\ap_science.cpp near line 593
call 'PopulateFetchOffsets_kernel_df64_cl' is finished OK in file ..\..\ap_science.cpp near line 595
call 'create GPU_fetch_array_kernel_twin_cl' is finished OK in file ..\..\ap_science.cpp near line 611
call 'Creating GPU_coadd_with_stride_kernel_twin_cl' is finished OK in file ..\..\ap_science.cpp near line 621
WARNING: can't open binary kernel file for oclFFT plan: C:\Documents and Settings\All Users\Application Data\BOINC/projects/setiweb.ssl.berkeley.edu_beta\AP_clFFTplan_ATIRV730_32768_r2567.bin_CAL141646, continue with recompile...
WARNING: patching required max_kernel_wg_size=32
oclFFT context=27ba158
AP main context=27ba158
Dumping clFFT Plan to file
call 'clCreateBuffer (gpu_thresholds)' is finished OK in file ..\..\ap_client_main.cpp near line 1719
call 'splitter_bits_to_float_range_kernel' is finished OK in file ..\..\ap_science.cpp near line 2580
oclFFT1 param0 value (read buf): 744bc50
call 'oclFFT1, param 0' is finished OK in file ..\..\..\..\src\OpenCL_FFT\fft_execute.cpp near line 383
oclFFT1 param1 value (write buf): 27b4f10
call 'oclFFT1, param 1' is finished OK in file ..\..\..\..\src\OpenCL_FFT\fft_execute.cpp near line 386
oclFFT1 param2 value (direction): -1
call 'oclFFT1, param 2' is finished OK in file ..\..\..\..\src\OpenCL_FFT\fft_execute.cpp near line 389
oclFFT1 param3 value (s,batch size): 6
call 'oclFFT1, param 3' is finished OK in file ..\..\..\..\src\OpenCL_FFT\fft_execute.cpp near line 392
oclFFT1 param4 value (cossin_LUT_d1): 0
ERROR: OpenCL kernel/call 'oclFFT1, param 4' call failed (-38) in file ..\..\..\..\src\OpenCL_FFT\fft_execute.cpp near line 395.
Waiting 30 sec before restart...
ID: 51767
Raistmer
Volunteer tester

Joined: 18 Aug 05
Posts: 2423
Credit: 15,878,738
RAC: 0
Russia
Message 51770 - Posted: 31 Jul 2014, 13:35:59 UTC - in response to Message 51767.  

Just as I thought!

11.12 can't accept NULL for a buffer argument.
I would say it's a driver bug, because as far as I can recall the OpenCL specs say nothing that forbids it.
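
For reference, the -38 in the logs above is CL_INVALID_MEM_OBJECT in the standard Khronos cl.h header, and the -34 from the thread title is CL_INVALID_CONTEXT. A small lookup like the following can make stderr easier to read. It is only a sketch: the function name cl_error_name is hypothetical, and the table covers just a handful of codes.

```cpp
#include <string>

// Hypothetical helper: maps a few OpenCL status codes to their names.
// The numeric values are those defined in the Khronos CL/cl.h header.
std::string cl_error_name(int err) {
    switch (err) {
        case 0:   return "CL_SUCCESS";
        case -5:  return "CL_OUT_OF_RESOURCES";
        case -30: return "CL_INVALID_VALUE";
        case -34: return "CL_INVALID_CONTEXT";
        case -38: return "CL_INVALID_MEM_OBJECT";
        case -48: return "CL_INVALID_KERNEL";
        case -49: return "CL_INVALID_ARG_INDEX";
        case -50: return "CL_INVALID_ARG_VALUE";
        case -51: return "CL_INVALID_ARG_SIZE";
        default:  return "unknown OpenCL error";  // code not in this short table
    }
}
```

So for "param 4" the driver is complaining about the memory object itself, which fits a NULL cl_mem being rejected.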

But a workaround is possible. I will code one, stay tuned.
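
One possible shape for such a workaround (an assumption for illustration, not necessarily what the next build will do) is to never hand the driver a NULL cl_mem: when the LUT path is unused, bind a tiny placeholder buffer that the kernel never reads. The helper name arg_or_placeholder is hypothetical, and so are the kernel, context and dummy names in the comment. It is written as a template so the sketch compiles without OpenCL headers.

```cpp
// Hypothetical helper, not from the AstroPulse or oclFFT sources.
// Handle is any pointer-like handle type; in real host code it would be cl_mem.
template <typename Handle>
Handle arg_or_placeholder(Handle wanted, Handle placeholder) {
    // Use the real buffer when it exists, otherwise fall back to the placeholder.
    return wanted != nullptr ? wanted : placeholder;
}

// Assumed usage with the real API, shown as a comment only:
//   cl_mem dummy = clCreateBuffer(context, CL_MEM_READ_ONLY,
//                                 sizeof(cl_float), NULL, &err);
//   cl_mem arg4  = arg_or_placeholder(cossin_LUT_d1, dummy);
//   err = clSetKernelArg(kernel, 4, sizeof(cl_mem), &arg4);
```

The cost is a few bytes of device memory per plan, and drivers that do accept NULL are not affected because the kernel ignores the placeholder on the native-trigonometry path.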
News about SETI opt app releases: https://twitter.com/Raistmer
ID: 51770