I've Built a Couple OSX CUDA Apps...
Author | Message |
---|---|
TBar Send message Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
New CUDA Apps have been posted. Hopefully they will be a little better with the BLC tasks on Fermi and newer GPUs. The CPU Apps are the same as before, with the addition of an SSSE3 App which might work with the AVX CPUs in Darwin 11.4.2 (Lion). Testing on the troublesome AVX CPUs in Lion is needed; if you have one of those laptops, please comment on your results. On my Mac Pro with a GTX 950 the CUDA75 App is slightly faster on the normal Arecibo tasks but about the same as the CUDA42 App on the VLARs. Your mileage will vary. The new Apps are here: http://www.arkayn.us/forum/index.php?topic=191.msg4369#msg4369 |
TBar Send message Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
New ATI/AMD MBv8 App posted. This is a replacement for r3347 and should give better results with the BLC VLAR tasks. It has been tested in Darwin 15.5. Testing is still needed on the D-500 & D-700 Mac Pros in Darwin 15.4 & 15.5 to determine whether all the Gaussians are being reported. It is in the usual location: http://www.arkayn.us/forum/index.php?topic=191.msg4368#msg4368 |
TBar Send message Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
Greetings on the 4th. So, the MBv8r3480 App is working so nicely in OSX I decided to try a Linux version. Except the compile is not working so nicely. How do I turn off the Counters? That would probably be the easiest thing to do. The current problem is; In file included from analyzeFuncs.cpp:70:0: /home/tbar/sah_v7_opt/src/counters.h: In constructor ‘Timings<T>::Timings()’: /home/tbar/sah_v7_opt/src/counters.h:279:17: error: there are no arguments to ‘__rdtsc’ that depend on a template parameter, so a declaration of ‘__rdtsc’ must be available [-fpermissive] start=__rdtsc(); ^ /home/tbar/sah_v7_opt/src/counters.h:279:17: note: (if you use ‘-fpermissive’, G++ will accept your code, but allowing the use of an undeclared name is deprecated) /home/tbar/sah_v7_opt/src/counters.h: In destructor ‘Timings<T>::~Timings()’: /home/tbar/sah_v7_opt/src/counters.h:294:36: error: there are no arguments to ‘__rdtsc’ that depend on a template parameter, so a declaration of ‘__rdtsc’ must be available [-fpermissive] register uint64_t delta=__rdtsc()-start; I'd like to just turn them Off please. I'd like to recompile the OSX version with them turned off as well. BTW, the OSX version changed my ATI 6870 from around 42 minutes on a BLC3 to around 26 minutes. Nice. Hopefully the Linux version will work as well. The Counters...how do you turn them Off? I didn't have this problem in OSX. |
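The GCC error above comes from two-phase name lookup: inside a template, a call such as `__rdtsc()` whose arguments do not depend on `T` must already be declared at the point where the template is defined. A minimal sketch of one way to satisfy that, assuming an x86 target where GCC and Clang declare `__rdtsc` in `<x86intrin.h>`; `read_ticks`, `ScopedTicks` and `time_empty_scope` are hypothetical names, not the code in counters.h:

```cpp
#include <chrono>
#include <cstdint>

#if defined(_MSC_VER) && (defined(_M_X64) || defined(_M_IX86))
#include <intrin.h>      // MSVC declares __rdtsc here
#define HAVE_RDTSC 1
#elif defined(__x86_64__) || defined(__i386__)
#include <x86intrin.h>   // GCC and Clang declare __rdtsc here on x86
#define HAVE_RDTSC 1
#endif

// Hypothetical helper: one non-template place that reads the tick counter.
inline std::uint64_t read_ticks() {
#ifdef HAVE_RDTSC
    return __rdtsc();
#else
    // Fallback for non-x86 targets: steady-clock ticks instead of TSC cycles.
    return static_cast<std::uint64_t>(
        std::chrono::steady_clock::now().time_since_epoch().count());
#endif
}

// Hypothetical stand-in for a Timings<T>-style scope timer.
template <typename T>
class ScopedTicks {
public:
    explicit ScopedTicks(T& total) : total_(total), start_(read_ticks()) {}
    ~ScopedTicks() { total_ += static_cast<T>(read_ticks() - start_); }

private:
    T& total_;             // accumulator owned by the caller
    std::uint64_t start_;  // tick count when the scope was entered
};

// Times an empty scope and returns the elapsed ticks.
inline std::uint64_t time_empty_scope() {
    std::uint64_t ticks = 0;
    { ScopedTicks<std::uint64_t> timer(ticks); }
    return ticks;
}
```

Because the declaration is visible before the template is parsed, `-fpermissive` is not needed with this arrangement.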
Raistmer Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
Wait a little. I'll soon commit even better code, and then it will be worth rebuilding. r3480 actually has a bug in the new adaptation code, so it works better only on a subset of tasks. SETI apps news We're not gonna fight them. We're gonna transcend them. |
Chris Adamek Send message Joined: 15 May 99 Posts: 251 Credit: 434,772,072 RAC: 236 |
Oops, already installed. I'll roll back. BTW Raistmer, just curious what the HighPerformanceGPU path looks for. I noticed Tom's is listed as "yes" with 14 CUs but my D700s are listed as "no" with 32 CUs. Maybe that is the bug you are referring to. Thanks, Chris |
Raistmer Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
Quoting Chris Adamek: "Oops, already installed. I'll roll back. BTW Raistmer, just curious what the HighPerformanceGPU path looks for. I noticed Tom's is listed as 'yes' with 14 CUs but my D700s are listed as 'no' with 32 CUs. Maybe that is the bug you are referring to." Currently it is enabled only manually, via a switch; see the ReadMe. BTW, the bug doesn't affect the correctness of results, so this build can be used, especially if it is indeed faster on a particular host. |
TBar Send message Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
The ATI build seems to be working very well, but there are problems with the nVidia build. The BLC3 tasks ran very slowly with the nVidia version, and then also with the ATI version run on an nVidia card when the nVidia build failed. On the NV card it was not only far too slow, but CPU use would climb to 110% within a couple of minutes and the idle wake-ups were above 20k. The idle wake-ups on the ATI card are around 500. It looks like r3482 has appeared in the repository... |
Raistmer Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
Yep, I'm just completing the rebuild of the Windows binaries for it. r3482 hardly changes anything regarding the counters, though. Their usage is governed by the USE_COUNTERS define. If undefining it doesn't help, please report again. |
TBar Send message Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
Would I just comment out; #if !( /*__linux__ || __APPLE__ ||*/ __FreeBSD__ || __MINGW32__ || !(defined(USE_OPENCL) || defined(USE_CUDA) || defined(USE_BROOK)) ) #define USE_COUNTERS 1 #endif in sah_v7_opt/src/GPU_lock.h? It took a while to find it in sah_v7_opt/src. |
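Since counter usage is governed by the `USE_COUNTERS` define, the usual pattern is that every counter call compiles to nothing when the macro is absent. A small sketch of that pattern; `EventCounter` and `counters_enabled` are hypothetical names, and the real counters.h may be organized differently:

```cpp
#include <cstdint>

// Uncomment (or build with -DUSE_COUNTERS=1) to compile the counters in.
// #define USE_COUNTERS 1

#ifdef USE_COUNTERS
// Real counter: keeps a running total of events.
struct EventCounter {
    std::uint64_t n = 0;
    void hit() { ++n; }
    std::uint64_t count() const { return n; }
};
#else
// No-op stub with the same interface, so call sites need no #ifdefs.
struct EventCounter {
    void hit() {}
    std::uint64_t count() const { return 0; }
};
#endif

// Reports at compile time which variant was built.
constexpr bool counters_enabled() {
#ifdef USE_COUNTERS
    return true;
#else
    return false;
#endif
}
```

A `-UUSE_COUNTERS` flag on the compiler command line cannot switch the counters off when a header defines the macro itself, because the header's `#define` is processed later. That is why the block in GPU_lock.h has to be edited or commented out.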
TBar Send message Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
Well, commenting out the Counter lines as above gets me to the next error in r3480; ../../src/GPU_lock.cpp: In function ‘void DumpKernelExecTime_PulseFind(KERNEL_TUNE, PulseFind_tune&)’: ../../src/GPU_lock.cpp:510:74: error: ‘floor’ was not declared in this scope else if(tune.N>4)tune.sleep=15*(size_t)floor((tune.sliding_mean_ms+1)/15);//R: rounded down kernel execution time in ms ^ make[2]: *** [seti_boinc-GPU_lock.o] Error 1 ... Strange I didn't have these problems in OSX. I suppose it's time to download r3482. Looks as though there is new CUDA code waiting as well... |
Urs Echternacht Send message Joined: 15 May 99 Posts: 692 Credit: 135,197,781 RAC: 211 |
Quoting TBar: "Well, commenting out the Counter lines as above gets me to the next error in r3480;" Try adding the header that declares the missing function: put #include <cmath> on a new line before void DumpKernelExecTime_PulseFind(... and retry. _\|/_ U r s |
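`floor` is declared in `<cmath>`, and evidently no other header pulls it in on this Linux toolchain, which is why the include is the suggested fix. A minimal sketch of the quoted rounding expression as a standalone function; `sleep_quantum_ms` is a hypothetical name, and the input is assumed to be non-negative:

```cpp
#include <cassert>
#include <cmath>    // declares std::floor
#include <cstddef>

// Rounds (sliding mean + 1 ms) down to a whole multiple of 15 ms,
// mirroring the expression quoted from GPU_lock.cpp.
inline std::size_t sleep_quantum_ms(double sliding_mean_ms) {
    assert(sliding_mean_ms >= 0.0);  // negative times are not expected here
    return 15 * static_cast<std::size_t>(std::floor((sliding_mean_ms + 1) / 15));
}
```

For example, a sliding mean of 15.6 ms yields 15 and a sliding mean of 10.93 ms yields 0.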
TBar Send message Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
I added it here in r3482, I still had to comment out the Counters with r3482; #if __linux__ || __APPLE__ That got me to the next errors; ../../src/CLInfo.cpp: In function ‘void CLInfo()’: ../../src/CLInfo.cpp:481:36: error: ‘CL_DEVICE_THREAD_TRACE_SUPPORTED_AMD’ was not declared in this scope << ((*i).getInfo<CL_DEVICE_THREAD_TRACE_SUPPORTED_AMD>() ? "Yes" : "No") ^ ../../src/CLInfo.cpp:481:74: error: no matching function for call to ‘cl::Device::getInfo()’ << ((*i).getInfo<CL_DEVICE_THREAD_TRACE_SUPPORTED_AMD>() ? "Yes" : "No") ^ ../../src/CLInfo.cpp:481:74: note: candidates are: In file included from ../../src/CLInfo.cpp:118:0: ../../src/cl_cutted.hpp:1348:12: note: template<class T> cl_int cl::Device::getInfo(cl_device_info, T*) const cl_int getInfo(cl_device_info name, T* param) const ^ ../../src/cl_cutted.hpp:1348:12: note: template argument deduction/substitution failed: ../../src/CLInfo.cpp:481:74: error: template argument 1 is invalid << ((*i).getInfo<CL_DEVICE_THREAD_TRACE_SUPPORTED_AMD>() ? "Yes" : "No") ^ In file included from ../../src/CLInfo.cpp:118:0: ../../src/cl_cutted.hpp:1357:5: note: template<int name> typename cl::detail::param_traits<cl::detail::cl_device_info, name>::param_type cl::Device::getInfo(cl_int*) const getInfo(cl_int* err = NULL) const ^ ../../src/cl_cutted.hpp:1357:5: note: template argument deduction/substitution failed: ../../src/CLInfo.cpp:481:74: error: template argument 1 is invalid << ((*i).getInfo<CL_DEVICE_THREAD_TRACE_SUPPORTED_AMD>() ? "Yes" : "No") ^ make[2]: *** [seti_boinc-CLInfo.o] Error 1 :-( I was able to compile MBv8_8.08r3482_ati5_SoG_x86_64-apple-darwin in Darwin 15.5. As far as I can tell, it works exactly like r3480 on my ATI 6870. I also compiled a new MBv8_8.08r3483_NV_SoG_x86_64-apple-darwin from r3482. The 'Shorty' task I let run to completion had an AR of 1.282434 and should have finished in about 6 minutes on the stock CUDA App...it took 44 minutes running the OpenCL App on a GTX 950. 
It had a few 'new' numbers in case they might help; Time cpu in use since last restart: 2679.5 seconds Fftlength=8,pass=3:Tune: sum=1182.06(ms); min=5.145(ms); max=30.43(ms); mean=14.24(ms); s_mean=13.16; sleep=15(ms); delta=51; N=83; usual Fftlength=8,pass=4:Tune: sum=1182.06(ms); min=5.145(ms); max=30.43(ms); mean=14.24(ms); s_mean=13.16; sleep=15(ms); delta=51; N=83; usual Fftlength=8,pass=5:Tune: sum=1182.06(ms); min=5.145(ms); max=30.43(ms); mean=14.24(ms); s_mean=13.16; sleep=15(ms); delta=51; N=83; usual Fftlength=16,pass=3:Tune: sum=825.005(ms); min=0.7342(ms); max=21.58(ms); mean=9.593(ms); s_mean=12.35; sleep=15(ms); delta=77; N=86; usual Fftlength=16,pass=4:Tune: sum=825.005(ms); min=0.7342(ms); max=21.58(ms); mean=9.593(ms); s_mean=12.35; sleep=15(ms); delta=77; N=86; usual Fftlength=16,pass=5:Tune: sum=825.005(ms); min=0.7342(ms); max=21.58(ms); mean=9.593(ms); s_mean=12.35; sleep=15(ms); delta=77; N=86; usual Fftlength=32,pass=3:Tune: sum=612.242(ms); min=0.2632(ms); max=46.09(ms); mean=7.289(ms); s_mean=10.93; sleep=0(ms); delta=78; N=84; usual Fftlength=32,pass=4:Tune: sum=612.242(ms); min=0.2632(ms); max=46.09(ms); mean=7.289(ms); s_mean=10.93; sleep=0(ms); delta=78; N=84; usual Fftlength=32,pass=5:Tune: sum=612.242(ms); min=0.2632(ms); max=46.09(ms); mean=7.289(ms); s_mean=10.93; sleep=0(ms); delta=78; N=84; usual Fftlength=64,pass=3:Tune: sum=542.066(ms); min=14.67(ms); max=59.64(ms); mean=16.43(ms); s_mean=15.6; sleep=15(ms); delta=1; N=33; usual Fftlength=64,pass=4:Tune: sum=542.066(ms); min=14.67(ms); max=59.64(ms); mean=16.43(ms); s_mean=15.6; sleep=15(ms); delta=1; N=33; usual Fftlength=64,pass=5:Tune: sum=542.066(ms); min=14.67(ms); max=59.64(ms); mean=16.43(ms); s_mean=15.6; sleep=15(ms); delta=1; N=33; usual Fftlength=128,pass=3:Tune: sum=731.316(ms); min=8.496(ms); max=36.08(ms); mean=11.25(ms); s_mean=11.72; sleep=0(ms); delta=1; N=65; usual Fftlength=128,pass=4:Tune: sum=731.316(ms); min=8.496(ms); max=36.08(ms); mean=11.25(ms); 
s_mean=11.72; sleep=0(ms); delta=1; N=65; usual Fftlength=128,pass=5:Tune: sum=731.316(ms); min=8.496(ms); max=36.08(ms); mean=11.25(ms); s_mean=11.72; sleep=0(ms); delta=1; N=65; usual Fftlength=256,pass=3:Tune: sum=1176.36(ms); min=6.818(ms); max=13.92(ms); mean=8.98(ms); s_mean=8.396; sleep=0(ms); delta=1; N=131; usual Fftlength=512,pass=3:Tune: sum=600.66(ms); min=1.693(ms); max=3.697(ms); mean=2.284(ms); s_mean=2.202; sleep=0(ms); delta=1; N=263; usual Fftlength=1024,pass=3:Tune: sum=486.072(ms); min=0.7004(ms); max=1.502(ms); mean=0.9223(ms); s_mean=0.9211; sleep=0(ms); delta=1; N=527; usual |
Urs Echternacht Send message Joined: 15 May 99 Posts: 692 Credit: 135,197,781 RAC: 211 |
... That macro (CL_DEVICE_THREAD_TRACE_SUPPORTED_AMD) should be defined in cl_ext.h from AMD's APP SDK. Check that it is present in the copy of the header that actually gets included. |
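One way to check which `cl_ext.h` the compiler actually picks up is to test for the macro at compile time. A hedged sketch, assuming a compiler that supports `__has_include` (recent GCC and Clang do); `have_thread_trace_macro` and `thread_trace_label` are hypothetical helpers, not part of CLInfo.cpp:

```cpp
#include <string>

// Pull in whichever OpenCL extension header is on the include path, if any.
#if defined(__has_include)
#  if __has_include(<CL/cl_ext.h>)
#    include <CL/cl_ext.h>
#  elif __has_include(<OpenCL/cl_ext.h>)
#    include <OpenCL/cl_ext.h>
#  endif
#endif

// True only if the included header defines the AMD thread-trace query.
constexpr bool have_thread_trace_macro() {
#ifdef CL_DEVICE_THREAD_TRACE_SUPPORTED_AMD
    return true;
#else
    return false;
#endif
}

// Label for an info dump; falls back when the query cannot be compiled.
inline std::string thread_trace_label(bool supported) {
    if (!have_thread_trace_macro()) return "n/a (macro not in headers)";
    return supported ? "Yes" : "No";
}
```

In the real source the `getInfo<CL_DEVICE_THREAD_TRACE_SUPPORTED_AMD>()` call itself would have to sit inside the same `#ifdef` for the build to get past headers that lack the macro.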
Raistmer Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
Quoting TBar: "The 'Shorty' task I let run to completion had an AR of 1.282434 and should have finished in about 6 minutes on the stock CUDA App...it took 44 minutes running the OpenCL App on a GTX 950. It had a few 'new' numbers in case they might help;" Judging from those counters, the slowdown is not in the GPU part of PulseFind. It's a pity you can't provide a build with the common counters; they contain a lot more information about what could cause such a slowdown. |
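That reading can be checked with simple arithmetic on the posted Tune lines. A rough sketch, assuming the `sum` fields are accumulated PulseFind kernel time in milliseconds and that the identical pass=3/4/5 lines for one FFT length report a single shared counter (both are assumptions about the log, not confirmed in the thread); the numbers are copied from TBar's output and the names are hypothetical:

```cpp
#include <array>
#include <cassert>

struct TuneRow {
    int fft_len;    // "Fftlength=" value from the log line
    double sum_ms;  // "sum=" field, in milliseconds
    int n;          // "N=" field, number of samples
};

// One row per FFT length, copied from the posted log.
constexpr std::array<TuneRow, 8> kTuneRows = {{
    {8, 1182.06, 83},
    {16, 825.005, 86},
    {32, 612.242, 84},
    {64, 542.066, 33},
    {128, 731.316, 65},
    {256, 1176.36, 131},
    {512, 600.66, 263},
    {1024, 486.072, 527},
}};

// Total time covered by the Tune counters, in seconds.
inline double total_tune_seconds() {
    double total_ms = 0.0;
    for (const TuneRow& row : kTuneRows) total_ms += row.sum_ms;
    return total_ms / 1000.0;
}

// Fraction of the reported CPU time that the Tune counters account for.
inline double tune_share_of(double cpu_seconds) {
    assert(cpu_seconds > 0.0);
    return total_tune_seconds() / cpu_seconds;
}
```

Against the 2679.5 seconds of CPU time in the same log, the timed kernels add up to roughly 6.2 seconds, about 0.2% of the run. Counting the pass=4 and pass=5 lines separately would raise that to about 13.9 seconds, still well under 1%.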
TBar Send message Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
The stderr.txt does have the standard counters. The last version I had built, MBv8_8.08r3479_NV_SoG_x86_64-apple-darwin ran a BLC3 for 30 minutes and was about ~30% complete when I stopped it. I let this one run a 'Shorty' hoping it would finish quicker. I almost stopped this one too. 12:44:27 (70846): Can't open init data file - running in standalone mode 12:44:27 (70846): Can't open init data file - running in standalone mode Not using mb_cmdline.txt-file, using commandline options. Running on device number: 0 12:44:27 (70846): Can't open init data file - running in standalone mode WARNING: init_data.xml missing OpenCL platform detected: Apple WARNING: BOINC supplied wrong platform! Number of OpenCL devices found : 3 BOINC assigns slot on device #1 of 3 devices. WARNING: BOINC failed to provide OpenCL device, using own enumeration abilities DOUBLE_FP supported. cl_khr_fp64 supported. cl_APPLE_fp64_basic_ops supported. FERMI : true Build features: SETI8 Non-graphics OpenCL USE_OPENCL_NV OCL_ZERO_COPY SIGNALS_ON_GPU OCL_CHIRP3 FFTW SSSE3 64bit System: Darwin x86_64 Kernel: 15.5.0 CPU : Intel(R) Xeon(R) CPU E5472 @ 3.00GHz GenuineIntel x86, Family 6 Model 23 Stepping 6 Features : FPU TSC PAE APIC MTRR MMX SSE SSE2 HT SSE3 SSSE3 SSE4.1 OpenCL-kernels filename : MultiBeam_Kernels_r3483.cl INFO: can't open binary kernel file: .//MultiBeam_Kernels_r3483.cl_GeForceGTX950.bin_V7_SoG_15.5.0_1011103460310f0, continue with recompile... Info : Building Program (binary, clBuildProgram):main kernels: OK code 0 INFO: binary kernel file created WARNING: can't open binary kernel file for oclFFT plan: .//MB_clFFTplan_GeForceGTX950_524288_gr64_lr16_wg256_tw0_r3483.bin_15.5.0_1011103460310f0, continue with recompile... WARNING: can't open binary kernel file for oclFFT plan: .//MB_clFFTplan_GeForceGTX950_8_gr64_lr16_wg256_tw0_r3483.bin_15.5.0_1011103460310f0, continue with recompile... 
WARNING: can't open binary kernel file for oclFFT plan: .//MB_clFFTplan_GeForceGTX950_16_gr64_lr16_wg256_tw0_r3483.bin_15.5.0_1011103460310f0, continue with recompile... WARNING: can't open binary kernel file for oclFFT plan: .//MB_clFFTplan_GeForceGTX950_32_gr64_lr16_wg256_tw0_r3483.bin_15.5.0_1011103460310f0, continue with recompile... WARNING: can't open binary kernel file for oclFFT plan: .//MB_clFFTplan_GeForceGTX950_64_gr64_lr16_wg256_tw0_r3483.bin_15.5.0_1011103460310f0, continue with recompile... WARNING: can't open binary kernel file for oclFFT plan: .//MB_clFFTplan_GeForceGTX950_128_gr64_lr16_wg256_tw0_r3483.bin_15.5.0_1011103460310f0, continue with recompile... WARNING: can't open binary kernel file for oclFFT plan: .//MB_clFFTplan_GeForceGTX950_256_gr64_lr16_wg256_tw0_r3483.bin_15.5.0_1011103460310f0, continue with recompile... WARNING: can't open binary kernel file for oclFFT plan: .//MB_clFFTplan_GeForceGTX950_512_gr64_lr16_wg256_tw0_r3483.bin_15.5.0_1011103460310f0, continue with recompile... WARNING: can't open binary kernel file for oclFFT plan: .//MB_clFFTplan_GeForceGTX950_1024_gr64_lr16_wg256_tw0_r3483.bin_15.5.0_1011103460310f0, continue with recompile... WARNING: can't open binary kernel file for oclFFT plan: .//MB_clFFTplan_GeForceGTX950_2048_gr64_lr16_wg256_tw0_r3483.bin_15.5.0_1011103460310f0, continue with recompile... WARNING: can't open binary kernel file for oclFFT plan: .//MB_clFFTplan_GeForceGTX950_4096_gr64_lr16_wg256_tw0_r3483.bin_15.5.0_1011103460310f0, continue with recompile... WARNING: can't open binary kernel file for oclFFT plan: .//MB_clFFTplan_GeForceGTX950_8192_gr64_lr16_wg256_tw0_r3483.bin_15.5.0_1011103460310f0, continue with recompile... WARNING: can't open binary kernel file for oclFFT plan: .//MB_clFFTplan_GeForceGTX950_16384_gr64_lr16_wg256_tw0_r3483.bin_15.5.0_1011103460310f0, continue with recompile... 
WARNING: can't open binary kernel file for oclFFT plan: .//MB_clFFTplan_GeForceGTX950_32768_gr64_lr16_wg256_tw0_r3483.bin_15.5.0_1011103460310f0, continue with recompile... WARNING: can't open binary kernel file for oclFFT plan: .//MB_clFFTplan_GeForceGTX950_65536_gr64_lr16_wg256_tw0_r3483.bin_15.5.0_1011103460310f0, continue with recompile... WARNING: can't open binary kernel file for oclFFT plan: .//MB_clFFTplan_GeForceGTX950_131072_gr64_lr16_wg256_tw0_r3483.bin_15.5.0_1011103460310f0, continue with recompile... ar=1.282434 NumCfft=101213 NumGauss=0 NumPulse=56259762380 NumTriplet=56259762380 Currently allocated 201 MB for GPU buffers In v_BaseLineSmooth: NumDataPoints=1048576, BoxCarLength=8192, NumPointsInChunk=32768 OS X optimized setiathome_v8 application Version info: SSSE3x (Intel, Core 2-optimized v8-nographics) V5.13 by Alex Kan SSSE3x OS X 64bit Build 3483 , Ported by : Raistmer, JDWhale, Urs Echternacht OpenCL version by Raistmer, r3483 Number of OpenCL platforms: 1 OpenCL Platform Name: Apple Number of devices: 3 Max compute units: 6 Max work group size: 1024 Max clock frequency: 1316Mhz Max memory allocation: 536870912 Cache type: None Cache line size: 0 Cache size: 0 Global memory size: 2147483648 Constant buffer size: 65536 Max number of constant args: 9 Local memory type: Scratchpad Local memory size: 49152 Queue properties: Out-of-Order: No Name: GeForce GTX 950 Vendor: NVIDIA Driver version: 10.11.10 346.03.10f02 Version: OpenCL 1.2 Extensions: cl_APPLE_SetMemObjectDestructor cl_APPLE_ContextLoggingFunctions cl_APPLE_clut cl_APPLE_query_kernel_names cl_APPLE_gl_sharing cl_khr_gl_event cl_khr_byte_addressable_store cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_APPLE_fp64_basic_ops cl_khr_fp64 cl_khr_3d_image_writes cl_khr_depth_images cl_khr_gl_depth_images cl_khr_gl_msaa_sharing cl_khr_image2d_from_buffer cl_APPLE_ycbcr_422 cl_APPLE_rgb_422 Max compute 
units: 14 Max work group size: 256 Max clock frequency: 900Mhz Max memory allocation: 268435456 Cache type: None Cache line size: 0 Cache size: 0 Global memory size: 1073741824 Constant buffer size: 65536 Max number of constant args: 8 Local memory type: Scratchpad Local memory size: 32768 Queue properties: Out-of-Order: No Name: ATI Radeon Barts XT Prototype Vendor: AMD Driver version: 1.2 (Apr 26 2016 00:27:34) Version: OpenCL 1.2 Extensions: cl_APPLE_SetMemObjectDestructor cl_APPLE_ContextLoggingFunctions cl_APPLE_clut cl_APPLE_query_kernel_names cl_APPLE_gl_sharing cl_khr_gl_event cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_image2d_from_buffer cl_khr_depth_images Max compute units: 6 Max work group size: 1024 Max clock frequency: 1316Mhz Max memory allocation: 536870912 Cache type: None Cache line size: 0 Cache size: 0 Global memory size: 2147483648 Constant buffer size: 65536 Max number of constant args: 9 Local memory type: Scratchpad Local memory size: 49152 Queue properties: Out-of-Order: No Name: GeForce GTX 950 Vendor: NVIDIA Driver version: 10.11.10 346.03.10f02 Version: OpenCL 1.2 Extensions: cl_APPLE_SetMemObjectDestructor cl_APPLE_ContextLoggingFunctions cl_APPLE_clut cl_APPLE_query_kernel_names cl_APPLE_gl_sharing cl_khr_gl_event cl_khr_byte_addressable_store cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_APPLE_fp64_basic_ops cl_khr_fp64 cl_khr_3d_image_writes cl_khr_depth_images cl_khr_gl_depth_images cl_khr_gl_msaa_sharing cl_khr_image2d_from_buffer cl_APPLE_ycbcr_422 cl_APPLE_rgb_422 Work Unit Info: ............... 
Credit multiplier is : 2.85 WU true angle range is : 1.282434 Used GPU device parameters are: Number of compute units: 6 Single buffer allocation size: 128MB Total device global memory: 2048MB max WG size: 1024 local mem type: Real FERMI path used: yes LotOfMem path: yes LowPerformanceGPU path: no HighPerformanceGPU path: no period_iterations_num=50 Spike: peak=24.49179, time=87.24, d_freq=1420292123.35, chirp=-0.70521, fft_len=128k Autocorr: peak=17.95337, time=100.7, delay=5.4133, d_freq=1420291083.78, chirp=-18.726, fft_len=128k Spike: peak=24.40106, time=6.711, d_freq=1420290288.28, chirp=24.561, fft_len=128k Spike: peak=24.02709, time=6.711, d_freq=1420290288.28, chirp=24.562, fft_len=128k Spike: peak=24.90492, time=100.7, d_freq=1420290714.81, chirp=-27.495, fft_len=128k Spike: peak=25.7143, time=100.7, d_freq=1420290714.81, chirp=-27.499, fft_len=128k Spike: peak=24.64154, time=100.7, d_freq=1420290714.81, chirp=-27.502, fft_len=128k Triplet: peak=9.651677, time=33.19, period=0.675, d_freq=1420297365.98, chirp=-69.805, fft_len=128 Best spike: peak=25.7143, time=100.7, d_freq=1420290714.81, chirp=-27.499, fft_len=128k Best autocorr: peak=17.95337, time=100.7, delay=5.4133, d_freq=1420291083.78, chirp=-18.726, fft_len=128k Best gaussian: peak=0, mean=0, ChiSq=0, time=-2.122e+11, d_freq=0, score=-12, null_hyp=0, chirp=0, fft_len=0 Best pulse: peak=0.5118575, time=14.66, period=0.0236, d_freq=1420296732.21, score=0.9063, chirp=48.56, fft_len=16 Best triplet: peak=9.651677, time=33.19, period=0.675, d_freq=1420297365.98, chirp=-69.805, fft_len=128 Flopcounter: 62579306918.907852 Spike count: 6 Autocorr count: 1 Pulse count: 0 Triplet count: 1 Gaussian count: 0 Time cpu in use since last restart: 2679.5 seconds Fftlength=8,pass=3:Tune: sum=1182.06(ms); min=5.145(ms); max=30.43(ms); mean=14.24(ms); s_mean=13.16; sleep=15(ms); delta=51; N=83; usual Fftlength=8,pass=4:Tune: sum=1182.06(ms); min=5.145(ms); max=30.43(ms); mean=14.24(ms); s_mean=13.16; sleep=15(ms); 
delta=51; N=83; usual Fftlength=8,pass=5:Tune: sum=1182.06(ms); min=5.145(ms); max=30.43(ms); mean=14.24(ms); s_mean=13.16; sleep=15(ms); delta=51; N=83; usual Fftlength=16,pass=3:Tune: sum=825.005(ms); min=0.7342(ms); max=21.58(ms); mean=9.593(ms); s_mean=12.35; sleep=15(ms); delta=77; N=86; usual Fftlength=16,pass=4:Tune: sum=825.005(ms); min=0.7342(ms); max=21.58(ms); mean=9.593(ms); s_mean=12.35; sleep=15(ms); delta=77; N=86; usual Fftlength=16,pass=5:Tune: sum=825.005(ms); min=0.7342(ms); max=21.58(ms); mean=9.593(ms); s_mean=12.35; sleep=15(ms); delta=77; N=86; usual Fftlength=32,pass=3:Tune: sum=612.242(ms); min=0.2632(ms); max=46.09(ms); mean=7.289(ms); s_mean=10.93; sleep=0(ms); delta=78; N=84; usual Fftlength=32,pass=4:Tune: sum=612.242(ms); min=0.2632(ms); max=46.09(ms); mean=7.289(ms); s_mean=10.93; sleep=0(ms); delta=78; N=84; usual Fftlength=32,pass=5:Tune: sum=612.242(ms); min=0.2632(ms); max=46.09(ms); mean=7.289(ms); s_mean=10.93; sleep=0(ms); delta=78; N=84; usual Fftlength=64,pass=3:Tune: sum=542.066(ms); min=14.67(ms); max=59.64(ms); mean=16.43(ms); s_mean=15.6; sleep=15(ms); delta=1; N=33; usual Fftlength=64,pass=4:Tune: sum=542.066(ms); min=14.67(ms); max=59.64(ms); mean=16.43(ms); s_mean=15.6; sleep=15(ms); delta=1; N=33; usual Fftlength=64,pass=5:Tune: sum=542.066(ms); min=14.67(ms); max=59.64(ms); mean=16.43(ms); s_mean=15.6; sleep=15(ms); delta=1; N=33; usual Fftlength=128,pass=3:Tune: sum=731.316(ms); min=8.496(ms); max=36.08(ms); mean=11.25(ms); s_mean=11.72; sleep=0(ms); delta=1; N=65; usual Fftlength=128,pass=4:Tune: sum=731.316(ms); min=8.496(ms); max=36.08(ms); mean=11.25(ms); s_mean=11.72; sleep=0(ms); delta=1; N=65; usual Fftlength=128,pass=5:Tune: sum=731.316(ms); min=8.496(ms); max=36.08(ms); mean=11.25(ms); s_mean=11.72; sleep=0(ms); delta=1; N=65; usual Fftlength=256,pass=3:Tune: sum=1176.36(ms); min=6.818(ms); max=13.92(ms); mean=8.98(ms); s_mean=8.396; sleep=0(ms); delta=1; N=131; usual Fftlength=512,pass=3:Tune: 
sum=600.66(ms); min=1.693(ms); max=3.697(ms); mean=2.284(ms); s_mean=2.202; sleep=0(ms); delta=1; N=263; usual Fftlength=1024,pass=3:Tune: sum=486.072(ms); min=0.7004(ms); max=1.502(ms); mean=0.9223(ms); s_mean=0.9211; sleep=0(ms); delta=1; N=527; usual Gaussian_transfer_not_needed total=0.0000E+00, N=0 , <>=0 , min=0 , max=0 Gaussian_transfer_needed total=0.0000E+00, N=0 , <>=0 , min=0 , max=0 Gaussian_skip1_no_peak total=0 , N=0 , <>=0 , min=0 , max=0 Gaussian_skip2_bad_group_peak total=0 , N=0 , <>=0 , min=0 , max=0 Gaussian_skip3_too_weak_peak total=0 , N=0 , <>=0 , min=0 , max=0 Gaussian_skip4_too_big_ChiSq total=0 , N=0 , <>=0 , min=0 , max=0 Gaussian_skip6_low_power total=0 , N=0 , <>=0 , min=0 , max=0 Gaussian_new_best total=0 , N=0 , <>=0 , min=0 , max=0 Gaussian_report total=0 , N=0 , <>=0 , min=0 , max=0 Gaussian_miss total=0 , N=0 , <>=0 , min=0 , max=0 PC_triplet_find_hit total=9.7200E+02, N=972 , <>=1 , min=1 , max=1 PC_triplet_find_miss total=7.7000E+01, N=77 , <>=1 , min=1 , max=1 PC_pulse_find_hit total=1.0420E+03, N=1042 , <>=1 , min=1 , max=1 PC_pulse_find_miss total=7.0000E+00, N=7 , <>=1 , min=1 , max=1 PC_pulse_find_early_miss total=3.0000E+00, N=3 , <>=1 , min=1 , max=1 PC_pulse_find_2CPU total=1.0000E+00, N=1 , <>=1 , min=1 , max=1 PoT_transfer_not_needed total=9.6900E+02, N=969 , <>=1 , min=1 , max=1 PoT_transfer_needed total=8.1000E+01, N=81 , <>=1 , min=1 , max=1 GPU device sync requested... ...GPU device synched 13:30:22 (70846): called boinc_finish(0) I'm still working on the Linux build. After installing the 2.91 SDK it finished compiling, but, it seems it destroyed the driver... |
TBar Send message Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
Well that's not good. I reinstalled the same driver I've been using for over a year and it crashes; 20:02:45 (3185): Can't open init data file - running in standalone mode 20:02:45 (3185): Can't open init data file - running in standalone mode Not using mb_cmdline.txt-file, using commandline options. 20:02:45 (3185): Can't open init data file - running in standalone mode WARNING: init_data.xml missing OpenCL platform detected: Advanced Micro Devices, Inc. WARNING: BOINC supplied wrong platform! Number of OpenCL devices found : 1 BOINC assigns slot on device #0. WARNING: BOINC failed to provide OpenCL device, using own enumeration abilities Build features: SETI8 Non-graphics OpenCL USE_OPENCL_HD5xxx OCL_ZERO_COPY SIGNALS_ON_GPU OCL_CHIRP3 FFTW SSSE3 64bit System: Linux x86_64 Kernel: 3.13.0-77-generic CPU : Intel(R) Core(TM)2 Quad CPU Q9400 @ 2.66GHz 4 core(s), Speed : 1998.000 MHz L1 : 64 KB, Cache : 3072 KB Features : FPU TSC PAE APIC MTRR MMX SSE SSE2 HT PNI SSSE3 SSE4_1 OpenCL-kernels filename : MultiBeam_Kernels_r3482.cl INFO: can't open binary kernel file: .//MultiBeam_Kernels_r3482.clHD5_Barts.bin_V7_SoG_15263, continue with recompile... 
Info : Building Program (binary, clBuildProgram):main kernels: OK code 0 INFO: binary kernel file created WARNING: can't open binary kernel file for oclFFT plan: .//MB_clFFTplan_Barts_524288_gr64_lr16_wg256_tw0_r3482.bin_15263, continue with recompile... WARNING: can't open binary kernel file for oclFFT plan: .//MB_clFFTplan_Barts_8_gr64_lr16_wg256_tw0_r3482.bin_15263, continue with recompile... WARNING: can't open binary kernel file for oclFFT plan: .//MB_clFFTplan_Barts_16_gr64_lr16_wg256_tw0_r3482.bin_15263, continue with recompile... WARNING: can't open binary kernel file for oclFFT plan: .//MB_clFFTplan_Barts_32_gr64_lr16_wg256_tw0_r3482.bin_15263, continue with recompile... WARNING: can't open binary kernel file for oclFFT plan: .//MB_clFFTplan_Barts_64_gr64_lr16_wg256_tw0_r3482.bin_15263, continue with recompile... WARNING: can't open binary kernel file for oclFFT plan: .//MB_clFFTplan_Barts_128_gr64_lr16_wg256_tw0_r3482.bin_15263, continue with recompile... WARNING: can't open binary kernel file for oclFFT plan: .//MB_clFFTplan_Barts_256_gr64_lr16_wg256_tw0_r3482.bin_15263, continue with recompile... 
WARNING: can't open binary kernel file for oclFFT plan: .//MB_clFFTplan_Barts_512_gr64_lr16_wg256_tw0_r3482.bin_15263, continue with recompile... WARNING: can't open binary kernel file for oclFFT plan: .//MB_clFFTplan_Barts_1024_gr64_lr16_wg256_tw0_r3482.bin_15263, continue with recompile... WARNING: can't open binary kernel file for oclFFT plan: .//MB_clFFTplan_Barts_2048_gr64_lr16_wg256_tw0_r3482.bin_15263, continue with recompile... WARNING: can't open binary kernel file for oclFFT plan: .//MB_clFFTplan_Barts_4096_gr64_lr16_wg256_tw0_r3482.bin_15263, continue with recompile... WARNING: can't open binary kernel file for oclFFT plan: .//MB_clFFTplan_Barts_8192_gr64_lr16_wg256_tw0_r3482.bin_15263, continue with recompile... WARNING: can't open binary kernel file for oclFFT plan: .//MB_clFFTplan_Barts_16384_gr64_lr16_wg256_tw0_r3482.bin_15263, continue with recompile... WARNING: can't open binary kernel file for oclFFT plan: .//MB_clFFTplan_Barts_32768_gr64_lr16_wg256_tw0_r3482.bin_15263, continue with recompile... WARNING: can't open binary kernel file for oclFFT plan: .//MB_clFFTplan_Barts_65536_gr64_lr16_wg256_tw0_r3482.bin_15263, continue with recompile... WARNING: can't open binary kernel file for oclFFT plan: .//MB_clFFTplan_Barts_131072_gr64_lr16_wg256_tw0_r3482.bin_15263, continue with recompile... 
ar=0.775000 NumCfft=1169 NumGauss=6087368 NumPulse=1197108460 NumTriplet=2300559776 Currently allocated 229 MB for GPU buffers In v_BaseLineSmooth: NumDataPoints=1048576, BoxCarLength=8192, NumPointsInChunk=32768 Linux optimized setiathome_v8 application Version info: SSSE3x (Intel, Core 2-optimized v8-nographics) V5.13 by Alex Kan SSSE3x Linux64 Build 3482 , Ported by : Raistmer, JDWhale, Urs Echternacht OpenCL version by Raistmer, r3482 AMD HD5 version by Raistmer Number of OpenCL platforms: 1 OpenCL Platform Name: AMD Accelerated Parallel Processing Number of devices: 1 Max compute units: 12 Max work group size: 256 Max clock frequency: 775Mhz Max memory allocation: 1073741824 Cache type: None Cache line size: 0 Cache size: 0 Global memory size: 1073741824 Constant buffer size: 65536 Max number of constant args: 8 Local memory type: Scratchpad Local memory size: 32768 Queue properties: Out-of-Order: No Profiling timer offset: 4156058960 Global free memory: 4156058976 SIMD per compute unit: 1 SIMD width: 16 SIMD instruction width: 5 Wavefront width: 64 Global mem channels: 8 Global mem channel banks: 16 Global mem channel bank width: 256 Local mem size per compute unit: 32768 Local mem banks: 32 Thread trace supported: No Board Name: AMD Radeon HD 6800 Series Name: Barts Vendor: Advanced Micro Devices, Inc. Driver version: 1526.3 Version: OpenCL 1.2 AMD-APP (1526.3) Extensions: cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_ext_atomic_counters_32 cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_media_ops2 cl_amd_popcnt cl_amd_image2d_from_buffer_read_only cl_khr_spir cl_khr_gl_event Work Unit Info: ............... 
Credit multiplier is : 2.85
WU true angle range is : 0.775000
Used GPU device parameters are:
Number of compute units: 12
Single buffer allocation size: 128MB
Total device global memory: 1024MB
max WG size: 256
local mem type: Real
LotOfMem path: yes
LowPerformanceGPU path: no
HighPerformanceGPU path: no
period_iterations_num=50
SIGSEGV: segmentation violation
Stack trace (24 frames):
./MBv8_8.08r3482_ssse3_clGPU_x86_64-pc-linux-gnu[0x653dc0]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x10330)[0x7fdcefd34330]
/usr/lib/fglrx/libamdocl64.so(+0x5eb0d1)[0x7fdced0130d1]
/usr/lib/fglrx/libamdocl64.so(+0x57a6e3)[0x7fdcecfa26e3]
/usr/lib/fglrx/libamdocl64.so(+0x5764dd)[0x7fdcecf9e4dd]
/usr/lib/fglrx/libamdocl64.so(+0x5d7ffc)[0x7fdcecfffffc]
/usr/lib/fglrx/libamdocl64.so(+0x5d815d)[0x7fdced00015d]
/usr/lib/fglrx/libamdocl64.so(+0x5d990b)[0x7fdced00190b]
/usr/lib/fglrx/libamdocl64.so(+0x5289d0)[0x7fdcecf509d0]
/usr/lib/fglrx/libamdocl64.so(+0x4fa2f4)[0x7fdcecf222f4]
/usr/lib/fglrx/libamdocl64.so(+0x4fa4e8)[0x7fdcecf224e8]
/usr/lib/fglrx/libamdocl64.so(+0x4fc1c2)[0x7fdcecf241c2]
/usr/lib/fglrx/libamdocl64.so(+0x4fca09)[0x7fdcecf24a09]
/usr/lib/fglrx/libamdocl64.so(+0x4bbf20)[0x7fdcecee3f20]
/usr/lib/fglrx/libamdocl64.so(+0x4bc0d6)[0x7fdcecee40d6]
/usr/lib/fglrx/libamdocl64.so(+0x4b1cdb)[0x7fdceced9cdb]
/usr/lib/fglrx/libamdocl64.so(clEnqueueNDRangeKernel+0x3e2)[0x7fdceceb2212]
./MBv8_8.08r3482_ssse3_clGPU_x86_64-pc-linux-gnu[0x426023]
./MBv8_8.08r3482_ssse3_clGPU_x86_64-pc-linux-gnu[0x4124c3]
./MBv8_8.08r3482_ssse3_clGPU_x86_64-pc-linux-gnu[0x566f89]
./MBv8_8.08r3482_ssse3_clGPU_x86_64-pc-linux-gnu[0x56fd52]
./MBv8_8.08r3482_ssse3_clGPU_x86_64-pc-linux-gnu[0x405fc7]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5)[0x7fdcef980f45]
./MBv8_8.08r3482_ssse3_clGPU_x86_64-pc-linux-gnu[0x4071cc]
Exiting...

Now what... So, I booted into Ubuntu 12.04 which has the same driver and got the same driver crash. Both systems work with the older r3306 App. |
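For anyone reading a trace like the one above, a quick tally of frames per module shows where the crash happened. In the full trace, 15 of the 24 frames are inside libamdocl64.so, entered through clEnqueueNDRangeKernel. The Python below is only a rough sketch: the regular expression assumes the `module(symbol+offset)[address]` frame layout seen in this trace, the helper names (`parse_frames`, `summarize`) are made up for illustration, and the sample is an abbreviated copy of the frames, not all 24.

```python
import re
from collections import Counter

# Abbreviated sample of frames copied from the posted trace (not all 24).
TRACE = """\
./MBv8_8.08r3482_ssse3_clGPU_x86_64-pc-linux-gnu[0x653dc0]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x10330)[0x7fdcefd34330]
/usr/lib/fglrx/libamdocl64.so(+0x5eb0d1)[0x7fdced0130d1]
/usr/lib/fglrx/libamdocl64.so(+0x57a6e3)[0x7fdcecfa26e3]
/usr/lib/fglrx/libamdocl64.so(clEnqueueNDRangeKernel+0x3e2)[0x7fdceceb2212]
./MBv8_8.08r3482_ssse3_clGPU_x86_64-pc-linux-gnu[0x426023]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5)[0x7fdcef980f45]
"""

# Assumed frame layout: module, optional "(symbol+offset)", then "[address]".
FRAME_RE = re.compile(
    r"^(?P<module>[^\s(\[]+)"
    r"(?:\((?P<symbol>[^+)]*)\+?[^)]*\))?"
    r"\[(?P<addr>0x[0-9a-f]+)\]$"
)

def parse_frames(trace):
    """Return (module_path, symbol_or_None) for every line that looks like a frame."""
    frames = []
    for line in trace.splitlines():
        m = FRAME_RE.match(line.strip())
        if m:
            frames.append((m.group("module"), m.group("symbol") or None))
    return frames

def summarize(trace):
    """Count frames per module file name and collect the named symbols."""
    frames = parse_frames(trace)
    per_module = Counter(module.rsplit("/", 1)[-1] for module, _ in frames)
    symbols = [symbol for _, symbol in frames if symbol]
    return per_module, symbols

per_module, symbols = summarize(TRACE)
for module, count in per_module.most_common():
    print(f"{count:2d}  {module}")
print("named symbols:", symbols)
```

A module that dominates the frames just below the signal handler is the first place to look, which here is the driver's OpenCL runtime and not the app itself.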
TBar Send message Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
It seems to be working with the Repository driver, which shows OpenCL 1.2 AMD-APP (1800.11). The older App, r3306, was compiled with SDK 2.8.1 and works with OpenCL 1.2 AMD-APP (1526.3). For some reason the new App, compiled with SDK 2.9.1, doesn't work with the older 14.6 driver. Strange, considering 14.6 and SDK 2.9.1 were released about the same time. I dunno. There doesn't seem to be much difference between the older App and the newer r3482, at least not on my old cards; http://setiathome.berkeley.edu/result.php?resultid=5023490256 At least they seem to be validating. |
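Both drivers report the same OpenCL version (1.2) and differ only in the AMD-APP build number, so that is the part worth comparing when sorting hosts into working and crashing. A minimal Python sketch, assuming the version string always has the `OpenCL X.Y AMD-APP (build)` shape seen in this thread; `parse_amd_app` is a made-up helper name, not part of any SETI or AMD tool.

```python
import re

# Assumed layout, taken from the two strings quoted in this thread.
VERSION_RE = re.compile(r"OpenCL (\d+)\.(\d+) AMD-APP \((\d+)\.(\d+)\)")

def parse_amd_app(version):
    """Split a version string into ((cl_major, cl_minor), (build_major, build_minor))."""
    m = VERSION_RE.search(version)
    if m is None:
        return None
    cl_major, cl_minor, build_major, build_minor = map(int, m.groups())
    return (cl_major, cl_minor), (build_major, build_minor)

old = parse_amd_app("OpenCL 1.2 AMD-APP (1526.3)")   # driver that crashes with r3482
new = parse_amd_app("OpenCL 1.2 AMD-APP (1800.11)")  # Repository driver that works

print(old, new)
print("same OpenCL version:", old[0] == new[0])
print("newer runtime build:", new[1] > old[1])
```

The build is compared as a tuple of integers and not as a float, so a build like 1800.11 sorts after 1800.5 as intended.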
Raistmer Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
The stderr.txt does have the standard counters. Good! Then try to catch a task that was also processed by the OpenCL app on the wingman's host, and compare your hit/miss counters with the wingman's. In that particular result the triplet miss looks higher than usual, but it's hard to say without a comparison with the wingman's counters on the same task. SETI apps news We're not gonna fight them. We're gonna transcend them. |
Raistmer Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
http://setiathome.berkeley.edu/result.php?resultid=5023490256 This one doesn't contain any performance statistics to look at. SETI apps news We're not gonna fight them. We're gonna transcend them. |
TBar Send message Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
The stderr.txt does have the standard counters. The Standalone task I posted was run on Main with the CUDA 'Special' App here; http://setiathome.berkeley.edu/result.php?resultid=5023973186
Run time: 2 min 39 sec
CPU time: 2 min 31 sec
Spike count: 6
Autocorr count: 1
Pulse count: 0
Triplet count: 1
Gaussian count: 0
There aren't any counters, but the results are the same. Unfortunately, the nVidia OpenCL App took 44 minutes to finish the task where the CUDA App took about 2.6 minutes. As has been obvious for some time, there is something seriously wrong with the nVidia OpenCL App in Darwin 15.x. That's why I've been recommending the CUDA App to Beta for the last 5 or 6 months. I see there still isn't a Mac CUDA App at Beta. It's also getting difficult to find any Mac nVidia Host working at Beta; my guess is they're giving up after running the same non-working OpenCL Apps for quite some time. My experience with the latest nVidia OpenCL App is about the same as this Host at Beta; http://setiweb.ssl.berkeley.edu/beta/results.php?hostid=58196 Those tasks taking 1.7 hours on that Host would finish in about 34 minutes running the CUDA App I recommended last week. The BLC3 tasks taking 7+ hours should finish in a little over an hour with the CUDA App. This situation is very similar to the results on Main with the nVidia 730s, which show similar differences in Windows on the BLC tasks. I don't plan on running very many Tasks with an App that takes 44 minutes to finish a shorty. |
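To put the run times quoted above on one scale, the slow/fast ratios can be worked out directly. A small Python sketch using only the figures from the post (44 minutes against 2 min 39 sec for the shorty, and 1.7 hours against an estimated 34 minutes on the Beta host); the helper names `minutes` and `speedup` are made up for illustration.

```python
def minutes(hours=0, mins=0, secs=0):
    """Convert a duration to minutes."""
    return hours * 60 + mins + secs / 60

def speedup(slow_min, fast_min):
    """How many times faster the fast run is than the slow one."""
    return slow_min / fast_min

# Shorty on this host: nVidia OpenCL App vs CUDA 'Special' App.
shorty = speedup(minutes(mins=44), minutes(mins=2, secs=39))

# Beta host: observed OpenCL time vs the CUDA time estimated in the post.
beta = speedup(minutes(hours=1.7), minutes(mins=34))

print(f"shorty: {shorty:.1f}x")
print(f"Beta host estimate: {beta:.1f}x")
```

That works out to roughly 16.6x for the shorty and about 3x for the Beta host estimate, so the gap on this host is much wider than the one projected for the Beta host.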