I've Built a Couple OSX CUDA Apps...

TBar
Volunteer tester

Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1798964 - Posted: 27 Jun 2016, 6:28:41 UTC
Last modified: 27 Jun 2016, 6:38:27 UTC

New CUDA Apps have been posted. Hopefully they will do a little better with the BLC tasks on Fermi and newer GPUs. The CPU Apps are the same as before, with the addition of an SSSE3 App which might work with the AVX CPUs in Darwin 11.4.2 (Lion). Testing on the troublesome AVX CPUs in Lion is needed. If you have one of those laptops, please comment on your results. On my Mac Pro with a GTX 950, the CUDA75 App is slightly faster on the normal Arecibo tasks but about the same as the CUDA42 App on the VLARs. Your mileage will vary.

The new Apps are here: http://www.arkayn.us/forum/index.php?topic=191.msg4369#msg4369
ID: 1798964 · Report as offensive
TBar
Volunteer tester

Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1800199 - Posted: 2 Jul 2016, 22:12:46 UTC

New ATI/AMD MBv8 App posted. This is a replacement for r3347 and should give better results with the BLC VLAR tasks. Has been tested in Darwin 15.5. Testing is needed with the D-500 & D-700 Mac Pros in Darwin 15.4 & 15.5 to determine if all the Gaussians are being reported.

In the usual location, http://www.arkayn.us/forum/index.php?topic=191.msg4368#msg4368
ID: 1800199 · Report as offensive
TBar
Volunteer tester

Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1800759 - Posted: 4 Jul 2016, 20:07:58 UTC

Greetings on the 4th.

So, the MBv8r3480 App is working so nicely in OSX that I decided to try a Linux version. Except the compile is not working so nicely. How do I turn off the Counters? That would probably be the easiest thing to do. The current problem is:
In file included from analyzeFuncs.cpp:70:0:
/home/tbar/sah_v7_opt/src/counters.h: In constructor ‘Timings<T>::Timings()’:
/home/tbar/sah_v7_opt/src/counters.h:279:17: error: there are no arguments to ‘__rdtsc’ that depend on a template parameter, so a declaration of ‘__rdtsc’ must be available [-fpermissive]
   start=__rdtsc();
                 ^
/home/tbar/sah_v7_opt/src/counters.h:279:17: note: (if you use ‘-fpermissive’, G++ will accept your code, but allowing the use of an undeclared name is deprecated)
/home/tbar/sah_v7_opt/src/counters.h: In destructor ‘Timings<T>::~Timings()’:
/home/tbar/sah_v7_opt/src/counters.h:294:36: error: there are no arguments to ‘__rdtsc’ that depend on a template parameter, so a declaration of ‘__rdtsc’ must be available [-fpermissive]
   register uint64_t  delta=__rdtsc()-start;

I'd like to just turn them Off please.
I'd like to recompile the OSX version with them turned off as well.

BTW, the OSX version changed my ATI 6870 from around 42 minutes on a BLC3 to around 26 minutes. Nice.
Hopefully the Linux version will work as well.
The Counters...how do you turn them Off? I didn't have this problem in OSX.
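
For anyone hitting the same __rdtsc error: GCC looks up names that do not depend on a template parameter at the point where the template is defined, so a declaration of __rdtsc has to be visible before Timings<T>. Below is a minimal sketch, assuming an x86 build with GCC or Clang, where <x86intrin.h> supplies that declaration. TimingsSketch, read_ticks and measure_busy_loop are hypothetical names for illustration, not the code in counters.h, and the non-x86 fallback is only there so the sketch compiles elsewhere.

```cpp
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <x86intrin.h>  // declares __rdtsc() for GCC and Clang on x86
static inline uint64_t read_ticks() { return __rdtsc(); }
#else
#include <chrono>       // fallback so the sketch still builds on non-x86 hosts
static inline uint64_t read_ticks() {
    return static_cast<uint64_t>(
        std::chrono::steady_clock::now().time_since_epoch().count());
}
#endif

// Hypothetical stand-in for a scoped timing template: it records the tick
// count on construction and adds the elapsed ticks to *total on destruction.
template <typename T>
struct TimingsSketch {
    uint64_t start;
    uint64_t* total;
    explicit TimingsSketch(uint64_t* t) : start(read_ticks()), total(t) {}
    ~TimingsSketch() { *total += read_ticks() - start; }
};

// Times a small busy loop and returns the elapsed ticks.
uint64_t measure_busy_loop(int iterations) {
    uint64_t ticks = 0;
    {
        TimingsSketch<int> scope(&ticks);
        volatile double sink = 0.0;  // volatile keeps the loop from being optimized away
        for (int i = 0; i < iterations; ++i) sink = sink + i;
    }
    return ticks;
}
```

Calling measure_busy_loop(1000000) returns the ticks spent in the loop. The point is only that the include comes before the template; whether counters.h should gain such an include on Linux is for the developers to decide.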
ID: 1800759 · Report as offensive
Raistmer
Volunteer developer
Volunteer tester

Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1800800 - Posted: 4 Jul 2016, 22:13:13 UTC - in response to Message 1800759.  

Wait a little. I'll soon commit even better code; then it will be worth rebuilding. r3480 actually has a bug in the new adaptation code, so it works better only on a subset of tasks.
SETI apps news
We're not gonna fight them. We're gonna transcend them.
ID: 1800800 · Report as offensive
Chris Adamek
Volunteer tester

Joined: 15 May 99
Posts: 251
Credit: 434,772,072
RAC: 236
United States
Message 1800826 - Posted: 5 Jul 2016, 0:00:14 UTC - in response to Message 1800800.  

Oops, already installed. I'll roll back. BTW Raistmer, just curious what the HighPerformanceGPU path looks for. I noticed Tom's is listed as "yes" with 14 cu's but my D700’s are listed as "no" with 32 cu's. Maybe that is the bug you are referring to.

Thanks,

Chris
ID: 1800826 · Report as offensive
Raistmer
Volunteer developer
Volunteer tester

Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1800886 - Posted: 5 Jul 2016, 5:56:57 UTC - in response to Message 1800826.  

Oops, already installed. I'll roll back. BTW Raistmer, just curious what the HighPerformanceGPU path looks for. I noticed Tom's is listed as "yes" with 14 cu's but my D700’s are listed as "no" with 32 cu's. Maybe that is the bug you are referring to.

Thanks,

Chris

Currently it is enabled only manually, via a switch. See the ReadMe.
BTW, the bug doesn't affect the correctness of results, so this build can be used, especially if it is indeed faster on a particular host.
SETI apps news
We're not gonna fight them. We're gonna transcend them.
ID: 1800886 · Report as offensive
TBar
Volunteer tester

Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1800899 - Posted: 5 Jul 2016, 8:21:32 UTC

The ATI build seems to be working very well; there are problems with the nVidia build though. The BLC3 tasks ran very slowly with the nVidia version, and also with the ATI version run on an nVidia card after the nVidia build failed. On the NV card not only was it way too slow, but the CPU use would go up to 110% within a couple of minutes, and the idle wake ups were above 20k. The idle wake ups on the ATI card are around 500.

It appears r3482 has appeared in the repository...
ID: 1800899 · Report as offensive
Raistmer
Volunteer developer
Volunteer tester

Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1800900 - Posted: 5 Jul 2016, 8:29:17 UTC - in response to Message 1800899.  


It appears r3482 has appeared in the repository...

Yep, I'm just completing the Windows binaries rebuild for it.
r3482 hardly changes any of the issues with counters, though.
Their usage is governed by the USE_COUNTERS define.
If undefining it doesn't help, please report again.
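
As an illustration of how a define like USE_COUNTERS can switch counter code on and off at compile time, here is a minimal sketch. Only the macro name USE_COUNTERS comes from this thread; COUNT_EVENT, g_kernel_calls, run_kernel_sketch and counters_enabled are hypothetical and do not reflect how the real sources are organized.

```cpp
#include <stdint.h>

// Leave USE_COUNTERS undefined to compile the counters out entirely;
// uncomment the next line (or pass -DUSE_COUNTERS=1) to compile them in.
// #define USE_COUNTERS 1

#ifdef USE_COUNTERS
static uint64_t g_kernel_calls = 0;           // hypothetical counter storage
#define COUNT_EVENT(counter) (++(counter))    // real increment when enabled
#else
#define COUNT_EVENT(counter) ((void)0)        // expands to nothing when disabled
#endif

// Hypothetical work function: the counting call costs nothing when disabled,
// and the counter variable is never referenced in that case.
int run_kernel_sketch(int x) {
    COUNT_EVENT(g_kernel_calls);
    return x * 2;
}

// Reports which variant was compiled.
bool counters_enabled() {
#ifdef USE_COUNTERS
    return true;
#else
    return false;
#endif
}
```

With the define left out, every COUNT_EVENT call disappears at preprocessing time, so code that depends on the counter headers is never compiled.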
SETI apps news
We're not gonna fight them. We're gonna transcend them.
ID: 1800900 · Report as offensive
TBar
Volunteer tester

Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1800902 - Posted: 5 Jul 2016, 9:05:00 UTC - in response to Message 1800900.  


It appears r3482 has appeared in the repository...

Yep, I'm just completing the Windows binaries rebuild for it.
r3482 hardly changes any of the issues with counters, though.
Their usage is governed by the USE_COUNTERS define.
If undefining it doesn't help, please report again.

Would I just comment out;
#if !( /*__linux__ || __APPLE__ ||*/ __FreeBSD__ || __MINGW32__ || !(defined(USE_OPENCL) || defined(USE_CUDA) || defined(USE_BROOK)) )
#define USE_COUNTERS 1
#endif

in sah_v7_opt/src/GPU_lock.h?
It took a while to find it in sah_v7_opt/src.
ID: 1800902 · Report as offensive
TBar
Volunteer tester

Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1800916 - Posted: 5 Jul 2016, 11:36:58 UTC - in response to Message 1800900.  

Well, commenting out the Counter lines as above gets me to the next error in r3480;
../../src/GPU_lock.cpp: In function ‘void DumpKernelExecTime_PulseFind(KERNEL_TUNE, PulseFind_tune&)’:
../../src/GPU_lock.cpp:510:74: error: ‘floor’ was not declared in this scope
  else if(tune.N>4)tune.sleep=15*(size_t)floor((tune.sliding_mean_ms+1)/15);//R: rounded down kernel execution time in ms
                                                                          ^
make[2]: *** [seti_boinc-GPU_lock.o] Error 1
...

Strange I didn't have these problems in OSX.
I suppose it's time to download r3482.
Looks as though there is new CUDA code waiting as well...
ID: 1800916 · Report as offensive
Urs Echternacht
Volunteer tester
Joined: 15 May 99
Posts: 692
Credit: 135,197,781
RAC: 211
Germany
Message 1800923 - Posted: 5 Jul 2016, 13:06:22 UTC - in response to Message 1800916.  

Well, commenting out the Counter lines as above gets me to the next error in r3480;
../../src/GPU_lock.cpp: In function ‘void DumpKernelExecTime_PulseFind(KERNEL_TUNE, PulseFind_tune&)’:
../../src/GPU_lock.cpp:510:74: error: ‘floor’ was not declared in this scope
  else if(tune.N>4)tune.sleep=15*(size_t)floor((tune.sliding_mean_ms+1)/15);//R: rounded down kernel execution time in ms
                                                                          ^
make[2]: *** [seti_boinc-GPU_lock.o] Error 1
...

Strange I didn't have these problems in OSX.
I suppose it's time to download r3482.
Looks as though there is new CUDA code waiting as well...

Try adding the header that declares the missing function. Put
#include <cmath>
on a new line before
void DumpKernelExecTime_PulseFind(...
and retry.
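
A minimal sketch of the same expression with the header in place. round_sleep_ms is a hypothetical helper name; the arithmetic mirrors the line quoted from GPU_lock.cpp, which rounds the sliding mean (plus 1 ms) down to a multiple of 15 ms. The sketch spells the call std::floor, which <cmath> guarantees; the unqualified floor in the original usually works with GCC once <cmath> is included, but that is an implementation detail.

```cpp
#include <cmath>    // declares std::floor
#include <cstddef>  // std::size_t

// Hypothetical helper: rounds (sliding_mean_ms + 1) down to a multiple of 15 ms.
std::size_t round_sleep_ms(double sliding_mean_ms) {
    return 15 * static_cast<std::size_t>(std::floor((sliding_mean_ms + 1) / 15));
}
```

For example, round_sleep_ms(15.6) gives 15 and round_sleep_ms(8.396) gives 0. This covers only the one branch shown in the error message, so it should not be expected to reproduce every sleep value in a Tune log.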
_\|/_
U r s
ID: 1800923 · Report as offensive
TBar
Volunteer tester

Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1800947 - Posted: 5 Jul 2016, 20:51:21 UTC - in response to Message 1800923.  
Last modified: 5 Jul 2016, 21:35:16 UTC

I added it here in r3482; I still had to comment out the Counters with r3482:
#if __linux__ || __APPLE__
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <errno.h>
#include <pthread.h>
#include <string>
#include <cmath>

That got me to the next errors;
../../src/CLInfo.cpp: In function ‘void CLInfo()’:
../../src/CLInfo.cpp:481:36: error: ‘CL_DEVICE_THREAD_TRACE_SUPPORTED_AMD’ was not declared in this scope
                   << ((*i).getInfo<CL_DEVICE_THREAD_TRACE_SUPPORTED_AMD>() ? "Yes" : "No")
                                    ^
../../src/CLInfo.cpp:481:74: error: no matching function for call to ‘cl::Device::getInfo()’
                   << ((*i).getInfo<CL_DEVICE_THREAD_TRACE_SUPPORTED_AMD>() ? "Yes" : "No")
                                                                          ^
../../src/CLInfo.cpp:481:74: note: candidates are:
In file included from ../../src/CLInfo.cpp:118:0:
../../src/cl_cutted.hpp:1348:12: note: template<class T> cl_int cl::Device::getInfo(cl_device_info, T*) const
     cl_int getInfo(cl_device_info name, T* param) const
            ^
../../src/cl_cutted.hpp:1348:12: note:   template argument deduction/substitution failed:
../../src/CLInfo.cpp:481:74: error: template argument 1 is invalid
                   << ((*i).getInfo<CL_DEVICE_THREAD_TRACE_SUPPORTED_AMD>() ? "Yes" : "No")
                                                                          ^
In file included from ../../src/CLInfo.cpp:118:0:
../../src/cl_cutted.hpp:1357:5: note: template<int name> typename cl::detail::param_traits<cl::detail::cl_device_info, name>::param_type cl::Device::getInfo(cl_int*) const
     getInfo(cl_int* err = NULL) const
     ^
../../src/cl_cutted.hpp:1357:5: note:   template argument deduction/substitution failed:
../../src/CLInfo.cpp:481:74: error: template argument 1 is invalid
                   << ((*i).getInfo<CL_DEVICE_THREAD_TRACE_SUPPORTED_AMD>() ? "Yes" : "No")
                                                                          ^
make[2]: *** [seti_boinc-CLInfo.o] Error 1

:-(

I was able to compile MBv8_8.08r3482_ati5_SoG_x86_64-apple-darwin in Darwin 15.5. As far as I can tell, it works exactly like r3480 on my ATI 6870.
I also compiled a new MBv8_8.08r3483_NV_SoG_x86_64-apple-darwin from r3482. The 'Shorty' task I let run to completion had an AR of 1.282434 and should have finished in about 6 minutes on the stock CUDA App...it took 44 minutes running the OpenCL App on a GTX 950. It had a few 'new' numbers in case they might help;
Time cpu in use since last restart: 2679.5 seconds
Fftlength=8,pass=3:Tune: sum=1182.06(ms); min=5.145(ms); max=30.43(ms); mean=14.24(ms); s_mean=13.16; sleep=15(ms); delta=51; N=83; usual
Fftlength=8,pass=4:Tune: sum=1182.06(ms); min=5.145(ms); max=30.43(ms); mean=14.24(ms); s_mean=13.16; sleep=15(ms); delta=51; N=83; usual
Fftlength=8,pass=5:Tune: sum=1182.06(ms); min=5.145(ms); max=30.43(ms); mean=14.24(ms); s_mean=13.16; sleep=15(ms); delta=51; N=83; usual
Fftlength=16,pass=3:Tune: sum=825.005(ms); min=0.7342(ms); max=21.58(ms); mean=9.593(ms); s_mean=12.35; sleep=15(ms); delta=77; N=86; usual
Fftlength=16,pass=4:Tune: sum=825.005(ms); min=0.7342(ms); max=21.58(ms); mean=9.593(ms); s_mean=12.35; sleep=15(ms); delta=77; N=86; usual
Fftlength=16,pass=5:Tune: sum=825.005(ms); min=0.7342(ms); max=21.58(ms); mean=9.593(ms); s_mean=12.35; sleep=15(ms); delta=77; N=86; usual
Fftlength=32,pass=3:Tune: sum=612.242(ms); min=0.2632(ms); max=46.09(ms); mean=7.289(ms); s_mean=10.93; sleep=0(ms); delta=78; N=84; usual
Fftlength=32,pass=4:Tune: sum=612.242(ms); min=0.2632(ms); max=46.09(ms); mean=7.289(ms); s_mean=10.93; sleep=0(ms); delta=78; N=84; usual
Fftlength=32,pass=5:Tune: sum=612.242(ms); min=0.2632(ms); max=46.09(ms); mean=7.289(ms); s_mean=10.93; sleep=0(ms); delta=78; N=84; usual
Fftlength=64,pass=3:Tune: sum=542.066(ms); min=14.67(ms); max=59.64(ms); mean=16.43(ms); s_mean=15.6; sleep=15(ms); delta=1; N=33; usual
Fftlength=64,pass=4:Tune: sum=542.066(ms); min=14.67(ms); max=59.64(ms); mean=16.43(ms); s_mean=15.6; sleep=15(ms); delta=1; N=33; usual
Fftlength=64,pass=5:Tune: sum=542.066(ms); min=14.67(ms); max=59.64(ms); mean=16.43(ms); s_mean=15.6; sleep=15(ms); delta=1; N=33; usual
Fftlength=128,pass=3:Tune: sum=731.316(ms); min=8.496(ms); max=36.08(ms); mean=11.25(ms); s_mean=11.72; sleep=0(ms); delta=1; N=65; usual
Fftlength=128,pass=4:Tune: sum=731.316(ms); min=8.496(ms); max=36.08(ms); mean=11.25(ms); s_mean=11.72; sleep=0(ms); delta=1; N=65; usual
Fftlength=128,pass=5:Tune: sum=731.316(ms); min=8.496(ms); max=36.08(ms); mean=11.25(ms); s_mean=11.72; sleep=0(ms); delta=1; N=65; usual
Fftlength=256,pass=3:Tune: sum=1176.36(ms); min=6.818(ms); max=13.92(ms); mean=8.98(ms); s_mean=8.396; sleep=0(ms); delta=1; N=131; usual
Fftlength=512,pass=3:Tune: sum=600.66(ms); min=1.693(ms); max=3.697(ms); mean=2.284(ms); s_mean=2.202; sleep=0(ms); delta=1; N=263; usual
Fftlength=1024,pass=3:Tune: sum=486.072(ms); min=0.7004(ms); max=1.502(ms); mean=0.9223(ms); s_mean=0.9211; sleep=0(ms); delta=1; N=527; usual
ID: 1800947 · Report as offensive
Urs Echternacht
Volunteer tester
Joined: 15 May 99
Posts: 692
Credit: 135,197,781
RAC: 211
Germany
Message 1800963 - Posted: 5 Jul 2016, 21:57:31 UTC - in response to Message 1800947.  

...
That got me to the next errors;
../../src/CLInfo.cpp: In function ‘void CLInfo()’:
../../src/CLInfo.cpp:481:36: error: ‘CL_DEVICE_THREAD_TRACE_SUPPORTED_AMD’ was not declared in this scope
                   << ((*i).getInfo<CL_DEVICE_THREAD_TRACE_SUPPORTED_AMD>() ? "Yes" : "No")
                                    ^
../../src/CLInfo.cpp:481:74: error: no matching function for call to ‘cl::Device::getInfo()’
                   << ((*i).getInfo<CL_DEVICE_THREAD_TRACE_SUPPORTED_AMD>() ? "Yes" : "No")
                                                                          ^
../../src/CLInfo.cpp:481:74: note: candidates are:
In file included from ../../src/CLInfo.cpp:118:0:
../../src/cl_cutted.hpp:1348:12: note: template<class T> cl_int cl::Device::getInfo(cl_device_info, T*) const
     cl_int getInfo(cl_device_info name, T* param) const
            ^
../../src/cl_cutted.hpp:1348:12: note:   template argument deduction/substitution failed:
../../src/CLInfo.cpp:481:74: error: template argument 1 is invalid
                   << ((*i).getInfo<CL_DEVICE_THREAD_TRACE_SUPPORTED_AMD>() ? "Yes" : "No")
                                                                          ^
In file included from ../../src/CLInfo.cpp:118:0:
../../src/cl_cutted.hpp:1357:5: note: template<int name> typename cl::detail::param_traits<cl::detail::cl_device_info, name>::param_type cl::Device::getInfo(cl_int*) const
     getInfo(cl_int* err = NULL) const
     ^
../../src/cl_cutted.hpp:1357:5: note:   template argument deduction/substitution failed:
../../src/CLInfo.cpp:481:74: error: template argument 1 is invalid
                   << ((*i).getInfo<CL_DEVICE_THREAD_TRACE_SUPPORTED_AMD>() ? "Yes" : "No")
                                                                          ^
make[2]: *** [seti_boinc-CLInfo.o] Error 1

That should be defined in cl_ext.h from AMD's APP SDK. Check that it is in the header file that gets included automatically.
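
If the installed headers turn out not to define it, an alternative to changing the SDK is to guard the query so it is skipped at compile time. A minimal sketch of that guard follows; it is a different approach from the header check suggested above, and thread_trace_query_status is a hypothetical helper. The OpenCL headers are deliberately not included here so the sketch compiles anywhere, which means the macro is undefined in this standalone form.

```cpp
#include <string>

// In the real source this would sit after the OpenCL headers (for example
// cl_ext.h from the AMD APP SDK). They are omitted in this sketch.

// Hypothetical helper: tells whether the headers in use define the AMD
// thread-trace query, so the getInfo<> call can be compiled only when they do.
std::string thread_trace_query_status() {
#ifdef CL_DEVICE_THREAD_TRACE_SUPPORTED_AMD
    return "available";
#else
    return "not defined by these headers";
#endif
}
```

The same #ifdef could wrap the output line at CLInfo.cpp:481, at the cost of losing that one line of device info on systems with older headers.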
_\|/_
U r s
ID: 1800963 · Report as offensive
Raistmer
Volunteer developer
Volunteer tester

Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1800964 - Posted: 5 Jul 2016, 22:04:00 UTC - in response to Message 1800947.  
Last modified: 5 Jul 2016, 22:04:36 UTC

The 'Shorty' task I let run to completion had an AR of 1.282434 and should have finished in about 6 minutes on the stock CUDA App...it took 44 minutes running the OpenCL App on a GTX 950. It had a few 'new' numbers in case they might help;

Judging from those counters, the slowdown is not in the GPU part of PulseFind.
Pity you can't provide a build with the common counters. They contain a lot more info about what could cause such a slowdown.
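
A rough way to check that reading of the Tune lines is to add up their sum= fields and compare the total with the 2679.5 seconds of CPU time reported. The sketch below does that for one line per FFT length, on the assumption that the pass=3/4/5 lines with identical values share one counter and that sum is the measured kernel time in milliseconds. tune_sum_ms, total_tune_ms and sample_tune_lines are hypothetical helpers, and the sample lines are shortened copies of the posted output.

```cpp
#include <cstdlib>
#include <string>
#include <vector>

// Extracts the number after "sum=" from one Tune line; returns 0.0 if absent.
double tune_sum_ms(const std::string& line) {
    const std::string key = "sum=";
    const std::string::size_type pos = line.find(key);
    if (pos == std::string::npos) return 0.0;
    // strtod stops at the first character that is not part of the number, here '('.
    return std::strtod(line.c_str() + pos + key.size(), nullptr);
}

// Adds up the sum= fields of all given lines, in milliseconds.
double total_tune_ms(const std::vector<std::string>& lines) {
    double total = 0.0;
    for (const std::string& line : lines) total += tune_sum_ms(line);
    return total;
}

// One shortened line per FFT length, taken from the posted Tune output.
std::vector<std::string> sample_tune_lines() {
    return {
        "Fftlength=8,pass=3:Tune: sum=1182.06(ms);",
        "Fftlength=16,pass=3:Tune: sum=825.005(ms);",
        "Fftlength=32,pass=3:Tune: sum=612.242(ms);",
        "Fftlength=64,pass=3:Tune: sum=542.066(ms);",
        "Fftlength=128,pass=3:Tune: sum=731.316(ms);",
        "Fftlength=256,pass=3:Tune: sum=1176.36(ms);",
        "Fftlength=512,pass=3:Tune: sum=600.66(ms);",
        "Fftlength=1024,pass=3:Tune: sum=486.072(ms);",
    };
}
```

Under those assumptions total_tune_ms(sample_tune_lines()) comes to about 6.2 seconds, a small fraction of the 2679.5 seconds of CPU time, which is consistent with the slowdown lying outside the measured PulseFind kernels.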
SETI apps news
We're not gonna fight them. We're gonna transcend them.
ID: 1800964 · Report as offensive
TBar
Volunteer tester

Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1800987 - Posted: 5 Jul 2016, 23:50:52 UTC - in response to Message 1800964.  

The stderr.txt does have the standard counters. The last version I had built, MBv8_8.08r3479_NV_SoG_x86_64-apple-darwin, ran a BLC3 for 30 minutes and was only about 30% complete when I stopped it. I let this one run a 'Shorty', hoping it would finish quicker. I almost stopped this one too.
12:44:27 (70846): Can't open init data file - running in standalone mode
12:44:27 (70846): Can't open init data file - running in standalone mode
Not using mb_cmdline.txt-file, using commandline options.
Running on device number: 0
12:44:27 (70846): Can't open init data file - running in standalone mode
WARNING: init_data.xml missing
OpenCL platform detected: Apple
WARNING: BOINC supplied wrong platform!
Number of OpenCL devices found : 3 
BOINC assigns slot on device #1 of 3 devices.
WARNING: BOINC failed to provide OpenCL device, using own enumeration abilities
DOUBLE_FP supported. 
cl_khr_fp64 supported. 
cl_APPLE_fp64_basic_ops supported. 
FERMI : true 

Build features: SETI8 Non-graphics OpenCL USE_OPENCL_NV OCL_ZERO_COPY SIGNALS_ON_GPU OCL_CHIRP3 FFTW SSSE3 64bit 
 System: Darwin  x86_64  Kernel: 15.5.0
CPU : Intel(R) Xeon(R) CPU           E5472  @ 3.00GHz 
 GenuineIntel x86, Family 6 Model 23 Stepping 6
 Features : FPU TSC PAE APIC MTRR MMX SSE  SSE2 HT  SSE3 SSSE3 SSE4.1  

OpenCL-kernels filename : MultiBeam_Kernels_r3483.cl 
INFO: can't open binary kernel file: .//MultiBeam_Kernels_r3483.cl_GeForceGTX950.bin_V7_SoG_15.5.0_1011103460310f0, continue with recompile...
Info : Building Program (binary, clBuildProgram):main kernels: OK code 0
INFO: binary kernel file created
WARNING: can't open binary kernel file for oclFFT plan: .//MB_clFFTplan_GeForceGTX950_524288_gr64_lr16_wg256_tw0_r3483.bin_15.5.0_1011103460310f0, continue with recompile...
WARNING: can't open binary kernel file for oclFFT plan: .//MB_clFFTplan_GeForceGTX950_8_gr64_lr16_wg256_tw0_r3483.bin_15.5.0_1011103460310f0, continue with recompile...
WARNING: can't open binary kernel file for oclFFT plan: .//MB_clFFTplan_GeForceGTX950_16_gr64_lr16_wg256_tw0_r3483.bin_15.5.0_1011103460310f0, continue with recompile...
WARNING: can't open binary kernel file for oclFFT plan: .//MB_clFFTplan_GeForceGTX950_32_gr64_lr16_wg256_tw0_r3483.bin_15.5.0_1011103460310f0, continue with recompile...
WARNING: can't open binary kernel file for oclFFT plan: .//MB_clFFTplan_GeForceGTX950_64_gr64_lr16_wg256_tw0_r3483.bin_15.5.0_1011103460310f0, continue with recompile...
WARNING: can't open binary kernel file for oclFFT plan: .//MB_clFFTplan_GeForceGTX950_128_gr64_lr16_wg256_tw0_r3483.bin_15.5.0_1011103460310f0, continue with recompile...
WARNING: can't open binary kernel file for oclFFT plan: .//MB_clFFTplan_GeForceGTX950_256_gr64_lr16_wg256_tw0_r3483.bin_15.5.0_1011103460310f0, continue with recompile...
WARNING: can't open binary kernel file for oclFFT plan: .//MB_clFFTplan_GeForceGTX950_512_gr64_lr16_wg256_tw0_r3483.bin_15.5.0_1011103460310f0, continue with recompile...
WARNING: can't open binary kernel file for oclFFT plan: .//MB_clFFTplan_GeForceGTX950_1024_gr64_lr16_wg256_tw0_r3483.bin_15.5.0_1011103460310f0, continue with recompile...
WARNING: can't open binary kernel file for oclFFT plan: .//MB_clFFTplan_GeForceGTX950_2048_gr64_lr16_wg256_tw0_r3483.bin_15.5.0_1011103460310f0, continue with recompile...
WARNING: can't open binary kernel file for oclFFT plan: .//MB_clFFTplan_GeForceGTX950_4096_gr64_lr16_wg256_tw0_r3483.bin_15.5.0_1011103460310f0, continue with recompile...
WARNING: can't open binary kernel file for oclFFT plan: .//MB_clFFTplan_GeForceGTX950_8192_gr64_lr16_wg256_tw0_r3483.bin_15.5.0_1011103460310f0, continue with recompile...
WARNING: can't open binary kernel file for oclFFT plan: .//MB_clFFTplan_GeForceGTX950_16384_gr64_lr16_wg256_tw0_r3483.bin_15.5.0_1011103460310f0, continue with recompile...
WARNING: can't open binary kernel file for oclFFT plan: .//MB_clFFTplan_GeForceGTX950_32768_gr64_lr16_wg256_tw0_r3483.bin_15.5.0_1011103460310f0, continue with recompile...
WARNING: can't open binary kernel file for oclFFT plan: .//MB_clFFTplan_GeForceGTX950_65536_gr64_lr16_wg256_tw0_r3483.bin_15.5.0_1011103460310f0, continue with recompile...
WARNING: can't open binary kernel file for oclFFT plan: .//MB_clFFTplan_GeForceGTX950_131072_gr64_lr16_wg256_tw0_r3483.bin_15.5.0_1011103460310f0, continue with recompile...
ar=1.282434  NumCfft=101213  NumGauss=0  NumPulse=56259762380  NumTriplet=56259762380
Currently allocated 201 MB for GPU buffers
In v_BaseLineSmooth: NumDataPoints=1048576, BoxCarLength=8192, NumPointsInChunk=32768
OS X optimized setiathome_v8 application
Version info: SSSE3x (Intel, Core 2-optimized v8-nographics) V5.13 by Alex Kan
SSSE3x OS X 64bit Build 3483 , Ported by : Raistmer, JDWhale, Urs Echternacht


OpenCL version by Raistmer, r3483

Number of OpenCL platforms:				 1


 OpenCL Platform Name:					 Apple
Number of devices:				 3
  Max compute units:				 6
  Max work group size:				 1024
  Max clock frequency:				 1316Mhz
  Max memory allocation:			 536870912
  Cache type:					 None
  Cache line size:				 0
  Cache size:					 0
  Global memory size:				 2147483648
  Constant buffer size:				 65536
  Max number of constant args:			 9
  Local memory type:				 Scratchpad
  Local memory size:				 49152
  Queue properties:				 
    Out-of-Order:				 No
  Name:						 GeForce GTX 950
  Vendor:					 NVIDIA
  Driver version:				 10.11.10 346.03.10f02
  Version:					 OpenCL 1.2 
  Extensions:					 cl_APPLE_SetMemObjectDestructor cl_APPLE_ContextLoggingFunctions cl_APPLE_clut cl_APPLE_query_kernel_names cl_APPLE_gl_sharing cl_khr_gl_event cl_khr_byte_addressable_store cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_APPLE_fp64_basic_ops cl_khr_fp64 cl_khr_3d_image_writes cl_khr_depth_images cl_khr_gl_depth_images cl_khr_gl_msaa_sharing cl_khr_image2d_from_buffer cl_APPLE_ycbcr_422 cl_APPLE_rgb_422 
  Max compute units:				 14
  Max work group size:				 256
  Max clock frequency:				 900Mhz
  Max memory allocation:			 268435456
  Cache type:					 None
  Cache line size:				 0
  Cache size:					 0
  Global memory size:				 1073741824
  Constant buffer size:				 65536
  Max number of constant args:			 8
  Local memory type:				 Scratchpad
  Local memory size:				 32768
  Queue properties:				 
    Out-of-Order:				 No
  Name:						 ATI Radeon Barts XT Prototype
  Vendor:					 AMD
  Driver version:				 1.2 (Apr 26 2016 00:27:34)
  Version:					 OpenCL 1.2 
  Extensions:					 cl_APPLE_SetMemObjectDestructor cl_APPLE_ContextLoggingFunctions cl_APPLE_clut cl_APPLE_query_kernel_names cl_APPLE_gl_sharing cl_khr_gl_event cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_image2d_from_buffer cl_khr_depth_images 
  Max compute units:				 6
  Max work group size:				 1024
  Max clock frequency:				 1316Mhz
  Max memory allocation:			 536870912
  Cache type:					 None
  Cache line size:				 0
  Cache size:					 0
  Global memory size:				 2147483648
  Constant buffer size:				 65536
  Max number of constant args:			 9
  Local memory type:				 Scratchpad
  Local memory size:				 49152
  Queue properties:				 
    Out-of-Order:				 No
  Name:						 GeForce GTX 950
  Vendor:					 NVIDIA
  Driver version:				 10.11.10 346.03.10f02
  Version:					 OpenCL 1.2 
  Extensions:					 cl_APPLE_SetMemObjectDestructor cl_APPLE_ContextLoggingFunctions cl_APPLE_clut cl_APPLE_query_kernel_names cl_APPLE_gl_sharing cl_khr_gl_event cl_khr_byte_addressable_store cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_APPLE_fp64_basic_ops cl_khr_fp64 cl_khr_3d_image_writes cl_khr_depth_images cl_khr_gl_depth_images cl_khr_gl_msaa_sharing cl_khr_image2d_from_buffer cl_APPLE_ycbcr_422 cl_APPLE_rgb_422 


Work Unit Info:
...............
Credit multiplier is :  2.85
WU true angle range is :  1.282434
Used GPU device parameters are:
	Number of compute units: 6
	Single buffer allocation size: 128MB
	Total device global memory: 2048MB
	max WG size: 1024
	local mem type: Real
	FERMI path used: yes
	LotOfMem path: yes
	LowPerformanceGPU path: no
	HighPerformanceGPU path: no
period_iterations_num=50
Spike: peak=24.49179, time=87.24, d_freq=1420292123.35, chirp=-0.70521, fft_len=128k
Autocorr: peak=17.95337, time=100.7, delay=5.4133, d_freq=1420291083.78, chirp=-18.726, fft_len=128k
Spike: peak=24.40106, time=6.711, d_freq=1420290288.28, chirp=24.561, fft_len=128k
Spike: peak=24.02709, time=6.711, d_freq=1420290288.28, chirp=24.562, fft_len=128k
Spike: peak=24.90492, time=100.7, d_freq=1420290714.81, chirp=-27.495, fft_len=128k
Spike: peak=25.7143, time=100.7, d_freq=1420290714.81, chirp=-27.499, fft_len=128k
Spike: peak=24.64154, time=100.7, d_freq=1420290714.81, chirp=-27.502, fft_len=128k
Triplet: peak=9.651677, time=33.19, period=0.675, d_freq=1420297365.98, chirp=-69.805, fft_len=128 

Best spike: peak=25.7143, time=100.7, d_freq=1420290714.81, chirp=-27.499, fft_len=128k
Best autocorr: peak=17.95337, time=100.7, delay=5.4133, d_freq=1420291083.78, chirp=-18.726, fft_len=128k
Best gaussian: peak=0, mean=0, ChiSq=0, time=-2.122e+11, d_freq=0,
	score=-12, null_hyp=0, chirp=0, fft_len=0 
Best pulse: peak=0.5118575, time=14.66, period=0.0236, d_freq=1420296732.21, score=0.9063, chirp=48.56, fft_len=16 
Best triplet: peak=9.651677, time=33.19, period=0.675, d_freq=1420297365.98, chirp=-69.805, fft_len=128 


Flopcounter: 62579306918.907852

Spike count:    6
Autocorr count: 1
Pulse count:    0
Triplet count:  1
Gaussian count: 0
Time cpu in use since last restart: 2679.5 seconds
Fftlength=8,pass=3:Tune: sum=1182.06(ms); min=5.145(ms); max=30.43(ms); mean=14.24(ms); s_mean=13.16; sleep=15(ms); delta=51; N=83; usual
Fftlength=8,pass=4:Tune: sum=1182.06(ms); min=5.145(ms); max=30.43(ms); mean=14.24(ms); s_mean=13.16; sleep=15(ms); delta=51; N=83; usual
Fftlength=8,pass=5:Tune: sum=1182.06(ms); min=5.145(ms); max=30.43(ms); mean=14.24(ms); s_mean=13.16; sleep=15(ms); delta=51; N=83; usual
Fftlength=16,pass=3:Tune: sum=825.005(ms); min=0.7342(ms); max=21.58(ms); mean=9.593(ms); s_mean=12.35; sleep=15(ms); delta=77; N=86; usual
Fftlength=16,pass=4:Tune: sum=825.005(ms); min=0.7342(ms); max=21.58(ms); mean=9.593(ms); s_mean=12.35; sleep=15(ms); delta=77; N=86; usual
Fftlength=16,pass=5:Tune: sum=825.005(ms); min=0.7342(ms); max=21.58(ms); mean=9.593(ms); s_mean=12.35; sleep=15(ms); delta=77; N=86; usual
Fftlength=32,pass=3:Tune: sum=612.242(ms); min=0.2632(ms); max=46.09(ms); mean=7.289(ms); s_mean=10.93; sleep=0(ms); delta=78; N=84; usual
Fftlength=32,pass=4:Tune: sum=612.242(ms); min=0.2632(ms); max=46.09(ms); mean=7.289(ms); s_mean=10.93; sleep=0(ms); delta=78; N=84; usual
Fftlength=32,pass=5:Tune: sum=612.242(ms); min=0.2632(ms); max=46.09(ms); mean=7.289(ms); s_mean=10.93; sleep=0(ms); delta=78; N=84; usual
Fftlength=64,pass=3:Tune: sum=542.066(ms); min=14.67(ms); max=59.64(ms); mean=16.43(ms); s_mean=15.6; sleep=15(ms); delta=1; N=33; usual
Fftlength=64,pass=4:Tune: sum=542.066(ms); min=14.67(ms); max=59.64(ms); mean=16.43(ms); s_mean=15.6; sleep=15(ms); delta=1; N=33; usual
Fftlength=64,pass=5:Tune: sum=542.066(ms); min=14.67(ms); max=59.64(ms); mean=16.43(ms); s_mean=15.6; sleep=15(ms); delta=1; N=33; usual
Fftlength=128,pass=3:Tune: sum=731.316(ms); min=8.496(ms); max=36.08(ms); mean=11.25(ms); s_mean=11.72; sleep=0(ms); delta=1; N=65; usual
Fftlength=128,pass=4:Tune: sum=731.316(ms); min=8.496(ms); max=36.08(ms); mean=11.25(ms); s_mean=11.72; sleep=0(ms); delta=1; N=65; usual
Fftlength=128,pass=5:Tune: sum=731.316(ms); min=8.496(ms); max=36.08(ms); mean=11.25(ms); s_mean=11.72; sleep=0(ms); delta=1; N=65; usual
Fftlength=256,pass=3:Tune: sum=1176.36(ms); min=6.818(ms); max=13.92(ms); mean=8.98(ms); s_mean=8.396; sleep=0(ms); delta=1; N=131; usual
Fftlength=512,pass=3:Tune: sum=600.66(ms); min=1.693(ms); max=3.697(ms); mean=2.284(ms); s_mean=2.202; sleep=0(ms); delta=1; N=263; usual
Fftlength=1024,pass=3:Tune: sum=486.072(ms); min=0.7004(ms); max=1.502(ms); mean=0.9223(ms); s_mean=0.9211; sleep=0(ms); delta=1; N=527; usual

 Gaussian_transfer_not_needed       	 total=0.0000E+00, N=0         , <>=0         , min=0         , max=0          
 Gaussian_transfer_needed           	 total=0.0000E+00, N=0         , <>=0         , min=0         , max=0          


 Gaussian_skip1_no_peak             	 total=0         , N=0         , <>=0         , min=0         , max=0          
 Gaussian_skip2_bad_group_peak      	 total=0         , N=0         , <>=0         , min=0         , max=0          
 Gaussian_skip3_too_weak_peak       	 total=0         , N=0         , <>=0         , min=0         , max=0          
 Gaussian_skip4_too_big_ChiSq       	 total=0         , N=0         , <>=0         , min=0         , max=0          
 Gaussian_skip6_low_power           	 total=0         , N=0         , <>=0         , min=0         , max=0          


 Gaussian_new_best                  	 total=0         , N=0         , <>=0         , min=0         , max=0          
 Gaussian_report                    	 total=0         , N=0         , <>=0         , min=0         , max=0          
 Gaussian_miss                      	 total=0         , N=0         , <>=0         , min=0         , max=0          


 PC_triplet_find_hit                	 total=9.7200E+02, N=972       , <>=1         , min=1         , max=1          
 PC_triplet_find_miss               	 total=7.7000E+01, N=77        , <>=1         , min=1         , max=1          


 PC_pulse_find_hit                  	 total=1.0420E+03, N=1042      , <>=1         , min=1         , max=1          
 PC_pulse_find_miss                 	 total=7.0000E+00, N=7         , <>=1         , min=1         , max=1          
 PC_pulse_find_early_miss           	 total=3.0000E+00, N=3         , <>=1         , min=1         , max=1          
 PC_pulse_find_2CPU                 	 total=1.0000E+00, N=1         , <>=1         , min=1         , max=1          


 PoT_transfer_not_needed            	 total=9.6900E+02, N=969       , <>=1         , min=1         , max=1          
 PoT_transfer_needed                	 total=8.1000E+01, N=81        , <>=1         , min=1         , max=1          

GPU device sync requested...  ...GPU device synched
13:30:22 (70846): called boinc_finish(0)


I'm still working on the Linux build. After installing the 2.9.1 SDK it finished compiling, but it seems to have destroyed the driver...
ID: 1800987 · Report as offensive
TBar
Volunteer tester

Send message
Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1800988 - Posted: 6 Jul 2016, 0:14:01 UTC
Last modified: 6 Jul 2016, 0:47:25 UTC

Well that's not good. I reinstalled the same driver I've been using for over a year and it crashes;
20:02:45 (3185): Can't open init data file - running in standalone mode
20:02:45 (3185): Can't open init data file - running in standalone mode
Not using mb_cmdline.txt-file, using commandline options.
20:02:45 (3185): Can't open init data file - running in standalone mode
WARNING: init_data.xml missing
OpenCL platform detected: Advanced Micro Devices, Inc.
WARNING: BOINC supplied wrong platform!
Number of OpenCL devices found : 1 
BOINC assigns slot on device #0.
WARNING: BOINC failed to provide OpenCL device, using own enumeration abilities

Build features: SETI8 Non-graphics OpenCL USE_OPENCL_HD5xxx OCL_ZERO_COPY SIGNALS_ON_GPU OCL_CHIRP3 FFTW SSSE3 64bit 
 System: Linux  x86_64  Kernel: 3.13.0-77-generic
 CPU   : Intel(R) Core(TM)2 Quad CPU    Q9400  @ 2.66GHz
 4 core(s), Speed :  1998.000 MHz
 L1 : 64 KB, Cache : 3072 KB
 Features : FPU TSC PAE APIC MTRR MMX SSE  SSE2 HT PNI SSSE3 SSE4_1  

OpenCL-kernels filename : MultiBeam_Kernels_r3482.cl 
INFO: can't open binary kernel file: .//MultiBeam_Kernels_r3482.clHD5_Barts.bin_V7_SoG_15263, continue with recompile...
Info : Building Program (binary, clBuildProgram):main kernels: OK code 0
INFO: binary kernel file created
WARNING: can't open binary kernel file for oclFFT plan: .//MB_clFFTplan_Barts_524288_gr64_lr16_wg256_tw0_r3482.bin_15263, continue with recompile...
WARNING: can't open binary kernel file for oclFFT plan: .//MB_clFFTplan_Barts_8_gr64_lr16_wg256_tw0_r3482.bin_15263, continue with recompile...
WARNING: can't open binary kernel file for oclFFT plan: .//MB_clFFTplan_Barts_16_gr64_lr16_wg256_tw0_r3482.bin_15263, continue with recompile...
WARNING: can't open binary kernel file for oclFFT plan: .//MB_clFFTplan_Barts_32_gr64_lr16_wg256_tw0_r3482.bin_15263, continue with recompile...
WARNING: can't open binary kernel file for oclFFT plan: .//MB_clFFTplan_Barts_64_gr64_lr16_wg256_tw0_r3482.bin_15263, continue with recompile...
WARNING: can't open binary kernel file for oclFFT plan: .//MB_clFFTplan_Barts_128_gr64_lr16_wg256_tw0_r3482.bin_15263, continue with recompile...
WARNING: can't open binary kernel file for oclFFT plan: .//MB_clFFTplan_Barts_256_gr64_lr16_wg256_tw0_r3482.bin_15263, continue with recompile...
WARNING: can't open binary kernel file for oclFFT plan: .//MB_clFFTplan_Barts_512_gr64_lr16_wg256_tw0_r3482.bin_15263, continue with recompile...
WARNING: can't open binary kernel file for oclFFT plan: .//MB_clFFTplan_Barts_1024_gr64_lr16_wg256_tw0_r3482.bin_15263, continue with recompile...
WARNING: can't open binary kernel file for oclFFT plan: .//MB_clFFTplan_Barts_2048_gr64_lr16_wg256_tw0_r3482.bin_15263, continue with recompile...
WARNING: can't open binary kernel file for oclFFT plan: .//MB_clFFTplan_Barts_4096_gr64_lr16_wg256_tw0_r3482.bin_15263, continue with recompile...
WARNING: can't open binary kernel file for oclFFT plan: .//MB_clFFTplan_Barts_8192_gr64_lr16_wg256_tw0_r3482.bin_15263, continue with recompile...
WARNING: can't open binary kernel file for oclFFT plan: .//MB_clFFTplan_Barts_16384_gr64_lr16_wg256_tw0_r3482.bin_15263, continue with recompile...
WARNING: can't open binary kernel file for oclFFT plan: .//MB_clFFTplan_Barts_32768_gr64_lr16_wg256_tw0_r3482.bin_15263, continue with recompile...
WARNING: can't open binary kernel file for oclFFT plan: .//MB_clFFTplan_Barts_65536_gr64_lr16_wg256_tw0_r3482.bin_15263, continue with recompile...
WARNING: can't open binary kernel file for oclFFT plan: .//MB_clFFTplan_Barts_131072_gr64_lr16_wg256_tw0_r3482.bin_15263, continue with recompile...
ar=0.775000  NumCfft=1169  NumGauss=6087368  NumPulse=1197108460  NumTriplet=2300559776
Currently allocated 229 MB for GPU buffers
In v_BaseLineSmooth: NumDataPoints=1048576, BoxCarLength=8192, NumPointsInChunk=32768
Linux optimized setiathome_v8 application
Version info: SSSE3x (Intel, Core 2-optimized v8-nographics) V5.13 by Alex Kan
SSSE3x Linux64 Build 3482 , Ported by : Raistmer, JDWhale, Urs Echternacht


OpenCL version by Raistmer, r3482

AMD HD5 version by Raistmer

Number of OpenCL platforms:				 1


 OpenCL Platform Name:					 AMD Accelerated Parallel Processing
Number of devices:				 1
  Max compute units:				 12
  Max work group size:				 256
  Max clock frequency:				 775Mhz
  Max memory allocation:			 1073741824
  Cache type:					 None
  Cache line size:				 0
  Cache size:					 0
  Global memory size:				 1073741824
  Constant buffer size:				 65536
  Max number of constant args:			 8
  Local memory type:				 Scratchpad
  Local memory size:				 32768
  Queue properties:				 
    Out-of-Order:				 No
  Profiling timer offset:			 4156058960
  Global free memory:				 4156058976
  SIMD per compute unit:			 1
  SIMD width:					 16
  SIMD instruction width:			 5
  Wavefront width:				 64
  Global mem channels:				 8
  Global mem channel banks:			 16
  Global mem channel bank width:		 256
  Local mem size per compute unit:		 32768
  Local mem banks:				 32
  Thread trace supported:			 No
  Board Name:					 AMD Radeon HD 6800 Series  
  Name:						 Barts
  Vendor:					 Advanced Micro Devices, Inc.
  Driver version:				 1526.3
  Version:					 OpenCL 1.2 AMD-APP (1526.3)
  Extensions:					 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_ext_atomic_counters_32 cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_media_ops2 cl_amd_popcnt cl_amd_image2d_from_buffer_read_only cl_khr_spir cl_khr_gl_event 


Work Unit Info:
...............
Credit multiplier is :  2.85
WU true angle range is :  0.775000
Used GPU device parameters are:
	Number of compute units: 12
	Single buffer allocation size: 128MB
	Total device global memory: 1024MB
	max WG size: 256
	local mem type: Real
	LotOfMem path: yes
	LowPerformanceGPU path: no
	HighPerformanceGPU path: no
period_iterations_num=50
SIGSEGV: segmentation violation
Stack trace (24 frames):
./MBv8_8.08r3482_ssse3_clGPU_x86_64-pc-linux-gnu[0x653dc0]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x10330)[0x7fdcefd34330]
/usr/lib/fglrx/libamdocl64.so(+0x5eb0d1)[0x7fdced0130d1]
/usr/lib/fglrx/libamdocl64.so(+0x57a6e3)[0x7fdcecfa26e3]
/usr/lib/fglrx/libamdocl64.so(+0x5764dd)[0x7fdcecf9e4dd]
/usr/lib/fglrx/libamdocl64.so(+0x5d7ffc)[0x7fdcecfffffc]
/usr/lib/fglrx/libamdocl64.so(+0x5d815d)[0x7fdced00015d]
/usr/lib/fglrx/libamdocl64.so(+0x5d990b)[0x7fdced00190b]
/usr/lib/fglrx/libamdocl64.so(+0x5289d0)[0x7fdcecf509d0]
/usr/lib/fglrx/libamdocl64.so(+0x4fa2f4)[0x7fdcecf222f4]
/usr/lib/fglrx/libamdocl64.so(+0x4fa4e8)[0x7fdcecf224e8]
/usr/lib/fglrx/libamdocl64.so(+0x4fc1c2)[0x7fdcecf241c2]
/usr/lib/fglrx/libamdocl64.so(+0x4fca09)[0x7fdcecf24a09]
/usr/lib/fglrx/libamdocl64.so(+0x4bbf20)[0x7fdcecee3f20]
/usr/lib/fglrx/libamdocl64.so(+0x4bc0d6)[0x7fdcecee40d6]
/usr/lib/fglrx/libamdocl64.so(+0x4b1cdb)[0x7fdceced9cdb]
/usr/lib/fglrx/libamdocl64.so(clEnqueueNDRangeKernel+0x3e2)[0x7fdceceb2212]
./MBv8_8.08r3482_ssse3_clGPU_x86_64-pc-linux-gnu[0x426023]
./MBv8_8.08r3482_ssse3_clGPU_x86_64-pc-linux-gnu[0x4124c3]
./MBv8_8.08r3482_ssse3_clGPU_x86_64-pc-linux-gnu[0x566f89]
./MBv8_8.08r3482_ssse3_clGPU_x86_64-pc-linux-gnu[0x56fd52]
./MBv8_8.08r3482_ssse3_clGPU_x86_64-pc-linux-gnu[0x405fc7]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5)[0x7fdcef980f45]
./MBv8_8.08r3482_ssse3_clGPU_x86_64-pc-linux-gnu[0x4071cc]

Exiting...

Now what...

So, I booted into Ubuntu 12.04 which has the same driver and got the same driver crash. Both systems work with the older r3306 App.
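
A small sketch for reading a trace like the one above: it splits each frame into module, symbol, and address so it is easier to see where the crash sits. The frame format is taken from the log lines as posted; the function names (`parse_frames`, `likely_fault_module`) are made up for illustration, and the rule "the frame right after the libpthread signal frame is where the fault happened" is only a heuristic, not something the App or the driver documents.

```python
import os
import re

# One frame looks like  module(symbol+0xoffset)[0xaddress]
# where the "(...)" part may be missing or may hold only "+0xoffset".
FRAME_RE = re.compile(
    r"^(?P<module>[^\s(\[]+)"
    r"(?:\((?P<symbol>[^+)]*)(?:\+(?P<offset>0x[0-9a-fA-F]+))?\))?"
    r"\[(?P<addr>0x[0-9a-fA-F]+)\]$"
)


def parse_frames(trace_text):
    """Return a list of dicts (module, symbol, offset, addr), one per frame."""
    frames = []
    for line in trace_text.splitlines():
        match = FRAME_RE.match(line.strip())
        if match:
            frame = match.groupdict()
            frame["symbol"] = frame["symbol"] or None  # "" means no symbol name
            frames.append(frame)
    return frames


def likely_fault_module(frames):
    """Heuristic (assumption): the frame after the libpthread signal frame."""
    for index, frame in enumerate(frames):
        if "libpthread" in frame["module"] and index + 1 < len(frames):
            return os.path.basename(frames[index + 1]["module"])
    return None


def named_symbols(frames):
    """Symbols the trace resolved to a name, in stack order."""
    return [frame["symbol"] for frame in frames if frame["symbol"]]


# A few frames copied from the trace above.
TRACE = """
./MBv8_8.08r3482_ssse3_clGPU_x86_64-pc-linux-gnu[0x653dc0]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x10330)[0x7fdcefd34330]
/usr/lib/fglrx/libamdocl64.so(+0x5eb0d1)[0x7fdced0130d1]
/usr/lib/fglrx/libamdocl64.so(clEnqueueNDRangeKernel+0x3e2)[0x7fdceceb2212]
./MBv8_8.08r3482_ssse3_clGPU_x86_64-pc-linux-gnu[0x426023]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5)[0x7fdcef980f45]
"""

if __name__ == "__main__":
    frames = parse_frames(TRACE)
    print("likely fault module:", likely_fault_module(frames))
    print("named symbols:", named_symbols(frames))
```

Read this way, the segfault appears to be inside the fglrx OpenCL runtime (libamdocl64.so), reached from the App through clEnqueueNDRangeKernel, which fits a driver/runtime mismatch better than a bug in the science code itself.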
ID: 1800988 · Report as offensive
TBar
Volunteer tester

Send message
Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1801011 - Posted: 6 Jul 2016, 2:59:18 UTC

It seems to be working with the Repository driver showing OpenCL 1.2 AMD-APP (1800.11). The older App r3306 was compiled with SDK 2.8.1 and works with OpenCL 1.2 AMD-APP (1526.3). For some reason the new App compiled with SDK 2.9.1 doesn't work with the older driver 14.6. Strange, considering 14.6 and SDK 2.9.1 were released about the same time. I dunno.
There doesn't seem to be much difference between the older App and the newer r3482, at least not on my old cards; http://setiathome.berkeley.edu/result.php?resultid=5023490256
At least they seem to be validating.
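
For anyone wanting to check which runtime a host reports before trying r3482, here is a rough sketch that pulls the numbers out of the "Version:" string the App prints. Only the two strings quoted above are known data points; the helper name `parse_amd_app_version` is hypothetical, and nothing here establishes a minimum working build, it just compares builds as integer pairs.

```python
import re

# Matches strings such as "OpenCL 1.2 AMD-APP (1526.3)".
VERSION_RE = re.compile(r"OpenCL (\d+)\.(\d+) AMD-APP \((\d+)\.(\d+)\)")


def parse_amd_app_version(version_string):
    """Return ((cl_major, cl_minor), (build_major, build_minor)) or None."""
    match = VERSION_RE.search(version_string)
    if not match:
        return None
    cl_major, cl_minor, build_major, build_minor = map(int, match.groups())
    return (cl_major, cl_minor), (build_major, build_minor)


if __name__ == "__main__":
    old = parse_amd_app_version("OpenCL 1.2 AMD-APP (1526.3)")
    new = parse_amd_app_version("OpenCL 1.2 AMD-APP (1800.11)")
    # Compare the builds as integer tuples, not floats: 1800.11 is newer
    # than 1800.9 as a build number, although it is smaller as a decimal.
    print("same OpenCL level:", old[0] == new[0])
    print("repository driver is newer:", new[1] > old[1])
```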
ID: 1801011 · Report as offensive
Profile Raistmer
Volunteer developer
Volunteer tester
Avatar

Send message
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1801046 - Posted: 6 Jul 2016, 8:40:35 UTC - in response to Message 1800987.  

The stderr.txt does have the standard counters.

Good!
Then try to catch a task that was processed by the OpenCL app on the wingman's host too.
And compare your hit/miss counters with the wingman's.

In that particular result the triplet miss count looks higher than usual, but it's hard to say without a comparison with the wingman's counters on the same task.
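
A minimal sketch of that comparison, assuming the counter lines keep the layout shown in the stderr posted earlier (name, then `total=`, then `N=`). The helper names and the definition of "miss rate" as misses / (hits + misses) are assumptions for illustration, not something the App itself reports.

```python
import re

# Matches counter lines such as:
#  PC_triplet_find_hit    total=9.7200E+02, N=972 , <>=1 , min=1 , max=1
COUNTER_RE = re.compile(r"^\s*(\w+)\s+total=[^,]*,\s*N=(\d+)")


def parse_counters(stderr_text):
    """Return {counter_name: N} for every counter line found in the text."""
    counters = {}
    for line in stderr_text.splitlines():
        match = COUNTER_RE.match(line)
        if match:
            counters[match.group(1)] = int(match.group(2))
    return counters


def miss_rate(counters, prefix):
    """misses / (hits + misses) for one <prefix>_hit / <prefix>_miss pair."""
    hits = counters.get(prefix + "_hit", 0)
    misses = counters.get(prefix + "_miss", 0)
    total = hits + misses
    return misses / total if total else None


# Counter lines copied from the result posted earlier in the thread.
SAMPLE = """
 PC_triplet_find_hit                 total=9.7200E+02, N=972       , <>=1         , min=1         , max=1
 PC_triplet_find_miss                total=7.7000E+01, N=77        , <>=1         , min=1         , max=1
 PC_pulse_find_hit                   total=1.0420E+03, N=1042      , <>=1         , min=1         , max=1
 PC_pulse_find_miss                  total=7.0000E+00, N=7         , <>=1         , min=1         , max=1
"""

if __name__ == "__main__":
    mine = parse_counters(SAMPLE)
    for prefix in ("PC_triplet_find", "PC_pulse_find"):
        print(prefix, "miss rate:", miss_rate(mine, prefix))
```

Running the same two functions over the wingman's stderr for the same task gives a second set of rates to put next to these.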
SETI apps news
We're not gonna fight them. We're gonna transcend them.
ID: 1801046 · Report as offensive
Profile Raistmer
Volunteer developer
Volunteer tester
Avatar

Send message
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1801047 - Posted: 6 Jul 2016, 8:44:37 UTC - in response to Message 1801011.  

http://setiathome.berkeley.edu/result.php?resultid=5023490256

This one doesn't contain any performance statistics to look at.
SETI apps news
We're not gonna fight them. We're gonna transcend them.
ID: 1801047 · Report as offensive
TBar
Volunteer tester

Send message
Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1801073 - Posted: 6 Jul 2016, 14:15:42 UTC - in response to Message 1801046.  
Last modified: 6 Jul 2016, 14:50:25 UTC

The stderr.txt does have the standard counters.

Good!
Then try to catch a task that was processed by the OpenCL app on the wingman's host too.
And compare your hit/miss counters with the wingman's.

In that particular result the triplet miss count looks higher than usual, but it's hard to say without a comparison with the wingman's counters on the same task.

The standalone task I posted was run on Main with the CUDA 'Special' App here; http://setiathome.berkeley.edu/result.php?resultid=5023973186
Run time: 2 min 39 sec
CPU time: 2 min 31 sec
Spike count: 6
Autocorr count: 1
Pulse count: 0
Triplet count: 1
Gaussian count: 0

There aren't any counters, but the results are the same.
Unfortunately, the nVidia OpenCL App took 44 minutes to finish the task, where the CUDA App took 2.6 minutes.
As has been obvious for some time, there is something seriously wrong with the nVidia OpenCL App in Darwin 15.x. That's why I've been recommending the CUDA App to Beta for the last 5 or 6 months. I see there still isn't a Mac CUDA App at Beta. It's also getting difficult to find any Mac nVidia Host working at Beta; my guess is they're giving up after running the same non-working OpenCL Apps for quite some time. My experience with the latest nVidia OpenCL App is about the same as this Host at Beta; http://setiweb.ssl.berkeley.edu/beta/results.php?hostid=58196 The tasks taking 1.7 hours on that Host would finish in about 34 minutes running the CUDA App I recommended last week. The BLC3 tasks taking 7+ hours should finish in a little over an hour with the CUDA App. This situation is very similar to the results on Main with the nVidia 730s, which show similar differences in Windows on the BLC tasks.
I don't plan on running very many Tasks with an App that takes 44 minutes to finish a shorty.
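
To put numbers on those differences, a quick worked calculation. The 44 minutes, the 2 min 39 sec run time, and the 1.7 hour figure come from this post and the linked results; the 34 minute CUDA figure is an estimate from the post, not a measured run, and the helper name `speedup` is just for illustration.

```python
def speedup(slow_seconds, fast_seconds):
    """How many times faster the quicker App is on the same task."""
    return slow_seconds / fast_seconds


# Same shorty: nVidia OpenCL App vs. the CUDA 'Special' App.
opencl_seconds = 44 * 60
cuda_seconds = 2 * 60 + 39  # run time reported for the CUDA result

# Beta host: measured OpenCL time vs. the estimated CUDA time.
beta_opencl_seconds = 1.7 * 3600
beta_cuda_estimate_seconds = 34 * 60

if __name__ == "__main__":
    print("shorty speedup:", round(speedup(opencl_seconds, cuda_seconds), 1))
    print("Beta host speedup (estimated):",
          round(speedup(beta_opencl_seconds, beta_cuda_estimate_seconds), 1))
```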
ID: 1801073 · Report as offensive
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.