1)
Message boards :
Number crunching :
Linux CUDA 'Special' App finally available, featuring Low CPU use
(Message 1865309)
Posted 3 May 2017 by Gianfranco Lizzio Post: At base clock with blc02 the runtime is 7 min 17 sec. I hope this can help you.
2)
Message boards :
Number crunching :
Linux CUDA 'Special' App finally available, featuring Low CPU use
(Message 1865294)
Posted 3 May 2017 by Gianfranco Lizzio Post: @TBar Performance on OSX and Linux is almost the same. In Linux I get better times only because I overclock the card. The memory runs at 7700 MHz against the factory 7010 MHz, and the graphics core runs at 1480 MHz. With this overclock the card runs at 50 degrees with the fans at 50%.
3)
Message boards :
Number crunching :
Mac OS Sierra
(Message 1844291)
Posted 25 Jan 2017 by Gianfranco Lizzio Post: Could you post stderr of results. Also, is it iGPU indeed or Intel CPU AVX2 instead? That iGPU thing looks confusing...
Here is the stderr output:
shmget in attach_shmem: Invalid argument
13:31:33 (3207): Can't set up shared mem: -1. Will run in standalone mode.
Not using mb_cmdline.txt-file, using commandline options.
Illegal value for gpu_device_num: -1 in BOINC Client 0.0.0
WARNING: boinc_get_opencl_ids failed with code -33
OpenCL platform detected: Apple
WARNING: BOINC supplied wrong platform!
Number of OpenCL devices found : 1
BOINC assigns slot on device #0.
WARNING: BOINC failed to provide OpenCL device, using own enumeration abilities
Build features: SETI8 Non-graphics OpenCL USE_OPENCL_INTEL OCL_ZERO_COPY OCL_CHIRP3 ASYNC_SPIKE FFTW JSPF SSSE3 64bit
System: Darwin x86_64 Kernel: 15.6.0
CPU : Intel(R) Core(TM) i7-4770K CPU @ 3.50GHz
GenuineIntel x86, Family 6 Model 60 Stepping 3
Features : FPU TSC PAE APIC MTRR MMX SSE SSE2 HT SSE3 SSSE3 SSE4.1 SSE4.2 AVX1.0
OpenCL-kernels filename : MultiBeam_Kernels_r3609.cl
INFO: can't open binary kernel file: .//MultiBeam_Kernels_r3609.cl_GeForceGTX960.bin_V7_15.6.0_1011143460315f0, continue with recompile...
Info : Building Program (binary, clBuildProgram):main kernels: OK code 0
INFO: binary kernel file created
WARNING: can't open binary kernel file for oclFFT plan: .//MB_clFFTplan_GeForceGTX960_524288_gr64_lr8_wg64_tw3_r3609.bin_15.6.0_1011143460315f0, continue with recompile...
WARNING: can't open binary kernel file for oclFFT plan: .//MB_clFFTplan_GeForceGTX960_8_gr64_lr8_wg64_tw3_r3609.bin_15.6.0_1011143460315f0, continue with recompile...
WARNING: can't open binary kernel file for oclFFT plan: .//MB_clFFTplan_GeForceGTX960_16_gr64_lr8_wg64_tw3_r3609.bin_15.6.0_1011143460315f0, continue with recompile...
WARNING: can't open binary kernel file for oclFFT plan: .//MB_clFFTplan_GeForceGTX960_32_gr64_lr8_wg64_tw3_r3609.bin_15.6.0_1011143460315f0, continue with recompile...
WARNING: can't open binary kernel file for oclFFT plan: .//MB_clFFTplan_GeForceGTX960_64_gr64_lr8_wg64_tw3_r3609.bin_15.6.0_1011143460315f0, continue with recompile...
WARNING: can't open binary kernel file for oclFFT plan: .//MB_clFFTplan_GeForceGTX960_128_gr64_lr8_wg64_tw3_r3609.bin_15.6.0_1011143460315f0, continue with recompile...
WARNING: can't open binary kernel file for oclFFT plan: .//MB_clFFTplan_GeForceGTX960_256_gr64_lr8_wg64_tw3_r3609.bin_15.6.0_1011143460315f0, continue with recompile...
WARNING: can't open binary kernel file for oclFFT plan: .//MB_clFFTplan_GeForceGTX960_512_gr64_lr8_wg64_tw3_r3609.bin_15.6.0_1011143460315f0, continue with recompile...
WARNING: can't open binary kernel file for oclFFT plan: .//MB_clFFTplan_GeForceGTX960_1024_gr64_lr8_wg64_tw3_r3609.bin_15.6.0_1011143460315f0, continue with recompile...
WARNING: can't open binary kernel file for oclFFT plan: .//MB_clFFTplan_GeForceGTX960_2048_gr64_lr8_wg64_tw3_r3609.bin_15.6.0_1011143460315f0, continue with recompile...
WARNING: can't open binary kernel file for oclFFT plan: .//MB_clFFTplan_GeForceGTX960_4096_gr64_lr8_wg64_tw3_r3609.bin_15.6.0_1011143460315f0, continue with recompile...
WARNING: can't open binary kernel file for oclFFT plan: .//MB_clFFTplan_GeForceGTX960_8192_gr64_lr8_wg64_tw3_r3609.bin_15.6.0_1011143460315f0, continue with recompile...
WARNING: can't open binary kernel file for oclFFT plan: .//MB_clFFTplan_GeForceGTX960_16384_gr64_lr8_wg64_tw3_r3609.bin_15.6.0_1011143460315f0, continue with recompile...
WARNING: can't open binary kernel file for oclFFT plan: .//MB_clFFTplan_GeForceGTX960_32768_gr64_lr8_wg64_tw3_r3609.bin_15.6.0_1011143460315f0, continue with recompile...
WARNING: can't open binary kernel file for oclFFT plan: .//MB_clFFTplan_GeForceGTX960_65536_gr64_lr8_wg64_tw3_r3609.bin_15.6.0_1011143460315f0, continue with recompile...
WARNING: can't open binary kernel file for oclFFT plan: .//MB_clFFTplan_GeForceGTX960_131072_gr64_lr8_wg64_tw3_r3609.bin_15.6.0_1011143460315f0, continue with recompile...
ar=0.775000 NumCfft=58365 NumGauss=305171592 NumPulse=56611558094 NumTriplet=113221446922
Currently allocated 209 MB for GPU buffers
In v_BaseLineSmooth: NumDataPoints=1048576, BoxCarLength=8192, NumPointsInChunk=32768
OS X Optimized setiathome_v8 Application
Version info: SSSE3j (Intel, Core 2-Optimized v8-nographics) V5.13 by Alex Kan
SSSE3j OS X 64bit Build 3609 , Ported by: Raistmer, JDWhale, Urs Echternacht
Compiled by: Gianfranco with Optimized fftw-3.3.6-pl1
OpenCL version by Raistmer, r3609
Number of OpenCL platforms: 1
OpenCL Platform Name: Apple
Number of devices: 1
Max compute units: 8
Max work group size: 1024
Max clock frequency: 1278Mhz
Max memory allocation: 1073741824
Cache type: None
Cache line size: 0
Cache size: 0
Global memory size: 4294967296
Constant buffer size: 65536
Max number of constant args: 9
Local memory type: Scratchpad
Local memory size: 49152
Queue properties:
Out-of-Order: No
Name: GeForce GTX 960
Vendor: NVIDIA
Driver version: 10.11.14 346.03.15f06
Version: OpenCL 1.2
Extensions: cl_APPLE_SetMemObjectDestructor cl_APPLE_ContextLoggingFunctions cl_APPLE_clut cl_APPLE_query_kernel_names cl_APPLE_gl_sharing cl_khr_gl_event cl_khr_byte_addressable_store cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_APPLE_fp64_basic_ops cl_khr_fp64 cl_khr_3d_image_writes cl_khr_depth_images cl_khr_gl_depth_images cl_khr_gl_msaa_sharing cl_khr_image2d_from_buffer cl_APPLE_ycbcr_422 cl_APPLE_rgb_422
Work Unit Info:
...............
Credit multiplier is : 2.85
WU true angle range is : 0.775000
Used GPU device parameters are:
Number of compute units: 8
Single buffer allocation size: 128MB
Total device global memory: 4096MB
max WG size: 1024
local mem type: Real
LotOfMem path: no
LowPerformanceGPU path: no
HighPerformanceGPU path: no
period_iterations_num=50
Triplet: peak=8.642654, time=80.39, period=0.5439, d_freq=1418924863.12, chirp=0.91687, fft_len=128
Spike: peak=22.39557, time=100.7, d_freq=1418919461.29, chirp=1.0592, fft_len=128k
Spike: peak=22.53507, time=100.7, d_freq=1418919461.29, chirp=1.0629, fft_len=128k
Spike: peak=22.37876, time=33.55, d_freq=1418917201.18, chirp=2.3476, fft_len=128k
Spike: peak=22.36402, time=95.63, d_freq=1418924030.24, chirp=2.7506, fft_len=32k
Spike: peak=22.46222, time=95.63, d_freq=1418924030.23, chirp=2.8097, fft_len=32k
Spike: peak=22.14054, time=87.24, d_freq=1418918419.01, chirp=-3.9244, fft_len=128k
Spike: peak=22.10414, time=20.13, d_freq=1418916408.05, chirp=-3.9336, fft_len=128k
Spike: peak=23.33476, time=20.13, d_freq=1418916408.05, chirp=-3.9373, fft_len=128k
Spike: peak=23.51858, time=20.13, d_freq=1418916408.05, chirp=-3.941, fft_len=128k
Spike: peak=22.6291, time=20.13, d_freq=1418916408.05, chirp=-3.9447, fft_len=128k
Spike: peak=22.81873, time=6.711, d_freq=1418918946.48, chirp=5.2886, fft_len=128k
Spike: peak=23.78655, time=6.711, d_freq=1418918946.49, chirp=5.2905, fft_len=128k
Autocorr: peak=17.81484, time=6.711, delay=2.6452, d_freq=1418920199.14, chirp=-13.254, fft_len=128k
Spike: peak=22.05251, time=46.98, d_freq=1418924025.88, chirp=-17.67, fft_len=128k
Triplet: peak=10.88554, time=11.73, period=4.204, d_freq=1418918701.96, chirp=-31.149, fft_len=64
Gaussian: peak=3.108306, mean=0.5762569, ChiSq=1.411669, time=54.53, d_freq=1418917957.9, score=0.2513781, null_hyp=2.108531, chirp=39.174, fft_len=16k
Gaussian: peak=3.330397, mean=0.5623342, ChiSq=1.364179, time=56.2, d_freq=1418918023.62, score=1.399199, null_hyp=2.141914, chirp=39.174, fft_len=16k
Gaussian: peak=3.202162, mean=0.5691997, ChiSq=1.41157, time=57.88, d_freq=1418918089.34, score=1.176808, null_hyp=2.159723, chirp=39.174, fft_len=16k
Gaussian: peak=3.273643, mean=0.5212154, ChiSq=1.375195, time=78.01, d_freq=1418919872.95, score=4.595944, null_hyp=2.320152, chirp=49.452, fft_len=16k
Gaussian: peak=3.619983, mean=0.5178809, ChiSq=1.260599, time=79.69, d_freq=1418919955.91, score=4.680799, null_hyp=2.260839, chirp=49.452, fft_len=16k
Gaussian: peak=3.586394, mean=0.5256274, ChiSq=1.337057, time=81.37, d_freq=1418920038.88, score=4.236526, null_hyp=2.279013, chirp=49.452, fft_len=16k
Best spike: peak=23.78655, time=6.711, d_freq=1418918946.49, chirp=5.2905, fft_len=128k
Best autocorr: peak=17.81484, time=6.711, delay=2.6452, d_freq=1418920199.14, chirp=-13.254, fft_len=128k
Best gaussian: peak=3.619983, mean=0.5178809, ChiSq=1.260599, time=79.69, d_freq=1418919955.91, score=4.680799, null_hyp=2.260839, chirp=49.452, fft_len=16k
Best pulse: peak=5.988931, time=96.91, period=2.359, d_freq=1418918839.55, score=0.9994, chirp=-30.691, fft_len=512
Best triplet: peak=10.88554, time=11.73, period=4.204, d_freq=1418918701.96, chirp=-31.149, fft_len=64
Spike count: 13
Autocorr count: 1
Pulse count: 0
Triplet count: 2
Gaussian count: 6
Time cpu in use since last restart: 101.4 seconds
Gaussian_transfer_not_needed total=2.7918E+04, N=27918 , <>=1 , min=1 , max=1
Gaussian_transfer_needed total=1.8000E+01, N=18 , <>=1 , min=1 , max=1
Gaussian_skip1_no_peak total=0 , N=0 , <>=0 , min=0 , max=0
Gaussian_skip2_bad_group_peak total=0 , N=0 , <>=0 , min=0 , max=0
Gaussian_skip3_too_weak_peak total=0 , N=0 , <>=0 , min=0 , max=0
Gaussian_skip4_too_big_ChiSq total=0 , N=1321 , <>=0 , min=0 , max=0
Gaussian_skip6_low_power total=1290 , N=1321 , <>=0 , min=0 , max=1
Gaussian_new_best total=47 , N=47 , <>=1 , min=1 , max=1
Gaussian_report total=6 , N=6 , <>=1 , min=1 , max=1
Gaussian_miss total=1271 , N=1271 , <>=1 , min=1 , max=1
PC_triplet_find_hit total=6.6660E+03, N=6666 , <>=1 , min=1 , max=1
PC_triplet_find_miss total=3.1300E+02, N=313 , <>=1 , min=1 , max=1
PC_pulse_find_hit total=3.4830E+03, N=3483 , <>=1 , min=1 , max=1
PC_pulse_find_miss total=6.0000E+00, N=6 , <>=1 , min=1 , max=1
PC_pulse_find_early_miss total=3.0000E+00, N=3 , <>=1 , min=1 , max=1
PC_pulse_find_2CPU total=1.0000E+00, N=1 , <>=1 , min=1 , max=1
PoT_transfer_not_needed total=6.6630E+03, N=6663 , <>=1 , min=1 , max=1
PoT_transfer_needed total=3.1700E+02, N=317 , <>=1 , min=1 , max=1
SleepQuantum total=0.0000E+00, N=0 , <>=0 , min=0 , max=0
GPU device sync requested... ...GPU device synched
13:36:00 (3207): called boinc_finish(0)
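For readers who want to compare wall-clock time with CPU time for this task, here is a minimal Python sketch. It assumes the first and last timestamps in the stderr output above (13:31:33 and 13:36:00) roughly bracket the run, which the log itself does not guarantee; the helper name `elapsed_seconds` is hypothetical and not part of BOINC or the app.

```python
from datetime import datetime


def elapsed_seconds(start: str, end: str) -> float:
    """Seconds between two HH:MM:SS stamps, assuming the run lasted under 24 hours."""
    fmt = "%H:%M:%S"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    # The modulo keeps the result positive if the run crosses midnight.
    return delta.total_seconds() % 86400


# Values copied from the stderr output above.
wall = elapsed_seconds("13:31:33", "13:36:00")
cpu = 101.4  # "Time cpu in use since last restart"
cpu_share = cpu / wall

print(f"wall={wall:.0f}s cpu={cpu}s cpu/wall={cpu_share:.2f}")
```

Under that assumption the task took about 267 seconds of wall-clock time, with the CPU busy for well under half of it.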
4)
Message boards :
Number crunching :
Mac OS Sierra
(Message 1844271)
Posted 25 Jan 2017 by Gianfranco Lizzio Post: The NV SoG App takes about twice as long to run as the iGPU App.
5)
Message boards :
Number crunching :
Mac OS Sierra
(Message 1844270)
Posted 25 Jan 2017 by Gianfranco Lizzio Post: What do you get on your nVidia card when you compile an App using the nVidia SoG path?
TBar, these are the results obtained with my Nvidia GPU and the SoG App:
KWSN-Darwin-MBbench v2.1.07
Running on Andromeda at Mer 25 Gen 07:41:57 2017
---------------------------------------------------
Starting benchmark run...
---------------------------------------------------
Listing wu-file(s) in /testWUs :
reference_work_unit_r3215.wu
Listing executable(s) in /APPS :
MBv8_8.22r3609_ssse3_NV_SoG_x86_64-apple-darwin
Listing executable in /REF_APPs :
MBv8_8.22r3603_avx_x86_64-apple-darwin
---------------------------------------------------
Current WU: reference_work_unit_r3215.wu
---------------------------------------------------
Running default app with command : MBv8_8.22r3603_avx_x86_64-apple-darwin
940.89 real 938.01 user 0.77 sys
Elapsed Time: ………………………………… 941 seconds
---------------------------------------------------
Running app with command : MBv8_8.22r3609_ssse3_NV_SoG_x86_64-apple-darwin
463.46 real 154.04 user 94.91 sys
Elapsed Time : ……………………………… 463 seconds
Speed compared to default : 203 %
-----------------
Comparing results
Result : Strongly similar, Q= 99.50%
---------------------------------------------------
Done with reference_work_unit_r3215.wu.
I hope that these results will be helpful!
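The "Speed compared to default" figure in these benchmark runs looks like the reference app's elapsed time divided by the tested app's elapsed time, expressed as a percentage. The Python sketch below reproduces the printed numbers under that assumption. The function name `speed_percent` is hypothetical, and truncating instead of rounding is inferred from the printed values, not from the benchmark script's source.

```python
def speed_percent(reference_seconds: float, candidate_seconds: float) -> int:
    """Reference time over candidate time as a whole-number percentage (truncated)."""
    return int(100.0 * reference_seconds / candidate_seconds)


# Elapsed times copied from the benchmark outputs posted in this list.
print(speed_percent(941, 463))    # NV SoG app vs. AVX reference
print(speed_percent(938, 271))    # iGPU app vs. AVX reference
print(speed_percent(1256, 1284))  # r3588 AVX2 app vs. r3603 AVX2 app
```

A value above 100 means the tested app finished faster than the reference. The same two elapsed times also put the NV SoG and iGPU runs side by side: 463 / 271 is about 1.7.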
6)
Message boards :
Number crunching :
Mac OS Sierra
(Message 1844259)
Posted 25 Jan 2017 by Gianfranco Lizzio Post: I compiled a version for iGPU using the new fftw-3.3.6-pl1 with AVX & AVX2 SIMD extensions, with the following results:
KWSN-Darwin-MBbench v2.1.07
Running on Andromeda at Mar 24 Gen 16:43:50 2017
---------------------------------------------------
Starting benchmark run...
---------------------------------------------------
Listing wu-file(s) in /testWUs :
reference_work_unit_r3215.wu
Listing executable(s) in /APPS :
MBv8_8.22r3609_ssse3_Intel_x86_64-apple-darwin
Listing executable in /REF_APPs :
MBv8_8.22r3603_avx_x86_64-apple-darwin
---------------------------------------------------
Current WU: reference_work_unit_r3215.wu
---------------------------------------------------
Skipping default app MBv8_8.22r3603_avx_x86_64-apple-darwin, displaying saved result(s)
Elapsed Time: ………………………………… 938 seconds
---------------------------------------------------
Running app with command : MBv8_8.22r3609_ssse3_Intel_x86_64-apple-darwin
Elapsed Time : ……………………………… 271 seconds
Speed compared to default : 346 %
-----------------
Comparing results
Result : Strongly similar, Q= 99.51%
---------------------------------------------------
Done with reference_work_unit_r3215.wu.
7)
Message boards :
Number crunching :
Mac OS Sierra
(Message 1844089)
Posted 23 Jan 2017 by Gianfranco Lizzio Post: That will probably be when nVidia releases a New Web driver that supports the latest update to Darwin 15.6, which should be in a day or two.
TBar, here is a link where you can download the latest Nvidia web driver build 15G1217, compatible with Darwin 15.6.0:
https://images.nvidia.com/mac/pkg/346/WebDriver-346.03.15f06.pkg
Gianfranco
8)
Message boards :
Number crunching :
Mac OS Sierra
(Message 1843515)
Posted 21 Jan 2017 by Gianfranco Lizzio Post: @TBar
I suppose you can't use the same AVX2 FFTW library with a SSE41 CPU although FFTW hints that you can;
If the CPU does not support AVX SIMD extensions, you must compile fftw-3.3.6-pl1 with --enable-sse2 only.
So, on your machine there isn't any difference with run-times between AVX & AVX2?
I confirm that there is no significant difference in execution time when using the AVX2 extension. It would be interesting to see if the same happens with the new Skylake and Kaby Lake i7 processors.
Gianfranco
9)
Message boards :
Number crunching :
Mac OS Sierra
(Message 1843473)
Posted 21 Jan 2017 by Gianfranco Lizzio Post: Still not having any success trying to compile FFTW with just the avx2 SIMD, and using multiple selections doesn't work very well.
@TBar I successfully compiled fftw-3.3.6-pl1 using AVX2 SIMD, and the MBv8 App using the same SIMD. After that I ran the KWSN-OSX-bench comparing your App and mine, and this is the result:
Starting benchmark run...
---------------------------------------------------
Listing wu-file(s) in /testWUs :
test_work_unit.wu
Listing executable(s) in /APPS :
MBv8_8.17r3588_avx2_x86_64-apple-darwin
Listing executable in /REF_APPs :
MBv8_8.22r3603_avx2_x86_64-apple-darwin
---------------------------------------------------
Current WU: test_work_unit.wu
---------------------------------------------------
Running default app with command : MBv8_8.22r3603_avx2_x86_64-apple-darwin
Elapsed Time: ………………………………… 1256 seconds
---------------------------------------------------
Running app with command : MBv8_8.17r3588_avx2_x86_64-apple-darwin
Elapsed Time : ……………………………… 1284 seconds
Speed compared to default : 97 %
-----------------
Comparing results
Result : Strongly similar, Q= 99.82%
---------------------------------------------------
Done with test_work_unit.wu.
As you can see, on my i7 4770K Haswell the results are strongly similar, and the same thing happens when I compare my App compiled with AVX SIMD and the one compiled with AVX2 SIMD.
Gianfranco
10)
Message boards :
Number crunching :
I've Built a Couple OSX CUDA Apps...
(Message 1813134)
Posted 28 Aug 2016 by Gianfranco Lizzio Post: Hello Gianfranco,
Hi TBar, I got the same error, but Petri sent me a new cudaAcceleration.cu that works correctly without the error.
Gianfranco
11)
Message boards :
Number crunching :
I've Built a Couple OSX CUDA Apps...
(Message 1811254)
Posted 22 Aug 2016 by Gianfranco Lizzio Post:
Hi TBar, I compiled the nVidia OpenCL App on El Capitan 10.11.6 without problems. But the result is the same as last time: the App returns incorrect results and is 5x slower than Petri's code.
Gianfranco
12)
Message boards :
Number crunching :
Some considerations regarding OpenCL MultiBeam app tuning from algorithm view
(Message 1794237)
Posted 7 Jun 2016 by Gianfranco Lizzio Post: And finally (perhaps most easy way ;) ) you can look into task's stderr for any of my builds:
Raistmer, as you suggested:
blc5 data reports
ar=0.007159 NumCfft=123489 NumGauss=0 NumPulse=54509597824 NumTriplet=67492265376
blc6 data reports
ar=0.006972 NumCfft=99877 NumGauss=0 NumPulse=29801966464 NumTriplet=42750031008
and so your assumptions were correct.
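To make the blc5 and blc6 lines above easier to compare, here is a small Python sketch that parses the `name=value` counters from a stderr line and prints the ratio between the two tasks. The helper `parse_counters` is a hypothetical name, and the pattern assumes plain unsigned decimal values, as in the two lines quoted in this post.

```python
import re


def parse_counters(line: str) -> dict:
    """Turn 'ar=0.007 NumCfft=123' into {'ar': 0.007, 'NumCfft': 123}."""
    pairs = re.findall(r"(\w+)=([0-9.]+)", line)
    # Keep whole numbers as int so the large counters stay exact.
    return {name: float(v) if "." in v else int(v) for name, v in pairs}


# Lines copied from the post above.
blc5 = parse_counters(
    "ar=0.007159 NumCfft=123489 NumGauss=0 NumPulse=54509597824 NumTriplet=67492265376"
)
blc6 = parse_counters(
    "ar=0.006972 NumCfft=99877 NumGauss=0 NumPulse=29801966464 NumTriplet=42750031008"
)

for name in ("NumCfft", "NumPulse", "NumTriplet"):
    print(f"{name}: blc5/blc6 = {blc5[name] / blc6[name]:.2f}")
```

The angle ranges are nearly equal, yet the counters for the blc5 task come out roughly 1.2 to 1.8 times those of the blc6 task.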
13)
Message boards :
Number crunching :
Some considerations regarding OpenCL MultiBeam app tuning from algorithm view
(Message 1794189)
Posted 7 Jun 2016 by Gianfranco Lizzio Post: Raistmer, with Arecibo data the processing times were always the same for the same AR. With Green Bank data this is no longer the case: for the same AR the times are very different, depending on whether it's blc2, blc3, blc5 or blc6 data. The question then is, what changes as blc changes? I asked Eric this question without receiving any response from him...
14)
Message boards :
Number crunching :
I've Built a Couple OSX CUDA Apps...
(Message 1792605)
Posted 1 Jun 2016 by Gianfranco Lizzio Post: I have found another host, running Darwin 15.5.0 and the OpenCL App, that generates a lot of inconclusive results: http://setiathome.berkeley.edu/show_host_detail.php?hostid=7552348
15)
Message boards :
Number crunching :
CUDA Toolkit 8.0 Available for Developers
(Message 1791147)
Posted 28 May 2016 by Gianfranco Lizzio Post: https://developer.nvidia.com/cuda-toolkit |
16)
Message boards :
Number crunching :
OpenCL WU test on OS X 10.11.5
(Message 1789949)
Posted 24 May 2016 by Gianfranco Lizzio Post: Urs, I'm not a coder, so thank you for all your advice.
17)
Message boards :
Number crunching :
OpenCL WU test on OS X 10.11.5
(Message 1789833)
Posted 23 May 2016 by Gianfranco Lizzio Post: I'm using Xcode 6.1.1 without problems on my main computer build.
18)
Message boards :
Number crunching :
OpenCL WU test on OS X 10.11.5
(Message 1789828)
Posted 23 May 2016 by Gianfranco Lizzio Post: Now that IS Strange. I've never had that Error before, and I have installed FFTW before in El Capitan.
Compile as follows:
./configure CC="clang" CXX="clang" --enable-float --enable-sse2 --enable-avx --enable-threads --with-combined-threads --build=x86_64-apple-darwin
make
make install
19)
Message boards :
Number crunching :
OpenCL WU test on OS X 10.11.5
(Message 1789796)
Posted 23 May 2016 by Gianfranco Lizzio Post: Please explain. Before you wrote that home-compiled one provided incorrect result too. Or I'm missing smth?
I'm sorry for my bad English... The result of my compiled app matches the result of the official v8.10 app perfectly, but both results are wrong... I hope I was clear this time.
Gianfranco
20)
Message boards :
Number crunching :
OpenCL WU test on OS X 10.11.5
(Message 1789773)
Posted 23 May 2016 by Gianfranco Lizzio Post: Which method are you using to compile? For some reason my 10.11.4 & 5 system has decided it can't find make and won't install automake or autoconf even though I've turned off Apple's usr blockage.
1. Run _autosetup
2. Run compile using CC="clang" CXX="clang" as Urs suggested, with CPPFLAGS=" -O3 -DUSE_I386_OPTIMIZATIONS -DUSE_I386_XEON -DUSE_FFTW -DUSE_OPENCL -DUSE_OPENCL_NV -DASYNC_SPIKE -DSETI7 -DSETI8 -DOCL_CHIRP3 -DOCL_ZERO_COPY -DUSE_SSE3"
3. make all
The app created this way found the Gaussian signal using the Lunatics test WU, but its result.sah doesn't match the reference result! The result generated with my compiled app matches perfectly the one generated with the official app v8.10 on main. So the problem is not to be found in the app...
Gianfranco