Monitoring inconclusive GBT validations and harvesting data for testing

TBar Volunteer tester Send message Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768	Message 1813309 - Posted: 29 Aug 2016, 0:59:08 UTC - in response to Message 1813220. Last modified: 29 Aug 2016, 1:04:05 UTC
...So far the validator appears to be coping with the limited 'unofficial' Mac and Linux builds in circulation, though if significant co-validations begin to occur I will likely have to raise a discussion with project staff...
I'm still scratching my head over this one. The Current Stock Mac nVidia App is producing nearly 100% Inconclusive results when running the Current Mac OSX. I decided to see how one was doing with those co-validations. The fourth valid task in the list seems suspect; http://setiathome.berkeley.edu/workunit.php?wuid=2249369730
5123535836 5879059 28 Aug 2016, 15:43:41 UTC 28 Aug 2016, 17:23:45 UTC Completed and validated 1,548.38 96.09 54.07 SETI@home v8 v8.00 (opencl_nvidia_mac) x86_64-apple-darwin
5123535837 8018045 28 Aug 2016, 15:44:19 UTC 28 Aug 2016, 20:18:50 UTC Completed and validated 1,077.36 416.23 54.07 SETI@home v8 v8.00 (opencl_nvidia_mac) x86_64-apple-darwin
The First Host has, In progress (3) · Validation pending (48) · Validation inconclusive (38) · Valid (40) · Invalid (1) · Error (0)
The Second has; In progress (108) · Validation pending (454) · Validation inconclusive (349) · Valid (258) · Invalid (9) · Error (2)
I didn't go any further, I was afraid of what I might find. All the Mac nVidia Hosts have this problem running the current OSX, and have had it for a while. Seems to me even the rawest Alpha is working better than the Current Stock Mac nVidia App. There is a much better Mac nVidia App at Beta, but only a couple of people are running it. ID: 1813309 ·
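A rough way to put numbers on the two host summaries above is to take the inconclusive count as a share of the tasks the validator has already looked at. This is only a sketch: the choice of denominator (inconclusive + valid + invalid) is an assumption, the `HostCounts` name is made up for illustration, and a snapshot like this likely understates the first-pass rate, because an inconclusive task that later validates against a tie-breaker moves into the Valid column.

```python
from dataclasses import dataclass


@dataclass
class HostCounts:
    """Task counts as shown on a host's task list page (hypothetical container)."""
    in_progress: int
    pending: int
    inconclusive: int
    valid: int
    invalid: int
    error: int

    def inconclusive_share(self) -> float:
        # Assumption: only tasks the validator has already examined are counted.
        judged = self.inconclusive + self.valid + self.invalid
        return self.inconclusive / judged if judged else 0.0


# Counts copied from the post above.
first_host = HostCounts(3, 48, 38, 40, 1, 0)
second_host = HostCounts(108, 454, 349, 258, 9, 2)

for name, host in (("first", first_host), ("second", second_host)):
    print(f"{name} host: {host.inconclusive_share():.1%} of judged tasks are inconclusive")
```

On these counts the snapshot share comes out at roughly 48% and 57% for the two hosts, which is already far above what a healthy application would show.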

jason_gee Volunteer developer Volunteer tester Send message Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0	Message 1813319 - Posted: 29 Aug 2016, 2:08:29 UTC - in response to Message 1813309. Simple answer. I hold X-branch to higher standards than are applied to the broken stock Mac app that is pushing itself to the top of the list as primary justification for this thread ;) While the Mac/Linux penetration of unfinished petri based apps is low, there is enough information accumulated to know there is more work to be done before it can hit primetime. Petri routinely acknowledges this himself. My understanding of Richard's work here is identifying the major problems, so they can potentially be addressed. With Petri's alphas already a known quantity, somewhat contained, and being actively worked on, then rationalising the actual brokenness on the main offenders can begin. An unfinished application causing minimal impact, is not, on the bigger scale, a problem, but a 'development tool'. As soon as you step from Mac/Linux onto Windows, then the potential for damage increases probably tenfold or more. Just weight of numbers. Care is needed to ensure the situation doesn't step from crappy Mac stock builds into abject chaos. Chaos is my realm, and it isn't pretty. "Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions. ID: 1813319 ·

TBar Volunteer tester Send message Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768	Message 1813324 - Posted: 29 Aug 2016, 2:38:37 UTC - in response to Message 1813319. Last modified: 29 Aug 2016, 2:39:53 UTC My point is, just about any Mac nVidia App that produces over around 20% first time validations is an improvement over the situation that has existed for quite some time. The current App at Beta was first posted back in January and produces over 95% first time validations. One of the couple Hosts running it is having one of his best days in terms of activity, http://setiweb.ssl.berkeley.edu/beta/results.php?hostid=73656&offset=20 I remember someone saying that Baseline App at Beta didn't need Beta testing... The problem with the Intel iGPUs is also well known. I ran across a couple of those in that Mac Valid list. I would suspect any 'Valid' task that included a Mac nVidia GPU & Intel iGPU, and also any two iGPUs that validated with each other. There are quite a few Intel iGPUs out there. I don't remember anyone saying Petri's App should be posted for anyone, however, it should be recognized that in the case of the current Mac nVidia App just about anything is an improvement. The Intel iGPUs aren't far behind either. ID: 1813324 ·

jason_gee Volunteer developer Volunteer tester Send message Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0	Message 1813333 - Posted: 29 Aug 2016, 2:51:06 UTC - in response to Message 1813324. Yes, however you're trying to draw a line between broken, and somewhat less broken, which falls over with the concept of 'inconclusive'. It's better to be all good, or more or less completely broken. Airy fairy results will pollute the situation, to the point that the worst stock apps feel like trolls. "Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions. ID: 1813333 ·

Raistmer Volunteer developer Volunteer tester Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121	Message 1813374 - Posted: 29 Aug 2016, 7:06:12 UTC - in response to Message 1813324. The Intel iGPUs aren't far behind either. Same question: what limits in plan class should be implemented? SETI apps news We're not gonna fight them. We're gonna transcend them. ID: 1813374 ·

Richard Haselgrove Volunteer tester Send message Joined: 4 Jul 99 Posts: 14650 Credit: 200,643,578 RAC: 874	Message 1813377 - Posted: 29 Aug 2016, 7:15:21 UTC - in response to Message 1813374. The Intel iGPUs aren't far behind either. Same question: what limits in plan class should be implemented? Alternatively: what can we do to fix it (them)? ID: 1813377 ·

Raistmer Volunteer developer Volunteer tester Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121	Message 1813378 - Posted: 29 Aug 2016, 7:17:32 UTC - in response to Message 1813377. Last modified: 29 Aug 2016, 7:19:36 UTC The Intel iGPUs aren't far behind either. Same question: what limits in plan class should be implemented? Alternatively: what can we do to fix it (them)? We can limit distribution to known good hosts. Though after more than a month of waiting I'm starting to doubt even this - still no response from Eric ("only" a week since the last one though :D). SETI apps news We're not gonna fight them. We're gonna transcend them. ID: 1813378 ·

Richard Haselgrove Volunteer tester Send message Joined: 4 Jul 99 Posts: 14650 Credit: 200,643,578 RAC: 874	Message 1813380 - Posted: 29 Aug 2016, 7:34:12 UTC - in response to Message 1813378. The Intel iGPUs aren't far behind either. Same question: what limits in plan class should be implemented? Alternatively: what can we do to fix it (them)? We can limit distribution to known good hosts. Though after more than a month of waiting I'm starting to doubt even this - still no response from Eric ("only" a week since the last one though :D). I did see a team message yesterday from Angela: "Last week was [Eric's] first week back to work after a two week vacation, so things were pretty crazy for him." ID: 1813380 ·

Raistmer Volunteer developer Volunteer tester Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121	Message 1813383 - Posted: 29 Aug 2016, 8:03:20 UTC - in response to Message 1813380. Last modified: 29 Aug 2016, 8:04:45 UTC I did see a team message yesterday from Angela: "Last week was [Eric's] first week back to work after a two week vacation, so things were pretty crazy for him." Yep, a hobby usually leaves more time for action ;D In the meantime: there is nothing to offer regarding the iGPU exclusion list - concentrate on it. SETI apps news We're not gonna fight them. We're gonna transcend them. ID: 1813383 ·

TBar Volunteer tester Send message Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768	Message 1813387 - Posted: 29 Aug 2016, 8:28:52 UTC - in response to Message 1813383. Considering Apple uses strictly Intel CPUs, and has been known to have 'arrangements' with Intel, I wouldn't be surprised to find the Apple OpenCL and Intel OpenCL problems are related. Apple uses their own version of OpenCL. There are No Apple OpenCL drivers that I know of. There are No Apple Intel iGPU drivers either. That means the Intel iGPUs are using the Apple driver, just as the AMD & nVidia GPUs are using the Apple OpenCL. Solve the problem with the Apple OpenCL and you might solve the problem with the Intel iGPUs. It's just a thought... ID: 1813387 ·

-= Vyper =- Volunteer tester Send message Joined: 5 Sep 99 Posts: 1652 Credit: 1,065,191,981 RAC: 2,537	Message 1813390 - Posted: 29 Aug 2016, 9:07:35 UTC
I haven't read through this thread completely, but I want to post my findings. Look at these results! To my mind, Petri's code looks more accurate than the Apple code. Quick compare:
CPU:
Spike: peak=24.09495, time=33.55, d_freq=1420751144.89, chirp=-0.028652, fft_len=128k
Spike: peak=24.42812, time=33.55, d_freq=1420751144.89, chirp=-0.033273, fft_len=128k
Spike: peak=24.31746, time=33.55, d_freq=1420751144.88, chirp=-0.037895, fft_len=128k
Autocorr: peak=19.48975, time=73.82, delay=5.1587, d_freq=1420751429.61, chirp=-7.0919, fft_len=128k
Spike: peak=24.78413, time=68.79, d_freq=1420750198.94, chirp=-95.62, fft_len=32k
APPLE:
Spike: peak=24.09507, time=33.55, d_freq=1420751144.89, chirp=-0.028652, fft_len=128k
Spike: peak=24.4283, time=33.55, d_freq=1420751144.89, chirp=-0.033273, fft_len=128k
Spike: peak=24.31762, time=33.55, d_freq=1420751144.88, chirp=-0.037895, fft_len=128k
Autocorr: peak=19.48845, time=73.82, delay=5.1587, d_freq=1420751429.61, chirp=-7.0919, fft_len=128k
Spike: peak=24.11478, time=68.79, d_freq=1420750198.94, chirp=-95.62, fft_len=32k
PETRI:
Spike: peak=24.09493, time=33.55, d_freq=1420751144.89, chirp=-0.028652, fft_len=128k
Spike: peak=24.42814, time=33.55, d_freq=1420751144.89, chirp=-0.033273, fft_len=128k
Spike: peak=24.31745, time=33.55, d_freq=1420751144.88, chirp=-0.037895, fft_len=128k
Autocorr: peak=19.48978, time=73.82, delay=5.1587, d_freq=1420751429.61, chirp=-7.0919, fft_len=128k
Spike: peak=24.78424, time=68.79, d_freq=1420750198.94, chirp=-95.62, fft_len=32k
This WU has been crunched by the CPU app, by the Apple app and by Petri's tweaked code. To me it seems this precision drift started way back, or is my assumption wrong?
http://setiathome.berkeley.edu/workunit.php?wuid=2248302050 Complete STDERR below CPU: <stderr_txt> Build features: SETI8 Non-graphics FFTW USE_SSE3 x64 CPUID: Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz Cache: L1=64K L2=256K CPU features: FPU TSC PAE CMPXCHG8B APIC SYSENTER MTRR CMOV/CCMP MMX FXSAVE/FXRSTOR SSE SSE2 HT SSE3 SSSE3 FMA3 SSE4.1 SSE4.2 AVX ar=0.410125 NumCfft=200105 NumGauss=1151864510 NumPulse=226294165221 NumTriplet=452692139687 In v_BaseLineSmooth: NumDataPoints=1048576, BoxCarLength=8192, NumPointsInChunk=32768 Windows optimized setiathome_v8 application Based on Intel, Core 2-optimized v8-nographics V5.13 by Alex Kan SSE3xj Win64 Build 3330 , Ported by : Raistmer, JDWhale SETI8 update by Raistmer Work Unit Info: ............... Credit multiplier is : 2.85 WU true angle range is : 0.410125 Spike: peak=24.09495, time=33.55, d_freq=1420751144.89, chirp=-0.028652, fft_len=128k Spike: peak=24.42812, time=33.55, d_freq=1420751144.89, chirp=-0.033273, fft_len=128k Spike: peak=24.31746, time=33.55, d_freq=1420751144.88, chirp=-0.037895, fft_len=128k Autocorr: peak=19.48975, time=73.82, delay=5.1587, d_freq=1420751429.61, chirp=-7.0919, fft_len=128k Spike: peak=24.78413, time=68.79, d_freq=1420750198.94, chirp=-95.62, fft_len=32k Best spike: peak=24.78413, time=68.79, d_freq=1420750198.94, chirp=-95.62, fft_len=32k Best autocorr: peak=19.48975, time=73.82, delay=5.1587, d_freq=1420751429.61, chirp=-7.0919, fft_len=128k Best gaussian: peak=2.818341, mean=0.5207298, ChiSq=1.416708, time=12.58, d_freq=1420752531.85, score=-0.9713898, null_hyp=2.204585, chirp=10.797, fft_len=16k Best pulse: peak=3.243532, time=6.544, period=0.9724, d_freq=1420754890.84, score=0.9946, chirp=-87.382, fft_len=64 Best triplet: peak=0, time=-2.122e+011, period=0, d_freq=0, chirp=0, fft_len=0 Flopcounter: 38853887058724.039000 Spike count: 4 Autocorr count: 1 Pulse count: 0 Triplet count: 0 Gaussian count: 0 Wallclock time elapsed since last restart: 8021.9 seconds 10:32:47 (3944): 
called boinc_finish(0) </stderr_txt> APPLE: <stderr_txt> OpenCL platform detected: Apple Number of OpenCL devices found : 1 BOINC assigns slot on device #0. Info: BOINC provided OpenCL device ID used DOUBLE_FP supported. cl_khr_fp64 supported. cl_APPLE_fp64_basic_ops supported. FERMI : true Build features: SETI8 Non-graphics OpenCL USE_OPENCL_NV OCL_ZERO_COPY OCL_CHIRP3 ASYNC_SPIKE FFTW SSE3 64bit System: Darwin x86_64 Kernel: 15.6.0 CPU : Intel(R) Core(TM) i5-3330S CPU @ 2.70GHz GenuineIntel x86, Family 6 Model 58 Stepping 9 Features : FPU TSC PAE APIC MTRR MMX SSE SSE2 HT SSE3 SSSE3 SSE4.1 SSE4.2 AVX1.0 OpenCL-kernels filename : MultiBeam_Kernels_r3321.cl ar=0.410125 NumCfft=200105 NumGauss=1151864510 NumPulse=226294165221 NumTriplet=452692139687 Currently allocated 209 MB for GPU buffers In v_BaseLineSmooth: NumDataPoints=1048576, BoxCarLength=8192, NumPointsInChunk=32768 OS X optimized setiathome_v8 application Version info: SSE3x (Intel, Core 2-optimized v8-nographics) V5.13 by Alex Kan SSE3x OS X 64bit Build 3321 , Ported by : Raistmer, JDWhale, Urs Echternacht OpenCL version by Raistmer, r3321 Number of OpenCL platforms: 1 OpenCL Platform Name: Apple Number of devices: 1 Max compute units: 2 Max work group size: 1024 Max clock frequency: 745Mhz Max memory allocation: 134217728 Cache type: None Cache line size: 0 Cache size: 0 Global memory size: 536870912 Constant buffer size: 65536 Max number of constant args: 9 Local memory type: Scratchpad Local memory size: 49152 Queue properties: Out-of-Order: No Name: GeForce GT 640M Vendor: NVIDIA Driver version: 10.10.13 310.42.25f01 Version: OpenCL 1.2 Extensions: cl_APPLE_SetMemObjectDestructor cl_APPLE_ContextLoggingFunctions cl_APPLE_clut cl_APPLE_query_kernel_names cl_APPLE_gl_sharing cl_khr_gl_event cl_khr_byte_addressable_store cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_APPLE_fp64_basic_ops cl_khr_fp64 
cl_khr_3d_image_writes cl_khr_depth_images cl_khr_gl_depth_images cl_khr_gl_msaa_sharing cl_khr_image2d_from_buffer cl_APPLE_ycbcr_422 cl_APPLE_rgb_422 Work Unit Info: ............... Credit multiplier is : 2.85 WU true angle range is : 0.410125 Used GPU device parameters are: Number of compute units: 2 Single buffer allocation size: 128MB Total device global memory: 512MB max WG size: 1024 local mem type: Real FERMI path used: yes LotOfMem path: no period_iterations_num=50 Spike: peak=24.09507, time=33.55, d_freq=1420751144.89, chirp=-0.028652, fft_len=128k Spike: peak=24.4283, time=33.55, d_freq=1420751144.89, chirp=-0.033273, fft_len=128k Spike: peak=24.31762, time=33.55, d_freq=1420751144.88, chirp=-0.037895, fft_len=128k GPU device sync requested... ...GPU device synched Termination request detected or computations are finished. GPU device synched, exiting... OpenCL platform detected: Apple Number of OpenCL devices found : 1 BOINC assigns slot on device #0. Info: BOINC provided OpenCL device ID used DOUBLE_FP supported. cl_khr_fp64 supported. cl_APPLE_fp64_basic_ops supported. 
FERMI : true Build features: SETI8 Non-graphics OpenCL USE_OPENCL_NV OCL_ZERO_COPY OCL_CHIRP3 ASYNC_SPIKE FFTW SSE3 64bit System: Darwin x86_64 Kernel: 15.6.0 CPU : Intel(R) Core(TM) i5-3330S CPU @ 2.70GHz GenuineIntel x86, Family 6 Model 58 Stepping 9 Features : FPU TSC PAE APIC MTRR MMX SSE SSE2 HT SSE3 SSSE3 SSE4.1 SSE4.2 AVX1.0 OpenCL-kernels filename : MultiBeam_Kernels_r3321.cl ar=0.410125 NumCfft=200105 NumGauss=1151864510 NumPulse=226294165221 NumTriplet=452692139687 Currently allocated 209 MB for GPU buffers In v_BaseLineSmooth: NumDataPoints=1048576, BoxCarLength=8192, NumPointsInChunk=32768 OS X optimized setiathome_v8 application Version info: SSE3x (Intel, Core 2-optimized v8-nographics) V5.13 by Alex Kan SSE3x OS X 64bit Build 3321 , Ported by : Raistmer, JDWhale, Urs Echternacht OpenCL version by Raistmer, r3321 Number of OpenCL platforms: 1 OpenCL Platform Name: Apple Number of devices: 1 Max compute units: 2 Max work group size: 1024 Max clock frequency: 745Mhz Max memory allocation: 134217728 Cache type: None Cache line size: 0 Cache size: 0 Global memory size: 536870912 Constant buffer size: 65536 Max number of constant args: 9 Local memory type: Scratchpad Local memory size: 49152 Queue properties: Out-of-Order: No Name: GeForce GT 640M Vendor: NVIDIA Driver version: 10.10.13 310.42.25f01 Version: OpenCL 1.2 Extensions: cl_APPLE_SetMemObjectDestructor cl_APPLE_ContextLoggingFunctions cl_APPLE_clut cl_APPLE_query_kernel_names cl_APPLE_gl_sharing cl_khr_gl_event cl_khr_byte_addressable_store cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_APPLE_fp64_basic_ops cl_khr_fp64 cl_khr_3d_image_writes cl_khr_depth_images cl_khr_gl_depth_images cl_khr_gl_msaa_sharing cl_khr_image2d_from_buffer cl_APPLE_ycbcr_422 cl_APPLE_rgb_422 Work Unit Info: ............... 
Credit multiplier is : 2.85 WU true angle range is : 0.410125 Used GPU device parameters are: Number of compute units: 2 Single buffer allocation size: 128MB Total device global memory: 512MB max WG size: 1024 local mem type: Real FERMI path used: yes LotOfMem path: no period_iterations_num=50 Spike: peak=24.09507, time=33.55, d_freq=1420751144.89, chirp=-0.028652, fft_len=128k Spike: peak=24.4283, time=33.55, d_freq=1420751144.89, chirp=-0.033273, fft_len=128k Spike: peak=24.31762, time=33.55, d_freq=1420751144.88, chirp=-0.037895, fft_len=128k GPU device sync requested... ...GPU device synched Termination request detected or computations are finished. GPU device synched, exiting... OpenCL platform detected: Apple Number of OpenCL devices found : 1 BOINC assigns slot on device #0. Info: BOINC provided OpenCL device ID used DOUBLE_FP supported. cl_khr_fp64 supported. cl_APPLE_fp64_basic_ops supported. FERMI : true Build features: SETI8 Non-graphics OpenCL USE_OPENCL_NV OCL_ZERO_COPY OCL_CHIRP3 ASYNC_SPIKE FFTW SSE3 64bit System: Darwin x86_64 Kernel: 15.6.0 CPU : Intel(R) Core(TM) i5-3330S CPU @ 2.70GHz GenuineIntel x86, Family 6 Model 58 Stepping 9 Features : FPU TSC PAE APIC MTRR MMX SSE SSE2 HT SSE3 SSSE3 SSE4.1 SSE4.2 AVX1.0 OpenCL-kernels filename : MultiBeam_Kernels_r3321.cl ar=0.410125 NumCfft=200105 NumGauss=1151864510 NumPulse=226294165221 NumTriplet=452692139687 Currently allocated 209 MB for GPU buffers In v_BaseLineSmooth: NumDataPoints=1048576, BoxCarLength=8192, NumPointsInChunk=32768 Restarted at 2.00 percent. Used GPU device parameters are: Number of compute units: 2 Single buffer allocation size: 128MB Total device global memory: 512MB max WG size: 1024 local mem type: Real FERMI path used: yes LotOfMem path: no period_iterations_num=50 GPU device sync requested... ...GPU device synched Termination request detected or computations are finished. GPU device synched, exiting... 
OpenCL platform detected: Apple Number of OpenCL devices found : 1 BOINC assigns slot on device #0. Info: BOINC provided OpenCL device ID used DOUBLE_FP supported. cl_khr_fp64 supported. cl_APPLE_fp64_basic_ops supported. FERMI : true Build features: SETI8 Non-graphics OpenCL USE_OPENCL_NV OCL_ZERO_COPY OCL_CHIRP3 ASYNC_SPIKE FFTW SSE3 64bit System: Darwin x86_64 Kernel: 15.6.0 CPU : Intel(R) Core(TM) i5-3330S CPU @ 2.70GHz GenuineIntel x86, Family 6 Model 58 Stepping 9 Features : FPU TSC PAE APIC MTRR MMX SSE SSE2 HT SSE3 SSSE3 SSE4.1 SSE4.2 AVX1.0 OpenCL-kernels filename : MultiBeam_Kernels_r3321.cl ar=0.410125 NumCfft=200105 NumGauss=1151864510 NumPulse=226294165221 NumTriplet=452692139687 Currently allocated 209 MB for GPU buffers In v_BaseLineSmooth: NumDataPoints=1048576, BoxCarLength=8192, NumPointsInChunk=32768 Restarted at 2.00 percent. Used GPU device parameters are: Number of compute units: 2 Single buffer allocation size: 128MB Total device global memory: 512MB max WG size: 1024 local mem type: Real FERMI path used: yes LotOfMem path: no period_iterations_num=50 Autocorr: peak=19.48845, time=73.82, delay=5.1587, d_freq=1420751429.61, chirp=-7.0919, fft_len=128k GPU device sync requested... ...GPU device synched Termination request detected or computations are finished. GPU device synched, exiting... OpenCL platform detected: Apple Number of OpenCL devices found : 1 BOINC assigns slot on device #0. Info: BOINC provided OpenCL device ID used DOUBLE_FP supported. cl_khr_fp64 supported. cl_APPLE_fp64_basic_ops supported. 
FERMI : true Build features: SETI8 Non-graphics OpenCL USE_OPENCL_NV OCL_ZERO_COPY OCL_CHIRP3 ASYNC_SPIKE FFTW SSE3 64bit System: Darwin x86_64 Kernel: 15.6.0 CPU : Intel(R) Core(TM) i5-3330S CPU @ 2.70GHz GenuineIntel x86, Family 6 Model 58 Stepping 9 Features : FPU TSC PAE APIC MTRR MMX SSE SSE2 HT SSE3 SSSE3 SSE4.1 SSE4.2 AVX1.0 OpenCL-kernels filename : MultiBeam_Kernels_r3321.cl ar=0.410125 NumCfft=200105 NumGauss=1151864510 NumPulse=226294165221 NumTriplet=452692139687 Currently allocated 209 MB for GPU buffers In v_BaseLineSmooth: NumDataPoints=1048576, BoxCarLength=8192, NumPointsInChunk=32768 Restarted at 44.13 percent. Used GPU device parameters are: Number of compute units: 2 Single buffer allocation size: 128MB Total device global memory: 512MB max WG size: 1024 local mem type: Real FERMI path used: yes LotOfMem path: no period_iterations_num=50 GPU device sync requested... ...GPU device synched Termination request detected or computations are finished. GPU device synched, exiting... OpenCL platform detected: Apple Number of OpenCL devices found : 1 BOINC assigns slot on device #0. Info: BOINC provided OpenCL device ID used DOUBLE_FP supported. cl_khr_fp64 supported. cl_APPLE_fp64_basic_ops supported. FERMI : true Build features: SETI8 Non-graphics OpenCL USE_OPENCL_NV OCL_ZERO_COPY OCL_CHIRP3 ASYNC_SPIKE FFTW SSE3 64bit System: Darwin x86_64 Kernel: 15.6.0 CPU : Intel(R) Core(TM) i5-3330S CPU @ 2.70GHz GenuineIntel x86, Family 6 Model 58 Stepping 9 Features : FPU TSC PAE APIC MTRR MMX SSE SSE2 HT SSE3 SSSE3 SSE4.1 SSE4.2 AVX1.0 OpenCL-kernels filename : MultiBeam_Kernels_r3321.cl ar=0.410125 NumCfft=200105 NumGauss=1151864510 NumPulse=226294165221 NumTriplet=452692139687 Currently allocated 209 MB for GPU buffers In v_BaseLineSmooth: NumDataPoints=1048576, BoxCarLength=8192, NumPointsInChunk=32768 Restarted at 66.08 percent. 
Used GPU device parameters are: Number of compute units: 2 Single buffer allocation size: 128MB Total device global memory: 512MB max WG size: 1024 local mem type: Real FERMI path used: yes LotOfMem path: no period_iterations_num=50 Spike: peak=24.11478, time=68.79, d_freq=1420750198.94, chirp=-95.62, fft_len=32k Best spike: peak=24.4283, time=33.56, d_freq=1420751144.89, chirp=-0.033273, fft_len=128k Best autocorr: peak=19.48845, time=73.82, delay=5.1587, d_freq=1420751429.61, chirp=-7.0919, fft_len=128k Best gaussian: peak=2.805941, mean=0.5229927, ChiSq=1.398399, time=12.58, d_freq=1420752531.85, score=-1.300901, null_hyp=2.174957, chirp=10.797, fft_len=16k Best pulse: peak=3.245201, time=6.54, period=0.9724, d_freq=1420754890.84, score=0.9951, chirp=-87.382, fft_len=64 Best triplet: peak=0, time=-2.122e+11, period=0, d_freq=0, chirp=0, fft_len=0 Flopcounter: 19269428530118.300781 Spike count: 4 Autocorr count: 1 Pulse count: 0 Triplet count: 0 Gaussian count: 0 Time cpu in use since last restart: 18.1 seconds GPU device sync requested... ...GPU device synched 16:01:17 (34168): called boinc_finish(0) </stderr_txt> PETRI: <stderr_txt> setiathome_CUDA: Found 4 CUDA device(s): Device 1: GeForce GTX 750 Ti, 1998 MiB, regsPerBlock 65536 computeCap 5.0, multiProcs 5 pciBusID = 1, pciSlotID = 0 Device 2: GeForce GTX 750 Ti, 2000 MiB, regsPerBlock 65536 computeCap 5.0, multiProcs 5 pciBusID = 2, pciSlotID = 0 Device 3: GeForce GTX 750 Ti, 2000 MiB, regsPerBlock 65536 computeCap 5.0, multiProcs 5 pciBusID = 4, pciSlotID = 0 Device 4: GeForce GTX 750 Ti, 2000 MiB, regsPerBlock 65536 computeCap 5.0, multiProcs 5 pciBusID = 5, pciSlotID = 0 In cudaAcc_initializeDevice(): Boinc passed DevPref 2 setiathome_CUDA: CUDA Device 2 specified, checking... 
Device 2: GeForce GTX 750 Ti is okay SETI@home using CUDA accelerated device GeForce GTX 750 Ti Using pfb = 8 from command line args Using pfp = 240 from command line args Using unroll = 12 from command line args setiathome v8 enhanced x41p_zi3d, Cuda 7.50 special Compiled with NVCC 8.0, using 6.5 libraries. Modifications done by petri33. Detected setiathome_enhanced_v8 task. Autocorrelations enabled, size 128k elements. Work Unit Info: ............... WU true angle range is : 0.410125 Sigma 3 Thread call stack limit is: 1k Spike: peak=24.09493, time=33.55, d_freq=1420751144.89, chirp=-0.028652, fft_len=128k Spike: peak=24.42814, time=33.55, d_freq=1420751144.89, chirp=-0.033273, fft_len=128k Spike: peak=24.31745, time=33.55, d_freq=1420751144.88, chirp=-0.037895, fft_len=128k Autocorr: peak=19.48978, time=73.82, delay=5.1587, d_freq=1420751429.61, chirp=-7.0919, fft_len=128k Spike: peak=24.78424, time=68.79, d_freq=1420750198.94, chirp=-95.62, fft_len=32k cudaAcc_free() called... cudaAcc_free() running... cudaAcc_free() PulseFind freed... cudaAcc_free() Gaussfit freed... cudaAcc_free() AutoCorrelation freed... 1,2,3,4,5,6,7,8,9,10,10,11,12,cudaAcc_free() DONE. 
13 Best spike: peak=24.78424, time=68.79, d_freq=1420750198.94, chirp=-95.62, fft_len=32k Best autocorr: peak=19.48978, time=73.82, delay=5.1587, d_freq=1420751429.61, chirp=-7.0919, fft_len=128k Best gaussian: peak=2.818346, mean=0.5207286, ChiSq=1.416716, time=12.58, d_freq=1420752531.85, score=-0.9712276, null_hyp=2.204598, chirp=10.797, fft_len=16k Best pulse: peak=3.243528, time=6.544, period=0.9724, d_freq=1420754890.84, score=0.9946, chirp=-87.382, fft_len=64 Best triplet: peak=0, time=-2.122e+11, period=0, d_freq=0, chirp=0, fft_len=0 Flopcounter: 40132345887159.960938 Spike count: 4 Autocorr count: 1 Pulse count: 0 Triplet count: 0 Gaussian count: 0 09:44:58 (30909): called boinc_finish(0) </stderr_txt> _________________________________________________________________________ Addicted to SETI crunching! Founder of GPU Users Group ID: 1813390 ·
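The quick compare in the post above can be turned into numbers by taking the CPU result as the reference and computing the relative peak difference for each reported signal. This is a minimal sketch: the peaks are copied from the post, the function names are made up for illustration, and treating the CPU app as the reference is an assumption, not a statement about how the project validator decides.

```python
# Peaks in reported order: three spikes, one autocorr, one spike.
PEAKS = {
    "CPU":   [24.09495, 24.42812, 24.31746, 19.48975, 24.78413],
    "APPLE": [24.09507, 24.4283, 24.31762, 19.48845, 24.11478],
    "PETRI": [24.09493, 24.42814, 24.31745, 19.48978, 24.78424],
}


def relative_diffs(reference, candidate):
    """Relative peak difference per signal, paired by position."""
    return [abs(c - r) / abs(r) for r, c in zip(reference, candidate)]


def worst_signal(reference, candidate):
    """Return (index, relative difference) of the signal that deviates most."""
    diffs = relative_diffs(reference, candidate)
    idx = max(range(len(diffs)), key=diffs.__getitem__)
    return idx, diffs[idx]


for app in ("APPLE", "PETRI"):
    idx, diff = worst_signal(PEAKS["CPU"], PEAKS[app])
    print(f"{app}: worst is signal #{idx + 1}, relative peak difference {diff:.2e}")
```

For both apps the last spike is the worst case, but the scale differs a lot: the Apple peak is off by about 2.7%, while the Petri peak differs by only a few parts per million.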

jason_gee Volunteer developer Volunteer tester Send message Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0	Message 1813394 - Posted: 29 Aug 2016, 9:31:52 UTC - in response to Message 1813390. I haven't read through this thread completely, but I want to post my findings. Look at these results! To my mind, Petri's code looks more accurate than the Apple code. ... Exactly. This is one of those rare cases where more broken is better than slightly less broken, just because of the way validation works. Let's be clear: I'm in full support of Petri's work; however, you're wanting to shove unfinished code in for the sake of performance, when the landscape is turd. The stock Apple situation needs resolution before new technologies are introduced. New technologies are not a solution, but an incremental refinement. "Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions. ID: 1813394 ·

-= Vyper =- Volunteer tester Send message Joined: 5 Sep 99 Posts: 1652 Credit: 1,065,191,981 RAC: 2,537	Message 1813395 - Posted: 29 Aug 2016, 9:39:43 UTC - in response to Message 1813394. Let's be clear: I'm in full support of Petri's work; however, you're wanting to shove unfinished code in for the sake of performance, when the landscape is turd. The stock Apple situation needs resolution before new technologies are introduced. New technologies are not a solution, but an incremental refinement. No, not at this point! I just wanted to post my findings and ask why the Apple code is not targeted as well. It seems odd, and it rather suggests that we must dig further into the original code, because one way or another some parts of the optimisation tree seem to differ from the Berkeley-compiled executable with no optimisations in place. I just wanted to point out that the precision drift seems to have started way back, before he started improving the code for Nvidia cards, and someone needs to dig deeper to see why it stopped aligning with the original code. Does anyone here have a list of applications to run a "checklist" against, showing which are strongly similar at a rate of at least 99.95% or so (100% would of course be best)? I didn't know that other applications differed so much more than Petri's and are still considered valid, or is that Apple code not distributed by the S@H servers? Just confused here now. _________________________________________________________________________ Addicted to SETI crunching! Founder of GPU Users Group ID: 1813395 ·

-= Vyper =- Volunteer tester Send message Joined: 5 Sep 99 Posts: 1652 Credit: 1,065,191,981 RAC: 2,537	Message 1813396 - Posted: 29 Aug 2016, 9:42:06 UTC - in response to Message 1813394. however, you're wanting to shove unfinished code in for the sake of performance Not this time; now I'm more concerned than ever that other applications have precision drift as well, and this needs to be addressed properly to pinpoint where it all began, deep within those buried lines of code. _________________________________________________________________________ Addicted to SETI crunching! Founder of GPU Users Group ID: 1813396 ·

Kiska Volunteer tester Send message Joined: 31 Mar 12 Posts: 302 Credit: 3,067,762 RAC: 0	Message 1813397 - Posted: 29 Aug 2016, 9:43:12 UTC - in response to Message 1813395. Not only do we have a checklist. We also need the results.sah file the apps produce so we can run it against a known valid result file. Having the stderr doesn't help much if it doesn't supply what was found. ID: 1813397 ·
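Until result files are available, the reported signals can at least be pulled out of a stderr dump in a form that is easy to compare. The sketch below assumes only the line format visible in the stderr posted earlier in the thread (`Spike:` and `Autocorr:` followed by comma-separated `key=value` pairs); `parse_signals` and the regular expressions are illustrative, not part of any SETI@home tool, and they do not replace a proper comparison of the result files.

```python
import re

# "Spike:" / "Autocorr:" followed by one or more comma-separated key=value pairs.
# Case-sensitive on purpose, so "Best spike:" and "Spike count:" are skipped.
SIGNAL_RE = re.compile(r"\b(Spike|Autocorr): (\w+=[^,\s]+(?:, \w+=[^,\s]+)*)")
FIELD_RE = re.compile(r"(\w+)=([^,\s]+)")


def _convert(value):
    """Numbers become floats; anything else (e.g. '128k') stays a string."""
    try:
        return float(value)
    except ValueError:
        return value


def parse_signals(stderr_text):
    """Return a list of dicts, one per reported signal, in order of appearance."""
    signals = []
    for kind, fields in SIGNAL_RE.findall(stderr_text):
        record = {"type": kind}
        record.update({key: _convert(val) for key, val in FIELD_RE.findall(fields)})
        signals.append(record)
    return signals


sample = (
    "WU true angle range is : 0.410125 "
    "Spike: peak=24.09495, time=33.55, d_freq=1420751144.89, chirp=-0.028652, fft_len=128k "
    "Autocorr: peak=19.48975, time=73.82, delay=5.1587, d_freq=1420751429.61, chirp=-7.0919, fft_len=128k "
    "Best spike: peak=24.78413, time=68.79, d_freq=1420750198.94, chirp=-95.62, fft_len=32k "
    "Spike count: 4"
)

for signal in parse_signals(sample):
    print(signal["type"], signal["peak"], signal["fft_len"])
```

A task that was restarted several times, like the Apple one in this thread, can print the same signal in more than one session, so the parsed list may need de-duplicating before it is compared with another host's output.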

jason_gee Volunteer developer Volunteer tester Send message Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0	Message 1813399 - Posted: 29 Aug 2016, 9:53:28 UTC - in response to Message 1813395. Yeah, I agree there is chaos/confusion. IMO just let Richard do his mission, so that we have clear decks to make things better. The confusion (not only yours) comes from the change in telescopes, which the OpenCL apps cope with better, and from Petri running with the newest tech+hardware without experience in how SETI validation works (i.e. 30% inconclusive isn't good enough). At present, obviously broken apps (like stock Mac) are valuable, in that they give some context to the changes. Since we have that context, and clear representative targets, the development goals are clear. No amount of piling extra crud on the existing situation will help; it will only confuse it more. "Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions. ID: 1813399 ·

jason_gee Volunteer developer Volunteer tester Send message Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0	Message 1813400 - Posted: 29 Aug 2016, 9:55:17 UTC - in response to Message 1813396. however, you're wanting to shove unfinished code in for the sake of performance Not this time; now I'm more concerned than ever that other applications have precision drift as well, and this needs to be addressed properly to pinpoint where it all began, deep within those buried lines of code. Yep. That's partly down to the telescope change, partly to coders, partly to too much change at once. "Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions. ID: 1813400 ·

Kiska Volunteer tester Send message Joined: 31 Mar 12 Posts: 302 Credit: 3,067,762 RAC: 0	Message 1813403 - Posted: 29 Aug 2016, 10:00:21 UTC One piece of good news is that I am setting up a Mac VM, so I will be able to provide some result files :) ID: 1813403 ·

Richard Haselgrove Volunteer tester Send message Joined: 4 Jul 99 Posts: 14650 Credit: 200,643,578 RAC: 874	Message 1813404 - Posted: 29 Aug 2016, 10:02:07 UTC
Having been watching this thread and seeing how people have pitched in to help out (and I'm really pleased to see that), I think the main issues are turning out to be:
A) Petri's code is good, but it could still be better - the code itself allows some wasteful inconclusive results to demand an extra replication be sent out before validation. But Petri - with help from others and the results reported here - is working on that. It isn't finished yet.
B) The stock Apple applications, and the intel_gpu situation, are a mess - even more so when they combine in the Mac intel_gpu applications.
(B) should remind us that a full science application isn't made up of pure code alone. There's what we would understand as pure coding, and then there is at least one, and for GPU apps at least two, compilation stages. Compiler settings are at least as important as raw code in producing a mathematically-acceptable application.
One of the GPU compilation steps is under our control - the ordinary C-code compiler. My suspicion is that something went wrong with the stock Mac CPU app at this stage - a compiler option set for speed instead of accuracy, perhaps. If we could contact whoever compiled that app in the first place, it could be fixed.
The second GPU compilation stage - for OpenCL, at least - is not under our control to the same extent. It happens on the user's computer, automatically, using tools installed on each user's machine as part of the driver package. It seems as if the intel_gpus' problems lie in this area - the intel_gpu applications are problematic at Einstein too, although their apps run fine here with an older driver. I wish the BOINC scientific community was large and coherent enough to form a critical mass that Intel would need to work with - but I don't think we've ever succeeded in reaching that stage.
ID: 1813404 ·
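The point about compiler settings can be illustrated without any SETI code at all. Floating-point addition is not associative, so an option that lets the compiler regroup arithmetic for speed (GCC's `-ffast-math` is one such option) can change the low-order digits of a result even though the source is untouched. The sketch below shows only the arithmetic effect, in double precision; it is not a diagnosis of what happened to the stock Mac build.

```python
# Same three numbers, two groupings: the rounded results differ in the last digit.
left_first = (0.1 + 0.2) + 0.3
right_first = 0.1 + (0.2 + 0.3)
print(left_first == right_first)   # False
print(left_first, right_first)

# With operands of very different magnitude the effect is no longer subtle:
# the small term is lost entirely in one grouping and kept in the other.
big, small = 1e16, 1.0
lost = (big + small) - big    # small is absorbed when added to big
kept = (big - big) + small    # big cancels first, small survives
print(lost, kept)
```

A detection threshold applied to a value computed one way or the other can then land on different sides of the line, which is one plausible route from a build setting to an inconclusive result.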

jason_gee Volunteer developer Volunteer tester Send message Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0	Message 1813406 - Posted: 29 Aug 2016, 10:08:52 UTC - in response to Message 1813404. Pretty sure Petri would be happy with that assessment, as I am. Will do some thinking, and possibly research, on the Intel OpenCL situation. My fear is that Intel's implementation of OpenCL might possibly have been about the (marketing) Flops, more than the actual accurate results. Time will tell. "Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions. ID: 1813406 ·
