Monitoring inconclusive GBT validations and harvesting data for testing

Wiggo Send message Joined: 24 Jan 00 Posts: 34744 Credit: 261,360,520 RAC: 489	Message 1827502 - Posted: 30 Oct 2016, 11:37:47 UTC - in response to Message 1827494. Thanks. Would be good to check other collected so far overflows. In the same result representation. Overflow w/u's aren't the only ones exhibiting these unusual numbers across similar Nvidia hosts, as a few that I listed weren't overflows. Cheers. ID: 1827502 ·

TBar Volunteer tester Send message Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768	Message 1827504 - Posted: 30 Oct 2016, 11:45:27 UTC - in response to Message 1827470. Last modified: 30 Oct 2016, 11:47:41 UTC
Macs have been all 64 bit for a while, so yes, always use 64 bit. Never had this problem before. From my understanding, it's trying to free the same memory space twice. I've looked at malloc_a.cpp but don't see what I'm looking for. Not really sure about what I'm looking for...
Build w/o ASYNC_SPIKE then will see.
OK, but, the SoG build doesn't use ASYNC_SPIKE...and it has the same Error.
Examples of crash with SoG build?
The last one;
Build features: SETI8 Non-graphics OpenCL USE_OPENCL_NV OCL_ZERO_COPY SIGNALS_ON_GPU OCL_CHIRP3 FFTW SSSE3 64bit
System: Darwin x86_64 Kernel: 15.6.0
...
Credit multiplier is : 2.85
WU true angle range is : 0.775000
Used GPU device parameters are:
Number of compute units: 6
Single buffer allocation size: 128MB
Total device global memory: 2048MB
max WG size: 1024
local mem type: Real
FERMI path used: yes
LotOfMem path: yes
LowPerformanceGPU path: no
HighPerformanceGPU path: no
period_iterations_num=10
cpu_GPUState->gaussians.index=0 cpu_GPUState->gaussians.index=0 cpu_GPUState->gaussians.index=0 cpu_GPUState->gaussians.index=0
cpu_GPUState->gaussians.index=0 cpu_GPUState->gaussians.index=0 cpu_GPUState->gaussians.index=0 cpu_GPUState->gaussians.index=0
cpu_GPUState->gaussians.index=0 cpu_GPUState->gaussians.index=0 cpu_GPUState->gaussians.index=0 cpu_GPUState->gaussians.index=0
cpu_GPUState->gaussians.index=0 cpu_GPUState->gaussians.index=0 cpu_GPUState->gaussians.index=0 cpu_GPUState->gaussians.index=0
cpu_GPUState->gaussians.index=0 cpu_GPUState->gaussians.index=0 cpu_GPUState->gaussians.index=0 cpu_GPUState->gaussians.index=0
Triplet: peak=8.642653, time=80.39, period=0.5439, d_freq=1418924863.12, chirp=0.91687, fft_len=128
MBv8_8.18r3549_NV-SoG_ssse3_x86_64-apple-darwin(1751,0x700000228000) malloc: * error for object 0x120440000: pointer being freed was not allocated
* set a breakpoint in malloc_error_break to debug
SIGABRT: abort called
Crashed executable name: MBv8_8.18r3549_NV-SoG_ssse3_x86_64-apple-darwin
Machine type Intel 80486 (64-bit executable)
System version: Macintosh OS 10.11.6 build 15G1108
Fri Oct 28 17:33:10 2016
0 MBv8_8.18r3549_NV-SoG_ssse3_x86_64-apple-darwin 0x0000000107d93614 std::__1::__tree<std::__1::__value_type<int, PROCINFO>, std::__1::__map_value_compare<int, std::__1::__value_type<int, PROCINFO>, std::__1::less<int>, true>, std::__1::allocator<std::__1::__value_type<int, PROCINFO> > >::__insert_unique(std::__1::__value_type<int, PROCINFO> const&) + 1076
1 MBv8_8.18r3549_NV-SoG_ssse3_x86_64-apple-darwin 0x0000000107d82876 COPROCS::clear() + 4006
2 libsystem_platform.dylib 0x00007fff9099c52a _sigtramp + 26
3 libsystem_malloc.dylib 0x00007fff9a5795a1 malloc_zone_malloc + 71
4 libsystem_c.dylib 0x00007fff8d3946df abort + 129
5 libsystem_malloc.dylib 0x00007fff9a57b041 szone_size + 0
6 GeForceGLDriverWeb 0x000000010a15c1aa gldWaitForObject + 4947
7 GeForceGLDriverWeb 0x000000010a165e3b gldExecuteKernel + 201
8 OpenCL 0x00007fff9c2b34a7 OpenCL + 13479
9 OpenCL 0x00007fff9c2d00da clSetEventCallback + 5888
10 OpenCL 0x00007fff9c2d39cc clFinish + 761
11 libdispatch.dylib 0x00007fff9aa6240b _dispatch_client_callout + 8
12 libdispatch.dylib 0x00007fff9aa6703b _dispatch_queue_drain + 754
13 libdispatch.dylib 0x00007fff9aa6d707 _dispatch_queue_invoke + 549
14 libdispatch.dylib 0x00007fff9aa6240b _dispatch_client_callout + 8
15 libdispatch.dylib 0x00007fff9aa6629b _dispatch_root_queue_drain + 1890
16 libdispatch.dylib 0x00007fff9aa65b00 _dispatch_worker_thread3 + 91
17 libsystem_pthread.dylib 0x00007fff9a2af4de _pthread_wqthread + 1129
18 libsystem_pthread.dylib 0x00007fff9a2ad341 start_wqthread + 13
Thread 6 crashed with X86 Thread State (64-bit):
rax: 0x0100001f rbx: 0x00000000 rcx: 0x700000225588 rdx: 0x00000028
rdi: 0x7000002255f0 rsi: 0x00000003 rbp: 0x7000002255d0 rsp: 0x700000225588
r8: 0x00004603 r9: 0x00000000 r10: 0x000003b0 r11: 0x00000206
r12: 0x000003b0 r13: 0x00000028 r14: 0x7000002255f0 r15: 0x00004603
rip: 0x7fff8a952f72 rfl: 0x00000206
...
I use the same MacOSX10.9.sdk on All builds. The difference is with the sah_config.h <malloc.h> setting;
/* Define to 1 if you have the <malloc.h> header file. */
/* #undef HAVE_MALLOC_H */
It doesn't seem to bother the Intel build either way. I'll have to run more tests with it undef. All the previous builds have it defined and pointed to the MacOSX10.9.sdk. ID: 1827504 ·

Raistmer Volunteer developer Volunteer tester Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121	Message 1827511 - Posted: 30 Oct 2016, 12:16:44 UTC - in response to Message 1827502. Thanks. Would be good to check other collected so far overflows. In the same result representation. Overflow w/u's aren't the only ones exhibiting these unusual numbers across similar Nvidia hosts, as a few that I listed weren't overflows. Cheers. Links to non-overflow inconclusives? SETI apps news We're not gonna fight them. We're gonna transcend them. ID: 1827511 ·

Wiggo Send message Joined: 24 Jan 00 Posts: 34744 Credit: 261,360,520 RAC: 489	Message 1827513 - Posted: 30 Oct 2016, 12:27:01 UTC - in response to Message 1827511. Thanks. Would be good to check other collected so far overflows. In the same result representation. Overflow w/u's aren't the only ones exhibiting these unusual numbers across similar Nvidia hosts, as a few that I listed weren't overflows. Cheers. Links to non-overflow inconclusives? I've already provided them, and if you didn't catch them then that is your problem, so stop trying to pass the buck. These SoG apps were far too green to start with; they have got slightly better over time, but IMHO they still have a way to go yet (even if you may not think so, others think otherwise). Cheers. ID: 1827513 ·

Raistmer Volunteer developer Volunteer tester Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121	Message 1827515 - Posted: 30 Oct 2016, 12:43:32 UTC - in response to Message 1827504. Last modified: 30 Oct 2016, 12:55:29 UTC Try rebuilding from r3551. If the crash remains, try rebuilding with OCL_VERBOSE defined. Unfortunately, the OS X stack listing starts inside some runtime function, so it is unclear where in the app the crash occurs. OCL_VERBOSE could show the place. ID: 1827515 ·

TBar Volunteer tester Send message Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768	Message 1827521 - Posted: 30 Oct 2016, 14:26:10 UTC - in response to Message 1827515.
The build goes fine, but it fails on the kernel build;
shmget in attach_shmem: Invalid argument
10:20:51 (49549): Can't set up shared mem: -1. Will run in standalone mode.
Not using mb_cmdline.txt-file, using commandline options.
Maximum single buffer size set to:192MB
oclFFT max WG size override set to:128
SpikeFind FFT size threshold override set to:2048
Number of period iterations for PulseFind set to 8
Running on device number: 2
OpenCL platform detected: Apple
Number of OpenCL devices found : 3
BOINC assigns slot on device #3 of 3 devices.
Info: BOINC provided OpenCL device ID used
DOUBLE_FP supported. cl_khr_fp64 supported. cl_APPLE_fp64_basic_ops supported.
FERMI : true
Build features: SETI8 Non-graphics OpenCL USE_OPENCL_NV OCL_CHIRP3 ASYNC_SPIKE FFTW SSSE3 64bit
System: Darwin x86_64 Kernel: 15.6.0
CPU : Intel(R) Xeon(R) CPU E5472 @ 3.00GHz
GenuineIntel x86, Family 6 Model 23 Stepping 6
Features : FPU TSC PAE APIC MTRR MMX SSE SSE2 HT SSE3 SSSE3 SSE4.1
OpenCL-kernels filename : MultiBeam_Kernels_r3551.cl
INFO: can't open binary kernel file: .//MultiBeam_Kernels_r3551.cl_GeForceGTX950.bin_V7_15.6.0_1011143460315f0, continue with recompile...
Error : Building Program (binary, clBuildProgram):main kernels: not OK code -11
CL file build log on device GeForce GTX 950
<program source>:3641:40: error: expected ')'
__global float4* restricted PoT,__global uint* restricted result_flag) {
^
<program source>:3640:37: note: to match this '('
void PC_find_triplets_kernel_twin_cl(int ul_FftLength, int len_power, float triplet_thresh_base, int AdvanceBy, int PoTLen,
^
<program source>:3655:32: error: use of undeclared identifier 'PoT'
__global float4* fp_PulsePot= PoT + ul_PoT + TOffset * (fft_len4)+neg2561024;
^
<program source>:3687:3: error: use of undeclared identifier 'result_flag'
result_flag[result_coordinate+neg*RESULT_SIZE]=1;
^
ID: 1827521 ·
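The build log above is consistent with `restricted` being parsed as the parameter name, after which `PoT` is unexpected; the pointer qualifier in OpenCL C is `restrict`. Below is a minimal Python sketch of a pre-build check that flags that token in a kernel source string. The helper name `find_restricted` and the sample signature are illustrative only; they are not part of the app or its build scripts.

```python
import re

# Matches the identifier `restricted` as a whole word. The valid OpenCL C
# qualifier `restrict` is not matched because of the word boundaries.
_BAD_QUALIFIER = re.compile(r"\brestricted\b")


def find_restricted(source):
    """Return 1-based (line, column) pairs for each `restricted` token."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for match in _BAD_QUALIFIER.finditer(line):
            hits.append((lineno, match.start() + 1))
    return hits


# Hypothetical two-parameter kernel signature: the first parameter uses the
# misspelled qualifier, the second uses the valid one.
sample = (
    "void k(__global float4* restricted PoT,\n"
    "       __global uint* restrict result_flag) {\n"
    "}\n"
)

for lineno, column in find_restricted(sample):
    print(f"line {lineno}, column {column}: 'restricted' found, expected 'restrict'")
```

Running the same check over the .cl file before launching the app would point at the offending lines without waiting for clBuildProgram to fail.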

Raistmer Volunteer developer Volunteer tester Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121	Message 1827522 - Posted: 30 Oct 2016, 14:27:41 UTC - in response to Message 1827521. Just comment out that whole kernel, or wait for the next rev. ID: 1827522 ·

TBar Volunteer tester Send message Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768	Message 1827524 - Posted: 30 Oct 2016, 14:57:26 UTC - in response to Message 1827522. Last modified: 30 Oct 2016, 15:31:39 UTC My comments didn't work, however, it seems to be working with the .cl file from r3548. Whether it will work correctly though, I dunno. It didn't crash on the first 2 using; /* Define to 1 if you have the <malloc.h> header file. */ #define HAVE_MALLOC_H 1 However, the r3551 NV build is still much slower than the r3550 Intel build from back here; http://setiathome.berkeley.edu/forum_thread.php?id=80158&postid=1827435 The times and Q score are about the same with r3551. ID: 1827524 ·

Jeff Buck Volunteer tester Send message Joined: 11 Feb 00 Posts: 1441 Credit: 148,764,870 RAC: 0	Message 1827538 - Posted: 30 Oct 2016, 15:51:37 UTC - in response to Message 1827467. Both hosts seem to have clean records when it comes to Invalids. http://setiathome.berkeley.edu/results.php?hostid=7940818&offset=0&show_names=0&state=3&appid= Such a number of inconclusives (and the first few checked are non-overflows) can't be considered "clean records". Something is wrong on that host. BTW, its other inconclusives also have too big a Spike on 128k fft. And another observation: the driver version changes through that list of inconclusives: 372.70, 372.90. Maybe an inconsistent driver update results in such behavior. Most of the inconclusives are from 28 & 29 October. Only 5 are from dates earlier than 24 Oct. And as of 24 Oct the host had a higher driver version: Driver version: 375.57 So, I would attribute its breakage to an incorrect driver re-installation. Ah, okay. I had only looked at the Invalid count for that host being 0. I hadn't dug deeper into the current Inconclusives. I did verify that the 375.57 driver was not the one used in the WU I posted, since there was a warning about it in the Windows 10 - Yea or Nay? thread. Again, however, I didn't dig back to see if it had been in use previously. Still, it does seem like the vast majority of that host's tasks validate on the first try, with just the occasional wild Spike or Autocorr signal popping up. ID: 1827538 ·

TBar Volunteer tester Send message Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768	Message 1827544 - Posted: 30 Oct 2016, 16:29:26 UTC - in response to Message 1827522.
It ran a couple more without crashing. It's just much slower than the Intel build;
Starting benchmark run...
---------------------------------------------------
Listing wu-file(s) in /testWUs :
11au16aa.28481.85822.12.39.56.wu
blc5_2bit_guppi_57449_43932_HIP78775_0013.26700.831.18.27.53.vlar.wu
Listing executable(s) in /APPS :
MBv8_8.18r3551_NV_ssse3_x86_64-apple-darwin
Listing executable in /REF_APPs :
MBv8_8.05r3344_sse41_x86_64-apple-darwin
---------------------------------------------------
Current WU: 11au16aa.28481.85822.12.39.56.wu
---------------------------------------------------
Skipping default app MBv8_8.05r3344_sse41_x86_64-apple-darwin, displaying saved result(s)
Elapsed Time: …………… 3630 seconds
---------------------------------------------------
Running app with command : MBv8_8.18r3551_NV_ssse3_x86_64-apple-darwin -sbs 192 -oclfft_tune_wg 128 -spike_fft_thresh 2048 -period_iterations_num 8 -device 2
818.42 real 84.46 user 188.16 sys
Elapsed Time : ………… 819 seconds
Speed compared to default : 443 %
-----------------
Comparing results
Result : Strongly similar, Q= 98.12%
---------------------------------------------------
Done with 11au16aa.28481.85822.12.39.56.wu.
Current WU: blc5_2bit_guppi_57449_43932_HIP78775_0013.26700.831.18.27.53.vlar.wu
---------------------------------------------------
Skipping default app MBv8_8.05r3344_sse41_x86_64-apple-darwin, displaying saved result(s)
Elapsed Time: …………… 8062 seconds
---------------------------------------------------
Running app with command : MBv8_8.18r3551_NV_ssse3_x86_64-apple-darwin -sbs 192 -oclfft_tune_wg 128 -spike_fft_thresh 2048 -period_iterations_num 8 -device 2
2165.85 real 605.08 user 268.95 sys
Elapsed Time : ………… 2166 seconds
Speed compared to default : 372 %
-----------------
Comparing results
Result : Strongly similar, Q= 99.96%
---------------------------------------------------
ID: 1827544 ·
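The "Speed compared to default" figures in the benchmark output above are consistent with the reference app's elapsed time divided by the tested app's elapsed time, expressed as a whole percentage. Below is a minimal Python sketch of that arithmetic. The benchmark script itself is not shown in the thread, so the formula, the use of integer division, and the helper name `speed_percent` are assumptions.

```python
def speed_percent(reference_seconds, test_seconds):
    """Reference time over test time, as a whole percentage.

    Integer division is an assumption; it truncates any fractional part.
    """
    if test_seconds <= 0:
        raise ValueError("test_seconds must be positive")
    return (reference_seconds * 100) // test_seconds


# Elapsed times quoted in the benchmark output above.
print(speed_percent(3630, 819))   # 11au16aa work unit -> 443
print(speed_percent(8062, 2166))  # blc5 guppi VLAR work unit -> 372
```

A value above 100 means the tested build finished sooner than the saved reference result.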

TBar Volunteer tester Send message Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768	Message 1827556 - Posted: 30 Oct 2016, 18:06:04 UTC Last modified: 30 Oct 2016, 18:22:14 UTC
This might be interesting. The only major difference, besides a couple out of order signals, is the autocorr;
SSSE3ux OS X 64bit Build 3550: Best autocorr: peak=16.59647, time=60.4, delay=4.3732, d_freq=1420584475.99, chirp=-24.198, fft_len=128k
SSE3xj Win32 Build 3528 : Best autocorr: peak=16.60995, time=87.24, delay=5.9092, d_freq=1420588467.31, chirp=28.998, fft_len=128k
Everything else appears very close.
SSSE3ux OS X 64bit Build 3550
SSE3xj Win32 Build 3528
Hmmm, outvoted by the Windows Cartel again. It's now running on a Linux CPU. We'll see what that says in a couple hours... ID: 1827556 ·

Raistmer Volunteer developer Volunteer tester Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121	Message 1827567 - Posted: 30 Oct 2016, 19:25:13 UTC - in response to Message 1827556. Ah, that "ux" finally attracted my attention. Please return to the xj path by defining USE_JSPF. Other paths could work (or not), but they are currently unmaintained for the GPU build. So, to speed up debugging, it's better to stay on the same path as the Windows builds. I posted the full line of defines for the Windows build before; compare against it. Also, try adding OCL_SYNCHED to the NV build. Will it help with speed? ID: 1827567 ·

Kiska Volunteer tester Send message Joined: 31 Mar 12 Posts: 302 Credit: 3,067,762 RAC: 0	Message 1827587 - Posted: 30 Oct 2016, 21:44:45 UTC - in response to Message 1827494. Last modified: 30 Oct 2016, 21:49:49 UTC
Thanks. Would be good to check other collected so far overflows. In the same result representation.
Sorry about the delay in getting this overflow result comparison to you.
C:\Users\qingb\Documents\TestEnvironment>compare i5-4210U.sah GT840m_r3528.sah Q100
                 ------------- R1:R2 ------------      ------------- R2:R1 ------------
                Exact  Super  Tight   Good    Bad     Exact  Super  Tight   Good    Bad
Spike               0     21     21     21      0         0     21     21     21      0
Autocorr            0      7      7      7      0         0      7      7      7      0
Gaussian            0      0      0      0      0         0      0      0      0      0
Pulse               0      2      2      2      0         0      2      2      2      0
Triplet             0      0      0      0      0         0      0      0      0      0
Best Spike          0      0      0      0      0         0      0      0      0      0
Best Autocorr       0      0      0      0      0         0      0      0      0      0
Best Gaussian       0      0      0      0      0         0      0      0      0      0
Best Pulse          0      0      0      0      0         0      0      0      0      0
Best Triplet        0      0      0      0      0         0      0      0      0      0
                 ----   ----   ----   ----   ----      ----   ----   ----   ----   ----
                    0     30     30     30      0         0     30     30     30      0
Result : Strongly similar, Q= 99.98%
C:\Users\qingb\Documents\TestEnvironment>compare i5-4210U.sah GT840m_r3548.sah Q100
                 ------------- R1:R2 ------------      ------------- R2:R1 ------------
                Exact  Super  Tight   Good    Bad     Exact  Super  Tight   Good    Bad
Spike               0     21     21     21      0         0     21     21     21      0
Autocorr            0      7      7      7      0         0      7      7      7      0
Gaussian            0      0      0      0      0         0      0      0      0      0
Pulse               0      2      2      2      0         0      2      2      2      0
Triplet             0      0      0      0      0         0      0      0      0      0
Best Spike          0      0      0      0      0         0      0      0      0      0
Best Autocorr       0      0      0      0      0         0      0      0      0      0
Best Gaussian       0      0      0      0      0         0      0      0      0      0
Best Pulse          0      0      0      0      0         0      0      0      0      0
Best Triplet        0      0      0      0      0         0      0      0      0      0
                 ----   ----   ----   ----   ----      ----   ----   ----   ----   ----
                    0     30     30     30      0         0     30     30     30      0
Result : Strongly similar, Q= 99.98%
Data file and Results can be found here Also the link to the post here ID: 1827587 ·

TBar Volunteer tester Send message Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768	Message 1827593 - Posted: 30 Oct 2016, 22:52:48 UTC - in response to Message 1827567. Last modified: 30 Oct 2016, 23:14:01 UTC
Ah, that "ux" finally attracted my attention. Please return to xj path by defining USE_JSPF. Other paths could work (or not) but currently unmaintained for GPU build. So, to speedup debugging better to stay on same path with Windows builds. I posted full line of defines for Windows build before, do comparison. Also, try to add OCL_SYNCHED to NV build. Will it help with speed?
Well, the SoG build still crashes on the BLC tasks, and uses lots of CPU;
Build features: SETI8 Non-graphics OpenCL USE_OPENCL_NV OCL_ZERO_COPY SIGNALS_ON_GPU OCL_CHIRP3 FFTW SSSE3 64bit
SSSE3ux OS X 64bit Build 3553
period_iterations_num=8
Spike: peak=25.07172, time=8.83, d_freq=1616892188.79, chirp=0, fft_len=128
Pulse: peak=4.935277, time=45.82, period=11.3, d_freq=1616892188.79, score=1.036, chirp=0, fft_len=128
MBv8_8.18r3553_NV-SoG_ssse3_x86_64-apple-darwin(15547,0x7000001a5000) malloc: * error for object 0x12f34c000: pointer being freed was not allocated
* set a breakpoint in malloc_error_break to debug
SIGABRT: abort called
And takes very long on the reference_work_unit_r3215.wu;
Skipping default app MBv8_8.05r3344_sse41_x86_64-apple-darwin, displaying saved result(s)
Elapsed Time: …………… 2110 seconds
---------------------------------------------------
Running app with command : MBv8_8.18r3553_NV-SoG_ssse3_x86_64-apple-darwin -sbs 192 -oclfft_tune_wg 128 -spike_fft_thresh 2048 -period_iterations_num 8 -device 0
1012.02 real 703.05 user 159.28 sys
Elapsed Time : ………… 1012 seconds
Speed compared to default : 208 %
-----------------
Comparing results
Result : Strongly similar, Q= 99.50%
The NV build takes too long as well. It worked on the BLC5 task for quite a while and then crashed. Maybe it doesn't like JSPF?
Build features: SETI8 Non-graphics OpenCL USE_OPENCL_NV OCL_CHIRP3 ASYNC_SPIKE FFTW JSPF SSSE3 64bit
SSSE3xj OS X 64bit Build 3552
Current WU: 11au16aa.28481.85822.12.39.56.wu
Running app with command : MBv8_8.18r3552_NV_ssse3_x86_64-apple-darwin -sbs 192 -oclfft_tune_wg 128 -spike_fft_thresh 2048 -period_iterations_num 8 -device 2
824.21 real 92.48 user 203.38 sys
Elapsed Time : ………… 824 seconds
Speed compared to default : 440 %
period_iterations_num=8
Pulse: peak=5.321575, time=45.99, period=13.06, d_freq=1228150097.09, score=1.003, chirp=-4.4361, fft_len=4k
D: threshold 1.503875; unscaled peak power: 1.507655 exceeds threshold for 0.2514%
Autocorr: peak=18.36389, time=74.45, delay=5.4228, d_freq=1228144214.73, chirp=-18.592, fft_len=128k
Pulse: peak=2.62872, time=45.9, period=5.577, d_freq=1228148715.01, score=1.019, chirp=-61.396, fft_len=2k
D: threshold 0.4152116; unscaled peak power: 0.4208242 exceeds threshold for 1.352%
Pulse: peak=6.128602, time=45.82, period=11.56, d_freq=1228147742.68, score=1.031, chirp=70.205, fft_len=128
D: threshold 0.0541962; unscaled peak power: 0.05564297 exceeds threshold for 2.67%
MBv8_8.18r3552_NV_ssse3_x86_64-apple-darwin(15806,0x700000122000) malloc: * error for object 0x134864000: pointer being freed was not allocated
* set a breakpoint in malloc_error_break to debug
SIGABRT: abort called
The Intel build has sped up;
Build features: SETI8 Non-graphics OpenCL USE_OPENCL_INTEL OCL_CHIRP3 ASYNC_SPIKE FFTW JSPF SSSE3 64bit
SSSE3xj OS X 64bit Build 3551
Current WU: 11au16aa.28481.85822.12.39.56.wu
Running app with command : MBv8_8.18r3551_Intel_ssse3_x86_64-apple-darwin -sbs 192 -oclfft_tune_wg 128 -spike_fft_thresh 2048 -period_iterations_num 8 -device 2
444.13 real 81.17 user 130.82 sys
Elapsed Time : ………… 444 seconds
Speed compared to default : 817 %
Current WU: blc5_2bit_guppi_57449_43932_HIP78775_0013.26700.831.18.27.53.vlar.wu
Running app with command : MBv8_8.18r3551_Intel_ssse3_x86_64-apple-darwin -sbs 192 -oclfft_tune_wg 128 -spike_fft_thresh 2048 -period_iterations_num 8 -device 2
1248.19 real 189.15 user 326.76 sys
Elapsed Time : ………… 1248 seconds
Speed compared to default : 645 %
-----------------
Comparing results
Result : Strongly similar, Q= 99.96%
I don't know about the OCL_SYNCHED, but it slowed the Intel build down quite a bit when I tried it there. Right now it looks as though the Intel build is far superior. ID: 1827593 ·

Jeff Buck Volunteer tester Send message Joined: 11 Feb 00 Posts: 1441 Credit: 148,764,870 RAC: 0	Message 1827663 - Posted: 31 Oct 2016, 5:10:54 UTC
I'm going to go ahead and post this one, since it involves what I assume is the latest Petri Special.
Workunit 2309927200 (02fe09ad.27386.3344.8.35.76)
Task 5251952577 (S=7, A=3, P=3, T=2, G=0) SSE3xj Win32 Build 3330
Task 5251952578 (S=7, A=3, P=3, T=2, G=0) x41p_zi3k, Cuda 8.00 special
From what I can see, all the reported signals seem to match up quite well. It appears that the only significant discrepancy is down in the "Best gaussian" report, even though neither app actually reported a Gaussian signal.
SSE3xj Win32 Build 3330
Best gaussian: peak=3.905206, mean=0.586291, ChiSq=1.31821, time=62.91, d_freq=1420745941.17, score=-2.003441, null_hyp=2.082193, chirp=38.498, fft_len=16k
x41p_zi3k, Cuda 8.00 special
Best gaussian: peak=4.36072, mean=0.5805977, ChiSq=1.177906, time=64.59, d_freq=1420746005.57, score=-2.007056, null_hyp=2.005376, chirp=38.44, fft_len=16k
ID: 1827663 ·
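To put the "Best gaussian" discrepancy above into numbers, the two reports can be compared field by field. Below is a minimal Python sketch. The relative-difference measure is only an illustration and is not the validator's actual comparison rule, which the thread does not describe; `relative_difference` is a hypothetical helper.

```python
def relative_difference(a, b):
    """Absolute difference scaled by the larger magnitude (0.0 if both are 0)."""
    scale = max(abs(a), abs(b))
    return 0.0 if scale == 0.0 else abs(a - b) / scale


# Values copied from the two "Best gaussian" lines quoted above.
cpu_build = {"peak": 3.905206, "mean": 0.586291, "ChiSq": 1.31821,
             "time": 62.91, "chirp": 38.498}
cuda_build = {"peak": 4.36072, "mean": 0.5805977, "ChiSq": 1.177906,
              "time": 64.59, "chirp": 38.44}

for field in cpu_build:
    diff = relative_difference(cpu_build[field], cuda_build[field])
    print(f"{field:6s} {diff:.2%}")
```

On these values the peak and ChiSq fields differ by roughly ten percent, while mean and chirp differ by about one percent or less.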

jason_gee Volunteer developer Volunteer tester Send message Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0	Message 1827671 - Posted: 31 Oct 2016, 6:02:38 UTC - in response to Message 1827663. Last modified: 31 Oct 2016, 6:05:07 UTC Hi Jeff, If you can hoard that along with others for in depth analysis at a later date, that would be great. That particular Gaussian scenario looks quite similar to X-branch Pre-v8 migration, where the main codebase required dialling in a few compiler options. Aside on the Cuda generalisation: In between too much work and home stuff going on, I've managed to isolate why my 980 machine freaks out with the optimisations on occasion (having dumped another 88 tasks last night, replicating last weekend's freakout. Something I'd been waiting for.). It's a case of the error and exception handling needing rationalisation: extensive rework to capture the new kinds of exceptional circumstances that can be generated by asynchronous (and memory hungry) code. That's actually good news for the long run, because Multibeam has had return codes where there should be exception handlers, and exception handlers that induce unstable states for some time. Fortunately looks like the limited usefulness of the boincapi debug output's days might be numbered in this particular branch. "Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions. ID: 1827671 ·

-= Vyper =- Volunteer tester Send message Joined: 5 Sep 99 Posts: 1652 Credit: 1,065,191,981 RAC: 2,537	Message 1827674 - Posted: 31 Oct 2016, 7:07:05 UTC - in response to Message 1827435. So, why is the Intel build faster on the nVidia cards? Nice find! Maybe Intel Crippling is back again or something?! I don't know! I can only guess. They've done it in the past and may very well do so again :) http://www.agner.org/optimize/blog/read.php?i=49 _________________________________________________________________________ Addicted to SETI crunching! Founder of GPU Users Group ID: 1827674 ·

Raistmer Volunteer developer Volunteer tester Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121	Message 1827682 - Posted: 31 Oct 2016, 7:48:25 UTC - in response to Message 1827593. Last modified: 31 Oct 2016, 7:48:43 UTC Do an NV build with OCL_VERBOSE. It will produce a long log; only the last few lines before the crash will be of interest. ID: 1827682 ·

Raistmer Volunteer developer Volunteer tester Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121	Message 1827687 - Posted: 31 Oct 2016, 8:12:15 UTC - in response to Message 1827674. So, why is the Intel build faster on the nVidia cards? Nice find! Maybe Intel Crippling is back again or something?! I don't know! I can only guess. They've done it in the past and may very well do so again :) http://www.agner.org/optimize/blog/read.php?i=49 Back then it could be circumvented by avoiding Intel's DLLs, using static linkage and manual SIMD level selection. I did that in the AKv8 codebase, but then some anonymous complaints arose about the legality of the Intel + GPL (BOINC) combination. The decision was to abandon the Intel compiler. That cost the SETI project a few dozen percent of performance (considering that SIMD builds could have been incorporated into stock, and that CPUs still provide most of the project's power). ID: 1827687 ·

jason_gee Volunteer developer Volunteer tester Send message Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0	Message 1827689 - Posted: 31 Oct 2016, 8:20:32 UTC - in response to Message 1827687. Last modified: 31 Oct 2016, 8:27:21 UTC Certainly decided to abandon about $3000 worth of work and personal Intel compiler licences myself, after having the incompatibilities with GPL pointed out to me. Frankly, if I need a team of lawyers to use a tool, then it's not the tool for me. [What I find particularly ironic is conversing with Francois himself, and him never raising said problems. Maybe Intel are so big, they don't know their arses from their elbows] ID: 1827689 ·
