Monitoring inconclusive GBT validations and harvesting data for testing

Message boards : Number crunching : Monitoring inconclusive GBT validations and harvesting data for testing

Profile Wiggo
Joined: 24 Jan 00
Posts: 34744
Credit: 261,360,520
RAC: 489
Australia
Message 1827502 - Posted: 30 Oct 2016, 11:37:47 UTC - in response to Message 1827494.  

Thanks.
It would be good to check the other overflows collected so far, using the same result representation.

Overflow w/u's aren't the only ones exhibiting these unusual numbers across similar Nvidia hosts, as a few that I listed weren't overflows.

Cheers.
ID: 1827502
TBar
Volunteer tester
Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1827504 - Posted: 30 Oct 2016, 11:45:27 UTC - in response to Message 1827470.  
Last modified: 30 Oct 2016, 11:47:41 UTC

Macs have been all 64 bit for a while, so yes, always use 64 bit. Never had this problem before. From my understanding, it's trying to free the same memory space twice. I've looked at malloc_a.cpp but don't see what I'm looking for. Not really sure about what I'm looking for...
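The malloc message below ("pointer being freed was not allocated") is the classic double free. A minimal sketch of one common defensive idiom, not the app's actual code (the helper name `free_and_null` is hypothetical): free through a pointer-to-pointer and null the variable, since `free(NULL)` is defined by the C standard to do nothing.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: frees *pp and sets it to NULL, so a second call on
   the same variable becomes free(NULL), which is a defined no-op.
   It does NOT protect other variables holding copies of the same pointer. */
static void free_and_null(void **pp) {
    assert(pp != NULL);   /* the pointer-to-pointer itself must be valid */
    free(*pp);
    *pp = NULL;
}
```

It only guards the variable passed in; if two places keep copies of the same pointer, a double free can still happen, so the real fix is usually giving each buffer a single clear owner.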

Build w/o ASYNC_SPIKE then will see.

OK, but the SoG build doesn't use ASYNC_SPIKE... and it has the same error.

Examples of the crash with the SoG build?

The last one;
Build features: SETI8 Non-graphics OpenCL USE_OPENCL_NV OCL_ZERO_COPY SIGNALS_ON_GPU OCL_CHIRP3 FFTW SSSE3 64bit 
 System: Darwin  x86_64  Kernel: 15.6.0
...
Credit multiplier is :  2.85
WU true angle range is :  0.775000
Used GPU device parameters are:
	Number of compute units: 6
	Single buffer allocation size: 128MB
	Total device global memory: 2048MB
	max WG size: 1024
	local mem type: Real
	FERMI path used: yes
	LotOfMem path: yes
	LowPerformanceGPU path: no
	HighPerformanceGPU path: no
period_iterations_num=10
cpu_GPUState->gaussians.index=0
cpu_GPUState->gaussians.index=0
cpu_GPUState->gaussians.index=0
cpu_GPUState->gaussians.index=0
cpu_GPUState->gaussians.index=0
cpu_GPUState->gaussians.index=0
cpu_GPUState->gaussians.index=0
cpu_GPUState->gaussians.index=0
cpu_GPUState->gaussians.index=0
cpu_GPUState->gaussians.index=0
cpu_GPUState->gaussians.index=0
cpu_GPUState->gaussians.index=0
cpu_GPUState->gaussians.index=0
cpu_GPUState->gaussians.index=0
cpu_GPUState->gaussians.index=0
cpu_GPUState->gaussians.index=0
cpu_GPUState->gaussians.index=0
cpu_GPUState->gaussians.index=0
cpu_GPUState->gaussians.index=0
cpu_GPUState->gaussians.index=0
Triplet: peak=8.642653, time=80.39, period=0.5439, d_freq=1418924863.12, chirp=0.91687, fft_len=128 
MBv8_8.18r3549_NV-SoG_ssse3_x86_64-apple-darwin(1751,0x700000228000) malloc: *** error for object 0x120440000: pointer being freed was not allocated
*** set a breakpoint in malloc_error_break to debug
SIGABRT: abort called

Crashed executable name: MBv8_8.18r3549_NV-SoG_ssse3_x86_64-apple-darwin
Machine type Intel 80486 (64-bit executable)
System version: Macintosh OS 10.11.6 build 15G1108
Fri Oct 28 17:33:10 2016

0   MBv8_8.18r3549_NV-SoG_ssse3_x86_64-apple-darwin 0x0000000107d93614 std::__1::__tree<std::__1::__value_type<int, PROCINFO>, std::__1::__map_value_compare<int, std::__1::__value_type<int, PROCINFO>, std::__1::less<int>, true>, std::__1::allocator<std::__1::__value_type<int, PROCINFO> > >::__insert_unique(std::__1::__value_type<int, PROCINFO> const&) + 1076
1   MBv8_8.18r3549_NV-SoG_ssse3_x86_64-apple-darwin 0x0000000107d82876 COPROCS::clear() + 4006
2   libsystem_platform.dylib            0x00007fff9099c52a _sigtramp + 26
3   libsystem_malloc.dylib              0x00007fff9a5795a1 malloc_zone_malloc + 71
4   libsystem_c.dylib                   0x00007fff8d3946df abort + 129
5   libsystem_malloc.dylib              0x00007fff9a57b041 szone_size + 0
6   GeForceGLDriverWeb                  0x000000010a15c1aa gldWaitForObject + 4947
7   GeForceGLDriverWeb                  0x000000010a165e3b gldExecuteKernel + 201
8   OpenCL                              0x00007fff9c2b34a7 OpenCL + 13479
9   OpenCL                              0x00007fff9c2d00da clSetEventCallback + 5888
10  OpenCL                              0x00007fff9c2d39cc clFinish + 761
11  libdispatch.dylib                   0x00007fff9aa6240b _dispatch_client_callout + 8
12  libdispatch.dylib                   0x00007fff9aa6703b _dispatch_queue_drain + 754
13  libdispatch.dylib                   0x00007fff9aa6d707 _dispatch_queue_invoke + 549
14  libdispatch.dylib                   0x00007fff9aa6240b _dispatch_client_callout + 8
15  libdispatch.dylib                   0x00007fff9aa6629b _dispatch_root_queue_drain + 1890
16  libdispatch.dylib                   0x00007fff9aa65b00 _dispatch_worker_thread3 + 91
17  libsystem_pthread.dylib             0x00007fff9a2af4de _pthread_wqthread + 1129
18  libsystem_pthread.dylib             0x00007fff9a2ad341 start_wqthread + 13

Thread 6 crashed with X86 Thread State (64-bit):
  rax: 0x0100001f  rbx: 0x00000000  rcx: 0x700000225588  rdx: 0x00000028
  rdi: 0x7000002255f0  rsi: 0x00000003  rbp: 0x7000002255d0  rsp: 0x700000225588
   r8: 0x00004603   r9: 0x00000000  r10: 0x000003b0  r11: 0x00000206
  r12: 0x000003b0  r13: 0x00000028  r14: 0x7000002255f0  r15: 0x00004603
  rip: 0x7fff8a952f72  rfl: 0x00000206
...

I use the same MacOSX10.9.sdk on all builds. The difference is the <malloc.h> setting in sah_config.h;
/* Define to 1 if you have the <malloc.h> header file. */
/* #undef HAVE_MALLOC_H */
It doesn't seem to bother the Intel build either way. I'll have to run more tests with it undefined. All the previous builds have it defined and pointed to the MacOSX10.9.sdk.
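For reference, a minimal sketch of the usual autoconf-style guard behind a HAVE_MALLOC_H setting. The allocation functions themselves are declared in `<stdlib.h>` on every platform (macOS keeps its malloc-zone API in `<malloc/malloc.h>`), so code written like this builds the same either way. The helper names are hypothetical; this is not the actual code that consumes sah_config.h.

```c
#include <assert.h>
#include <stdlib.h>   /* malloc, calloc and free are declared here everywhere */

/* <malloc.h> is pulled in only when the config header says it exists
   ("#define HAVE_MALLOC_H 1"); nothing below depends on it. */
#ifdef HAVE_MALLOC_H
#include <malloc.h>
#endif

/* Hypothetical: report how the build was configured, e.g. for a startup log. */
static int malloc_h_enabled(void) {
#ifdef HAVE_MALLOC_H
    return 1;
#else
    return 0;
#endif
}

/* Hypothetical: allocation only needs <stdlib.h>. */
static double *alloc_zeroed(size_t n) {
    assert(n > 0);    /* calloc(0, ...) may legitimately return NULL */
    return calloc(n, sizeof(double));
}
```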
ID: 1827504
Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1827511 - Posted: 30 Oct 2016, 12:16:44 UTC - in response to Message 1827502.  

Thanks.
It would be good to check the other overflows collected so far, using the same result representation.

Overflow w/u's aren't the only ones exhibiting these unusual numbers across similar Nvidia hosts, as a few that I listed weren't overflows.

Cheers.

Links to non-overflow inconclusives?
SETI apps news
We're not gonna fight them. We're gonna transcend them.
ID: 1827511
Profile Wiggo
Joined: 24 Jan 00
Posts: 34744
Credit: 261,360,520
RAC: 489
Australia
Message 1827513 - Posted: 30 Oct 2016, 12:27:01 UTC - in response to Message 1827511.  

Thanks.
It would be good to check the other overflows collected so far, using the same result representation.

Overflow w/u's aren't the only ones exhibiting these unusual numbers across similar Nvidia hosts, as a few that I listed weren't overflows.

Cheers.

Links to non-overflow inconclusives?

I've already provided them, and if you didn't catch them then that is your problem, so stop trying to pass the buck.

These SoG apps were far too green to start with. They have got slightly better over time, but IMHO they still have a way to go (even if you don't think so, others do).

Cheers.
ID: 1827513
Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1827515 - Posted: 30 Oct 2016, 12:43:32 UTC - in response to Message 1827504.  
Last modified: 30 Oct 2016, 12:55:29 UTC

Try rebuilding from r3551.
If the crash remains, try rebuilding with OCL_VERBOSE defined.
Unfortunately, the OS X stack listing starts inside a runtime function, so it's unclear where in the app the crash occurs. OCL_VERBOSE could show the location.
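Roughly how a verbose build can localize a crash: each wrapped call logs its source location before it runs, so the last line printed before the abort points at the call in progress. This is a hypothetical sketch (`TRACE_CALL` and `fake_cl_call` are invented names); the real OCL_VERBOSE output isn't shown in this thread.

```c
#include <assert.h>
#include <stdio.h>

/* With OCL_VERBOSE defined, print file:line and the call text to stderr
   before evaluating the call; otherwise the macro is just the bare call.
   stderr is not fully buffered by default, so the line is already out
   if the call then aborts. */
#ifdef OCL_VERBOSE
#define TRACE_CALL(call) \
    (fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__, #call), (call))
#else
#define TRACE_CALL(call) (call)
#endif

/* Hypothetical stand-in for an OpenCL API call returning a status code. */
static int fake_cl_call(int status) {
    return status;
}
```

The wrapped expression keeps the call's return value, so it can replace an existing call such as `err = clFinish(queue);` without changing error handling.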
SETI apps news
We're not gonna fight them. We're gonna transcend them.
ID: 1827515
TBar
Volunteer tester
Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1827521 - Posted: 30 Oct 2016, 14:26:10 UTC - in response to Message 1827515.  

The build goes fine, but it fails on the kernel build;
shmget in attach_shmem: Invalid argument
10:20:51 (49549): Can't set up shared mem: -1. Will run in standalone mode.
Not using mb_cmdline.txt-file, using commandline options.
Maximum single buffer size set to:192MB
oclFFT max WG size override set to:128
SpikeFind FFT size threshold override set to:2048
Number of period iterations for PulseFind set to 8 
Running on device number: 2
OpenCL platform detected: Apple
Number of OpenCL devices found : 3 
BOINC assigns slot on device #3 of 3 devices.
Info: BOINC provided OpenCL device ID used
DOUBLE_FP supported. 
cl_khr_fp64 supported. 
cl_APPLE_fp64_basic_ops supported. 
FERMI : true 

Build features: SETI8 Non-graphics OpenCL USE_OPENCL_NV OCL_CHIRP3 ASYNC_SPIKE FFTW SSSE3 64bit 
 System: Darwin  x86_64  Kernel: 15.6.0
CPU : Intel(R) Xeon(R) CPU           E5472  @ 3.00GHz 
 GenuineIntel x86, Family 6 Model 23 Stepping 6
 Features : FPU TSC PAE APIC MTRR MMX SSE  SSE2 HT  SSE3 SSSE3 SSE4.1  

OpenCL-kernels filename : MultiBeam_Kernels_r3551.cl 
INFO: can't open binary kernel file: .//MultiBeam_Kernels_r3551.cl_GeForceGTX950.bin_V7_15.6.0_1011143460315f0, continue with recompile...
Error : Building Program (binary, clBuildProgram):main kernels: not OK code -11
CL file build log on device GeForce GTX 950
<program source>:3641:40: error: expected ')'
                                                                          __global float4* restricted PoT,__global uint* restricted result_flag) {
                                       ^
<program source>:3640:37: note: to match this '('
void PC_find_triplets_kernel_twin_cl(int ul_FftLength, int len_power, float triplet_thresh_base, int AdvanceBy, int PoTLen,
                                    ^
<program source>:3655:32: error: use of undeclared identifier 'PoT'
        __global float4* fp_PulsePot= PoT + ul_PoT + TOffset * (fft_len4)+neg*256*1024;
                               ^
<program source>:3687:3: error: use of undeclared identifier 'result_flag'
                result_flag[result_coordinate+neg*RESULT_SIZE]=1;
  ^
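The errors look consistent with `restricted` not being a keyword: OpenCL C, like C99, spells the qualifier `restrict`, so if no macro maps `restricted` onto it, the compiler reads `restricted` as the parameter name and then stops at `PoT`. Assuming that is the cause, a guard like the one below keeps that spelling compiling. It is shown as plain C99, and the `#define` is an assumption, since the kernel file's own macros aren't shown here.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: map the non-standard spelling onto the C99 / OpenCL C
   keyword, but only if nothing else has defined it already. */
#ifndef restricted
#define restricted restrict
#endif

/* Same qualifier pattern as the failing kernel parameters: the caller
   promises dst and src do not overlap, so the compiler may skip
   aliasing checks. */
static void scale_copy(float *restricted dst, const float *restricted src,
                       size_t n, float k) {
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i] * k;
}
```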
ID: 1827521
Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1827522 - Posted: 30 Oct 2016, 14:27:41 UTC - in response to Message 1827521.  

Just comment out that whole kernel, or wait for the next rev.
SETI apps news
We're not gonna fight them. We're gonna transcend them.
ID: 1827522
TBar
Volunteer tester
Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1827524 - Posted: 30 Oct 2016, 14:57:26 UTC - in response to Message 1827522.  
Last modified: 30 Oct 2016, 15:31:39 UTC

Commenting it out didn't work; however, it seems to be working with the .cl file from r3548. Whether it will work correctly, though, I dunno.

It didn't crash on the first 2 using;
/* Define to 1 if you have the <malloc.h> header file. */
#define HAVE_MALLOC_H 1

However, the r3551 NV build is still much slower than the r3550 Intel build from back here;
http://setiathome.berkeley.edu/forum_thread.php?id=80158&postid=1827435
The times and Q score are about the same with r3551.
ID: 1827524
Profile Jeff Buck (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 11 Feb 00
Posts: 1441
Credit: 148,764,870
RAC: 0
United States
Message 1827538 - Posted: 30 Oct 2016, 15:51:37 UTC - in response to Message 1827467.  



Both hosts seem to have clean records when it comes to Invalids.

http://setiathome.berkeley.edu/results.php?hostid=7940818&offset=0&show_names=0&state=3&appid=
Such a number of inconclusives (and the first few checked are non-overflows) can't be considered a "clean record".
Something's wrong with that host.

BTW, its other inconclusives also have too big a Spike on the 128k fft.
Another observation: the driver version changes across that inconclusives list,
372.70, 372.90. Maybe an inconsistent driver update results in such behavior.

Most of the inconclusives are from 28 and 29 October. Only 5 are from dates earlier than 24 Oct.

And as of 24 Oct the host had a higher driver version: Driver version: 375.57

So, I would attribute its breakage to incorrect driver re-installation.

Ah, okay. I had only looked at the Invalid count for that host being 0. I hadn't dug deeper into the current Inconclusives. I did verify that the 375.57 driver was not the one used in the WU I posted, since there was a warning about it in the Windows 10 - Yea or Nay? thread. Again, however, I didn't dig back to see if it had been in use previously. Still, it does seem like the vast majority of that host's tasks validate on the first try, with just the occasional wild Spike or Autocorr signal popping up.
ID: 1827538
TBar
Volunteer tester
Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1827544 - Posted: 30 Oct 2016, 16:29:26 UTC - in response to Message 1827522.  

It ran a couple more without crashing. It's just much slower than the Intel build;
Starting benchmark run...
---------------------------------------------------
Listing wu-file(s) in /testWUs :
11au16aa.28481.85822.12.39.56.wu blc5_2bit_guppi_57449_43932_HIP78775_0013.26700.831.18.27.53.vlar.wu

Listing executable(s) in /APPS :
MBv8_8.18r3551_NV_ssse3_x86_64-apple-darwin

Listing executable in /REF_APPs :
MBv8_8.05r3344_sse41_x86_64-apple-darwin
---------------------------------------------------
Current WU: 11au16aa.28481.85822.12.39.56.wu
---------------------------------------------------
Skipping default app MBv8_8.05r3344_sse41_x86_64-apple-darwin, displaying saved result(s)
Elapsed Time: ………………………………… 3630 seconds
---------------------------------------------------
Running app with command : MBv8_8.18r3551_NV_ssse3_x86_64-apple-darwin -sbs 192 -oclfft_tune_wg 128 -spike_fft_thresh 2048 -period_iterations_num 8 -device 2
      818.42 real        84.46 user       188.16 sys
Elapsed Time : ……………………………… 819 seconds
Speed compared to default : 443 %
-----------------
Comparing results
Result      : Strongly similar,  Q= 98.12%
---------------------------------------------------
Done with 11au16aa.28481.85822.12.39.56.wu.
Current WU: blc5_2bit_guppi_57449_43932_HIP78775_0013.26700.831.18.27.53.vlar.wu
---------------------------------------------------
Skipping default app MBv8_8.05r3344_sse41_x86_64-apple-darwin, displaying saved result(s)
Elapsed Time: ………………………………… 8062 seconds
---------------------------------------------------
Running app with command : MBv8_8.18r3551_NV_ssse3_x86_64-apple-darwin -sbs 192 -oclfft_tune_wg 128 -spike_fft_thresh 2048 -period_iterations_num 8 -device 2
     2165.85 real       605.08 user       268.95 sys
Elapsed Time : ……………………………… 2166 seconds
Speed compared to default : 372 %
-----------------
Comparing results
Result      : Strongly similar,  Q= 99.96%
---------------------------------------------------
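The "Speed compared to default" figure looks like reference time over test time. A minimal sketch, assuming the benchmark script divides whole seconds and truncates; the script's actual code isn't shown, and the function name is hypothetical.

```c
#include <assert.h>

/* Assumed formula: reference elapsed seconds / test elapsed seconds * 100,
   using integer (truncating) division. */
static long speed_percent(long ref_seconds, long test_seconds) {
    assert(test_seconds > 0);
    return ref_seconds * 100 / test_seconds;
}
```

`speed_percent(3630, 819)` gives 443 and `speed_percent(8062, 2166)` gives 372, matching the two runs above. Truncation, rather than rounding, is what matches every figure in this thread: the later Intel run, 3630 s / 444 s = 817.6, is logged as 817 %.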
ID: 1827544
TBar
Volunteer tester
Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1827556 - Posted: 30 Oct 2016, 18:06:04 UTC
Last modified: 30 Oct 2016, 18:22:14 UTC

This might be interesting. The only major difference, besides a couple of out-of-order signals, is the autocorr;
SSSE3ux OS X 64bit Build 3550: Best autocorr: peak=16.59647, time=60.4, delay=4.3732, d_freq=1420584475.99, chirp=-24.198, fft_len=128k
SSE3xj Win32 Build 3528      : Best autocorr: peak=16.60995, time=87.24, delay=5.9092, d_freq=1420588467.31, chirp=28.998, fft_len=128k

Everything else appears very close.

SSSE3ux OS X 64bit Build 3550
SSE3xj Win32 Build 3528

Hmmm, outvoted by the Windows Cartel again.
It's now running on a Linux CPU. We'll see what that says in a couple hours...
ID: 1827556
Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1827567 - Posted: 30 Oct 2016, 19:25:13 UTC - in response to Message 1827556.  

Ah, that "ux" finally attracted my attention.
Please return to the xj path by defining USE_JSPF.
Other paths could work (or not), but they are currently unmaintained for the GPU build. So, to speed up debugging, it's better to stay on the same path as the Windows builds. I posted the full line of defines for the Windows build before; do a comparison.

Also, try adding OCL_SYNCHED to the NV build. Will it help with speed?
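To make the xj/ux point concrete, a hypothetical sketch of how compile-time defines can both select a code path and show up in the "Build features" line, which is why JSPF appears in the SSSE3xj builds in this thread and not in the SSSE3ux one. The function names and the two-way xj/ux split are simplifications invented for illustration; there are other paths besides these two.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical: the path tag follows from whether USE_JSPF was defined. */
static const char *path_tag(void) {
#ifdef USE_JSPF
    return "xj";
#else
    return "ux";   /* simplification: stands in for any non-JSPF path */
#endif
}

/* Hypothetical: one token per define, as in "Build features: ...". */
static const char *build_features(void) {
    static char buf[128];
    strcpy(buf, "SETI8");
#ifdef OCL_CHIRP3
    strcat(buf, " OCL_CHIRP3");
#endif
#ifdef ASYNC_SPIKE
    strcat(buf, " ASYNC_SPIKE");
#endif
#ifdef USE_JSPF
    strcat(buf, " JSPF");
#endif
    assert(strlen(buf) < sizeof buf);
    return buf;
}
```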
SETI apps news
We're not gonna fight them. We're gonna transcend them.
ID: 1827567
Kiska
Volunteer tester
Joined: 31 Mar 12
Posts: 302
Credit: 3,067,762
RAC: 0
Australia
Message 1827587 - Posted: 30 Oct 2016, 21:44:45 UTC - in response to Message 1827494.  
Last modified: 30 Oct 2016, 21:49:49 UTC

Thanks.
It would be good to check the other overflows collected so far, using the same result representation.


Sorry about the delay in getting this overflow result comparison to you.

C:\Users\qingb\Documents\TestEnvironment>compare i5-4210U.sah GT840m_r3528.sah Q100
                ------------- R1:R2 ------------     ------------- R2:R1 ------------
                Exact  Super  Tight  Good    Bad     Exact  Super  Tight  Good    Bad
        Spike      0     21     21     21      0        0     21     21     21      0
     Autocorr      0      7      7      7      0        0      7      7      7      0
     Gaussian      0      0      0      0      0        0      0      0      0      0
        Pulse      0      2      2      2      0        0      2      2      2      0
      Triplet      0      0      0      0      0        0      0      0      0      0
   Best Spike      0      0      0      0      0        0      0      0      0      0
Best Autocorr      0      0      0      0      0        0      0      0      0      0
Best Gaussian      0      0      0      0      0        0      0      0      0      0
   Best Pulse      0      0      0      0      0        0      0      0      0      0
 Best Triplet      0      0      0      0      0        0      0      0      0      0
                ----   ----   ----   ----   ----     ----   ----   ----   ----   ----
                   0     30     30     30      0        0     30     30     30      0

Result      : Strongly similar,  Q= 99.98%

C:\Users\qingb\Documents\TestEnvironment>compare i5-4210U.sah GT840m_r3548.sah Q100
                ------------- R1:R2 ------------     ------------- R2:R1 ------------
                Exact  Super  Tight  Good    Bad     Exact  Super  Tight  Good    Bad
        Spike      0     21     21     21      0        0     21     21     21      0
     Autocorr      0      7      7      7      0        0      7      7      7      0
     Gaussian      0      0      0      0      0        0      0      0      0      0
        Pulse      0      2      2      2      0        0      2      2      2      0
      Triplet      0      0      0      0      0        0      0      0      0      0
   Best Spike      0      0      0      0      0        0      0      0      0      0
Best Autocorr      0      0      0      0      0        0      0      0      0      0
Best Gaussian      0      0      0      0      0        0      0      0      0      0
   Best Pulse      0      0      0      0      0        0      0      0      0      0
 Best Triplet      0      0      0      0      0        0      0      0      0      0
                ----   ----   ----   ----   ----     ----   ----   ----   ----   ----
                   0     30     30     30      0        0     30     30     30      0

Result      : Strongly similar,  Q= 99.98%


Data file and Results can be found here

Also the link to the post here
ID: 1827587
TBar
Volunteer tester
Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1827593 - Posted: 30 Oct 2016, 22:52:48 UTC - in response to Message 1827567.  
Last modified: 30 Oct 2016, 23:14:01 UTC

Ah, that "ux" finally attracted my attention.
Please return to the xj path by defining USE_JSPF.
Other paths could work (or not), but they are currently unmaintained for the GPU build. So, to speed up debugging, it's better to stay on the same path as the Windows builds. I posted the full line of defines for the Windows build before; do a comparison.

Also, try adding OCL_SYNCHED to the NV build. Will it help with speed?

Well, the SoG build still crashes on the BLC tasks, and uses lots of CPU;
Build features: SETI8 Non-graphics OpenCL USE_OPENCL_NV OCL_ZERO_COPY SIGNALS_ON_GPU OCL_CHIRP3 FFTW SSSE3 64bit
SSSE3ux OS X 64bit Build 3553
period_iterations_num=8
Spike: peak=25.07172, time=8.83, d_freq=1616892188.79, chirp=0, fft_len=128 
Pulse: peak=4.935277, time=45.82, period=11.3, d_freq=1616892188.79, score=1.036, chirp=0, fft_len=128 
MBv8_8.18r3553_NV-SoG_ssse3_x86_64-apple-darwin(15547,0x7000001a5000) malloc: *** error for object 0x12f34c000: pointer being freed was not allocated
*** set a breakpoint in malloc_error_break to debug
SIGABRT: abort called

And takes very long on the reference_work_unit_r3215.wu;
Skipping default app MBv8_8.05r3344_sse41_x86_64-apple-darwin, displaying saved result(s)
Elapsed Time: ………………………………… 2110 seconds
---------------------------------------------------
Running app with command : MBv8_8.18r3553_NV-SoG_ssse3_x86_64-apple-darwin -sbs 192 -oclfft_tune_wg 128 -spike_fft_thresh 2048 -period_iterations_num 8 -device 0
     1012.02 real       703.05 user       159.28 sys
Elapsed Time : ……………………………… 1012 seconds
Speed compared to default : 208 %
-----------------
Comparing results
Result      : Strongly similar,  Q= 99.50%

The NV build takes too long as well. It worked on the BLC5 task for quite a while and then crashed. Maybe it doesn't like JSPF?
Build features: SETI8 Non-graphics OpenCL USE_OPENCL_NV OCL_CHIRP3 ASYNC_SPIKE FFTW JSPF SSSE3 64bit
SSSE3xj OS X 64bit Build 3552
Current WU: 11au16aa.28481.85822.12.39.56.wu
Running app with command : MBv8_8.18r3552_NV_ssse3_x86_64-apple-darwin -sbs 192 -oclfft_tune_wg 128 -spike_fft_thresh 2048 -period_iterations_num 8 -device 2
      824.21 real        92.48 user       203.38 sys
Elapsed Time : ……………………………… 824 seconds
Speed compared to default : 440 %

period_iterations_num=8
Pulse: peak=5.321575, time=45.99, period=13.06, d_freq=1228150097.09, score=1.003, chirp=-4.4361, fft_len=4k
D:	threshold 1.503875; unscaled peak power: 1.507655 exceeds threshold for 0.2514%
Autocorr: peak=18.36389, time=74.45, delay=5.4228, d_freq=1228144214.73, chirp=-18.592, fft_len=128k
Pulse: peak=2.62872, time=45.9, period=5.577, d_freq=1228148715.01, score=1.019, chirp=-61.396, fft_len=2k
D:	threshold 0.4152116; unscaled peak power: 0.4208242 exceeds threshold for 1.352%
Pulse: peak=6.128602, time=45.82, period=11.56, d_freq=1228147742.68, score=1.031, chirp=70.205, fft_len=128 
D:	threshold 0.0541962; unscaled peak power: 0.05564297 exceeds threshold for 2.67%
MBv8_8.18r3552_NV_ssse3_x86_64-apple-darwin(15806,0x700000122000) malloc: *** error for object 0x134864000: pointer being freed was not allocated
*** set a breakpoint in malloc_error_break to debug
SIGABRT: abort called
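The "D:" debug lines above are consistent with excess = (unscaled peak power − threshold) / threshold × 100. That formula is inferred from the logged numbers, not taken from the source, but it makes a quick sanity check when comparing builds. The function name is hypothetical.

```c
#include <assert.h>

/* Inferred from the log: percentage by which the unscaled peak power
   exceeds the detection threshold. */
static double excess_percent(double threshold, double peak_power) {
    assert(threshold > 0.0);
    return (peak_power - threshold) / threshold * 100.0;
}
```

For the first line, (1.507655 − 1.503875) / 1.503875 × 100 ≈ 0.2514 %, as logged; the other two lines check out the same way.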

The Intel build has sped up;
Build features: SETI8 Non-graphics OpenCL USE_OPENCL_INTEL OCL_CHIRP3 ASYNC_SPIKE FFTW JSPF SSSE3 64bit
SSSE3xj OS X 64bit Build 3551
Current WU: 11au16aa.28481.85822.12.39.56.wu
Running app with command : MBv8_8.18r3551_Intel_ssse3_x86_64-apple-darwin -sbs 192 -oclfft_tune_wg 128 -spike_fft_thresh 2048 -period_iterations_num 8 -device 2
      444.13 real        81.17 user       130.82 sys
Elapsed Time : ……………………………… 444 seconds
Speed compared to default : 817 %

Current WU: blc5_2bit_guppi_57449_43932_HIP78775_0013.26700.831.18.27.53.vlar.wu
Running app with command : MBv8_8.18r3551_Intel_ssse3_x86_64-apple-darwin -sbs 192 -oclfft_tune_wg 128 -spike_fft_thresh 2048 -period_iterations_num 8 -device 2
     1248.19 real       189.15 user       326.76 sys
Elapsed Time : ……………………………… 1248 seconds
Speed compared to default : 645 %
-----------------
Comparing results
Result      : Strongly similar,  Q= 99.96%

I don't know about OCL_SYNCHED, but it slowed the Intel build down quite a bit when I tried it there.
Right now it looks as though the Intel build is far superior.
ID: 1827593
Profile Jeff Buck (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 11 Feb 00
Posts: 1441
Credit: 148,764,870
RAC: 0
United States
Message 1827663 - Posted: 31 Oct 2016, 5:10:54 UTC

I'm going to go ahead and post this one, since it involves what I assume is the latest Petri Special.

Workunit 2309927200 (02fe09ad.27386.3344.8.35.76)
Task 5251952577 (S=7, A=3, P=3, T=2, G=0) SSE3xj Win32 Build 3330
Task 5251952578 (S=7, A=3, P=3, T=2, G=0) x41p_zi3k, Cuda 8.00 special

From what I can see, all the reported signals seem to match up quite well. It appears that the only significant discrepancy is down in the "Best gaussian" report, even though neither app actually reported a Gaussian signal.

SSE3xj Win32 Build 3330
Best gaussian: peak=3.905206, mean=0.586291, ChiSq=1.31821, time=62.91, d_freq=1420745941.17,
score=-2.003441, null_hyp=2.082193, chirp=38.498, fft_len=16k

x41p_zi3k, Cuda 8.00 special
Best gaussian: peak=4.36072, mean=0.5805977, ChiSq=1.177906, time=64.59, d_freq=1420746005.57,
score=-2.007056, null_hyp=2.005376, chirp=38.44, fft_len=16k
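For hoarded cases like this one, a per-field relative difference is a quick way to see which values disagree most. This is a plain relative difference, not the validator's actual tolerance rules, and the helper names are hypothetical.

```c
#include <assert.h>

static double abs_d(double x) { return x < 0.0 ? -x : x; }

/* Relative difference of b from reference value a (a must be nonzero). */
static double rel_diff(double a, double b) {
    assert(a != 0.0);
    return abs_d(b - a) / abs_d(a);
}
```

On the two Best gaussian reports, the peak (3.905206 vs 4.36072) differs by about 11.7 %, while the score (−2.003441 vs −2.007056) differs by only about 0.18 %.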
ID: 1827663
Profile jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1827671 - Posted: 31 Oct 2016, 6:02:38 UTC - in response to Message 1827663.  
Last modified: 31 Oct 2016, 6:05:07 UTC

Hi Jeff,
If you can hoard that along with others for in-depth analysis at a later date, that would be great. That particular Gaussian scenario looks quite similar to the X-branch pre-v8 migration, where the main codebase required dialling in a few compiler options.

Aside on the Cuda generalisation: In between too much work and home stuff going on, I've managed to isolate why my 980 machine freaks out with the optimisations on occasion (having dumped another 88 tasks last night, replicating last weekend's freakout. Something I'd been waiting for.). It's a case of the error and exception handling needing rationalisation: extensive rework to capture the new kinds of exceptional circumstances that can be generated by asynchronous (and memory hungry) code.

That's actually good news for the long run, because for some time Multibeam has had return codes where there should be exception handlers, and exception handlers that induce unstable states. Fortunately, it looks like the days of the boincapi's limited-usefulness debug output might be numbered in this particular branch.
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 1827671
Profile -= Vyper =-
Volunteer tester
Joined: 5 Sep 99
Posts: 1652
Credit: 1,065,191,981
RAC: 2,537
Sweden
Message 1827674 - Posted: 31 Oct 2016, 7:07:05 UTC - in response to Message 1827435.  

So, why is the Intel build faster on the nVidia cards?


Nice find!
Maybe Intel Crippling is back again or something?! I don't know! I can only guess. They've done it in the past and may very well do so again :)
http://www.agner.org/optimize/blog/read.php?i=49

_________________________________________________________________________
Addicted to SETI crunching!
Founder of GPU Users Group
ID: 1827674
Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1827682 - Posted: 31 Oct 2016, 7:48:25 UTC - in response to Message 1827593.  
Last modified: 31 Oct 2016, 7:48:43 UTC

Do an NV build with OCL_VERBOSE.
It will produce a long log; only the last few lines before the crash will be of interest.
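Since only the tail of that log matters, a sketch of keeping just the last few lines with a ring buffer (roughly what `tail -n` does). The function name and size limits are hypothetical choices.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_LINE 512   /* longer lines are returned in pieces by fgets */
#define MAX_TAIL 64    /* upper bound on how many lines can be kept */

/* Reads f to the end, keeping only the last n lines in a ring buffer,
   then copies them oldest-first into out. Returns the number copied. */
static int tail_lines(FILE *f, int n, char out[][MAX_LINE]) {
    static char ring[MAX_TAIL][MAX_LINE];
    char line[MAX_LINE];
    long total = 0;

    assert(n > 0 && n <= MAX_TAIL);
    while (fgets(line, sizeof line, f) != NULL) {
        strcpy(ring[total % n], line);   /* overwrite the oldest slot */
        total++;
    }
    int count = total < n ? (int)total : n;
    long first = total - count;          /* index of the oldest kept line */
    for (int i = 0; i < count; i++)
        strcpy(out[i], ring[(first + i) % n]);
    return count;
}
```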
SETI apps news
We're not gonna fight them. We're gonna transcend them.
ID: 1827682
Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1827687 - Posted: 31 Oct 2016, 8:12:15 UTC - in response to Message 1827674.  

So, why is the Intel build faster on the nVidia cards?


Nice find!
Maybe Intel Crippling is back again or something?! I don't know! I can only guess. They've done it in the past and may very well do so again :)
http://www.agner.org/optimize/blog/read.php?i=49

Back then it could be circumvented by not using Intel's DLLs, relying on static linkage and manual SIMD level selection. I did that in the AKv8 codebase, but then some anonymous complaints arose about the legality of the Intel + GPL (BOINC) combination.
The decision was to abandon the Intel compiler. That cost the SETI project a few dozen percent of performance (considering that SIMD builds could have been incorporated into stock, and CPUs still provide most of the project's power).
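The workaround described here, choosing the SIMD level yourself instead of relying on a vendor library's dispatcher, can be sketched with the GCC/Clang builtin `__builtin_cpu_supports`, which tests the CPU's feature flags rather than its vendor. This is a minimal sketch under those assumptions (x86 with GCC or Clang; anything else falls back to scalar), not the AKv8 code; the names and the override rule are hypothetical.

```c
#include <assert.h>

typedef enum { SIMD_SCALAR = 0, SIMD_SSE2, SIMD_SSSE3, SIMD_SSE41 } simd_level;

/* Highest level the running CPU reports, based on feature flags only. */
static simd_level detect_simd(void) {
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    if (__builtin_cpu_supports("sse4.1")) return SIMD_SSE41;
    if (__builtin_cpu_supports("ssse3"))  return SIMD_SSSE3;
    if (__builtin_cpu_supports("sse2"))   return SIMD_SSE2;
#endif
    return SIMD_SCALAR;
}

/* Manual selection: honour a requested level (e.g. from a command-line
   switch) only if the CPU supports it; otherwise use the detected level.
   A negative value means "no preference". */
static simd_level choose_simd(int requested) {
    simd_level best = detect_simd();
    if (requested >= (int)SIMD_SCALAR && requested <= (int)best)
        return (simd_level)requested;
    return best;
}
```

In a real build each level would map to a separately compiled code path; the point is only that the choice depends on what the CPU reports, not on who made it.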
SETI apps news
We're not gonna fight them. We're gonna transcend them.
ID: 1827687
Profile jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1827689 - Posted: 31 Oct 2016, 8:20:32 UTC - in response to Message 1827687.  
Last modified: 31 Oct 2016, 8:27:21 UTC

I certainly decided to abandon about $3000 worth of work and personal Intel compiler licences myself, after having the incompatibilities with the GPL pointed out to me. Frankly, if I need a team of lawyers to use a tool, then it's not the tool for me.

[What I find particularly ironic is conversing with Francois himself, and him never raising said problems. Maybe Intel are so big they don't know their arses from their elbows.]
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 1827689
 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.