Monitoring inconclusive GBT validations and harvesting data for testing

Message boards : Number crunching : Monitoring inconclusive GBT validations and harvesting data for testing

Kiska
Volunteer tester

Joined: 31 Mar 12
Posts: 302
Credit: 3,067,762
RAC: 0
Australia
Message 1827695 - Posted: 31 Oct 2016, 10:03:25 UTC

Jeff, when you hoard them, could you PM me your email so I can add you to the Google Drive folder I have set up, so you can upload data to it? Right now everyone has read-only access and only a select few have permission to upload and delete.
And I would like to keep it that way, in case someone decides to delete everything if I were to give the public full permissions.
Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1827698 - Posted: 31 Oct 2016, 11:17:22 UTC - in response to Message 1827587.  

Thanks.
It would be good to check the other overflows collected so far, in the same result representation.


Sorry about the delay in getting this overflow result comparison to you.

Thanks. But it seems this one is OK for r3528 too in this run.

It would be good to be sure that the new version has a substantially better inconclusive rate before I bother Eric with a new beta server update.
SETI apps news
We're not gonna fight them. We're gonna transcend them.
TBar
Volunteer tester

Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1827708 - Posted: 31 Oct 2016, 12:20:01 UTC - in response to Message 1827682.  

Do an NV build with OCL_VERBOSE.
It will produce a long log; only the last few lines before the crash will be the interesting ones.
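Picking out "the last few lines before the crash" from a long verbose log can be scripted. A minimal sketch, assuming the crash is marked by the macOS malloc error or the SIGABRT line seen in the logs in this thread; the marker list, the helper name and the default of 10 lines are assumptions, not anything the app itself defines:

```python
from collections import deque
from typing import Iterable, List

# Assumed crash markers, copied from the OS X crash logs in this thread.
CRASH_MARKERS = ("pointer being freed was not allocated", "SIGABRT")


def lines_before_crash(log_lines: Iterable[str], keep: int = 10) -> List[str]:
    """Hypothetical helper: return up to `keep` lines just before the first crash marker.

    If no marker is found, the last `keep` lines of the log are returned instead.
    """
    window: deque = deque(maxlen=keep)  # rolling buffer holding only the most recent lines
    for line in log_lines:
        if any(marker in line for marker in CRASH_MARKERS):
            return list(window)
        window.append(line.rstrip("\n"))
    return list(window)
```

Passing an open stderr file with keep=8 would return roughly the handful of "call ... is finished OK" lines worth posting; using a fixed-size deque means the whole log never has to be held in memory.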

Not sure if it will help; I don't really see anything other than the crash. The SoG App crashed pretty quickly, while the NV build took longer. I do see HD5 mentioned; isn't that an AMD kernel?

Build features: SETI8 Non-graphics OpenCL USE_OPENCL_NV OCL_VERBOSE OCL_ZERO_COPY SIGNALS_ON_GPU OCL_CHIRP3 FFTW SSSE3 64bit
Current WU: blc3_2bit_guppi_57451_21431_HIP63121_0009.12921.416.17.26.230.vlar.wu
---------------------------------------------------
Skipping default app MBv8_8.05r3344_sse41_x86_64-apple-darwin, displaying saved result(s)
Elapsed Time: ………………………………… 933 seconds
---------------------------------------------------
Running app with command : MBv8_8.18r3553_NV-SoG-Verbose_ssse3_x86_64-apple-darwin -sbs 192 -oclfft_tune_wg 128 -spike_fft_thresh 2048 -period_iterations_num 8 -device 0
       24.93 real        17.86 user         2.77 sys
Elapsed Time : ……………………………… 25 seconds
------------------
...
call ' oclFFT1: clEnqueueNDRangeKernel' is finished OK in file ../../src/OpenCL_FFT/fft_execute.cpp near line 570
call ' oclFFT1: clEnqueueNDRangeKernel' is finished OK in file ../../src/OpenCL_FFT/fft_execute.cpp near line 570
call ' oclFFT1: clEnqueueNDRangeKernel' is finished OK in file ../../src/OpenCL_FFT/fft_execute.cpp near line 570
call ' oclFFT1: clEnqueueNDRangeKernel' is finished OK in file ../../src/OpenCL_FFT/fft_execute.cpp near line 570
call 'autocorr fft' is finished OK in file autocorr.cpp near line 715
call 'FindAutoCorrelation_reduce0_kernel_cl' is finished OK in file autocorr.cpp near line 757
call 'Enqueueing FindAutoCorrelation_reduce1_kernel_cl' is finished OK in file autocorr.cpp near line 789
call 'Setting kernel argument: (strip) PC_find_spike32_kernel_cl' is finished OK in file analyzeFuncs.cpp near line 3371
call 'Enqueueing kernel: (strip) PC_find_spike32_kernel_cl' is finished OK in file analyzeFuncs.cpp near line 3377
call 'Setting kernel argument: (strip) Spike_logging_HD5_kernel_cl' is finished OK in file analyzeFuncs.cpp near line 3408
call 'clEnqueueNDRangeKernel(cq,Spike_logging_HD5_kernel_cl)' is finished OK in file analyzeFuncs.cpp near line 3422
call 'clEnqueueTask(cq,Autocorr_logging_kernel_cl)' is finished OK in file analyzeFuncs.cpp near line 3727
call 'Setting kernel argument:CalcChirpData_kernel2_cl' is finished OK in file analyzeFuncs.cpp near line 4118
call 'Enqueueing kernel:CalcChirpData_kernel2_cl' is finished OK in file analyzeFuncs.cpp near line 4140
call ' oclFFT1: clEnqueueNDRangeKernel' is finished OK in file ../../src/OpenCL_FFT/fft_execute.cpp near line 570
call ' oclFFT1: clEnqueueNDRangeKernel' is finished OK in file ../../src/OpenCL_FFT/fft_execute.cpp near line 570
call 'non-strip fft' is finished OK in file analyzeFuncs.cpp near line 4235
INFO: FFT done no strip. fftlen=2048, NumBlockFfts=512, chirplen=1048576
call 'Enqueueing kernel:GetPowerSpectrum_kernel_cl' is finished OK in file analyzeFuncs.cpp near line 4274
call 'Setting kernel argument:PC_find_spike32_kernel_cl' is finished OK in file analyzeFuncs.cpp near line 4331
call 'Enqueueing kernel:PC_find_spike32_kernel_cl' is finished OK in file analyzeFuncs.cpp near line 4337
call 'Setting kernel argument: (strip) Spike_logging_HD5_kernel_cl' is finished OK in file analyzeFuncs.cpp near line 4368
call 'clEnqueueNDRangeKernel(cq,Spike_logging_HD5_kernel_cl)' is finished OK in file analyzeFuncs.cpp near line 4382
call 'Setting kernel argument:pc_triplet_find_cl' is finished OK in file analyzePoT.cpp near line 1357
call 'Enqueueing kernel:pc_triplet_find_cl' is finished OK in file analyzePoT.cpp near line 1368
call 'Setting kernel argument:set_mem_kernel_cl' is finished OK in file analyzePoT.cpp near line 1455
call 'Enqueueing kernel:set_mem_kernel_cl' is finished OK in file analyzePoT.cpp near line 1462
call 'Setting kernel argument:PC_find_pulse_kernel_cl' is finished OK in file analyzePoT.cpp near line 2640
call 'Setting kernel argument:PC_find_pulse_kernel_cl,offset' is finished OK in file analyzePoT.cpp near line 2697
call 'clEnqueueMarker' is finished OK in file analyzePoT.cpp near line 2699
MBv8_8.18r3553_NV-SoG-Verbose_ssse3_x86_64-apple-darwin(34650,0x7000001a5000) malloc: *** error for object 0x12501f000: pointer being freed was not allocated
*** set a breakpoint in malloc_error_break to debug
SIGABRT: abort called

Crashed executable name: MBv8_8.18r3553_NV-SoG-Verbose_ssse3_x86_64-apple-darwin


Build features: SETI8 Non-graphics OpenCL USE_OPENCL_NV OCL_VERBOSE OCL_CHIRP3 ASYNC_SPIKE FFTW JSPF SSSE3 64bit
Current WU: blc5_2bit_guppi_57449_43932_HIP78775_0013.26700.831.18.27.53.vlar.wu
---------------------------------------------------
Skipping default app MBv8_8.05r3344_sse41_x86_64-apple-darwin, displaying saved result(s)
Elapsed Time: ………………………………… 8062 seconds
---------------------------------------------------
Running app with command : MBv8_8.18r3552_NV-Verbose_ssse3_x86_64-apple-darwin -sbs 192 -oclfft_tune_wg 128 -spike_fft_thresh 2048 -period_iterations_num 8 -device 2
     1623.65 real       489.79 user       411.95 sys
Elapsed Time : ……………………………… 1624 seconds
--------------
...
call 'Enqueueing kernel:PC_find_pulse_partial_kernel_cl,pass 5' is finished OK in file analyzePoT.cpp near line 2283
call 'ReadBuffer(gpu_result_flag,pulse)' is finished OK in file analyzePoT.cpp near line 2369
call 'clGetEventProfilingInfo' is finished OK in file ../../src/GPU_lock.cpp near line 546
call 'ReadBuffer(gpu_triplet_result_flag)' is finished OK in file analyzePoT.cpp near line 3302
need_pulse_cpu_processing=0;need_triplet_cpu_processing=0;ThisPoT=1;last_bin=512;
		PulseSearchBinStart=-1;PulseSearchBinStop=0;TripletSearchBinStart=31;TripletSearchBinStop=0
call ' oclFFT2: clEnqueueNDRangeKernel' is finished OK in file ../../src/OpenCL_FFT/fft_execute.cpp near line 609
call 'non-strip fft' is finished OK in file analyzeFuncs.cpp near line 4235
INFO: FFT done no strip. fftlen=1024, NumBlockFfts=1024, chirplen=1048576
call 'Enqueueing kernel:GetPowerSpectrum_kernel_cl' is finished OK in file analyzeFuncs.cpp near line 4274
call 'Setting kernel argument:set_mem_kernel_cl' is finished OK in file analyzeFuncs.cpp near line 4438
call 'Enqueueing kernel:set_mem_kernel_cl' is finished OK in file analyzeFuncs.cpp near line 4445
call 'Setting kernel argument: PC_find_spike_kernel_cl' is finished OK in file analyzeFuncs.cpp near line 4523
call 'Enqueueing kernel:PC_find_spike_kernel_cl' is finished OK in file analyzeFuncs.cpp near line 4538
call 'ReadBuffer(gpu_result_flag,spike)' is finished OK in file analyzeFuncs.cpp near line 4587
call 'Setting kernel argument:set_mem_kernel_cl' is finished OK in file analyzePoT.cpp near line 1299
call 'Enqueueing kernel:set_mem_kernel_cl' is finished OK in file analyzePoT.cpp near line 1306
call 'Setting kernel argument:pc_triplet_find_cl' is finished OK in file analyzePoT.cpp near line 1357
call 'Enqueueing kernel:pc_triplet_find_cl' is finished OK in file analyzePoT.cpp near line 1368
call 'Setting kernel argument:set_mem_kernel_cl' is finished OK in file analyzePoT.cpp near line 1455
call 'Enqueueing kernel:set_mem_kernel_cl' is finished OK in file analyzePoT.cpp near line 1462
call 'Setting kernel argument:PC_find_pulse_partial_kernel_cl' is finished OK in file analyzePoT.cpp near line 1792
call 'Setting kernel argument:PC_find_pulse_partial_kernel_cl,offset' is finished OK in file analyzePoT.cpp near line 1813
call 'Setting kernel argument:PC_find_pulse_partial_kernel_cl,pass 3' is finished OK in file analyzePoT.cpp near line 1858
call 'clEnqueueMarker' is finished OK in file analyzePoT.cpp near line 1860
call 'Enqueueing kernel:PC_find_pulse_partial_kernel_cl,pass 3' is finished OK in file analyzePoT.cpp near line 1877
call 'Setting kernel argument:PC_find_pulse_partial_kernel_cl,pass 4' is finished OK in file analyzePoT.cpp near line 2074
call 'Enqueueing kernel:PC_find_pulse_partial_kernel_cl,pass 4' is finished OK in file analyzePoT.cpp near line 2089
call 'Setting kernel argument:PC_find_pulse_partial_kernel_cl,pass 5' is finished OK in file analyzePoT.cpp near line 2268
call 'Enqueueing kernel:PC_find_pulse_partial_kernel_cl,pass 5' is finished OK in file analyzePoT.cpp near line 2283
MBv8_8.18r3552_NV-Verbose_ssse3_x86_64-apple-darwin(34606,0x700000122000) malloc: *** error for object 0x13351a000: pointer being freed was not allocated
*** set a breakpoint in malloc_error_break to debug
SIGABRT: abort called

Crashed executable name: MBv8_8.18r3552_NV-Verbose_ssse3_x86_64-apple-darwin
Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1827710 - Posted: 31 Oct 2016, 12:41:12 UTC - in response to Message 1827708.  

Is the crash deterministic now?
Will a second run (of the same binary) show exactly the same log or a different one?
Try the non-SoG build, for example (it's the more deterministic one, though it seems to crash somewhere in PulseFind).
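One way to answer the determinism question mechanically: mask the numbers that legitimately change between runs, then compare the last few lines of two logs. This sketch assumes that only the PID, thread ID and heap addresses differ between runs (as in the malloc lines in these logs); the helper names are made up for illustration:

```python
import re
from typing import Sequence

# Assumed run-specific patterns, taken from the malloc lines in this thread.
_PID_TID = re.compile(r"\(\d+,0x[0-9a-fA-F]+\)")  # e.g. "(34650,0x7000001a5000)"
_HEX = re.compile(r"0x[0-9a-fA-F]+")               # e.g. "0x12501f000"


def normalize(line: str) -> str:
    """Hypothetical helper: mask PID/thread ID and hex addresses in one log line."""
    line = _PID_TID.sub("(PID,TID)", line)
    return _HEX.sub("0xADDR", line).rstrip()


def same_crash_site(run_a: Sequence[str], run_b: Sequence[str], tail: int = 8) -> bool:
    """True if the last `tail` lines of both runs match after masking."""
    if tail < 1:
        raise ValueError("tail must be at least 1")
    return [normalize(ln) for ln in run_a[-tail:]] == [normalize(ln) for ln in run_b[-tail:]]
```

Source file names and line numbers ("analyzePoT.cpp near line 2699") are deliberately left unmasked, since those are exactly what should stay the same if the crash is deterministic.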
TBar
Volunteer tester

Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1827715 - Posted: 31 Oct 2016, 13:13:04 UTC - in response to Message 1827710.  
Last modified: 31 Oct 2016, 13:44:56 UTC

The SoG build crashed in the same place, even though the numbers in the crash are different. The non-SoG build will take a while to finish.

Build features: SETI8 Non-graphics OpenCL USE_OPENCL_NV OCL_VERBOSE OCL_ZERO_COPY SIGNALS_ON_GPU OCL_CHIRP3 FFTW SSSE3 64bit
call 'non-strip fft' is finished OK in file analyzeFuncs.cpp near line 4235
call 'Enqueueing kernel:pc_triplet_find_cl' is finished OK in file analyzePoT.cpp near line 1368
call 'Setting kernel argument:set_mem_kernel_cl' is finished OK in file analyzePoT.cpp near line 1455
call 'Enqueueing kernel:set_mem_kernel_cl' is finished OK in file analyzePoT.cpp near line 1462
call 'Setting kernel argument:PC_find_pulse_kernel_cl' is finished OK in file analyzePoT.cpp near line 2640
call 'Setting kernel argument:PC_find_pulse_kernel_cl,offset' is finished OK in file analyzePoT.cpp near line 2697
call 'clEnqueueMarker' is finished OK in file analyzePoT.cpp near line 2699
MBv8_8.18r3553_NV-SoG-Verbose_ssse3_x86_64-apple-darwin(35414,0x700000122000) malloc: *** error for object 0x1256e9000: pointer being freed was not allocated
*** set a breakpoint in malloc_error_break to debug
SIGABRT: abort called

Crashed executable name: MBv8_8.18r3553_NV-SoG-Verbose_ssse3_x86_64-apple-darwin

It seems the non-SoG build crashed earlier, but with the same couple of lines preceding the crash.
Elapsed Time : ……………………………… 1566 seconds
call 'Setting kernel argument:PC_find_pulse_partial_kernel_cl' is finished OK in file analyzePoT.cpp near line 1792
call 'Setting kernel argument:PC_find_pulse_partial_kernel_cl,offset' is finished OK in file analyzePoT.cpp near line 1813
call 'Setting kernel argument:PC_find_pulse_partial_kernel_cl,pass 3' is finished OK in file analyzePoT.cpp near line 1858
call 'Enqueueing kernel:PC_find_pulse_partial_kernel_cl,pass 3' is finished OK in file analyzePoT.cpp near line 1877
call 'ReadBuffer(gpu_result_flag,pulse)' is finished OK in file analyzePoT.cpp near line 1961
call 'clGetEventProfilingInfo' is finished OK in file ../../src/GPU_lock.cpp near line 546
call 'Setting kernel argument:PC_find_pulse_partial_kernel_cl,pass 4' is finished OK in file analyzePoT.cpp near line 2074
call 'Enqueueing kernel:PC_find_pulse_partial_kernel_cl,pass 4' is finished OK in file analyzePoT.cpp near line 2089
call 'ReadBuffer(gpu_result_flag,pulse)' is finished OK in file analyzePoT.cpp near line 2172
call 'clGetEventProfilingInfo' is finished OK in file ../../src/GPU_lock.cpp near line 546
call 'Setting kernel argument:PC_find_pulse_partial_kernel_cl,pass 5' is finished OK in file analyzePoT.cpp near line 2268
call 'Enqueueing kernel:PC_find_pulse_partial_kernel_cl,pass 5' is finished OK in file analyzePoT.cpp near line 2283
MBv8_8.18r3552_NV-Verbose_ssse3_x86_64-apple-darwin(35370,0x700000122000) malloc: *** error for object 0x12e020000: pointer being freed was not allocated
*** set a breakpoint in malloc_error_break to debug
SIGABRT: abort called

Crashed executable name: MBv8_8.18r3552_NV-Verbose_ssse3_x86_64-apple-darwin
Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1827724 - Posted: 31 Oct 2016, 14:30:13 UTC - in response to Message 1827689.  
Last modified: 31 Oct 2016, 14:32:05 UTC


[What I find particularly ironic is conversing with Francois himself, and him never raising said problems. Maybe Intel are so big, they don't know their arses from their elbows]

Perhaps because it's not an Intel issue but a GPL issue.
If a binary contains functions without sources, only in binary form, that's OK for Intel.
But the same isn't OK under the GPL.
So the GPL's way of "defending freedom" costs the SETI project a few dozen percent of performance.
It would be good if GPL adherents realised that fact. The prohibition on linking non-GPL code into the same binary comes from the GPL, not from Intel.
Unfortunately, SETI's own license explicitly allows linking to binary FFT libraries (which, by the way, makes its license not fully GPL-compliant). But, as I said in my first post on this topic, circumventing Intel's fraudulent practice requires static rather than dynamic linkage.
Perhaps, to solve this, one could develop a separate DLL (call it iFFT), statically link all the needed FFT functions from IPP into it (SIMD-dependent), and then link the main SETI binary to that DLL. In this roundabout way the GPL would be circumvented through the explicit change Berkeley made to its license. But currently I have no time for such perversions...
Jeff Buck (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester

Joined: 11 Feb 00
Posts: 1441
Credit: 148,764,870
RAC: 0
United States
Message 1827756 - Posted: 31 Oct 2016, 18:25:13 UTC - in response to Message 1827671.  

Hi Jeff,
If you can hoard that along with others for in depth analysis at a later date, that would be great. That particular Gaussian scenario looks quite similar to X-branch Pre-v8 migration, where the main codebase required dialling in a few compiler options.

Well, Jason, I can't say that I'm "hoarding" many of these, but I did grab this one and just emailed it to you, or at least to your junk mail folder. ;^) If there are other specific ones you want, let me know and I'll see if I saved them.

The problem, as I see it, of saving many of these WUs for a "later date", rather than for immediate use, is that the collection rapidly loses context. Then I find myself with a bunch of WU files and no recollection of why, specifically, I saved them, even (or, perhaps, especially) if they're floating in a cloud. It seems like if they're going to be saved for future use, whether or not it's in a central repository, there probably needs to be some standard system for tying them back to the original reason they were saved.
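A minimal sketch of such a system, assuming a small JSON "sidecar" file stored next to each saved WU file. Every field name here is hypothetical, chosen only to record why the WU was saved and which results and app versions were involved:

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class SavedWU:
    """Hypothetical metadata record kept alongside a saved WU file."""
    wu_file: str      # the .wu file name as saved
    reason: str       # one sentence: why this WU was kept
    saved_utc: str    # ISO-8601 date/time it was saved
    result_ids: List[int] = field(default_factory=list)  # results that prompted saving it
    apps: List[str] = field(default_factory=list)        # app versions that disagreed


def to_json(record: SavedWU) -> str:
    """Serialize a record; sorted keys keep files diff-friendly."""
    return json.dumps(asdict(record), indent=2, sort_keys=True)


def from_json(text: str) -> SavedWU:
    """Rebuild a record from a sidecar file's contents."""
    return SavedWU(**json.loads(text))
```

Writing to_json(record) to a file named after the WU (for example with a .json suffix) keeps the reason attached to the file wherever it ends up, including a shared cloud folder.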
Jeff Buck (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester

Joined: 11 Feb 00
Posts: 1441
Credit: 148,764,870
RAC: 0
United States
Message 1827760 - Posted: 31 Oct 2016, 18:40:41 UTC - in response to Message 1827695.  
Last modified: 31 Oct 2016, 18:42:14 UTC

Jeff, when you hoard them, could you PM me your email so I can add you to the Google Drive folder I have set up, so you can upload data to it? Right now everyone has read-only access and only a select few have permission to upload and delete.
And I would like to keep it that way, in case someone decides to delete everything if I were to give the public full permissions.

As I just replied to Jason, I haven't really grabbed that many of these. Mostly I've just been providing the links in my posts and my Inconclusives lists so those who want to test with them can grab them themselves. When necessary, I've uploaded a few to my own Amazon cloud drive but, as I also just mentioned to Jason, it seems to me that those WU files rapidly lose context when they're saved in anticipation of possible future testing, rather than a more immediate use.
Kiska
Volunteer tester

Joined: 31 Mar 12
Posts: 302
Credit: 3,067,762
RAC: 0
Australia
Message 1827772 - Posted: 31 Oct 2016, 20:24:05 UTC - in response to Message 1827760.  
Last modified: 31 Oct 2016, 20:24:23 UTC

As I just replied to Jason, I haven't really grabbed that many of these. Mostly I've just been providing the links in my posts and my Inconclusives lists so those who want to test with them can grab them themselves. When necessary, I've uploaded a few to my own Amazon cloud drive but, as I also just mentioned to Jason, it seems to me that those WU files rapidly lose context when they're saved in anticipation of possible future testing, rather than a more immediate use.


Then are you able to provide the program that you made to generate the inconclusive list?
I do not have the time to make my own program in that regard.
Jeff Buck (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester

Joined: 11 Feb 00
Posts: 1441
Credit: 148,764,870
RAC: 0
United States
Message 1827815 - Posted: 1 Nov 2016, 2:03:44 UTC - in response to Message 1827772.  

Then are you able to provide the program that you made to generate the inconclusive list?
I do not have the time to make my own program in that regard.

I don't think I understand what your goal would be here. I automated my Inconclusive list generation because of the volume of WUs in an Inconclusive state that are typically active on my crunchers at any given time. Last night's list had 143 WUs on it. The list summarizes those in just a few minutes in a manner that would take me several hours to pull up manually. Then I, or anyone else, can more easily skim through the list to identify potential candidates for further research. I just looked at your hosts and only see 4 Inconclusives currently.

I had at one time considered automatically downloading all the associated WU files each time I generated a new list. My routine has that capability. However, after taking the time to pick through those lists over several days, I realized that only a tiny fraction are likely to warrant further investigation, so downloading all of them would seem to be a waste of resources. I've tried to post examples of some of those, to the extent that something looks odd to me, but I'm not one of the developers, so I'm not entirely sure if I'm helping them or annoying them each time I post. It would be better if they could do their own filtering, but even with a list to work from, that takes time.

If I were convinced that it would be significantly useful to provide a program to others that could generate a similar list for their own crunchers, I could probably do so. As it exists currently, however, the routine that I use is just part of a larger program that I use to maintain a local DB of my own WUs. The initial Inconclusive WU identification to drive the routine is done internally from that DB, which would not be possible on anybody else's machine.
Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1827841 - Posted: 1 Nov 2016, 8:04:38 UTC - in response to Message 1827815.  

I'm not one of the developers, so I'm not entirely sure if I'm helping them or annoying them each time I post.

Depends on the mood :D
More seriously: if you have such a large number of inconclusives, could you try the latest posted build under anonymous platform and estimate the average reduction in inconclusives after, let's say, a few days of use?
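The estimate asked for here comes down to comparing two rates. A small sketch, assuming "rate" means inconclusive results divided by all results returned over comparable periods; the function names are illustrative only:

```python
def inconclusive_rate(inconclusive: int, returned: int) -> float:
    """Fraction of returned results that ended up inconclusive."""
    if returned <= 0:
        raise ValueError("returned must be positive")
    if not 0 <= inconclusive <= returned:
        raise ValueError("inconclusive must be between 0 and returned")
    return inconclusive / returned


def relative_reduction(before: float, after: float) -> float:
    """(before - after) / before; 0.4 means a 40% lower inconclusive rate."""
    if before <= 0:
        raise ValueError("before must be positive")
    return (before - after) / before
```

As a purely hypothetical example, 30 inconclusives out of 600 returned before and 15 out of 500 after give rates of 5% and 3%, a relative reduction of about 40%. With only a few days of data the counts are small, so a change of a few WUs (or a different mix of VLAR and overflow tasks) can move the estimate noticeably.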
Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1827847 - Posted: 1 Nov 2016, 8:39:23 UTC - in response to Message 1827715.  
Last modified: 1 Nov 2016, 8:41:15 UTC

Well, it seems that OS X doesn't like the new profiling abilities of the OpenCL app.
I had a similar situation with iGPU on Windows, so currently the iGPU path has all adaptation disabled. This could explain why the iGPU build doesn't crash on NV hardware under OS X.
The iGPU path doesn't enqueue all those barriers and doesn't read all those events.
So a short-term solution would be to add OS X to a similar exclusion throughout the sources.
In the long term it would be good to solve this "aversion" to the more modern approach.
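The proposed short-term fix amounts to a small decision table. It is written in Python here only to make the logic explicit; in the actual C++ sources this would more likely be a preprocessor condition (for example on the `__APPLE__` macro), and the names and values below are hypothetical:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Target:
    """Hypothetical description of a build target."""
    os_name: str   # assumed values: "windows", "linux", "darwin"
    gpu_path: str  # assumed values: "intel", "nv", "ati"


def use_adaptive_profiling(t: Target) -> bool:
    """Whether the extra barriers and event reads (the 'adaptation') would be enabled."""
    if t.gpu_path == "intel":
        return False  # existing exclusion: the iGPU path has adaptation disabled
    if t.os_name == "darwin":
        return False  # proposed short-term exclusion for OS X
    return True
```

Keeping both exclusions in one predicate means the long-term fix for OS X would only need to remove one branch.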
Jeff Buck (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester

Joined: 11 Feb 00
Posts: 1441
Credit: 148,764,870
RAC: 0
United States
Message 1827938 - Posted: 2 Nov 2016, 5:49:24 UTC - in response to Message 1827841.  
Last modified: 2 Nov 2016, 5:53:17 UTC

I'm not one of the developers, so I'm not entirely sure if I'm helping them or annoying them each time I post.

Depends on the mood :D

Heh, heh. I think I've noticed that. ;^)

More seriously: if you have such a large number of inconclusives, could you try the latest posted build under anonymous platform and estimate the average reduction in inconclusives after, let's say, a few days of use?

Okay, I've just started running r3548 on my host 8064262, so we'll see how it goes. (BTW, I noticed you didn't include the MultiBeam_Kernals_r3548.cl file in the aistub, although I assume that it's needed in the app_info.)
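A missing file like this can be caught before the app is started. A minimal sketch, assuming the usual anonymous-platform layout (`<file_info><name>` entries plus `<file_ref><file_name>` entries inside `<app_version>`); the helper name is hypothetical, and the file names to check would come from the actual aistub:

```python
import xml.etree.ElementTree as ET
from typing import Iterable, List, Set


def _names(root: ET.Element, path: str) -> Set[str]:
    """Collect the stripped text of every element matching `path`."""
    return {el.text.strip() for el in root.findall(path) if el.text and el.text.strip()}


def missing_from_app_info(app_info_xml: str, required: Iterable[str]) -> List[str]:
    """Hypothetical check: required files that are not both declared and referenced."""
    root = ET.fromstring(app_info_xml)
    declared = _names(root, "./file_info/name")
    referenced = _names(root, "./app_version/file_ref/file_name")
    return sorted(name for name in required if name not in declared or name not in referenced)
```

Requiring both a file_info entry and a file_ref catches the two ways a kernel file can be left out of an app_info.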

Also, that large number of Inconclusives is the total for all 6 of my active hosts. It includes not only the WUs which are currently shown in an Inconclusive state for my hosts, but also existing Inconclusive WUs for which my hosts have been sent the potential tiebreaker tasks and haven't yet completed them.

If I look just at the 8064262 box, there are currently just 49 completed tasks in an Inconclusive state, of which 16 are still hanging around from r3500. Of the remaining 33, just over half are against either wayward hosts (with lots of Invalids), Intel GPUs (which I assume can be ignored until v8.20 comes along), or a Petri Special prior to x41p_zi3k. The remaining Inconclusives for r3528 break down as follows:

Non-overflow
r3528 vs. stock Windows CPU = 3 WUs

Overflow
r3528 vs. stock Windows CPU = 9 WUs
r3528 vs. Cuda50 = 2 WUs
r3528 vs. r3528 = 2 WUs

I would think that those are the types of Inconclusives that I should keep an eye out for with r3548. Is that a valid assumption?
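A breakdown in this shape can be produced mechanically from a list of inconclusive records. This is a minimal sketch with a hypothetical record type, not the routine used to produce the list above (which runs against a private local DB); the categories simply mirror the ones in that list:

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Inconclusive:
    """Hypothetical record for one inconclusive WU."""
    wu_id: int        # workunit ID, used to count each WU only once
    overflow: bool    # True for overflow results
    our_app: str      # e.g. "r3528"
    other_app: str    # e.g. "stock Windows CPU"


def format_breakdown(items: Iterable[Inconclusive]) -> str:
    """Group by overflow/non-overflow, then by app pairing, largest counts first."""
    unique = {i.wu_id: i for i in items}.values()  # last record per WU wins
    counts = Counter(
        ("Overflow" if i.overflow else "Non-overflow", f"{i.our_app} vs. {i.other_app}")
        for i in unique
    )
    lines = []
    for group in ("Non-overflow", "Overflow"):
        rows = sorted(((pair, n) for (g, pair), n in counts.items() if g == group),
                      key=lambda row: (-row[1], row[0]))
        if rows:
            lines.append(group)
            lines.extend(f"{pair} = {n} WU{'' if n == 1 else 's'}" for pair, n in rows)
    return "\n".join(lines)
```

Deduplicating by WU ID matters when the same WU shows up twice, for example once as a completed task and once as a pending tiebreaker.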
Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1827939 - Posted: 2 Nov 2016, 6:00:04 UTC - in response to Message 1827938.  


Overflow
r3528 vs. stock Windows CPU = 9 WUs
r3528 vs. Cuda50 = 2 WUs
r3528 vs. r3528 = 2 WUs

I would think that those are the types of Inconclusives that I should keep an eye out for with r3548. Is that a valid assumption?

I think so. Mostly overflows, including those against CPU and OpenCL NV/ATi.
The latest changes should affect only overflows; the validation rate for non-overflows should remain the same as in r3528.
TBar
Volunteer tester

Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1827942 - Posted: 2 Nov 2016, 6:38:31 UTC - in response to Message 1827847.  

Well, it seems that OS X doesn't like the new profiling abilities of the OpenCL app.
I had a similar situation with iGPU on Windows, so currently the iGPU path has all adaptation disabled. This could explain why the iGPU build doesn't crash on NV hardware under OS X.
The iGPU path doesn't enqueue all those barriers and doesn't read all those events.
So a short-term solution would be to add OS X to a similar exclusion throughout the sources.
In the long term it would be good to solve this "aversion" to the more modern approach.

Do you really need to change anything? The only way you can tell it's built with the Intel path is by looking at the build features. All the versions I've built work just fine in Anonymous platform configured for NVIDIA. What would you need to change to run at Beta? The best version has this line:
Build features: SETI8 Non-graphics OpenCL USE_OPENCL_INTEL OCL_ZERO_COPY OCL_CHIRP3 FFTW JSPF SSSE3 64bit
It seems to be running just as fast as the Windows SoG version on similar GPUs.
The web pages aren't updating, so this latest version isn't showing up at Beta yet.
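Since the build features line is the only visible sign of the Intel path, that check can be scripted against a saved stderr. A minimal sketch with hypothetical helper names; it only reports the `USE_OPENCL_*` flags it finds and assumes no mapping beyond the two flag names seen in this thread (USE_OPENCL_INTEL and USE_OPENCL_NV):

```python
from typing import List, Set


def build_features(stderr_text: str) -> Set[str]:
    """Hypothetical helper: flags from the first 'Build features:' line, or an empty set."""
    prefix = "Build features:"
    for line in stderr_text.splitlines():
        line = line.strip()
        if line.startswith(prefix):
            return set(line[len(prefix):].split())
    return set()


def opencl_paths(features: Set[str]) -> List[str]:
    """Return the USE_OPENCL_* flags, sorted, e.g. ['USE_OPENCL_INTEL']."""
    return sorted(f for f in features if f.startswith("USE_OPENCL_"))
```

Returning a set of flags also makes it easy to check for others, such as OCL_VERBOSE, before posting a log.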
Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1827984 - Posted: 2 Nov 2016, 18:14:53 UTC - in response to Message 1827942.  

Well, what will happen if a host contains both iGPU and NV devices?
If the "iGPU" build will accept NV as its destination when an iGPU is also present in the system (not just an NV GPU), then we could live with such a hack. But I'm afraid the app will switch to the iGPU in that case. Do we have any hardware to check this?
TBar
Volunteer tester

Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1827994 - Posted: 2 Nov 2016, 18:59:33 UTC - in response to Message 1827984.  

In my experience it wouldn't be a problem if a <coproc> section is present. I have that problem when running standalone, when there isn't a <coproc> section. My ATI App always runs on my nVidia card when in standalone; it doesn't happen when running in BOINC with a <coproc> entry. So, I don't think it would be any different from what already happens. Of course, you would have two different Apps with different version numbers and coproc names, which would ensure each App had its own Wisdom and Kernel files. I don't see a problem.

Look at the stderr of the App run in BOINC; it's different from the same App run in standalone:
http://setiathome.berkeley.edu/result.php?resultid=5260876988
Number of OpenCL devices found : 3
BOINC assigns slot on device #3 of 3 devices.
Info: BOINC provided OpenCL device ID used

versus
18:10:03 (74234): Can't set up shared mem: -1. Will run in standalone mode.
Running on device number: 0
GPU not found: type=intel_gpu, opencl_device_index=-1, device_num=0
WARNING: boinc_get_opencl_ids failed with code -1
OpenCL platform detected: Apple
WARNING: BOINC supplied wrong platform!
Number of OpenCL devices found : 3
BOINC assigns slot on device #1 of 3 devices.
WARNING: BOINC failed to provide OpenCL device, using own enumeration abilities
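A rough model of the two paths visible in these stderr excerpts: use the device BOINC supplies if there is one, otherwise fall back to a position in the app's own enumeration. The app's real selection code isn't shown in this thread, so this is only an illustration (with hypothetical names) of why a fallback by position alone can land on the wrong vendor's card. The log appears to print positions 1-based (device number 0 shows up as "#1 of 3"); the model uses 0-based indices:

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Device:
    """Hypothetical stand-in for one enumerated OpenCL device."""
    vendor: str  # e.g. "NVIDIA", "AMD", "Intel"


def pick_device(devices: List[Device], boinc_index: Optional[int],
                device_num: int = 0) -> Tuple[int, str]:
    """Return the chosen 0-based index and a note on how it was chosen (model only)."""
    if boinc_index is not None and 0 <= boinc_index < len(devices):
        return boinc_index, "BOINC provided OpenCL device ID used"
    if not 0 <= device_num < len(devices):
        raise IndexError("device_num out of range")
    # Fallback: choose by position only, with no vendor check.
    return device_num, "own enumeration"
```

On a hypothetical host enumerated as two NVIDIA cards followed by one AMD card, the fallback with device_num 0 picks an NVIDIA card even for an ATI build, which matches the standalone behaviour described above.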


I still haven't heard a word on whether the App works correctly on an iGPU.
Now there have been 2 copies downloaded, http://www.arkayn.us/forum/index.php?topic=191.msg4498#msg4498
Richard Haselgrove (Project Donor)
Volunteer tester

Joined: 4 Jul 99
Posts: 14653
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1827997 - Posted: 2 Nov 2016, 19:21:29 UTC - in response to Message 1827994.  

In my experience it wouldn't be a problem if a <coproc> section is present. I have that problem when running in standalone when there isn't a <coproc> section. My ATI App always runs on my nVidia card when in standalone, it doesn't happen when running in BOINC with a <coproc> entry.

Your testbed should have a minimal init_data.xml file, and you should adjust it as necessary to guide the application under test onto the right GPU. I was able to test the Windows intel_gpu application on the target hardware that way - and it was a necessary test, because we were trying to track down some change in the hardware and hardware-specific device driver combination, which had caused the rash of inconclusive and invalid results. Testing on different hardware, with its own (different) device driver, is insufficient.
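A minimal sketch of generating such a file for a standalone test directory. The element names (app_init_data, gpu_type, gpu_device_num, gpu_opencl_dev_index) are assumed to follow BOINC's APP_INIT_DATA and line up with the fields named in the "GPU not found: type=intel_gpu, opencl_device_index=-1, device_num=0" stderr line; a real init_data.xml carries many more fields, so check against the BOINC version in use:

```python
import xml.etree.ElementTree as ET


def minimal_init_data(gpu_type: str, device_num: int, opencl_dev_index: int) -> str:
    """Hypothetical helper: minimal init_data.xml body steering the app to one GPU."""
    root = ET.Element("app_init_data")
    ET.SubElement(root, "gpu_type").text = gpu_type  # e.g. "intel_gpu", "NVIDIA", "ATI"
    ET.SubElement(root, "gpu_device_num").text = str(device_num)
    ET.SubElement(root, "gpu_opencl_dev_index").text = str(opencl_dev_index)
    return ET.tostring(root, encoding="unicode")
```

Writing the returned string to init_data.xml in the directory the app runs from, with values adjusted for the target host, is the kind of adjustment described above.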
TBar
Volunteer tester

Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1827998 - Posted: 2 Nov 2016, 19:42:56 UTC - in response to Message 1827997.  

Or you could just do as I did. Once you see the App produces the correct results and doesn't crash, just run it in BOINC. Another solution would be to just remove everything except the ATI card(s), since I don't have an iGPU that works well. Usually the only GPUs in the machine are nVidias anyway, so, I don't have that problem.
Richard Haselgrove (Project Donor)
Volunteer tester

Joined: 4 Jul 99
Posts: 14653
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1827999 - Posted: 2 Nov 2016, 19:47:23 UTC - in response to Message 1827998.  

It's still a technique worth remembering, in case your testing needs change in the future.
 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.