Posts by -= Vyper =-

21) Message boards : Number crunching : Monitoring inconclusive GBT validations and harvesting data for testing (Message 1823219)
Posted 10 Oct 2016 by Profile -= Vyper =-Project Donor
Post:
Sweet, sweet. Well, we can now conclude that we can bin the precision questions I've raised lately.

A lot more time spent for no gain is hard to justify! Thanks, Richard and Raistmer, for this.
Shall we enter the next phase, then? I'm floating an idea now.

Question:
Should there be a double-precision variant of the CPU executable, plus a set of WUs, that would of course be slow as hell to calculate, but would give us so much precision that it sets the "gold standard" Q=100 on the .res files and becomes the reference that every other optimised and production executable tries to get as near Q=100 as possible to?
That application is not meant for end users; I'm talking about "this is the best result that can ever be calculated" for every WU out there, used by you optimisers and the S@H crew themselves as the origin and definition of what "perfect" would be!
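As a rough sketch of why a double-precision pass could serve as that reference (my own illustration, not from the thread): Python floats are IEEE 754 binary64, and we can emulate a single-precision build by rounding every intermediate through binary32, then watch it drift away from the double-precision "gold standard". The accumulation pattern and the values are made up for demonstration only.

```python
import struct

def to_f32(x):
    # Round a Python double to the nearest IEEE 754 binary32 value,
    # emulating what a single-precision build would compute.
    return struct.unpack("f", struct.pack("f", x))[0]

def sum_f64(values):
    # Reference pass: plain double-precision accumulation.
    acc = 0.0
    for v in values:
        acc += v
    return acc

def sum_f32(values):
    # Production-style pass: every intermediate is rounded to binary32.
    acc = to_f32(0.0)
    for v in values:
        acc = to_f32(acc + to_f32(v))
    return acc

# One large term followed by many small ones: in binary32 the small
# terms are below half an ulp of the running sum and vanish entirely.
data = [1.0e8] + [0.1] * 10000
ref = sum_f64(data)   # the "gold standard" result
fast = sum_f32(data)  # what the single-precision app reports
print(ref, fast)
```

The gap between `ref` and `fast` is exactly the kind of drift a Q score measured against a double-precision reference would pick up.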

Ok, part two then:
Then we have the validator to address: inventing a golden standard for how, and in what order, returned data is sorted!
As it is today, it "seems like" that if we take a garbled WU and send it to one CPU, an older GPU code and a newer GPU code, we get different results.
We all know we have that limit on allowed storage space (30 signals).
If we imagine removing that limit and digging through the whole WU, we might find: 78 spikes, 112 pulses, 7 triplets, etc.
As it is today, the calculation stops when it reaches 30 detections and sends the result back.
The CPU starts processing at 0 and runs linearly to 100, and along the way it finds, say, 20 pulses, 8 spikes and 2 triplets, in this order:

PPSPSSPPPTPPPSSPPPSPPPPTPPSSPP.... boom 1870 seconds spent on the linear cpu.

Now we take the old GPU code, which is sped up significantly but still "serial" even though it can calculate portions faster, and it produces:

PPSPSSPPPTPPPSSPPPSPPPPTPPSSPP.... boom, it stops 165 seconds in with the same result, since it is a straight CPU-to-GPU port and the code hasn't evolved beyond a regular port.

Ok, then let's move on to the other, new executable that speeds things up.

PPSPSTPPPSPSSSTPPPSSPPSPPSPSPP... boom, it stops 45 seconds in and sends this back.

Now this looks wrong to the validator, because it differs so much in the numbers found and in their order. But in reality, if we removed the 30-signal limit and let every variant crunch through the whole WU, all would find the same totals: 78 spikes, 112 pulses, 7 triplets. The fastest executable finds them in a different order, but the value at every measured point is correct.
As it stands today, the last executable's returned data gets an "inconclusive" mark, and the inconclusive rate is of course higher.

Until someone makes a multicore version of the S@H CPU executable, exactly the way Petri seems to have done for the GPU version, these "inconclusive result" numbers will stay high.
If that were done and BOINC knew about it, then a 12-core CPU would start only one task, but would process it much faster with 100% utilisation on all cores and finish in a short time. The validator would then match the latest CPU code against the latest GPU code, because they share the same processing pattern, and inconclusives would drop to perhaps 10/1000 instead of the 150/1000 we see today.

The more parallel the execution, the more diversity in inconclusives will occur.
Now, can this disparity be fixed until the CPU code catches up and goes multicore?! I don't know. Only you optimisers do!
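The serial-versus-parallel scenario above can be sketched in a few lines (my own toy model, not project code): both scans are "correct" against the same full signal set, but once a 30-signal cap is applied, the order in which work chunks finish decides which subset gets reported.

```python
import random

REPORT_LIMIT = 30  # the storage-space cap discussed above

def serial_scan(detections, limit=REPORT_LIMIT):
    # Linear CPU-style pass: always reports the first `limit`
    # signals in data order, then stops.
    return detections[:limit]

def parallel_scan(detections, limit=REPORT_LIMIT, seed=1):
    # Toy model of a parallel search: work chunks complete in a
    # nondeterministic order, so a *different* subset of the same
    # detections reaches the 30-signal cap first.
    shuffled = detections[:]
    random.Random(seed).shuffle(shuffled)
    return shuffled[:limit]

# 78 spikes, 112 pulses, 7 triplets, as in the thought experiment above.
full = ([("S", i) for i in range(78)]
        + [("P", i) for i in range(112)]
        + [("T", i) for i in range(7)])

serial = serial_scan(full)
par = parallel_scan(full)
# Both results are valid prefixes of the same full detection set,
# yet their contents and ordering can differ, which is what the
# validator flags as inconclusive today.
```

Under this model, no per-signal value is wrong in either result; only the sampled subset differs.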
22) Message boards : Number crunching : Monitoring inconclusive GBT validations and harvesting data for testing (Message 1822153)
Posted 6 Oct 2016 by Profile -= Vyper =-Project Donor
Post:
Validation inconclusive (164)
Quite impressive number. Is it OK enough or not OK - worth to check.

15% Inconclusive. Not good. Actually, very bad IMHO.


Well, in my world it is not that easy, because Petri's code seems to work on workunits more like the picture below. That way, the information sent back, especially for -9 units, will almost always fall into the inconclusive state for sure.

CPUs and older code seem to work more like pixel by pixel, top left to top right and then down a notch to the next row (metaphorically speaking), so it's not crazy at all that the inconclusive rate is higher than normal. What really matters is that the raw data returned is calculated correctly, and it now seems that in 99.9% of cases it is; the rest gets binned as invalid!

For how I believe Petri's code works, I include this picture as a theoretical explanation of how it seems to process data. Only Petri can tell more here, but if it works exactly this way, it is no wonder information is presented differently to the validator, and it takes a second/third opinion before being ruled out, even when the results do match.

I've written numerous times now that Petri has stated that his offline testbed of WUs all falls in the Q99+ range ("Strongly similar" status), unless he has seen something in recent weeks that behaves differently.


23) Message boards : Number crunching : Monitoring inconclusive GBT validations and harvesting data for testing (Message 1821936)
Posted 5 Oct 2016 by Profile -= Vyper =-Project Donor
Post:
Can you re-run offline and check if overflow repeats?


I don't have the WU! But the machine is running nonstop 24/7 and hasn't been rebooted or anything. It has calculated more WUs after the invalid one without a reboot.
24) Message boards : Number crunching : Monitoring inconclusive GBT validations and harvesting data for testing (Message 1821934)
Posted 5 Oct 2016 by Profile -= Vyper =-Project Donor
Post:
Yes, I know. But the code cannot fix how the validator judges things in the first place. All the various offline MB WUs were at 99+% the last time I spoke to Petri, but the one I found was clearly off the charts!
25) Message boards : Number crunching : Monitoring inconclusive GBT validations and harvesting data for testing (Message 1821862)
Posted 5 Oct 2016 by Profile -= Vyper =-Project Donor
Post:
This host of mine has no errors as of yet. (EDIT: those three, if you see them, were me forgetting to put the file in the right directory.)

"Consecutive valid tasks 2990"

http://setiathome.berkeley.edu/results.php?hostid=8053171
26) Message boards : Number crunching : Monitoring inconclusive GBT validations and harvesting data for testing (Message 1821861)
Posted 5 Oct 2016 by Profile -= Vyper =-Project Donor
Post:
I found one!!!

http://setiathome.berkeley.edu/workunit.php?wuid=2279637413

SOG VERSION


Work Unit Info:
...............
Credit multiplier is : 2.85
WU true angle range is : 0.006367
Used GPU device parameters are:
Number of compute units: 12
Single buffer allocation size: 128MB
Total device global memory: 3072MB
max WG size: 1024
local mem type: Real
FERMI path used: yes
LotOfMem path: yes
LowPerformanceGPU path: no
period_iterations_num=50
Spike: peak=24.61833, time=5.727, d_freq=1352321279.39, chirp=-5.9643, fft_len=128k
Spike: peak=26.11531, time=5.727, d_freq=1352321279.39, chirp=-5.9656, fft_len=128k
Spike: peak=26.40693, time=5.727, d_freq=1352321279.38, chirp=-5.9669, fft_len=128k
Spike: peak=25.41258, time=5.727, d_freq=1352321279.37, chirp=-5.9681, fft_len=128k
Spike: peak=24.49572, time=5.727, d_freq=1352321279.39, chirp=-5.9796, fft_len=128k
Spike: peak=25.01148, time=5.727, d_freq=1352321279.39, chirp=-5.9808, fft_len=128k
Spike: peak=24.24306, time=5.727, d_freq=1352321279.38, chirp=-5.9821, fft_len=128k
Pulse: peak=5.914622, time=45.86, period=13.15, d_freq=1352325185.62, score=1.061, chirp=-8.9471, fft_len=1024
Pulse: peak=2.288692, time=45.84, period=3.867, d_freq=1352320459.77, score=1.002, chirp=-9.1617, fft_len=512
Pulse: peak=3.762496, time=45.86, period=9.015, d_freq=1352323815.94, score=1.004, chirp=-13.957, fft_len=1024
Spike: peak=24.19372, time=85.9, d_freq=1352327150.67, chirp=23.385, fft_len=128k
Spike: peak=24.54026, time=85.9, d_freq=1352327150.67, chirp=23.39, fft_len=128k
Pulse: peak=9.39383, time=46.17, period=28.63, d_freq=1352321706.07, score=1.02, chirp=38.231, fft_len=8k
Pulse: peak=3.339135, time=45.84, period=7.494, d_freq=1352328860.23, score=1, chirp=-40.942, fft_len=512
Pulse: peak=5.770851, time=45.86, period=13.69, d_freq=1352323539.85, score=1.034, chirp=46.311, fft_len=1024
Pulse: peak=1.297242, time=45.82, period=1.746, d_freq=1352326113.32, score=1.011, chirp=-60.411, fft_len=256
Pulse: peak=2.637339, time=45.9, period=4.593, d_freq=1352324522.16, score=1.029, chirp=74.726, fft_len=2k

Best spike: peak=26.40693, time=5.727, d_freq=1352321279.38, chirp=-5.9669, fft_len=128k
Best autocorr: peak=17.29338, time=28.63, delay=4.1283, d_freq=1352324638.74, chirp=-27.965, fft_len=128k
Best gaussian: peak=0, mean=0, ChiSq=0, time=-2.123e+011, d_freq=0,
score=-12, null_hyp=0, chirp=0, fft_len=0
Best pulse: peak=5.914622, time=45.86, period=13.15, d_freq=1352325185.62, score=1.061, chirp=-8.9471, fft_len=1024
Best triplet: peak=0, time=-2.123e+011, period=0, d_freq=0, chirp=0, fft_len=0


Flopcounter: 3909489318052.926800

Spike count: 9
Autocorr count: 0
Pulse count: 8
Triplet count: 0
Gaussian count: 0
Wallclock time elapsed since last restart: 1148.8 seconds

class Gaussian_transfer_not_needed: total=0, N=0, <>=0, min=0 max=0
class Gaussian_transfer_needed: total=0, N=0, <>=0, min=0 max=0


class Gaussian_skip1_no_peak: total=0, N=0, <>=0, min=0 max=0
class Gaussian_skip2_bad_group_peak: total=0, N=0, <>=0, min=0 max=0
class Gaussian_skip3_too_weak_peak: total=0, N=0, <>=0, min=0 max=0
class Gaussian_skip4_too_big_ChiSq: total=0, N=0, <>=0, min=0 max=0
class Gaussian_skip6_low_power: total=0, N=0, <>=0, min=0 max=0


class Gaussian_new_best: total=0, N=0, <>=0, min=0 max=0
class Gaussian_report: total=0, N=0, <>=0, min=0 max=0
class Gaussian_miss: total=0, N=0, <>=0, min=0 max=0


class PC_triplet_find_hit: total=54180, N=54180, <>=1, min=1 max=1
class PC_triplet_find_miss: total=2816, N=2816, <>=1, min=1 max=1


class PC_pulse_find_hit: total=44603, N=44603, <>=1, min=1 max=1
class PC_pulse_find_miss: total=18, N=18, <>=1, min=1 max=1
class PC_pulse_find_early_miss: total=16, N=16, <>=1, min=1 max=1
class PC_pulse_find_2CPU: total=0, N=0, <>=0, min=0 max=0


class PoT_transfer_not_needed: total=54165, N=54165, <>=1, min=1 max=1
class PoT_transfer_needed: total=2832, N=2832, <>=1, min=1 max=1

GPU device sync requested... ...GPU device synched
12:51:55 (160): called boinc_finish(0)

</stderr_txt>

PETRI VERSION
Work Unit Info:
...............
WU true angle range is : 0.006367
Sigma 710
Sigma > GaussTOffsetStop: 710 > -646
Thread call stack limit is: 1k
Spike: peak=24.61833, time=5.727, d_freq=1352321279.39, chirp=-5.9643, fft_len=128k
Spike: peak=26.11531, time=5.727, d_freq=1352321279.39, chirp=-5.9656, fft_len=128k
Spike: peak=26.40693, time=5.727, d_freq=1352321279.38, chirp=-5.9669, fft_len=128k
Spike: peak=25.41257, time=5.727, d_freq=1352321279.37, chirp=-5.9681, fft_len=128k
Spike: peak=24.49572, time=5.727, d_freq=1352321279.39, chirp=-5.9796, fft_len=128k
Spike: peak=25.01146, time=5.727, d_freq=1352321279.39, chirp=-5.9808, fft_len=128k
Spike: peak=24.24305, time=5.727, d_freq=1352321279.38, chirp=-5.9821, fft_len=128k
Pulse: peak=5.914618, time=45.86, period=13.15, d_freq=1352325185.62, score=1.061, chirp=-8.9471, fft_len=1024
Pulse: peak=2.288692, time=45.84, period=3.867, d_freq=1352320459.77, score=1.002, chirp=-9.1617, fft_len=512
Pulse: peak=3.762486, time=45.86, period=9.015, d_freq=1352323815.94, score=1.004, chirp=-13.957, fft_len=1024
Spike: peak=24.19372, time=85.9, d_freq=1352327150.67, chirp=23.385, fft_len=128k
Spike: peak=24.54028, time=85.9, d_freq=1352327150.67, chirp=23.39, fft_len=128k
Pulse: peak=9.393847, time=46.17, period=28.63, d_freq=1352321706.07, score=1.02, chirp=38.231, fft_len=8k
Pulse: peak=3.33913, time=45.84, period=7.494, d_freq=1352328860.23, score=1, chirp=-40.942, fft_len=512
setiathome_CUDA: Found 1 CUDA device(s):
Device 1: GeForce GTX 1080, 8112 MiB, regsPerBlock 65536
computeCap 6.1, multiProcs 20
pciBusID = 1, pciSlotID = 0
In cudaAcc_initializeDevice(): Boinc passed DevPref 1
setiathome_CUDA: CUDA Device 1 specified, checking...
Device 1: GeForce GTX 1080 is okay
SETI@home using CUDA accelerated device GeForce GTX 1080
Using pfb = 8 from command line args
Using pfp = 128 from command line args
Using unroll = 20 from command line args
Restarted at 30.47 percent, with setiathome enhanced x41p_zi3j, Cuda 8.00 special
Detected setiathome_enhanced_v7 task. Autocorrelations enabled, size 128k elements.
Sigma 710
Sigma > GaussTOffsetStop: 710 > -646
Thread call stack limit is: 1k
Find triplets Cuda kernel encountered too many triplets, or bins above threshold, reprocessing this PoT on CPU... err = 1
setiathome_CUDA: Found 1 CUDA device(s):
Device 1: GeForce GTX 1080, 8112 MiB, regsPerBlock 65536
computeCap 6.1, multiProcs 20
pciBusID = 1, pciSlotID = 0
In cudaAcc_initializeDevice(): Boinc passed DevPref 1
setiathome_CUDA: CUDA Device 1 specified, checking...
Device 1: GeForce GTX 1080 is okay
SETI@home using CUDA accelerated device GeForce GTX 1080
Using pfb = 8 from command line args
Using pfp = 128 from command line args
Using unroll = 20 from command line args
Restarted at 30.47 percent, with setiathome enhanced x41p_zi3j, Cuda 8.00 special
Detected setiathome_enhanced_v7 task. Autocorrelations enabled, size 128k elements.
Sigma 710
Sigma > GaussTOffsetStop: 710 > -646
Thread call stack limit is: 1k
Find triplets Cuda kernel encountered too many triplets, or bins above threshold, reprocessing this PoT on CPU... err = 1
setiathome_CUDA: Found 1 CUDA device(s):
Device 1: GeForce GTX 1080, 8112 MiB, regsPerBlock 65536
computeCap 6.1, multiProcs 20
pciBusID = 1, pciSlotID = 0
In cudaAcc_initializeDevice(): Boinc passed DevPref 1
setiathome_CUDA: CUDA Device 1 specified, checking...
Device 1: GeForce GTX 1080 is okay
SETI@home using CUDA accelerated device GeForce GTX 1080
Using pfb = 8 from command line args
Using pfp = 128 from command line args
Using unroll = 20 from command line args
Restarted at 30.47 percent, with setiathome enhanced x41p_zi3j, Cuda 8.00 special
Detected setiathome_enhanced_v7 task. Autocorrelations enabled, size 128k elements.
Sigma 710
Sigma > GaussTOffsetStop: 710 > -646
Thread call stack limit is: 1k
Spike: peak=97.2344, time=23.59, d_freq=1352327061.65, chirp=-29.776, fft_len=256
Spike: peak=134.1317, time=23.81, d_freq=1352324641.01, chirp=-29.776, fft_len=256
Spike: peak=240.1877, time=24.37, d_freq=1352324579.65, chirp=-29.776, fft_len=256
Spike: peak=240.1717, time=24.39, d_freq=1352324400.17, chirp=-29.776, fft_len=256
Spike: peak=256, time=24.42, d_freq=1352323684.25, chirp=-29.776, fft_len=256
Spike: peak=256, time=24.55, d_freq=1352324350.8, chirp=-29.776, fft_len=256
Spike: peak=51.20004, time=24.57, d_freq=1352325244.21, chirp=-29.776, fft_len=256
Spike: peak=240.16, time=24.6, d_freq=1352323678.92, chirp=-29.776, fft_len=256
Spike: peak=240.1455, time=24.62, d_freq=1352324572.32, chirp=-29.776, fft_len=256
Spike: peak=25.60004, time=24.68, d_freq=1352326045.54, chirp=-29.776, fft_len=256
Spike: peak=36.57143, time=24.71, d_freq=1352324614.36, chirp=-29.776, fft_len=256
Spike: peak=85.77132, time=24.8, d_freq=1352321840.08, chirp=-29.776, fft_len=256
Spike: peak=32.00062, time=24.95, d_freq=1352322729.49, chirp=-29.776, fft_len=256
Spike: peak=51.19973, time=24.98, d_freq=1352326305.1, chirp=-29.776, fft_len=256
Spike: peak=256, time=25.11, d_freq=1352324736.48, chirp=-29.776, fft_len=256
Spike: peak=256, time=25.15, d_freq=1352324958.67, chirp=-29.776, fft_len=256
Spike: peak=256, time=25.4, d_freq=1352324861.94, chirp=-29.776, fft_len=256
Spike: peak=256, time=25.69, d_freq=1352324138.02, chirp=-29.776, fft_len=256
SETI@Home Informational message -9 result_overflow
NOTE: The number of results detected equals the storage space allocated.

Best spike: peak=256, time=24.42, d_freq=1352323684.25, chirp=-29.776, fft_len=256
Best autocorr: peak=17.2934, time=28.63, delay=4.1283, d_freq=1352324638.74, chirp=-27.965, fft_len=128k
Best gaussian: peak=0, mean=0, ChiSq=0, time=-2.123e+11, d_freq=0,
score=-12, null_hyp=0, chirp=0, fft_len=0
Best pulse: peak=5.914618, time=45.86, period=13.15, d_freq=1352325185.62, score=1.061, chirp=-8.9471, fft_len=1024
Best triplet: peak=0, time=-2.123e+11, period=0, d_freq=0, chirp=0, fft_len=0

Flopcounter: 18627826515589.917969

Spike count: 27
Autocorr count: 0
Pulse count: 3
Triplet count: 0
Gaussian count: 0
00:12:49 (3124): called boinc_finish(0)

</stderr_txt>
]]>

AND DARWIN VERSION WUT? IT WORX :)

<core_client_version>7.6.22</core_client_version>
<![CDATA[
<stderr_txt>
OpenCL platform detected: Apple
Number of OpenCL devices found : 1
BOINC assigns slot on device #0.
Info: BOINC provided OpenCL device ID used

Build features: SETI8 Non-graphics OpenCL USE_OPENCL_HD5xxx OCL_CHIRP3 ASYNC_SPIKE FFTW SSE3 64bit
System: Darwin x86_64 Kernel: 15.6.0
CPU : Intel(R) Core(TM) i5-4590 CPU @ 3.30GHz
GenuineIntel x86, Family 6 Model 60 Stepping 3
Features : FPU TSC PAE APIC MTRR MMX SSE SSE2 HT SSE3 SSSE3 SSE4.1 SSE4.2 AVX1.0

OpenCL-kernels filename : MultiBeam_Kernels_r3321.cl
ar=0.006367 NumCfft=116085 NumGauss=0 NumPulse=46762762368 NumTriplet=59733895584
Currently allocated 185 MB for GPU buffers
In v_BaseLineSmooth: NumDataPoints=1048576, BoxCarLength=8192, NumPointsInChunk=32768
OS X optimized setiathome_v8 application
Version info: SSE3x (Intel, Core 2-optimized v8-nographics) V5.13 by Alex Kan
SSE3x OS X 64bit Build 3321 , Ported by : Raistmer, JDWhale, Urs Echternacht


OpenCL version by Raistmer, r3321

AMD HD5 version by Raistmer

Number of OpenCL platforms: 1


OpenCL Platform Name: Apple
Number of devices: 1
Max compute units: 16
Max work group size: 256
Max clock frequency: 975Mhz
Max memory allocation: 536870912
Cache type: None
Cache line size: 0
Cache size: 0
Global memory size: 2147483648
Constant buffer size: 65536
Max number of constant args: 8
Local memory type: Scratchpad
Local memory size: 32768
Queue properties:
Out-of-Order: No
Name: AMD Radeon R9 M290 Compute Engine
Vendor: AMD
Driver version: 1.2 (Aug 29 2016 22:17:00)
Version: OpenCL 1.2
Extensions: cl_APPLE_SetMemObjectDestructor cl_APPLE_ContextLoggingFunctions cl_APPLE_clut cl_APPLE_query_kernel_names cl_APPLE_gl_sharing cl_khr_gl_event cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_image2d_from_buffer cl_khr_depth_images cl_APPLE_command_queue_priority cl_APPLE_command_queue_select_compute_units cl_khr_fp64


Work Unit Info:
...............
Credit multiplier is : 2.85
WU true angle range is : 0.006367
Used GPU device parameters are:
Number of compute units: 16
Single buffer allocation size: 128MB
Total device global memory: 2048MB
max WG size: 256
local mem type: Real
LotOfMem path: no
period_iterations_num=50
Spike: peak=24.61832, time=5.727, d_freq=1352321279.39, chirp=-5.9643, fft_len=128k
Spike: peak=26.11531, time=5.727, d_freq=1352321279.39, chirp=-5.9656, fft_len=128k
Spike: peak=26.40693, time=5.727, d_freq=1352321279.38, chirp=-5.9669, fft_len=128k
Spike: peak=25.41257, time=5.727, d_freq=1352321279.37, chirp=-5.9681, fft_len=128k
Spike: peak=24.49572, time=5.727, d_freq=1352321279.39, chirp=-5.9796, fft_len=128k
Spike: peak=25.01146, time=5.727, d_freq=1352321279.39, chirp=-5.9808, fft_len=128k
Spike: peak=24.24305, time=5.727, d_freq=1352321279.38, chirp=-5.9821, fft_len=128k
Pulse: peak=5.91462, time=45.86, period=13.15, d_freq=1352325185.62, score=1.061, chirp=-8.9471, fft_len=1024
Pulse: peak=2.288693, time=45.84, period=3.867, d_freq=1352320459.77, score=1.002, chirp=-9.1617, fft_len=512
Pulse: peak=3.762485, time=45.86, period=9.015, d_freq=1352323815.94, score=1.004, chirp=-13.957, fft_len=1024
Spike: peak=24.19375, time=85.9, d_freq=1352327150.67, chirp=23.385, fft_len=128k
Spike: peak=24.54029, time=85.9, d_freq=1352327150.67, chirp=23.39, fft_len=128k
Pulse: peak=9.393847, time=46.17, period=28.63, d_freq=1352321706.07, score=1.02, chirp=38.231, fft_len=8k
Pulse: peak=3.339135, time=45.84, period=7.494, d_freq=1352328860.23, score=1, chirp=-40.942, fft_len=512
Pulse: peak=5.770851, time=45.86, period=13.69, d_freq=1352323539.85, score=1.034, chirp=46.311, fft_len=1024
Pulse: peak=1.297241, time=45.82, period=1.746, d_freq=1352326113.32, score=1.011, chirp=-60.411, fft_len=256
Pulse: peak=2.637341, time=45.9, period=4.593, d_freq=1352324522.16, score=1.029, chirp=74.726, fft_len=2k

Best spike: peak=26.40693, time=5.727, d_freq=1352321279.38, chirp=-5.9669, fft_len=128k
Best autocorr: peak=17.29339, time=28.63, delay=4.1283, d_freq=1352324638.74, chirp=-27.965, fft_len=128k
Best gaussian: peak=0, mean=0, ChiSq=0, time=-2.123e+11, d_freq=0,
score=-12, null_hyp=0, chirp=0, fft_len=0
Best pulse: peak=5.91462, time=45.86, period=13.15, d_freq=1352325185.62, score=1.061, chirp=-8.9471, fft_len=1024
Best triplet: peak=0, time=-2.123e+11, period=0, d_freq=0, chirp=0, fft_len=0


Flopcounter: 12534120800678.304688

Spike count: 9
Autocorr count: 0
Pulse count: 8
Triplet count: 0
Gaussian count: 0
Time cpu in use since last restart: 199.5 seconds
GPU device sync requested... ...GPU device synched
20:10:42 (89204): called boinc_finish(0)

</stderr_txt>
]]>
27) Message boards : Number crunching : Monitoring inconclusive GBT validations and harvesting data for testing (Message 1820505)
Posted 29 Sep 2016 by Profile -= Vyper =-Project Donor
Post:

Edit - and the newly-validated one provides an excellent case study. We have an iGPU (HD Graphics 530) with an enormous inconclusive count, and a canonical signal display from the ATi. I'll grab them, and compare after lunch.


I remember this from last year, when I noticed something with the iGPU on Intel.

Posted: 16 Sep 2015, 14:12:34 UTC Edit Hide Move
Last modified: 16 Sep 2015, 14:12:51 UTC


Hey

Need some assistance before I start to plunge deep into my issue.
One of my crunchers has a new CPU up and running. The problem is that my Intel GPU keeps pausing work in progress and starting on the next task, and the next, and so on, so my computer is refused new work on the Nvidia GPU.

I presume it's an EDF thing. What is the right way to address this nowadays?


I bought it solely to crunch on the iGPU and CPU at the same time, as the iGPU is powerful, but I sold it and bought a 6700K instead.

http://ark.intel.com/sv/products/88040/Intel-Core-i7-5775C-Processor-6M-Cache-up-to-3_70-GHz

This processor couldn't do Astropulse; it just paused the work and started the next unit. No one at Lunatics had an answer that solved this back then, either.
28) Message boards : Number crunching : Monitoring inconclusive GBT validations and harvesting data for testing (Message 1820504)
Posted 29 Sep 2016 by Profile -= Vyper =-Project Donor
Post:
I'd actually not call that validated at all, but we're stuck with a binary choice in the status column.

And this "feature" really hides issues making builds validation and debugging harder (though allows that damned credits receiving of course...)


You've got a point there, because it all comes down to human psychology. If something doesn't need to be fixed because it won't matter in the end (credits), it won't get fixed much. But if the weak ones didn't get a single credit, things would speed up dramatically to make it work; and if it can't work, then "ban" that computer/platform/GPU combo on the servers instead, and don't send units to devices that can't compute them thoroughly. As simple as that, really.
It would be a shame if Cuda/AMD GPU hardware ended up there, but in the end the same rules would then apply to everyone.
29) Message boards : Number crunching : Monitoring inconclusive GBT validations and harvesting data for testing (Message 1820481)
Posted 29 Sep 2016 by Profile -= Vyper =-Project Donor
Post:
Now one of them validated them all!

http://setiathome.berkeley.edu/workunit.php?wuid=2276193382
30) Message boards : Number crunching : Monitoring inconclusive GBT validations and harvesting data for testing (Message 1820479)
Posted 29 Sep 2016 by Profile -= Vyper =-Project Donor
Post:
If the S@H people say that they want to use IEEE 754 in future releases to iron out differences, then science-wise it should be very welcome.

It's easier to ban a platform/compiler that doesn't conform to those rules in the BOINC API, if you developers find a combination that doesn't work properly.

https://en.wikipedia.org/wiki/IEEE_floating_point#Basic_and_interchange_formats

It's only a matter of picking binary32 or decimal32, whichever serves best, from the simplest CPU application up to monster quadruple GPU/FPGA/ASIC cores in the future. If you find a card or driver that doesn't work, then it's up to the manufacturer to patch their stuff so it conforms 100% to the IEEE 754 standard.

EDIT: All of the above is about getting code on whatever platform/combination as close to the Q100 mark as possible, perhaps as a second step. As we've noticed, though, it has nothing to do with the main topic of this thread, inconclusive validations. That is of course another matter, which needs fixing at another level, because I'm sure that each and every one of those applications, if compared on all signals found (30+), would reach Q99+ and so would most certainly validate.
31) Message boards : Number crunching : Monitoring inconclusive GBT validations and harvesting data for testing (Message 1820477)
Posted 29 Sep 2016 by Profile -= Vyper =-Project Donor
Post:
There is no connection between fp:strict or whatever precision switch can be and reporting subset of results on overflow..


Aren't you all using fp:precise (single)? I just wanted to ask what happens if the app is compiled and tested with fp:strict (single) instead. What is the speed penalty of going from precise (single) to strict (single)?

This was mentioned in this thread before. Increasing precision is not a solution for overflow tasks

I'm not talking about solving overflow tasks; that was not the purpose. (This was an off-topic question that popped into my mind.)
The purpose in my mind was an overall platform standard, following IEEE 754 regardless of CPU (x32, x64, ARM) or GPU. Calculated and fixed correctly, the outcome would be as close to Q100 as it possibly can be, resulting in less head-banging for all of you optimisers in the future.
My idea in suggesting you test in that direction is mainly so you can all spend more time on code optimisation instead of bug-hunting various platforms until hell freezes over. As I say, it will only increase, not decrease.

Until you know for sure that it won't work, I will keep pushing for this unification, provided it isn't much slower than using precise.
Once comparison numbers have been presented here, we will know 100% whether it is worth it or not. But if going fp:strict is, for example, 3% slower while Q rises to the Q99.99-Q100 range, then if I were a project manager I would vouch for that route now, instead of banging heads for months/years to come chasing annoying rounding bugs and result disparities.
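As a side note (my own illustration, not from the thread): one reason a strict mode matters at all is that floating-point addition is not associative, so even at identical precision, two builds that merely evaluate in a different order can produce different bits. A two-line demonstration in Python, whose floats are IEEE 754 binary64:

```python
# The same three IEEE 754 doubles, two groupings, two different answers.
a, b, c = 1e16, -1e16, 1.0
left = (a + b) + c   # the big terms cancel first, so the 1.0 survives
right = a + (b + c)  # the 1.0 is absorbed into -1e16 before cancellation
print(left, right)   # 1.0 0.0
```

A strict compilation mode pins down the evaluation order the source specifies, which is what makes bit-identical cross-platform results even conceivable.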
32) Message boards : Number crunching : Monitoring inconclusive GBT validations and harvesting data for testing (Message 1820465)
Posted 29 Sep 2016 by Profile -= Vyper =-Project Donor
Post:
Gentlemen!

I'll sum it up so far: I'm happy that a mindset has been brought to daylight, instead of pinpointing applications left and right.
Something needed to be done, and Jeff highlighted, with hard proof, what I've seen but haven't been able to express verbally: exactly what needs to be addressed.

Now that we all seem to be on the same page, recognizing that we have issues across various platforms/apps/compilers, I think the solution to this and forthcoming issues might pop up later, in how we think and act to resolve them.

My hat off to all of you, lads!

Now, back to the idea of fp:strict (IEEE 754) versus anything else, double precision and so on: does anyone have an idea of the speed penalty of going strict instead of precise, or double precision? If fp:strict single precision (more isn't needed, apparently) is a few percent slower, then so be it, for the sake of conformity! But if it is half the speed, then no, that is not the route to go for now; focus instead on the validator/reorder-of-reported-work issue that seems to be apparent.
33) Message boards : Number crunching : Monitoring inconclusive GBT validations and harvesting data for testing (Message 1820456)
Posted 29 Sep 2016 by Profile -= Vyper =-Project Donor
Post:

But to solve this by doing a "find all and sort them afterwards" would mean that every task would have to run to full term, and we'd lose the efficiency of quitting early after 10 seconds or so for the really noisy WUs.


Well, if we lose the efficiency of quitting early, why should the validator even "validate" -9 work, when the server code could just see "Ohh geez, this is an overflow result! Thanks! Here are your credits!" by comparing against other -9s?

If a device sends a -9 result back but the other application sees it as a real result, then you should be awarded zero credits anyway.
34) Message boards : Number crunching : Monitoring inconclusive GBT validations and harvesting data for testing (Message 1820435)
Posted 29 Sep 2016 by Profile -= Vyper =-Project Donor
Post:
this doesn't guarantee other compilers, or hardware device manufacturers implemented their chips in the bit identical way suggested


Yes, but if other compilers or hardware aren't bit-identical, then there is a flaw in their IEEE 754 implementation; you would all know that and would need to tackle that platform or device differently and put your effort there!

I'm only suggesting that IEEE 754 be used so the majority of applications reach the Q100 mark! Then you all know that when compiling under Linux, Windows and so on, it works as intended, and when a new version breaks it, you would know 100% for sure and could revert, or change the lines of code needed to get back to the Q100 mark.

I haven't mentioned validation, as the validator could accept non-Q100 results too; I'm proposing this as a baseline and a way of thinking to ease future headaches instead.
35) Message boards : Number crunching : Monitoring inconclusive GBT validations and harvesting data for testing (Message 1820427)
Posted 29 Sep 2016 by Profile -= Vyper =-Project Donor
Post:


If two apps find the same subset of 30 out of the available 50, then I'm pretty sure the validator will pass the result, even if the reporting order is different - I had a walk through the code a few days ago.

But if the app - by doing a parallel search - finds a different subset of 30 from 50, then the results are different, and no amount of tweaking the validator is going make any difference.


Yup, that is totally true! That's why I'm nagging that the result sent back should be unified (presentation-wise), so the stock CPU gets a sorting routine incorporated in the future, and every other application as well, so we never get this again. If a WU is overflowed, it is of course crap. But why should you perhaps lose credit for 5600 seconds of CPU time, ironed out by other "juggling-order" applications, when you could do the code right from the beginning?

Incorporate a result-sorting routine in the main S@H code and let the others (the tweakers) follow its lead. The only thing we would all get in the future is less headache when dealing with forthcoming optimisations and variations, which will only increase, not decrease :-/
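A minimal sketch of what such a sorting routine could look like (my own illustration, not project code; the field names are borrowed from the stderr logs quoted earlier, and the sort key is just one possible choice): if every app sorts its reported signals by the same canonical key before writing the result file, two apps that found the same signals in a different search order produce identical reports.

```python
def canonical_order(signals):
    # Hypothetical canonical key: signal type, then detection
    # frequency, then time. Any fixed key works, as long as every
    # application (stock CPU and all optimised builds) uses the same one.
    return sorted(signals, key=lambda s: (s["type"], s["d_freq"], s["time"]))

# The same three signals, found in two different search orders.
gpu_order = [
    {"type": "Pulse", "d_freq": 1352325185.62, "time": 45.86},
    {"type": "Spike", "d_freq": 1352321279.39, "time": 5.727},
    {"type": "Spike", "d_freq": 1352321279.38, "time": 5.727},
]
cpu_order = [gpu_order[1], gpu_order[2], gpu_order[0]]

# After canonical sorting, the reports compare equal.
assert canonical_order(gpu_order) == canonical_order(cpu_order)
```

Note this only fixes presentation order; it cannot reconcile the case where a parallel search captures a different subset of 30 signals before the overflow cap hits.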
36) Message boards : Number crunching : Monitoring inconclusive GBT validations and harvesting data for testing (Message 1820423)
Posted 29 Sep 2016 by Profile -= Vyper =-Project Donor
Post:


In essence if it comes down to precise Vs Strict with single floats (which I doubt this is), then double precision should be used if it's really necessary.


I don't really see the need for it. They only need to use IEEE strict with single precision (32-bit single precision covers roughly ±1.18×10⁻³⁸ to ±3.4×10³⁸, with about 7 decimal digits of precision). That should bring every application, and even CPUs, to the Q=100 mark.
And if it still doesn't, then the validator portion of the code needs to be addressed.

Perhaps something for you all to pursue, Eric: going down this route, if it isn't dramatically slower than the other FP modes. It seems that if S@H takes this IEEE route, then I believe you coders would get a lot less headache in the future when optimising the analysis part, and could focus more on development instead of chasing rounding bugs that slipped through.

Compare it to writing code in C++ versus pure machine code: which is easier to maintain when bugs arise? :) Certainly not the good old classic F8 E4 code, lol...
37) Message boards : Number crunching : Monitoring inconclusive GBT validations and harvesting data for testing (Message 1820417)
Posted 29 Sep 2016 by Profile -= Vyper =-Project Donor
Post:
could be tricky to locate the source of cumulative error.


Indeed! And this needs to be addressed "NOW", not later on, because the variety of apps, platforms, compilers, CUDA, OpenCL, Vulkan and so on is increasing, and the problem grows with every new combination.

I think what we're seeing now was a non-issue before 2010, when the majority of computers were CPU-based (serial, with uninteresting ordering), but now that more and more people add their PS3s, Androids, AMD GPUs, Nvidia GPUs and so on, this "inconclusive era" seems to have got out of hand across every app produced. Not to mention the real black sheep: the Apple issue!

This is only my way of seeing it. Perhaps really old code that worked perfectly in a CPU-only world needs to be changed, and I'm not talking about the analysis part that you guys are tweaking the hell out of; perhaps it's the server-side validator code, maybe written around 2006 when none of these new devices existed, that needs changing.
If that code part is "stupid" and doesn't do the sorting and juggling required, then you coders need to patch the outgoing results from the analysis so the validator gets what it expects, because it was written with serial code in mind, not parallel code.
38) Message boards : Number crunching : Monitoring inconclusive GBT validations and harvesting data for testing (Message 1820414)
Posted 29 Sep 2016 by Profile -= Vyper =-Project Donor
Post:
fp: precise leads to inconclusive results vs stock.
Better to forget about fp:precise completely.
This is non-portable feature of CPU.


Hmm

http://stackoverflow.com/questions/12514516/difference-between-fpstrict-and-fpprecise
I think fp:strict should be used in every build produced, if I read the above correctly: it promises "bitwise compatibility between different compilers and platforms".
https://en.wikipedia.org/wiki/IEEE_754-1985

I posted this for others to read too, so they can see why!
39) Message boards : Number crunching : Monitoring inconclusive GBT validations and harvesting data for testing (Message 1820411)
Posted 29 Sep 2016 by Profile -= Vyper =-Project Donor
Post:
I'm open to debate, however my opinion is that CPU serial dictates order by rules typically adopted implementing parallel algorithms, in they must always be reducible to serial form and produce the same result.


Exactly what I believe too: all signals found, in all apps, need to be reported in the same sequence so they don't land in the inconclusive ballpark (as it seems they do). It's easy when a WU is compared against the stock CPU application and validated in the end, but think of a WU sent to, for instance, an Android, an Apple Darwin, a SoG and a CUDA host: which application is more right or wrong than the others is hard to tell if this isn't addressed, and none of them validates because the result sent back is always different in some way, even if each is really Q99+ (or is it, and everyone just believes the code works?). Has anyone loaded the results into an Excel spreadsheet (or some other human-viewable tool) and tried sorting and comparing them there? :) Lol
40) Message boards : Number crunching : Monitoring inconclusive GBT validations and harvesting data for testing (Message 1820403)
Posted 29 Sep 2016 by Profile -= Vyper =-Project Donor
Post:
Should I check the other one?


Yes! Please do! I'm very curious about the outcome of this experiment; if this issue gets sorted out, then a lot of false positives will probably vanish.

In your testing you got a strong Q, and when I received a message from Petri with his "banks" of test WUs, all of them were in the Q99+ range when he ran his application, yet they still seem to fall into the inconclusive swamp anyway.




 
©2017 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.