Posts by -= Vyper =-

1) Message boards : Number crunching : Linux CUDA 'Special' App finally available, featuring Low CPU use (Message 1834772)
Posted 2 days ago by -= Vyper =-
Post:
TBar: Have you compared the speed of your compile to Petri's different builds? Good that the invalid rates are down, but as we all know by now, we can't change the way the validator works either.
The faster a host produces results, the higher its inconclusive ratio seems to be, until those results drop off.

What I write below is my theory:

By that I mean: if you have a slow host that doesn't process many WUs per day, you tend to end up crunching units that your wingman has already crunched. When the validator compares the result from (as I call it) Petri CUDA with the result that was already returned, you get a validation pass, both hosts are awarded credit, and the WU is soon cleared from the system ("Cannot find the WU"), as we can see once they have been processed. Thus the invalid ratio is low!

In the opposite case, an ultra-speedy system that crunches thousands of WUs per day, you will get more inconclusives, because that machine is so fast that it returns its work first and then waits for the other computers to catch up. When they start to return WUs and the overflowed results come pouring in, that speedy machine's inconclusive ratio will rise faster than others' as well.

/End of Theory

What actually matters is of course that the code does the work properly! Q ratio as high as possible across various tasks (GBT, high/low AR, etc.); you all know that part. But the value TBar refers to, "Consecutive valid tasks",
is the main thing to keep track of in my mind, not the inconclusive part, because the more parallel the code, the more inconclusives we will get, whether it's a CPU, GPU, FPGA, PS4, yada.

Thanks for your work TBar, and thank you Petri for going the brute-force route of taking advantage of newer hardware, which made this leap possible. The latest SoG is also speedy as hell! My 1080 is now utilized better than when running multiple parallel CUDA tasks! Thank you Raistmer, Jason, Urs and all you alpha/beta testers and others who have contributed to getting us where we are at the moment! The list of people would get long.
2) Message boards : Number crunching : Spammers (Message 1830835)
Posted 23 days ago by -= Vyper =-
Post:
Delete: Just tested.. From. -= Vyper =- (Is this bullshit for real with extreme amounts of letters in the name, why is it even possible in the first place? Who could have figured that you could write a novel here and it is allowed to have so many characters in the name. Lol
3) Message boards : Number crunching : Spammers (Message 1830834)
Posted 23 days ago by -= Vyper =-
Post:
Omg! I tested and it worked! Lol! :D
4) Message boards : Number crunching : The wonders of Micro$oft Windoze (Message 1829935)
Posted 28 days ago by -= Vyper =-
Post:
And this then? https://support.microsoft.com/en-us/kb/947821
5) Message boards : Politics : Hillary Clinton - the next president of America? (Message 1829266)
Posted 9 Nov 2016 by -= Vyper =-
Post:
It seems to be over for Hillary now!

Trump is Trumphiant :P
6) Message boards : Politics : Donald Trump for President? (Message 1829265)
Posted 9 Nov 2016 by -= Vyper =-
Post:
Lol! It seems done now!

The World just got more Trumpified! :D
7) Message boards : Number crunching : Monitoring inconclusive GBT validations and harvesting data for testing (Message 1827674)
Posted 31 Oct 2016 by -= Vyper =-
Post:
So, why is the Intel build faster on the nVidia cards?


Nice find!
Maybe Intel Crippling is back again or something?! I don't know! I can only guess. They've done it in the past and may very well do so again :)
http://www.agner.org/optimize/blog/read.php?i=49
8) Message boards : Number crunching : Moore's Law illustrated (Message 1827036)
Posted 27 Oct 2016 by -= Vyper =-
Post:
Lol, I've programmed assembly and I loved it back in my Amiga days. Workbench-friendly assembly with correct calls to libraries! :)
9) Message boards : Number crunching : Monitoring inconclusive GBT validations and harvesting data for testing (Message 1825193)
Posted 18 Oct 2016 by -= Vyper =-
Post:
And here is CPU code vs two SoGs that are inconclusive:

http://setiathome.berkeley.edu/workunit.php?wuid=2295768996
Task | Computer | Sent | Time reported | Status | Run time (s) | CPU time (s) | Credit | Application
5221951752 | 8108844 | 16 Oct 2016, 10:45:17 UTC | 17 Oct 2016, 11:24:57 UTC | Completed, validation inconclusive | 1,617.93 | 1,571.58 | pending | SETI@home v8 Anonymous platform (CPU)
5221951753 | 7804722 | 16 Oct 2016, 10:45:20 UTC | 16 Oct 2016, 15:08:30 UTC | Completed, validation inconclusive | 625.20 | 603.70 | pending | SETI@home v8 v8.10 (opencl_nvidia_SoG) x86_64-pc-linux-gnu
5224825595 | 8060288 | 17 Oct 2016, 16:47:42 UTC | 18 Oct 2016, 3:35:46 UTC | Completed, validation inconclusive | 826.10 | 815.27 | pending | SETI@home v8 v8.19 (opencl_nvidia_SoG)
10) Message boards : Number crunching : Monitoring inconclusive GBT validations and harvesting data for testing (Message 1823219)
Posted 10 Oct 2016 by -= Vyper =-
Post:
Sweet, sweet. Well, we can now conclude that we can bin the precision questions that I've raised lately.

Spending a lot more time for no gain can't be justified! Thanks Richard and Raistmer for this.
So, do we now enter the next phase? I'm venting an idea now.

Question:
Should there exist a double-precision variant of the CPU executable, plus a set of WUs, that would of course be slow as hell to calculate but would give so much precision that it could set the "gold standard" Q=100 on the .res files, making them the reference values that every other optimised and production executable would try to get as near to Q100 as possible?
That application is not meant for users. I'm talking more about "these are the best results that can ever be calculated for every WU out there", used by you optimisers and the S@H crew themselves as the origin and the measure of what "perfect" would be!

OK, part two then:
Then we have the validator to address: instead, invent a golden standard for how and in what manner returned data is sorted!
As it is today, it "seems like" if we take a garbled WU and send it to one CPU, an older GPU code and a newer GPU code, the results come out different.
We all know about the limit on storage space allowed (30).
If we imagine that we remove that limit and just dig through the whole WU, we might find, say: 78 spikes, 112 pulses, 7 triplets, etc.
As it is today, the calculation stops when it has reached 30 detection points and sends the result back.
The CPU would process linearly from 0 to 100, and along the way it finds 20 pulses, 8 spikes and 2 triplets, for example in this order:

PPSPSSPPPTPPPSSPPPSPPPPTPPSSPP.... boom 1870 seconds spent on the linear cpu.

Now we take the old GPU code, which is sped up significantly but is still "serial" even if it can calculate portions faster, and it produces:

PPSPSSPPPTPPPSSPPPSPPPPTPPSSPP.... boom, 165 seconds in it stops with the same result, as it is a straight CPU/GPU port and the code hasn't evolved beyond a regular port.

OK, then let's move on to the new executable that speeds things up.

PPSPSTPPPSPSSSTPPPSSPPSPPSPSPP... boom 45 seconds in it stops and sends this back.

Now this seems wrong to the validator because it differs so much in the numbers found and in their order. But in reality, if we removed the 30-signal limit and let all code variants crunch through the whole WU, each would find the same totals (78 spikes, 112 pulses, 7 triplets), only in a different order on the last, faster executable, and the values at every measurement point would be correct.
As it seems today, the last executable to return data gets an "inconclusive" mark, and its inconclusive rate is of course higher.
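To make the theory above concrete, here is a minimal Python sketch. It is a toy model only, not the real SETI@home search or validator: the work unit contents, the seed, and the four-chunk "parallel" visiting order are all made-up assumptions. It only shows that a 30-signal cap can make two correct searches report different subsets when they visit the data in a different order.

```python
import random

LIMIT = 30  # cap on reported signals, as described in the post

def make_signals(seed=1):
    """Hypothetical work unit holding 78 spikes, 112 pulses and 7 triplets."""
    rng = random.Random(seed)
    kinds = ["S"] * 78 + ["P"] * 112 + ["T"] * 7
    rng.shuffle(kinds)
    # Each signal is (position in the search space, kind).
    return list(enumerate(kinds))

def report(signals, order):
    """Visit signals in the given order and stop once the cap is reached."""
    return [signals[i] for i in order][:LIMIT]

signals = make_signals()
n = len(signals)

# Serial (CPU-like) order: walk the search space from start to end.
serial_order = list(range(n))

# Parallel-like order (assumption): four chunks processed in lockstep.
chunk = (n + 3) // 4
parallel_order = [c * chunk + k
                  for k in range(chunk)
                  for c in range(4)
                  if c * chunk + k < n]

serial = report(signals, serial_order)
parallel = report(signals, parallel_order)

# Without the cap both orders cover exactly the same signals.
full_serial = sorted(signals[i] for i in serial_order)
full_parallel = sorted(signals[i] for i in parallel_order)

print("same full set:     ", full_serial == full_parallel)
print("same capped report:", set(serial) == set(parallel))
```

In this toy model the full sets are identical while the capped reports are not, which is the situation the post suggests would lead to an "inconclusive" comparison.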

Until someone makes a multicore version of the S@H CPU executable, exactly the way Petri seems to have done for the GPU version, these "inconclusive result" numbers will stay high.
If that were done and BOINC knew about it, a 12-core CPU would start only one task, but it would process it much faster, with 100% utilisation on all cores and short finishing times. The validator would then match the latest CPU code to the latest GPU code because they follow the same processing pattern, and inconclusives would drop to perhaps 10/1000 instead of 150/1000 as it is today.

The more parallel the execution, the more diversity in inconclusives will occur.
Now, can this disparity be fixed before the CPU code catches up and goes multicore?! I don't know. Only you optimizers do!
11) Message boards : Number crunching : Monitoring inconclusive GBT validations and harvesting data for testing (Message 1822153)
Posted 6 Oct 2016 by -= Vyper =-
Post:
Validation inconclusive (164)
Quite impressive number. Is it OK enough or not OK - worth to check.

15% Inconclusive. Not good. Actually, very bad IMHO.


Well, in my world it is not that easy, because Petri's code seems to work on workunits more like the picture below. That way, the information sent back, especially when dealing with -9 units, will almost always fall into the inconclusive state.

CPUs and older code seem to work more pixel by pixel, top left to top right and then down a notch to the next row (metaphorically speaking), so it's not crazy at all that the inconclusive count is higher than normal. What actually matters is that the raw data returned is calculated correctly, and it now seems that in 99.9% of cases it is, and the rest gets binned as invalid!

To show how I believe Petri's code works, I include this picture as a theoretical explanation of how it now seems to process data! Only Petri can tell more here, but if it works exactly this way, it is no wonder that information is being presented differently to the validator and that it takes a second or third opinion before a result is settled, even if the results do match.

I've written numerous times now that Petri has stated that his offline testbed of WUs all falls in the Q99+ range (strongly similar), unless something has turned up in the latest weeks that behaves differently.


12) Message boards : Number crunching : Monitoring inconclusive GBT validations and harvesting data for testing (Message 1821936)
Posted 5 Oct 2016 by -= Vyper =-
Post:
Can you re-run offline and check if overflow repeats?


I don't have the WU! But the machine is running nonstop 24/7 now and hasn't been rebooted. It has calculated more WUs after the invalid one without a reboot.
13) Message boards : Number crunching : Monitoring inconclusive GBT validations and harvesting data for testing (Message 1821934)
Posted 5 Oct 2016 by -= Vyper =-
Post:
Yes, I know. But the code cannot fix what the validator thinks, primarily. All the various offline MB WUs were at 99+% last time I spoke to Petri, but the one I found was clearly off the charts!
14) Message boards : Number crunching : Monitoring inconclusive GBT validations and harvesting data for testing (Message 1821862)
Posted 5 Oct 2016 by -= Vyper =-
Post:
This host of mine has no errors as of yet. (EDIT: Those three, if you see them, were me when I forgot to put the file in the right directory.)

"Consecutive valid tasks 2990"

http://setiathome.berkeley.edu/results.php?hostid=8053171
15) Message boards : Number crunching : Monitoring inconclusive GBT validations and harvesting data for testing (Message 1821861)
Posted 5 Oct 2016 by -= Vyper =-
Post:
I Found one!!!

http://setiathome.berkeley.edu/workunit.php?wuid=2279637413

SOG VERSION


Work Unit Info:
...............
Credit multiplier is : 2.85
WU true angle range is : 0.006367
Used GPU device parameters are:
Number of compute units: 12
Single buffer allocation size: 128MB
Total device global memory: 3072MB
max WG size: 1024
local mem type: Real
FERMI path used: yes
LotOfMem path: yes
LowPerformanceGPU path: no
period_iterations_num=50
Spike: peak=24.61833, time=5.727, d_freq=1352321279.39, chirp=-5.9643, fft_len=128k
Spike: peak=26.11531, time=5.727, d_freq=1352321279.39, chirp=-5.9656, fft_len=128k
Spike: peak=26.40693, time=5.727, d_freq=1352321279.38, chirp=-5.9669, fft_len=128k
Spike: peak=25.41258, time=5.727, d_freq=1352321279.37, chirp=-5.9681, fft_len=128k
Spike: peak=24.49572, time=5.727, d_freq=1352321279.39, chirp=-5.9796, fft_len=128k
Spike: peak=25.01148, time=5.727, d_freq=1352321279.39, chirp=-5.9808, fft_len=128k
Spike: peak=24.24306, time=5.727, d_freq=1352321279.38, chirp=-5.9821, fft_len=128k
Pulse: peak=5.914622, time=45.86, period=13.15, d_freq=1352325185.62, score=1.061, chirp=-8.9471, fft_len=1024
Pulse: peak=2.288692, time=45.84, period=3.867, d_freq=1352320459.77, score=1.002, chirp=-9.1617, fft_len=512
Pulse: peak=3.762496, time=45.86, period=9.015, d_freq=1352323815.94, score=1.004, chirp=-13.957, fft_len=1024
Spike: peak=24.19372, time=85.9, d_freq=1352327150.67, chirp=23.385, fft_len=128k
Spike: peak=24.54026, time=85.9, d_freq=1352327150.67, chirp=23.39, fft_len=128k
Pulse: peak=9.39383, time=46.17, period=28.63, d_freq=1352321706.07, score=1.02, chirp=38.231, fft_len=8k
Pulse: peak=3.339135, time=45.84, period=7.494, d_freq=1352328860.23, score=1, chirp=-40.942, fft_len=512
Pulse: peak=5.770851, time=45.86, period=13.69, d_freq=1352323539.85, score=1.034, chirp=46.311, fft_len=1024
Pulse: peak=1.297242, time=45.82, period=1.746, d_freq=1352326113.32, score=1.011, chirp=-60.411, fft_len=256
Pulse: peak=2.637339, time=45.9, period=4.593, d_freq=1352324522.16, score=1.029, chirp=74.726, fft_len=2k

Best spike: peak=26.40693, time=5.727, d_freq=1352321279.38, chirp=-5.9669, fft_len=128k
Best autocorr: peak=17.29338, time=28.63, delay=4.1283, d_freq=1352324638.74, chirp=-27.965, fft_len=128k
Best gaussian: peak=0, mean=0, ChiSq=0, time=-2.123e+011, d_freq=0,
score=-12, null_hyp=0, chirp=0, fft_len=0
Best pulse: peak=5.914622, time=45.86, period=13.15, d_freq=1352325185.62, score=1.061, chirp=-8.9471, fft_len=1024
Best triplet: peak=0, time=-2.123e+011, period=0, d_freq=0, chirp=0, fft_len=0


Flopcounter: 3909489318052.926800

Spike count: 9
Autocorr count: 0
Pulse count: 8
Triplet count: 0
Gaussian count: 0
Wallclock time elapsed since last restart: 1148.8 seconds

class Gaussian_transfer_not_needed: total=0, N=0, <>=0, min=0 max=0
class Gaussian_transfer_needed: total=0, N=0, <>=0, min=0 max=0


class Gaussian_skip1_no_peak: total=0, N=0, <>=0, min=0 max=0
class Gaussian_skip2_bad_group_peak: total=0, N=0, <>=0, min=0 max=0
class Gaussian_skip3_too_weak_peak: total=0, N=0, <>=0, min=0 max=0
class Gaussian_skip4_too_big_ChiSq: total=0, N=0, <>=0, min=0 max=0
class Gaussian_skip6_low_power: total=0, N=0, <>=0, min=0 max=0


class Gaussian_new_best: total=0, N=0, <>=0, min=0 max=0
class Gaussian_report: total=0, N=0, <>=0, min=0 max=0
class Gaussian_miss: total=0, N=0, <>=0, min=0 max=0


class PC_triplet_find_hit: total=54180, N=54180, <>=1, min=1 max=1
class PC_triplet_find_miss: total=2816, N=2816, <>=1, min=1 max=1


class PC_pulse_find_hit: total=44603, N=44603, <>=1, min=1 max=1
class PC_pulse_find_miss: total=18, N=18, <>=1, min=1 max=1
class PC_pulse_find_early_miss: total=16, N=16, <>=1, min=1 max=1
class PC_pulse_find_2CPU: total=0, N=0, <>=0, min=0 max=0


class PoT_transfer_not_needed: total=54165, N=54165, <>=1, min=1 max=1
class PoT_transfer_needed: total=2832, N=2832, <>=1, min=1 max=1

GPU device sync requested... ...GPU device synched
12:51:55 (160): called boinc_finish(0)

</stderr_txt>

PETRI VERSION
Work Unit Info:
...............
WU true angle range is : 0.006367
Sigma 710
Sigma > GaussTOffsetStop: 710 > -646
Thread call stack limit is: 1k
Spike: peak=24.61833, time=5.727, d_freq=1352321279.39, chirp=-5.9643, fft_len=128k
Spike: peak=26.11531, time=5.727, d_freq=1352321279.39, chirp=-5.9656, fft_len=128k
Spike: peak=26.40693, time=5.727, d_freq=1352321279.38, chirp=-5.9669, fft_len=128k
Spike: peak=25.41257, time=5.727, d_freq=1352321279.37, chirp=-5.9681, fft_len=128k
Spike: peak=24.49572, time=5.727, d_freq=1352321279.39, chirp=-5.9796, fft_len=128k
Spike: peak=25.01146, time=5.727, d_freq=1352321279.39, chirp=-5.9808, fft_len=128k
Spike: peak=24.24305, time=5.727, d_freq=1352321279.38, chirp=-5.9821, fft_len=128k
Pulse: peak=5.914618, time=45.86, period=13.15, d_freq=1352325185.62, score=1.061, chirp=-8.9471, fft_len=1024
Pulse: peak=2.288692, time=45.84, period=3.867, d_freq=1352320459.77, score=1.002, chirp=-9.1617, fft_len=512
Pulse: peak=3.762486, time=45.86, period=9.015, d_freq=1352323815.94, score=1.004, chirp=-13.957, fft_len=1024
Spike: peak=24.19372, time=85.9, d_freq=1352327150.67, chirp=23.385, fft_len=128k
Spike: peak=24.54028, time=85.9, d_freq=1352327150.67, chirp=23.39, fft_len=128k
Pulse: peak=9.393847, time=46.17, period=28.63, d_freq=1352321706.07, score=1.02, chirp=38.231, fft_len=8k
Pulse: peak=3.33913, time=45.84, period=7.494, d_freq=1352328860.23, score=1, chirp=-40.942, fft_len=512
setiathome_CUDA: Found 1 CUDA device(s):
Device 1: GeForce GTX 1080, 8112 MiB, regsPerBlock 65536
computeCap 6.1, multiProcs 20
pciBusID = 1, pciSlotID = 0
In cudaAcc_initializeDevice(): Boinc passed DevPref 1
setiathome_CUDA: CUDA Device 1 specified, checking...
Device 1: GeForce GTX 1080 is okay
SETI@home using CUDA accelerated device GeForce GTX 1080
Using pfb = 8 from command line args
Using pfp = 128 from command line args
Using unroll = 20 from command line args
Restarted at 30.47 percent, with setiathome enhanced x41p_zi3j, Cuda 8.00 special
Detected setiathome_enhanced_v7 task. Autocorrelations enabled, size 128k elements.
Sigma 710
Sigma > GaussTOffsetStop: 710 > -646
Thread call stack limit is: 1k
Find triplets Cuda kernel encountered too many triplets, or bins above threshold, reprocessing this PoT on CPU... err = 1
setiathome_CUDA: Found 1 CUDA device(s):
Device 1: GeForce GTX 1080, 8112 MiB, regsPerBlock 65536
computeCap 6.1, multiProcs 20
pciBusID = 1, pciSlotID = 0
In cudaAcc_initializeDevice(): Boinc passed DevPref 1
setiathome_CUDA: CUDA Device 1 specified, checking...
Device 1: GeForce GTX 1080 is okay
SETI@home using CUDA accelerated device GeForce GTX 1080
Using pfb = 8 from command line args
Using pfp = 128 from command line args
Using unroll = 20 from command line args
Restarted at 30.47 percent, with setiathome enhanced x41p_zi3j, Cuda 8.00 special
Detected setiathome_enhanced_v7 task. Autocorrelations enabled, size 128k elements.
Sigma 710
Sigma > GaussTOffsetStop: 710 > -646
Thread call stack limit is: 1k
Find triplets Cuda kernel encountered too many triplets, or bins above threshold, reprocessing this PoT on CPU... err = 1
setiathome_CUDA: Found 1 CUDA device(s):
Device 1: GeForce GTX 1080, 8112 MiB, regsPerBlock 65536
computeCap 6.1, multiProcs 20
pciBusID = 1, pciSlotID = 0
In cudaAcc_initializeDevice(): Boinc passed DevPref 1
setiathome_CUDA: CUDA Device 1 specified, checking...
Device 1: GeForce GTX 1080 is okay
SETI@home using CUDA accelerated device GeForce GTX 1080
Using pfb = 8 from command line args
Using pfp = 128 from command line args
Using unroll = 20 from command line args
Restarted at 30.47 percent, with setiathome enhanced x41p_zi3j, Cuda 8.00 special
Detected setiathome_enhanced_v7 task. Autocorrelations enabled, size 128k elements.
Sigma 710
Sigma > GaussTOffsetStop: 710 > -646
Thread call stack limit is: 1k
Spike: peak=97.2344, time=23.59, d_freq=1352327061.65, chirp=-29.776, fft_len=256
Spike: peak=134.1317, time=23.81, d_freq=1352324641.01, chirp=-29.776, fft_len=256
Spike: peak=240.1877, time=24.37, d_freq=1352324579.65, chirp=-29.776, fft_len=256
Spike: peak=240.1717, time=24.39, d_freq=1352324400.17, chirp=-29.776, fft_len=256
Spike: peak=256, time=24.42, d_freq=1352323684.25, chirp=-29.776, fft_len=256
Spike: peak=256, time=24.55, d_freq=1352324350.8, chirp=-29.776, fft_len=256
Spike: peak=51.20004, time=24.57, d_freq=1352325244.21, chirp=-29.776, fft_len=256
Spike: peak=240.16, time=24.6, d_freq=1352323678.92, chirp=-29.776, fft_len=256
Spike: peak=240.1455, time=24.62, d_freq=1352324572.32, chirp=-29.776, fft_len=256
Spike: peak=25.60004, time=24.68, d_freq=1352326045.54, chirp=-29.776, fft_len=256
Spike: peak=36.57143, time=24.71, d_freq=1352324614.36, chirp=-29.776, fft_len=256
Spike: peak=85.77132, time=24.8, d_freq=1352321840.08, chirp=-29.776, fft_len=256
Spike: peak=32.00062, time=24.95, d_freq=1352322729.49, chirp=-29.776, fft_len=256
Spike: peak=51.19973, time=24.98, d_freq=1352326305.1, chirp=-29.776, fft_len=256
Spike: peak=256, time=25.11, d_freq=1352324736.48, chirp=-29.776, fft_len=256
Spike: peak=256, time=25.15, d_freq=1352324958.67, chirp=-29.776, fft_len=256
Spike: peak=256, time=25.4, d_freq=1352324861.94, chirp=-29.776, fft_len=256
Spike: peak=256, time=25.69, d_freq=1352324138.02, chirp=-29.776, fft_len=256
SETI@Home Informational message -9 result_overflow
NOTE: The number of results detected equals the storage space allocated.

Best spike: peak=256, time=24.42, d_freq=1352323684.25, chirp=-29.776, fft_len=256
Best autocorr: peak=17.2934, time=28.63, delay=4.1283, d_freq=1352324638.74, chirp=-27.965, fft_len=128k
Best gaussian: peak=0, mean=0, ChiSq=0, time=-2.123e+11, d_freq=0,
score=-12, null_hyp=0, chirp=0, fft_len=0
Best pulse: peak=5.914618, time=45.86, period=13.15, d_freq=1352325185.62, score=1.061, chirp=-8.9471, fft_len=1024
Best triplet: peak=0, time=-2.123e+11, period=0, d_freq=0, chirp=0, fft_len=0

Flopcounter: 18627826515589.917969

Spike count: 27
Autocorr count: 0
Pulse count: 3
Triplet count: 0
Gaussian count: 0
00:12:49 (3124): called boinc_finish(0)

</stderr_txt>
]]>

AND DARWIN VERSION WUT? IT WORX :)

<core_client_version>7.6.22</core_client_version>
<![CDATA[
<stderr_txt>
OpenCL platform detected: Apple
Number of OpenCL devices found : 1
BOINC assigns slot on device #0.
Info: BOINC provided OpenCL device ID used

Build features: SETI8 Non-graphics OpenCL USE_OPENCL_HD5xxx OCL_CHIRP3 ASYNC_SPIKE FFTW SSE3 64bit
System: Darwin x86_64 Kernel: 15.6.0
CPU : Intel(R) Core(TM) i5-4590 CPU @ 3.30GHz
GenuineIntel x86, Family 6 Model 60 Stepping 3
Features : FPU TSC PAE APIC MTRR MMX SSE SSE2 HT SSE3 SSSE3 SSE4.1 SSE4.2 AVX1.0

OpenCL-kernels filename : MultiBeam_Kernels_r3321.cl
ar=0.006367 NumCfft=116085 NumGauss=0 NumPulse=46762762368 NumTriplet=59733895584
Currently allocated 185 MB for GPU buffers
In v_BaseLineSmooth: NumDataPoints=1048576, BoxCarLength=8192, NumPointsInChunk=32768
OS X optimized setiathome_v8 application
Version info: SSE3x (Intel, Core 2-optimized v8-nographics) V5.13 by Alex Kan
SSE3x OS X 64bit Build 3321 , Ported by : Raistmer, JDWhale, Urs Echternacht


OpenCL version by Raistmer, r3321

AMD HD5 version by Raistmer

Number of OpenCL platforms: 1


OpenCL Platform Name: Apple
Number of devices: 1
Max compute units: 16
Max work group size: 256
Max clock frequency: 975Mhz
Max memory allocation: 536870912
Cache type: None
Cache line size: 0
Cache size: 0
Global memory size: 2147483648
Constant buffer size: 65536
Max number of constant args: 8
Local memory type: Scratchpad
Local memory size: 32768
Queue properties:
Out-of-Order: No
Name: AMD Radeon R9 M290 Compute Engine
Vendor: AMD
Driver version: 1.2 (Aug 29 2016 22:17:00)
Version: OpenCL 1.2
Extensions: cl_APPLE_SetMemObjectDestructor cl_APPLE_ContextLoggingFunctions cl_APPLE_clut cl_APPLE_query_kernel_names cl_APPLE_gl_sharing cl_khr_gl_event cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_image2d_from_buffer cl_khr_depth_images cl_APPLE_command_queue_priority cl_APPLE_command_queue_select_compute_units cl_khr_fp64


Work Unit Info:
...............
Credit multiplier is : 2.85
WU true angle range is : 0.006367
Used GPU device parameters are:
Number of compute units: 16
Single buffer allocation size: 128MB
Total device global memory: 2048MB
max WG size: 256
local mem type: Real
LotOfMem path: no
period_iterations_num=50
Spike: peak=24.61832, time=5.727, d_freq=1352321279.39, chirp=-5.9643, fft_len=128k
Spike: peak=26.11531, time=5.727, d_freq=1352321279.39, chirp=-5.9656, fft_len=128k
Spike: peak=26.40693, time=5.727, d_freq=1352321279.38, chirp=-5.9669, fft_len=128k
Spike: peak=25.41257, time=5.727, d_freq=1352321279.37, chirp=-5.9681, fft_len=128k
Spike: peak=24.49572, time=5.727, d_freq=1352321279.39, chirp=-5.9796, fft_len=128k
Spike: peak=25.01146, time=5.727, d_freq=1352321279.39, chirp=-5.9808, fft_len=128k
Spike: peak=24.24305, time=5.727, d_freq=1352321279.38, chirp=-5.9821, fft_len=128k
Pulse: peak=5.91462, time=45.86, period=13.15, d_freq=1352325185.62, score=1.061, chirp=-8.9471, fft_len=1024
Pulse: peak=2.288693, time=45.84, period=3.867, d_freq=1352320459.77, score=1.002, chirp=-9.1617, fft_len=512
Pulse: peak=3.762485, time=45.86, period=9.015, d_freq=1352323815.94, score=1.004, chirp=-13.957, fft_len=1024
Spike: peak=24.19375, time=85.9, d_freq=1352327150.67, chirp=23.385, fft_len=128k
Spike: peak=24.54029, time=85.9, d_freq=1352327150.67, chirp=23.39, fft_len=128k
Pulse: peak=9.393847, time=46.17, period=28.63, d_freq=1352321706.07, score=1.02, chirp=38.231, fft_len=8k
Pulse: peak=3.339135, time=45.84, period=7.494, d_freq=1352328860.23, score=1, chirp=-40.942, fft_len=512
Pulse: peak=5.770851, time=45.86, period=13.69, d_freq=1352323539.85, score=1.034, chirp=46.311, fft_len=1024
Pulse: peak=1.297241, time=45.82, period=1.746, d_freq=1352326113.32, score=1.011, chirp=-60.411, fft_len=256
Pulse: peak=2.637341, time=45.9, period=4.593, d_freq=1352324522.16, score=1.029, chirp=74.726, fft_len=2k

Best spike: peak=26.40693, time=5.727, d_freq=1352321279.38, chirp=-5.9669, fft_len=128k
Best autocorr: peak=17.29339, time=28.63, delay=4.1283, d_freq=1352324638.74, chirp=-27.965, fft_len=128k
Best gaussian: peak=0, mean=0, ChiSq=0, time=-2.123e+11, d_freq=0,
score=-12, null_hyp=0, chirp=0, fft_len=0
Best pulse: peak=5.91462, time=45.86, period=13.15, d_freq=1352325185.62, score=1.061, chirp=-8.9471, fft_len=1024
Best triplet: peak=0, time=-2.123e+11, period=0, d_freq=0, chirp=0, fft_len=0


Flopcounter: 12534120800678.304688

Spike count: 9
Autocorr count: 0
Pulse count: 8
Triplet count: 0
Gaussian count: 0
Time cpu in use since last restart: 199.5 seconds
GPU device sync requested... ...GPU device synched
20:10:42 (89204): called boinc_finish(0)

</stderr_txt>
]]>
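
The three logs above can be compared mechanically. The following is a minimal Python sketch, not the project's validator or its rescmp tool: the matching key (type, time, d_freq, chirp, fft_len) and the relative tolerance on peak are assumptions chosen for illustration. The sample lines are copied from the SoG and Petri logs in this post.

```python
import math
import re

# Matches reported signal lines such as "Spike: peak=..., time=..., ..."
# but not the "Best spike:" or "Spike count:" summary lines.
SIGNAL_RE = re.compile(r"^(Spike|Pulse|Triplet|Gaussian|Autocorr): (.*)$")

def parse_signals(stderr_text):
    """Return a list of (kind, fields) tuples found in a stderr dump."""
    signals = []
    for line in stderr_text.splitlines():
        m = SIGNAL_RE.match(line.strip())
        if not m:
            continue
        fields = dict(part.split("=", 1) for part in m.group(2).split(", "))
        signals.append((m.group(1), fields))
    return signals

def signal_key(sig):
    """Assumed identity of a signal: everything except its peak power."""
    kind, f = sig
    return (kind, f["time"], f["d_freq"], f["chirp"], f["fft_len"])

def compare(a, b, rel_tol=1e-3):
    """Count signals of a that have a counterpart in b with a close peak.

    rel_tol is an arbitrary illustration value, not the real threshold.
    """
    b_by_key = {signal_key(s): s for s in b}
    matched, unmatched = 0, []
    for s in a:
        other = b_by_key.get(signal_key(s))
        if other is not None and math.isclose(
                float(s[1]["peak"]), float(other[1]["peak"]), rel_tol=rel_tol):
            matched += 1
        else:
            unmatched.append(s)
    return matched, unmatched

sog = parse_signals("""
Spike: peak=24.61833, time=5.727, d_freq=1352321279.39, chirp=-5.9643, fft_len=128k
Pulse: peak=9.39383, time=46.17, period=28.63, d_freq=1352321706.07, score=1.02, chirp=38.231, fft_len=8k
""")
petri = parse_signals("""
Spike: peak=24.61833, time=5.727, d_freq=1352321279.39, chirp=-5.9643, fft_len=128k
Pulse: peak=9.393847, time=46.17, period=28.63, d_freq=1352321706.07, score=1.02, chirp=38.231, fft_len=8k
Spike: peak=97.2344, time=23.59, d_freq=1352327061.65, chirp=-29.776, fft_len=256
""")

print(compare(sog, petri))  # every SoG sample signal has a Petri counterpart
print(compare(petri, sog))  # the peak=97.2344 spike has no SoG counterpart
```

Run over the complete logs, a comparison like this would separate the signals all three applications agree on from the extra spikes at chirp=-29.776 that appear only in the overflowed Petri result.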
16) Message boards : Number crunching : Monitoring inconclusive GBT validations and harvesting data for testing (Message 1820505)
Posted 29 Sep 2016 by -= Vyper =-
Post:

Edit - and the newly-validated one provides an excellent case study. We have an iGPU (HD Graphics 530) with an enormous inconclusive count, and a canonical signal display from the ATi. I'll grab them, and compare after lunch.


I remember this from last year, when I noticed something with the iGPU on Intel.

Posted: 16 Sep 2015, 14:12:34 UTC
Last modified: 16 Sep 2015, 14:12:51 UTC


Hey

Need some assistance before I plunge deep into my issue.
One of my crunchers has a new CPU up and running. The problem is that my Intel GPU pauses work in progress and starts on the next task, and the next, and so on, so my computer is refused new work for the Nvidia GPU.

I presume it's an EDF thing. What is the right way to address this nowadays?


I bought it solely to crunch on iGPU and CPU at the same time, as the iGPU is powerful, but I sold it and bought a 6700K instead.

http://ark.intel.com/sv/products/88040/Intel-Core-i7-5775C-Processor-6M-Cache-up-to-3_70-GHz

This processor couldn't do Astropulse; it just paused the work and started the next unit. No one at Lunatics had an answer that solved this back then either.
17) Message boards : Number crunching : Monitoring inconclusive GBT validations and harvesting data for testing (Message 1820504)
Posted 29 Sep 2016 by -= Vyper =-
Post:
I'd actually not call that validated at all, but we're stuck with a binary choice in the status column.

And this "feature" really hides issues making builds validation and debugging harder (though allows that damned credits receiving of course...)


You've got a point there, because it all comes down to human psychology. If something doesn't need to be fixed because it won't matter in the end (credits), it won't get fixed much. But if weakly similar results didn't get a single credit, things would speed up dramatically to make it work, or, if it can't work, "ban" the computer/platform/GPU combo on the servers instead and don't send units to devices that can't compute them thoroughly. As simple as that, really.
It would be a shame if CUDA/AMD GPU hardware ended up there, but in the end the same rules would apply to everyone.
18) Message boards : Number crunching : Monitoring inconclusive GBT validations and harvesting data for testing (Message 1820481)
Posted 29 Sep 2016 by -= Vyper =-
Post:
Now one of them validated them all!

http://setiathome.berkeley.edu/workunit.php?wuid=2276193382
19) Message boards : Number crunching : Monitoring inconclusive GBT validations and harvesting data for testing (Message 1820479)
Posted 29 Sep 2016 by -= Vyper =-
Post:
If the S@H people say that they want to use IEEE 754 in future releases to iron out differences, then science-wise it should be very welcome.

It's easier to ban a platform/compiler that doesn't conform to those rules in the BOINC API if you developers find a combination that doesn't work properly.

https://en.wikipedia.org/wiki/IEEE_floating_point#Basic_and_interchange_formats

It's only a matter of binary32 or decimal32, whichever serves best, from the simplest CPU application up to monster quadruple GPU/FPGA/ASIC cores in the future. If you find a card or driver that doesn't work, then it's up to the manufacturer to patch their shit so that it conforms 100% to the IEEE 754 standard.

EDIT: All of the above is about getting code closer to the Q100 mark on whatever platform/combination possible, perhaps as a second step. But as we've noticed, what I mention here has nothing to do with the main topic of the thread, inconclusive validations. That is of course another thing that needs to be fixed on another level, because I'm sure that each and every one of those applications, if compared on all signals found (30+), would get Q99+, so they would most certainly validate.
20) Message boards : Number crunching : Monitoring inconclusive GBT validations and harvesting data for testing (Message 1820477)
Posted 29 Sep 2016 by -= Vyper =-
Post:
There is no connection between fp:strict or whatever precision switch can be and reporting subset of results on overflow..


Aren't you all using fp:precise (single)? I just wanted to ask what happens if the app is compiled and tested with fp:strict (single) instead! What is the speed penalty of going from precise (single) to strict (single)?

This was mentioned in this thread before. Increasing precision is not a solution for overflow tasks

I'm not talking about solving overflow tasks; that was not the purpose. (This was an off-topic question that popped into my mind.)
The purpose in my mind was an overall platform standard that follows IEEE 754 regardless of CPU, x32, x64, ARM or GPU. When calculated and fixed correctly, the outcome would be as close to Q100 as it possibly can, resulting in less headbanging for all of you optimisers in the future.
My reason for asking you to test in that direction is mainly so that you can all switch more to code optimising instead of bug-hunting various platforms until hell freezes over. It will only increase, as I say, not decrease.

Until you know for sure that it won't work, I will continue to push for this unification, provided it isn't much slower than using precise.
Once numbers have been presented here as a comparison, we will know for certain whether this is worth it or not. But if going fp:strict is, for example, 3% slower while Q increases to the Q99.99 - Q100 range, then if I were a project manager I would vouch for going that route now instead of banging heads for more months/years chasing annoying rounding bugs and result disparities.
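
As a small illustration of where such rounding disparities can come from, here is a Python sketch that emulates IEEE 754 binary32 arithmetic with the standard `struct` module. It is not taken from any SETI@home application, and the input values are made up. It only shows that adding the same single-precision numbers in a different order can change the result, which is one reason two correct builds may not agree bit for bit.

```python
import struct

def f32(x):
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]

def sum_f32(values):
    """Accumulate left to right, rounding to binary32 after every add."""
    total = 0.0
    for v in values:
        total = f32(total + f32(v))
    return total

# Same four numbers, two different evaluation orders (made-up values).
forward = sum_f32([1.0e8, 1.0, -1.0e8, 1.0])
reordered = sum_f32([1.0e8, -1.0e8, 1.0, 1.0])

print(forward, reordered)
```

In the first order the 1.0 added to 1.0e8 is lost, because neighbouring binary32 values near 1.0e8 are 8 apart, so the two sums differ. A stricter floating-point compiler mode limits how far the compiler may rearrange such expressions, but applications that deliberately sum in a different order (for example a parallel reduction on a GPU) can still differ in the last digits.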



