Posts by -= Vyper =-
1) Message boards : Number crunching : Monitoring inconclusive GBT validations and harvesting data for testing (Message 1820505)
Posted 1 day ago by Profile -= Vyper =-

Edit - and the newly-validated one provides an excellent case study. We have an iGPU (HD Graphics 530) with an enormous inconclusive count, and a canonical signal display from the ATi. I'll grab them, and compare after lunch.


I remember this from last year, when I noticed something with the iGPU on Intel.

Posted: 16 Sep 2015, 14:12:34 UTC
Last modified: 16 Sep 2015, 14:12:51 UTC


Hey

I need some assistance before I plunge deep into my issue.
One of my crunchers has a new CPU up and running. The problem is that my Intel GPU keeps pausing work in progress and starting on the next task, and the next, and so on, so my computer is refused new work for the Nvidia GPU.

I presume it's an EDF thing. What is the right way to address this nowadays?


I bought it solely to crunch on the iGPU and CPU at the same time, as the iGPU is powerful, but I sold it and bought a 6700K instead.

http://ark.intel.com/sv/products/88040/Intel-Core-i7-5775C-Processor-6M-Cache-up-to-3_70-GHz

This processor couldn't do Astropulse; it just paused the work and started the next unit. No one at Lunatics had an answer that solved this back then either.
2) Message boards : Number crunching : Monitoring inconclusive GBT validations and harvesting data for testing (Message 1820504)
Posted 1 day ago by Profile -= Vyper =-
I'd actually not call that validated at all, but we're stuck with a binary choice in the status column.

And this "feature" really hides issues making builds validation and debugging harder (though allows that damned credits receiving of course...)


You've got a point there, because it all comes down to human psychology. If something doesn't need to be fixed because it won't matter in the end (credits), it won't get fixed much. But if weakly similar results didn't earn a single credit, then things would speed up dramatically: either make it work or, if it can't work, "ban" the computer/platform/GPU combo on the servers instead and don't send units to devices that can't compute them thoroughly. As simple as that, really.
It would be a shame if CUDA/AMD GPU hardware ended up there, but in the end the same rules would apply to everyone.
3) Message boards : Number crunching : Monitoring inconclusive GBT validations and harvesting data for testing (Message 1820481)
Posted 1 day ago by Profile -= Vyper =-
Now one of them validated them all!

http://setiathome.berkeley.edu/workunit.php?wuid=2276193382
4) Message boards : Number crunching : Monitoring inconclusive GBT validations and harvesting data for testing (Message 1820479)
Posted 1 day ago by Profile -= Vyper =-
If the S@H people say they want to use IEEE 754 in future releases to iron out differences, then science-wise it should be very welcome.

It's also easier to ban a platform/compiler that doesn't conform to those rules via the BOINC API, if you developers find a combination that doesn't work properly.

https://en.wikipedia.org/wiki/IEEE_floating_point#Basic_and_interchange_formats

It's just a matter of choosing binary32 or decimal32, whichever serves best, from the simplest CPU application up to monster quadruple GPU/FPGA/ASIC cores in the future. If you all find a card or driver that doesn't work, then it's up to the manufacturer to patch their product so that it conforms 100% to the IEEE 754 standard.
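IEEE 754 binary32 fixes the exact bit pattern of every value, so results from conforming platforms can be compared byte for byte. A minimal sketch using Python's struct module (illustration only, not SETI@home code):

```python
import struct

def binary32_bits(x: float) -> str:
    """Return the IEEE 754 binary32 bit pattern of x as a hex string."""
    # '<f' packs to little-endian single precision (binary32).
    return struct.pack('<f', x).hex()

# Any two IEEE 754-conforming implementations must agree bit for bit
# on the same computed value, so results can be diffed exactly.
print(binary32_bits(1.0))        # the canonical binary32 encoding of 1.0
print(binary32_bits(2.0 / 2.0))  # same value, therefore same bit pattern
```

If every platform produced bit-identical binary32 results like this, validating would reduce to an exact comparison instead of a similarity score.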

EDIT: All the above is about getting code as close to the Q100 mark as possible on whatever platform/combination, but perhaps as a second step. As we've noticed, the thing I mention now has nothing to do with the main topic of the thread, inconclusive validations; that is of course another thing, which actually needs to be fixed at another level, because I'm sure each and every one of those applications, if compared across all signals found (30+), would get Q99+, so they most certainly would validate.
5) Message boards : Number crunching : Monitoring inconclusive GBT validations and harvesting data for testing (Message 1820477)
Posted 1 day ago by Profile -= Vyper =-
There is no connection between fp:strict (or whatever the precision switch may be) and reporting a subset of results on overflow.


Aren't you all using fp:precise (single)? I just wanted to ask what happens if the app is compiled and tested with fp:strict (single) instead. What is the speed penalty of going from precise (single) to strict (single)?

This was mentioned in this thread before. Increasing precision is not a solution for overflow tasks

I'm not talking about solving overflow tasks; that was not the purpose. (This was an off-topic question that popped up in my mind.)
The purpose, in my mind, was an overall platform standard that follows IEEE 754 regardless of CPU, x32, x64, ARM or GPU. When calculated and handled correctly, the outcome would be as close to Q100 as it possibly can be, resulting in less head-banging for all of you optimisers in the future.
The reason I'm telling you to test in that direction is mainly so you can all switch to code optimising instead of bug-hunting various platforms until hell freezes over. As I say, it will only increase, not decrease.

Until you know for sure that it won't work, I will continue to push for this unification, provided it isn't much slower than using precise.
When numbers have been presented here as a comparison, we will know 100% whether this is worth it or not. But if going fp:strict is, for example, 3% slower while Q is increased to the Q99.99-Q100 range, then if I were a project manager I would vote to go that route now, instead of banging heads for months/years to come chasing annoying rounding bugs and result disparities.
6) Message boards : Number crunching : Monitoring inconclusive GBT validations and harvesting data for testing (Message 1820465)
Posted 1 day ago by Profile -= Vyper =-
Gentlemen!

To sum it up so far: I'm happy that a mindset has been brought to daylight instead of pinpointing applications left and right.
Something needs to be done, and Jeff highlighted, with hard proof, what I've seen but haven't been able to express verbally, and what I figured out needs to be addressed.

Now that we all seem to be on the same page, recognizing that we have issues across various platforms/apps/compilers and whatnot, I think the solution to this and forthcoming issues might pop up later in all of this, in how we think and act to resolve them.

My hat off to all of you, lads!

Now, going back to the idea of fp:strict (IEEE 754) vs anything else, double precision etc.: does any of you have an idea of the speed penalty of going strict instead of precise, or double precision? If fp:strict single precision (more isn't needed, apparently) is a few percent slower, then so be it for the sake of conformity! But if it is half the speed, then no, that is not the route to go for now; focus instead on the validator/re-order-of-work-reported issue that seems to be apparent.
7) Message boards : Number crunching : Monitoring inconclusive GBT validations and harvesting data for testing (Message 1820456)
Posted 2 days ago by Profile -= Vyper =-

But to solve this by doing a "find all and sort them afterwards" would mean that every task would have to run to full term, and we'd lose the efficiency of quitting early after 10 seconds or so for the really noisy WUs.


Well, if we lose the efficiency of quitting early anyway, why should the validator even "validate" -9 work, when the server code could just see "Oh geez, this is an overflow result! Thanks! Here are your credits!" by comparing it to the other -9s?

If the device sends a -9 result back but the other application sees the task as producing real results, then you should be awarded zero credits anyway.
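The rule suggested above can be sketched in a few lines of Python; the function, field names and return values are my own illustrative assumptions, not the real SETI@home validator logic:

```python
def check_pair(result_a: dict, result_b: dict) -> str:
    """Toy validator rule: a -9 overflow should only match another -9 overflow.

    Each result is a dict like {'overflow': bool, 'signals': [...]},
    where 'overflow' means the app hit the signal cap (~30) and quit early.
    """
    if result_a['overflow'] and result_b['overflow']:
        return 'valid'        # both hit the noise cap: agree, award (minimal) credit
    if result_a['overflow'] != result_b['overflow']:
        return 'invalid'      # one saw noise, the other real signals: zero credit
    return 'compare_signals'  # neither overflowed: fall through to a full compare

print(check_pair({'overflow': True, 'signals': []},
                 {'overflow': True, 'signals': []}))   # prints valid
```

This keeps the early-quit efficiency: two overflow results agree by construction, and a full signal-by-signal comparison is only needed when neither side overflowed.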
8) Message boards : Number crunching : Monitoring inconclusive GBT validations and harvesting data for testing (Message 1820435)
Posted 2 days ago by Profile -= Vyper =-
this doesn't guarantee that other compilers, or hardware device manufacturers, implemented their chips in the bit-identical way suggested


Yes, but if other compilers or hardware aren't bit-identical, then there is a flaw in their IEEE 754 implementation; you all would know that, and would need to tackle that platform or device differently and put the effort there!

I'm only suggesting that IEEE 754 should be used so the majority of applications get to the Q100 mark! Then you all know that, compiling under Linux, Windows and so on, this works as intended, and when a new version breaks it you would know it 100% for sure and could revert, or change the lines of code required to get back to the Q100 mark.

I haven't mentioned validation, since it can validate non-Q100 results too; I'm proposing this as a baseline and a way of thinking to ease future headaches.
9) Message boards : Number crunching : Monitoring inconclusive GBT validations and harvesting data for testing (Message 1820427)
Posted 2 days ago by Profile -= Vyper =-


If two apps find the same subset of 30 out of the available 50, then I'm pretty sure the validator will pass the result, even if the reporting order is different - I had a walk through the code a few days ago.

But if the app, by doing a parallel search, finds a different subset of 30 from 50, then the results are different, and no amount of tweaking the validator is going to make any difference.


Yup, that is totally true! That's why I'm nagging that the results sent back should be unified (presentation-wise): give the stock CPU app a sorting routine, and every other application as well, so we never get this again. If a WU is overflowed, it is of course crap. But why should you miss out on credit for 5600 seconds of CPU time just because it gets thrown out by other applications' juggled ordering, when the code could be done right from the beginning?

Incorporate a result-sorting routine in the main S@H code and let the others (the tweakers) follow its lead. The only thing everyone would get in the future is less headache when dealing with forthcoming optimisations and variations, which will only increase, not decrease :-/
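Such a sender-side sorting routine could be as simple as the sketch below; the (kind, time, strength) tuples and the ordering key are illustrative assumptions, not the real S@H result format:

```python
def canonical_order(signals):
    """Sort found signals into a fixed, platform-independent reporting order.

    A parallel app that sorts like this before reporting emits the same
    sequence as a serial app that found the same signals.
    """
    kind_rank = {'spike': 0, 'autocorr': 1, 'gaussian': 2, 'pulse': 3, 'triplet': 4}
    return sorted(signals, key=lambda s: (kind_rank[s[0]], s[1], s[2]))

# A GPU app checking pulses last and a CPU app interleaving kinds
# converge on the same report once both sort canonically.
gpu_report = [('pulse', 5.0, 24.32), ('spike', 1.0, 23.45)]
cpu_report = [('spike', 1.0, 23.45), ('pulse', 5.0, 24.32)]
print(canonical_order(gpu_report) == canonical_order(cpu_report))  # prints True
```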
10) Message boards : Number crunching : Monitoring inconclusive GBT validations and harvesting data for testing (Message 1820423)
Posted 2 days ago by Profile -= Vyper =-


In essence, if it comes down to precise vs strict with single floats (which I doubt this is), then double precision should be used if it's really necessary.


I don't really see the need for it. They only need to use IEEE strict with single precision (32 bits, ±1.18×10^−38 to ±3.4×10^38, approx. 7 decimal digits). That should cover every application, and even CPUs, so we would get to the Q=100 mark.
And if it indeed doesn't, then the validator portion of the code needs to be addressed.

Perhaps something for you to pursue, Eric: go down this route if it isn't extremely slow compared to other FP modes. It seems that if S@H goes down this IEEE route, then I believe you coders would get a lot less headache in the future when optimising the analysis part, and could focus more on development instead of chasing rounding bugs that slipped through.

Compare it to writing code in C++ versus pure machine code. Which is easier to maintain when bugs arise? :) Certainly not the good old classic F8 E4 code, lol...
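The roughly-7-digit limit of binary32 mentioned above can be shown by round-tripping a Python double through single precision (a sketch with Python's struct module, illustration only):

```python
import struct

def to_binary32(x: float) -> float:
    """Round a Python double to the nearest IEEE 754 binary32 value."""
    return struct.unpack('<f', struct.pack('<f', x))[0]

# binary32 carries a 24-bit significand, roughly 7 decimal digits,
# so larger integers and most decimal fractions get rounded:
print(to_binary32(123456789.0))  # prints 123456792.0: trailing digits lost
print(to_binary32(0.1))          # close to, but not exactly, 0.1
```

The point is that this rounding is itself exactly specified: every conforming platform loses the same digits in the same way, which is what makes Q=100 plausible at all.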
11) Message boards : Number crunching : Monitoring inconclusive GBT validations and harvesting data for testing (Message 1820417)
Posted 2 days ago by Profile -= Vyper =-
could be tricky to locate the source of cumulative error.


Indeed! And this needs to be addressed NOW, not in the future or later on, because the variety of different apps, platforms, compilers, CUDA, OpenCL, Vulkan, yada yada is increasing, and thus this problem increases exponentially.

I think what we're seeing now was a non-issue in the past, before 2010, when the majority of computers were CPU-based (serial, uninteresting output). But now that more and more people add their PS3s, Androids, AMD GPUs, Nvidia GPUs and so on, this "inconclusive era" seems to have got out of hand in every app produced! Not to mention the real black sheep, the Apple issue!

This is only my way of seeing it, and perhaps really old code that worked perfectly in a CPU-only world needs to be changed. I'm not talking about the analysis part that you guys are tweaking the hell out of; perhaps it's the server validator code, maybe written around 2006, when none of the new devices that pop up regularly existed, that needs to change.
If that code part is "stupid" and doesn't do the sorting and juggling required, then you coders need to patch the outgoing results from the analysis so the validator accepts them, because it is serial-code-minded instead of parallel-code-minded.
12) Message boards : Number crunching : Monitoring inconclusive GBT validations and harvesting data for testing (Message 1820414)
Posted 2 days ago by Profile -= Vyper =-
fp:precise leads to inconclusive results vs stock.
Better to forget about fp:precise completely.
This is a non-portable feature of the CPU.


Hmm

http://stackoverflow.com/questions/12514516/difference-between-fpstrict-and-fpprecise
I think that strict should be used in all code produced, if I read the above correctly: "bitwise compatibility between different compilers and platforms".
https://en.wikipedia.org/wiki/IEEE_754-1985

I posted this so that others can see why, too!
13) Message boards : Number crunching : Monitoring inconclusive GBT validations and harvesting data for testing (Message 1820411)
Posted 2 days ago by Profile -= Vyper =-
I'm open to debate; however, my opinion is that CPU serial order dictates the rules typically adopted when implementing parallel algorithms, in that they must always be reducible to serial form and produce the same result.


Exactly what I believe too: all finds in all apps need to be uploaded in the same order of sequence so they don't fall into the inconclusive ballpark (as it seems?!). It's easy when WUs get compared to the stock application (CPU) and validated in the end, but think of when a WU is sent to, for instance, an Android, an Apple Darwin, SoG and CUDA (which application is more right or wrong than another is hard to tell if this isn't addressed) and none of them passes, because the result sent back is always different in some way, even if it perhaps gets Q99+ for real (or does it really, and everyone just believes the code works?!). Has anyone looked at the results in an Excel spreadsheet (or some other human-viewable tool) and tried to sort and compare them there? :) Lol
14) Message boards : Number crunching : Monitoring inconclusive GBT validations and harvesting data for testing (Message 1820403)
Posted 2 days ago by Profile -= Vyper =-
Should I check the other one?


Yes! Please do! I'm very curious about the outcome of this experiment, and if this issue gets sorted out we will probably see a lot of false positives vanish.

As in your testing, you got a strong Q, and when I received a message from Petri with his banks of test WUs, all of them were in the Q99+ range when he ran his application, yet they still seem to fall into the inconclusive swamp anyway.
15) Message boards : Number crunching : Monitoring inconclusive GBT validations and harvesting data for testing (Message 1820401)
Posted 2 days ago by Profile -= Vyper =-
Nice find Jeff!

This pretty much nails what I've been trying to say for a while now: the validator seems to detect anomalies, and it says the validator doesn't bother about out-of-order reporting when dealing with WUs with a high number of detections.

We will see how this pans out, and I believe it pretty much nails it: sorting may be required, or a revamp of the validator code (or at the very least being 100% sure of how it handles information when comparing against the other results sent in). I'm not a coder any longer, only a think-tank :)
16) Message boards : Number crunching : Monitoring inconclusive GBT validations and harvesting data for testing (Message 1820148)
Posted 4 days ago by Profile -= Vyper =-
Thanks! These are neat scripts!
I'm going to explore them later on..
17) Message boards : Number crunching : Monitoring inconclusive GBT validations and harvesting data for testing (Message 1818437)
Posted 10 days ago by Profile -= Vyper =-
I still struggle to accept why the validator marks a result as invalid on the first attempt ...

It doesn't - it marks them as inconclusive, which is an important distinction.

... but when the third machine comes along, it suddenly marks all results as valid.

That's because of the generosity of the SETI staff, who award bonus credit for a 'near miss' (weakly similar). I personally think that the credit for weakly similar tasks should be 50%, to alert users to the fact that their work isn't truly valid.


Spot on! That should be it. Some of the later outputs regarding Petri's optimisations showed that they are strongly similar against a large set of different WUs, GBT tasks etc.; we're talking 99%+ on every different type of WU thrown at them.
In that case it would actually be more enlightening if the similarity ratio were printed, viewable by everyone, on the invalid and inconclusive pages.
18) Message boards : Number crunching : Monitoring inconclusive GBT validations and harvesting data for testing (Message 1818426)
Posted 10 days ago by Profile -= Vyper =-
Alright then! I still struggle to accept why the validator marks a result as invalid on the first attempt, but when the third machine comes along it suddenly marks all results as valid.
If the results were indeed bad, why does the invalid rate stay so low anyway?

http://setiathome.berkeley.edu/results.php?hostid=8094722&offset=0&show_names=0&state=5&appid=

and

http://setiathome.berkeley.edu/results.php?hostid=8053171&offset=0&show_names=0&state=5&appid=
19) Message boards : Number crunching : Monitoring inconclusive GBT validations and harvesting data for testing (Message 1818410)
Posted 10 days ago by Profile -= Vyper =-
1) The order of processing is different. The check for triplets, pulses, spikes, gaussians and autocorrelations is not done in the same order as in the main version. Pulses tend to take longest on GPU, so I check them last. I see no problem in sending a 4-second task for rechecking on another host. The data is invalid anyway. I could store the findings and report them in the same order as the main version, but that is not my priority right now.

There may be over 30 pulses, over 30 triplets, over 30 autocorrelations, over 30 spikes in the same packet. Any of them can cause an overflow, and some of them may not have been processed yet. Parallel execution is different from sequential.


Hmm, I don't question the parallel work you've been doing; that's great and awesome! Progress is the key.
I'm thinking more that the validator server software is written "stupidly" and doesn't sort the incoming data by pulse strength, spike strength etc., so that it gets aligned and compared row for row.

If the main software (CPU) reports, say, a result with a strength of 23.45 on row 3, while your application first sends another of 24.32 and only later reports the 23.45 in that place, I really think the validator gets confused and your application gets an "inconclusive" mark, even though, if we dig through all the results, it is in fact correct once sorted and matched.

In my world that's called a "false positive": the result is actually legit but still collects a huge number of "inconclusive result" tags. The priority is of course to iron out real miscalculations, but those tend to drown in a 12% inconclusive list; if the output were sorted the way the validator expects, it might go down to 1-2% instead.

It is much easier to spot the real problem WUs than to drown in false positives.
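The effect described above can be sketched with toy numbers: a positional, row-for-row compare flags reordered but identical strengths as a mismatch, while sorting first matches them (illustrative values and tolerance, not real validator code):

```python
def row_for_row_match(a, b, tol=0.01):
    """Naive positional compare: fails when identical signals arrive reordered."""
    return len(a) == len(b) and all(abs(x - y) <= tol for x, y in zip(a, b))

def sorted_match(a, b, tol=0.01):
    """Order-insensitive compare: sort strengths before pairing rows."""
    return row_for_row_match(sorted(a), sorted(b), tol)

cpu = [23.45, 24.32, 25.10]  # serial app's reporting order (toy values)
gpu = [24.32, 23.45, 25.10]  # same signals, parallel finish order

print(row_for_row_match(cpu, gpu))  # prints False: looks inconclusive
print(sorted_match(cpu, gpu))       # prints True: actually the same result
```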

What do you others think? Is that how the validator server code actually works? Or does the validator itself do the sorting and rechecking when two machines get a mismatch, so that when the third comes along and sends its data, suddenly all machines get a valid result and are awarded the credit?!

I'm just trying to work out the right thing to do to ease the load and resources for all developers and code wizards. If it is a fairly easy fix, then I actually suggest sorting and shuffling the calculated data to match the reference application before sending the result back to S@H for validation :)
20) Message boards : Number crunching : Monitoring inconclusive GBT validations and harvesting data for testing (Message 1818384)
Posted 11 days ago by Profile -= Vyper =-
I've got a lot of those overflow units, and they come up invalid at first on my big host.
Petri, do you think your code differs in error management, or in priority up to the buffer overflow, before results are sent to the validator? If so, I think there is much to be gained just by getting the numbers sorted identically in the output of those quick overflows.
Otherwise the app seems very solid when crunching real non-overflowed work: a few invalids here and there, but mostly the numbers (triplet, spike, pulse, Gaussian etc.) are matching.
Just a hint: grab a few quickies and compare, because they error out against the regular code and the SoG code, so there seems to be an anomaly there.

If this issue with the quickies gets fixed, I seriously think most of the irregularities would be solved. Perhaps it's just subroutine juggling that needs to happen in the correct order for the output to match what the validator expects.

Shaggie76: could you write a script that fetches the internal data and matches it to the host number when a workunit falls into the inconclusive column?
What I want to accomplish is a database of what the inconclusives are matched against (CUDA, SoG, IntelX86) and, when clicked further, a summary of pulses, spikes etc. compared with the other host. Call it alpha statistics gathering, to see whether the main numbers differ anywhere.
You could monitor my two Linux hosts and others using the new code. Perhaps a percentage of validated vs inconclusive as well! Is this asking too much? :)
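A sketch of what such a statistics gatherer might tally once the result rows have been scraped; the scraping itself (the part Shaggie76's scripts would do) is omitted, and the app names and statuses below are illustrative:

```python
from collections import Counter

def inconclusive_by_app(rows):
    """Tally inconclusive results per application name.

    rows is an iterable of (app_name, status) pairs; returns
    {app: (inconclusive_count, total_count, percent)} for each app seen.
    """
    totals, inconclusive = Counter(), Counter()
    for app, status in rows:
        totals[app] += 1
        if status == 'inconclusive':
            inconclusive[app] += 1
    return {app: (inconclusive[app], totals[app],
                  100.0 * inconclusive[app] / totals[app])
            for app in totals}

rows = [('cuda32', 'inconclusive'), ('cuda32', 'valid'),
        ('opencl_nvidia_SoG', 'inconclusive')]
for app, (n_inc, n_tot, pct) in inconclusive_by_app(rows).items():
    print(f'{app}: {n_inc}/{n_tot} inconclusive ({pct:.0f}%)')
```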

My 2 cents!

Example:

5165706276 8055485 19 Sep 2016, 13:13:35 UTC 19 Sep 2016, 14:13:36 UTC Completed, validation inconclusive 11.56 11.53 pending SETI@home v8 v8.00 (cuda32)
windows_intelx86
5165706277 8053171 19 Sep 2016, 13:13:44 UTC 20 Sep 2016, 5:06:44 UTC Completed, validation inconclusive 4.11 1.75 pending SETI@home v8
Anonymous platform (NVIDIA GPU)
5167724779 --- --- --- Unsent --- --- --- ---

5165608260 8053171 19 Sep 2016, 12:14:52 UTC 20 Sep 2016, 4:15:01 UTC Completed, validation inconclusive 4.22 1.79 pending SETI@home v8
Anonymous platform (NVIDIA GPU)
5165608261 7737824 19 Sep 2016, 12:14:52 UTC 19 Sep 2016, 12:19:59 UTC Completed, validation inconclusive 21.14 12.95 pending SETI@home v8 v8.12 (opencl_nvidia_SoG)
windows_intelx86
5167629550 --- --- --- Unsent --- --- --- ---

5165582012 8053171 19 Sep 2016, 11:59:22 UTC 20 Sep 2016, 4:15:01 UTC Completed, validation inconclusive 4.21 1.86 pending SETI@home v8
Anonymous platform (NVIDIA GPU)
5165582013 7740995 19 Sep 2016, 11:59:21 UTC 19 Sep 2016, 12:04:31 UTC Completed, validation inconclusive 13.76 10.52 pending SETI@home v8 v8.12 (opencl_ati5_cat132)
windows_intelx86
5167629376 --- --- --- Unsent --- --- --- ---

5150302343 7923287 11 Sep 2016, 13:31:35 UTC 19 Sep 2016, 8:33:13 UTC Completed, validation inconclusive 123.85 118.97 pending SETI@home v8 v8.12 (opencl_nvidia_SoG)
windows_intelx86
5150302344 7814899 11 Sep 2016, 13:31:37 UTC 12 Sep 2016, 7:26:21 UTC Completed, validation inconclusive 641.23 517.47 pending SETI@home v8
Anonymous platform (NVIDIA GPU)
5165739708 8053171 19 Sep 2016, 13:34:25 UTC 20 Sep 2016, 5:27:26 UTC Completed, validation inconclusive 41.50 16.48 pending SETI@home v8
Anonymous platform (NVIDIA GPU)
5167766155 --- --- --- Unsent --- --- --- ---

5165287979 8053171 19 Sep 2016, 8:59:20 UTC 20 Sep 2016, 2:16:02 UTC Completed, validation inconclusive 4.22 1.76 pending SETI@home v8
Anonymous platform (NVIDIA GPU)
5165287980 8096298 19 Sep 2016, 8:59:20 UTC 19 Sep 2016, 18:33:42 UTC Completed, validation inconclusive 22.57 21.66 pending SETI@home v8 v8.00
windows_intelx86
5167459854 --- --- --- Unsent --- --- --- ---


Copyright © 2016 University of California