Posts by -= Vyper =-

21) Message boards : Number crunching : Monitoring inconclusive GBT validations and harvesting data for testing (Message 1820456)
Posted 29 Sep 2016 by -= Vyper =- (Project Donor)

But to solve this by doing a "find all and sort them afterwards" would mean that every task would have to run to full term, and we'd lose the efficiency of quitting early after 10 seconds or so for the really noisy WUs.


Well, if we lose the efficiency of quitting early, then why should the validator even "validate" -9 work at all, when the server code could just see: "Oh geez, this is an overflow result! Thanks! Here are your credits!" once it's compared against other -9s?

If the device sends a -9 result back but the other application sees it as a real result, then you should be awarded zero credits anyway.
22) Message boards : Number crunching : Monitoring inconclusive GBT validations and harvesting data for testing (Message 1820435)
Posted 29 Sep 2016 by -= Vyper =-
this doesn't guarantee other compilers, or hardware device manufacturers implemented their chips in the bit identical way suggested


Yes, but if other compilers or hardware aren't bit identical, then there is a flaw in their IEEE754 implementation, and you would all know that, and could tackle that platform or device differently and put the effort there!

I'm only suggesting that IEEE754 should be used so the majority of applications reach the Q100 mark! Then you all know that when compiling under Linux, Windows and so on, it works as intended, and when a new version breaks it you would know it 100% for sure and could revert, or change whatever lines of code are required to get back to the Q100 mark.

I haven't mentioned validation, since it can validate non-Q100 results as well; I'm proposing this as a baseline and a way of thinking to ease future headaches instead.
23) Message boards : Number crunching : Monitoring inconclusive GBT validations and harvesting data for testing (Message 1820427)
Posted 29 Sep 2016 by -= Vyper =-


If two apps find the same subset of 30 out of the available 50, then I'm pretty sure the validator will pass the result, even if the reporting order is different - I had a walk through the code a few days ago.

But if the app - by doing a parallel search - finds a different subset of 30 from 50, then the results are different, and no amount of tweaking the validator is going make any difference.


Yup, that is so totally true! That's why I'm nagging about unifying the results sent back (presentation-wise), so that the stock CPU app incorporates a sorting routine in the future, and every other application as well, so we never run into this again. If a WU is overflowed, then of course it's crap. But why should someone miss out on credit for 5600 seconds of CPU time just because it gets ironed out by the different "juggling order" of other applications, when you could do the code right from the beginning?

Incorporate a result sorting routine in the main S@H code and let the others (the tweakers) follow its lead. The only thing everyone would get in the future is less headache when dealing with forthcoming optimisations and variations, which will only increase, not decrease :-/
24) Message boards : Number crunching : Monitoring inconclusive GBT validations and harvesting data for testing (Message 1820423)
Posted 29 Sep 2016 by -= Vyper =-


In essence, if it comes down to precise vs. strict with single floats (which I doubt this is), then double precision should be used if it's really necessary.


I don't really see the need for it. They only need to use IEEE strict with single precision (32 bits, roughly ±1.18×10^-38 to ±3.4×10^38, approx. 7 decimal digits). That should set up every application, and even the CPUs, so we would reach the Q=100 mark.
And if it still doesn't, then the validator portion of the code needs to be addressed.

Perhaps something for you all to pursue, Eric: going down this route, if it isn't extremely much slower than other FP modes. If S@H goes down this IEEE route, I believe you coders would get a lot less headache in the future when optimising the analysis part, and could focus more on development instead of chasing rounding bugs that slipped through.

Compare it to writing code in C++ versus pure machine code. Which is easier to maintain when bugs arise? :) Certainly not the good old classic F8 E4 code, lol...
25) Message boards : Number crunching : Monitoring inconclusive GBT validations and harvesting data for testing (Message 1820417)
Posted 29 Sep 2016 by -= Vyper =-
could be tricky to locate the source of cumulative error.


Indeed! And this needs to be addressed "NOW", not later on, because the variety of different apps, platforms, compilers, CUDA, OpenCL, Vulkan and so on is increasing, and thus this problem grows exponentially.

I think that what we're seeing now was a non-issue in the past, before 2010, when the majority of computers were CPU-based (serial, uninteresting output). But now that more and more people add their PS3s, Androids, AMD GPUs, Nvidia GPUs and so on, this "inconclusive era" seems to have gotten out of hand in every app produced! Not to mention the real black sheep, the Apple issue!

This is only my way of seeing it, and perhaps really old code that worked perfectly in a CPU-only world needs to be changed. I'm not talking about the analysis part that you guys are tweaking the hell out of; perhaps it's the server validator code that needs changing, code perhaps written back in 2006 when we had none of the new devices that pop up regularly.
If that part of the code is "stupid" and doesn't do the sorting and juggling required, then you coders need to patch the outgoing results from the analysis so the validator gets what it expects, because it is serial-code-minded instead of parallel-code-minded.
26) Message boards : Number crunching : Monitoring inconclusive GBT validations and harvesting data for testing (Message 1820414)
Posted 29 Sep 2016 by -= Vyper =-
fp: precise leads to inconclusive results vs stock.
Better to forget about fp:precise completely.
This is non-portable feature of CPU.


Hmm

http://stackoverflow.com/questions/12514516/difference-between-fpstrict-and-fpprecise
I think strict should be used in every piece of code produced, if I read the above correctly: "bitwise compatibility between different compilers and platforms".
https://en.wikipedia.org/wiki/IEEE_754-1985

I posted this so others can see why, too!
27) Message boards : Number crunching : Monitoring inconclusive GBT validations and harvesting data for testing (Message 1820411)
Posted 29 Sep 2016 by -= Vyper =-
I'm open to debate, however my opinion is that CPU serial dictates order by rules typically adopted when implementing parallel algorithms, in that they must always be reducible to serial form and produce the same result.


Exactly what I believe too: all finds in all apps need to be uploaded in the same order of sequence so they don't fall into the inconclusive ballpark (as it seems?!). It's easy when WUs get compared against the stock application (CPU) and validated in the end, but think of it when a WU is sent out to, for instance, an Android, an Apple Darwin, a SoG and a CUDA host (which application is more right or wrong than the others is hard to tell if this isn't addressed), and none of them passes, because the result sent back is always different in some way, even if it perhaps reaches Q99+ for real (or does it really, and everyone just believes the code works?!). Has anyone looked at the results in an Excel spreadsheet, or some other human-viewable tool, and tried to sort and compare them there? :) Lol
28) Message boards : Number crunching : Monitoring inconclusive GBT validations and harvesting data for testing (Message 1820403)
Posted 29 Sep 2016 by -= Vyper =-
Should I check the other one ?


Yes! Please do! I'm very curious about the outcome of this experiment, and if this issue gets sorted out then we will probably see a lot of false positives vanish.

As in your testing, where you got a strong Q: when I received a message from Petri with his "banks" of test WUs, all of them were in the Q99+ range when he ran his application, yet they still seem to fall into the inconclusive swamp anyway.
29) Message boards : Number crunching : Monitoring inconclusive GBT validations and harvesting data for testing (Message 1820401)
Posted 29 Sep 2016 by -= Vyper =-
Nice find Jeff!

This pretty much nails what I've been trying to say for a while now: the validator seems to detect anomalies, and it looks like it doesn't bother about out-of-order reporting when dealing with WUs with a high number of detections.

We will see how this pans out, but I believe it pretty much nails down that sorting may be required, or a revamp of the validator code (or at the very least being 100% sure of how it handles information when comparing against the other submitted results). I'm not a coder any longer, only a think tank :)
30) Message boards : Number crunching : Monitoring inconclusive GBT validations and harvesting data for testing (Message 1820148)
Posted 27 Sep 2016 by -= Vyper =-
Thanks! Those were neat scripts!
I'm going to explore them later on...
31) Message boards : Number crunching : Monitoring inconclusive GBT validations and harvesting data for testing (Message 1818437)
Posted 20 Sep 2016 by -= Vyper =-
I still have struggles with accepting why the validator then marks a result as invalid in the first attempt ...

It doesn't - it marks them as inconclusive, which is an important distinction.

... but when the third machine comes along it suddenly marks all results as valid.

That's because of the generosity of the SETI staff, who award bonus credit for a 'near miss' (weakly similar). I personally think that the credit for weakly similar tasks should be 50%, to alert users to the fact that their work isn't truly valid.


Spot on! That should be it. Some of the later outputs regarding Petri's optimisations showed strongly similar results against a large set of different WUs, GBTs etc.; we're talking 99%+ on every different type of WU thrown at it.
In that case it would actually be more enlightening if the similarity ratio were printed, viewable for everyone, on the invalid and inconclusive pages.
32) Message boards : Number crunching : Monitoring inconclusive GBT validations and harvesting data for testing (Message 1818426)
Posted 20 Sep 2016 by -= Vyper =-
Alright then! I still struggle to accept why the validator marks a result as invalid on the first attempt, but when the third machine comes along it suddenly marks all results as valid.
If the results were indeed bad, why does the invalid rate stay so low anyway?

http://setiathome.berkeley.edu/results.php?hostid=8094722&offset=0&show_names=0&state=5&appid=

and

http://setiathome.berkeley.edu/results.php?hostid=8053171&offset=0&show_names=0&state=5&appid=
33) Message boards : Number crunching : Monitoring inconclusive GBT validations and harvesting data for testing (Message 1818410)
Posted 20 Sep 2016 by -= Vyper =-
1) The order of processing is different. The check for triplets, pulses, spikes, gaussians and autocorrelations is not done in the same order as in main version. Pulses tend to take longest on GPU, so I check them last. I see no problem in sending a 4 second task for rechecking with another host. The data is invalid anyway. I could store the findings and report them at the same order as main but that is not my priority right now.

There may be over 30 pulses, over 30 triplets, over 30 autocorrelations over 30 spikes in the same packet. Any of them can cause an overflow and some of them may have not been processed yet. Parallel execution is different from sequential.


Hmm, I don't question the parallel work you've been doing; that's great and awesome! Progress is the key.
I'm more thinking that the validator server software is written "stupidly" and doesn't sort the incoming data by pulse strength, spike strength and so on, so that it gets aligned and compared row-for-row.

If the main software (CPU) reports, for instance, a result with a strength of 23.45 on row 3, while your application first sends another of 24.32 and only later reports the 23.45 in that place, I really think the validator gets confused and your application gets an "inconclusive" mark, even though, when we dig through all the results, they are in fact real once sorted and matched.

In my world that's called a "false positive", and the term is misused here, because the result actually is legit and yet collects a huge number of "inconclusive result" tags. The priority is of course to iron out real miscalculations, but they tend to drown in a 12% inconclusive list; if the output were sorted the way the validator expects, that might drop to 1-2% instead.

Much easier to spot the real problem WUs instead of drowning in false positives.

What do you others think? Is that the way the validator server code actually works? Or does the validator itself do the sorting and rechecking when two machines get a mismatch, so that when the third one comes along and sends its data, suddenly all machines get a "valid result" and are awarded the credit?!

Just trying to sort out the right thing to do to ease the load and resources for all developers and code wizards. If it is a fairly easy fix, then I actually suggest sorting and shuffling the calculated data to match the reference application before sending the result back to S@H for validation :)
34) Message boards : Number crunching : Monitoring inconclusive GBT validations and harvesting data for testing (Message 1818384)
Posted 20 Sep 2016 by -= Vyper =-
I've got a lot of those overflow units, and they get marked invalid at first on my big host.
Petri, do you think your code differs in error management, or in priority order up until the buffer overflows and the result is sent to the validator? If so, I think there is much to be gained just by getting the numbers sorted identically in the output of those quick overflows.
Otherwise the app seems very solid when crunching real, non-overflowed work: a few invalids here and there, but mostly the numbers (triplet, spike, pulse, gaussian etc.) are matching.
Just a hint to grab a few quickies and compare, because they error out against the regular code and the SoG code too, so there seems to be an anomaly there.

If this issue with the quickies gets fixed, I seriously think most of the irregularities would actually be solved. Perhaps it's just subroutine juggling that needs to happen in the correct order for the output to match what the validator expects.

Shaggie76: Could you write a script that gets the internal data and matches it to a host number when a workunit falls into the inconclusive column?
What I want to accomplish is a database of what the inconclusives are matched against (cuda, SoG, IntelX86), and, when clicked further, a summary of the pulses, spikes, etc. compared to the other host. Call it alpha statistics gathering, to see if the main numbers differ anywhere.
You could monitor my two Linux hosts and others using the new code. Perhaps a percentage between validated and inconclusive too! Is this asking too much? :)

My 2 cents!

Example:

5165706276 8055485 19 Sep 2016, 13:13:35 UTC 19 Sep 2016, 14:13:36 UTC Completed, validation inconclusive 11.56 11.53 pending SETI@home v8 v8.00 (cuda32)
windows_intelx86
5165706277 8053171 19 Sep 2016, 13:13:44 UTC 20 Sep 2016, 5:06:44 UTC Completed, validation inconclusive 4.11 1.75 pending SETI@home v8
Anonymous platform (NVIDIA GPU)
5167724779 --- --- --- Unsent --- --- --- ---

5165608260 8053171 19 Sep 2016, 12:14:52 UTC 20 Sep 2016, 4:15:01 UTC Completed, validation inconclusive 4.22 1.79 pending SETI@home v8
Anonymous platform (NVIDIA GPU)
5165608261 7737824 19 Sep 2016, 12:14:52 UTC 19 Sep 2016, 12:19:59 UTC Completed, validation inconclusive 21.14 12.95 pending SETI@home v8 v8.12 (opencl_nvidia_SoG)
windows_intelx86
5167629550 --- --- --- Unsent --- --- --- ---

5165582012 8053171 19 Sep 2016, 11:59:22 UTC 20 Sep 2016, 4:15:01 UTC Completed, validation inconclusive 4.21 1.86 pending SETI@home v8
Anonymous platform (NVIDIA GPU)
5165582013 7740995 19 Sep 2016, 11:59:21 UTC 19 Sep 2016, 12:04:31 UTC Completed, validation inconclusive 13.76 10.52 pending SETI@home v8 v8.12 (opencl_ati5_cat132)
windows_intelx86
5167629376 --- --- --- Unsent --- --- --- ---

5150302343 7923287 11 Sep 2016, 13:31:35 UTC 19 Sep 2016, 8:33:13 UTC Completed, validation inconclusive 123.85 118.97 pending SETI@home v8 v8.12 (opencl_nvidia_SoG)
windows_intelx86
5150302344 7814899 11 Sep 2016, 13:31:37 UTC 12 Sep 2016, 7:26:21 UTC Completed, validation inconclusive 641.23 517.47 pending SETI@home v8
Anonymous platform (NVIDIA GPU)
5165739708 8053171 19 Sep 2016, 13:34:25 UTC 20 Sep 2016, 5:27:26 UTC Completed, validation inconclusive 41.50 16.48 pending SETI@home v8
Anonymous platform (NVIDIA GPU)
5167766155 --- --- --- Unsent --- --- --- ---

5165287979 8053171 19 Sep 2016, 8:59:20 UTC 20 Sep 2016, 2:16:02 UTC Completed, validation inconclusive 4.22 1.76 pending SETI@home v8
Anonymous platform (NVIDIA GPU)
5165287980 8096298 19 Sep 2016, 8:59:20 UTC 19 Sep 2016, 18:33:42 UTC Completed, validation inconclusive 22.57 21.66 pending SETI@home v8 v8.00
windows_intelx86
5167459854 --- --- --- Unsent --- --- --- ---
35) Message boards : Number crunching : Open Beta test: SoG for NVidia, Lunatics v0.45 - Beta6 (RC again) (Message 1817277)
Posted 15 Sep 2016 by -= Vyper =-
Yeah, I'm positive that it must be something that needs tuning on the GPU: lower RAM speed, lower GPU speed, increased reference voltage or so.

If it still errors out, I would start to think that the PSU is rippling like hell when it computes, instead of delivering a solid +12V line.

The CPU can get hot, but for 24/7 use I would suggest that the inner core temperature of the CPU stays no higher than 70-75 degrees Celsius (158-167 degrees F) in the long run when all cores are active.
36) Message boards : Number crunching : Open Beta test: SoG for NVidia, Lunatics v0.45 - Beta6 (RC again) (Message 1816166)
Posted 10 Sep 2016 by -= Vyper =-
Robert Miles: Please try increasing the core voltage on the GPU a notch. You can use NVIDIA Inspector to do so. Increase it slowly, a step each day, to see if your problems disappear.
37) Message boards : Number crunching : MB v8: CPU vs GPU (in terms of efficiency) (Message 1815876)
Posted 9 Sep 2016 by -= Vyper =-
In another thread I posted this, which reflects how much juice is required for the work:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 367.35 Driver Version: 367.35 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX 750 Ti Off | 0000:01:00.0 Off | N/A |
| 39% 57C P0 23W / 46W | 1016MiB / 1998MiB | 100% Default |
+-------------------------------+----------------------+----------------------+
| 1 GeForce GTX 750 Ti Off | 0000:02:00.0 Off | N/A |
| 40% 58C P0 27W / 46W | 1016MiB / 2000MiB | 99% Default |
+-------------------------------+----------------------+----------------------+
| 2 GeForce GTX 750 Ti Off | 0000:04:00.0 Off | N/A |
| 38% 53C P0 25W / 46W | 1016MiB / 2000MiB | 100% Default |
+-------------------------------+----------------------+----------------------+
| 3 GeForce GTX 750 Ti Off | 0000:05:00.0 Off | N/A |
| 37% 51C P0 24W / 46W | 1016MiB / 2000MiB | 98% Default |
+-------------------------------+----------------------+----------------------+

It seems like each card consumes about 25 W when crunching on my quad 750 Ti host.
http://setiathome.berkeley.edu/show_host_detail.php?hostid=8053171

But then again, you need to take into account that you need a computer "around the cards" to drive them. I presume that computer consumes around 200 W at the wall, but I can't confirm it.
38) Message boards : Number crunching : Monitoring inconclusive GBT validations and harvesting data for testing (Message 1815441)
Posted 6 Sep 2016 by -= Vyper =-
Does anyone know exactly how the validator works? I mean, how large is the tolerance? Does it compare spikes, triplets, gaussians and pulses with exactly the same percentage tolerance?
How large is the +/- percentage before it even considers something a mismatch, and why does it later validate a result it first marked as invalid?
What's the difference between the first and second time?

EDIT: One could perhaps look at the server portion of the code, if it's publicly available.
39) Message boards : Number crunching : Monitoring inconclusive GBT validations and harvesting data for testing (Message 1815436)
Posted 6 Sep 2016 by -= Vyper =-
If we were to take the original S@H code today and compare it across all platforms, would all variants produce 100% instead of 99+%?

We're talking a bunch of WU's and a whole lot of variants:

Debian, Ubuntu , Arch , Mint etc x32, x64.
Apple variants
Windows Vista,7,8,8.1,10,Server 2008,Server 2012 x32 and x64
Arm Linux, Android etc etc.

There is a lot to compare in a chart to see whether the CPU portion of the code works as intended on all platforms with 100% similarity.
If not, then I think S@H themselves need a plan to solve it, because if the CPU portion of the executables isn't a 100% match, how on Earth would we ever get as close to 100% on GPUs, where the problems would be multiplied within the code?

Just a thought, and my 2 cents of thinking outside the box.
40) Message boards : Number crunching : Monitoring inconclusive GBT validations and harvesting data for testing (Message 1815331)
Posted 6 Sep 2016 by -= Vyper =-
That's insane, and that is without changing a single line of code, only exchanging the O/S version?
If so, I wonder what is really going on. It seems like S@H needs to ban El Capitan, or what is the suggestion now?


©2016 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.