Posts by jason_gee

21) Message boards : Number crunching : Linux CUDA 'Special' App finally available, featuring Low CPU use (Message 1875279)
Posted 26 Jun 2017 by jason_gee
Post:
I know there is a problem with my code reporting over 20 pulses at identical time with a small difference in frequency. That is an extremely rare event. And it always happens at 46.something.
That sounds like the problem I was running into with my GTX 780 (now replaced by a GTX 980), which I detailed in Message 1864874. In fact, with the Cuda8.0 Special App, it was happening quite frequently. Dialing back to the Cuda6.5 version, it became rare, but didn't go away entirely. It has never (yet) shown up on any of my other cards (GTX 750Ti, GTX 960, GTX 980). You'd need to find somebody else running a 780 to see if the problem is common to that model or unique to my card.


I'll put my 780 back in the Mac Pro on the weekend. Its unique Hyper-Q feature might be in play, and that differs by OS and Cuda version in subtle implementation ways.
22) Message boards : Number crunching : Linux CUDA 'Special' App finally available, featuring Low CPU use (Message 1875255)
Posted 26 Jun 2017 by jason_gee
Post:
SoG has its own parallelized reduction for Gaussians (it should implement the same logic, though).
And what worries me is the difference between SoG and non-SoG OpenCL results. That's definitely worth checking when I have easy access to hardware for that.


Yes, I've lost track of which apps match and which don't now, and have yet to examine Petri's fix for the pulse race condition in detail. So there's plenty to examine as the dust settles.
23) Message boards : Number crunching : Linux CUDA 'Special' App finally available, featuring Low CPU use (Message 1875249)
Posted 26 Jun 2017 by jason_gee
Post:
...
But chi-square check seems to be present.

Saw that part, but not sure in which codepaths that's active (e.g. SoG). If active in all paths then will have to arrange a bench with 8.00 Win32 reference, then call for suspects. Will have to be after Wednesday for me. [I'm not clear on whether this particular Gaussian rabbit-hole has enough impact to be concerned about, but understanding it would be good for me.]
24) Message boards : Number crunching : Linux CUDA 'Special' App finally available, featuring Low CPU use (Message 1875138)
Posted 26 Jun 2017 by jason_gee
Post:
Correct, Best Pulse has nothing to do with Best Gaussian. (Those certainly can be separately influenced by Unroll, and if not yet corrected by Petri, will be a random and rare problem event.)

The fact that there is indeed a 'cluster' of different things going on is precisely why stock Win32 CPU is considered the reference (not OpenCL, not Cuda, not Linux, not AK, or anything else).
25) Message boards : Number crunching : Linux CUDA 'Special' App finally available, featuring Low CPU use (Message 1875115)
Posted 26 Jun 2017 by jason_gee
Post:
All the OpenCL Apps come from AKv8 as far as I know. That includes the Apps that don't use the SoG path and work, such as my r3567 and the Non-SoG r3584 Linux App.


Ugh, that's a lot of builds if [some] really are missing the fix, as it appears. [Probably Raistmer will have to identify which use codebases with the fix, as there are a lot of alternate codepaths there.]

[Edit:] Some paths appear to have their own implementation of something similar, some not.
26) Message boards : Number crunching : Linux CUDA 'Special' App finally available, featuring Low CPU use (Message 1875112)
Posted 26 Jun 2017 by jason_gee
Post:
Which branch is that MB8 derived from ? Stock seti_boinc master ? or AKv8 ? The difference may be important here.
[which one(s) differ to reference Windows/x86 8.00 may point in the right directions]
My OSX CPU App r3344 is from AKv8 and it has the same results as r3330.


Probably that fix is missing from the AK derived builds then [seems to be the case, and includes a SIGNALS_ON_GPU path in the same file (sah_v7_opt\AKv8\client\gaussfit.cpp)].
27) Message boards : Number crunching : Linux CUDA 'Special' App finally available, featuring Low CPU use (Message 1875111)
Posted 26 Jun 2017 by jason_gee
Post:
Hmmm, the AK8 branch *might* be missing Joe's fix from ~2011? (svn posted earlier in thread):

gaussfit.cpp (stock seti_boinc branch):
report = chisqOK // chisqOK is (ChiSq <= swi.analysis_cfg.gauss_chi_sq_thresh)
    && (gi.g.peak_power >= gi.g.mean_power * swi.analysis_cfg.gauss_peak_power_thresh)
    && (gi.g.null_chisqr >= swi.analysis_cfg.gauss_null_chi_sq_thresh);
if (gaussian_count==0||report) {
    gi.score = score_offset
        +lcgf(0.5*gauss_dof,std::max(gi.g.chisqr*0.5*gauss_bins,0.5*gauss_dof+1))
        -lcgf(0.5*null_dof,std::max(gi.g.null_chisqr*0.5*gauss_bins,0.5*null_dof+1));
}
// Only include "real" Gaussians (those meeting the chisqr threshold)
// in the best Gaussian display.
if (gi.score > best_gauss->score && chisqOK) {
    *best_gauss = gi;

....


The special appears to have it, as does Cuda baseline.
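
To make the effect of the final `&& chisqOK` condition in the excerpt concrete, here is a minimal sketch. It is not the actual gaussfit.cpp code: `Candidate`, `pick_best`, the flag `require_chisq_ok` and the starting value of the best score are all hypothetical names and simplifications for illustration, and the "ungated" behaviour is only an assumption about what a build without the fix might do.

```cpp
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical, simplified stand-in for one fitted Gaussian candidate.
struct Candidate {
    double score;   // "best of" score
    double chisqr;  // chi-square of the Gaussian fit
};

// Returns the index of the candidate that ends up as "best", or -1 if none does.
// require_chisq_ok = true  : update only when the score improves AND
//                            chisqr <= chisq_thresh, as in the stock excerpt.
// require_chisq_ok = false : assumed behaviour of a build without that gate.
int pick_best(const std::vector<Candidate>& cands, double chisq_thresh,
              bool require_chisq_ok) {
    int best = -1;
    double best_score = std::numeric_limits<double>::lowest();
    for (std::size_t i = 0; i < cands.size(); ++i) {
        const bool chisq_ok = cands[i].chisqr <= chisq_thresh;
        if (cands[i].score > best_score && (chisq_ok || !require_chisq_ok)) {
            best = static_cast<int>(i);
            best_score = cands[i].score;
        }
    }
    return best;
}
```

When the highest-scoring candidate fails the chi-square threshold, the two policies pick different "best" entries, which is the kind of mismatch a Best Gaussian comparison between builds would show.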
28) Message boards : Number crunching : Linux CUDA 'Special' App finally available, featuring Low CPU use (Message 1875107)
Posted 26 Jun 2017 by jason_gee
Post:
The only way to tell for sure is to run the task with your CPU and compare the results. You should give that a try, you can run a CPU task in the benchmark App while running BOINC. Just reduce the CPU usage by One in BOINC and remove any Apps from the APPS folder in the Benchmark package. The CPU App in the REF_APPS folder will search the WU folder and run any task it doesn't have results for. The Benchmark tool is here, KWSN Linux MB Bench v2.01.08. Extract the KWSN-Bench-Linux-MBv7_v2.01.08.7z to your Home folder and run it from there.
Okay, I tried running it with the Windows CPU app that I use here on my daily driver. It almost perfectly matches the v8.22 (opencl_ati_cat132) result.

Workunit 2567983999 (20oc08aa.4777.254820.12.39.5)
Task 5794100079 (S=10, A=3, P=0, T=0, G=0, BG=0) v8.22 (opencl_ati_cat132) windows_intelx86
Task 5829376759 (S=10, A=3, P=0, T=0, G=0, BG=0) x41p_zi3v, Cuda 8.00 special

v8.22 (opencl_ati_cat132) windows_intelx86 - Best pulse: peak=0.4685673, time=98.45, period=0.01441, d_freq=1420048834.69, score=0.9218, chirp=-61.928, fft_len=8
x41p_zi3v, Cuda 8.00 special - Best pulse: peak=0.3951461, time=68.92, period=0.0147, d_freq=1420052490.23, score=0.7774, chirp=0, fft_len=8.
MB8_win_x86_SSE3_VS2008_r3330 - Best pulse: peak=0.4685681, time=98.45, period=0.01441, d_freq=1420048834.69, score=0.9218, chirp=-61.928, fft_len=8


Are you able to cross compare that with Cuda Baseline? It'll narrow down where to look once I get to the special code.

[Edit:] Which branch is that MB8 derived from ? Stock seti_boinc master ? or AKv8 ? The difference may be important here.
[which one(s) differ to reference Windows/x86 8.00 may point in the right directions]
29) Message boards : Number crunching : Linux CUDA 'Special' App finally available, featuring Low CPU use (Message 1875076)
Posted 25 Jun 2017 by jason_gee
Post:
...All the reported signals and Best signals seem to match between the two.


If implemented as I picture: For the pulse mechanism shunt/workaround, the stderr.txt 'realtime' log might see the racing pulse detections, then shunt to unroll 1 to record the correct ones. If that's the case, it does reflect reality in the new 'racey-fixey' kind of way, but may need to be presented more clearly.
Ah, so perhaps the actual Result file would contain a different Best Pulse value than the Stderr shows?


*Possible*, but that's making a lot of assumptions. Naturally the result file is the important one. Probably prior assumptions about processing vs printing order become somewhat muddy when parallelism and reprocessing are involved, while stderr is sequential. Something that will no doubt have to be de-confused as we go along.
30) Message boards : Number crunching : Linux CUDA 'Special' App finally available, featuring Low CPU use (Message 1875057)
Posted 25 Jun 2017 by jason_gee
Post:
...All the reported signals and Best signals seem to match between the two.


If implemented as I picture: For the pulse mechanism shunt/workaround, the stderr.txt 'realtime' log might see the racing pulse detections, then shunt to unroll 1 to record the correct ones. If that's the case, it does reflect reality in the new 'racey-fixey' kind of way, but may need to be presented more clearly.
31) Message boards : Number crunching : Linux CUDA 'Special' App finally available, featuring Low CPU use (Message 1874806)
Posted 24 Jun 2017 by jason_gee
Post:
Have Verified Cuda baseline matches Joe Segur's Bugfix/changes to stock best scoring
Revision: 1146
Author: korpela
Date: Wednesday, 17 August 2011 7:41:35 AM
Message:
- Fix to bug introduced in last change of gaussfit.cpp. Much of the new code
is from the AK8 branch.
- Version number to 6.97

----
Modified : /branches/sah_v7/seti_boinc/client/gaussfit.cpp
...


Coming from Baseline, Petri's special should match that logic (to check). The original modification this fixes, by Joe Segur, contains comments by Raistmer, and appears to be from an AK commit by Raistmer (also committed via Eric) sometime before. I'm unfamiliar with the intent, as mentioned, though Joe's comments on the code seem reasonable.

// Gauss score used for "best of" and graphics.
// This score is now set to be based upon the probability that a signal
// would occur due to noise and the probability that it is shaped like
// a Gaussian (normalized to 0 at thresholds). Thanks to Tetsuji for
// making me think about this. The Gaussian has 62 degrees of freedom and
// the null hypothesis has 63 degrees of freedom when gauss_pot_length=64;
//JWS: Calculate invariant terms once, ala Alex Kan and Przemyslaw Zych

Probably each branch will need checking, in case a lack of this change propagated from AK sources into other builds.
32) Message boards : Number crunching : Linux CUDA 'Special' App finally available, featuring Low CPU use (Message 1874797)
Posted 24 Jun 2017 by jason_gee
Post:

...
One thing that I think I'm noticing is that when there is a reported Gaussian, that peak will match the Best Gaussian peak in SoG. However, in the other apps, the Best Gaussian will have a higher peak than the reported Gaussian. Perhaps there's some significance there. Or perhaps not. :^)
Hmmm, I am seeing svn commits after January on stock CPU multibeam, though nothing immediately stands out as affecting best/reportable policy. Will do some trawling of the codebases.

[Edit:] Superficially, it looks like certain incompatible changes to Gaussian best reporting were integrated circa 2011 by Eric from AK code, then 'bugfixed' a revision later. So it will need more digging, but there could be as many as 3 different best Gaussian reporting variants floating about. Will check the baseline and special CUDA variants against the 'bugfixed' variant logic. Ideally we'd want to match stock CPU logic, so if any number of applications require updates, that will probably have to happen. Probably the actual intent of the change will need to be looked at, as it may have holes in it. At the moment it appears as though the reportable Gaussian should not be used to update best if it isn't very Gaussian-ey, though the intent is unclear to me on the first pass (a possible red flag).
33) Message boards : Number crunching : Linux CUDA 'Special' App finally available, featuring Low CPU use (Message 1874681)
Posted 23 Jun 2017 by jason_gee
Post:
Does x41p_zi3v contain bugfix for <wrong Pulse selecting as reportable> issue?


I could be corrected, but I believe it contains the shunt/workaround I recommended to serialise the race condition, though Petri implemented it and I haven't had a chance to examine it. Word is that it worked though, so the validation characteristic should be more or less identical to older Cuda variants.
34) Message boards : Number crunching : Linux CUDA 'Special' App finally available, featuring Low CPU use (Message 1874436)
Posted 22 Jun 2017 by jason_gee
Post:
Ugh, well that's a starker demonstration than I intended, while engineering in the truth. We've lost people recently.
35) Message boards : Number crunching : Linux CUDA 'Special' App finally available, featuring Low CPU use (Message 1874382)
Posted 22 Jun 2017 by jason_gee
Post:
It could be as Jason theorized here; https://setiathome.berkeley.edu/forum_thread.php?id=80636&postid=1874104#1874104
Those seem like pretty low peaks to start with for best Gaussian. 1 in 300 with contention deep in the noise floor 'Feels' as though we're pushing technology limits (once again), but it will warrant more definite understanding either way. Plotting the PoT data from the results and visually comparing if they look anything alike might say something. My suspicion is they won't look very 'Gaussiany' at all. If so, pushing further into the noisefloor, while possible, may be fruitless. Eric's ruled out that we need double-precision or bit-Identical results below reportable thresholds (in the case of Gaussians, iirc score derived from the ChiSq Fit and null hypothesis).

Or it could be something else. I'm currently running the same WUs with the older OpenCL App MBv8r3567, which doesn't use the nVidia SoG path, and the newer MBv8r3602 from Lunatics. So far r3602 is batting .33% while r3567 is batting 1000%. Seems r3567 is a little slower though, but it is producing the correct Best Gaussians.
Interesting.
oops, r3602 just failed another one...

In any event, seeing as how it takes Hundreds of tasks to find one bad Best Gaussian it's well within the Project's Goal of less than 5% Inconclusive. 5% would allow 5 Inconclusives per 100 tasks.


Some history, just clarifying the origins of that 5% target. Back in ~v5 days, inconclusives were upwards of 20%. With ~v6 they were ~10% as GPU apps came in, and now they are <5% with v7.

That was due to a combination of stock CPU apps (then 32 bit only) using the x87 FPU only, which is 80-bit internally, and other KWSN and Alex Kan (Mac) builds using SIMD (MMX, SSE through SSSE3). With v6, Joe Segur injected KWSN and AK code via Lunatics into stock CPU. For v7 I performed several numerical analyses of the algorithms in Matlab, mostly while attempting to devise a GPU form of the autocorrelation in v7 (which didn't previously exist).

The Cuda numbers actually came out more accurate, due to the way certain sums were calculated, but differed enough that something needed to be done to make stock 64 bit and cross platform (e.g. the Android science app, which didn't exist yet either) more viable in terms of cross platform match, with less error growth as the workunit analysis parameters were widened and later GBT added. So with Eric's permission I changed some stock sums to block sums (which are similar to Cuda blocked and AKv8 SSEx striped summations), pulling the results about 3-6 decimal places closer together.
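
For anyone unfamiliar with why the order of summation matters in single precision, here is a minimal sketch contrasting a plain running sum with a blocked sum. It is an illustration only, not the stock or Cuda code: the function names and the block size of 256 are assumptions chosen for the example.

```cpp
#include <cstddef>
#include <vector>

// Plain left-to-right single-precision accumulation.
float sum_sequential(const std::vector<float>& x) {
    float s = 0.0f;
    for (float v : x) s += v;
    return s;
}

// Blocked accumulation: sum fixed-size blocks first, then add up the block totals.
// The block size is an illustrative assumption, not taken from any SETI@home source.
float sum_blocked(const std::vector<float>& x, std::size_t block = 256) {
    float total = 0.0f;
    for (std::size_t i = 0; i < x.size(); i += block) {
        const std::size_t end = (i + block < x.size()) ? i + block : x.size();
        float partial = 0.0f;
        for (std::size_t j = i; j < end; ++j) partial += x[j];
        total += partial;  // each addend is now comparable in size to the total
    }
    return total;
}
```

As a deliberately extreme case, summing 20,000,000 copies of 1.0f with the plain loop in strict IEEE single precision stalls at 16,777,216 (2^24), because adding 1 to that value no longer changes it, while the blocked version returns 20,000,000 exactly. Real signal data is far less pathological, but the same mechanism drives the error growth on long sums.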

Bearing in mind the platforms/devices all have different compilers, use different algorithms, and the vagaries inherent in floating point computation, the 5% is chosen as target (by me) because that's where Eric tends to set thresholds for the analysis, such that ~5% of results return overflow. That's where the analysis+(telescope)recording noise floor would be, so any better than 5% cross platform match we're pretty much digging into technological limitations outside our control (with existing Fourier method anyway).

At some point, I forget when, the improved cross platform matches and application reliability across the board allowed the workunit initial replication to be reduced from 3 to 2, reducing server load by a third.

On the basis of all that, if other various classes of builds/devices see much worse than 5% inconclusives, then they're not up to par, while at the same time Eric's assertions that additional precision shouldn't be needed suggest <5% is 'good enough' statistically. (Which I'm fine with, because much more tightening toward bit-exact cross platform matching would be extremely expensive in development time, money, and computation, and would lead to de-optimisation.)

I'm mostly writing this out, because at some point I'll need to add explanations to stock documentation, because the temptation to optimise out the stock summing refinements may be a trap for future developers [Saved for when I revisit the stock codebase, at some point Astropulse should undergo a similar analysis, though it's beyond my resources at present].
36) Message boards : Number crunching : Linux CUDA 'Special' App finally available, featuring Low CPU use (Message 1874209)
Posted 21 Jun 2017 by jason_gee
Post:
Oh yeah, no illusions there. Have prepared for probable extended downtime by stocking up on the mobile data. Switching over to a completely new infrastructure will have teething problems.
37) Message boards : Number crunching : Linux CUDA 'Special' App finally available, featuring Low CPU use (Message 1874189)
Posted 21 Jun 2017 by jason_gee
Post:
OK. Will be some (yet more, *sigh*) juggling here, as better broadband arrives tomorrow, a month sooner than expected. Teething problems are likely, though will factor in setting aside some hosting space for various and sundry as the dust settles. Hopefully things get less difficult as time goes on.
38) Message boards : Number crunching : Linux CUDA 'Special' App finally available, featuring Low CPU use (Message 1874138)
Posted 20 Jun 2017 by jason_gee
Post:
Oh well. It appears since moving to the new server I'm not allowed to upload. All I get is;
403
Forbidden
Access to this resource on the server is denied!
So....you'll have to wait.


Contact Arkayn, as he did message about the ftp server shifting a week or so ago. If there are still problems by the weekend, let me know and I'll put it on jgopt.org (if you email the binaries). Once proper broadband is installed here in a month or so, I'll be rehosting my own domain in my living room, so there will be juggling no matter which of those works, but you could share it on Google Drive or similar in .7z form if you're stuck.
39) Message boards : Number crunching : Linux CUDA 'Special' App finally available, featuring Low CPU use (Message 1874114)
Posted 20 Jun 2017 by jason_gee
Post:
Just a comment, I think this may deserve its own thread since it appears this is leaning towards an SoG issue more than CUDA.

Say What? It's about proving the CUDA App should be accepted to Beta as it agrees with the CPU Apps Better than the current 'standard'.
I just checked...it is My name on the thread.


It's going to be pretty important to discuss these things here, because the special is the new kid on the block, with the most changes in quite a while. If it turns out the SoG app needs some attention, then that's a good thing, because it solidifies knowledge all around. My unconfirmed suspicion is that the SoG app may still be using an OpenCL derivation of the single precision chirp I made for Pre-Fermi (CUDA), which was tailored for unique Pre-Fermi characteristics, namely that Pre-Fermi Cuda devices don't have IEEE-754 floating point compliance, and in Pre-GTX2xx cases no double precision at all, therefore it won't necessarily compile to the most accurate GPU code on Fermi or later devices. It was made specifically for G80 type devices. I switched to double precision chirp for newer devices many moons ago. So under the hood there is valuable history to take into account, and probably should be properly documented one day.
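
To illustrate why the precision of the chirp calculation matters, here is a minimal sketch using the textbook linear-chirp phase, 0.5 * chirp_rate * t^2, measured in turns. This is an assumption made for illustration and does not reproduce the Pre-Fermi chirp or any app's actual kernel; the function names are hypothetical.

```cpp
#include <cmath>

// Fractional part (in turns) of a linear chirp phase, 0.5 * rate * t^2,
// evaluated entirely in single precision.
float chirp_phase_turns_f32(float chirp_rate, float t) {
    const float phase = 0.5f * chirp_rate * t * t;
    return phase - std::floor(phase);
}

// The same expression evaluated entirely in double precision.
double chirp_phase_turns_f64(double chirp_rate, double t) {
    const double phase = 0.5 * chirp_rate * t * t;
    return phase - std::floor(phase);
}
```

Single precision carries a 24-bit significand, so once the accumulated phase lies between 262,144 and 524,288 turns, adjacent representable values are 1/32 of a turn apart and the fractional part can be no finer than that. Double precision leaves many more bits for the fraction at the same magnitude, which is consistent with cumulative error being worst at high chirp rates.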

As for 'Allowing on Beta', I'd concur the special needs to be run extensively under anonymous platform, so will be aiming for some trial builds ASAP. As previously mentioned, stock distribution is problematic because of Boinc server limitations more or less demanding a quite generally compatible app. There may be a driver version cutoff where Pre-Fermi or Fermi class devices lose support, though generalising to even Kepler class onwards will still require significant work.
40) Message boards : Number crunching : Linux CUDA 'Special' App finally available, featuring Low CPU use (Message 1874104)
Posted 20 Jun 2017 by jason_gee
Post:
Those seem like pretty low peaks to start with for best Gaussian. 1 in 300 with contention deep in the noise floor 'Feels' as though we're pushing technology limits (once again), but it will warrant more definite understanding either way. Plotting the PoT data from the results and visually comparing if they look anything alike might say something. My suspicion is they won't look very 'Gaussiany' at all. If so, pushing further into the noisefloor, while possible, may be fruitless. Eric's ruled out that we need double-precision or bit-Identical results below reportable thresholds (in the case of Gaussians, iirc score derived from the ChiSq Fit and null hypothesis).

[Edit:] That's also at a very high chirp rate near chirp limits, so one or another application struggling with that, especially 64-bit builds, wouldn't be surprising or unacceptable. Cumulative error will be at its greatest there.




 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.