Posts by Raistmer


1) Message boards : Number crunching : SETI@home v8.12 Windows GPU applications support thread (Message 1820770)
Posted 8 hours ago by Profile Raistmer
Ah, indeed. The app checks that value and will not try to acquire more than the runtime could give.
So, this GPU can't use more than 128 MB as a single allocation. But it has 512 MB total, so it can allocate enough separate buffers for the LoM path.
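As a hedged illustration (the function name and sizes are hypothetical, not taken from the actual app), the logic of working around a per-allocation cap by splitting into multiple buffers amounts to:

```python
def plan_buffers(total_needed_mb, max_alloc_mb):
    """Split a large working-set request into separate buffers, none
    exceeding the runtime's per-allocation limit (what OpenCL reports
    as CL_DEVICE_MAX_MEM_ALLOC_SIZE)."""
    buffers = []
    remaining = total_needed_mb
    while remaining > 0:
        chunk = min(remaining, max_alloc_mb)  # never exceed the cap
        buffers.append(chunk)
        remaining -= chunk
    return buffers

# A device with 512 MB total but a 128 MB single-allocation cap
# can still hold a 300 MB working set as three separate buffers:
print(plan_buffers(300, 128))  # [128, 128, 44]
```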
2) Message boards : Number crunching : SETI@home v8.12 Windows GPU applications support thread (Message 1820565)
Posted 22 hours ago by Profile Raistmer
So it just can't allow more. And the older revision, the non-SoG one?
3) Message boards : Number crunching : Monitoring inconclusive GBT validations and harvesting data for testing (Message 1820500)
Posted 1 day ago by Profile Raistmer
If you all find a card or driver that doesn't work, then it's up to the manufacturer to patch their shit so that it conforms 100% to the IEEE 754 standard.

Hm... looks like you don't read these forums frequently. Else you would know how long that "patch their shit" list currently is, even without any precision compliance.

To elaborate a little more on this:
http://setiathome.berkeley.edu/forum_thread.php?id=80247&postid=1820339
Recently, testing new builds with Mike and his GPU, we discovered that the last build stopped producing inconclusives when run in multiple instances... But then I looked inside stderr and found obviously wrong numbers in the profiling counters the app now prints. They work OK in single-instance mode and OK in multiple-instance mode in the other configs I tested (where multiple instances were allowed before too). So I think it's direct evidence that driver GPU context switching is simply bugged for that whole AMD GPU family on Windows! And we're talking about rare borderline inconclusives, Q99 instead of Q100, here...
4) Message boards : Number crunching : Monitoring inconclusive GBT validations and harvesting data for testing (Message 1820492)
Posted 1 day ago by Profile Raistmer
I'd actually not call that validated at all, but we're stuck with a binary choice in the status column.

And this "feature" really hides issues, making build validation and debugging harder (though it allows receiving those damned credits, of course...)
5) Message boards : Number crunching : Monitoring inconclusive GBT validations and harvesting data for testing (Message 1820489)
Posted 1 day ago by Profile Raistmer

Is anyone currently seeing systemic validator rejections because of poor precision of the correct signal, rather than selection of the wrong signal to report?

Yes, on some iGPUs, as you know from beta.
6) Message boards : Number crunching : Monitoring inconclusive GBT validations and harvesting data for testing (Message 1820485)
Posted 1 day ago by Profile Raistmer

EDIT: All of the above is about getting the code as close to the Q100 mark as possible on whatever platform/combination, but perhaps as a second step. As we've noticed, the thing I mention now has nothing to do with the main topic of this thread, inconclusive validations. That is another thing, of course, that actually needs to be fixed at another level, because I'm sure that each and every one of those applications, if compared on all signals found (30+), would get Q99+, so they most certainly would validate.


Actually, conforming to the IEEE 754 standard in rounding will not result in a Q100 mark either.

The standard just describes how rounding is done; it can't prevent precision loss in cases like this, for example:
(A+B)+C versus A+(B+C), where A is a big number and B and C are much smaller ones.
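A minimal demonstration of the point: each individual addition below is correctly rounded per IEEE 754, yet the two groupings still produce different results, because floating-point addition is not associative.

```python
# With A much larger than B and C, (A + B) + C loses the small terms one at
# a time, while A + (B + C) lets them accumulate before meeting A.
A = 1e16
B = 1.0
C = 1.0

left  = (A + B) + C   # each 1.0 is below half an ulp of A and is rounded away
right = A + (B + C)   # B + C = 2.0 is exactly one ulp of A, so it survives

print(left == right)  # False
print(left, right)
```

Both results are "standard-conforming"; only the evaluation order differs, which is exactly why bit-identical output across platforms can't be expected from rounding rules alone.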
7) Message boards : Number crunching : Monitoring inconclusive GBT validations and harvesting data for testing (Message 1820483)
Posted 1 day ago by Profile Raistmer
Now one of them validated them all!

http://setiathome.berkeley.edu/workunit.php?wuid=2276193382

As it should be with the current validator in most cases.
Nothing really interesting here. But to reduce the inefficiency of re-processing, the validator should be changed. As I said earlier, this topic is in discussion with Eric.
8) Message boards : Number crunching : Monitoring inconclusive GBT validations and harvesting data for testing (Message 1820482)
Posted 1 day ago by Profile Raistmer
If you all find a card or driver that doesn't work, then it's up to the manufacturer to patch their shit so that it conforms 100% to the IEEE 754 standard.

Hm... looks like you don't read these forums frequently. Else you would know how long that "patch their shit" list currently is, even without any precision compliance.

Well, currently we have two platforms with real precision issues:
OpenCL NV + modern versions of OS X;
OpenCL Intel + some devices and drivers (still not known exactly which).

OS X is out of my scope, but I could do some experiments with iGPU builds regarding the /fp:* switches.
I think it's better to produce experimental proof than to have plain discussions.
9) Message boards : Number crunching : Monitoring inconclusive GBT validations and harvesting data for testing (Message 1820478)
Posted 1 day ago by Profile Raistmer
There is no connection between fp:strict, or whatever precision switch there may be, and reporting a subset of results on overflow.


Aren't you all using fp:precise (single)? I just wanted to ask what happens if the app is compiled and tested with fp:strict (single) instead! What is the speed penalty of going from precise (single) to strict (single)?

No, all my builds use /fp:fast, for example, as has always been the case with AKv8 derivatives, AFAIK.
Out of interest I could provide you with builds for comparison.
A recently found inefficiency in CPU pulsefinding makes a CPU app rebuild worthwhile.


The purpose in my mind was an overall platform standard that should follow IEEE 754 regardless of CPU (x32, x64, ARM) or GPU.

This rules out IEEE 754-incompatible devices without any real need to do so.


The idea of me telling you to test in that direction is mainly for you all to switch more to code optimizing instead of bug-hunting various platforms until hell freezes over. It will only increase, as I say, not decrease.

Unjustified idealization here. Most bug hunting (except for our own bugs, of course) comes from non-complying runtimes. If the runtime doesn't comply with the standard, strictly following the standard will not help; it just makes debugging even more obscure.
10) Message boards : Number crunching : Monitoring inconclusive GBT validations and harvesting data for testing (Message 1820475)
Posted 1 day ago by Profile Raistmer

Well, what can I say: the opposite was true for the CUDA builds, and I guess the host code is different. It was Richard who brought the flaky Gaussians to my attention; host fp:precise fixed them against 8.00, and no repeatable dissimilarity to 8.00 CPU has been reported to me since.

Yes, the host code is different indeed.
So I'll reformulate regarding fp:precise: it's not a universal solution for precision-related issues.
And of course it's no solution at all for the different-ordering issue on overflows (as already stated in this thread).
11) Message boards : Number crunching : Monitoring inconclusive GBT validations and harvesting data for testing (Message 1820474)
Posted 1 day ago by Profile Raistmer

Now, if we go back to the idea of fp:strict (IEEE 754) versus anything else (double precision, etc.): have any of you an idea of the speed penalty of going strict instead of precise, or of double precision? If going to fp:strict single precision (more isn't needed, apparently) is a few percent slower, then so be it for the sake of conformity! But if it's half the speed, then no, that is not the route to go for now; focus instead on the validator/reorder-of-work-reported issue that seems to be apparent.

There is no connection between fp:strict, or whatever precision switch there may be, and reporting a subset of results on overflow.
This was mentioned in this thread before. Increasing precision is not a solution for overflow tasks.
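A hypothetical illustration of why precision switches can't fix the overflow case (the 30-signal limit is from the thread; the traversal orders are invented for the sketch): when a task "overflows", only the first 30 signals found are reported, so two apps with bit-identical arithmetic still report different subsets if they search in different orders.

```python
LIMIT = 30
signals = list(range(100))   # stand-ins for 100 equally valid signals

serial_order   = signals                                      # e.g. time-ordered scan
parallel_order = sorted(signals, key=lambda s: (s % 10, s))   # some other traversal

serial_report   = set(serial_order[:LIMIT])    # first 30 met serially
parallel_report = set(parallel_order[:LIMIT])  # first 30 met in parallel order

# Same correct signals, same precision, different reported subset:
print(serial_report == parallel_report)  # False
```

No /fp switch changes either subset; only making the reporting rule order-independent (a validator/app-logic change, as discussed with Eric) can.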
12) Message boards : Number crunching : Monitoring inconclusive GBT validations and harvesting data for testing (Message 1820473)
Posted 1 day ago by Profile Raistmer
I hope the people contributing to this discussion are also reading the Beta forum. Recently (Beta message 59698)

Raistmer wrote:
EDIT2: recently I looked into the pulse signal selection algorithm. And it appears to resemble the AstroPulse one more than I thought. It contains the same PoT signal replacement too. That is, if another, stronger signal is found inside the same PoT but on another fold level (another period), the old one is replaced by the new one. The old one is not reported. It's one of the possible places where a bug with such a manifestation could hide.

That does indeed suggest that we ought to pay some attention (if we don't already) to where the '30 signal' breakpoint is invoked in both serial and parallel cases - and ensure they are compatible. Running on to the end of the current PoT - enabling replacement - before breaking and reporting would seem to be wise in both cases.

This means that the separate periods for the same PoT should be processed in full, and the best single reportable signal should be chosen from all the results.
This part will never result in overflow; it's about missing the correct reportable pulse (as Petri's build demonstrated for the beta task in discussion).
Such bugs should be fixed, of course, because they result in wrong signal reports for all types of tasks.
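A hedged sketch of the selection rule described above (function name, data shapes, and the use of SNR as the strength metric are all assumptions for illustration): within one PoT, every fold level (period) is searched, and only the single strongest reportable pulse among all of them is kept; a stronger signal found at another period replaces the earlier one, which is never reported.

```python
def best_pulse_in_pot(results_by_period):
    """results_by_period: dict mapping period -> list of (power, snr) pulses.
    Returns the single reportable pulse: (period, power, snr)."""
    best = None
    for period, pulses in results_by_period.items():
        for power, snr in pulses:
            if best is None or snr > best[2]:
                best = (period, power, snr)  # replacement: the old one is dropped
    return best

pot = {
    16: [(10.5, 1.2)],
    32: [(11.0, 1.9), (9.8, 0.7)],   # strongest SNR in the whole PoT
    64: [(12.3, 1.5)],
}
print(best_pulse_in_pot(pot))  # (32, 11.0, 1.9)
```

The key property is that the result is independent of the order in which periods are visited, which is why this path can never overflow but can silently drop a correct pulse if the replacement logic is buggy.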
13) Message boards : Number crunching : Monitoring inconclusive GBT validations and harvesting data for testing (Message 1820471)
Posted 1 day ago by Profile Raistmer
I think we would need to find a way of reporting the 'serial first 30' signals from a parallel application. That might involve choosing an intermediate point - 30%? 50%? - after which the parallel app would continue to the end, find all signals, and sort out the reportable ones. All of which is much easier to suggest than to implement...

Strongly disagree. The effort is worthless (and very resource-costly to implement, by my current estimates).
14) Message boards : Number crunching : Monitoring inconclusive GBT validations and harvesting data for testing (Message 1820469)
Posted 1 day ago by Profile Raistmer

But to solve this by doing a "find all and sort them afterwards" would mean that every task would have to run to full term, and we'd lose the efficiency of quitting early after 10 seconds or so for the really noisy WUs.


Well, if we lose the efficiency of quitting early, why should the validator even "validate" -9 work, when the server code could just see "Ohh geez, this is an overflow result! Thanks! Here are your credits!" when compared to other -9s?

If the device sends a -9 result back but the other application sees this as a real result then you should be awarded zero credits anyway.

Modifications of the validator for overflows are currently in discussion with Berkeley's team (see the beta forums, for example).
15) Message boards : Number crunching : Monitoring inconclusive GBT validations and harvesting data for testing (Message 1820467)
Posted 1 day ago by Profile Raistmer

regarding fp:precise usage for CUDA :
in host or device code?


Host only.

With MSVC builds (both CPU and OpenCL GPU), specifying /fp:precise for host code leads to differences in results versus stock. It's the topic of the initial 7.99/8.0 deployment precision issue, where stock disagreed between its own Linux/Windows x64/x86 builds. It was discussed in detail in e-mail conversations at the time.
16) Message boards : Number crunching : Monitoring inconclusive GBT validations and harvesting data for testing (Message 1820444)
Posted 1 day ago by Profile Raistmer
Regarding fp:strict: before going further, estimate the performance penalty of enabling this option.
Don't forget that doing double-precision math generally gives more precision. Doing arbitrary-precision calculations is possible too... it just doesn't suit our needs.
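The double-precision point can be illustrated with a small accumulation experiment (the numbers are invented for the sketch; single precision is emulated here via `struct` round-tripping, since Python floats are doubles):

```python
import struct

def to_f32(x):
    """Round a Python float (double) to the nearest IEEE 754 single."""
    return struct.unpack('f', struct.pack('f', x))[0]

# Summing many small terms into a large accumulator: in single precision
# each 0.5 falls below half an ulp of the accumulator and vanishes; in
# double precision every term is retained exactly.
acc32 = to_f32(1e8)
acc64 = 1e8
for _ in range(1000):
    acc32 = to_f32(acc32 + 0.5)
    acc64 = acc64 + 0.5

print(acc32)   # stays at 100000000.0: every 0.5 was rounded away
print(acc64)   # 100000500.0
```

So widening the accumulator changes results far more than any rounding-mode switch would, which is why the precision discussion and the fp:* discussion are largely separate questions.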
regarding fp:precise usage for CUDA :
in host or device code?
17) Message boards : Number crunching : Monitoring inconclusive GBT validations and harvesting data for testing (Message 1820412)
Posted 1 day ago by Profile Raistmer
fp:precise leads to inconclusive results versus stock.
Better to forget about fp:precise completely.
It's a non-portable CPU-specific feature.
18) Message boards : Number crunching : SETI@home v8.12 Windows GPU applications support thread (Message 1820410)
Posted 1 day ago by Profile Raistmer
Yes.
And well, it means the app considered there to be enough memory to choose that path.
BTW, try an experiment to see if selection works properly: increase the -sbs value - will you see a switch to the non-LoM path?
19) Message boards : Number crunching : SETI@home v8.12 Windows GPU applications support thread (Message 1820396)
Posted 1 day ago by Profile Raistmer
r3528 is SoG while r3500 isn't.
The "lot of mem" path has meaning only for SoG.



. . Now I am confused. If r3500 is not SoG what is it??

Stephen

.

It's the non-SoG AMD build:

Build features: SETI8 Non-graphics OpenCL USE_OPENCL_HD5xxx OCL_ZERO_COPY OCL_CHIRP3 FFTW AMD specific USE_SSE2 x86
CPUID: AMD Athlon(tm) II X3 455 Processor

Cache: L1=64K L2=512K

CPU features: FPU TSC PAE CMPXCHG8B APIC SYSENTER MTRR CMOV/CCMP MMX FXSAVE/FXRSTOR SSE SSE2 HT SSE3 SSE4A
OpenCL-kernels filename : MultiBeam_Kernels_r3500.cl
ar=0.429339 NumCfft=195489 NumGauss=1101398310 NumPulse=226362612175 NumTriplet=452728333399
Currently allocated 209 MB for GPU buffers
In v_BaseLineSmooth: NumDataPoints=1048576, BoxCarLength=8192, NumPointsInChunk=32768

Windows optimized setiathome_v8 application
Based on Intel, Core 2-optimized v8-nographics V5.13 by Alex Kan
SSE2xj Win32 Build 3500 , Ported by : Raistmer, JDWhale

SETI8 update by Raistmer

OpenCL version by Raistmer, r3500

AMD HD5 version by Raistmer
20) Message boards : Number crunching : Monitoring inconclusive GBT validations and harvesting data for testing (Message 1820391)
Posted 1 day ago by Profile Raistmer

Task 5164467009 (S=10, A=0, P=19, T=1, G=0) SSE3xj Win32 Build 3500

Wrong marking. It's not an SSE3 CPU build; it's OpenCL NV too.



Copyright © 2016 University of California