Monitoring inconclusive GBT validations and harvesting data for testing

Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1823256 - Posted: 10 Oct 2016, 11:25:43 UTC - in response to Message 1823254.  
Last modified: 10 Oct 2016, 11:31:31 UTC

Wiki's Advanced Vector Extensions page says that for x86, FMA only became available with AVX2 - as your intel blog reply already told us.

But again, x86 is hardly what is used on the iGPU.

Hehe...
https://software.intel.com/en-us/forums/opencl/topic/277001
Unfortunately the offline compiler only displays CPU asm and we do not currently expose the graphics ISA from this tool, even though the ISA is available as part of the linux graphics documentation (http://intellinuxgraphics.org/documentation.html).


https://01.org/sites/default/files/documentation/intel-gfx-prm-osrc-skl-vol02a-commandreference-instructions.pdf
SETI apps news
We're not gonna fight them. We're gonna transcend them.
ID: 1823256 · Report as offensive
Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1823261 - Posted: 10 Oct 2016, 11:38:10 UTC - in response to Message 1823256.  
Last modified: 10 Oct 2016, 12:05:39 UTC

Multiply Add
mad - Multiply Add
Source: EuIsa
Length Bias: 4
The mad instruction takes component-wise multiplication of src1 and src2, adds the results with the
corresponding src0 values, and then stores the final results in dst.
The conditional modifier and saturation (.sat) must not be used when src1 or src2 are dwords.
Format:
[(pred)] mad[.cmod] (exec_size) dst src0 src1 src2
Restriction
No explicit accumulator access because this is a three-source instruction. AccWrEn is allowed for implicitly
updating the accumulator.
All three-source instructions have certain restrictions, described in Instruction Formats.


Multiply Accumulate
mac - Multiply Accumulate
Source: EuIsa
Length Bias: 4
The mac instruction takes component-wise multiplication of src0 and src1, adds the results with the
corresponding accumulator values, and then stores the final results in dst.
Format:
[(pred)] mac[.cmod] (exec_size) dst src0 src1
Programming Notes
When source and destination datatypes are different, the implied datatype for the accumulator operand is
always the destination datatype.
Restriction
Accumulator is an implicit source and thus cannot be an explicit source operand.
Syntax
[(pred)] mac[.cmod] (exec_size) reg reg reg
[(pred)] mac[.cmod] (exec_size) reg reg imm32
Pseudocode
Evaluate(WrEn);
for ( n = 0; n < exec_size; n++ ) {
    if ( WrEn.chan[n] ) {
        dst.chan[n] = src0.chan[n] * src1.chan[n] + acc0.chan[n];
    }
}


Multiply Add for Macro
madm - Multiply Add for Macro
Source: EuIsa
Length Bias: 4
The madm instruction takes component-wise multiplication of src1 and src2, adds the results with the
corresponding src0 values, and then stores the final results in dst. The source and destination operands have a
higher precision carried in the exponent for this operation. The madm instruction is used for macro operations,
where precision is accumulated over several instructions. This accumulation requires the exponent to increase
by 2 extra bits across multiple madm operations. Refer to Macros Defined in 'Math' Section for usage and
restrictions of this operation.
Format:
[(pred)] madm[.cmod] (exec_size) dst src0 src1 src2
Restriction
Accumulator access is restricted to the sp


As they say, the devil himself would break a leg in here :D

And it seems there is no FMA per se, BTW.

And the definition of MAD doesn't discuss any precision considerations. The pseudocode is simple: dst.chan[n] = src1.chan[n] * src2.chan[n] + src0.chan[n];
How rounding occurs is not specified.
Bravo, Intel's manual writers, decades of development were not in vain... :P

EDIT2: Well, my timeslice for the iGPU is finished. I found no discussion of iGPU precision. If someone finds one, please give the reference.
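
To illustrate why unspecified rounding matters, here is a minimal Python sketch that emulates float32 arithmetic with exact rationals and compares a multiply-add rounded once (FMA-style) with one rounded after the multiply and again after the add (MUL then ADD). The helper names are hypothetical, subnormals and overflow are ignored, and nothing here claims to model what Intel's mad actually does; which of the two behaviours (if either) the hardware follows is exactly what the manual leaves open.

```python
import math
from fractions import Fraction


def round_to_f32(q):
    """Round an exact rational to the nearest float32 value (ties to even).

    Sketch only: handles the normal range, ignores subnormals and overflow.
    """
    q = Fraction(q)
    if q == 0:
        return 0.0
    sign = -1.0 if q < 0 else 1.0
    q = abs(q)
    # Find e such that 2**e <= q < 2**(e + 1).
    e = q.numerator.bit_length() - q.denominator.bit_length()
    if Fraction(2) ** e > q:
        e -= 1
    # Scale so the 24-bit significand is an integer; round() on a Fraction
    # rounds half to even, matching the IEEE 754 default rounding mode.
    n = round(q / Fraction(2) ** (e - 23))
    return sign * math.ldexp(float(n), e - 23)


def mul_then_add(a, b, c):
    """a*b rounded to float32, then + c rounded again (two roundings)."""
    product = round_to_f32(Fraction(a) * Fraction(b))
    return round_to_f32(Fraction(product) + Fraction(c))


def fused(a, b, c):
    """a*b + c computed exactly and rounded to float32 once."""
    return round_to_f32(Fraction(a) * Fraction(b) + Fraction(c))


# Inputs chosen so that the exact product needs one bit more than float32 has.
a = b = 1.0 + 2.0 ** -12   # exactly representable in float32
c = -(1.0 + 2.0 ** -11)    # exactly representable in float32

print(mul_then_add(a, b, c))  # 0.0
print(fused(a, b, c))         # 2**-24, about 5.96e-08
```

With these inputs the two-rounding path returns 0 while the single-rounding path keeps the low-order term, so code built from many such operations can give different results depending on which form the hardware or compiler picks.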
SETI apps news
We're not gonna fight them. We're gonna transcend them.
ID: 1823261 · Report as offensive
MarkJ
Volunteer tester
Joined: 17 Feb 08
Posts: 1139
Credit: 80,854,192
RAC: 5
Australia
Message 1823266 - Posted: 10 Oct 2016, 12:11:39 UTC - in response to Message 1823248.  

So try the test iGPU build.

8 cooking on beta using the 8.19 app as we speak. It will take a couple of hours for them to be done. All are Arecibo at the moment, no sign of guppies.

Tasks
BOINC blog
ID: 1823266 · Report as offensive
Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1823268 - Posted: 10 Oct 2016, 12:21:01 UTC - in response to Message 1823266.  

So try the test iGPU build.

8 cooking on beta using the 8.19 app as we speak. It will take a couple of hours for them to be done. All are Arecibo at the moment, no sign of guppies.

Tasks

The beta app will produce inconclusives (but check this as a baseline).
Then check the test app: https://cloud.mail.ru/public/2aUP/dborYAw9G
SETI apps news
We're not gonna fight them. We're gonna transcend them.
ID: 1823268 · Report as offensive
Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1823269 - Posted: 10 Oct 2016, 12:22:42 UTC
Last modified: 10 Oct 2016, 12:24:41 UTC

And for reference: https://01.org/sites/default/files/documentation/intel-gfx-prm-osrc-hsw-commandreference-instructions_0_0.pdf
It seems Haswell has fewer MAD instructions (only the accumulate one).
SETI apps news
We're not gonna fight them. We're gonna transcend them.
ID: 1823269 · Report as offensive
jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1823271 - Posted: 10 Oct 2016, 12:24:40 UTC - in response to Message 1823242.  

...
I think I'm reading that as Intel saying that the 'mad' optimisation happens automatically in the newer compilers, without any option?
...


With CUDA code (for comparison only), the instructions resolve to either MADs, FMAs, or separate Adds and Muls depending on architecture, yielding different results, so there are some similarities in the situation. It isn't as sensitive though, because the CUFFT library is used instead of a self-compiled oclFFT.

The way around this in CUDA is relatively simple. Using the example
Answer_mul = float0 * float1;
Answer_add = Answer_mul + float2;

in sensitive places it gets wired by hand, either as the corresponding intrinsics, which generate explicit instructions, or as inline PTX assembly, which the compiler cannot optimise or change.

The OpenCL situation may be murkier, with its wider range of hardware. I would have expected Intel to have provided some math.h or similar in their SDK, with either intrinsic functions, an override switch for MADs, or vendor extensions with assembly... but it's not something I've looked at directly for the Intel case, due to not actively running such a GPU.
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 1823271 · Report as offensive
Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1823272 - Posted: 10 Oct 2016, 12:30:50 UTC - in response to Message 1823271.  

And that oclFFT (which we share with Einstein, btw) uses mad() in its code generator.
Well, I think it's time to look at Skylake results from the modded binary I posted.
SETI apps news
We're not gonna fight them. We're gonna transcend them.
ID: 1823272 · Report as offensive
Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1823274 - Posted: 10 Oct 2016, 12:42:04 UTC

Just for comparison with Intel, here is how the MAD description reads for the HD6900:
http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2012/10/AMD_HD_6900_Series_Instruction_Set_Architecture.pdf
Floating-Point Multiply-Add
Instruction: MULADD
Description: Floating-point multiply-add (MAD). Gives same results as ADD after MUL.
dst = src0 * src1 + src2;
Microcode
Format: ALU_WORD0 (page 9-23) and ALU_WORD1_OP3 (page 9-32).
Instruction Field: ALU_INST == OP3_INST_MULADD, opcode 20 (0x14).

SETI apps news
We're not gonna fight them. We're gonna transcend them.
ID: 1823274 · Report as offensive
Richard Haselgrove
Volunteer tester
Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1823296 - Posted: 10 Oct 2016, 15:01:17 UTC - in response to Message 1823243.  

I would prefer to find the original Intel thread about this issue.
I have already asked about that on Einstein's site.

Christian has replied
It was a direct mail exchange with an Intel developer where I got the explanation from.
ID: 1823296 · Report as offensive
Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1823341 - Posted: 10 Oct 2016, 17:17:45 UTC - in response to Message 1823296.  

I would prefer to find the original Intel thread about this issue.
I have already asked about that on Einstein's site.

Christian has replied
It was a direct mail exchange with an Intel developer where I got the explanation from.

So, awaiting results from the new build.
SETI apps news
We're not gonna fight them. We're gonna transcend them.
ID: 1823341 · Report as offensive
Juha
Volunteer tester
Joined: 7 Mar 04
Posts: 388
Credit: 1,857,738
RAC: 0
Finland
Message 1823415 - Posted: 10 Oct 2016, 21:20:02 UTC - in response to Message 1823246.  

https://www.khronos.org/registry/cl/sdk/1.1/docs/man/xhtml/mad.html
https://www.khronos.org/registry/cl/sdk/1.1/docs/man/xhtml/fma.html


So, the OpenCL specification does indeed distinguish between these instructions.
MAD and FMA are different.
BUT(!) they must be specified directly in code to be used; a*b+c will not be replaced automatically.


There's another post in the Einstein forums, "Intel GPU brp app returns incorrect results with beignet 1.2 drivers".

In Beignet 1.2, FP_CONTRACT was switched to ON and the code generated for x*y+z was changed from MUL+ADD to MAD. (commit)

If I'm reading the FP_CONTRACT documentation correctly, it seems that implementations are supposed to use fused instructions unless told otherwise. What it doesn't say is whether FMA or MAD should be used, but since there's the -cl-mad-enable compiler option, I suppose FMA should be used.

I can imagine the Windows drivers made a similar change earlier.
ID: 1823415 · Report as offensive
Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1823428 - Posted: 10 Oct 2016, 22:32:51 UTC - in response to Message 1823415.  
Last modified: 10 Oct 2016, 22:42:58 UTC

I'm not sure that contracting expressions means using mad/fma.
It's a more general term than just substituting MAD for a*b+c.
AMD docs directly state that such a replacement will not occur.

But (as usual) it's worth a try.
When I get feedback from the already provided build, I could try disabling this pragma too.

Also, it would be interesting to add printing of FMA status to CLinfo.
What will we see for different devices/platforms?...
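
A CLinfo-style report of FMA status could be built on the CL_DEVICE_SINGLE_FP_CONFIG bitfield returned by clGetDeviceInfo. As a minimal sketch, the Python snippet below only decodes such a bitfield; the bit values follow the cl_device_fp_config definitions in the OpenCL headers, while the function name and the sample values are made up for illustration and do not come from any real device.

```python
# Bit values of cl_device_fp_config as defined in the OpenCL headers (CL/cl.h).
FP_CONFIG_BITS = {
    "CL_FP_DENORM": 1 << 0,
    "CL_FP_INF_NAN": 1 << 1,
    "CL_FP_ROUND_TO_NEAREST": 1 << 2,
    "CL_FP_ROUND_TO_ZERO": 1 << 3,
    "CL_FP_ROUND_TO_INF": 1 << 4,
    "CL_FP_FMA": 1 << 5,
    "CL_FP_SOFT_FLOAT": 1 << 6,
    "CL_FP_CORRECTLY_ROUNDED_DIVIDE_SQRT": 1 << 7,
}


def describe_fp_config(bits):
    """Return the names of all capability flags set in a cl_device_fp_config value."""
    return [name for name, mask in FP_CONFIG_BITS.items() if bits & mask]


# Hypothetical bitfields, NOT read from any real device.
sample_with_fma = 0x3F      # bits 0..5 set, including CL_FP_FMA
sample_without_fma = 0x06   # only INF_NAN and ROUND_TO_NEAREST

for sample in (sample_with_fma, sample_without_fma):
    flags = describe_fp_config(sample)
    has_fma = "yes" if "CL_FP_FMA" in flags else "no"
    print(hex(sample), "FMA:", has_fma, flags)
```

Note that CL_FP_FMA only says whether fma() is supported as an IEEE 754-2008 fused operation; it does not say how mad() or a contracted a*b+c is rounded on that device.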

EDIT: from the committed code it looks like they now map mad to hardware mad instead of emulating it via mul and add.
That still doesn't imply silent replacement of a*b+c with mad(a,b,c), but it will change the behavior of mad(a,b,c) calls, and as I said earlier, oclFFT uses mad heavily.

The question to Intel is: why is their mad so imprecise versus the 2 other vendors??
(BTW, iGPU imprecision in native trigonometry was demonstrated by Einstein's team before, in oclFFT.)
SETI apps news
We're not gonna fight them. We're gonna transcend them.
ID: 1823428 · Report as offensive
jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1823482 - Posted: 11 Oct 2016, 3:35:21 UTC - in response to Message 1823428.  
Last modified: 11 Oct 2016, 3:39:29 UTC

The question to Intel is: why is their mad so imprecise versus the 2 other vendors?
It can be as simple as a float implemented as a double half-float hardware emulation sequence, or some other shortcut. Maybe they even use something like x87 80-bit intermediate registers underneath, or blocks of Pentium circuits with FDIV bugs (j/k)

That aside, using FMAs etc. changes algorithms and error growth, so you'll see different codelets even in FFTW CPU sources to compensate. We don't completely escape problems on CUDA either, especially with pre-compute-capability-1.3 devices having neither doubles nor IEEE 754 compliance, and FMA not coming until much later. We escape a lot though, because of CUFFT's hard-wired paths, and we use a fair whack of intrinsics already (more assembly gradually).
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 1823482 · Report as offensive
MarkJ
Volunteer tester
Joined: 17 Feb 08
Posts: 1139
Credit: 80,854,192
RAC: 5
Australia
Message 1823524 - Posted: 11 Oct 2016, 11:47:36 UTC

When I get feedback from the already provided build, I could try disabling this pragma too.

Sorry for the delay. Work intervened.

Another set is running using the new app. Same hosts as before. I also snagged some guppies this time.
BOINC blog
ID: 1823524 · Report as offensive
MarkJ
Volunteer tester
Joined: 17 Feb 08
Posts: 1139
Credit: 80,854,192
RAC: 5
Australia
Message 1823698 - Posted: 12 Oct 2016, 11:13:49 UTC - in response to Message 1823524.  

When I get feedback from the already provided build, I could try disabling this pragma too.

Sorry for the delay. Work intervened.

Another set is running using the new app. Same hosts as before. I also snagged some guppies this time.

Looking through the results, it would seem the v8.19 tasks supplied by beta validate most of the time. The r3525 ones are almost all inconclusive or invalid.
BOINC blog
ID: 1823698 · Report as offensive
Juha
Volunteer tester
Joined: 7 Mar 04
Posts: 388
Credit: 1,857,738
RAC: 0
Finland
Message 1823790 - Posted: 12 Oct 2016, 19:40:23 UTC - in response to Message 1823428.  

EDIT: from the committed code it looks like they now map mad to hardware mad instead of emulating it via mul and add.
That still doesn't imply silent replacement of a*b+c with mad(a,b,c), but it will change the behavior of mad(a,b,c) calls, and as I said earlier, oclFFT uses mad heavily.


I could be mistaken, but I think that is really what it now does. LLVM uses fmuladd to let the code generator decide between mul+add and fma. See llvm.fmuladd.
ID: 1823790 · Report as offensive
Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1823795 - Posted: 12 Oct 2016, 20:04:17 UTC - in response to Message 1823790.  

EDIT: from the committed code it looks like they now map mad to hardware mad instead of emulating it via mul and add.
That still doesn't imply silent replacement of a*b+c with mad(a,b,c), but it will change the behavior of mad(a,b,c) calls, and as I said earlier, oclFFT uses mad heavily.


I could be mistaken, but I think that is really what it now does. LLVM uses fmuladd to let the code generator decide between mul+add and fma. See llvm.fmuladd.

Thanks, perhaps you are right.
That means a detailed definition of the precision behavior is required for iGPU MAD/MAC/"macro MAD".

I'll try to disable the corresponding macro in the code. We'll see if it helps.
SETI apps news
We're not gonna fight them. We're gonna transcend them.
ID: 1823795 · Report as offensive
Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1824740 - Posted: 16 Oct 2016, 19:49:42 UTC - in response to Message 1823795.  

New build: https://cloud.mail.ru/public/EbPU/q7ZKhRnYV
More details on beta: https://setiweb.ssl.berkeley.edu/beta//forum_thread.php?id=2266&postid=59828
SETI apps news
We're not gonna fight them. We're gonna transcend them.
ID: 1824740 · Report as offensive
Jeff Buck
Volunteer tester
Joined: 11 Feb 00
Posts: 1441
Credit: 148,764,870
RAC: 0
United States
Message 1824975 - Posted: 17 Oct 2016, 18:27:20 UTC

FWIW, I just had an overflow task running SoG r3528 get ganged up on by a pair of x41p_zi3j Petri Specials. It was really an extreme case where my host found 30 Pulses while the two Special hosts found 30 Triplets. The WU is 2295032503, although it's now too late to grab the file since I didn't spot the Inconclusive before the second Special host reported.
ID: 1824975 · Report as offensive
petri33
Volunteer tester
Joined: 6 Jun 02
Posts: 1668
Credit: 623,086,772
RAC: 156
Finland
Message 1825013 - Posted: 17 Oct 2016, 21:10:30 UTC - in response to Message 1824975.  
Last modified: 17 Oct 2016, 21:12:10 UTC

FWIW, I just had an overflow task running SoG r3528 get ganged up on by a pair of x41p_zi3j Petri Specials. It was really an extreme case where my host found 30 Pulses while the two Special hosts found 30 Triplets. The WU is 2295032503, although it's now too late to grab the file since I didn't spot the Inconclusive before the second Special host reported.


Someone else could say this: I think it is a 'bad' packet having noisy data. This time it was reported as 'bad' by a different version of software looking into something else before looking into something different but still something 'broken'.

EDIT: and each time it could still be something, although probably noise.
To overcome Heisenbergs:
"You can't always get what you want / but if you try sometimes you just might find / you get what you need." -- Rolling Stones
ID: 1825013 · Report as offensive

 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.