Which result is correct, CPU, anonymous CUDA, or v6.09 (cuda23)

Profile Fred J. Verster
Volunteer tester
Joined: 21 Apr 04
Posts: 3252
Credit: 31,903,643
RAC: 0
Netherlands
Message 1093599 - Posted: 5 Apr 2011, 13:22:05 UTC
Last modified: 5 Apr 2011, 13:25:22 UTC

WU
Result .

Which is the correct one: the CPU result with 1 Gaussian, or the 2 GPU results, which didn't find anything?
Profile Donald L. Johnson
Joined: 5 Aug 02
Posts: 8240
Credit: 14,654,533
RAC: 20
United States
Message 1093618 - Posted: 5 Apr 2011, 14:58:27 UTC - in response to Message 1093599.  
Last modified: 5 Apr 2011, 15:02:19 UTC

WU
Result .

Which is the correct one: the CPU result with 1 Gaussian, or the 2 GPU results, which didn't find anything?

Whether or not that is the "correct" result, the canonical result, which will go into the science database, was awarded to the cuda23 machine. But the CPU result was sufficiently similar to the two GPU results for credit to be granted to all three.
Donald
Infernal Optimist / Submariner, retired
Profile Wiggo
Joined: 24 Jan 00
Posts: 34822
Credit: 261,360,520
RAC: 489
Australia
Message 1093624 - Posted: 5 Apr 2011, 15:03:52 UTC - in response to Message 1093599.  

Personally I feel that the CUDA ones are, as the CPU had done several restarts in its log, but this could be factored in (who knows these days).

Cheers.
Profile Donald L. Johnson
Joined: 5 Aug 02
Posts: 8240
Credit: 14,654,533
RAC: 20
United States
Message 1093630 - Posted: 5 Apr 2011, 15:12:27 UTC - in response to Message 1093624.  

Personally I feel that the CUDA ones are, as the CPU had done several restarts in its log, but this could be factored in (who knows these days).

Cheers.

Yes, it's difficult to say for certain, since the stderr output is only a summary of the actual result, but since Fred's Lunatics Opt. App result seems to match the canonical result, those two are most likely right.
Donald
Infernal Optimist / Submariner, retired
Profile jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1093795 - Posted: 6 Apr 2011, 4:14:36 UTC - in response to Message 1093630.  
Last modified: 6 Apr 2011, 4:28:09 UTC

Personally I feel that the CUDA ones are, as the CPU had done several restarts in its log, but this could be factored in (who knows these days).

Cheers.

Yes, it's difficult to say for certain, since the stderr output is only a summary of the actual result, but since Fred's Lunatics Opt. App result seems to match the canonical result, those two are most likely right.


I have been examining precision/validation characteristics lately in quite some detail, as a means of establishing what types and levels of error are acceptable, normal or expected across different platform/application/architecture combinations during optimisation. That's to help avoid unintentionally making inconclusives more common while making fairly radical structural changes inside applications.

What we actually have here is an example of a combination of architecture differences, 'normal' subtle floating point arithmetic variation, and original algorithm/code limitations.

In cases like this the validation process is designed to be quite tolerant of these more subtle forms of error. For me personally that doesn't lessen the desire to refine things in such a way that inconclusives are minimised where feasible. From an efficiency standpoint, I do consider that part of optimisation to some extent.
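To make the idea of a tolerant validator concrete, here is a rough Python sketch. It is not the project's validator: the `Signal` fields, the 1% relative tolerance, the "at least half the signals match" rule and the sample data are all assumptions made up for illustration. It only shows how a result with one extra near-threshold signal can still be similar enough to earn credit without being an exact match.

```python
from typing import NamedTuple

class Signal(NamedTuple):
    kind: str         # e.g. "spike" or "gaussian"
    power: float
    frequency: float  # Hz

REL_TOL = 0.01  # assumed 1% relative tolerance, illustrative only

def close(a: float, b: float, rel_tol: float = REL_TOL) -> bool:
    """True if a and b agree within the relative tolerance."""
    return abs(a - b) <= rel_tol * max(abs(a), abs(b))

def signals_match(s: Signal, t: Signal) -> bool:
    """Same kind of signal, with power and frequency within tolerance."""
    return (s.kind == t.kind
            and close(s.power, t.power)
            and close(s.frequency, t.frequency))

def matched_fraction(result, other) -> float:
    """Fraction of signals in `result` that have a counterpart in `other`."""
    if not result:
        return 1.0
    hits = sum(any(signals_match(s, t) for t in other) for s in result)
    return hits / len(result)

def strongly_similar(a, b) -> bool:
    """Hypothetical rule: every signal on each side has a counterpart."""
    return matched_fraction(a, b) == 1.0 and matched_fraction(b, a) == 1.0

def weakly_similar(a, b) -> bool:
    """Hypothetical rule: at least half the signals on each side match."""
    return matched_fraction(a, b) >= 0.5 and matched_fraction(b, a) >= 0.5

# Invented results: two GPU hosts agree, the CPU host reports one extra Gaussian.
gpu_a = [Signal("spike", 24.10, 1420.001e6), Signal("spike", 25.30, 1420.002e6)]
gpu_b = [Signal("spike", 24.11, 1420.001e6), Signal("spike", 25.29, 1420.002e6)]
cpu = gpu_a + [Signal("gaussian", 3.21, 1420.003e6)]

print(strongly_similar(gpu_a, gpu_b))  # the two GPU results agree fully
print(strongly_similar(cpu, gpu_b))    # the extra Gaussian breaks full agreement
print(weakly_similar(cpu, gpu_b))      # but most signals still match
```

Under these assumed rules the two GPU results agree fully, while the CPU result only agrees partially, which is one way all three could be credited while only one becomes canonical.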

Here, in essence, all three results 'agreed' sufficiently to say all three are 'correct'. When individual results disagree by one or a few signals, in many cases it relates to one of two closely related precision differences between host results that I know of (there could be more):

1) The signal(s) are very close to threshold: SetiAtHome uses fixed thresholds, which mandates a go/no-go kind of result. Given that floating point representations are not 'exact', contrary to the common misconception, various sorts of cumulative error in the code respond as dictated by the choice of algorithm and the hardware. That variation between implementations can be minimised by sticking with standards-compliant arithmetic instructions, combined with sensible choices of algorithm that minimise error. Increasing precision does not tend to help if those two strategies aren't considered and fixed thresholds are used, as a variation in the signal peak power of as little as 'machine epsilon' (a very small number) can be enough to determine whether the signal is above threshold or not. This is analogous to the more familiar 'aliasing' artefacts you see in computer graphics. Complicated anti-aliasing strategies would be possible if the threshold mechanism were redesigned, but really that is what validators already do via 'inconclusives' and reissues for signals near threshold anyway.

2) The chirping used to de-Doppler-shift potential signals is a high precision computation: Since this process affects all signals to some degree, different hardware implementations, compilers and even machine state or code bug differences can introduce similar aggravating numerical variation around threshold, as described in #1.
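Point 1 can be illustrated with a small Python sketch. It is not taken from the SETI@home code: the `THRESHOLD` value, the sample numbers and the function names are invented for the example. The same three contributions are added in two different orders, as two implementations might do, and the totals end up on opposite sides of a fixed threshold even though they differ by no more than machine epsilon.

```python
import sys

# Hypothetical fixed detection threshold (not a real SETI@home value).
THRESHOLD = 0.6

def accumulate(values):
    """Add the values strictly in the order given, one at a time."""
    total = 0.0
    for v in values:
        total += v
    return total

def above_threshold(power):
    """Fixed threshold: a strict go/no-go decision."""
    return power > THRESHOLD

# Invented sample contributions to one signal's power.
samples = [0.1, 0.2, 0.3]

power_forward = accumulate(samples)            # (0.1 + 0.2) + 0.3
power_reverse = accumulate(reversed(samples))  # (0.3 + 0.2) + 0.1

print(power_forward, above_threshold(power_forward))
print(power_reverse, above_threshold(power_reverse))
print(abs(power_forward - power_reverse) <= sys.float_info.epsilon)
```

With IEEE 754 double precision the forward sum lands just above 0.6 and the reverse sum lands exactly on it, so one host would report the signal and the other would not. An explicit loop is used rather than the built-in `sum()`, since recent Python versions may compensate for rounding error in `sum()` and hide the effect.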

So once again, all three results were 'correct', as the validator says so, and it's likely that CPU<->GPU inconclusives will decrease in number as application refinements are introduced going into V7, though they will never be eliminated while fixed absolute thresholds are used. I don't see a particular need to go to any more sophisticated mechanism myself, as the current one seems to cope well, though there are certainly suitable approaches if the project were to decide it was warranted.

HTH,
Jason
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
Profile Fred J. Verster
Volunteer tester
Joined: 21 Apr 04
Posts: 3252
Credit: 31,903,643
RAC: 0
Netherlands
Message 1093866 - Posted: 6 Apr 2011, 10:13:21 UTC - in response to Message 1093795.  
Last modified: 6 Apr 2011, 10:22:44 UTC

Thanks Jason, for your clear explanation. I found this 'error' and couldn't resist reporting it.
Since 1 of my hosts has a tendency to run too hot, I do check its results. When I came back I found this host 'locked' and had to reboot it: probably a thermal shutdown, or the 470 getting too hot.

I'm still busy with a new host with an i7-2600 and 2 EAH5870 GPUs. It is going to run SETI MB and AP WUs, plus a "test" on MW, to see how much heat builds up and to try to control it. You may wish me luck on this one; having 1 usable eye, it takes me hours, even days, to put it all together.

Profile jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1093889 - Posted: 6 Apr 2011, 13:57:43 UTC - in response to Message 1093866.  
Last modified: 6 Apr 2011, 14:01:40 UTC

Since 1 of my hosts has a tendency to run too hot, I do check its results. When I came back I found this host 'locked' and had to reboot it: probably a thermal shutdown, or the 470 getting too hot.


The GF100 die can comfortably run 24x7 at 91 degrees Celsius. If it's not stable but is below that temperature, then upping the graphics core voltage can help. Another factor with these cards is having a nice clean power supply, and keeping the core and memory frequencies below those that cause results to error out. A good sign that something needs attention is if the GPU/driver automatically downclocks to protect itself. IIRC the memory controller in the 480 & 470 was nVidia's first attempt at GDDR5, so its limits are below those of current models.

When it comes to cooling I've gradually come to respect the art of overkill, and intend to move my 480 to water cooling. I actually like the sound of the 480 delta fan above 90%, as I find it drowns out street noise. I'll probably have to rig up some alternative noise suppression scheme when I do that.

Current pre-alpha 'poking around' test builds of the application reportedly raise power draw by about 20 Watts per GTX 480 so far, and it will probably be something similar for the 470, which is something you might want to keep in mind when preparing your cooling and power for the future. It's probably better in the long run to back things off a little rather than run right on the knife edge, and to plan for some kind of overkill approach.

Jason
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
