Message boards :
Number crunching :
Which result is correct, CPU, anonymous CUDA, or v6.09 (cuda23)
Author | Message |
---|---|
Fred J. Verster Send message Joined: 21 Apr 04 Posts: 3252 Credit: 31,903,643 RAC: 0 |
|
Donald L. Johnson Send message Joined: 5 Aug 02 Posts: 8240 Credit: 14,654,533 RAC: 20 |
WU Whether that is the "correct" result or not, the canonical result, which will go into the science database, was the one from the cuda23 machine. But the CPU result was sufficiently similar to the two GPU results for credit to be granted to all three. Donald Infernal Optimist / Submariner, retired |
Wiggo Send message Joined: 24 Jan 00 Posts: 34822 Credit: 261,360,520 RAC: 489 |
Personally I feel that the CUDA ones are correct, as the CPU had done several restarts in its log, though this could be factored in (who knows these days). Cheers. |
Donald L. Johnson Send message Joined: 5 Aug 02 Posts: 8240 Credit: 14,654,533 RAC: 20 |
Personally I feel that the CUDA ones are correct, as the CPU had done several restarts in its log, though this could be factored in (who knows these days). Yes, it's difficult to say for certain, since the stderr output is only a summary of the actual result, but since Fred's Lunatics Opt. App result seems to match the canonical result, those two are most likely right. Donald Infernal Optimist / Submariner, retired |
jason_gee Send message Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
Personally I feel that the CUDA ones are correct, as the CPU had done several restarts in its log, though this could be factored in (who knows these days). I have been examining precision/validation characteristics lately in quite some detail, as a means to establish what types and levels of error are acceptable, normal, or expected across platform/application/architecture combinations during optimisation. That's to help avoid unintentionally making inconclusives more common while making fairly radical structural changes inside the applications. What we actually have here is a combination of architecture differences, 'normal' subtle floating-point arithmetic variation, and original algorithm/code limitations. In cases like this the validation process is designed to be quite tolerant of these more subtle forms of error. For me personally that doesn't lessen the desire to refine things in such a way that inconclusives are minimised where feasible; from an efficiency standpoint, I consider that part of optimisation to some extent. Here, in essence, all three results 'agreed' sufficiently to say all three are 'correct'. When individual results disagree by one or a few signals, in many cases it relates to one of two closely related precision differences between host results known to me (there could be more yet): 1) The signal(s) are very close to threshold: SetiAtHome uses fixed thresholds, which mandates a go/no-go kind of result. Floating-point representations are not 'exact', contrary to the common misconception, so various sorts of cumulative error in the code respond as dictated by the choice of algorithm and the hardware. That variation between implementations can be minimised by sticking with standards-compliant arithmetic instructions, combined with sensible choices of algorithm that minimise error. 
Increasing precision does not tend to help if those two strategies aren't considered and fixed thresholds are used, as a variation in the signal's peak power by as little as 'machine epsilon' (a very small number) can be enough to determine whether the signal is above threshold or not. This is analogous to the more familiar 'aliasing' artefacts you see in computer graphics. Complicated anti-aliasing strategies would be possible if the threshold mechanism were redesigned, but really that is what the validators already do via 'inconclusives' and reissues for signals near threshold anyway. 2) The chirping used to de-doppler-shift potential signals is a high-precision computation: since this process affects all signals to some degree, different hardware implementations, compilers, and even machine state or code-bug differences can introduce similar aggravating numerical variation around threshold, as described in #1. So once again, all three results were 'correct', as the validator says, and it's likely that CPU<->GPU inconclusives will drop in number as application refinements are introduced going into V7, though they will never be eliminated while fixed absolute thresholds are used. I don't see a particular need for a more sophisticated mechanism myself, as the current one seems to cope well, though there are certainly suitable approaches if the project decided it was warranted. HTH, Jason "Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions. |
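[Editor's note: a minimal Python sketch of the threshold effect Jason describes. This is purely illustrative and is not SETI@home code; the values and the threshold are made up. It shows how accumulating the same numbers in a different order, as can happen across architectures or compiler optimisations, shifts the result by roughly one machine epsilon, and how a fixed go/no-go threshold can then classify the two results differently.]

```python
# Hypothetical illustration (not SETI@home code): the same "signal power"
# accumulated in two different orders. IEEE-754 addition is not associative,
# so the two hosts get results that differ by about one machine epsilon.
power_a = 0.1 + 0.2 + 0.3   # left-to-right accumulation
power_b = 0.3 + 0.2 + 0.1   # reversed accumulation order

print(power_a)  # 0.6000000000000001
print(power_b)  # 0.6

# With a fixed go/no-go threshold sitting right at that value, one host
# reports a signal above threshold while the other does not -- the
# "aliasing" effect around a hard cutoff.
THRESHOLD = 0.6
print(power_a > THRESHOLD)  # True
print(power_b > THRESHOLD)  # False
```

A validator comparing these two hosts would see them disagree by one signal, even though both computations are arithmetically "correct" to within machine precision, which is why near-threshold signals generate inconclusives.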
Fred J. Verster Send message Joined: 21 Apr 04 Posts: 3252 Credit: 31,903,643 RAC: 0 |
Thanks Jason, for your clear explanation; I did find this 'error' and couldn't resist reporting it. Since one of my hosts has a tendency to run too hot, I do check its results. When I came back I found this host locked up and had to reboot it, probably due to a thermal shutdown, or the 470 getting too hot. I'm still busy with a new host with an i7-2600 and two EAH5870 GPUs, which will run SETI MB and AP WUs, and a "test" on MW, in order to see how much heat builds up and to try to control it. You may wish me luck on this one; it takes me hours, even days, having one useable eye, to put it all together. |
jason_gee Send message Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
Since one of my hosts has a tendency to run too hot, I do check its results. When I came back I found this host locked up and had to reboot it, probably due to a thermal shutdown, or the 470 getting too hot. The GF100 die can comfortably run 24x7 at 91 degrees Celsius. If it's not stable but below that temperature, then upping the graphics core voltage can help. Another factor with these is having a nice clean power supply, and keeping the frequencies and memory clocks below those that cause results to error out. A good sign that something needs attention is if the GPU/driver automatically downclocks to protect itself. IIRC the memory controller in the 480 and 470 was nVidia's first attempt at GDDR5, so it has limitations compared with current models. When it comes to cooling I've gradually come to respect the art of overkill, and intend to move my 480 to water cooling. I actually like the sound of the 480's delta fan above 90%, as I find it drowns out street noise; I'll probably have to rig up some alternative noise-suppression scheme when I do that. Current pre-alpha 'poking around' application test builds reportedly raise the power draw by about 20 Watts per GTX 480 so far, and it will probably be something similar for the 470, which you might want to keep in mind when preparing your cooling and power for the future. It's probably better in the long run to back things off a little rather than run right on the knife edge, and plan for some kind of overkill approach. Jason "Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions. |
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.