ATI - 6.10.13 - GFLOPS - How accurate?
BigWaveSurfer Send message Joined: 29 Nov 01 Posts: 186 Credit: 36,311,381 RAC: 141 |
I installed 6.10.13 on my one system with an ATI card (yes, I know SETI does not have an app for ATI cards....yet) but I was surprised by the GFLOPS the card pulled. BOINC says the card (ATI Radeon HD 2600 1GB) is 174 GFLOPS, is that accurate?! My OC'ed 9600GT is only 41 GFLOPS, that is a huge difference. Just curious, thanks! |
skildude Send message Joined: 4 Oct 00 Posts: 9541 Credit: 50,759,529 RAC: 60 |
I was using my 2600XT 1GB on Collatz until it blew up a few weeks ago. It could run a Collatz WU in about 2 hours. My 4770 can do the same work in about 12 minutes using their optimized app 2.05b. It's free credits, so I wouldn't knock how slow that card is. Though I would upgrade the HSF if you intend to work Collatz or Milkyway on it. In a rich man's house there is no place to spit but his face. Diogenes Of Sinope |
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14650 Credit: 200,643,578 RAC: 874 |
I installed 6.10.13 on my one system with an ATI card (yes, I know SETI does not have an app for ATI cards....yet) but I was surprised by the GFLOPS the card pulled. BOINC says the card (ATI Radeon HD 2600 1GB) is 174 GFLOPS, is that accurate?! My OC'ed 9600GT is only 41 GFLOPS, that is a huge difference. Just curious, thanks! I think that this is probably yet another example of the difference between "marketing" flops and "working" (BOINC) flops. According to that GPUGrid chart, a 9600GT is rated by NVidia at 312 GFlops, with no allowance for the overclocking. Those are what I call "marketing flops". Unfortunately, I suspect that BOINC v6.10.13 is reporting "marketing" flops for ATI cards, and "working" flops for NVidia cards. This is unfair, and is going to cause confusion for a long time to come, until there is a project which can process the same work on either card, and where we have some degree of confidence that the two applications are compiled with the same degree of optimisation. I doubt that will happen until OpenCL compilers are available for both manufacturers, and a project develops an application in OpenCL that can be compiled for both cards from a common codebase. Only then will we have a true comparison. |
skildude Send message Joined: 4 Oct 00 Posts: 9541 Credit: 50,759,529 RAC: 60 |
Agreed. The Collatz project uses optimized apps for both the ATI and CUDA cards. Crunch3r has done a great job on them. It appears that he's put a lot of effort into maximizing the ATI cards, though. My 4770 runs Collatz much faster than a CUDA 260 or 275. I imagine they are working on a better app for the CUDA cards as we speak. |
Jord Send message Joined: 9 Jun 99 Posts: 15184 Credit: 4,362,181 RAC: 3 |
Funny, I asked Andreas (Gipsel/Cluster P.) about that the other day. His answer to me: "The numbers for nvidia cards are based on some kind of characterization, or if you want, benchmarking with the SETI application, if I remember right. The ATI numbers are the theoretical single-precision peak performance. And yes, in that sense they are correct. So it is basically the same as saying one CPU core running at 3 GHz is capable of 6 GFlops in single precision (using the x87 FPU on a more recent CPU; a Pentium 3 is only capable of 1 flop per cycle) or 24 GFlops using SSE (only Phenom/Athlon II and Core 2/Core i7; for older ones it is 12 GFlops). To sum it up, the nvidia and ATI numbers are not comparable right now. I guess the CUDA runtime also gives some information about the number of cores in the GPU, so it should be easy to get numbers more comparable to ATI's. The peak SP performance is simply: cores * 2 * clock frequency. One could include the apocryphal "missing MUL" of the nvidia cards by using cores * 3 * frequency (for the GTX2xx line it has some credibility, but prior GPUs didn't have the capability, even when nvidia's marketing claimed otherwise), though the next generation will be officially back to 2 flops per core and cycle. In ATI's case one has to deduce the number of units in the GPU from the number of SIMD blocks and the size of those; maybe one has to do something similar for nvidia cards too. In the end, one should arrive, for instance for a GTX285 running stock clocks, at: 240 cores * 3 flops * 1.476 GHz = 1063 GFlops. If one also wants to include double-precision numbers, these are 1/5 of the SP peak in the case of ATI and 1/12 of the SP throughput for nvidia (will be more with the next generation). So a HD5870 has 2720 GFlops in single precision and 544 GFlops in double precision. The Milkyway app is actually capable of really using about 400 DP GFlops of it, so it is a real figure and not just made up by the marketing department.
Such a card is a real monster for those puny WUs ;). A GTX285 has the already mentioned 1063 GFlops in SP and a measly 88.6 GFlops with DP (and can use roughly 50 GFlops of it in Milkyway after an nvidia engineer helped the project a bit)." |
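The arithmetic in Gipsel's post can be sketched directly. A minimal illustration using only the stock figures he quotes (240 cores at 1.476 GHz for the GTX285, 1600 cores at 0.85 GHz for the HD5870):

```python
def peak_sp_gflops(cores, flops_per_cycle, clock_ghz):
    """Theoretical single-precision peak: cores x flops/cycle x clock."""
    return cores * flops_per_cycle * clock_ghz

# GTX285 at stock clocks: 3 flops/cycle (counting the "missing MUL"), DP is 1/12 of SP
gtx285_sp = peak_sp_gflops(240, 3, 1.476)   # ~1063 GFlops
gtx285_dp = gtx285_sp / 12                  # ~88.6 GFlops

# HD5870: 2 flops/cycle per core, DP is 1/5 of SP for ATI
hd5870_sp = peak_sp_gflops(1600, 2, 0.85)   # 2720 GFlops
hd5870_dp = hd5870_sp / 5                   # 544 GFlops

print(round(gtx285_sp), round(gtx285_dp, 1), round(hd5870_sp), round(hd5870_dp))
```

This reproduces all four figures in the post: 1063 and 88.6 GFlops for the GTX285, 2720 and 544 GFlops for the HD5870.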
OzzFan Send message Joined: 9 Apr 02 Posts: 15691 Credit: 84,761,841 RAC: 28 |
Very interesting information. Thanks Jord! |
skildude Send message Joined: 4 Oct 00 Posts: 9541 Credit: 50,759,529 RAC: 60 |
Yes, surprising information to say the least. |
Steve Send message Joined: 18 May 99 Posts: 94 Credit: 68,888 RAC: 0 |
More info on the ATI 5870 internals that may matter to those who understand the technical details and what may limit or allow for what (src: Beyond3D). Best non-gaming review I've found so far anyhow, though they do cover gaming slightly. I'm not in the "understand the intricate tech details" category, though it was useful to learn what marketing specs vs actual specs are, once sliced and diced. |
madasczik Send message Joined: 13 May 09 Posts: 12 Credit: 1,693,704 RAC: 0 |
Cool, I was wondering why only NVIDIA cards were showing up in computer stats; I felt left out with my dual ATI HD4870 X2s. It's great to see that it's being worked on. It will be nice having more cores working for the cause... going to give the 6.10.13 build a try. I wonder if it's going to see all 4 GPUs since the cards are running in quad CrossFire mode... Collatz looks very promising, going to give that a whirl too. |
ML1 Send message Joined: 25 Nov 01 Posts: 20289 Credit: 7,508,002 RAC: 20 |
More info on the ATI 5870 internals that may matter to those who understand the technical details and what may limit or allow for what (src: Beyond3D). At 40nm, over 2 billion transistors, and 188W peak power for a single piece of silicon, that all adds up to an impressive feat of design. The question there, though, is how well the various bottlenecks balance out. Also, how flexible is that architecture for performing more general OpenCL (CUDA-esque) operations? One aspect I noticed is that ATI appear to have more of a dedicated pipeline architecture, whereas the nVidia architecture appears to be nearer to that of a more general-purpose, highly parallel array processor. Any GPU programmers able to comment on the pros/cons for programming them? Happy fast crunchin', Martin See new freedom: Mageia Linux Take a look for yourself: Linux Format The Future is what We all make IT (GPLv3) |
jenesuispasbavard Send message Joined: 13 Sep 05 Posts: 49 Credit: 12,385,974 RAC: 0 |
Doesn't 6.10.13 report nvidia's "marketing" FLOPS as well? The numbers in both cases (ATI and nVidia) are the peak single-precision float performance. |
MarkJ Send message Joined: 17 Feb 08 Posts: 1139 Credit: 80,854,192 RAC: 5 |
Doesn't 6.10.13 report nvidia's "marketing" FLOPS as well? The numbers in both cases (ATI and nVidia) are the peak single-precision float performance. No. The ATI figure was the peak speed, while the nvidia figure came from BOINC, based upon the speed the "reference" card could do; or, as others referred to them, marketing flops and BOINC flops. This is one reason why the ATI appears to be faster if you look just at the numbers given at BOINC startup. It's been changed in 6.10.14. From the change log... - client/scheduler: standardize the FLOPS estimate between NVIDIA and ATI. Make them both peak FLOPS, according to the formula supplied by the manufacturer. BOINC blog |
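Applying that same standardized peak-FLOPS formula to the OC'ed 9600GT from the opening post reproduces the 312 GFlops "marketing" figure Richard quoted. This is a sketch assuming the reference card's specs of 64 shader cores at a 1.625 GHz shader clock, counted at 3 flops per cycle (MAD plus the extra MUL) as in NVIDIA's own numbers:

```python
# Standardized peak FLOPS as in 6.10.14: cores x flops/cycle x clock (GHz)
# Assumed reference 9600GT specs: 64 shader cores, 1.625 GHz shader clock
cores, flops_per_cycle, clock_ghz = 64, 3, 1.625
print(cores * flops_per_cycle * clock_ghz)  # 312.0 GFlops
```

The gap between this 312 GFlops peak and the 41 GFlops BOINC 6.10.13 reported for the same card is exactly the "marketing flops" vs "BOINC flops" difference discussed above.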
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.