GPU FLOPS: Theory vs Reality

Profile HAL9000
Volunteer tester
Avatar

Send message
Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 1899124 - Posted: 4 Nov 2017, 17:10:45 UTC - in response to Message 1899090.  

Fanboy-ism and trolling aside, we all know historically AMD cards have (almost) always had better raw computational power in the consumer market. Professional offerings are almost neck-and-neck (drivers and SW support not taken into account).

A GTX 580 (stock clocks) has ~1.6 TFLOPS of single-precision compute power. The RX 580, on the other hand, has ~6.2 TFLOPS. That is nearly 4 times (!) the raw power.

Do you really believe the altcoin miners would go for hundreds of AMD cards if they had a way to make the GTX 580 profitable?

There is no way Nvidia's GTX 580 crunches more numbers than an RX 580, all other factors aside. This means the key is in the "other factors", e.g. CUDA vs OpenCL, or other cruncher optimizations. Maybe the workload is just too atypical and AMD cards have no shortcuts for crunching it? Maybe I'm simply misinterpreting the chart?

Anyhow, yesterday I got an email from S@H about how much more processing power is needed for the new telescopes and projects. Maybe, just maybe, if some skilled individual(s) spent some time further optimizing the code for AMD cards, we'd get some of that needed power for "free"? Bear in mind, however, that I don't really know how many AMD owners are contributing to SETI@home, with the mining craze currently raging. It might not be worth it to optimize further for just a few AMD cards, which would be really sad for me, since my RX is crunching for SETI most of the time.

I'm not sure why you would compare an Nvidia GTX 580 to a Radeon RX 580, given that the GTX 580 was released in 2010 and the RX 580 in 2017.

Really you would want to compare a GeForce GTX 1070 to the Radeon RX 580, since those are both current models that came out this year and are rated at about 6 TFLOPS in single precision: the GTX 1070 at 5783/6462 GFLOPS base/boost and the RX 580 at 5792/6175 GFLOPS base/boost.
Some of the notable differences between the two I can think of right now are:
Cost: The GTX 1070 lists for ~$400 USD. The RX 580 lists for ~$200 USD.
Double Precision performance: GeForce GPUs are limited to 1/32 of SP FLOPS. Radeon GPUs are limited to 1/16 of SP FLOPS. (This is not currently relevant to SETI@home use.)
Power: The GTX 1070 has a TDP of 150 W. The RX 580 is rated for 185 W. (Actual power usage varies but is often ~70% of TDP while running SETI@home.)
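Those theoretical ratings are just shader count × clock × 2 FLOPs per cycle (one fused multiply-add per shader), so they're easy to reproduce. A quick back-of-the-envelope sketch in Python, using approximate reference-design clocks from the public spec sheets:

[code]
# Theoretical peak single-precision GFLOPS = shaders x clock (MHz) x 2 / 1000
# (2 FLOPs per shader per cycle from a fused multiply-add).
def peak_gflops(shaders, clock_mhz):
    return shaders * clock_mhz * 2 / 1000.0

cards = {
    "GTX 1070": (1920, 1506, 1683),   # shaders, base MHz, boost MHz
    "RX 580":   (2304, 1257, 1340),
    "GTX 580":  (512,  1544, 1544),   # Fermi shader clock (no separate boost)
}

for name, (shaders, base, boost) in cards.items():
    print(f"{name}: {peak_gflops(shaders, base):.0f} / {peak_gflops(shaders, boost):.0f} GFLOPS base/boost")
# GTX 1070: 5783 / 6463, RX 580: 5792 / 6175, GTX 580: 1581 / 1581
[/code]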
SETI@home classic workunits: 93,865 CPU time: 863,447 hours
Join the [url=http://tinyurl.com/8y46zvu]BP6/VP6 User Group[/url]
ID: 1899124 · Report as offensive
Dimitar Stoynev
Avatar

Send message
Joined: 7 Jan 09
Posts: 19
Credit: 336,531
RAC: 0
Bulgaria
Message 1899158 - Posted: 4 Nov 2017, 19:43:40 UTC - in response to Message 1899124.  


I'm not sure why you would compare an Nvidia GTX 580 to a Radeon RX 580, given that the GTX 580 was released in 2010 and the RX 580 in 2017.


According to Shaggie's leftmost chart, a GTX 580 generates more credit than the average RX 580, and so does the 770, and so on. More credit per hour means more crunching done, right?

That is what I can't get.
a. If more credit per hour means more numbers crunched, then how come the GTX 580s crunch more numbers than a card that is 4x faster in GFLOPS terms?
b. If more credit per hour DOESN'T mean more numbers crunched, then what does it mean?
ID: 1899158 · Report as offensive
Grant (SSSF)
Volunteer tester

Send message
Joined: 19 Aug 99
Posts: 13855
Credit: 208,696,464
RAC: 304
Australia
Message 1899176 - Posted: 4 Nov 2017, 21:31:32 UTC - in response to Message 1899158.  

a. If more credit per hour means more numbers crunched, then how come the GTX 580s crunch more numbers than a card that is 4x faster in GFLOPS terms?

A card's GFLOPS rating is very much a theoretical number.
The software being run has to actually be able to take advantage of that potential performance.
Which is why, for a given Nvidia card, the SoG application leaves the older CUDA applications way behind, and for the same card the Linux special application leaves the SoG application way behind. And as good as that application is, it still doesn't come close to the theoretical (i.e. claimed) SP (single-precision) performance of that card.

There are lies, damned lies, statistics, benchmarks, and finally claimed Floating/Integer performance numbers.
Grant
Darwin NT
ID: 1899176 · Report as offensive
Profile HAL9000
Volunteer tester
Avatar

Send message
Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 1899197 - Posted: 4 Nov 2017, 23:06:59 UTC - in response to Message 1899176.  

a. If more credit per hour means more numbers crunched, then how come the GTX 580s crunch more numbers than a card that is 4x faster in GFLOPS terms?

A card's GFLOPS rating is very much a theoretical number.
The software being run has to actually be able to take advantage of that potential performance.
Which is why, for a given Nvidia card, the SoG application leaves the older CUDA applications way behind, and for the same card the Linux special application leaves the SoG application way behind. And as good as that application is, it still doesn't come close to the theoretical (i.e. claimed) SP (single-precision) performance of that card.

There are lies, damned lies, statistics, benchmarks, and finally claimed Floating/Integer performance numbers.

I believe it was said that the original CUDA app was ~3-4% efficient, and it was basically written by Nvidia for SETI@home.

I seem to recall that the SoG app is better when running GBT tasks but the CUDA app is slightly better for Arecibo tasks?
I don't really watch my 750 ti that closely to be sure.
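If anyone wants to sanity-check that kind of efficiency figure, it's just achieved FLOPS divided by the card's theoretical peak. A rough sketch with made-up numbers (the per-task operation count is whatever estimate you trust, e.g. the project's own estimate, not a measurement):

[code]
# Efficiency = achieved FLOPS / theoretical peak FLOPS.
def efficiency(flops_per_task, runtime_s, peak_gflops):
    achieved_gflops = flops_per_task / runtime_s / 1e9
    return achieved_gflops / peak_gflops

# Hypothetical: a 20,000 GFLOP task done in 900 s on a card with a 1,581 GFLOPS peak.
print(f"{efficiency(20_000e9, 900, 1581):.1%}")   # ~1.4% of theoretical peak
[/code]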
SETI@home classic workunits: 93,865 CPU time: 863,447 hours
Join the [url=http://tinyurl.com/8y46zvu]BP6/VP6 User Group[/url]
ID: 1899197 · Report as offensive
Profile Shaggie76
Avatar

Send message
Joined: 9 Oct 09
Posts: 282
Credit: 271,858,118
RAC: 196
Canada
Message 1899198 - Posted: 4 Nov 2017, 23:32:49 UTC

I don't know for sure, but I suspect that mining tasks use integer arithmetic while SETI@home uses floating-point math; AMD cards might offer better integer performance per watt, which could explain the miners' enthusiasm.
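To make the distinction concrete, here's a toy contrast (not real mining or SETI code) between the kind of inner loop each workload hammers: hash-style kernels are 32-bit integer adds, XORs and rotates, while signal processing is dominated by single-precision multiply-adds.

[code]
# Toy illustration only: which ALUs each style of workload exercises.

def integer_mix(x: int, y: int) -> int:
    """Hash-like step: 32-bit add, XOR and rotate (integer units)."""
    x = (x + y) & 0xFFFFFFFF
    x ^= ((x << 13) | (x >> 19)) & 0xFFFFFFFF   # rotate-left by 13 bits
    return x

def complex_mac(ar, ai, br, bi, acc_r, acc_i):
    """FFT-like step: complex multiply-accumulate (floating-point units)."""
    return acc_r + ar * br - ai * bi, acc_i + ar * bi + ai * br
[/code]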
ID: 1899198 · Report as offensive
Profile M_M
Avatar

Send message
Joined: 20 May 04
Posts: 76
Credit: 45,752,966
RAC: 8
Serbia
Message 1899269 - Posted: 5 Nov 2017, 9:28:02 UTC - in response to Message 1899176.  

The software being run has to actually be able to take advantage of that potential performance.
Which is why, for a given Nvidia card, the SoG application leaves the older CUDA applications way behind, and for the same card the Linux special application leaves the SoG application way behind.

So it is about how well the application is suited to a particular architecture, but mostly about how well it is written to use the potential performance - and in the end it seems that, currently and in general, Nvidia is a bit better at SETI and AMD is a bit better at coin mining?

BTW, if the Linux special app is so much more efficient, why isn't it ported to Windows? Is the app so reliant on the underlying OS, given that the CPU instruction set is the same and the GPU drivers are probably very similar?
ID: 1899269 · Report as offensive
Profile Mike Special Project $75 donor
Volunteer tester
Avatar

Send message
Joined: 17 Feb 01
Posts: 34380
Credit: 79,922,639
RAC: 80
Germany
Message 1899271 - Posted: 5 Nov 2017, 9:40:40 UTC - in response to Message 1899269.  

The software being run has to actually be able to take advantage of that potential performance.
Which is why, for a given Nvidia card, the SoG application leaves the older CUDA applications way behind, and for the same card the Linux special application leaves the SoG application way behind.

So it is about how well the application is suited to a particular architecture, but mostly about how well it is written to use the potential performance - and in the end it seems that, currently and in general, Nvidia is a bit better at SETI and AMD is a bit better at coin mining?

BTW, if the Linux special app is so much more efficient, why isn't it ported to Windows? Is the app so reliant on the underlying OS, given that the CPU instruction set is the same and the GPU drivers are probably very similar?


It also depends on the compiler in use.
Some instructions have to be redefined, and not every compiler produces the fastest code possible.


With each crime and every kindness we birth our future.
ID: 1899271 · Report as offensive
Dimitar Stoynev
Avatar

Send message
Joined: 7 Jan 09
Posts: 19
Credit: 336,531
RAC: 0
Bulgaria
Message 1899287 - Posted: 5 Nov 2017, 13:06:35 UTC - in response to Message 1899198.  
Last modified: 5 Nov 2017, 13:07:01 UTC

I don't know for sure, but I suspect that mining tasks use integer arithmetic while SETI@home uses floating-point math; AMD cards might offer better integer performance per watt, which could explain the miners' enthusiasm.


Hashing (coin mining) is mostly integer operations, while SETI and most other sci(-fi?) projects rely on floating-point arithmetic. Not all floating point is created equal: enter PRECISION. Things are well explained here:

https://arrayfire.com/explaining-fp64-performance-on-gpus/

https://steemit.com/gridcoin/@vortac/gridcoin-gpu-mining-6-obtaining-the-maximum-performance-out-of-your-gpus

Code optimizations aside, it seems the most relevant metric here is FP32/FP64 performance, with serious FP64 usually reserved for top professional cards. If only we knew what type of loads SETI is sending us for crunching... Speaking of which, can we get a dev to shed more light on the topic?
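As far as I understand it, the FP64 peak is just the FP32 peak divided by whatever ratio the vendor enables, which is why serious double precision is effectively a professional-card feature. A quick sketch using the ratios HAL9000 listed earlier (the Tesla figures are approximate spec-sheet values):

[code]
# FP64 peak = FP32 peak / vendor-enabled DP ratio (approximate figures).
sp_peak_gflops = {"GTX 1070": 6462, "RX 580": 6175, "Tesla P100": 10600}
dp_ratio       = {"GTX 1070": 32,   "RX 580": 16,   "Tesla P100": 2}

for card, sp in sp_peak_gflops.items():
    print(f"{card}: ~{sp / dp_ratio[card]:.0f} GFLOPS FP64")
# GTX 1070: ~202, RX 580: ~386, Tesla P100: ~5300
[/code]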

Even with ALL that has been said up to this point, the GTX 580 has "anemic" (by modern standards) computing performance through and through: http://www.geeks3d.com/20140305/amd-radeon-and-nvidia-geforce-fp32-fp64-gflops-table-computing/
At the same time, the RX 580 is a good all-rounder, outmatching the GTX in every respect, often by a wide margin: http://www.relaxedtech.com/reviews/amd/radeon-rx-580-rx-570/1 Something's really off here...

Q: How on Earth then does the GTX get more credit?

ID: 1899287 · Report as offensive
rob smith Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer moderator
Volunteer tester

Send message
Joined: 7 Mar 03
Posts: 22535
Credit: 416,307,556
RAC: 380
United Kingdom
Message 1899320 - Posted: 5 Nov 2017, 15:53:32 UTC

BTW, if the Linux special app is so much more efficient, why isn't it ported to Windows? Is the app so reliant on the underlying OS, given that the CPU instruction set is the same and the GPU drivers are probably very similar?


People have been trying to get a Windows CUDA 6.0 (or greater) application to work as well as the Linux one does. There are a number of hurdles in the way, mostly to do with the way the two OS drivers work through their respective APIs: crudely, Windows gets in the way of the "hyper optimisation" that Petri pioneered for CUDA, while Linux sits back and lets it happen. I'm sure someone will be along soon with all the blood-and-guts details, but those are the headlines....
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
ID: 1899320 · Report as offensive
Grant (SSSF)
Volunteer tester

Send message
Joined: 19 Aug 99
Posts: 13855
Credit: 208,696,464
RAC: 304
Australia
Message 1899413 - Posted: 5 Nov 2017, 22:13:48 UTC - in response to Message 1899287.  
Last modified: 5 Nov 2017, 22:15:56 UTC

Q: How on Earth then does the GTX get more credit?

It processes more work.
As I mentioned previously, the software has to make use of what the hardware has to offer.

EDIT- just to confuse things even more- what really matters is how long it takes to crunch the (many) different types of WUs. Unfortunately the allocation of Credit, particularly for GPU work, is rather random with a very wide range of variability.
Grant
Darwin NT
ID: 1899413 · Report as offensive
Grant (SSSF)
Volunteer tester

Send message
Joined: 19 Aug 99
Posts: 13855
Credit: 208,696,464
RAC: 304
Australia
Message 1899414 - Posted: 5 Nov 2017, 22:17:55 UTC

Shaggie, would it be possible to have a 2nd pair of graphs that look at just Linux Special application hosts, or even just Linux hosts including those with the Special Application?
Grant
Darwin NT
ID: 1899414 · Report as offensive
Dimitar Stoynev
Avatar

Send message
Joined: 7 Jan 09
Posts: 19
Credit: 336,531
RAC: 0
Bulgaria
Message 1899564 - Posted: 6 Nov 2017, 18:11:14 UTC - in response to Message 1899413.  

Q: How on Earth then does the GTX get more credit?

It processes more work.
As I mentioned previously, the software has to make use of what the hardware has to offer.


And this means the worker apps for AMD cards are very under-optimized. My experience with Lunatics' app has not been so stellar thus far (pun not intended).

2 things here:
a.) If Mike's app is really fast, then how come it's not pushed upstream as the main cruncher?
b.) If the AMD worker apps are really not so well optimized, then how come nobody has tuned them? It's "free" performance, after all!

Anyhow, for me personally, the case is closed. I dropped AMD a few lines and I hope we hear from them soon. In the meantime, I urge everyone who's really interested in better science through performance to drop AMD a line as well.
ID: 1899564 · Report as offensive
rob smith Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer moderator
Volunteer tester

Send message
Joined: 7 Mar 03
Posts: 22535
Credit: 416,307,556
RAC: 380
United Kingdom
Message 1899568 - Posted: 6 Nov 2017, 18:27:33 UTC

It's actually Petri's application.
Answering your questions:
1 - There are still some issues to be resolved that are resulting in a very high "inconclusive" count (not "errors" or "invalids"). Work is being done on resolving this, but it is proving very stubborn. The app is currently only available to run on recent versions of Linux; the Windows development is some months behind the Linux one.
2 - Nobody has (successfully) applied the same techniques as Petri has to an AMD-based system. That is not to say it can't be done, just that nobody has succeeded yet. However, I suspect there are other issues around that make its success less likely.
AMD don't appear to be very interested in developing their hardware and drivers for the sort of use that SETI, and similar applications, require.

Good luck with your approach to AMD, I hope it does bear fruit.
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
ID: 1899568 · Report as offensive
Profile Shaggie76
Avatar

Send message
Joined: 9 Oct 09
Posts: 282
Credit: 271,858,118
RAC: 196
Canada
Message 1899570 - Posted: 6 Nov 2017, 18:38:16 UTC - in response to Message 1899414.  

Still not interested, sorry.
ID: 1899570 · Report as offensive
Dimitar Stoynev
Avatar

Send message
Joined: 7 Jan 09
Posts: 19
Credit: 336,531
RAC: 0
Bulgaria
Message 1899577 - Posted: 6 Nov 2017, 19:07:49 UTC - in response to Message 1899568.  

Thanks, Rob!

I was somewhat afraid that you'd say that. I'm bamboozled that only volunteers are trying to optimize the code. After all, SETI is a project of huge significance for all of mankind. It's hard for me to believe that some AMD engineer couldn't spend a couple of hours and produce nearly-optimal code for crunching SETI WUs.

My box is sitting idle approx. 85% of the time, generating nothing more than an electricity bill and waste heat. It really hurts me to see its potential wasted. On the other hand, I have always been fascinated with deep space and its secrets, and that is why I chose to donate my "firepower" (an otherwise really decently performing rig) to SETI. I could have easily put a few more GPUs in there and mined Ether, but I didn't.

I guess I'll have to go all-in on AMD, hoping they will take a look at this. On a side note, I wonder why the SETI project managers and owners don't approach AMD politely and ask them to spend some time on the project.
ID: 1899577 · Report as offensive
Profile Shaggie76
Avatar

Send message
Joined: 9 Oct 09
Posts: 282
Credit: 271,858,118
RAC: 196
Canada
Message 1900619 - Posted: 12 Nov 2017, 0:46:20 UTC

I ran a few scans over the last two weeks and combined the data to give a picture of some rarer cards that might not otherwise have enough valid results. There aren't enough Pascal Titans yet, but this scan has some RX Vega parts in the mix (sadly it does not seem well optimized for SETI by default). There aren't enough 1070 Ti hosts yet either (2 by my count).

I took a stab at hooking up data for Intel IGPs -- the TDP values are probably wrong so don't pay too much attention to the CPWH chart for them -- I was mostly curious about throughput and it's pretty abysmal as you'd expect. I probably won't include those again since the CPWH is misleading.
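For anyone who wants to sanity-check the CPWH numbers themselves: it's just credit per hour divided by power draw, so a wrong TDP skews the column directly. A minimal sketch (whether to use rated TDP or an estimated draw is up to you; the ~70%-of-TDP figure is HAL9000's rough number from earlier in the thread, not a measurement):

[code]
# Credit per watt-hour = hourly credit / power draw in watts.
def credit_per_watt_hour(credit_per_hour, watts):
    return credit_per_hour / watts

# Hypothetical card: 1,000 credit/hour, 150 W TDP.
print(f"{credit_per_watt_hour(1000, 150):.1f}")          # ~6.7 at rated TDP
print(f"{credit_per_watt_hour(1000, 150 * 0.70):.1f}")   # ~9.5 at ~70% of TDP
[/code]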

To try to keep the charts manageable I've started omitting some of the older generation cards; I assume that since you're probably looking at this data to help guide setting up machines it's unlikely that you'll be worried about vintage parts but I could be wrong.

ID: 1900619 · Report as offensive
Profile ML1
Volunteer moderator
Volunteer tester

Send message
Joined: 25 Nov 01
Posts: 21235
Credit: 7,508,002
RAC: 20
United Kingdom
Message 1900630 - Posted: 12 Nov 2017, 1:23:25 UTC - in response to Message 1900619.  

Very good, thanks... Eyeing things up for 'Black Friday'...

(Hope you're not being 'sponsored' ;-) ;-) )


Happy fast crunchin',
Martin
See new freedom: Mageia Linux
Take a look for yourself: Linux Format
The Future is what We all make IT (GPLv3)
ID: 1900630 · Report as offensive
Profile Shaggie76
Avatar

Send message
Joined: 9 Oct 09
Posts: 282
Credit: 271,858,118
RAC: 196
Canada
Message 1903225 - Posted: 28 Nov 2017, 0:05:18 UTC

I ran another scan today and there still aren't enough 1070 Tis in circulation.
ID: 1903225 · Report as offensive
Profile Keith Myers Special Project $250 donor
Volunteer tester
Avatar

Send message
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1903250 - Posted: 28 Nov 2017, 1:46:10 UTC - in response to Message 1903225.  

I did my part and put in my GTX 1070 Ti 3 weeks ago. I'm surprised that there are still so few users of the Ti. It is still $100 cheaper than the cheapest 1080. I think even the Black Friday deals couldn't approach the price of the 1070 Ti.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1903250 · Report as offensive
EdwardPF
Volunteer tester

Send message
Joined: 26 Jul 99
Posts: 389
Credit: 236,772,605
RAC: 374
United States
Message 1904541 - Posted: 3 Dec 2017, 2:24:39 UTC - in response to Message 1903225.  

Folks over at GPUGrid are reporting 1070 Ti GPUs...

Ed F
ID: 1904541 · Report as offensive