GPU FLOPS: Theory vs Reality

Message boards : Number crunching : GPU FLOPS: Theory vs Reality

HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6466
Credit: 175,680,990
RAC: 50,627
United States
Message 1899124 - Posted: 4 Nov 2017, 17:10:45 UTC - in response to Message 1899090.  

[quote]
Fanboy-ism and trolling aside, we all know that historically AMD cards have (almost) always had more raw computational power in the consumer market. Professional offerings are almost neck and neck (drivers and software support not taken into account).

A GTX 580 (stock clocks) has ~1.6 TFLOPS of compute power. The RX 580, on the other hand, has ~6.2 TFLOPS. That is almost 4 times (!) more raw power.

Do you really believe the altcoin miners would go for hundreds of AMD cards if they had a way to make the GTX 580 profitable?

There's no way Nvidia's GTX 580 crunches more numbers than an RX 580, all other factors aside. This means the key is in the "other factors", e.g. CUDA vs OpenCL, or other cruncher optimizations. Maybe the workload is just too atypical and AMD cards have no shortcuts for crunching it? Maybe I'm simply misinterpreting the chart?

Anyhow, yesterday I got an email from S@H about how much more processing power is needed for the new telescopes and projects. Maybe, just maybe, if some skilled individual(s) spent some time optimizing the code for AMD cards, we'd get some of that needed power for "free"? Bear in mind, however, that I don't really know how many AMD owners are contributing to SETI@home, with the mining craze currently raging. It might not be worth optimizing further for just a few AMD cards, which would be really sad for me, since my RX is crunching for SETI most of the time.
[/quote]

I'm not sure why you would compare an Nvidia GTX 580 to a Radeon RX 580, given that the GTX 580 was released in 2010 and the RX 580 in 2017.

Really, you would want to compare a GeForce GTX 1070 to the Radeon RX 580, since both are current models rated for about 6 TFLOPS in single precision: the GTX 1070 is rated at 5783/6462 GFLOPS base/boost and the RX 580 at 5792/6175 GFLOPS base/boost.
Some of the notable differences between the two I can think of right now are:
Cost: The GTX 1070 lists for ~$400 USD. The RX 580 lists for ~$200 USD.
Double Precision performance: Current GeForce GPUs are limited to 1/32 of SP FLOPS; current Radeon GPUs are limited to 1/16 of SP FLOPS. (This is not currently relevant to SETI@home use.)
Power: The GTX 1070 has a TDP of 150W. The RX 580 is rated for 185W. (Actual power usage varies but is often ~70% of TDP while running SETI@home)
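The ~6 TFLOPS ratings above are theoretical peaks, and they can be reproduced with simple arithmetic: shader count × clock × 2, because each shader can complete one fused multiply-add (counted as two FLOPs) per clock. The minimal Python sketch below assumes that model; the shader counts and clocks come from the vendors' public spec sheets rather than from this thread, and the FP64 figures just apply the per-architecture FP64:FP32 ratio (the 1/32 and 1/16 quoted above, and 1/8 for the Fermi-based GTX 580).

```python
def peak_gflops(shaders: int, clock_mhz: float, flops_per_clock: int = 2) -> float:
    """Theoretical FP32 peak in GFLOPS.

    Assumes every shader completes one fused multiply-add (2 FLOPs) on
    every clock cycle, which real workloads never sustain.
    """
    return shaders * flops_per_clock * clock_mhz / 1000.0

# name: (shader count, base MHz, boost MHz, FP64:FP32 ratio), from vendor spec sheets.
# The GTX 580 has no boost clock, so its shader clock is listed twice.
cards = {
    "GTX 1070": (1920, 1506, 1683, 1 / 32),
    "RX 580": (2304, 1257, 1340, 1 / 16),
    "GTX 580": (512, 1544, 1544, 1 / 8),
}

for name, (shaders, base, boost, dp_ratio) in cards.items():
    sp_base = peak_gflops(shaders, base)
    sp_boost = peak_gflops(shaders, boost)
    print(f"{name}: {sp_base:.0f}/{sp_boost:.0f} GFLOPS FP32 base/boost, "
          f"~{sp_boost * dp_ratio:.0f} GFLOPS FP64")

# Paper advantage of the RX 580 (boost clock) over the GTX 580.
ratio = peak_gflops(2304, 1340) / peak_gflops(512, 1544)
print(f"RX 580 vs GTX 580 on paper: {ratio:.1f}x")
```

This reproduces the base/boost figures above to within a GFLOPS of rounding and puts the RX 580's paper advantage over the GTX 580 at roughly 3.9x. None of it says anything about what an application actually achieves.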
SETI@home classic workunits: 93,865 CPU time: 863,447 hours
Join the BP6/VP6 User Group today!
Dimitar Stoynev
Joined: 7 Jan 09
Posts: 19
Credit: 333,920
RAC: 2,244
Bulgaria
Message 1899158 - Posted: 4 Nov 2017, 19:43:40 UTC - in response to Message 1899124.  


[quote]I'm not sure why you would compare an Nvidia GTX 580 to a Radeon RX 580, given that the GTX 580 was released in 2010 and the RX 580 in 2017.[/quote]


According to Shaggie's leftmost chart, a GTX 580 generates more credit than the average RX 580, and so does the GTX 770, and so on. More credit per hour means more crunching done, right?

That is what I can't understand.
a. If more credit per hour means more numbers crunched, then how come the GTX 580s crunch more numbers than a card that is 4x faster (in terms of GFLOPS)?
b. If more credit per hour DOESN'T mean more numbers crunched, then what does it mean?
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 8877
Credit: 114,934,843
RAC: 69,648
Australia
Message 1899176 - Posted: 4 Nov 2017, 21:31:32 UTC - in response to Message 1899158.  

[quote]a. If more credit per hour means more numbers crunched, then how come the GTX 580s crunch more numbers than a card that is 4x faster (in terms of GFLOPS)?[/quote]

A card's GFLOPS rating is very much a theoretical number.
The software being run has to actually be able to take advantage of that potential performance.
That is why, for a given Nvidia card, the SoG application leaves the older CUDA applications way behind. And for the same card, the Linux special application leaves the SoG application way behind. And as good as that application is, it still doesn't come close to the theoretical (i.e. claimed) SP (single precision) performance of that card.

There are lies, damned lies, statistics, benchmarks, and finally claimed Floating/Integer performance numbers.
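One standard way to see why a paper rating overstates real throughput is the roofline model (a general performance model, not something taken from SETI@home): a kernel can run no faster than the smaller of the card's peak FLOPS and its memory bandwidth multiplied by the kernel's arithmetic intensity (FLOPs per byte moved). The sketch below assumes spec-sheet bandwidths (192.4 GB/s for the GTX 580, 256 GB/s for the 8 GB RX 580) and purely hypothetical intensities; the real intensity of the SETI@home kernels is not given in this thread.

```python
def attainable_gflops(peak_gflops: float, bandwidth_gbs: float, intensity: float) -> float:
    """Roofline bound: the lower of the compute roof and the memory roof.

    intensity is FLOPs performed per byte moved to/from device memory.
    """
    return min(peak_gflops, bandwidth_gbs * intensity)

# (peak FP32 GFLOPS, memory bandwidth GB/s), from spec sheets.
gtx580 = (1581.0, 192.4)
rx580 = (6175.0, 256.0)

for intensity in (0.5, 4.0, 50.0):  # hypothetical FLOPs per byte
    a = attainable_gflops(*gtx580, intensity)
    b = attainable_gflops(*rx580, intensity)
    print(f"intensity {intensity:>5}: GTX 580 {a:7.1f}, RX 580 {b:7.1f}, ratio {b / a:.2f}x")
```

Under these assumptions, memory-bound work sees only the bandwidth ratio (about 1.33x) and only strongly compute-bound work sees the full ~3.9x. The model also ignores occupancy, driver, and API overheads entirely, so real applications land below even this bound.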
Grant
Darwin NT
HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6466
Credit: 175,680,990
RAC: 50,627
United States
Message 1899197 - Posted: 4 Nov 2017, 23:06:59 UTC - in response to Message 1899176.  

[quote]
[quote]a. If more credit per hour means more numbers crunched, then how come the GTX 580s crunch more numbers than a card that is 4x faster (in terms of GFLOPS)?[/quote]

A card's GFLOPS rating is very much a theoretical number.
The software being run has to actually be able to take advantage of that potential performance.
That is why, for a given Nvidia card, the SoG application leaves the older CUDA applications way behind. And for the same card, the Linux special application leaves the SoG application way behind. And as good as that application is, it still doesn't come close to the theoretical (i.e. claimed) SP (single precision) performance of that card.

There are lies, damned lies, statistics, benchmarks, and finally claimed Floating/Integer performance numbers.
[/quote]

I believe that it was said that the original CUDA app was ~3-4% efficient. And it was basically written by Nvidia for SETI@home.

I seem to recall that the SoG app is better when running GBT tasks but the CUDA app is slightly better for Arecibo tasks?
I don't really watch my 750 Ti that closely to be sure.
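Assuming the ~3-4% figure means sustained FLOPS as a fraction of theoretical peak, efficiency is just a multiplier on the paper rating. The sketch below uses made-up efficiencies (not measurements of any SETI@home application) to show how a well-tuned application on an older card can out-produce a poorly matched one on a card with about four times the paper rating.

```python
def sustained_gflops(peak_gflops: float, efficiency: float) -> float:
    """Sustained throughput, given the fraction of peak an application achieves."""
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be a fraction between 0 and 1")
    return peak_gflops * efficiency

# Peak FP32 figures from spec sheets; the efficiencies are hypothetical.
old_card = sustained_gflops(1581.0, 0.10)  # GTX 580 running a well-tuned app
new_card = sustained_gflops(6175.0, 0.02)  # RX 580 running a poorly matched app

print(f"GTX 580: {old_card:.1f} GFLOPS sustained")
print(f"RX 580:  {new_card:.1f} GFLOPS sustained")
```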
Shaggie76 (Project Donor)
Joined: 9 Oct 09
Posts: 243
Credit: 85,547,729
RAC: 225,516
Canada
Message 1899198 - Posted: 4 Nov 2017, 23:32:49 UTC

I don't know for sure, but I suspect that mining tasks use integer arithmetic while SETI@home uses floating-point math; AMD cards might offer better integer performance per watt, which could explain the miners' enthusiasm.
M_M
Joined: 20 May 04
Posts: 57
Credit: 29,839,086
RAC: 23,495
Serbia
Message 1899269 - Posted: 5 Nov 2017, 9:28:02 UTC - in response to Message 1899176.  

[quote]
The software being run has to actually be able to take advantage of that potential performance.
That is why, for a given Nvidia card, the SoG application leaves the older CUDA applications way behind. And for the same card, the Linux special application leaves the SoG application way behind.
[/quote]

So it comes down to how well the application suits a particular architecture, but mostly how well it is written to use the potential performance. In the end, it seems that currently Nvidia is generally a bit better at SETI, and AMD a bit better at coin mining?

BTW, if the Linux special app is so much more efficient, why hasn't it been ported to Windows? Is the app that reliant on the underlying OS, given that the CPU instruction set is the same and the GPU drivers are probably very similar?
Mike (Project Donor)
Volunteer tester
Joined: 17 Feb 01
Posts: 30589
Credit: 57,541,157
RAC: 30,457
Germany
Message 1899271 - Posted: 5 Nov 2017, 9:40:40 UTC - in response to Message 1899269.  

[quote]
[quote]
The software being run has to actually be able to take advantage of that potential performance.
That is why, for a given Nvidia card, the SoG application leaves the older CUDA applications way behind. And for the same card, the Linux special application leaves the SoG application way behind.
[/quote]

So it comes down to how well the application suits a particular architecture, but mostly how well it is written to use the potential performance. In the end, it seems that currently Nvidia is generally a bit better at SETI, and AMD a bit better at coin mining?

BTW, if the Linux special app is so much more efficient, why hasn't it been ported to Windows? Is the app that reliant on the underlying OS, given that the CPU instruction set is the same and the GPU drivers are probably very similar?
[/quote]


It also depends on the compiler in use.
Some instructions have to be redefined, and not every compiler produces the fastest possible code.
With each crime and every kindness we birth our future.
Dimitar Stoynev
Joined: 7 Jan 09
Posts: 19
Credit: 333,920
RAC: 2,244
Bulgaria
Message 1899287 - Posted: 5 Nov 2017, 13:06:35 UTC - in response to Message 1899198.  
Last modified: 5 Nov 2017, 13:07:01 UTC

[quote]I don't know for sure, but I suspect that mining tasks use integer arithmetic while SETI@home uses floating-point math; AMD cards might offer better integer performance per watt, which could explain the miners' enthusiasm.[/quote]


Hashing (coin mining) is mostly integer operations, while SETI and most other sci(-fi?) projects rely on floating-point arithmetic. Not all floating-point numbers are created equal; enter PRECISION. Things are well explained here:

https://arrayfire.com/explaining-fp64-performance-on-gpus/

https://steemit.com/gridcoin/@vortac/gridcoin-gpu-mining-6-obtaining-the-maximum-performance-out-of-your-gpus
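The practical difference between single (FP32) and double (FP64) precision is how many significant bits survive each operation: IEEE 754 FP32 keeps a 24-bit significand (roughly 7 decimal digits), FP64 a 53-bit one (roughly 16). This stdlib-only Python sketch emulates FP32 by rounding through `struct`'s 4-byte `'f'` format; it illustrates precision loss in general and says nothing about which precision SETI@home's kernels actually need.

```python
import struct

def to_f32(x: float) -> float:
    """Round a Python float (FP64) to the nearest IEEE 754 single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]

# 2**24 + 1 is the smallest positive integer FP32 cannot represent exactly.
print(to_f32(16_777_217.0))  # 16777216.0

# Accumulate 0.1 a hundred thousand times in each precision.
n = 100_000
step32 = to_f32(0.1)
s32, s64 = 0.0, 0.0
for _ in range(n):
    s32 = to_f32(s32 + step32)  # round after every add, as FP32 hardware would
    s64 += 0.1

print(f"FP32 sum: {s32:.6f}  error {abs(s32 - 10_000):.6f}")
print(f"FP64 sum: {s64:.6f}  error {abs(s64 - 10_000):.2e}")
```

Rounding the FP64 sum back to FP32 after each addition gives the same result as a native FP32 add, because FP64 carries more than twice FP32's significand bits.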

Code optimizations aside, it seems the most relevant metric here is FP32/FP64 performance, and strong FP64 performance is usually reserved for top professional cards. If only we knew what type of load SETI is sending us to crunch... Speaking of which, can we get a dev to shed more light on the topic?

Even with ALL that said, the GTX 580 has "anemic" (by modern standards) computing performance through and through: http://www.geeks3d.com/20140305/amd-radeon-and-nvidia-geforce-fp32-fp64-gflops-table-computing/
At the same time, the RX 580 is a good all-rounder, outmatching the GTX 580 in every aspect by a wide margin: http://www.relaxedtech.com/reviews/amd/radeon-rx-580-rx-570/1 Something's really off here...

Q: How on Earth then does the GTX get more credit?

rob smith (Project Donor)
Volunteer tester

Joined: 7 Mar 03
Posts: 15192
Credit: 251,092,871
RAC: 321,191
United Kingdom
Message 1899320 - Posted: 5 Nov 2017, 15:53:32 UTC

[quote]BTW, if the Linux special app is so much more efficient, why hasn't it been ported to Windows? Is the app that reliant on the underlying OS, given that the CPU instruction set is the same and the GPU drivers are probably very similar?[/quote]


People have been trying to get a Windows CUDA 6.0 (or later) application to work as well as the Linux one does. There are a number of hurdles in the way, mostly to do with how the drivers on the two OSes work through their respective APIs. Crudely, Windows gets in the way of the "hyper-optimisation" that Petri pioneered for CUDA, while Linux sits back and lets it happen. I'm sure someone will be along soon with all the blood-and-guts details, but those are the headlines...
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 8877
Credit: 114,934,843
RAC: 69,648
Australia
Message 1899413 - Posted: 5 Nov 2017, 22:13:48 UTC - in response to Message 1899287.  
Last modified: 5 Nov 2017, 22:15:56 UTC

[quote]Q: How on Earth then does the GTX get more credit?[/quote]

It processes more work.
As I mentioned previously, the software has to make use of what the hardware has to offer.

EDIT: Just to confuse things even more, what really matters is how long it takes to crunch the (many) different types of WUs. Unfortunately, the allocation of credit, particularly for GPU work, is rather random, with a very wide range of variability.
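The point that throughput depends on the mix of work unit types can be made concrete: a host's credit per hour is total credit over total run time, so identical cards fed different mixes report different rates. The task records below are invented for illustration (the "Arecibo" and "GBT" labels follow HAL9000's earlier post); they are not real SETI@home runtimes or credit values.

```python
from collections import defaultdict

# Hypothetical (wu_type, runtime_seconds, credit) records for one GPU.
tasks = [
    ("Arecibo", 600, 90.0),
    ("Arecibo", 660, 95.0),
    ("GBT", 420, 45.0),
    ("GBT", 400, 40.0),
]

def credit_per_hour(records):
    """Overall credit rate: total credit divided by total hours of run time."""
    total_credit = sum(credit for _, _, credit in records)
    total_hours = sum(seconds for _, seconds, _ in records) / 3600.0
    return total_credit / total_hours

def rate_by_type(records):
    """Credit per hour computed separately for each work unit type."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec[0]].append(rec)
    return {wu: credit_per_hour(recs) for wu, recs in groups.items()}

print(f"overall: {credit_per_hour(tasks):.0f} credit/hour")
for wu, rate in rate_by_type(tasks).items():
    print(f"{wu}: {rate:.0f} credit/hour")
```

With these made-up numbers, a host that happened to receive mostly GBT work would report a lower rate than the same card fed mostly Arecibo work, even with identical hardware and application, which is one reason to average over many hosts.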
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 8877
Credit: 114,934,843
RAC: 69,648
Australia
Message 1899414 - Posted: 5 Nov 2017, 22:17:55 UTC

Shaggie, would it be possible to have a second pair of graphs covering just hosts running the Linux special application, or even all Linux hosts including those with the special application?
Dimitar Stoynev
Joined: 7 Jan 09
Posts: 19
Credit: 333,920
RAC: 2,244
Bulgaria
Message 1899564 - Posted: 6 Nov 2017, 18:11:14 UTC - in response to Message 1899413.  

[quote]
[quote]Q: How on Earth then does the GTX get more credit?[/quote]

It processes more work.
As I mentioned previously, the software has to make use of what the hardware has to offer.
[/quote]


And this means the workers for AMD cards are badly under-optimized. My experience with the Lunatics app has not been so stellar thus far (pun not intended).

Two things here:
a.) If Mike's app is really fast, then how come it's not pushed upstream as the main cruncher?
b.) If the AMD workers are really not that well optimized, then how come nobody has tuned them? It's "free" performance, after all!

Anyhow, for me personally, the case is closed. I dropped AMD a few lines and I hope we hear from them soon. In the meantime, I urge everyone who's really interested in better science through performance to drop AMD a line as well.
rob smith (Project Donor)
Volunteer tester

Joined: 7 Mar 03
Posts: 15192
Credit: 251,092,871
RAC: 321,191
United Kingdom
Message 1899568 - Posted: 6 Nov 2017, 18:27:33 UTC

It's actually Petri's application.
Answering your questions:
1 - There are still some issues to be resolved that result in a very high "inconclusive" count (not "errors" or "invalids"). Work is being done to resolve this, but it is proving very stubborn. The app is currently only available for recent versions of Linux; Windows development is some months behind the Linux one.
2 - Nobody has (successfully) applied Petri's techniques to an AMD-based system. That is not to say it can't be done, just that nobody has succeeded yet. However, I suspect there are other issues that make success less likely.
AMD don't appear to be very interested in developing their hardware and drivers for the sort of use that SETI, and similar, applications require.

Good luck with your approach to AMD, I hope it does bear fruit.
Shaggie76 (Project Donor)
Joined: 9 Oct 09
Posts: 243
Credit: 85,547,729
RAC: 225,516
Canada
Message 1899570 - Posted: 6 Nov 2017, 18:38:16 UTC - in response to Message 1899414.  

Still not interested, sorry.
Dimitar Stoynev
Joined: 7 Jan 09
Posts: 19
Credit: 333,920
RAC: 2,244
Bulgaria
Message 1899577 - Posted: 6 Nov 2017, 19:07:49 UTC - in response to Message 1899568.  

Thanks, Rob!

I was somehow afraid you'd say that. I'm bamboozled that only volunteers are trying to optimize the code. After all, SETI is a project of huge significance for all of mankind. It's hard for me to believe that some AMD engineer couldn't spend a couple of hours and produce near-optimal code for crunching SETI WUs.

My box sits idle approx. 85% of the time, generating nothing more than an electricity bill and waste heat. It really hurts to see its potential wasted. On the other hand, I have always been fascinated by deep space and its secrets, and that is why I chose to donate my "firepower" (an otherwise really decently performing rig) to SETI. I could easily have put a few more GPUs in there and mined Ether, but I didn't.

I guess I'll have to go all out with AMD, hoping they will take a look at this. On a side note, I wonder why the SETI project managers and owners don't politely approach AMD and ask them to spend some time on the project.
Shaggie76 (Project Donor)
Joined: 9 Oct 09
Posts: 243
Credit: 85,547,729
RAC: 225,516
Canada
Message 1900619 - Posted: 12 Nov 2017, 0:46:20 UTC

I ran a few scans over the last two weeks and combined the data to give a picture of some rarer cards that might not otherwise have enough valid results. There aren't enough Pascal Titans yet, but this scan has some RX Vega parts in the mix (sadly, Vega does not seem well optimized for SETI by default). There aren't enough 1070 Ti hosts yet either (2 by my count).

I took a stab at hooking up data for Intel IGPs. The TDP values are probably wrong, so don't pay too much attention to the CPWH (credit per watt-hour) chart for them; I was mostly curious about throughput, and it's pretty abysmal, as you'd expect. I probably won't include those again, since the CPWH is misleading.

To try to keep the charts manageable, I've started omitting some of the older-generation cards. I assume that since you're probably looking at this data to help guide setting up machines, it's unlikely you'll be worried about vintage parts, but I could be wrong.
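Merging several scans and leaving out cards with too few hosts, as described above, might be sketched like this. The card names, host rows, TDPs, minimum-host threshold, and the use of the median are all hypothetical choices for illustration, not Shaggie76's actual scripts. CPWH here is median credit per hour divided by rated TDP in watts, which shows directly why a wrong TDP (as with the Intel IGPs) makes that column misleading.

```python
from statistics import median

# Hypothetical (card, host_id, credit_per_hour) rows from two separate scans.
scan_a = [("Card A", 1, 900.0), ("Card B", 7, 1500.0), ("Card C", 3, 1700.0)]
scan_b = [("Card A", 2, 950.0), ("Card A", 4, 880.0), ("Card B", 8, 1450.0),
          ("Card C", 5, 1650.0), ("Card C", 6, 1720.0)]

# Hypothetical rated TDPs in watts; an error here skews CPWH one-for-one.
TDP_WATTS = {"Card A": 295, "Card B": 180, "Card C": 180}
MIN_HOSTS = 3  # hypothetical cut-off for "enough hosts to chart"

def summarize(*scans):
    """Merge scans, keep one value per (card, host), and drop thinly sampled cards."""
    per_card = {}
    for scan in scans:
        for card, host, cph in scan:
            # A host seen in several scans keeps its most recent value.
            per_card.setdefault(card, {})[host] = cph
    summary = {}
    for card, hosts in per_card.items():
        if len(hosts) < MIN_HOSTS:
            continue  # e.g. Card B has only 2 hosts and is left out
        cph = median(hosts.values())
        # (credit/hour) / watts = credit per watt-hour
        summary[card] = (len(hosts), cph, cph / TDP_WATTS[card])
    return summary

for card, (n, cph, cpwh) in sorted(summarize(scan_a, scan_b).items()):
    print(f"{card}: {n} hosts, median {cph:.0f} credit/hour, {cpwh:.2f} CPWH")
```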

ML1
Volunteer tester

Joined: 25 Nov 01
Posts: 9370
Credit: 7,020,203
RAC: 5,226
United Kingdom
Message 1900630 - Posted: 12 Nov 2017, 1:23:25 UTC - in response to Message 1900619.  

Very good, thanks... Eyeing up for 'Black Friday'...

(Hope you're not being 'sponsored' ;-) ;-) )


Happy fast crunchin',
Martin
See new freedom: Mageia Linux
Take a look for yourself: Linux Format
The Future is what We all make IT (GPLv3)
©2017 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.