GPU FLOPS: Theory vs Reality
Author | Message |
---|---|
HAL9000 Joined: 11 Sep 99 Posts: 6534 Credit: 196,805,888 RAC: 57 |
Fanboy-ism and trolling aside, we all know that historically AMD cards have (almost) always had better raw computational power in the consumer market. Professional offerings are almost neck-and-neck (drivers and SW support not taken into account). I'm not sure why you would compare an Nvidia GTX 580 to a Radeon RX 580, given that the GTX 580 was released in 2010 and the RX 580 in 2017. Really you would want to compare a GeForce GTX 1070 to the Radeon RX 580, since those are both current models that came out this year and are rated for about 6 TFLOPS in single precision: the GTX 1070 is rated at 5783/6462 GFLOPS base/boost and the RX 580 at 5792/6175 GFLOPS base/boost. Some of the notable differences between the two I can think of right now are: Cost: the GTX 1070 lists for ~$400 USD; the RX 580 lists for ~$200 USD. Double-precision performance: GeForce GPUs are limited to 1/32 of SP FLOPS; Radeon GPUs are limited to 1/16 of SP FLOPS. (This is not currently relevant to SETI@home use.) Power: the GTX 1070 has a TDP of 150 W; the RX 580 is rated for 185 W. (Actual power usage varies but is often ~70% of TDP while running SETI@home.) SETI@home classic workunits: 93,865 CPU time: 863,447 hours Join the [url=http://tinyurl.com/8y46zvu]BP6/VP6 User Group[ |
Dimitar Stoynev Joined: 7 Jan 09 Posts: 19 Credit: 336,531 RAC: 0 |
According to Shaggie's leftmost chart, a GTX 580 generates more credit than the average RX 580, and so does the 770, and so on. More credit per hour means more crunching done, right? That is what I can't get. a. If more credit per hour means more numbers crunched, then how come the GTX 580s crunch more numbers than a 4x faster card (in GFLOPS terms)? b. If more credit per hour DOESN'T mean more numbers crunched, then what does it mean? |
Grant (SSSF) Joined: 19 Aug 99 Posts: 13835 Credit: 208,696,464 RAC: 304 |
"a. If more credit per hour means more numbers crunched, then how come the GTX580s crunch more numbers than a 4x faster card (in GFLOPS aspect)?" A card's GFLOPS rating is very much a theoretical number. The software being run has to actually be able to take advantage of that potential performance, which is why for a given Nvidia card the SoG application leaves the older CUDA applications way behind, and for the same card the Linux special application leaves the SoG application way behind. And as good as that application is, it still doesn't come close to the theoretical (i.e. claimed) SP (single precision) performance of that card. There are lies, damned lies, statistics, benchmarks, and finally claimed floating-point/integer performance numbers. Grant Darwin NT |
HAL9000 Joined: 11 Sep 99 Posts: 6534 Credit: 196,805,888 RAC: 57 |
"a. If more credit per hour means more numbers crunched, then how come the GTX580s crunch more numbers than a 4x faster card (in GFLOPS aspect)?" I believe it was said that the original CUDA app was ~3-4% efficient, and it was basically written by Nvidia for SETI@home. I seem to recall that the SoG app is better when running GBT tasks but the CUDA app is slightly better for Arecibo tasks? I don't really watch my 750 Ti closely enough to be sure. SETI@home classic workunits: 93,865 CPU time: 863,447 hours Join the [url=http://tinyurl.com/8y46zvu]BP6/VP6 User Group[ |
Shaggie76 Joined: 9 Oct 09 Posts: 282 Credit: 271,858,118 RAC: 196 |
I don't know for sure but I suspect that mining tasks use integer arithmetic but SETI@home uses floating-point math; AMD cards might offer better integer performance per watt which could explain the miners' enthusiasm. |
M_M Joined: 20 May 04 Posts: 76 Credit: 45,752,966 RAC: 8 |
So it is about how well the application is suited to a particular architecture, but mostly about how well it is written to use the potential performance. And in the end it seems that currently, in general, Nvidia is a bit better in SETI and AMD is a bit better in coin mining? BTW, if the Linux special app is so much more efficient, why isn't it ported to Windows? Is the app so reliant on the underlying OS, given that the CPU instruction set is the same and the GPU drivers are probably very similar? |
Mike Joined: 17 Feb 01 Posts: 34354 Credit: 79,922,639 RAC: 80 |
It also depends on the compiler in use. Some instructions have to be redefined, and not every compiler produces the fastest code possible. With each crime and every kindness we birth our future. |
Dimitar Stoynev Joined: 7 Jan 09 Posts: 19 Credit: 336,531 RAC: 0 |
"I don't know for sure but I suspect that mining tasks use integer arithmetic but SETI@home uses floating-point math; AMD cards might offer better integer performance per watt which could explain the miners' enthusiasm." Hashing (coin mining) is mostly integer operations, while SETI and most other sci(-fi?) projects rely on floating-point arithmetic. Not all floating point is created equal: enter PRECISION. Things are well explained here: https://arrayfire.com/explaining-fp64-performance-on-gpus/ https://steemit.com/gridcoin/@vortac/gridcoin-gpu-mining-6-obtaining-the-maximum-performance-out-of-your-gpus Code optimizations aside, it seems the most relevant metric here is the FP32/FP64 performance, which is usually reserved for top professional cards. If only we knew what type of loads SETI is sending us for crunching... Speaking of which, can we get a dev to shed more light on the topic? Even with ALL that has been said up to this point, the GTX 580 has "anemic" (by modern standards) computing performance through and through: http://www.geeks3d.com/20140305/amd-radeon-and-nvidia-geforce-fp32-fp64-gflops-table-computing/ At the same time, the RX 580 is a good all-rounder, outmatching the GTX in every aspect by orders of magnitude: http://www.relaxedtech.com/reviews/amd/radeon-rx-580-rx-570/1 Something's really off here... Q: How on Earth then does the GTX get more credit? |
rob smith Joined: 7 Mar 03 Posts: 22461 Credit: 416,307,556 RAC: 380 |
"BTW if Linux special app is so much more efficient, why it isn't ported to Windows app? Is app so much reliant to underlying OS, since CPU instruction set is the same and GPU drivers are probably very similar?" People have been trying to get a Windows CUDA 6.0 (or greater) application to work as well as the Linux one does. There are a number of hurdles in the way, mostly to do with the way the two O/S drivers work through their respective APIs. Crudely, Windows gets in the way of the "hyper optimisation" that Petri pioneered for CUDA, while Linux sits back and lets it happen. I'm sure someone will be along soon with all the blood and guts details, but those are the headlines.... Bob Smith Member of Seti PIPPS (Pluto is a Planet Protest Society) Somewhere in the (un)known Universe? |
Grant (SSSF) Joined: 19 Aug 99 Posts: 13835 Credit: 208,696,464 RAC: 304 |
"Q: How on Earth then does the GTX get more credit?" It processes more work. As I mentioned previously, the software has to make use of what the hardware has to offer. EDIT- just to confuse things even more- what really matters is how long it takes to crunch the (many) different types of WUs. Unfortunately the allocation of credit, particularly for GPU work, is rather random, with a very wide range of variability. Grant Darwin NT |
Grant (SSSF) Joined: 19 Aug 99 Posts: 13835 Credit: 208,696,464 RAC: 304 |
Shaggie, would it be possible to have a 2nd pair of graphs that look at just Linux Special application hosts, or even just Linux hosts including those with the Special Application? Grant Darwin NT |
Dimitar Stoynev Joined: 7 Jan 09 Posts: 19 Credit: 336,531 RAC: 0 |
"Q: How on Earth then does the GTX get more credit?" And this means the workers for AMD cards are very under-optimized. My experience with Lunatics' app has not been so stellar thus far (pun not intended). Two things here: a.) If Mike's app is really fast, then how come it's not pushed upstream for the main cruncher? b.) If the AMD workers are really not so well optimized, then how come nobody has tuned them? It's "free" performance after all! Anyhow, for me personally, the case is closed. I dropped AMD a few lines and I hope we hear from them soon. In the meantime, I urge everyone who's really interested in better science through performance to drop AMD a line as well. |
rob smith Joined: 7 Mar 03 Posts: 22461 Credit: 416,307,556 RAC: 380 |
It's actually Petri's application. Answering your questions: 1 - There are still some issues to be resolved that are resulting in a very high "inconclusive" count (not "errors" or "invalids"). Work is being done to resolve this, but it is being very stubborn. It is currently only available to run on recent versions of Linux; the Windows development is some months behind the Linux one. 2 - Nobody has (successfully) applied the same techniques as Petri has to an AMD-based system. That is not to say it can't be done, just that nobody has succeeded yet. However, I suspect there are other issues around that make its success less likely. AMD don't appear to be very interested in developing their hardware and drivers for the sort of use that the SETI, and similar, applications require. Good luck with your approach to AMD, I hope it does bear fruit. Bob Smith Member of Seti PIPPS (Pluto is a Planet Protest Society) Somewhere in the (un)known Universe? |
Shaggie76 Joined: 9 Oct 09 Posts: 282 Credit: 271,858,118 RAC: 196 |
Still not interested, sorry. |
Dimitar Stoynev Joined: 7 Jan 09 Posts: 19 Credit: 336,531 RAC: 0 |
Thanks, Rob! I was somehow afraid that you'd say that. I'm bamboozled that only volunteers are trying to optimize the code. After all, SETI is a project with huge significance for all of mankind. It's hard for me to believe that some AMD engineer can't spend a couple of hours and produce nearly optimal code for crunching SETI WUs. My box is sitting idle approx. 85% of the time, generating nothing more than an electricity bill and waste heat. It really hurts me to see its potential wasted. On the other hand, I have always been fascinated by deep space and its secrets, and that is why I chose to donate my "firepower" (an otherwise really decently performing rig) to SETI. I could have easily put a few more GPUs in there and mined Ether, but I didn't. I guess I'll have to go all out on AMD, hoping they will take a look at this. On a side note, I wonder why the SETI project managers and owners don't politely go to AMD and ask them to spend some time on the project. |
Shaggie76 Joined: 9 Oct 09 Posts: 282 Credit: 271,858,118 RAC: 196 |
I ran a few scans over the last two weeks and combined the data to give a picture of some rarer cards that might not otherwise have enough valid results. There aren't enough Pascal Titans yet, but this scan has some RX Vega parts in the mix (sadly it does not seem well optimized for SETI by default). There aren't enough 1070 Ti hosts yet either (2 by my count). I took a stab at hooking up data for Intel IGPs -- the TDP values are probably wrong, so don't pay too much attention to the CPWH (credit per watt-hour) chart for them -- I was mostly curious about throughput, and it's pretty abysmal, as you'd expect. I probably won't include those again since the CPWH is misleading. To try to keep the charts manageable I've started omitting some of the older-generation cards; I assume that since you're probably looking at this data to help guide setting up machines, it's unlikely that you'll be worried about vintage parts, but I could be wrong. |
ML1 Joined: 25 Nov 01 Posts: 21019 Credit: 7,508,002 RAC: 20 |
Very good thanks... Eying up for 'Black Friday'... (Hope you're not being 'sponsored' ;-) ;-) ) Happy fast crunchin', Martin See new freedom: Mageia Linux Take a look for yourself: Linux Format The Future is what We all make IT (GPLv3) |
Shaggie76 Joined: 9 Oct 09 Posts: 282 Credit: 271,858,118 RAC: 196 |
I ran another scan today and there still aren't enough 1070 Ti's in circulation yet. |
Keith Myers Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
I did my part and put in my GTX 1070 Ti 3 weeks ago. I'm surprised that there are still so few users of the Ti. It is still $100 cheaper than the cheapest 1080. I think even the Black Friday deals couldn't approach the price of the 1070 Ti. Seti@Home classic workunits: 20,676 CPU time: 74,226 hours A proud member of the OFA (Old Farts Association) |
EdwardPF Joined: 26 Jul 99 Posts: 389 Credit: 236,772,605 RAC: 374 |
Folks over at GPUGrid are reporting 1070 Ti GPUs ... Ed F |
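
For anyone wondering where the "rated" GFLOPS figures quoted early in the thread come from, here is a rough sketch of the usual arithmetic: shader count times clock times two operations per clock (one fused multiply-add). The shader counts and clocks below are assumptions taken from public spec sheets, not from this thread; the 1/32 and 1/16 double-precision ratios, the ~70% of TDP power estimate, and the ~3-4% efficiency figure are the ones posters mentioned above, used here purely for illustration. The names `CARDS`, `peak_sp_gflops`, `summarize`, and `delivered_gflops` are made up for this example.

```python
# Rough sketch: theoretical peak GFLOPS vs. what an application might deliver.
# Assumption: each shader core retires one fused multiply-add (2 FLOPs) per clock.
FLOPS_PER_CORE_PER_CLOCK = 2

# Core counts and clocks are assumed from public spec sheets (not from the thread).
# TDP and DP ratios are the values quoted in the thread.
CARDS = {
    "GTX 1070": {"cores": 1920, "base_mhz": 1506, "boost_mhz": 1683,
                 "tdp_w": 150, "dp_ratio": 1 / 32},
    "RX 580": {"cores": 2304, "base_mhz": 1257, "boost_mhz": 1340,
               "tdp_w": 185, "dp_ratio": 1 / 16},
}


def peak_sp_gflops(cores, clock_mhz):
    """Theoretical single-precision peak in GFLOPS."""
    return cores * clock_mhz * FLOPS_PER_CORE_PER_CLOCK / 1000.0


def delivered_gflops(peak_gflops, efficiency):
    """What an application actually achieves at a given efficiency (0..1)."""
    return peak_gflops * efficiency


def summarize(card):
    """Derive the headline numbers for one card entry."""
    base = peak_sp_gflops(card["cores"], card["base_mhz"])
    boost = peak_sp_gflops(card["cores"], card["boost_mhz"])
    return {
        "sp_base": base,
        "sp_boost": boost,
        "dp_boost": boost * card["dp_ratio"],   # double precision at boost clock
        "est_power_w": 0.70 * card["tdp_w"],    # "~70% of TDP" rule of thumb
    }


for name, card in CARDS.items():
    s = summarize(card)
    print(f"{name}: SP {s['sp_base']:.0f}/{s['sp_boost']:.0f} GFLOPS base/boost, "
          f"DP ~{s['dp_boost']:.0f} GFLOPS, est. draw ~{s['est_power_w']:.0f} W")

# Illustration only: a ~6 TFLOPS card running an app that is 3.5% efficient.
print(f"Delivered at 3.5% efficiency: {delivered_gflops(6000, 0.035):.0f} GFLOPS")
```

The last line is the point of the whole thread: the peak figure only says what the silicon could do, and two cards with near-identical ratings can end up far apart once the application's efficiency on each architecture is factored in.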
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.