GPU FLOPS: Theory vs Reality

Shaggie76
Project Donor
Joined: 9 Oct 09
Posts: 243
Credit: 85,547,729
RAC: 225,516
Canada
Message 1800336 - Posted: 3 Jul 2016, 14:01:48 UTC

I'm trying to collect data to make the best computation/power-usage choices possible for upgrading my modest farm. I was hoping to get some help to fill in the blanks.

Here's my observed / theoretical performance for my cards on SETI@home tasks:

  • 980 Ti ~1000GF / 5632GF (18%)
  • 780 ~650GF / 3977GF (16%)
  • 960 ~385GF / 2308GF (17%)


The theoretical FLOPS figures are from the Wikipedia entries for the GeForce 700 and 900 series parts, and I compared them to the observed FLOPS in a bunch of my completed work units.
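A small sketch of the arithmetic behind the percentages above, using the observed and theoretical GFLOPS figures from this post (the `cards` dictionary and `efficiency` function are just illustrative names):

```python
# Observed and theoretical single-precision throughput in GFLOPS,
# copied from the figures listed in the post above.
cards = {
    "GTX 980 Ti": (1000, 5632),
    "GTX 780": (650, 3977),
    "GTX 960": (385, 2308),
}


def efficiency(observed_gflops: float, theoretical_gflops: float) -> float:
    """Fraction of the theoretical peak that the application actually achieves."""
    return observed_gflops / theoretical_gflops


for name, (observed, theoretical) in cards.items():
    # The ":.0%" format multiplies by 100 and rounds to a whole percent.
    print(f"{name}: {efficiency(observed, theoretical):.0%}")
```

This reproduces the 18%, 16% and 17% figures.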

I trawled through recent stats submitted by other people and found one for a GeForce 1080 that suggests the ratio is much higher for those parts: ~2400GF / 8873GF (27%). Could it really be that a 1080 can crunch more than twice as many tasks as a 980 Ti? This seems unlikely to me.

If you have a single GeForce 1080 crunching SETI tasks and have more data-points to share, I'd really appreciate getting more numbers.

I was also quite excited by the news of AMD's RX 480 because it's a relatively low-power part and priced at a point that makes fitting 2 or 3 of them in a PC cheaper than a single high-end part of lower theoretical performance.

There's just one problem: the theoretical FLOPS figures on Wikipedia are evidently calculated differently for AMD parts than for Nvidia parts.

So again, if you have a single RX 480 crunching SETI tasks, I'd love to see your work-unit numbers.

ID: 1800336
HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6466
Credit: 175,680,382
RAC: 50,578
United States
Message 1800351 - Posted: 3 Jul 2016, 14:30:29 UTC - in response to Message 1800336.  
Last modified: 3 Jul 2016, 14:33:17 UTC

I'm not sure using Device peak FLOPS in a task result is the best way to determine the application efficiency. Taking the value displayed in Flopcounter: and dividing it by the number of seconds the task took might give a more accurate value to compare to the manufacturer's max theoretical FLOPS. If you are running multiple tasks per GPU, you would also want to correct for that.
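A sketch of the measurement suggested above. The flop count would be the number shown after Flopcounter: in a task's output, and the elapsed time its run time; the function name and the sample numbers are hypothetical. The `tasks_per_gpu` factor is one way to apply the multiple-tasks correction, assuming the tasks shared the GPU for their whole run.

```python
def observed_gflops(flop_count: float, elapsed_seconds: float,
                    tasks_per_gpu: int = 1) -> float:
    """Estimate whole-GPU throughput from a single task's flop count.

    flop_count / elapsed_seconds is the rate achieved by this one task.
    If tasks_per_gpu similar tasks were running side by side the whole
    time, the card as a whole was doing roughly that many times as much work.
    """
    per_task_flops = flop_count / elapsed_seconds
    return per_task_flops * tasks_per_gpu / 1e9


# Hypothetical task: 3.0e14 counted FLOPs in 600 s of run time.
print(observed_gflops(3.0e14, 600))                   # 500.0
print(observed_gflops(3.0e14, 600, tasks_per_gpu=2))  # 1000.0
```

The result can then be divided by the card's theoretical GFLOPS to get an efficiency figure.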

I don't see anything in the Radeon 400 series wiki page indicating they use a different way to calculate single-precision FLOPS. For many years, (Shaders*2)*clock has been used for both Nvidia and Radeon GPUs.
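A minimal sketch of the (Shaders*2)*clock rule of thumb. The shader counts and base clocks below are assumptions taken from Nvidia's published specifications rather than from this thread; with them, the sketch reproduces the theoretical figures quoted in the first post.

```python
def peak_sp_gflops(shaders: int, clock_mhz: float) -> float:
    """Theoretical single-precision peak via the (Shaders*2)*clock rule.

    shaders * 2 FLOPs per clock * clock in MHz gives MFLOPS;
    dividing by 1000 converts to GFLOPS.
    """
    return shaders * 2 * clock_mhz / 1000.0


# (shader count, base clock in MHz); assumed from published spec sheets.
specs = {
    "GTX 980 Ti": (2816, 1000),
    "GTX 780": (2304, 863),
    "GTX 960": (1024, 1127),
}

for name, (shaders, clock_mhz) in specs.items():
    print(f"{name}: {peak_sp_gflops(shaders, clock_mhz):.0f} GFLOPS")
```

This prints 5632, 3977 and 2308 GFLOPS respectively. Swapping in a boost clock instead of the base clock would give a higher "theoretical" number, so it matters which clock a given table uses.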
SETI@home classic workunits: 93,865 CPU time: 863,447 hours
Join the BP6/VP6 User Group today!
ID: 1800351
Shaggie76
Project Donor
Joined: 9 Oct 09
Posts: 243
Credit: 85,547,729
RAC: 225,516
Canada
Message 1800361 - Posted: 3 Jul 2016, 15:15:04 UTC - in response to Message 1800351.  

I'm not sure using Device peak FLOPS in a task result is the best way to determine the application efficiency. Taking the value displayed in Flopcounter: and dividing it by the number of seconds the task took might give a more accurate value to compare to the manufacturer's max theoretical FLOPS.


Ok -- so how about if I took the credit for a task and divided by the run-time? I can go scrape that together and see. That's probably closer to what I want anyway -- Credit/kWh.
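A minimal sketch of that credit-per-run-time calculation over a batch of completed tasks. The `TaskResult` type and the sample values are hypothetical; real values would come from the host's task list. Dividing total credit by total run time weights each task by how long it ran, instead of averaging per-task rates, and it assumes only one task at a time on the GPU.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class TaskResult:
    credit: float      # credit granted for the task
    run_time_s: float  # run time in seconds


def credit_per_hour(tasks: list[TaskResult]) -> float:
    """Total credit divided by total run time, in credits per hour."""
    total_credit = sum(t.credit for t in tasks)
    total_hours = sum(t.run_time_s for t in tasks) / 3600.0
    return total_credit / total_hours


# Hypothetical tasks: two half-hour runs worth 100 and 50 credits.
tasks = [TaskResult(100.0, 1800.0), TaskResult(50.0, 1800.0)]
print(credit_per_hour(tasks))  # 150.0
```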

If you are running multiple tasks per GPU you would also want to correct for that.


How would I tell if this were happening? BOINC shows at most one task active on my GPU at any given time. This was also why I was hoping to get single-GPU task stats from other people to form a basis of comparison.

I don't see anything in the Radeon 400 series wiki page indicating they use a different way to calculate single-precision FLOPS. For many years, (Shaders*2)*clock has been used for both Nvidia and Radeon GPUs.


The Nvidia page says "Single precision performance is calculated as 2 times the number of shaders multiplied by the base core clock speed" but the AMD page says "Single precision performance is calculated as based on a FMA operation" which is probably a single-cycle instruction. Same thing? I don't know -- but that's beside the point really: I want work-unit stats so I can work out throughput/electricity estimates!
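The two descriptions should give the same number under the usual convention that one fused multiply-add (a*b + c) counts as two floating-point operations, a multiply and an add. A sketch assuming each shader issues one FMA per clock; the function names are illustrative, and the RX 480 inputs (2304 shaders at 1120 MHz) are the figures posted later in this thread.

```python
FLOPS_PER_FMA = 2  # a fused multiply-add is counted as one multiply + one add


def peak_from_fma(shaders: int, clock_hz: float, fma_per_clock: int = 1) -> float:
    """Peak FLOPS if every shader issues fma_per_clock FMAs each cycle."""
    return shaders * fma_per_clock * FLOPS_PER_FMA * clock_hz


def peak_from_rule_of_thumb(shaders: int, clock_hz: float) -> float:
    """The (Shaders*2)*clock formula."""
    return shaders * 2 * clock_hz


rx480 = peak_from_fma(2304, 1.12e9)
# With one FMA per shader per clock, the two definitions agree.
assert rx480 == peak_from_rule_of_thumb(2304, 1.12e9)
print(f"RX 480: {rx480 / 1e12:.2f} TFLOPS")  # RX 480: 5.16 TFLOPS
```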
ID: 1800361
Kiska
Volunteer tester
Joined: 31 Mar 12
Posts: 240
Credit: 1,509,018
RAC: 104
Australia
Message 1800393 - Posted: 3 Jul 2016, 16:24:53 UTC - in response to Message 1800361.  
Last modified: 3 Jul 2016, 16:25:15 UTC

OK, I'll answer your question on credit per kWh: you really can't determine that, because credit is assigned differently depending on what hardware your wingmen have.
Running multiple WUs concurrently is done through the anonymous platform, possibly using the Lunatics app package.
The current correct formula is:
(Shaders*2)*clock speed
i.e. RX 480: (2304*2)*1120000000 Hz = 5.16096e12 FLOPS ≈ 5.16 TFLOPS
ID: 1800393
Al
Project Donor
Joined: 3 Apr 99
Posts: 1387
Credit: 258,079,977
RAC: 457,125
United States
Message 1800397 - Posted: 3 Jul 2016, 16:32:58 UTC - in response to Message 1800336.  


If you have a single GeForce 1080 crunching SETI tasks and have more data-points to share, I'd really appreciate getting more numbers.


Check out the bottom of this thread: I just posted a wealth of data from my 1080 FTW card, which I have been running for a week now. I hope it is helpful to you.

ID: 1800397
Shaggie76
Project Donor
Joined: 9 Oct 09
Posts: 243
Credit: 85,547,729
RAC: 225,516
Canada
Message 1800407 - Posted: 3 Jul 2016, 16:46:38 UTC

Thanks, Al. I think I'm going to try writing a Perl script to generate CR/W, and I'll see if I can aim it at your stats; it looks like just the data point I'm after!
ID: 1800407
Al
Project Donor
Joined: 3 Apr 99
Posts: 1387
Credit: 258,079,977
RAC: 457,125
United States
Message 1800417 - Posted: 3 Jul 2016, 17:09:08 UTC - in response to Message 1800407.  

If you would like to PM me your email addy, I will just send you the text file and save you all the work. :-)

ID: 1800417
Shaggie76
Project Donor
Joined: 9 Oct 09
Posts: 243
Credit: 85,547,729
RAC: 225,516
Canada
Message 1800506 - Posted: 3 Jul 2016, 23:10:18 UTC
Last modified: 3 Jul 2016, 23:30:33 UTC

I wrote a quick script for aggregating stats results. It's not fancy but it gave me some perplexing data:

CPUs


GPUs


  • GeForce GTX 980 Ti ~ 699 cr/h
  • GeForce GTX 780 ~ 607 cr/h
  • GeForce GTX 960 ~ 410 cr/h


I ran Al's machine through the script:



There are some obvious irregularities here:


  • Is my Core2 Quad actually more productive than the hex-core i7-970?
  • Is Al's 1080 less than half as productive as my 980 Ti?



I must be missing something.

Edit: I forgot about the i7-970's hyperthreading and corrected the stats. That leaves only the mystery of the GTX 1080.

ID: 1800506
Brent Norman
Volunteer tester
Joined: 1 Dec 99
Posts: 1821
Credit: 105,505,602
RAC: 449,638
Canada
Message 1800510 - Posted: 3 Jul 2016, 23:47:59 UTC - in response to Message 1800506.  

If you are using users' stats pages to calculate run times, they are wrong.

You have no idea how many tasks an individual is running simultaneously, which greatly affects the times shown.
ID: 1800510
Shaggie76
Project Donor
Joined: 9 Oct 09
Posts: 243
Credit: 85,547,729
RAC: 225,516
Canada
Message 1800514 - Posted: 4 Jul 2016, 0:10:19 UTC - in response to Message 1800510.  

If you are using users' stats pages to calculate run times, they are wrong.

You have no idea how many tasks an individual is running simultaneously, which greatly affects the times shown.

Ok, well, I know how many tasks I'm running (one GPU task at a time), and it looks like Al is running 4 at once for some reason, so I can account for that (so maybe 1028 cr/hr for his GTX 1080, which is consistent).
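The correction amounts to multiplying the per-task credit rate by the number of tasks running together. A sketch, assuming all four tasks ran side by side the whole time; the function name is hypothetical, and the 257 cr/h per-task input is just 1028 / 4 back-calculated for illustration, not a figure taken from Al's stats.

```python
def per_gpu_credit_rate(per_task_cr_per_hour: float, concurrent_tasks: int) -> float:
    """Credit rate of the whole card when several tasks share it.

    Each task's rate is its credit divided by its own run time. If
    concurrent_tasks tasks run side by side continuously, the card earns
    the sum of their rates, i.e. about concurrent_tasks times one task's rate.
    """
    if concurrent_tasks < 1:
        raise ValueError("concurrent_tasks must be at least 1")
    return per_task_cr_per_hour * concurrent_tasks


print(per_gpu_credit_rate(257, 4))  # 1028
```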

This is why I'm looking for people with other hardware to help me build a picture here -- if anyone is running the default of one task at once on a 1080, RX 480 or R9 Nano I'd love to analyze your stats!
ID: 1800514
Shaggie76
Project Donor
Joined: 9 Oct 09
Posts: 243
Credit: 85,547,729
RAC: 225,516
Canada
Message 1800521 - Posted: 4 Jul 2016, 0:54:02 UTC

I scrolled through the top 250 hosts to fish out host IDs for single-GPU systems and ran them through my script; the results were remarkably consistent!



Sadly, all the AMD GPUs were identified only by their chip code name (Hawaii, Fiji, etc.), so I couldn't cross-reference the observed results with a specific video card. It would be super helpful if people could call out their single-GPU hosts along with the model of their card so I can fill in the blanks.
ID: 1800521
Admiral Gloval
Joined: 31 Mar 13
Posts: 9634
Credit: 5,114,600
RAC: 135
United States
Message 1800544 - Posted: 4 Jul 2016, 2:21:30 UTC

I have an AMD Radeon R7 260X GPU (Bonaire). Feel free to check my results.

ID: 1800544
Shaggie76
Project Donor
Joined: 9 Oct 09
Posts: 243
Credit: 85,547,729
RAC: 225,516
Canada
Message 1800548 - Posted: 4 Jul 2016, 2:28:00 UTC - in response to Message 1800544.  

I have an AMD Radeon R7 260X GPU (Bonaire). Feel free to check my results.


Thanks! It looks like you're getting about 260 CR/h which seems low. How many GPU tasks do you have running at once?
ID: 1800548
Admiral Gloval
Joined: 31 Mar 13
Posts: 9634
Credit: 5,114,600
RAC: 135
United States
Message 1800550 - Posted: 4 Jul 2016, 2:33:51 UTC - in response to Message 1800548.  
Last modified: 4 Jul 2016, 2:41:17 UTC

I have an AMD Radeon R7 260X GPU (Bonaire). Feel free to check my results.


Thanks! It looks like you're getting about 260 CR/h which seems low. How many GPU tasks do you have running at once?

One WU on the GPU.
Three on the CPU.



ID: 1800550
RueiKe
Project Donor
Volunteer tester
Joined: 14 Feb 16
Posts: 270
Credit: 103,645,519
RAC: 232,860
Taiwan
Message 1800558 - Posted: 4 Jul 2016, 3:31:18 UTC - in response to Message 1800514.  

My main system has four Nanos, each running only one task at a time. I did have a system crash and issues with Crimson 16.6.1, so I only got it stable again last night. I'm really looking forward to seeing how the Nano compares. I chose them since HBM should give a power advantage.
YouTube Channel: Rick's Performance Computing
ID: 1800558
Al
Project Donor
Joined: 3 Apr 99
Posts: 1387
Credit: 258,079,977
RAC: 457,125
United States
Message 1800575 - Posted: 4 Jul 2016, 4:56:45 UTC - in response to Message 1800521.  

Shaggie, just for complete accuracy, you may want to add FTW to the 1080 in your chart. It isn't a massive difference, but it should probably be noted, as it is a factory-overclocked version, which I have then bumped up even higher. I am still amazed at how cool it runs compared to my 980 Ti; it's night and day. I wonder if I have a bit more headroom to overclock it some more? I thought heat was the limiting factor when overclocking, and if so, it appears there is more room to run. But I don't feel the need to go nuts with it; I would like it to give me long-term, reliable crunching service. Now that these are starting to appear out in the real world, I'll have to look around a little to see what others' experiences are and compare notes.

ID: 1800575
Shaggie76
Project Donor
Joined: 9 Oct 09
Posts: 243
Credit: 85,547,729
RAC: 225,516
Canada
Message 1800639 - Posted: 4 Jul 2016, 11:51:50 UTC - in response to Message 1800558.  

My main system has four Nanos, each running only one task at a time. I did have a system crash and issues with Crimson 16.6.1, so I only got it stable again last night. I'm really looking forward to seeing how the Nano compares. I chose them since HBM should give a power advantage.


Thanks! The script puts that at about 1000 cr/hr for each card which is pretty amazing!
ID: 1800639
RueiKe
Project Donor
Volunteer tester
Joined: 14 Feb 16
Posts: 270
Credit: 103,645,519
RAC: 232,860
Taiwan
Message 1800652 - Posted: 4 Jul 2016, 13:31:00 UTC - in response to Message 1800639.  

And it has a TDP of only 175 W! It will be thermally throttled with the original cooling solution, so they have to be fitted with waterblocks.
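Combining that TDP with the ~1000 cr/hr per card from the script gives a rough version of the Credit/kWh figure discussed earlier in the thread. A sketch under loud assumptions: TDP is a design rating, not measured power draw, and this counts the card alone, not the rest of the system; `credit_per_kwh` is an illustrative name.

```python
def credit_per_kwh(credit_per_hour: float, watts: float) -> float:
    """Credits earned per kilowatt-hour at a constant power draw.

    Running for one hour at `watts` watts uses watts / 1000 kWh.
    """
    return credit_per_hour / (watts / 1000.0)


# ~1000 cr/h per R9 Nano at its 175 W TDP (TDP used as a stand-in for power draw).
print(f"R9 Nano: ~{credit_per_kwh(1000, 175):.0f} credits/kWh")
```

A wall-socket meter reading would be a better input than TDP if one is available.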
YouTube Channel: Rick's Performance Computing
ID: 1800652
Shaggie76
Project Donor
Joined: 9 Oct 09
Posts: 243
Credit: 85,547,729
RAC: 225,516
Canada
Message 1800671 - Posted: 4 Jul 2016, 14:39:30 UTC - in response to Message 1800652.  

Yeah I think I actually found your video about that before you'd mentioned -- it's pretty awesome but I'm kind of leery about doing that much surgery myself. I'm surprised nobody is selling R9 Nanos with waterblocks pre-installed like the ASUS ROG Poseidons.
ID: 1800671
jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7471
Credit: 91,022,221
RAC: 11,305
Australia
Message 1800674 - Posted: 4 Jul 2016, 14:43:58 UTC - in response to Message 1800671.  

Yeah I think I actually found your video about that before you'd mentioned -- it's pretty awesome but I'm kind of leery about doing that much surgery myself. I'm surprised nobody is selling R9 Nanos with waterblocks pre-installed like the ASUS ROG Poseidons.


Isn't that basically what a Fury X is? (I don't actually know; that's just what I thought it was...)
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 1800674


 
©2017 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.