amd gflops: theoretical vs. "real world"

Message boards : Number crunching : amd gflops: theoretical vs. "real world"
merle van osdol

Joined: 23 Oct 02
Posts: 809
Credit: 1,980,117
RAC: 0
United States
Message 1738822 - Posted: 1 Nov 2015, 16:35:24 UTC
Last modified: 1 Nov 2015, 16:38:26 UTC

Wikipedia has a page for the AMD R9 200 series cards (it includes the R7). There you will find a comparison of each card's speed in single and double precision GFLOPS (theoretical).

I remember someone last year, perhaps Hal, saying that there is a big difference between theoretical and actual GFLOPS. Where can I find the actual GFLOPS figures? I mean something like a nicely tabulated table that lets you make an intelligent decision on which card to get. I suppose it also depends on the manufacturer, e.g. Sapphire, Gigabyte, etc. Is there such a place? I usually buy Sapphire, but I have one machine (bought used) with a Gigabyte card, and I try to keep the cards in each machine from the same manufacturer. When I first started crunching I was buying XFX.

The reason I ask is because I have an r7 265 that is just as fast as my r9 270x.
I don't recall the mfg.


Thanks
merle - vote yes for freedom of speech
ID: 1738822
ML1
Volunteer moderator
Volunteer tester
Joined: 25 Nov 01
Posts: 20289
Credit: 7,508,002
RAC: 20
United Kingdom
Message 1738837 - Posted: 1 Nov 2015, 17:55:59 UTC - in response to Message 1738822.  

That depends on what applications you wish to compare for that hardware.

For example for s@h, look at what RAC others get for your card. For games, compare the games benchmarks (anything above 50fps should be more than fast enough).

And if you take a look at some of the primegrid RACs, that is likely as close as you can get to theoretical 100% performance utilisation.


Let us know what you find!

Happy fast crunchin
Martin
See new freedom: Mageia Linux
Take a look for yourself: Linux Format
The Future is what We all make IT (GPLv3)
ID: 1738837
jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1738844 - Posted: 1 Nov 2015, 18:17:20 UTC - in response to Message 1738822.  
Last modified: 1 Nov 2015, 18:19:36 UTC

Hi Merle,
Yeah, the theoretical peak GFlops ratings apply directly only in pretty controlled laboratory conditions that don't really match practical code.

For most of the computation on this project, with the tasks being mostly single precision and, on the biggest/fastest GPUs, somewhat memory bound in large parts of the computation, a reasonable starting ballpark guesstimate is around 5% (1/20th) of the theoretical peak.

For a better figure, you can look inside a task file at the wu_rsc_fpops_est (or similar, I forget the exact tag name at the moment) figure, and divide that by the elapsed time. It's not a perfect figure for a number of reasons: mainly, the estimate is generic and doesn't specifically cater to the way a given GPU application might process differently than another, and some CPU time will contaminate the reading. But it should reflect actual throughput consistently enough to extract a more precise figure than the ballpark 5% starting figure. There will also be minor variation depending on the data content and running conditions, so gathering enough runs of different tasks to build a picture of average and variance might be a better method than single runs.

In general you should see elapsed times of the least polluted runs approach an imaginary best possible time, which you can use for best case comparisons.
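The estimate described above (operations estimate divided by elapsed time, collected over several runs) can be sketched roughly as follows. This is only an illustration, not a BOINC tool: the function names are made up, the exact tag name in the task file is left open as in the post, and the run figures and the 1000 GFLOPS peak are hypothetical placeholders, not measurements from any card.

```python
from statistics import mean, pstdev


def effective_gflops(fpops_est: float, elapsed_s: float) -> float:
    """Estimated floating point operations divided by elapsed time, in GFLOPS."""
    if elapsed_s <= 0:
        raise ValueError("elapsed time must be positive")
    return fpops_est / elapsed_s / 1e9


def summarize(runs, peak_gflops: float) -> dict:
    """Average, spread, and best case over several (fpops_est, elapsed_s) runs."""
    rates = [effective_gflops(fpops, secs) for fpops, secs in runs]
    best = max(rates)  # the least polluted run approaches the best possible time
    return {
        "mean_gflops": mean(rates),
        "stdev_gflops": pstdev(rates),
        "best_gflops": best,
        "best_fraction_of_peak": best / peak_gflops,
    }


# Hypothetical values: (operations estimate from the task file, elapsed seconds)
runs = [(3.0e13, 600.0), (3.0e13, 640.0), (3.0e13, 580.0)]
summary = summarize(runs, peak_gflops=1000.0)  # hypothetical theoretical peak

for key, value in summary.items():
    print(f"{key}: {value:.3f}")
```

Comparing `best_fraction_of_peak` against the 5% ballpark gives a quick sanity check, keeping in mind the caveats above about the generic estimate and CPU time leaking into the elapsed time.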

I have some designs put away for a simpler GUI-based tool and online database submission for generic and specific tests, though that's on the shelf until after a major Cuda multibeam update and until work lets up a bit.
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 1738844
Graham Middleton
Joined: 1 Sep 00
Posts: 1520
Credit: 86,815,638
RAC: 0
United Kingdom
Message 1738865 - Posted: 1 Nov 2015, 19:25:47 UTC

I always view GFLOPS, FLOPS, etc as extensions of the old MIPS acronym for Meaningless Indicator of Processor Speed.


:-)
Happy Crunching,

Graham

ID: 1738865
merle van osdol
Joined: 23 Oct 02
Posts: 809
Credit: 1,980,117
RAC: 0
United States
Message 1739703 - Posted: 4 Nov 2015, 20:56:48 UTC - in response to Message 1738837.  

Under Tasks, the device peak GFLOPS for the 250x and 270x are both listed as 211.73. That makes no sense at all.
merle - vote yes for freedom of speech
ID: 1739703
ML1
Volunteer moderator
Volunteer tester
Joined: 25 Nov 01
Posts: 20289
Credit: 7,508,002
RAC: 20
United Kingdom
Message 1739769 - Posted: 5 Nov 2015, 0:50:06 UTC - in response to Message 1739703.  

I'm sure others (such as Jason or Ageless or others) can comment better than me for Boinc... ;-)

If you have two GPU cards in the same host, then Boinc reports only the Boinc number for the first card found. All your other GPUs are assumed to be the same.

For a mixed bunch, that could cause some confused scheduling numbers!



Happy fast crunchin
Martin
See new freedom: Mageia Linux
Take a look for yourself: Linux Format
The Future is what We all make IT (GPLv3)
ID: 1739769
jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1739843 - Posted: 5 Nov 2015, 8:35:24 UTC - in response to Message 1739769.  
Last modified: 5 Nov 2015, 8:39:18 UTC

There's that, and I also see some calls on the BOINC developer mailing lists about computing estimated peak flops on generic OpenCL devices that don't support Cuda (which has a 'relatively' consistent API). I don't have a response for them yet, but I do have some ideas for how it might be achieved (good enough for government work, at least). I'll have to sort through those ideas down the road and check whether they have already come up with something workable.
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 1739843
merle van osdol
Joined: 23 Oct 02
Posts: 809
Credit: 1,980,117
RAC: 0
United States
Message 1739879 - Posted: 5 Nov 2015, 12:29:02 UTC - in response to Message 1739843.  

One more reason for me to switch to Nvidia, I guess.
merle - vote yes for freedom of speech
ID: 1739879
ChrisD
Volunteer tester
Joined: 25 Sep 99
Posts: 158
Credit: 2,496,342
RAC: 0
Denmark
Message 1740883 - Posted: 9 Nov 2015, 17:36:46 UTC
Last modified: 9 Nov 2015, 17:53:08 UTC

Have you tried to locate a machine with just one graphics card, a 270X for example, and checked how fast that machine crunches MB WUs?

One of my crunchers has an HD7970 graphics card, and this card turns out MB WUs in 4+ or 9 mins approximately.

How many GFlops? I am not sure, but by comparing the throughput of different cards, some kind of performance table should be constructable.

Btw. Why do You want to buy an NVidia Card? What is wrong with Your ATI?? :)

ChrisD

edit:

just checked the user next to me in the list. He uses an NVidia Card.
WU 4507948519 is processed in 23 minutes and was awarded 93 credits.

my Tahiti cruncher processed WU 4508938169 in 8 mins 40 secs and was awarded 84 credits.

my Hawaii equipped cruncher processed WU 4509138835 in 7 mins and 45 secs and was awarded 99 credits.

You do the Math ;)
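A minimal sketch of that math, using only the three times and credit awards quoted above. The helper name `credits_per_hour` is made up for illustration, and credit per hour is just a rough throughput proxy, not a GFLOPS measurement. Each figure comes from a single work unit, so it says little about averages.

```python
def credits_per_hour(credits: float, minutes: float, seconds: float = 0.0) -> float:
    """Credit awarded for one work unit, scaled to one hour of run time."""
    elapsed_s = minutes * 60.0 + seconds
    if elapsed_s <= 0:
        raise ValueError("elapsed time must be positive")
    return credits * 3600.0 / elapsed_s


# Figures as quoted in the post above (one work unit each)
results = {
    "NVidia card (WU 4507948519)": credits_per_hour(93, 23),
    "Tahiti (WU 4508938169)": credits_per_hour(84, 8, 40),
    "Hawaii (WU 4509138835)": credits_per_hour(99, 7, 45),
}

# Print fastest first
for name, rate in sorted(results.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {rate:.0f} credits/hour")
```

On these three samples that works out to roughly 766 credits/hour for the Hawaii card, 582 for the Tahiti card, and 243 for the NVidia card. The model of the NVidia card is not given, so this is not a like-for-like comparison.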
ID: 1740883
catavalon21
Joined: 2 Nov 01
Posts: 13
Credit: 7,238,152
RAC: 48
United States
Message 1746385 - Posted: 2 Dec 2015, 23:09:37 UTC - in response to Message 1740883.  

I guess I am surprised at how SETI credits do seem to be less than many other projects, though I understand it depends on what processing actually is taking place.

I have a stock HD7850, and on some (likely integer only) apps it gets more than one unit of credit per second (greater than 3600 per hour of run time). For S@H it is significantly less.

For this card running S@H V7, recent WUs have taken roughly 698 seconds to complete (run time, not cpu time), and grant 45 or so units of credit. For a GPU that is running at 90% saturation (with practically nothing else running, THOUGH it's an older Core 2 Duo 6550 @2.33 GHz), it just must not be doing something that SETI likes.

On Milkyway@home, which supposedly likes double precision, the same box generates points FAR better (more points per minute of run time) than the GTX 760 in my wife's box with an E8500.

Still stuck on SETI, but ...
ID: 1746385
betreger Project Donor
Joined: 29 Jun 99
Posts: 11361
Credit: 29,581,041
RAC: 66
United States
Message 1746405 - Posted: 3 Dec 2015, 1:21:00 UTC - in response to Message 1746385.  

SETI uses CreditNew; most other projects just award credit per task, and many of them are inflated. If you want max credit, SETI is not the place to be.
ID: 1746405



 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.