GPU FLOPS: Theory vs Reality

Message boards : Number crunching : GPU FLOPS: Theory vs Reality


Wiggo
Joined: 24 Jan 00
Posts: 34744
Credit: 261,360,520
RAC: 489
Australia
Message 1870424 - Posted: 31 May 2017, 23:58:51 UTC - in response to Message 1870420.  

Three of these?
Looks like you are getting about 22k per card. So three would give me 66k?

Wow they're a lot shorter than mine, but they'll work.

Cheers.
ID: 1870424
HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 1870430 - Posted: 1 Jun 2017, 0:18:00 UTC - in response to Message 1870411.  
Last modified: 1 Jun 2017, 0:18:32 UTC

I was going to tell him to go with the 6GB 1060 but I don't own either so take that for what it's worth, lol...

I've spoken with several people here who also run 1060's about the memory size difference, and unless you're gaming with them you'll see no difference with them here (it may be different with other projects though); in fact most of those with the 6GB versions now wish that they'd gone with the cheaper option. ;-)

Cheers.

On paper the 6GB version, with 1280 cores vs the 3GB version's 1152, has about 11% more performance, but costs about 25% more.
So in GFLOPS per $ the 3GB version wins out. In real-world performance that 11% advantage would probably be closer to 5%, too.
GPU           GFLOPS   MSRP   TDP (W)   GFLOPS/$   GFLOPS/Watt
GT 1030          942    $70     30        13.46       31.40
GTX 1050        1733   $109     75        15.90       23.11
GTX 1050 Ti     1981   $139     75        14.25       26.41
GTX 1060 3GB    3470   $199    120        17.44       28.92
GTX 1060 6GB    3855   $249    120        15.48       32.13
GTX 1070        5783   $379    150        15.26       38.55
GTX 1080        8228   $599    180        13.74       45.71
GTX 1080 Ti    10609   $699    250        15.18       42.44

Given the base specs, the 1060 3GB seems like a better overall choice than the 1050 Ti for the most efficient cruncher.
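The efficiency columns above are just the quoted peak GFLOPS divided by MSRP and by TDP. A quick sketch reproduces them for a subset of the cards (figures are taken from the table above, not measured):

```python
# Reproduce the GFLOPS/$ and GFLOPS/Watt columns from the table above.
# Tuples are (quoted peak GFLOPS, launch MSRP in USD, TDP in watts).
cards = {
    "GTX 1050 Ti":  (1981, 139, 75),
    "GTX 1060 3GB": (3470, 199, 120),
    "GTX 1060 6GB": (3855, 249, 120),
    "GTX 1080 Ti":  (10609, 699, 250),
}

for name, (gflops, msrp, tdp) in cards.items():
    # Higher is better for both ratios.
    print(f"{name:12s}  {gflops / msrp:6.2f} GFLOPS/$  {gflops / tdp:6.2f} GFLOPS/W")
```

Running it confirms the 1060 3GB tops the per-dollar column while the bigger cards win per watt.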
SETI@home classic workunits: 93,865 CPU time: 863,447 hours
Join the [url=http://tinyurl.com/8y46zvu]BP6/VP6 User Group[/url]
ID: 1870430
Carlos
Volunteer tester
Joined: 9 Jun 99
Posts: 29753
Credit: 57,275,487
RAC: 157
United States
Message 1870463 - Posted: 1 Jun 2017, 2:05:08 UTC

Cool thanks guys. That really helps.
ID: 1870463
Shaggie76
Joined: 9 Oct 09
Posts: 282
Credit: 271,858,118
RAC: 196
Canada
Message 1872353 - Posted: 11 Jun 2017, 2:20:27 UTC

It's been a while since I ran a scan and there are enough 1080 Ti's in circulation now to get a sense of how fast they are:

ID: 1872353
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13720
Credit: 208,696,464
RAC: 304
Australia
Message 1872356 - Posted: 11 Jun 2017, 2:37:05 UTC - in response to Message 1872353.  

It's been a while since I ran a scan and there are enough 1080 Ti's in circulation now to get a sense of how fast they are:

And, wow!
Imagine what a TitanXp can do.

After all this time, it's interesting to see the GTX 750 Ti still in the top section of the Credit/WH list. And it's good to see the Radeon RX 470 and 460/480s have improved AMD's Credit/WH position considerably.
Grant
Darwin NT
ID: 1872356
Stephen "Heretic"
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1872358 - Posted: 11 Jun 2017, 2:48:36 UTC - in response to Message 1870407.  

Assume that I will put $700 in to upgrading. What would give me the best credit per hour? I can only fit 3 cards into my main cruncher. Suggestions?

I'd suggest 3x 3GB 1060's, but I'm probably a bit biased there (just be sure to get proper 2-slot design jobs, as quite a few around now are 2.1-2.3 slot designs and won't fit together in 2-slot spacings). ;-)

Cheers.


. . Yep, they are very good producers. But maybe 3 of the single-slot 1050 Tis would give him very good bang for his buck!

. . Also the cooler width would be a non-event if he chose water cooling :)

Stephen

:)
ID: 1872358
Stephen "Heretic"
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1872359 - Posted: 11 Jun 2017, 2:50:59 UTC - in response to Message 1870408.  

I was going to tell him to go with the 6GB 1060 but I don't own either so take that for what it's worth, lol...


. . I have a brace of 1060-6GBs, and Wiggo's 1060-3GBs do very well by comparison. For crunching (and bang for your buck) the 3GB versions would be the better value.

Stephen

:)
ID: 1872359
Stephen "Heretic"
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1872361 - Posted: 11 Jun 2017, 2:55:04 UTC - in response to Message 1870411.  

I was going to tell him to go with the 6GB 1060 but I don't own either so take that for what it's worth, lol...

I've spoken with several people here who also run 1060's about the memory size difference, and unless you're gaming with them you'll see no difference with them here (it may be different with other projects though); in fact most of those with the 6GB versions now wish that they'd gone with the cheaper option. ;-)

Cheers.


. . It was not the memory difference that led me to the 6GB versions but the extra CU and 128 more CUDA cores. That should have made them much more productive, but in the real world there seems to be little benefit from them. So for dollar value, just for crunching, the 3GB would be the go.

Stephen

<shrug>
ID: 1872361
Stephen "Heretic"
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1872362 - Posted: 11 Jun 2017, 3:00:19 UTC - in response to Message 1870420.  
Last modified: 11 Jun 2017, 3:01:25 UTC

Three of these?
Looks like you are getting about 22k per card. So three would give me 66k?


. . That would make a pretty hefty crunching machine. But remember they draw about 80W to 90W each at full crunch, probably a bit more than double your 750 Ti's, so be sure your PSU can cope. The good thing is that, unlike many of the more powerful GPUs, they only require one external power connector each, so your current hardware can probably support them.
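As a rough sanity check on the PSU question, the budget for three such cards can be estimated like this (a sketch with illustrative figures: the 150 W rest-of-system draw and 30% headroom are assumptions, not measurements):

```python
# Rough PSU budget for a three-GPU cruncher (illustrative figures only).
gpu_draw_w = 90          # per-card draw at full crunch, upper end of the 80-90 W estimate above
num_gpus = 3
rest_of_system_w = 150   # assumed CPU, board, drives, fans; varies widely by build
headroom = 1.3           # assumed margin: run a PSU well below its rated output

total_draw_w = gpu_draw_w * num_gpus + rest_of_system_w
recommended_psu_w = total_draw_w * headroom
print(f"Estimated draw: {total_draw_w} W; PSU of roughly {recommended_psu_w:.0f} W or more")
```

With these assumptions the three cards alone pull about 270 W, so a mid-range 550 W unit would be comfortable.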

Stephen

:)
ID: 1872362
Stephen "Heretic"
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1872364 - Posted: 11 Jun 2017, 3:03:54 UTC - in response to Message 1870424.  

Three of these?
Looks like you are getting about 22k per card. So three would give me 66k?

Wow they're a lot shorter than mine, but they'll work.

Cheers.


. . The benefit(?) of a single-fan design over a twin-fan one.

Stephen

:)
ID: 1872364
Brent Norman
Volunteer tester
Joined: 1 Dec 99
Posts: 2786
Credit: 685,657,289
RAC: 835
Canada
Message 1873135 - Posted: 15 Jun 2017, 13:46:07 UTC - in response to Message 1872353.  

Hey Shaggie, is there by chance any way to pull Linux CUDA results out of your dataset for a chart?
ID: 1873135
Shaggie76
Joined: 9 Oct 09
Posts: 282
Credit: 271,858,118
RAC: 196
Canada
Message 1873154 - Posted: 15 Jun 2017, 15:20:25 UTC - in response to Message 1873135.  

Hey Shaggie, is there by chance any way to pull Linux CUDA results out of your dataset for a chart?

I'm guessing you mean Petri's special app and want to know just how much faster it is.

As I've said before, including the anonymous platform would defeat the purpose of this comparison; I deliberately filter for only the stock app running one job at a time, so that you can make meaningful comparisons and get a sense of the relative performance and power consumption of each.

The other problem with the anonymous platform is that it's not clear how many jobs are being run concurrently per card; the regular CUDA app only really performs if you double- or triple-job it, but the data I have to work with can't see the concurrency, so I can't tell whether a host is 'really slow' (because it's running concurrent jobs), really fast (because it's Petri's app), or just normal (a Lunatics build). People running stock tend not to mess around with multiple jobs, so those that do are eliminated as outliers by the median window (plus there's a clue in the output from the OpenCL app that I can sometimes use to detect when they're doubling up, so I can reject them).
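The median-window filtering described above -- keeping only results whose runtimes sit near the median, so that multi-job hosts fall out as outliers -- can be sketched roughly like this (a simplified illustration, not the actual scan script; the 50% window width is an assumed parameter):

```python
import statistics

def median_window(runtimes, tolerance=0.5):
    """Keep runtimes within +/- tolerance (as a fraction) of the median.
    Hosts running two or three jobs at once report roughly doubled or
    tripled elapsed times per task, so they land outside the window."""
    med = statistics.median(runtimes)
    lo, hi = med * (1 - tolerance), med * (1 + tolerance)
    return [t for t in runtimes if lo <= t <= hi]

# Single-job hosts cluster around 600 s; two hosts doubling up report ~1200 s.
sample = [580, 610, 595, 620, 1210, 1180, 605]
print(median_window(sample))  # the ~1200 s outliers are rejected
```

The median is robust here because the concurrent-job hosts are a minority; a plain mean would be dragged upward by them.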

I'm also opposed to encouraging what I see as basically cheating -- if Petri's app isn't accurate enough for everybody to use, then the extra credit it awards those that use it comes at an extra validation cost to those of us running stock, who have to double- (and possibly triple-) check the work that it does.

When it's part of the stock app set I'll be happy to report on the relative performance of the OpenCL SoG vs CUDA apps (as I've done before).
ID: 1873154
Brent Norman
Volunteer tester
Joined: 1 Dec 99
Posts: 2786
Credit: 685,657,289
RAC: 835
Canada
Message 1873159 - Posted: 15 Jun 2017, 15:50:58 UTC - in response to Message 1873154.  
Last modified: 15 Jun 2017, 15:56:16 UTC

It's really not as bad as one might think. Looking at your computers, you're at 3-5% inconclusive, and the 2 computers I looked through did not have a single anonymous-platform result in the list. It seems quite typical for the latest CUDA8 app to be in the 4-7% range, so it really is not far off the mark from the stock apps/Lunatics.

Heck my Astropulse is sitting at 10.8% right now, and those are very well accepted apps.

It would just be nice to see a Linux CUDA8 comparison vs OpenCL SoG. But if it is not easy, I understand.
ID: 1873159
petri33
Volunteer tester
Joined: 6 Jun 02
Posts: 1668
Credit: 623,086,772
RAC: 156
Finland
Message 1873199 - Posted: 15 Jun 2017, 20:01:55 UTC - in response to Message 1873154.  

Hey Shaggie, is there by chance any way to pull Linux CUDA results out of your dataset for a chart?

I'm guessing you mean Petri's special app and want to know just how much faster it is.

As I've said before, including the anonymous platform would defeat the purpose of this comparison; I deliberately filter for only the stock app running one job at a time, so that you can make meaningful comparisons and get a sense of the relative performance and power consumption of each.

The other problem with the anonymous platform is that it's not clear how many jobs are being run concurrently per card; the regular CUDA app only really performs if you double- or triple-job it, but the data I have to work with can't see the concurrency, so I can't tell whether a host is 'really slow' (because it's running concurrent jobs), really fast (because it's Petri's app), or just normal (a Lunatics build). People running stock tend not to mess around with multiple jobs, so those that do are eliminated as outliers by the median window (plus there's a clue in the output from the OpenCL app that I can sometimes use to detect when they're doubling up, so I can reject them).

I'm also opposed to encouraging what I see as basically cheating -- if Petri's app isn't accurate enough for everybody to use, then the extra credit it awards those that use it comes at an extra validation cost to those of us running stock, who have to double- (and possibly triple-) check the work that it does.

When it's part of the stock app set I'll be happy to report on the relative performance of the OpenCL SoG vs CUDA apps (as I've done before).


And as to the cheating... I'm doing that. I do not have 16 1080 Ti graphics cards; I have only 4: 3x 1080 + 1x 1080 Ti.
But it would still be interesting to know, since the top 10 hosts are full of Linux anonymous-platform apps, how they perform...
To overcome Heisenbergs:
"You can't always get what you want / but if you try sometimes you just might find / you get what you need." -- Rolling Stones
ID: 1873199
TBar
Volunteer tester
Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1873202 - Posted: 15 Jun 2017, 20:12:12 UTC - in response to Message 1873199.  

Hey Petri,
I was about to PM you. I was wondering if you thought it would be a good idea to submit the Linux zi3v to Beta. Along with zi3t2b, it appears to be well within the 5% inconclusive rate requested by the project. Looking over a few hosts, it would appear the gross rate is around 3.5%, with the net inconclusive rate a bit lower. So far the only problem is that zi3v uses a little more VRAM, and I'm seeing problems on my Mac again with the 2 GB card.
Any Ideas?
ID: 1873202
petri33
Volunteer tester
Joined: 6 Jun 02
Posts: 1668
Credit: 623,086,772
RAC: 156
Finland
Message 1873204 - Posted: 15 Jun 2017, 20:29:04 UTC - in response to Message 1873202.  

Hey Petri,
I was about to PM you. I was wondering if you thought it would be a good idea to submit the Linux zi3v to Beta. Along with zi3t2b, it appears to be well within the 5% inconclusive rate requested by the project. Looking over a few hosts, it would appear the gross rate is around 3.5%, with the net inconclusive rate a bit lower. So far the only problem is that zi3v uses a little more VRAM, and I'm seeing problems on my Mac again with the 2 GB card.
Any Ideas?


The t2b is kind of an original: it is my code, and it does not try to recheck the pulses.
The zi3v scans the WU and, if it finds any suspects, runs that part of the WU again with unroll 1. That idea came from jason_gee; I tried it, coded it, and that is what I'm running now. It may be more accurate and a bit slower. Just keep testing.

To tell you all,

I'd like to stay as a developer/experimenter/propeller hat/tin foil hat escapee/a man, and let the others make the political decisions. I release my code, and you can do whatever you want with it.

This is a hobby for me. I'd like to keep it that way. I was a SW/DB engineer for 20 years. Now I'm a teacher, teaching children and adults with special needs.

So TBar, it is entirely up to you. A <5% level is good enough. You decide. I'll provide updates when I feel like it.

Thank you TBar for all the testing.
p.s. I read that there is a V9 MB coming. I'll wait for that and do whatever is needed.

Petri
To overcome Heisenbergs:
"You can't always get what you want / but if you try sometimes you just might find / you get what you need." -- Rolling Stones
ID: 1873204
Shaggie76
Joined: 9 Oct 09
Posts: 282
Credit: 271,858,118
RAC: 196
Canada
Message 1873223 - Posted: 15 Jun 2017, 23:32:15 UTC - in response to Message 1873204.  

I'd like to stay as a developer/experimenter/propeller hat/tin foil hat escapee/a man, and let the others make the political decisions. I release my code, and you can do whatever you want with it.

This is totally fine (and appreciated!) -- I'm just a little vexed at people's enthusiasm for the glory of more internet points, rather than for getting your version finished and certified for the stock set by checking their inconclusives and gathering diagnostics to make it conform.
ID: 1873223
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13720
Credit: 208,696,464
RAC: 304
Australia
Message 1873260 - Posted: 16 Jun 2017, 4:09:38 UTC
Last modified: 16 Jun 2017, 4:10:48 UTC

Even if it doesn't make it as a stock application (does it run on pre-Maxwell or pre-Kepler hardware, and what are the minimum VRAM requirements?), it would be good if it were available for general use under Anonymous Platform on all OSs. But it does need to keep the Inconclusives below 5% to be made available for general use.
If the current version is good for less than 5% Inconclusives, it would be nice to see a Windows version made available for some testing, to see whether it can keep the Inconclusives below that 5% threshold across the many versions of Windows and the many versions of video drivers.
Grant
Darwin NT
ID: 1873260
jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1873759 - Posted: 18 Jun 2017, 7:11:39 UTC - in response to Message 1873260.  
Last modified: 18 Jun 2017, 7:12:28 UTC

Even if it doesn't make it as a stock application (does it run on pre-Maxwell or pre-Kepler hardware, and what are the minimum VRAM requirements?), it would be good if it were available for general use under Anonymous Platform on all OSs. But it does need to keep the Inconclusives below 5% to be made available for general use.
If the current version is good for less than 5% Inconclusives, it would be nice to see a Windows version made available for some testing, to see whether it can keep the Inconclusives below that 5% threshold across the many versions of Windows and the many versions of video drivers.


Those are the current rubs for stock, mainly BOINC server limitations on the distribution side. That's where I step in when I can. I'm confident most of the refinements can be propagated back through the generations, with varying levels of benefit. With the majority of validation concerns apparently addressed, that helps a lot. In the meantime it'll be suitable for 'Advanced User' anonymous-platform distribution, until appropriate dispatch code can be embedded to support all CUDA devices at some level. Once it does, options open up for, in no particular order: stock distribution (via Beta test first), retooling for CUDA 9 inclusion, and then incorporating some more modern feature-recognition methods.
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 1873759



 
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. Astropulse is funded in part by the NSF through grant AST-0307956.