GPU Wars 2016: GTX 1050 Ti & GTX 1050: October 25th

Message boards : Number crunching : GPU Wars 2016: GTX 1050 Ti & GTX 1050: October 25th

Al · Crowdfunding Project Donor · Special Project $75 donor · Special Project $250 donor
Joined: 3 Apr 99
Posts: 1682
Credit: 477,343,364
RAC: 482
United States
Message 1788693 - Posted: 19 May 2016, 13:51:36 UTC - in response to Message 1788691.  

That sounds like a great plan if it can be achieved. Go Man Go! ;-)

ID: 1788693
petri33
Volunteer tester
Joined: 6 Jun 02
Posts: 1668
Credit: 623,086,772
RAC: 156
Finland
Message 1788698 - Posted: 19 May 2016, 14:01:54 UTC - in response to Message 1788693.  
Last modified: 19 May 2016, 14:03:35 UTC

That sounds like a great plan if it can be achieved. Go Man Go! ;-)


I'm doing some testing and experimenting, and I think Jason will do the whole thing right.

You can get a sneak peek here. It is an AR (angle range) 0.42 task in 164 seconds. Some high-AR tasks take less than 60 seconds. The GUPPI VLARs take about 700 seconds and need some more optimizing.

Oh how I wish I could get one of those GTX 1080s....
To overcome Heisenbergs:
"You can't always get what you want / but if you try sometimes you just might find / you get what you need." -- Rolling Stones
ID: 1788698
jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1788709 - Posted: 19 May 2016, 14:28:52 UTC - in response to Message 1788698.  
Last modified: 19 May 2016, 14:31:08 UTC

That sounds like a great plan if it can be achieved. Go Man Go! ;-)


I'm doing some testing and experimenting, and I think Jason will do the whole thing right.

You can get a sneak peek here. It is an AR (angle range) 0.42 task in 164 seconds. Some high-AR tasks take less than 60 seconds. The GUPPI VLARs take about 700 seconds and need some more optimizing.

Oh how I wish I could get one of those GTX 1080s....


Yeah, watching PC Perspective's Interview with Tom Petersen (nVidia engineer) now.
Probably the extra planning now (frustrating as a wait is) will pay the best dividends.

Definitely looks like the streaming and configurability will be the way to go.

If you get a chance to watch it, there's a part where he talks about 'Fast Sync'. He explains how, for graphics, VSync on creates latency through backpressure (similar to what we see), while VSync off causes tearing; Fast Sync lets the engine run as fast as it wants while the display stays synced. Discarding frames, as Fast Sync does, isn't an option for compute, but it did give some great hints on buffer management and ways to relieve the slowdown. Probably we can just put PID controls around the launches and scale GPU load to whatever setpoint we like, before initiating a buffer swap and the CPU-side reduction. Uncoupling from the kernel-level sync, and scaling to the right frequency for the system, should reduce CPU load for the same result and add a 'free' throttle for the GPU.
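To make the "PID controls around launches" idea concrete, here is a minimal Python sketch of a textbook PID loop steering a toy GPU-load signal to a setpoint. Everything in it is an assumption for illustration: the load model (load = batch / capacity), the gains, and the names `PID` and `simulate` are hypothetical and not taken from any SETI@home application. A real build would measure load or queue depth and round the batch to whole kernel launches.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PID:
    """Textbook positional PID controller (illustrative only)."""
    kp: float
    ki: float
    kd: float
    setpoint: float
    integral: float = 0.0
    prev_error: Optional[float] = None

    def update(self, measured: float, dt: float) -> float:
        error = self.setpoint - measured
        self.integral += error * dt
        derivative = 0.0 if self.prev_error is None else (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def simulate(steps: int = 200, capacity: float = 64.0, target: float = 0.8):
    """Drive a toy 'GPU load' towards `target` by choosing the batch size.

    Hypothetical plant: load = batch / capacity, saturating at 1.0.
    """
    pid = PID(kp=16.0, ki=160.0, kd=0.0, setpoint=target)
    batch = 1.0
    for _ in range(steps):
        load = min(1.0, batch / capacity)           # "measure" the plant
        batch = max(1.0, pid.update(load, dt=0.1))  # PID output sets the next batch
    return batch, min(1.0, batch / capacity)


if __name__ == "__main__":
    batch, load = simulate()
    print(f"batch={batch:.2f} load={load:.3f}")
```

With these gains the toy loop settles at a batch of about 51.2, i.e. 80% of the assumed capacity of 64. The integral term removes the steady-state error; the proportional term speeds up the approach.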
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 1788709
W-K 666 Project Donor
Volunteer tester
Joined: 18 May 99
Posts: 19062
Credit: 40,757,560
RAC: 67
United Kingdom
Message 1788911 - Posted: 20 May 2016, 5:54:16 UTC

Nvidia have put details up on their site; you can use "Notify Me" to be told when the card of your choice is available in your area.
Nvidia 1080
Choose 1070 in products if that is the GPU you wish for.
ID: 1788911
Lionel
Joined: 25 Mar 00
Posts: 680
Credit: 563,640,304
RAC: 597
Australia
Message 1788926 - Posted: 20 May 2016, 7:07:33 UTC - in response to Message 1788911.  
Last modified: 20 May 2016, 7:09:35 UTC

Just to add to Grant's post, you can also go here to see the specs:

https://en.wikipedia.org/wiki/List_of_Nvidia_graphics_processing_units#GeForce_10_Series

The thing that concerns me with the new cards is the memory bandwidth.
ID: 1788926
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13736
Credit: 208,696,464
RAC: 304
Australia
Message 1788930 - Posted: 20 May 2016, 7:26:33 UTC - in response to Message 1788926.  
Last modified: 20 May 2016, 7:31:55 UTC

The thing that concerns me with the new cards is the memory bandwidth.

The GTX 1080 is slightly less than the GTX 980Ti (320GB/s v 336.5 GB/s) but it is considerably more than the GTX 980 (224GB/s) which is the card it is replacing.
A 43% improvement is a good thing IMHO.

EDIT- although the GTX 1070 memory bandwidth is only 14% better than the GTX 970 (256GB/s v 224GB/s).
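The percentages in this post can be checked directly from the rated bandwidth figures it quotes. A small Python sketch (the helper name `pct_gain` is just illustrative):

```python
def pct_gain(new: float, old: float) -> float:
    """Percentage improvement of `new` over `old`."""
    return (new / old - 1.0) * 100.0


if __name__ == "__main__":
    # Rated memory bandwidth in GB/s, as quoted in the post.
    print(f"GTX 1080 vs GTX 980:    {pct_gain(320.0, 224.0):+.1f}%")  # +42.9%
    print(f"GTX 1080 vs GTX 980 Ti: {pct_gain(320.0, 336.5):+.1f}%")  # -4.9%
    print(f"GTX 1070 vs GTX 970:    {pct_gain(256.0, 224.0):+.1f}%")  # +14.3%
```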
Grant
Darwin NT
ID: 1788930
Lionel
Joined: 25 Mar 00
Posts: 680
Credit: 563,640,304
RAC: 597
Australia
Message 1788931 - Posted: 20 May 2016, 7:39:34 UTC - in response to Message 1788930.  

I understand where you are coming from, but relative to the GPU's single-precision throughput (given its CUDA core count), the memory bandwidth seems somewhat choked.

The 980 Ti has 5632 GFLOPS and 336.5 GB/s; the 1080 has 8228 GFLOPS and 320 GB/s.

That's circa a 46% increase in GFLOPS (8228 / 5632 ≈ 1.46) but basically the same bandwidth.

I hope I am wrong but I get the feeling that these cards are not going to be as great as some are hoping. Mind you, it does depend on where you're coming from. By that I mean whether you may be coming up the scale or looking at a like to like type thing.
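One way to frame this concern is the roofline model's "machine balance": peak GFLOPS divided by memory bandwidth, i.e. how many floating-point operations a kernel must perform per byte fetched before it stops being bandwidth-bound. A minimal Python sketch using the base-clock figures quoted above; the 4 FLOP/byte kernel is a hypothetical example, not a measured SETI@home kernel, and boost clocks, caches and memory compression are all ignored:

```python
def machine_balance(gflops: float, gbps: float) -> float:
    """FLOPs the GPU can issue per byte of DRAM traffic (the roofline 'ridge point')."""
    return gflops / gbps


def attainable_gflops(intensity: float, gflops: float, gbps: float) -> float:
    """Simple roofline: min(peak compute, arithmetic intensity * bandwidth)."""
    return min(gflops, intensity * gbps)


if __name__ == "__main__":
    cards = {"GTX 980 Ti": (5632.0, 336.5), "GTX 1080": (8228.0, 320.0)}
    for name, (gf, bw) in cards.items():
        print(f"{name}: balance {machine_balance(gf, bw):.1f} FLOP/byte, "
              f"at 4 FLOP/byte: {attainable_gflops(4.0, gf, bw):.0f} GFLOPS")
    # GTX 980 Ti: balance 16.7 FLOP/byte, at 4 FLOP/byte: 1346 GFLOPS
    # GTX 1080: balance 25.7 FLOP/byte, at 4 FLOP/byte: 1280 GFLOPS
```

Under this simple model a kernel that does little arithmetic per byte would not speed up on the 1080 at all, while compute-heavy kernels could approach the full GFLOPS ratio.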
ID: 1788931
jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1788933 - Posted: 20 May 2016, 7:56:46 UTC - in response to Message 1788931.  

A few complications. First, from what I've seen, the rated GB/s seems to be at base memory clocks, and there seems to be quite some headroom in the boost logic (even before overclocking).

Second, I'm mostly concerned with the cache architecture, for which I've yet to see any details on sizes etc. Signs are pointing to considerably reduced latencies right through the core. That should let simple CUDA kernels leverage more bandwidth than the ~80% of theoretical that earlier generations achieve with the best non-overlapped streams, so less thrashing. We'll see.
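"Achieved versus theoretical" bandwidth is simply bytes moved divided by elapsed time, compared against the spec-sheet figure. A minimal Python sketch of that arithmetic; it times a host-memory buffer copy only as a stand-in, since a GPU test would instead time a device copy kernel (for example with CUDA events, `cudaEventRecord` / `cudaEventElapsedTime`). The function names and the 256-of-320 GB/s example are hypothetical:

```python
import time


def effective_gbps(bytes_moved: int, seconds: float) -> float:
    """Achieved bandwidth in GB/s (decimal gigabytes, as on spec sheets)."""
    return bytes_moved / seconds / 1e9


def fraction_of_peak(achieved_gbps: float, rated_gbps: float) -> float:
    """How much of the rated bandwidth a kernel actually reached."""
    return achieved_gbps / rated_gbps


def measure_host_copy_gbps(n_bytes: int = 64 * 1024 * 1024, repeats: int = 5) -> float:
    """Host-side analogue of a bandwidth test: time a buffer copy.

    A copy reads n_bytes and writes n_bytes, so 2 * n_bytes are counted.
    The best of several runs is kept to reduce timer and OS noise.
    """
    src = bytearray(n_bytes)
    dst = bytearray(n_bytes)
    best = float("inf")
    for _ in range(repeats):
        t0 = time.perf_counter()
        dst[:] = src  # same-length slice assignment copies the whole buffer
        best = min(best, time.perf_counter() - t0)
    return effective_gbps(2 * n_bytes, best)


if __name__ == "__main__":
    print(f"host copy: {measure_host_copy_gbps():.1f} GB/s")
    # Hypothetical kernel moving 256 GB/s on a card rated at 320 GB/s.
    print(f"fraction of peak: {fraction_of_peak(256.0, 320.0):.0%}")  # 80%
```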

Last, since heavy use of memory compression became a thing, a few clues in gaming benchmarks etc. suggest they've improved it considerably in Pascal. That means GB/s figures from the traditional spec formula may not be particularly accurate, depending on the data going through it.

It's going to be tough to make myself sit on my wallet for a bit.
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 1788933
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13736
Credit: 208,696,464
RAC: 304
Australia
Message 1788934 - Posted: 20 May 2016, 7:59:31 UTC - in response to Message 1788931.  

I hope I am wrong but I get the feeling that these cards are not going to be as great as some are hoping. Mind you, it does depend on where you're coming from. By that I mean whether you may be coming up the scale or looking at a like to like type thing.

Maxwell was meant to be released on 20nm but had to be re-done for release on 28nm.
Pascal was designed specifically for 16/14 nm and was able to be released on that process node. In addition, its architecture was based on the lessons learned from Maxwell, so not only does it have the benefit of the much smaller process node, but it also has architectural tweaks over Maxwell and some significant new features with regard to its programmability.

As things stand for Seti, I suspect that the main benefits will be the faster clock speeds and bandwidth, as well as its much faster context switching.
Like Maxwell, it will require an application that makes full use of its abilities to really see what the hardware is capable of.
Grant
Darwin NT
ID: 1788934
jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1788935 - Posted: 20 May 2016, 8:06:33 UTC - in response to Message 1788934.  
Last modified: 20 May 2016, 8:10:20 UTC

I hope I am wrong but I get the feeling that these cards are not going to be as great as some are hoping. Mind you, it does depend on where you're coming from. By that I mean whether you may be coming up the scale or looking at a like to like type thing.

Maxwell was meant to be released on 20nm but had to be re-done for release on 28nm.
Pascal was designed specifically for 16/14 nm and was able to be released on that process node. In addition, its architecture was based on the lessons learned from Maxwell, so not only does it have the benefit of the much smaller process node, but it also has architectural tweaks over Maxwell and some significant new features with regard to its programmability.

As things stand for Seti, I suspect that the main benefits will be the faster clock speeds and bandwidth, as well as its much faster context switching.
Like Maxwell, it will require an application that makes full use of its abilities to really see what the hardware is capable of.


Probably tomorrow I'll start gathering my various test pieces, so that there is some sort of organised pack from which whoever is lucky enough to get one first can extract some meaningful numbers. Overall, I'm expecting simpler kernels to fare better, without some of the hoops that had to be jumped through for pre-Fermi and Fermi [i.e. fewer architectural quirks with each generation].
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 1788935
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13736
Credit: 208,696,464
RAC: 304
Australia
Message 1788936 - Posted: 20 May 2016, 8:14:37 UTC - in response to Message 1788935.  

Probably tomorrow I'll start gathering my various test pieces, so that there is some sort of organised pack from which whoever is lucky enough to get one first can extract some meaningful numbers.

Good to hear.
Grant
Darwin NT
ID: 1788936
petri33
Volunteer tester
Joined: 6 Jun 02
Posts: 1668
Credit: 623,086,772
RAC: 156
Finland
Message 1788937 - Posted: 20 May 2016, 8:15:48 UTC - in response to Message 1788936.  

Probably tomorrow I'll start gathering my various test pieces, so that there is some sort of organised pack from which whoever is lucky enough to get one first can extract some meaningful numbers.

Good to hear.

+1
To overcome Heisenbergs:
"You can't always get what you want / but if you try sometimes you just might find / you get what you need." -- Rolling Stones
ID: 1788937
jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1788941 - Posted: 20 May 2016, 8:27:17 UTC - in response to Message 1788936.  
Last modified: 20 May 2016, 8:34:06 UTC

Probably tomorrow I'll start gathering my various test pieces, so that there is some sort of organised pack from which whoever is lucky enough to get one first can extract some meaningful numbers.

Good to hear.


Yeah, it's a bit frustrating guessing from dodgy marketing slides and gaming benchmarks. Most probably I have enough code floating around to compare achievable memory bandwidth, compute, and latencies with code of varying complexity (from a simple reference implementation, through my Fermi+Kepler-class code, to Petri-style CUDA streams with hand-tuned kernels); it's just a matter of finding all the bits and putting them together. If the simplest code works best in some cases, that would be a sign of massive architectural improvements (which we're not really expecting over Maxwell). Existing complex code winning would indicate they focussed more on raw performance than on efficiency [and ease of implementation].

Comparison example from CPU history: on Core 2, memory access over a large array needs only a simple loop, because the hardware prefetch logic is there to trigger and feed the simple code. Pentium 4, on the other hand, needs nested loops of block-prefetch code to get anywhere near its capability. Naturally, not much hand-tuned Pentium 4-specific code of that calibre was written, because it's very time-consuming to do [and very CPU-specific].
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 1788941
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13736
Credit: 208,696,464
RAC: 304
Australia
Message 1788945 - Posted: 20 May 2016, 8:40:57 UTC - in response to Message 1788941.  

If the simplest code works best in some cases, it would be a sign of massive architectural improvements (which we're not really expecting over Maxwell).

But Maxwell over Kepler...?
Grant
Darwin NT
ID: 1788945
jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1788948 - Posted: 20 May 2016, 8:50:32 UTC - in response to Message 1788945.  

If the simplest code works best in some cases, it would be a sign of massive architectural improvements (which we're not really expecting over Maxwell).

But Maxwell over Kepler...?


Maxwell over Kepler was largely incremental and mostly efficiency-related, rather than raw horsepower. The last architectural leap that looked this large in raw performance numbers was Kepler over Fermi.

It's more of a continuum than that, although it does seem to tie to process nodes more than to anything else.
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 1788948
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13736
Credit: 208,696,464
RAC: 304
Australia
Message 1788953 - Posted: 20 May 2016, 9:33:14 UTC - in response to Message 1788930.  
Last modified: 20 May 2016, 9:34:23 UTC

The thing that concerns me with the new cards is the memory bandwidth.

The GTX 1080 is slightly less than the GTX 980Ti (320GB/s v 336.5 GB/s) but it is considerably more than the GTX 980 (224GB/s) which is the card it is replacing.
A 43% improvement is a good thing IMHO.

EDIT- although the GTX 1070 memory bandwidth is only 14% better than the GTX 970 (256GB/s v 224GB/s).


OK, found something of interest relating to GTX 10xx series cards using the new GDDR5X memory.

GDDR5X memory production
Earlier this year Micron began to sample GDDR5X chips rated to operate at 10 Gb/s, 11 Gb/s and 12 Gb/s in quad data rate (QDR) mode with 16n prefetch. However, it looks like NVIDIA decided to be conservative and only run the chips at the minimum frequency.

So it looks like there is potential for even greater bandwidth than the current high-end cards have, without having to overclock the memory to get it.
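The spec-sheet bandwidth formula is the per-pin data rate times the bus width, divided by 8 bits per byte. A short Python sketch, assuming the GTX 1080's 256-bit memory bus, shows that the 320 GB/s quoted earlier corresponds to the 10 Gb/s grade, and what the faster sampled grades would give on the same bus. It is a theoretical peak only; it ignores memory compression and achievable efficiency:

```python
def peak_bandwidth_gbs(data_rate_gbps_per_pin: float, bus_width_bits: int) -> float:
    """Theoretical DRAM bandwidth in GB/s: per-pin rate (Gb/s) x pins / 8 bits per byte."""
    return data_rate_gbps_per_pin * bus_width_bits / 8.0


if __name__ == "__main__":
    for rate in (10.0, 11.0, 12.0):  # GDDR5X speed grades Micron sampled
        print(f"{rate:.0f} Gb/s x 256-bit: {peak_bandwidth_gbs(rate, 256):.0f} GB/s")
    # 10 Gb/s x 256-bit: 320 GB/s
    # 11 Gb/s x 256-bit: 352 GB/s
    # 12 Gb/s x 256-bit: 384 GB/s
```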
Grant
Darwin NT
ID: 1788953
jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1788954 - Posted: 20 May 2016, 9:35:29 UTC - in response to Message 1788953.  

So it looks like there is potential for even greater bandwidth than the current high-end cards have, without having to overclock the memory to get it.


Yep, looks like considerable overclockability built in right through. Will be fascinating to see what happens when the crazy LN2 guys get hold of it, lol
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 1788954
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13736
Credit: 208,696,464
RAC: 304
Australia
Message 1788955 - Posted: 20 May 2016, 9:47:08 UTC - in response to Message 1788934.  
Last modified: 20 May 2016, 9:49:19 UTC

I hope I am wrong but I get the feeling that these cards are not going to be as great as some are hoping. Mind you, it does depend on where you're coming from. By that I mean whether you may be coming up the scale or looking at a like to like type thing.

There aren't a lot of compute benchmarks around as yet, but I found these comparisons at AnandTech.

CompuBench 1.5 - Optical Flow
One that AMD used to dominate is now almost matched by the GTX 1080.

CompuBench 1.5 - Face Detection
LuxMark 3.1 - Hotel
On those where Nvidia already led, the GTX 1080 moves even further ahead.

Folding @ Home Double Precision
For anything involving double precision, they really want you to use Tesla.


It looks very promising, especially when you consider that retail cards are almost invariably clocked faster than the reference model.
Grant
Darwin NT
ID: 1788955
jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1788958 - Posted: 20 May 2016, 10:04:01 UTC - in response to Message 1788955.  
Last modified: 20 May 2016, 10:15:47 UTC

The Fury X has HBM memory and a closed-loop watercooler, is that right? [Edit:] Checked, seems so at stock. It will be interesting to see what the watercooling people can extract from a 1080.
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 1788958
Chris Adamek
Volunteer tester
Joined: 15 May 99
Posts: 251
Credit: 434,772,072
RAC: 236
United States
Message 1789581 - Posted: 22 May 2016, 20:23:49 UTC - in response to Message 1788958.  

We all just need one of these filled up with 1080's (or whatever the Titan or Ti version looks like) and we wouldn't have any RAC issues... Power bill might be a little frightening though...lol

https://youtu.be/uKJw8IKVYQ8

Chris
ID: 1789581


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.