GPU Wars 2016: GTX 1050 Ti & GTX 1050: October 25th

Author	Message
Al Send message Joined: 3 Apr 99 Posts: 1682 Credit: 477,343,364 RAC: 482	Message 1788693 - Posted: 19 May 2016, 13:51:36 UTC - in response to Message 1788691. That sounds like a great plan if it can be achieved. Go Man Go! ;-) ID: 1788693 ·

petri33 Volunteer tester Send message Joined: 6 Jun 02 Posts: 1668 Credit: 623,086,772 RAC: 156	Message 1788698 - Posted: 19 May 2016, 14:01:54 UTC - in response to Message 1788693. Last modified: 19 May 2016, 14:03:35 UTC That sounds like a great plan if it can be achieved. Go Man Go! ;-) I'm doing some testing and trying and I think that Jason will do the whole thing right. You can get a sneak peek preview here. It is an ar 0.42 task in 164 seconds. There are some high ar tasks that take less than 60 seconds. The guppi vlars take about 700 seconds and need some more optimizing. Oh how I wish I could get one of those GTX1080's .... To overcome Heisenbergs: "You can't always get what you want / but if you try sometimes you just might find / you get what you need." -- Rolling Stones ID: 1788698 ·

jason_gee Volunteer developer Volunteer tester Send message Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0	Message 1788709 - Posted: 19 May 2016, 14:28:52 UTC - in response to Message 1788698. Last modified: 19 May 2016, 14:31:08 UTC That sounds like a great plan if it can be achieved. Go Man Go! ;-) I'm doing some testing and trying and I think that Jason will do the whole thing right. You can get a sneak peek preview here. It is an ar 0.42 task in 164 seconds. There are some high ar tasks that take less than 60 seconds. The guppi vlars take about 700 seconds and need some more optimizing. Oh how I wish I could get one of those GTX1080's .... Yeah, watching PC Perspective's interview with Tom Petersen (Nvidia engineer) now. Probably the extra planning now (frustrating as a wait is) will pay the best dividends. Definitely looks like the streaming and configurability will be the way to go. If you get a chance to watch it, there's a part where he talks about 'fastsync', where he goes into how (for graphics) Vsync-On creates latency through backpressure, similar to what we see, while Vsync-Off has tearing, and their fastsync thing lets the engine run as fast as it wants while the display syncs. While discarding frames, as their fastsync does, isn't an option for compute, it did give some great hints on buffer management and ways to relieve the slowdown. Probably we can just stick PID controls around launches and scale GPU load to whatever setpoint we like, before initiating a buffer swap and CPU-side reduction. Uncoupling from the kernel-level sync, and scaling to the right frequency for the system, should reduce CPU load for the same result and add a 'free' throttle for the GPU. "Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions. ID: 1788709 ·
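A minimal sketch of the "PID controls around launches" idea, under stated assumptions: a textbook discrete PID loop picks a normalised launch rate so that a toy "GPU load" settles at a chosen setpoint. Nothing here comes from the actual SETI@home CUDA application; the `PID` class, the `simulate` function, the gains and the first-order load model are all hypothetical and only illustrate the control loop.

```python
from dataclasses import dataclass


@dataclass
class PID:
    """Textbook discrete PID controller (gains are illustrative only)."""
    kp: float
    ki: float
    kd: float
    setpoint: float
    integral: float = 0.0
    prev_error: float = 0.0

    def update(self, measured: float, dt: float) -> float:
        """Return the control output for one time step of length dt."""
        error = self.setpoint - measured
        self.integral += error * dt
        derivative = (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def simulate(setpoint: float = 0.8, steps: int = 200) -> float:
    """Drive a toy GPU load (0..1) toward the setpoint; return the final load.

    Assumed plant: the load follows the launch rate with a first-order lag.
    A real application would measure load from timers or events instead.
    """
    pid = PID(kp=0.5, ki=2.0, kd=0.0, setpoint=setpoint)
    load = 0.0
    dt = 0.05
    for _ in range(steps):
        # Controller output is used directly as the launch rate, clamped to 0..1.
        rate = min(1.0, max(0.0, pid.update(load, dt)))
        # The load moves halfway toward the current launch rate each step.
        load += 0.5 * (rate - load)
    return load
```

Calling `simulate(0.8)` models throttling the GPU to roughly 80% load; lowering the setpoint is the 'free' throttle mentioned in the post.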

W-K 666 Volunteer tester Send message Joined: 18 May 99 Posts: 19062 Credit: 40,757,560 RAC: 67	Message 1788911 - Posted: 20 May 2016, 5:54:16 UTC Nvidia have put details up on their site, and you can set it so they "Notify Me" when the card of your choice is available in your area: Nvidia 1080. Choose 1070 in products if that is the GPU you wish for. ID: 1788911 ·

Lionel Send message Joined: 25 Mar 00 Posts: 680 Credit: 563,640,304 RAC: 597	Message 1788926 - Posted: 20 May 2016, 7:07:33 UTC - in response to Message 1788911. Last modified: 20 May 2016, 7:09:35 UTC Just to add to Grant's post, you can also go here to see the specs as well: https://en.wikipedia.org/wiki/List_of_Nvidia_graphics_processing_units#GeForce_10_Series The thing that concerns me with the new cards is the memory bandwidth. ID: 1788926 ·

Grant (SSSF) Volunteer tester Send message Joined: 19 Aug 99 Posts: 13736 Credit: 208,696,464 RAC: 304	Message 1788930 - Posted: 20 May 2016, 7:26:33 UTC - in response to Message 1788926. Last modified: 20 May 2016, 7:31:55 UTC The thing that concerns me with the new cards is the memory bandwidth. The GTX 1080 is slightly less than the GTX 980Ti (320GB/s v 336.5 GB/s) but it is considerably more than the GTX 980 (224GB/s) which is the card it is replacing. A 43% improvement is a good thing IMHO. EDIT- although the GTX 1070 memory bandwidth is only 14% better than the GTX 970 (256GB/s v 224GB/s). Grant Darwin NT ID: 1788930 ·
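The percentages in the post above can be checked with a few lines. This is only a worked-arithmetic sketch; the bandwidth figures are the ones quoted in the thread, and the helper name `percent_gain` is made up for illustration.

```python
# Memory bandwidth in GB/s, as quoted in the thread.
BANDWIDTH_GB_S = {
    "GTX 970": 224.0,
    "GTX 980": 224.0,
    "GTX 980 Ti": 336.5,
    "GTX 1070": 256.0,
    "GTX 1080": 320.0,
}


def percent_gain(new: float, old: float) -> float:
    """Relative change from old to new, in percent."""
    return (new / old - 1.0) * 100.0


def compare(new_card: str, old_card: str) -> float:
    """Percentage bandwidth change of new_card relative to old_card."""
    return percent_gain(BANDWIDTH_GB_S[new_card], BANDWIDTH_GB_S[old_card])


if __name__ == "__main__":
    for new_card, old_card in [
        ("GTX 1080", "GTX 980"),
        ("GTX 1080", "GTX 980 Ti"),
        ("GTX 1070", "GTX 970"),
    ]:
        print(f"{new_card} vs {old_card}: {compare(new_card, old_card):+.1f}%")
```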

Lionel Send message Joined: 25 Mar 00 Posts: 680 Credit: 563,640,304 RAC: 597	Message 1788931 - Posted: 20 May 2016, 7:39:34 UTC - in response to Message 1788930. I understand where you are coming from, but for the single precision speed of the GPU, in combination with the CUDA cores, it seems somewhat choked. The 980 Ti has 5632 GFlops and 336.5 GB/s; the 1080 has 8228 GFlops and 320 GB/s. That's a circa 46% increase in GFlops but basically the same bandwidth. I hope I am wrong but I get the feeling that these cards are not going to be as great as some are hoping. Mind you, it does depend on where you're coming from. By that I mean whether you may be coming up the scale or looking at a like to like type thing. ID: 1788931 ·
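One way to put a number on the concern above is the ratio of peak compute to memory bandwidth, i.e. how many floating point operations a kernel must perform per byte fetched before memory stops being the limit. The sketch below uses only the figures quoted in the post; the function names are hypothetical, and peak figures say nothing about caches or memory compression.

```python
# (peak single-precision GFLOPS, memory bandwidth in GB/s), as quoted in the thread.
CARDS = {
    "GTX 980 Ti": (5632.0, 336.5),
    "GTX 1080": (8228.0, 320.0),
}


def flops_per_byte(gflops: float, gb_per_s: float) -> float:
    """Peak FLOPs available per byte of memory traffic."""
    return gflops / gb_per_s


def gflops_increase_percent(new_card: str, old_card: str) -> float:
    """Percentage increase in peak GFLOPS from old_card to new_card."""
    return (CARDS[new_card][0] / CARDS[old_card][0] - 1.0) * 100.0


if __name__ == "__main__":
    for name, (gflops, bandwidth) in CARDS.items():
        print(f"{name}: {flops_per_byte(gflops, bandwidth):.1f} FLOPs per byte")
    print(f"GFLOPS increase: {gflops_increase_percent('GTX 1080', 'GTX 980 Ti'):.0f}%")
```

A higher ratio means a kernel needs more arithmetic per byte loaded to keep the cores busy, which is the sense in which the newer card could be bandwidth-limited on simple kernels.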

jason_gee Volunteer developer Volunteer tester Send message Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0	Message 1788933 - Posted: 20 May 2016, 7:56:46 UTC - in response to Message 1788931. A few complications. First, from what I've seen, the rated GBps seems to be at base memory clocks, and there seems to be quite some headroom in the boost logic (even before OC). Second, I'm mostly concerned with the cache architecture, for which I've yet to see any details on sizes etc. Signs are pointing to considerably reduced latencies right through the core. That should amount to simple CUDA kernels being able to leverage more bandwidth than the ~80% of theoretical that the earlier gens achieve with the best non-overlapped streams, so less thrashing; we'll see. Last, since heavy use of memory compression became a thing, from a few clues in gaming benches etc., it looks like they've improved that considerably in Pascal. That means GBps from the traditional spec formula may not be particularly accurate, depending on the data going through it. It's going to be tough to make myself sit on my wallet for a bit. "Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions. ID: 1788933 ·
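For reference, the "traditional spec formula" is usually taken to be per-pin data rate times bus width, divided by eight to convert bits to bytes. The sketch below assumes a 256-bit bus and a 10 Gb/s per-pin rate for the GTX 1080 (public spec figures, not stated in this post) and applies the ~80% achievable fraction mentioned above. The function names are illustrative only.

```python
def theoretical_bandwidth_gb_s(data_rate_gbps_per_pin: float, bus_width_bits: int) -> float:
    """Spec-sheet bandwidth: per-pin data rate times bus width, in bytes."""
    return data_rate_gbps_per_pin * bus_width_bits / 8.0


def achievable_bandwidth_gb_s(theoretical: float, efficiency: float = 0.80) -> float:
    """Bandwidth a kernel might actually see at a given fraction of peak.

    The 0.80 default is the rough figure quoted in the post for earlier
    generations; it is not a measured value for Pascal.
    """
    return theoretical * efficiency


if __name__ == "__main__":
    # Assumed GTX 1080 figures: 10 Gb/s per pin, 256-bit bus.
    peak = theoretical_bandwidth_gb_s(10.0, 256)
    print(f"theoretical: {peak:.0f} GB/s")
    print(f"at 80% of peak: {achievable_bandwidth_gb_s(peak):.0f} GB/s")
```

Memory compression, as noted in the post, changes the effective figure depending on the data, so the formula is best read as an upper bound on raw transfers rather than a prediction.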

Grant (SSSF) Volunteer tester Send message Joined: 19 Aug 99 Posts: 13736 Credit: 208,696,464 RAC: 304	Message 1788934 - Posted: 20 May 2016, 7:59:31 UTC - in response to Message 1788931. I hope I am wrong but I get the feeling that these cards are not going to be as great as some are hoping. Mind you, it does depend on where you're coming from. By that I mean whether you may be coming up the scale or looking at a like to like type thing. Maxwell was meant to be released on 20nm but had to be re-done for release on 28nm. Pascal was designed specifically for 16/14 nm and was able to be released on that process node. In addition to that, its architecture was based on the lessons learned from Maxwell so not only does it have the benefit of the much smaller process node, but it also has the benefit of architectural tweaks over Maxwell, in addition to some significant new features with regards to its programmability. As things stand for Seti, I suspect that the main benefit will be the faster clock speeds & bandwidth as well as its much faster context switching. Like Maxwell, it will require an application that makes full use of its abilities in order to really see just what the hardware is really capable of. Grant Darwin NT ID: 1788934 ·

jason_gee Volunteer developer Volunteer tester Send message Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0	Message 1788935 - Posted: 20 May 2016, 8:06:33 UTC - in response to Message 1788934. Last modified: 20 May 2016, 8:10:20 UTC I hope I am wrong but I get the feeling that these cards are not going to be as great as some are hoping. Mind you, it does depend on where you're coming from. By that I mean whether you may be coming up the scale or looking at a like to like type thing. Maxwell was meant to be released on 20nm but had to be re-done for release on 28nm. Pascal was designed specifically for 16/14 nm and was able to be released on that process node. In addition to that, its architecture was based on the lessons learned from Maxwell so not only does it have the benefit of the much smaller process node, but it also has the benefit of architectural tweaks over Maxwell, in addition to some significant new features with regards to its programmability. As things stand for Seti, I suspect that the main benefit will be the faster clock speeds & bandwidth as well as its much faster context switching. Like Maxwell, it will require an application that makes full use of its abilities in order to really see just what the hardware is really capable of. Probably tomorrow I start gathering my various test pieces, so that there is some sort of organised pack from which whoever is lucky enough to get one first can extract some meaningful numbers. In all, I'm expecting that simpler kernels do better than some of the hoops that had to be jumped through for Pre-Fermi and Fermi. [i.e. fewer architectural quirks with each generation] "Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions. ID: 1788935 ·

Grant (SSSF) Volunteer tester Send message Joined: 19 Aug 99 Posts: 13736 Credit: 208,696,464 RAC: 304	Message 1788936 - Posted: 20 May 2016, 8:14:37 UTC - in response to Message 1788935. Probably tomorrow I start gathering my various test pieces, so that there is some sort of organised pack from which whoever is lucky enough to get one first can extract some meaningful numbers. Good to hear. Grant Darwin NT ID: 1788936 ·

petri33 Volunteer tester Send message Joined: 6 Jun 02 Posts: 1668 Credit: 623,086,772 RAC: 156	Message 1788937 - Posted: 20 May 2016, 8:15:48 UTC - in response to Message 1788936. Probably tomorrow I start gathering my various test pieces, so that there is some sort of organised pack from which whoever is lucky enough to get one first can extract some meaningful numbers. Good to hear. +1 To overcome Heisenbergs: "You can't always get what you want / but if you try sometimes you just might find / you get what you need." -- Rolling Stones ID: 1788937 ·

jason_gee Volunteer developer Volunteer tester Send message Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0	Message 1788941 - Posted: 20 May 2016, 8:27:17 UTC - in response to Message 1788936. Last modified: 20 May 2016, 8:34:06 UTC Probably tomorrow I start gathering my various test pieces, so that there is some sort of organised pack from which whoever is lucky enough to get one first can extract some meaningful numbers. Good to hear. Yeah, it's a bit frustrating guessing from dodgy marketing slides and gaming benchmarks. Most probably I have enough code floating around to compare achievable memory bandwidth, compute, and latencies with code of varying complexity (simple reference through my Fermi+Kepler class to Petri-Style cudastreams+handKernels), just a matter of finding all the bits and putting them together. If the simplest code works best in some cases, it would be a sign of massive architectural improvements (which we're not really expecting over Maxwell). Existing complex code winning would be an indicator they focussed more on raw performance than efficiency [and ease of implementation]. Comparison example from CPU history: Core2 memory access over a large array requires only a simple loop, as hardware is there to trigger and feed the simple code. Pentium4, on the other hand, requires nested loops of block prefetch code to strike anywhere near its capability. Naturally not much hand-written Pentium4-specific code of that calibre was implemented, because it's very time consuming to do [and very CPU specific]. "Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions. ID: 1788941 ·
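The Core2 versus Pentium4 contrast is about loop structure, which can be sketched even in a language that cannot issue prefetch instructions. Below, `sum_simple` is the single plain loop, and `sum_blocked` has the nested shape of block-prefetch code: an outer loop over blocks, a first inner pass that touches one element per stride (a stand-in for prefetching a cache line), then a second inner pass that consumes the block. The function names, block size and stride are hypothetical, and Python gains no speed from this; it only shows how much more code the second style needs for the same result.

```python
from typing import Sequence


def sum_simple(data: Sequence[float]) -> float:
    """Single plain loop: the hardware is trusted to keep the data flowing."""
    total = 0.0
    for x in data:
        total += x
    return total


def sum_blocked(data: Sequence[float], block: int = 1024, stride: int = 16) -> float:
    """Nested loops in the shape of block-prefetch code.

    Pass 1 touches one element per stride within the block (stand-in for
    pulling each cache line in ahead of use); pass 2 then consumes the block.
    """
    total = 0.0
    n = len(data)
    for start in range(0, n, block):
        end = min(start + block, n)
        # Pass 1: touch the block so it would be resident before use.
        for i in range(start, end, stride):
            _ = data[i]
        # Pass 2: do the real work on the (now warm) block.
        for i in range(start, end):
            total += data[i]
    return total
```

Both functions add the elements in the same order, so they return identical results; the difference is purely in how the traversal is organised.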

Grant (SSSF) Volunteer tester Send message Joined: 19 Aug 99 Posts: 13736 Credit: 208,696,464 RAC: 304	Message 1788945 - Posted: 20 May 2016, 8:40:57 UTC - in response to Message 1788941. If the simplest code works best in some cases, it would be a sign of massive architectural improvements (which we're not really expecting over Maxwell). But Maxwell over Kepler...? Grant Darwin NT ID: 1788945 ·

jason_gee Volunteer developer Volunteer tester Send message Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0	Message 1788948 - Posted: 20 May 2016, 8:50:32 UTC - in response to Message 1788945. If the simplest code works best in some cases, it would be a sign of massive architectural improvements (which we're not really expecting over Maxwell). But Maxwell over Kepler...? Maxwell over Kepler was largely incremental, mostly efficiency related, rather than raw horsepower. The last architectural leap that looked this large in raw performance numbers was Kepler over Fermi. It's more of a continuum than that, although it does seem to tie more to process nodes than anything. "Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions. ID: 1788948 ·

Grant (SSSF) Volunteer tester Send message Joined: 19 Aug 99 Posts: 13736 Credit: 208,696,464 RAC: 304	Message 1788953 - Posted: 20 May 2016, 9:33:14 UTC - in response to Message 1788930. Last modified: 20 May 2016, 9:34:23 UTC The thing that concerns me with the new cards is the memory bandwidth. The GTX 1080 is slightly less than the GTX 980Ti (320GB/s v 336.5 GB/s) but it is considerably more than the GTX 980 (224GB/s) which is the card it is replacing. A 43% improvement is a good thing IMHO. EDIT- although the GTX 1070 memory bandwidth is only 14% better than the GTX 970 (256GB/s v 224GB/s). OK, found something of interest relating to GTX 10xx series cards using the new GDDR5X memory. GDDR5X memory production: Earlier this year Micron began to sample GDDR5X chips rated to operate at 10 Gb/s, 11 Gb/s and 12 Gb/s in quad data rate (QDR) mode with 16n prefetch. However, it looks like NVIDIA decided to be conservative and only run the chips at the minimum frequency. So it looks like there is the potential for even greater bandwidth than even the current high-end cards have, without having to overclock the memory to get it. Grant Darwin NT ID: 1788953 ·
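A rough sketch of the headroom described above, assuming the GTX 1080's quoted 320 GB/s corresponds to the 10 Gb/s chips and that bandwidth scales linearly with per-pin data rate on an unchanged memory bus. Both assumptions are mine rather than anything confirmed in the thread, and `scaled_bandwidth_gb_s` is a made-up helper.

```python
BASELINE_RATE_GBPS = 10.0        # assumed per-pin rate behind the quoted figure
BASELINE_BANDWIDTH_GB_S = 320.0  # GTX 1080 bandwidth as quoted in the thread
GTX_980_TI_GB_S = 336.5          # as quoted in the thread


def scaled_bandwidth_gb_s(rate_gbps: float) -> float:
    """Bandwidth at a different per-pin rate, same bus width assumed."""
    return BASELINE_BANDWIDTH_GB_S * rate_gbps / BASELINE_RATE_GBPS


if __name__ == "__main__":
    for rate in (10.0, 11.0, 12.0):
        bandwidth = scaled_bandwidth_gb_s(rate)
        verdict = "above" if bandwidth > GTX_980_TI_GB_S else "below"
        print(f"{rate:.0f} Gb/s -> {bandwidth:.0f} GB/s ({verdict} the 980 Ti)")
```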

jason_gee Volunteer developer Volunteer tester Send message Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0	Message 1788954 - Posted: 20 May 2016, 9:35:29 UTC - in response to Message 1788953. So it looks like there is the potential for even greater bandwidth than even the current high-end cards have, without having to overclock the memory to get it. Yep, looks like considerable overclockability built in right through. Will be fascinating to see what happens when the crazy LN2 guys get hold of it, lol. "Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions. ID: 1788954 ·

Grant (SSSF) Volunteer tester Send message Joined: 19 Aug 99 Posts: 13736 Credit: 208,696,464 RAC: 304	Message 1788955 - Posted: 20 May 2016, 9:47:08 UTC - in response to Message 1788934. Last modified: 20 May 2016, 9:49:19 UTC I hope I am wrong but I get the feeling that these cards are not going to be as great as some are hoping. Mind you, it does depend on where you're coming from. By that I mean whether you may be coming up the scale or looking at a like to like type thing. There aren't a lot of compute benchmarks around as yet, but I found these comparisons at AnandTech. CompuBench 1.5 - Optical Flow: one that AMD used to rule on is now almost matched by the GTX 1080. CompuBench 1.5 - Face Detection and LuxMark 3.1 - Hotel: on those that Nvidia ruled on, the GTX 1080 just moves them even further ahead. Folding @ Home Double Precision: for anything involving double precision they really want you to use Tesla. It looks very promising, especially when you consider that the generally released cards almost invariably are clocked faster than the reference model. Grant Darwin NT ID: 1788955 ·

jason_gee Volunteer developer Volunteer tester Send message Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0	Message 1788958 - Posted: 20 May 2016, 10:04:01 UTC - in response to Message 1788955. Last modified: 20 May 2016, 10:15:47 UTC Fury X has HBM memory and a closed-loop water cooler, is that right? [Edit:] Checked, seems so as stock. Will be interesting to see what the watercooling people can extract from a 1080. "Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions. ID: 1788958 ·

Chris Adamek Volunteer tester Send message Joined: 15 May 99 Posts: 251 Credit: 434,772,072 RAC: 236	Message 1789581 - Posted: 22 May 2016, 20:23:49 UTC - in response to Message 1788958. We all just need one of these filled up with 1080's (or whatever the Titan or Ti version looks like) and we wouldn't have any RAC issues... Power bill might be a little frightening though...lol https://youtu.be/uKJw8IKVYQ8 Chris ID: 1789581 ·
