Message boards :
Number crunching :
GPU Wars 2014: Postponed to 2015?
jason_gee Send message Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
With my relatively new (and self-proclaimed) role as Cassandra of these boards, I'm going to go ahead and guess that Boinc is just "seeing" the flops wrong (it's not like it runs any benchmarks to get that number, it just blurts out whatever it's told, AFAIK). From the whitepaper (at http://international.download.nvidia.com/geforce-com/international/pdfs/GeForce-GTX-750-Ti-Whitepaper.pdf), it's an evolution from Fermi, through Kepler and BigK, into a slightly more refined beast again. The Ti model is the 'full' GM107 chip with 640 cores total on 5 x SMMs (a welcome apparent return to a power-of-two number of cores per SM, easier to program), with superscalar doubling and independent warp schedulers as per BigK. The paper states this one is still 28nm, though I suppose that could be outdated. What makes performance of roughly a GTX 480, at one quarter the power, look feasible is the 2 MB L2, coupled with memory controllers vastly improved by nVidia since that time. So performance equivalent to 2 x 560Tis is feasible under certain situations, and x41zc should scale pretty well on it. It does look like they've refined BigK's internal crossbars, which will eliminate some of the current bottlenecks seen in older cards, once I can actively use them properly. Importantly though, the programming model details of compute capability 5.0 aren't exposed yet, so it's not clear how much of this is an evolution that will scale fairly naturally with my existing code, as Kepler has in some places but not others. "Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions. |
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14650 Credit: 200,643,578 RAC: 874 |
With my relatively new (and self-proclaimed) role as Cassandra of these boards I'm going to go ahead and guess that Boinc is just "seeing" the flops wrong (it's not like it runs any benchmarks to get that number, just blurts out whatever it's told AFAIK). Er, would you happen to know whether NVidia has exposed that through the API? Or will we have to teach the BOINC client to read Wikipedia to discover that the base model 750 has 4 x SMMs and 512 cores total? Edit - although I read the Wiki this morning, I'd forgotten what I read there. My mistake, corrected base model to 512 cores. |
jason_gee Send message Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
Er, would you happen to know whether NVidia has exposed that through the API? The # of SM/SMX/SMMs has always been exposed and is used by the current Boinc code. The # of Cuda cores per unit is determined by compute capability (also exposed), and of course is at the whim of the silicon gods. So it would take the Boinc devs to read the whitepapers. "Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions. |
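For illustration, a minimal sketch of the kind of lookup this implies: the multiprocessor count comes straight from the driver, but cores per multiprocessor have to be hard-coded per compute capability. The function below is hypothetical (it is not the actual coproc.cpp code), though the per-architecture values match NVIDIA's published figures.

```cpp
// Hypothetical sketch: map compute capability to CUDA cores per multiprocessor.
// The multiprocessor count itself is reported by the driver/runtime; this table
// is the part that has to be updated whenever a new architecture appears.
int cores_per_multiprocessor(int major, int minor) {
    switch (major) {
        case 1: return 8;                       // Tesla
        case 2: return (minor == 0) ? 32 : 48;  // Fermi
        case 3: return 192;                     // Kepler / "BigK"
        case 5: return 128;                     // Maxwell (GM107, CC 5.0)
        default: return -1;                     // unknown: needs a whitepaper
    }
}
// e.g. GM107 at CC 5.0: 5 SMMs x 128 cores = 640 CUDA cores total.
```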
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14650 Credit: 200,643,578 RAC: 874 |
Er, would you happen to know whether NVidia has exposed that through the API? If you could tell me the CC (major/minor) of the GM107, and the number of cores per SMM for that CC, I'll happily read it out loud to the BOINC devs so they can update /lib/coproc.cpp |
jason_gee Send message Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
Er, would you happen to know whether NVidia has exposed that through the API? Compute capability 5.0, 5 x SMMs, 640 Cuda cores total. From Big Reg's machine: Device 2: GeForce GTX 750 Ti, 2048 MiB, regsPerBlock 65536 "Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions. |
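For reference, a minimal standalone sketch of how those figures can be read back through the CUDA runtime API (not the tool that produced the "Device 2" line above, just an illustration of where the numbers come from):

```cpp
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int count = 0;
    cudaGetDeviceCount(&count);
    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, dev);
        // Compute capability, multiprocessor (SM/SMX/SMM) count, memory, registers.
        printf("Device %d: %s, CC %d.%d, %d multiprocessors, %zu MiB, regsPerBlock %d\n",
               dev, prop.name, prop.major, prop.minor, prop.multiProcessorCount,
               prop.totalGlobalMem / (1024 * 1024), prop.regsPerBlock);
    }
    return 0;
}
```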
shizaru Send message Joined: 14 Jun 04 Posts: 1130 Credit: 1,967,904 RAC: 0 |
So the focus really is back on compute! Awesome:) This paper states this one is still 28nm, though I suppose could be outdated. No, no! Not outdated:) TSMC has yet to ship anything 20nm. SoCs are usually first out the gate and then come the AMD/NVIDIA GPUs. Estimates for the latter are second half of the year, likely late summer. |
jason_gee Send message Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
So the focus really is back on compute! Awesome:) Hmmm, I'm not 100% sure how they dumped the power demand so low then. More reading, I guess ;) "Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions. |
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14650 Credit: 200,643,578 RAC: 874 |
Er, would you happen to know whether NVidia has exposed that through the API? Thanks. You have mail. The current default handler has flops_per_clock = 2 - could we check if that's still appropriate, please? |
jason_gee Send message Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
Er, would you happen to know whether NVidia has exposed that through the API? Alright, still waking up. Probably still correct, looking at the whitepaper spec of ~1305 GFlops; will check. [Edit:] Yes, same instruction (warp) schedulers as the Kepler class, so the same instructions per clock, just more of them and streamlined. "Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions. |
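As a quick arithmetic check (assuming the GTX 750 Ti's base clock of roughly 1020 MHz, a figure taken from public specs rather than this thread), the peak estimate is simply cores x clock x flops_per_clock:

```cpp
#include <cstdio>

int main() {
    const double cores = 640;          // 5 SMMs x 128 cores (CC 5.0)
    const double clock_hz = 1020e6;    // ~1020 MHz base clock (assumed from public specs)
    const double flops_per_clock = 2;  // one fused multiply-add counts as two flops
    printf("Peak: %.1f GFLOPS\n", cores * clock_hz * flops_per_clock / 1e9);
    // ~1305.6 GFLOPS, in line with the whitepaper's ~1305 figure.
    return 0;
}
```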
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14650 Credit: 200,643,578 RAC: 874 |
Er, would you happen to know whether NVidia has exposed that through the API? http://boinc.berkeley.edu/trac/changeset/3edb124ab4b16492d58ce5a6f6e40c2244c97ed6/boinc-v2 Fix committed, we await new client version. Note the second line of the checkin notes. |
jason_gee Send message Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
... Note the second line of the checkin notes. Hahahaha, I see. Well, that number, if it were available, would by extension be about as much use compute-wise as that marketing GFLOPS rating (i.e. none whatsoever). I can imagine how the meetings might have gone if it came up. Something like, Marketing: "We need you to put this number into the API, so that applications can work out our fictitious numbers for themselves." Engineering: "No, go away." We mostly care about multiprocessors, what generation of architecture they are, memory type and bus width, and any caching. Unlike AMD cards, where, as I understand it, shader count is what matters most. "Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions. |
shizaru Send message Joined: 14 Jun 04 Posts: 1130 Credit: 1,967,904 RAC: 0 |
Quoting myself: TSMC has yet to ship anything 20nm. All just rumors, but here's the latest from the grapevine: "Nvidia and AMD were expected to introduce 20nm GPUs sometime in the second half of 2014, but it is becoming increasingly apparent that we won’t see them until a bit later, with volume production slated for 2015". First products based on 20nm chips ship this week ------- Off topic, Café style post: Been super-busy (in a good way). Miss you guys. Y'all have a wonderful Summer! |
Grant (SSSF) Send message Joined: 19 Aug 99 Posts: 13736 Credit: 208,696,464 RAC: 304 |
Yeah, the last couple of articles I found (a month or 2 ago now) were forecasting 20nm for next year. Apparently TSMC just couldn't get their act together, although the noises are that they are finally producing 20nm product (as in your link), but not a lot of it. There are rumours that Nvidia may release a GTX 880/880Ti product late this year (sometime in Dec, for Christmas maybe???) - but in very limited numbers, with a matching price tag. 1st half of next year before we see any real product availability, 1st quarter if we're lucky. Grant Darwin NT |
tbret Send message Joined: 28 May 99 Posts: 3380 Credit: 296,162,071 RAC: 40 |
I look forward to the next round of cards, although I probably won't be an early adopter (because of the prices). I'm not finished making comparisons yet, so please don't quote me, but it looks as though the 750Ti cards I have perform about the same as a factory-overclocked 550Ti. If, and I want to emphasize IF, that's true, then there seems to be a significant power and heat saving per unit crunched in that "range" of cards. If that holds true of the more powerful cards, this might be a real motivator to replace some older cards, even if the old ones are still working well. |
Grant (SSSF) Send message Joined: 19 Aug 99 Posts: 13736 Credit: 208,696,464 RAC: 304 |
I'm not finished making comparisons yet, so please don't quote me, but it looks as though the 750Ti cards I have perform about the same as a factory-overclocked 550Ti. I've found my GTX 750Ti cards to be slightly slower than the GTX 560Ti/GTX 460 they replaced. When I was playing around with 1, 2 or 3 WUs at a time, the GTX 750Ti killed the other cards at processing 3 longer-running WUs at a time, but the slowdown with shorties was so severe that 2 at a time is still the optimum number. Once Jason gets a handle on the latency issue & we get an application that can take advantage of Maxwell, their throughput will be well in excess of the previous generation of cards (even though that isn't the case at present with the current application). Given that my GTX 560Ti & 460 used 200W each, I could run 6 GTX 750Tis & produce more work & use less power - even with the still less-than-optimal application. With a suitable application, Maxwells will be untouchable for both performance and the number of WUs they can crunch per Watt. As it is, they still produce more work per Watt than previous generations of cards; a new application will just make that lead even greater. EDIT: I look forward to the next round of cards, although I probably won't be an early adopter (because of the prices). Along with the rumours of a high-end card right at the end of the year, the noises are that next year, once the full range is released, they should be priced similarly to, or even slightly less than, the equivalent current series of cards. This is partly because of the collapse in demand for GPUs for bitcoin mining, and because the smaller manufacturing process results in higher yields per wafer (which of course is dependent on TSMC actually producing the numbers they're expected/supposed to...). Grant Darwin NT |
Cliff Harding Send message Joined: 18 Aug 99 Posts: 1432 Credit: 110,967,840 RAC: 67 |
I've found my GTX 750Ti cards to be slightly slower than the GTX 560Ti/GTX 460 they replaced. I've found that my GTX750Ti FTW is faster than the GTX460SE and GTX660SC, in that WU run times are cut almost in half. But that must also take into account the machines on which they operate. The GTX750Ti FTW resides on an i7/4770K sitting on top of an ASRock Extreme4 mobo & 2 x 4GB sticks; the GTX460SE & GTX660SC resided on an i7/930 & 950 sitting on top of a DX58SO board & 3 x 2GB sticks. I don't have much of a problem with the shorter WUs maybe taking longer to run in some cases. But the AP WUs are something else: what used to take 11~13 hrs is now done in 5~6. Given the fact that my GTX 560Ti & 460 used 200W each, I could run 6 GTX 750Tis & produce more work & use less power- even with the still less than optimal application. I totally agree, and hopefully will be adding another 750Ti before year end. I did have a concern about the heat emanating from the 750Ti, as the LCS for the CPU draws air from inside the case, but that has proven not to be an issue, which is warrant enough to add another. I'm eagerly waiting for the next version of Lunatics, for if this beast is this good without being properly optimized, I can hardly imagine what it can do once it is. LIONS, TIGERS & BEARS - OH MY! I don't buy computers, I build them!! |
Zalster Send message Joined: 27 May 99 Posts: 5517 Credit: 528,817,460 RAC: 242 |
Hmm... interesting article, if it is true. Any thoughts, people? http://www.kitguru.net/components/graphic-cards/anton-shilov/nvidia-may-skip-20nm-process-technology-jump-straight-to-16nm/ |
Grant (SSSF) Send message Joined: 19 Aug 99 Posts: 13736 Credit: 208,696,464 RAC: 304 |
Hmm... interesting article, if it is true. Any thoughts, people? Hard to imagine TSMC getting 16nm working at high yields when they can't get 20nm to work. I guess it would all depend on just how much work is required to redesign for 16nm, and just how good TSMC's 16nm yields actually are. Grant Darwin NT |
Grant (SSSF) Send message Joined: 19 Aug 99 Posts: 13736 Credit: 208,696,464 RAC: 304 |
Rumours, rumours & yet more rumours. Now there are noises about a mid September launch, which will consist of Nvidia saying they now have a GTX 880. Actual hardware isn't expected to be available until October, sometime, maybe. GTX 880 rumours. Grant Darwin NT |
Zalster Send message Joined: 27 May 99 Posts: 5517 Credit: 528,817,460 RAC: 242 |
Looks like March 2015 http://www.kdramastars.com/articles/32965/20140813/gtx-800-series-release-date.htm |