GPU FLOPS: Theory vs Reality

Author	Message
Shaggie76 Send message Joined: 9 Oct 09 Posts: 282 Credit: 271,858,118 RAC: 196	Message 1827617 - Posted: 31 Oct 2016, 0:59:52 UTC It's been a few weeks since 8.19 was released so I've run another scan. New cards in the charts today: the NVIDIA GTX Titan Black and the AMD Baffin (i.e. RX 470?). Here are the results for the median 60% of the work units scanned: [chart] I'm not sure if it's the new GPU optimizations that made it into stock 8.19 or if the recent mix of work-units has somehow been favorable to the slightly higher memory bandwidth in the 980 Ti -- either way I'm surprised to see it at the top. I'll probably run an incremental scan next weekend and see if it's still ahead. The number of hosts and work-units aggregated is posted as well, if you want some idea of the confidence: [chart] ID: 1827617 ·
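A minimal sketch of what "the median 60% of the work units" could mean in practice, assuming it refers to sorting the per-work-unit results and discarding the lowest 20% and highest 20% before averaging. The function names and the sample numbers are hypothetical and are not taken from the actual scan scripts or data.

```python
from statistics import mean


def middle_60_percent(values):
    """Return the central 60% of the sorted values (drop the bottom and top 20%)."""
    ordered = sorted(values)
    cut = len(ordered) // 5  # 20% of the samples, rounded down
    return ordered[cut:len(ordered) - cut]


def trimmed_mean(values):
    """Average only the central 60% so that outliers do not skew the result."""
    kept = middle_60_percent(values)
    if not kept:
        raise ValueError("need at least one sample")
    return mean(kept)


# Hypothetical credit/hour samples for one GPU model, NOT real scan data.
samples = [310, 455, 470, 480, 490, 500, 510, 520, 530, 900]
print(middle_60_percent(samples))
print(trimmed_mean(samples))
```

Trimming both tails keeps a few unusually slow or fast hosts from dominating the figure for a card model.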

Zalster Volunteer tester Send message Joined: 27 May 99 Posts: 5517 Credit: 528,817,460 RAC: 242	Message 1827627 - Posted: 31 Oct 2016, 1:39:36 UTC - in response to Message 1827617. "I'm not sure if it's the new GPU optimizations that made it into stock 8.19 or if the recent mix of work-units has somehow been favorable to the slightly higher memory bandwidth in the 980 Ti -- either way I'm surprised to see it at the top. I'll probably run an incremental scan next weekend and see if it's still ahead." I'm not surprised. I've always said the 980Ti was more productive than the 1080 or 1070s based on my testing of them. What will be interesting to see is the 1080Tis in January. If they prove to be as productive as they sound, it might be an option to upgrade a few machines. ID: 1827627 ·

Shaggie76 Send message Joined: 9 Oct 09 Posts: 282 Credit: 271,858,118 RAC: 196	Message 1827635 - Posted: 31 Oct 2016, 2:30:07 UTC I'm not so sure about the 1080 Ti's -- my guess is they'll still be a 250W card and perform somewhere between the 1080's and the Pascal Titans. From what I've seen in my scans the Titans are between 1200 and 1500 CR/hr but there aren't enough of them to qualify for the charts. Personally I'm thinking about a set of 4 1070's @ 600W rather than a pair of 1080 Tis. My guess is it'll be about the same price as a pair of 1080 Ti's, a bit faster, and a bit better credit/watt. Of course if they're a 200W card instead I might be willing to bite the extra cost and just run 3. ID: 1827635 ·
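The comparison in the post above can be sketched as simple arithmetic. The 150 W per 1070 follows from the post's "4 1070's @ 600W"; the 250 W and 200 W figures for the 1080 Ti are the post's guesses. The credit/hour values below are placeholders chosen only to make the example run; they are assumptions, not measurements from the scans.

```python
def summarize(name, cards, watts_per_card, credit_per_hour_per_card):
    """Total power draw, total credit/hour and credit per watt-hour for one build."""
    total_watts = cards * watts_per_card
    total_credit = cards * credit_per_hour_per_card
    return {
        "name": name,
        "watts": total_watts,
        "credit_per_hour": total_credit,
        "credit_per_watt": total_credit / total_watts,
    }


# Placeholder credit/hour rates (hypothetical, for illustration only).
CR_1070 = 600
CR_1080TI = 1000

builds = [
    summarize("4x GTX 1070", 4, 150, CR_1070),
    summarize("2x GTX 1080 Ti (250 W guess)", 2, 250, CR_1080TI),
    summarize("3x GTX 1080 Ti (200 W guess)", 3, 200, CR_1080TI),
]
for build in builds:
    print(build)
```

Substituting measured credit/hour figures for the placeholders shows which build comes out ahead on credit per watt.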

Stubbles Volunteer tester Send message Joined: 29 Nov 99 Posts: 358 Credit: 5,909,255 RAC: 0	Message 1827643 - Posted: 31 Oct 2016, 3:20:19 UTC Hey Shaggie! I SOooo love to see the updates to GPU outputs. Great work as always! Have you given any thought to putting that talent of yours towards promoting the importance of optimization with Lunatics v0.45 and MrK's prog? ...since running stock is very inefficient (especially because of the CPU stock app)! My GTX1060 and my 2 GTX750Ti seem to have a 30%-45% better throughput than what I would get with stock...and that's without mentioning the almost 100% improvement on the CPU tasks with the Lunatics CPU app for my CPUs (Xeon W3550). The way I see it: by improving my throughput I'm improving the project's overall throughput ...and if more SETIzens could see that in some charts, that would be visual data worth acting upon with what they already have. I think your current charts are still incredible at showing the better buys since the electricity consumption is likely the greatest cost for dedicated crunchers. But optimization seems to me so important that I think it worthwhile to mention it to you again. Just me submitting my wish list...again...for the greater good (aka throughput)! ;-} Cheers, RobG :-D ID: 1827643 ·

RueiKe Volunteer tester Send message Joined: 14 Feb 16 Posts: 492 Credit: 378,512,430 RAC: 785	Message 1827665 - Posted: 31 Oct 2016, 5:19:43 UTC - in response to Message 1827617. Always look forward to your summary table! One question about the approach. Are you including only stock apps for 8.19? Since stock and optimized for 8.19 are identical at this time, maybe it is better to look at all work using r3528 apps. GitHub: Ricks-Lab Instagram: ricks_labs ID: 1827665 ·

Shaggie76 Send message Joined: 9 Oct 09 Posts: 282 Credit: 271,858,118 RAC: 196	Message 1827709 - Posted: 31 Oct 2016, 12:26:35 UTC - in response to Message 1827665. "Are you including only stock apps for 8.19? Since stock and optimized for 8.19 are identical at this time, maybe it is better to look at all work using r3528 apps." The crude way I crawl the SETI website is quite limited -- it takes a few hours to scrape out a report, and getting enough detail to discriminate specific versions would require literally 20x the queries. The easiest (and probably most consistent) filter is to just take stock apps running single GPUs (i.e. the vast majority). Sampling just the common case means I'm unlikely to have confounding results from weird custom builds like petri33's, command-line tweaks, concurrent tasks, overclocked parts, and hacks like the GUPPI Rescheduler. It's a good picture of baseline performance, which I find helpful as a basis of comparison when selecting new parts and comparing the results of system tweaks. It's not trying to be a leaderboard of the 'best' systems because we have that already. There is a periodic dump of some of the database that I use to push some aspects offline (e.g. hosts.gz) but there's no export of the tasks table; if I could get that I'd be able to do a lot more detailed analysis without grinding the SETI servers. ID: 1827709 ·

RueiKe Volunteer tester Send message Joined: 14 Feb 16 Posts: 492 Credit: 378,512,430 RAC: 785	Message 1827843 - Posted: 1 Nov 2016, 8:22:02 UTC - in response to Message 1827709. Just kind of a bummer to know my systems aren't in the mix. With the quick release of Lunatics into stock, I think it is more likely the case now that people running optimized apps would be behind stock. I have recommended that two people I interact with upgrade to stock! Lunatics only keeps you ahead if you go through the work of manual installs when a new app is available. That's probably a small number of people. Also, I noticed recently that stock apps with no arguments for Fury are giving nearly the same output as optimized arguments. Have defaults changed in the latest release? GitHub: Ricks-Lab Instagram: ricks_labs ID: 1827843 ·

Raistmer Volunteer developer Volunteer tester Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121	Message 1827845 - Posted: 1 Nov 2016, 8:33:53 UTC - in response to Message 1827843. "Have defaults changed in the latest release?" The defaults are now only a starting point. The app adapts to real GPU performance in the PulseFind area. SETI apps news We're not gonna fight them. We're gonna transcend them. ID: 1827845 ·

RueiKe Volunteer tester Send message Joined: 14 Feb 16 Posts: 492 Credit: 378,512,430 RAC: 785	Message 1827861 - Posted: 1 Nov 2016, 11:12:13 UTC - in response to Message 1827845. "The defaults are now only a starting point. The app adapts to real GPU performance in the PulseFind area." Very cool! Thanks for the update. GitHub: Ricks-Lab Instagram: ricks_labs ID: 1827861 ·

Shaggie76 Send message Joined: 9 Oct 09 Posts: 282 Credit: 271,858,118 RAC: 196	Message 1828730 - Posted: 6 Nov 2016, 2:01:34 UTC I did a few scans this weekend; taking only the results from this scan I get: [chart] But if I aggregate both this week's and last week's scans I still get the 980 Ti in the lead. Even when including just one week's worth of data, the top cards cover over a hundred different hosts and around 10,000 work-units, so the sampling is pretty thorough. ID: 1828730 ·

M_M Send message Joined: 20 May 04 Posts: 76 Credit: 45,752,966 RAC: 8	Message 1828803 - Posted: 6 Nov 2016, 6:42:54 UTC - in response to Message 1828730. Last modified: 6 Nov 2016, 6:43:16 UTC Thanks Shaggie. Any ideas on the 980ti/1080 case? According to nVidia, the 980ti is around 6 TFLOPS and the 1080 is around 9 TFLOPS. In terms of raw memory bandwidth they are almost the same, but the 1080 should also benefit from better memory compression, of around 20% as claimed by nVidia. ID: 1828803 ·

Shaggie76 Send message Joined: 9 Oct 09 Posts: 282 Credit: 271,858,118 RAC: 196	Message 1828866 - Posted: 6 Nov 2016, 13:37:19 UTC - in response to Message 1828803. Last modified: 6 Nov 2016, 13:37:55 UTC "Any ideas on the 980ti/1080 case? According to nVidia, the 980ti is around 6 TFLOPS and the 1080 is around 9 TFLOPS. In terms of raw memory bandwidth they are almost the same, but the 1080 should also benefit from better memory compression, of around 20% as claimed by nVidia." My guess is that you've got the answer right there: bandwidth. AFAIK memory compression is a trick for render-target transfers and won't help OpenCL. I think if you look at the relative bandwidth of the 1070 vs the 1080, the SETI credit/hour is suspiciously proportional, too. But that's just my guess. ID: 1828866 ·
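The bandwidth argument can be checked with a little arithmetic. The sketch below derives peak memory bandwidth from bus width and per-pin data rate; the reference-card figures used (384-bit at 7 Gbps for the 980 Ti, 256-bit at 10 Gbps for the 1080, 256-bit at 8 Gbps for the 1070) come from public spec sheets rather than from this thread, so treat them as assumptions. No credit/hour figures are included because the charts are not reproduced here.

```python
def bandwidth_gb_s(bus_width_bits, data_rate_gbps):
    """Peak memory bandwidth in GB/s: bytes per transfer times transfers per second."""
    return bus_width_bits / 8 * data_rate_gbps


# Reference-card memory specs (assumed from public spec sheets).
cards = {
    "GTX 980 Ti": bandwidth_gb_s(384, 7),
    "GTX 1080": bandwidth_gb_s(256, 10),
    "GTX 1070": bandwidth_gb_s(256, 8),
}

for name, bw in cards.items():
    print(f"{name}: {bw:.0f} GB/s")

# If credit/hour tracks bandwidth, these ratios should roughly match
# the ratios of the measured credit/hour values.
ratio_1070_vs_1080 = cards["GTX 1070"] / cards["GTX 1080"]
ratio_980ti_vs_1080 = cards["GTX 980 Ti"] / cards["GTX 1080"]
print(ratio_1070_vs_1080, ratio_980ti_vs_1080)
```

Under these assumptions the 1070 has 80% of the 1080's bandwidth and the 980 Ti has about 5% more, which is the pattern the guess above would predict for credit/hour.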

Raistmer Volunteer developer Volunteer tester Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121	Message 1828876 - Posted: 6 Nov 2016, 14:45:59 UTC - in response to Message 1828866. Parts like Pulse, Triplet and Autocorr are basically summations, so they are memory-bound in most cases. SETI apps news We're not gonna fight them. We're gonna transcend them. ID: 1828876 ·
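A rough roofline-style estimate illustrates why a summation ends up limited by memory rather than by FLOPS. It assumes a plain single-precision sum that performs one add per 4-byte value read; the real Pulse, Triplet and Autocorr kernels do more than that, so this is only an order-of-magnitude sketch. The TFLOPS figures are the approximate ones quoted earlier in the thread, and the bandwidth figures are assumed from public spec sheets.

```python
def attainable_gflops(peak_gflops, bandwidth_gb_s, flops_per_byte):
    """Roofline bound: limited by compute peak or by bandwidth times arithmetic intensity."""
    return min(peak_gflops, bandwidth_gb_s * flops_per_byte)


# One floating-point add per 4-byte float read from memory.
SUM_FP32_INTENSITY = 1 / 4

gtx_1080 = attainable_gflops(9000, 320, SUM_FP32_INTENSITY)
gtx_980ti = attainable_gflops(6000, 336, SUM_FP32_INTENSITY)

print(f"GTX 1080:   {gtx_1080:.0f} GFLOP/s attainable for a plain sum")
print(f"GTX 980 Ti: {gtx_980ti:.0f} GFLOP/s attainable for a plain sum")
```

Both cards land far below their compute peak in this model, and the 980 Ti comes out slightly ahead because of its slightly higher bandwidth.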

M_M Send message Joined: 20 May 04 Posts: 76 Credit: 45,752,966 RAC: 8	Message 1828885 - Posted: 6 Nov 2016, 16:09:33 UTC - in response to Message 1828866. To make matters worse, on Pascal nVidia limits compute work to the P2 power state, i.e. it throttles the memory clock back by approx. 10%, without giving users any proper reason. This is easy to check with GPU-Z or a similar tool while GPU tasks are running. :( ID: 1828885 ·

Keith Myers Volunteer tester Send message Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873	Message 1828892 - Posted: 6 Nov 2016, 16:58:49 UTC - in response to Message 1828885. "To make matters worse, on Pascal nVidia limits compute work to the P2 power state, i.e. it throttles the memory clock back by approx. 10%, without giving users any proper reason. This is easy to check with GPU-Z or a similar tool while GPU tasks are running. :(" And easy to overcome with available tools like NvidiaInspector. Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) ID: 1828892 ·

M_M Send message Joined: 20 May 04 Posts: 76 Credit: 45,752,966 RAC: 8	Message 1828969 - Posted: 7 Nov 2016, 5:40:16 UTC - in response to Message 1828892. "And easy to overcome with available tools like NvidiaInspector." I have tried it (Win10 and GTX1080) but I couldn't make it work. This was possible on Maxwell but not on Pascal, I think... ID: 1828969 ·

Keith Myers Volunteer tester Send message Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873	Message 1829053 - Posted: 7 Nov 2016, 22:01:43 UTC - in response to Message 1828969. "I have tried it (Win10 and GTX1080) but I couldn't make it work. This was possible on Maxwell but not on Pascal, I think..." Don't know about that. I find it interesting that it doesn't work on Win10 and Pascal. It's worked on all my machines so far, but the Win10 machine has my old Maxwell cards and not the newer Pascal cards. It works fine on my GTX970's on the Win10 machine. Just what kind of troubles or issues did you run into on Win10 and Pascal? I'd like to know because in the future I might upgrade the Win10 machine to Pascal cards. Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) ID: 1829053 ·

Zalster Volunteer tester Send message Joined: 27 May 99 Posts: 5517 Credit: 528,817,460 RAC: 242	Message 1829055 - Posted: 7 Nov 2016, 22:39:50 UTC - in response to Message 1829053. Just be wary of the Pascals from EVGA, lots of issues with temps in their forums lately ID: 1829055 ·

AMDave Volunteer tester Send message Joined: 9 Mar 01 Posts: 234 Credit: 11,671,730 RAC: 0	Message 1829080 - Posted: 8 Nov 2016, 0:15:48 UTC - in response to Message 1829055. Last modified: 8 Nov 2016, 0:16:52 UTC "Just be wary of the Pascals from EVGA, lots of issues with temps in their forums lately" EVGA has problems with GTX 1080/1070 FTW
► "... the company also says that the temperature of the VRM and memory, "in extreme circumstances", was marginally within spec and needed to be addressed. To fix the bug, EVGA will be rolling out a VBIOS update, which should adjust the fan speed curve to ensure sufficient cooling of all components. EVGA claims that this will resolve potential thermal problems..."
► "EVGA also notes that all graphics cards shipped from EVGA after 1st of November will have the VBIOS update applied."
ID: 1829080 ·

Zalster Volunteer tester Send message Joined: 27 May 99 Posts: 5517 Credit: 528,817,460 RAC: 242	Message 1829086 - Posted: 8 Nov 2016, 1:16:01 UTC - in response to Message 1829080. Last modified: 8 Nov 2016, 1:16:22 UTC They will also send you thermal pads that the user has to apply to both the VRAM and the VRMs. I'm waiting on them to send me both before I tear mine apart to place these on there. ID: 1829086 ·

©2024 University of California

SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.