GPU FLOPS: Theory vs Reality

Message boards : Number crunching : GPU FLOPS: Theory vs Reality

Shaggie76
Joined: 9 Oct 09
Posts: 282
Credit: 271,858,118
RAC: 196
Canada
Message 1827617 - Posted: 31 Oct 2016, 0:59:52 UTC

It's been a few weeks since 8.19 was released, so I've run another scan. New cards in the charts today: the NVIDIA GTX Titan Black and the AMD Baffin (i.e. RX 460?)

Here are the median 60% of the work units scanned:



I'm not sure if it's the new GPU optimizations that made it into stock 8.19 or if the recent mix of work-units has somehow been favorable to the slightly higher memory bandwidth in the 980 Ti -- either way I'm surprised to see it at the top. I'll probably run an incremental scan next weekend and see if it's still ahead.
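One plausible reading of "the median 60%" is a trimmed mean: sort the per-work-unit credit/hour rates, drop the fastest and slowest 20%, and average what remains. The sketch below assumes that reading (it is not necessarily the exact method behind the charts), and the `(credit, elapsed_seconds)` input format is a hypothetical stand-in for the scraped task data.

```python
import statistics


def middle_60_percent(values):
    """Return the samples left after dropping the lowest and highest 20% by rank."""
    ordered = sorted(values)
    cut = int(len(ordered) * 0.2)  # number of samples dropped from each end
    return ordered[cut:len(ordered) - cut]


def trimmed_credit_per_hour(samples):
    """Mean credit/hour over the middle 60% of work-units.

    `samples` is a list of (credit, elapsed_seconds) pairs, one per work-unit
    (a hypothetical format for the scraped task data).
    """
    rates = [credit / (elapsed_s / 3600.0) for credit, elapsed_s in samples]
    kept = middle_60_percent(rates)
    if not kept:
        raise ValueError("no samples to average")
    return statistics.mean(kept)
```

Trimming both tails keeps a handful of unusually short or unusually slow tasks from dragging the average around, at the cost of discarding 40% of the data.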

Here are the numbers of hosts and work-units aggregated, in case you want some idea of the confidence:
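As a rough gauge of that confidence, the standard error of a mean shrinks with the square root of the sample count. The sketch below applies the textbook formula to plain per-work-unit rates; it is only an approximation here, since it ignores the trimming and treats work-units as independent even though many come from the same host.

```python
import math
import statistics


def standard_error(rates):
    """Standard error of the mean: sample standard deviation / sqrt(n)."""
    if len(rates) < 2:
        raise ValueError("need at least two samples")
    return statistics.stdev(rates) / math.sqrt(len(rates))


def relative_error(rates):
    """Standard error expressed as a fraction of the mean."""
    return standard_error(rates) / statistics.mean(rates)
```

With the same spread of rates, quadrupling the number of work-units roughly halves the standard error.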

Zalster Special Project $250 donor
Volunteer tester
Joined: 27 May 99
Posts: 5517
Credit: 528,817,460
RAC: 242
United States
Message 1827627 - Posted: 31 Oct 2016, 1:39:36 UTC - in response to Message 1827617.  

I'm not sure if it's the new GPU optimizations that made it into stock 8.19 or if the recent mix of work-units has somehow been favorable to the slightly higher memory bandwidth in the 980 Ti -- either way I'm surprised to see it at the top. I'll probably run an incremental scan next weekend and see if it's still ahead.


I'm not surprised.

I've always said the 980Ti was more productive than the 1080 or 1070s based on my testing of them.

What will be interesting to see is the 1080 Tis in January. If they prove to be as productive as they sound, they might be an option for upgrading a few machines.
Shaggie76
Joined: 9 Oct 09
Posts: 282
Credit: 271,858,118
RAC: 196
Canada
Message 1827635 - Posted: 31 Oct 2016, 2:30:07 UTC

I'm not so sure about the 1080 Tis -- my guess is they'll still be a 250W card and perform somewhere between the 1080s and the Pascal Titans. From what I've seen in my scans the Titans are between 1200 and 1500 CR/hr, but there aren't enough of them to qualify for the charts.

Personally I'm thinking about a set of four 1070s @ 600W rather than a pair of 1080 Tis. My guess is it'll be about the same price as a pair of 1080 Tis, a bit faster, and a bit better credit/watt.

Of course if they're a 200W card instead I might be willing to bite the extra cost and just run 3.
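The trade-off above boils down to credit per unit of energy. A minimal sketch follows; the credit/hour and wattage inputs are placeholders to be filled in from the charts and card specs, and board power is used as a stand-in for total system draw.

```python
def credit_per_kwh(credit_per_hour, watts):
    """Credit earned per kilowatt-hour of power drawn."""
    if watts <= 0:
        raise ValueError("watts must be positive")
    return credit_per_hour / (watts / 1000.0)


def build_summary(cards, credit_per_hour_each, watts_each):
    """Total credit/hour, total watts and credit/kWh for N identical cards."""
    total_credit = cards * credit_per_hour_each
    total_watts = cards * watts_each
    return total_credit, total_watts, credit_per_kwh(total_credit, total_watts)
```

Because N identical cards have the same credit/kWh as one of them, the efficiency comparison hinges on the per-card numbers; the card count mostly sets total output and the power-supply budget.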
Stubbles
Volunteer tester
Joined: 29 Nov 99
Posts: 358
Credit: 5,909,255
RAC: 0
Canada
Message 1827643 - Posted: 31 Oct 2016, 3:20:19 UTC

Hey Shaggie!

I SOooo love to see the updates to GPU outputs. Great work as always!

Have you given any thought about putting that talent of yours towards promoting the importance of optimization with Lunatics v0.45 and MrK's prog?
...since running stock is very inefficient (especially because of the CPU stock app)!

My GTX1060 and my two GTX750Tis seem to have 30%-45% better throughput than I would get with stock...and that's without mentioning the almost 100% improvement on the CPU tasks with the Lunatics CPU app for my CPUs (Xeon W3550).

The way I see it: by improving my throughput I'm improving the project's overall throughput
...and if more SETIzens could see that in some charts, that would be visual data worth acting upon with what they already have.

I think your current charts are still incredible at showing the better buys since the electricity consumption is likely the greatest cost for dedicated crunchers. But optimization seems to me so important that I think it worthwhile to mention it to you again.

Just me submitting my wish list...again...for the greater good (aka throughput)! ;-}

Cheers,
RobG :-D
RueiKe Special Project $250 donor
Volunteer tester
Joined: 14 Feb 16
Posts: 492
Credit: 378,512,430
RAC: 785
Taiwan
Message 1827665 - Posted: 31 Oct 2016, 5:19:43 UTC - in response to Message 1827617.  

It's been a few weeks since 8.19 was released, so I've run another scan. New cards in the charts today: the NVIDIA GTX Titan Black and the AMD Baffin (i.e. RX 460?)

Here are the median 60% of the work units scanned:


I'm not sure if it's the new GPU optimizations that made it into stock 8.19 or if the recent mix of work-units has somehow been favorable to the slightly higher memory bandwidth in the 980 Ti -- either way I'm surprised to see it at the top. I'll probably run an incremental scan next weekend and see if it's still ahead.

Here are the numbers of hosts and work-units aggregated, in case you want some idea of the confidence:


Always look forward to your summary table! One question about the approach. Are you including only stock apps for 8.19? Since stock and optimized for 8.19 are identical at this time, maybe it is better to look at all work using r3528 apps.
GitHub: Ricks-Lab
Instagram: ricks_labs
Shaggie76
Joined: 9 Oct 09
Posts: 282
Credit: 271,858,118
RAC: 196
Canada
Message 1827709 - Posted: 31 Oct 2016, 12:26:35 UTC - in response to Message 1827665.  

Are you including only stock apps for 8.19? Since stock and optimized for 8.19 are identical at this time, maybe it is better to look at all work using r3528 apps.


The crude way I crawl the SETI website is quite limited -- it takes a few hours to scrape out a report and to get enough detail to discriminate specific versions would require literally 20x the queries. The easiest (and probably most consistent) filter is to just take stock apps running single-GPUs (ie: the vast majority). Sampling just the common case means I'm unlikely to have confounding results from weird custom builds like petri33's, command-line tweaks, concurrent tasks, overclocked parts, and hacks like the GUPPI Rescheduler.
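The filter described above amounts to a simple predicate over the scraped tasks. In the sketch below the `TaskSample` record and its field names are hypothetical; "anonymous platform" is BOINC's term for hosts that supply their own app through app_info.xml, which is how optimized installs such as Lunatics run.

```python
from dataclasses import dataclass


@dataclass
class TaskSample:
    # Hypothetical record for one scraped task; field names are illustrative only.
    host_id: int
    anonymous_platform: bool  # True if the host supplies its own app (app_info.xml)
    gpu_count: int            # number of GPUs reported by the host
    credit: float
    elapsed_s: float


def baseline_only(tasks):
    """Keep the common case: stock app on a single-GPU host."""
    return [t for t in tasks if not t.anonymous_platform and t.gpu_count == 1]
```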

It's a good picture of baseline performance, which I find helpful as a basis of comparison when selecting new parts and comparing the results of system tweaks. It's not trying to be a leaderboard of the 'best' systems because we have that already.

There is a periodic dump of some of the database that I use to push some aspects offline (eg hosts.gz) but there's no export of the tasks table; if I could get that I'd be able to do a lot more detailed analysis without grinding the SETI servers.
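Working offline from such a dump could start with something like the sketch below, which streams a gzipped XML file and counts hosts per GPU description. The element names (`host`, and `coprocs` for the GPU field) are assumptions about the dump's layout, not a documented schema; check them against a real hosts.gz before relying on the counts.

```python
import gzip
import io
import xml.etree.ElementTree as ET
from collections import Counter


def count_gpu_models(source, gpu_tag="coprocs"):
    """Count hosts per GPU description in a gzipped XML host dump.

    `source` is a path to the .gz file, or the raw gzip bytes.
    ASSUMPTION: each host is a <host> element whose GPU text lives in a child
    element named `gpu_tag`; both names are guesses.
    """
    fileobj = io.BytesIO(source) if isinstance(source, bytes) else source
    counts = Counter()
    with gzip.open(fileobj, "rb") as f:
        # iterparse streams the XML instead of building the whole tree up front
        for _event, elem in ET.iterparse(f, events=("end",)):
            if elem.tag != "host":
                continue
            gpu = (elem.findtext(gpu_tag) or "").strip()
            if gpu:
                counts[gpu] += 1
            elem.clear()  # drop the host's children once counted
    return counts
```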
RueiKe Special Project $250 donor
Volunteer tester
Joined: 14 Feb 16
Posts: 492
Credit: 378,512,430
RAC: 785
Taiwan
Message 1827843 - Posted: 1 Nov 2016, 8:22:02 UTC - in response to Message 1827709.  

Are you including only stock apps for 8.19? Since stock and optimized for 8.19 are identical at this time, maybe it is better to look at all work using r3528 apps.


The crude way I crawl the SETI website is quite limited -- it takes a few hours to scrape out a report and to get enough detail to discriminate specific versions would require literally 20x the queries. The easiest (and probably most consistent) filter is to just take stock apps running single-GPUs (ie: the vast majority). Sampling just the common case means I'm unlikely to have confounding results from weird custom builds like petri33's, command-line tweaks, concurrent tasks, overclocked parts, and hacks like the GUPPI Rescheduler.

It's a good picture of baseline performance, which I find helpful as a basis of comparison when selecting new parts and comparing the results of system tweaks. It's not trying to be a leaderboard of the 'best' systems because we have that already.

There is a periodic dump of some of the database that I use to push some aspects offline (eg hosts.gz) but there's no export of the tasks table; if I could get that I'd be able to do a lot more detailed analysis without grinding the SETI servers.


Just kind of a bummer to know my systems aren't in the mix. With the quick release of Lunatics into stock, I think it is now more likely that people running optimized apps are behind stock. I have advised two people I interact with to switch to stock! Lunatics only keeps you ahead if you go through the work of manual installs whenever a new app is available, and that's probably a small number of people. Also, I noticed recently that stock apps with no arguments on Fury are giving nearly the same output as with optimized arguments. Have defaults changed in the latest release?
GitHub: Ricks-Lab
Instagram: ricks_labs
Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1827845 - Posted: 1 Nov 2016, 8:33:53 UTC - in response to Message 1827843.  

Have defaults changed in the latest release?

Defaults are now only a starting point. The app adapts to the actual GPU performance in the PulseFind area.
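The post doesn't describe the mechanism, so the following is only a generic illustration of runtime adaptation, not the app's actual algorithm: start from a default work size per kernel launch, time each launch, and scale the size toward a target duration. The function names, the 5 ms target and the simulated timing are all hypothetical.

```python
def adapt_launch_size(size, measured_ms, target_ms=5.0, min_size=1, max_size=65536):
    """One feedback step: rescale the per-launch work size toward target_ms.

    Assumes launch time grows roughly linearly with size.
    """
    if measured_ms <= 0:
        return min(max_size, size * 2)  # too fast to time: just grow
    factor = target_ms / measured_ms
    factor = max(0.5, min(2.0, factor))  # cap each step so one noisy timing can't derail it
    return max(min_size, min(max_size, round(size * factor)))


def simulate(start_size, ms_per_item, steps=10, target_ms=5.0):
    """Run the loop against a fake GPU whose launch time is size * ms_per_item."""
    size = start_size
    for _ in range(steps):
        measured_ms = size * ms_per_item  # stand-in for timing a real kernel
        size = adapt_launch_size(size, measured_ms, target_ms)
    return size
```

A real implementation would also need to smooth noisy timings and respect whatever size limits the kernel imposes.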
SETI apps news
We're not gonna fight them. We're gonna transcend them.
RueiKe Special Project $250 donor
Volunteer tester
Joined: 14 Feb 16
Posts: 492
Credit: 378,512,430
RAC: 785
Taiwan
Message 1827861 - Posted: 1 Nov 2016, 11:12:13 UTC - in response to Message 1827845.  

Have defaults changed in the latest release?

Defaults are now only a starting point. The app adapts to the actual GPU performance in the PulseFind area.

Very cool! Thanks for the update.
GitHub: Ricks-Lab
Instagram: ricks_labs
Shaggie76
Joined: 9 Oct 09
Posts: 282
Credit: 271,858,118
RAC: 196
Canada
Message 1828730 - Posted: 6 Nov 2016, 2:01:34 UTC

I did a new scan this weekend; taking only the results from this scan I get:



But if I aggregate both this week's and last week's scans I still get the 980 Ti in the lead.



Even with just one week's worth of data, the top cards include over a hundred different hosts and around 10,000 work-units, so the sampling is pretty thorough.
M_M
Joined: 20 May 04
Posts: 76
Credit: 45,752,966
RAC: 8
Serbia
Message 1828803 - Posted: 6 Nov 2016, 6:42:54 UTC - in response to Message 1828730.  
Last modified: 6 Nov 2016, 6:43:16 UTC

Thanks Shaggie.

Any ideas on the 980 Ti vs 1080 case?

According to nVidia, the 980 Ti is around 6 TFLOPS and the 1080 is around 9 TFLOPS. In raw memory bandwidth they are almost the same, but the 1080 should also benefit from better memory compression of around 20%, as claimed by nVidia.
Shaggie76
Joined: 9 Oct 09
Posts: 282
Credit: 271,858,118
RAC: 196
Canada
Message 1828866 - Posted: 6 Nov 2016, 13:37:19 UTC - in response to Message 1828803.  
Last modified: 6 Nov 2016, 13:37:55 UTC

Any ideas on the 980 Ti vs 1080 case?

According to nVidia, the 980 Ti is around 6 TFLOPS and the 1080 is around 9 TFLOPS. In raw memory bandwidth they are almost the same, but the 1080 should also benefit from better memory compression of around 20%, as claimed by nVidia.

My guess is that you've got the answer right there: bandwidth. AFAIK memory compression is a trick for render-target transfers and won't help OpenCL. I think if you look at the relative bandwidth of the 1070 vs the 1080 the SETI credit/hour is suspiciously proportional, too. But that's just my guess.
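The bandwidth hypothesis can be checked with back-of-the-envelope ratios. The sketch uses the round TFLOPS figures quoted above for the 980 Ti and 1080, NVIDIA's ~6.5 TFLOPS for the 1070, and the cards' published memory bandwidths (336, 320 and 256 GB/s); treat them as approximate reference specs, not measurements.

```python
# Approximate reference specs: (FP32 TFLOPS, memory bandwidth in GB/s).
SPECS = {
    "GTX 980 Ti": (6.0, 336.0),
    "GTX 1080": (9.0, 320.0),
    "GTX 1070": (6.5, 256.0),
}


def spec_ratios(card_a, card_b):
    """Return (FLOPS ratio, bandwidth ratio) of card_a relative to card_b."""
    flops_a, bw_a = SPECS[card_a]
    flops_b, bw_b = SPECS[card_b]
    return flops_a / flops_b, bw_a / bw_b


for a, b in [("GTX 1070", "GTX 1080"), ("GTX 980 Ti", "GTX 1080")]:
    flops_ratio, bw_ratio = spec_ratios(a, b)
    print(f"{a} vs {b}: FLOPS x{flops_ratio:.2f}, bandwidth x{bw_ratio:.2f}")
```

If credit/hour tracks the bandwidth ratios (0.80 and 1.05) rather than the FLOPS ratios (0.72 and 0.67), the workload is behaving as bandwidth-bound, which would fit the 980 Ti edging out the 1080.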
Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1828876 - Posted: 6 Nov 2016, 14:45:59 UTC - in response to Message 1828866.  

Parts like Pulse, Triplet and Autocorr are basically summations, so they are memory-bound in most cases.
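This fits the standard roofline model: attainable throughput is the lesser of peak compute and memory bandwidth times arithmetic intensity (FLOPs per byte moved). A naive float32 summation does one add per 4-byte value read, about 0.25 FLOP/byte. The sketch plugs in the ~6 and ~9 TFLOPS figures quoted in the thread plus published bandwidths of 336 and 320 GB/s; real kernels reuse some data through caches and shared memory, so their intensity is somewhat higher, but the conclusion holds as long as it stays low.

```python
def attainable_gflops(peak_gflops, bandwidth_gbs, flops_per_byte):
    """Roofline model: performance is capped by compute or by memory traffic."""
    return min(peak_gflops, bandwidth_gbs * flops_per_byte)


# Naive float32 summation: one add per 4-byte element read.
SUM_INTENSITY = 1.0 / 4.0  # FLOP per byte

# Approximate reference specs: (peak FP32 GFLOPS, bandwidth GB/s).
CARDS = {"GTX 980 Ti": (6000.0, 336.0), "GTX 1080": (9000.0, 320.0)}

for name, (peak, bw) in CARDS.items():
    limit = attainable_gflops(peak, bw, SUM_INTENSITY)
    print(f"{name}: {limit:.0f} GFLOPS attainable of {peak:.0f} peak")
```

Both cards land at roughly 1% of their peak, and their ranking follows bandwidth rather than FLOPS.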
SETI apps news
We're not gonna fight them. We're gonna transcend them.
M_M
Joined: 20 May 04
Posts: 76
Credit: 45,752,966
RAC: 8
Serbia
Message 1828885 - Posted: 6 Nov 2016, 16:09:33 UTC - in response to Message 1828866.  

To make it even worse, on Pascal nVidia limits compute workloads to the P2 power state, i.e. throttles the memory clock back by approx. 10%, without giving users any proper reason. This is easy to check with GPU-Z or a similar tool while GPU tasks are running. :(
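GPU-Z is a Windows GUI; as a scriptable alternative (swapped in here, not the tool named in the post), nvidia-smi can report each GPU's P-state alongside its current and maximum memory clock. The sketch assumes nvidia-smi is on the PATH; the parsing and the deficit calculation are separate functions so they can be tried on captured output.

```python
import subprocess

QUERY_FIELDS = "pstate,clocks.mem,clocks.max.mem"


def parse_clock_rows(text):
    """Parse `--format=csv,noheader,nounits` output into (pstate, mem_mhz, max_mem_mhz)."""
    rows = []
    for line in text.strip().splitlines():
        # A field reported as "[N/A]" would make int() raise ValueError here.
        pstate, mem, max_mem = (field.strip() for field in line.split(","))
        rows.append((pstate, int(mem), int(max_mem)))
    return rows


def memory_clock_deficit(mem_mhz, max_mem_mhz):
    """Fraction by which the current memory clock sits below the maximum."""
    return 1.0 - mem_mhz / max_mem_mhz


def read_gpu_clocks():
    """Query every GPU via nvidia-smi (requires an NVIDIA driver install)."""
    result = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY_FIELDS}", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return parse_clock_rows(result.stdout)
```

Run `read_gpu_clocks()` while tasks are crunching; a row in P2 with the memory clock roughly 10% below its maximum would match the behaviour described above.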
Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1828892 - Posted: 6 Nov 2016, 16:58:49 UTC - in response to Message 1828885.  

To make it even worse, on Pascal nVidia limits compute workloads to the P2 power state, i.e. throttles the memory clock back by approx. 10%, without giving users any proper reason. This is easy to check with GPU-Z or a similar tool while GPU tasks are running. :(

And easy to overcome with available tools like NvidiaInspector.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
M_M
Joined: 20 May 04
Posts: 76
Credit: 45,752,966
RAC: 8
Serbia
Message 1828969 - Posted: 7 Nov 2016, 5:40:16 UTC - in response to Message 1828892.  

To make it even worse, on Pascal nVidia limits compute workloads to the P2 power state, i.e. throttles the memory clock back by approx. 10%, without giving users any proper reason. This is easy to check with GPU-Z or a similar tool while GPU tasks are running. :(

And easy to overcome with available tools like NvidiaInspector.


I have tried it (Win10 and GTX1080) but I couldn't make it work.

This was possible on Maxwell but not on Pascal I think...
Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1829053 - Posted: 7 Nov 2016, 22:01:43 UTC - in response to Message 1828969.  

To make it even worse, on Pascal nVidia limits compute workloads to the P2 power state, i.e. throttles the memory clock back by approx. 10%, without giving users any proper reason. This is easy to check with GPU-Z or a similar tool while GPU tasks are running. :(

And easy to overcome with available tools like NvidiaInspector.


I have tried it (Win10 and GTX1080) but I couldn't make it work.

This was possible on Maxwell but not on Pascal I think...

Don't know about that. I find it interesting that it doesn't work on Win10 and Pascal. It's worked on all my machines so far, but the Win10 machine has my old Maxwell cards and not the newer Pascal cards. It works fine on my GTX970s on the Win10 machine.

Just what kind of troubles or issues did you run into on Win10 and Pascal? I'd like to know because in the future I might upgrade the Win10 machine to Pascal cards.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
Zalster Special Project $250 donor
Volunteer tester
Joined: 27 May 99
Posts: 5517
Credit: 528,817,460
RAC: 242
United States
Message 1829055 - Posted: 7 Nov 2016, 22:39:50 UTC - in response to Message 1829053.  

Just be wary of the Pascals from EVGA, lots of issues with temps in their forums lately
AMDave
Volunteer tester
Joined: 9 Mar 01
Posts: 234
Credit: 11,671,730
RAC: 0
United States
Message 1829080 - Posted: 8 Nov 2016, 0:15:48 UTC - in response to Message 1829055.  
Last modified: 8 Nov 2016, 0:16:52 UTC

Just be wary of the Pascals from EVGA, lots of issues with temps in their forums lately

EVGA has problems with GTX 1080/1070 FTW
    ►  "... the company also says that the temperature of the VRM and memory, "in extreme circumstances", was
         marginally within spec and needed to be addressed.

         To fix the bug, EVGA will be rolling out a VBIOS update, which should adjust the fan speed curve to ensure
         sufficient cooling of all components. EVGA claims that this will resolve potential thermal problems..."

    ►  "EVGA also notes that all graphics cards shipped from EVGA after 1st of November will have the VBIOS
         update applied."

Zalster Special Project $250 donor
Volunteer tester
Joined: 27 May 99
Posts: 5517
Credit: 528,817,460
RAC: 242
United States
Message 1829086 - Posted: 8 Nov 2016, 1:16:01 UTC - in response to Message 1829080.  
Last modified: 8 Nov 2016, 1:16:22 UTC

They will also send you thermal pads that the user has to apply to both the VRAM and the VRMs. I'm waiting for them to send me both before I tear mine apart to put these on.


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.