GPU FLOPS: Theory vs Reality

Message boards : Number crunching : GPU FLOPS: Theory vs Reality

Micky Badgero
Joined: 26 Jul 16
Posts: 44
Credit: 21,373,673
RAC: 83
United States
Message 1818132 - Posted: 19 Sep 2016, 4:39:58 UTC - in response to Message 1816265.  

It is not clear that credit is a meaningful measure. For instance, I had one AstroPulse task that ran for two days and got 500 credits, and another that ran for ten hours and also got 500 credits. Credits per hour are nowhere near constant, even for the SETI@home tasks.
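Here is a quick worked example of the point above, using only the two AstroPulse run times given. With the same 500 credits, credit per hour is simply credit divided by wall-clock hours, so the two-day task earned roughly a fifth of the rate of the ten-hour one. This is a minimal Python sketch for illustration. The function name is hypothetical and not part of any SETI@home tool.

```python
def credit_per_hour(credit: float, runtime_hours: float) -> float:
    """Credit granted for a task divided by its wall-clock run time in hours."""
    if runtime_hours <= 0:
        raise ValueError("runtime_hours must be positive")
    return credit / runtime_hours


# The two AstroPulse tasks from the post: same credit, very different run times.
slow = credit_per_hour(500, 48)  # ran for two days
fast = credit_per_hour(500, 10)  # ran for ten hours
print(f"{slow:.1f} vs {fast:.1f} credits/hour")  # 10.4 vs 50.0 credits/hour
```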
jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1818181 - Posted: 19 Sep 2016, 9:58:33 UTC - in response to Message 1818100.  
Last modified: 19 Sep 2016, 10:01:47 UTC

Cheers. I had thought the stock OpenCL CPU usage issue had been solved with a default -use_sleep option, to more closely resemble Cuda's behaviour... guess not.

In any case, my Win+GTX 980 host is now running Petri's pre-Alpha optimisations, single instance. It looks like there's a fair bit to generalise, though judging by those figures, some care would bring Cuda back onto the charts. It would be interesting to get some idea of where my host plus the new code might fall on the charts, even if the heavy GUPPI bias subsides.

On the tricky moving-target issue of power for the Credit/Whr comparisons, I'm noticing there is still decent headroom available in terms of Power%, temperature, and any other metric I look at. My guess is that Cuda and OpenCL will just end up trading blows until it comes down to splitting hairs. At that point we'll probably switch to other techniques anyway.
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
petri33
Volunteer tester
Joined: 6 Jun 02
Posts: 1668
Credit: 623,086,772
RAC: 156
Finland
Message 1818258 - Posted: 19 Sep 2016, 18:12:51 UTC - in response to Message 1818181.  

Can you add -unroll 16 to your options?
To overcome Heisenbergs:
"You can't always get what you want / but if you try sometimes you just might find / you get what you need." -- Rolling Stones
jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1818282 - Posted: 19 Sep 2016, 21:03:44 UTC - in response to Message 1818258.  

Did that for a bit, though I needed to lighten the load for unrelated reasons. I'll be able to wind out the settings for today, though bear in mind I'm going for comfort, lol
petri33
Volunteer tester
Joined: 6 Jun 02
Posts: 1668
Credit: 623,086,772
RAC: 156
Finland
Message 1818283 - Posted: 19 Sep 2016, 21:05:49 UTC - in response to Message 1818282.  

Yeah,

I was hoping to see some GUPPI tasks get down to around 300-500 seconds instead of the 1000 they take now. :)
jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1818285 - Posted: 19 Sep 2016, 21:11:04 UTC - in response to Message 1818283.  
Last modified: 19 Sep 2016, 21:25:31 UTC

That'll be tricky while driving 2 x 27" displays, watching video, and underfeeding it with a Core 2 Duo, haha. [Edit:] Added a touch of core voltage and a core + memory offset. Not sure it'll stay stable, or be fed anywhere near fast enough over PCIe 1.1, but whatever.
petri33
Volunteer tester
Joined: 6 Jun 02
Posts: 1668
Credit: 623,086,772
RAC: 156
Finland
Message 1818292 - Posted: 19 Sep 2016, 21:26:26 UTC - in response to Message 1818285.

My 130" video screen ...

The Infinity speakers are nowadays Genelec.
jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1818298 - Posted: 19 Sep 2016, 21:37:11 UTC - in response to Message 1818292.  
Last modified: 19 Sep 2016, 21:57:43 UTC

Well, I jammed the process priority up to normal as well, in case Windows is sitting on the process at all. Yeah, Linux latencies are likely the lowest until the latest kernels + drivers, where they'll probably push for a virtualised memory structure. It'll probably be a case of squeezing the quirks out here on this mismatched system, then swapping the 980 into the Mac Pro to see what it can do under OS X, then Win10 and Linux.

[Edit:] Bah, the GPU core clocks etc. do nothing --> system bound... oh well.
ausymark
Joined: 9 Aug 99
Posts: 95
Credit: 10,175,128
RAC: 0
Australia
Message 1820357 - Posted: 29 Sep 2016, 1:33:51 UTC - in response to Message 1800336.  

Hi Shaggie76

I have just jumped onto the red team with a Sapphire Radeon RX 480. Currently crunching some work units on the GPU. Feel free to look through my stats in a couple days and you should have some more 480 data to play with ;)

Cheers

Mark
ausymark
Joined: 9 Aug 99
Posts: 95
Credit: 10,175,128
RAC: 0
Australia
Message 1820768 - Posted: 30 Sep 2016, 6:16:57 UTC - in response to Message 1820357.  

Just as an FYI....

Preliminary results show that the AMD 480 is chomping through work units 10 to 15 times faster than my original dedicated nVidia 570 .... O.O

It will be interesting to see how things go over time. Note that I am also contributing to other projects, and the AMD 480 is now the only card in my system (originally I had a 580 as my primary graphics card, with a dedicated 570 for crunching).
ausymark
Joined: 9 Aug 99
Posts: 95
Credit: 10,175,128
RAC: 0
Australia
Message 1820777 - Posted: 30 Sep 2016, 7:40:05 UTC - in response to Message 1820768.  

Oops, correction.....

The load on the CPU is 10 to 15 times less.....

Total run times seem to be about the same, but I am also using half the electricity to do it.....

So I will watch my RAC as time goes on to see what happens :)
Shaggie76
Joined: 9 Oct 09
Posts: 282
Credit: 271,858,118
RAC: 196
Canada
Message 1820965 - Posted: 1 Oct 2016, 0:18:00 UTC

I ran another scan today; the results are now aggregated with the data from last time, which seems to fatten up the median-60% range of results.



The 1060s are currently tracked separately: 65 hosts and 4975 work-units for 6GB cards, but only 11 hosts and 666 work-units for 3GB cards, so maybe after another week of tasks the numbers will even out. If there doesn't seem to be any difference, I'll just combine them.
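The post doesn't spell out how "median-60%" is computed. One plausible reading, assumed here, is the band between the 20th and 80th percentiles of per-task credit/hour. Below is a minimal Python sketch of that reading. It groups results by a card key so the 6GB and 3GB GTX 1060 variants stay separate until they can be compared. The sample values are made up for illustration and are not from the scan. The actual script is Perl and may work quite differently, and the function names here are hypothetical.

```python
from collections import defaultdict
from statistics import median, quantiles


def middle_60(samples):
    """Return (20th percentile, median, 80th percentile) of credit/hour samples.

    Assumption: "median-60%" means the central 60% of samples around the median.
    """
    # quantiles(..., n=5) returns the four cut points at 20%, 40%, 60% and 80%.
    cuts = quantiles(samples, n=5)
    return cuts[0], median(samples), cuts[-1]


def group_by_card(results):
    """Group (card_key, credit_per_hour) pairs into {card_key: [samples]}."""
    groups = defaultdict(list)
    for card, cph in results:
        groups[card].append(cph)
    return dict(groups)


# Made-up credit/hour samples, keyed separately for the two 1060 variants.
results = [("GTX 1060 6GB", v) for v in (480, 510, 495, 530, 470, 505, 520, 490, 500, 515)]
results += [("GTX 1060 3GB", v) for v in (475, 500, 490, 525, 465)]

groups = group_by_card(results)
for card, samples in groups.items():
    low, mid, high = middle_60(samples)
    print(f"{card}: n={len(samples)}, middle 60% = {low:.0f}..{high:.0f}, median {mid:.1f}")
```

Keeping the memory size in the grouping key makes it easy to merge the two variants later if their bands turn out to overlap.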

Another thing I found interesting: the script now checks for traces of "-instances_per_device", and from what I can tell none of the hosts I've scanned are running multiple instances (this is probably limited to people running Lunatics, which I don't include in this scan). I know the detection works because it picks it up on my own hosts (stock, but 2 tasks per GPU).
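Here is a rough Python sketch of that kind of check. Purely for illustration, it assumes the option appears in a task's stderr text as `-instances_per_device` followed by a number. The real stderr layout may differ, and the regex, function names, and sample strings are all hypothetical. One plausible use of the count is adjusting throughput. With N tasks sharing a GPU, each task's run time stretches, so the whole-GPU credit/hour is roughly N times the per-task rate.

```python
import re

# Hypothetical pattern: the flag followed by whitespace and an integer count.
INSTANCES_RE = re.compile(r"-instances_per_device\s+(\d+)")


def instances_per_device(stderr_text: str) -> int:
    """Return the instance count found in a task's stderr text, or 1 if absent."""
    match = INSTANCES_RE.search(stderr_text)
    return int(match.group(1)) if match else 1


def device_credit_per_hour(task_credit_per_hour: float, stderr_text: str) -> float:
    """Approximate whole-GPU rate: per-task rate times concurrent tasks on the GPU."""
    return task_credit_per_hour * instances_per_device(stderr_text)


# Made-up stderr fragments for illustration only.
stock = "Running on device 0 ..."
doubled = "Command line: -instances_per_device 2 ..."
print(instances_per_device(stock), instances_per_device(doubled))  # 1 2
```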

I'm not sure I like how median-60 looks when there's a lot of data. I need to find a better native graphing module for Perl so I can try more things without mucking around in Excel after every run.
Shaggie76
Joined: 9 Oct 09
Posts: 282
Credit: 271,858,118
RAC: 196
Canada
Message 1820966 - Posted: 1 Oct 2016, 0:21:15 UTC - in response to Message 1820357.  

Thank you, but I don't need individual hosts any more; my current data collection includes 43 RX 4x0 cards. It's scanned 2749 work-units now, so it has a pretty good picture (it shows up as Ellesmere in the chart below).
jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1820967 - Posted: 1 Oct 2016, 0:30:44 UTC - in response to Message 1820965.  
Last modified: 1 Oct 2016, 0:36:27 UTC

Another option to throw in, though possibly more work, might be to generate raw .svg straight from the scripts (easily opened standalone in browsers, or embedded in HTML). The practicality of that option probably depends on how the SVG format compares to what you're already using to parse the BOINC XML files.
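Here is a sketch of the raw-SVG idea. SVG is plain XML text, so a script can write a bar chart with nothing but string formatting. This minimal Python version takes (label, value) pairs, such as card names and median credit/hour, and scales each bar to the largest value. The layout constants, the function name, and the placeholder labels and values are arbitrary choices for illustration, not data from the scan.

```python
from xml.sax.saxutils import escape


def bar_chart_svg(rows, width=600, bar_height=20, label_width=150):
    """Render (label, value) pairs as a minimal horizontal bar chart in SVG."""
    max_value = max(value for _, value in rows)
    plot_width = width - label_width - 10
    height = bar_height * len(rows) + 10
    parts = [f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" height="{height}">']
    for i, (label, value) in enumerate(rows):
        y = 5 + i * bar_height
        bar_len = plot_width * value / max_value  # longest bar fills the plot area
        # Right-aligned label to the left of the bar; escape() guards &, < and >.
        parts.append(
            f'<text x="{label_width - 5}" y="{y + bar_height * 0.7:.1f}" '
            f'text-anchor="end" font-size="12">{escape(label)}</text>'
        )
        parts.append(
            f'<rect x="{label_width}" y="{y}" width="{bar_len:.1f}" '
            f'height="{bar_height - 4}" fill="steelblue"/>'
        )
    parts.append("</svg>")
    return "\n".join(parts)


# Placeholder rows; a real script would feed in its aggregated results.
svg = bar_chart_svg([("Card A", 500.0), ("Card B", 350.0), ("Card C & D", 120.0)])
```

The returned string can be written to a `.svg` file and opened directly in a browser, or pasted inline into an HTML page.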

[Edit:] I'm surprised to see the Nvidia 2xx series hanging in there on the charts. They weren't efficient when they were released :)
Shaggie76
Joined: 9 Oct 09
Posts: 282
Credit: 271,858,118
RAC: 196
Canada
Message 1820969 - Posted: 1 Oct 2016, 0:46:23 UTC - in response to Message 1820967.  

Yeah, I've considered pooping out HTML with embedded JS for Google Charts, which I've used before, but it's not something I can embed here. I just skimmed the state of GD::Graph and it hasn't gotten any prettier in the 10 years since I last used it.

I'm wondering about showing the Normal distribution for GUPPIs vs Arecibo; separating the two should greatly reduce the variance, so it might look nice if I can Excel-fu it.
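This example illustrates why splitting by task source should shrink the spread. Suppose GUPPI and Arecibo tasks pay out at different rates on the same card. Pooling them then mixes two clusters, and the pooled standard deviation is dominated by the gap between their means. Below is a minimal Python sketch with made-up credit/hour numbers, not measurements.

```python
from statistics import mean, pstdev

# Made-up credit/hour samples for a single card, split by task source.
guppi = [300, 310, 295, 305]
arecibo = [500, 520, 490, 510]
pooled = guppi + arecibo

for name, samples in (("GUPPI", guppi), ("Arecibo", arecibo), ("pooled", pooled)):
    # Population standard deviation of each group around its own mean.
    print(f"{name:8s} mean={mean(samples):6.1f}  std-dev={pstdev(samples):6.1f}")
```

In these made-up numbers, each group's spread is roughly 5 to 11 credits/hour, while the pooled spread is about 100.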
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13720
Credit: 208,696,464
RAC: 304
Australia
Message 1820970 - Posted: 1 Oct 2016, 1:04:30 UTC - in response to Message 1820967.  

And while the GTX 750/750 Tis don't produce much in the way of Credit/Hour, they still rank near the top of the chart for work done per Watt Hour.
As good as Pascal is, I reckon the GTX 750/750 Ti ranks as one of Nvidia's best achievements, if not the best.
Grant
Darwin NT
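For reference, the work-per-energy figure is just the hourly credit rate divided by average power draw, since (credit/hour) ÷ W = credit per Wh. The small Python sketch below uses made-up numbers, not measured figures for any real card. It shows how a slower, low-power card can still come out ahead on this metric. Which power figure to plug in (board power or wall-socket draw) is a separate choice, and the function name is hypothetical.

```python
def credit_per_watt_hour(credit_per_hour: float, avg_watts: float) -> float:
    """Hourly credit rate divided by average power draw (W) gives credit per Wh."""
    if avg_watts <= 0:
        raise ValueError("avg_watts must be positive")
    return credit_per_hour / avg_watts


# Made-up figures: a slow low-power card vs a fast high-power card.
small = credit_per_watt_hour(250, 60)   # about 4.2 credits per Wh
large = credit_per_watt_hour(500, 180)  # about 2.8 credits per Wh
print(f"small card: {small:.2f} cr/Wh, large card: {large:.2f} cr/Wh")
```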
Jimbocous
Project Donor
Volunteer tester
Joined: 1 Apr 13
Posts: 1849
Credit: 268,616,081
RAC: 1,349
United States
Message 1820972 - Posted: 1 Oct 2016, 1:05:37 UTC

Fascinating stuff. Thanks for doing this!
Shaggie76
Joined: 9 Oct 09
Posts: 282
Credit: 271,858,118
RAC: 196
Canada
Message 1820983 - Posted: 1 Oct 2016, 1:47:29 UTC

One more graph tonight: the left side of each bar is the mean Credit/Hr for GUPPIs, and the right-hand side is the mean for Arecibo tasks. I tried overlaying error bars at each end for the standard deviation of each, but they were shockingly long and overlapped.

jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1820990 - Posted: 1 Oct 2016, 2:02:13 UTC - in response to Message 1820970.  
Last modified: 1 Oct 2016, 2:05:32 UTC

Yeah, architecturally the tipping point toward efficiency was actually the 460, which reversed some mistakes made with the 480, and was then slightly refined with the 560 Ti (which had some new problems of its own to refine). That balance was refined and optimised further in the 750 Ti, which tested the Maxwell generation while the process nodes matured. Pascal is then pretty much Maxwell die-shrunk and on speed (similar clock-for-clock performance, lower power, higher clocks).

So IMO the evolution definitely shows in the graphs as they are, which may be more than enough for a lot of people to make informed choices for their needs. The 1060s are looking like pretty good value right now (total cost of ownership), and I'll be very curious to see whether the 1050/1050 Ti stack up, achieve an even better balance, or sit behind the 1060s in performance per Watt (and $).
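One way to fold "value" and "total cost of ownership" into a single number is cost per unit of credit over the card's service life. That means purchase price plus electricity, divided by the credit produced. Below is a minimal Python sketch. Every input (price, power, rate, hours, electricity tariff) is a hypothetical placeholder, not a figure from the charts, and the function name is made up for illustration.

```python
def cost_per_million_credits(price, avg_watts, credit_per_hour, hours, price_per_kwh):
    """Total cost of ownership (purchase + electricity) per million credits."""
    energy_kwh = avg_watts * hours / 1000  # W * h -> kWh
    total_cost = price + energy_kwh * price_per_kwh
    total_credit = credit_per_hour * hours
    return total_cost / total_credit * 1_000_000


# Hypothetical card: $250, 120 W average, 500 credits/hour,
# one year of 24/7 crunching at $0.25 per kWh.
year = 24 * 365
print(f"${cost_per_million_credits(250, 120, 500, year, 0.25):.2f} per million credits")
```

Lower is better. Comparing two cards then only needs each card's price, measured power draw and typical credit rate.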
shizaru
Volunteer tester
Joined: 14 Jun 04
Posts: 1130
Credit: 1,967,904
RAC: 0
Greece
Message 1821184 - Posted: 2 Oct 2016, 0:15:51 UTC

@Shaggie76 This thread is 10-kinds-of-awesome. Thank you and all the Seti wizards sooo much :)
- - - - -
@petri33
My 130" video screen ...
Don't get me wrong, I'm a JBL/Infinity man myself but when I saw that pic I was about to shoot my mouth off and say, "Why you no buy Genelec in the land of Genelec!?" But then:
The Infinity speakers are nowadays Genelec.

Of COURSE they are ;) Best active speakers on the planet...
- - - - -
@jason I came in to post this in the GPU-Wars thread, but here'll do fine :)

The 1060s are looking like pretty good value right now (total cost of ownership), and I'll be very curious to see if the 1050/1050ti stack up, achieve even better balance, or sit behind the 1060s in performance per Watt (and $)


First of all, I'm happy to see that Shaggie has confirmed that the GTX 750/750TIs are the stuff of legend :D
Now... rumor has it that the upcoming 1050/1050TIs will be a GP107 chip so hopefully a lot of people will get their wish!
 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.