GPU FLOPS: Theory vs Reality


I3APR
Joined: 23 Apr 16
Posts: 99
Credit: 70,717,488
RAC: 0
Italy
Message 1802721 - Posted: 15 Jul 2016, 16:19:55 UTC - in response to Message 1802617.  
The results weren't too crazy:
[charts]

First of all, Shaggie, really nice work we have here!! A big service to the community, as deciding what to use to crunch is easier now!!

But you scared me: my system has 3x GTX660ti, 1x GTX780ti and 1x GTX1080, so, by visually extrapolating my data from your graph:

- Worst scenario: 2700 WU/h
- Best scenario: 3010 WU/h

And my average credit per day now is about 1950, and that's with a medium OC!!!

Dunno what to say...

Care to run a report on the last 5 crunching days of my system, or is it too much work? http://setiathome.berkeley.edu/results.php?hostid=8035198

Anyway, thank you!!

A.
ID: 1802721
petri33
Volunteer tester
Joined: 6 Jun 02
Posts: 1668
Credit: 623,086,772
RAC: 156
Finland
Message 1802777 - Posted: 15 Jul 2016, 22:21:08 UTC - in response to Message 1802721.  
The results weren't too crazy:
[charts]

First of all, Shaggie, really nice work we have here!! A big service to the community, as deciding what to use to crunch is easier now!!

But you scared me: my system has 3x GTX660ti, 1x GTX780ti and 1x GTX1080, so, by visually extrapolating my data from your graph:

- Worst scenario: 2700 WU/h
- Best scenario: 3010 WU/h

And my average credit per day now is about 1950, and that's with a medium OC!!!

Dunno what to say...

Care to run a report on the last 5 crunching days of my system, or is it too much work? http://setiathome.berkeley.edu/results.php?hostid=8035198

Anyway, thank you!!

A.


Nice graphs! Thank You!

Where (with a red X) would a random computer (with experimental software) be?
To overcome Heisenbergs:
"You can't always get what you want / but if you try sometimes you just might find / you get what you need." -- Rolling Stones
ID: 1802777
Shaggie76
Joined: 9 Oct 09
Posts: 282
Credit: 271,858,118
RAC: 196
Canada
Message 1802797 - Posted: 15 Jul 2016, 23:42:11 UTC - in response to Message 1802679.  

Say Shaggie, if you're interested, I have a _ton_ more data that has been logged since I installed emfers program, just let me know and I'll send it to you to review.


Thanks, but I think it's probably better to cast a wider net -- I'm sure the scan is still picking up your results, though, since there aren't many 1080's out there yet.
ID: 1802797
Shaggie76
Joined: 9 Oct 09
Posts: 282
Credit: 271,858,118
RAC: 196
Canada
Message 1802801 - Posted: 15 Jul 2016, 23:48:05 UTC - in response to Message 1802718.  

How much knowledge would be needed to adapt your script to find out how many of the top 10,000 computers/hosts have an "anonymous platform" on the PC's "apps" page (at the bottom)?


A bit of Perl fu to wget down the leaderboard, then a bunch more wgets to pull down each page; maybe half an hour of farting around if you knew what you were doing.
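
Something like this rough, untested sketch -- the page names are standard BOINC, but the link pattern and the "Anonymous platform" text I'm matching on are assumptions:

#!/usr/bin/perl
use strict;
use warnings;

# Sketch: walk the top-hosts list 20 hosts per page and count how many
# report "Anonymous platform" on their application-details page.
my ($anon, $total) = (0, 0);
for (my $offset = 0; $offset < 10000; $offset += 20)
{
    my $list = `wget -qO- "http://setiathome.berkeley.edu/top_hosts.php?offset=$offset"`;
    while ($list =~ m/show_host_detail\.php\?hostid=(\d+)/g)
    {
        my $hostid = $1;
        my $apps = `wget -qO- "http://setiathome.berkeley.edu/host_app_versions.php?hostid=$hostid"`;
        ++$anon if $apps =~ /Anonymous platform/i;
        ++$total;
        sleep(1); # don't hammer the server
    }
}
printf("%d of %d hosts run the anonymous platform\n", $anon, $total);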

I'd be a little hesitant to inflict 10,000 queries on the server, though -- I feel bad enough with the hundreds I've already bounced off of it.

Why does this interest you?
ID: 1802801
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13727
Credit: 208,696,464
RAC: 304
Australia
Message 1802805 - Posted: 15 Jul 2016, 23:51:20 UTC - in response to Message 1802801.  

Why does this interest you?

There has always been a lot of speculation about the number of systems that run stock, and the numbers that use the anonymous platform.
It's generally considered that anonymous platforms are only a very small percentage of the total number of active systems, but probably produce the largest amount of work (credit).
Grant
Darwin NT
ID: 1802805
Shaggie76
Joined: 9 Oct 09
Posts: 282
Credit: 271,858,118
RAC: 196
Canada
Message 1802822 - Posted: 16 Jul 2016, 0:16:15 UTC - in response to Message 1802721.  


But you scared me: my system has 3x GTX660ti, 1x GTX780ti and 1x GTX1080, so, by visually extrapolating my data from your graph:

- Worst scenario: 2700 WU/h
- Best scenario: 3010 WU/h

And my average credit per day now is about 1950, and that's with a medium OC!!!

Dunno what to say...

Care to run a report on the last 5 crunching days of my system, or is it too much work? http://setiathome.berkeley.edu/results.php?hostid=8035198


I scanned your host 8035198 for GPU tasks, and because I can't tell which GPU did which task, it looks like your rig is averaging 158 credit/hr per GPU -- 790 cr/hr across all five cards, or about 18,960 cr/day.

The other factor that could skew the stats down is if you run more than one work unit concurrently on the same GPU -- individual tasks would look like they take longer, and I can't tell whether something else was running at the same time.

I also noticed that you're running a mix of AstroPulse tasks -- I've been limiting the scan to SETI v8 tasks for consistency.

Finally, your host total will include CPU credit -- I'd be surprised in your case if it were comparable to your GPUs, but it may account for some of the difference.

As a basis of comparison, I run this host exclusively for SETI exactly 12 hours a day, when the electricity is cheapest.

Host, Device, Credit/Hour, Work Units
8030900, Intel Core i7 970 @ 3.20GHz, 266.783505255511, 31
8030900, NVIDIA GeForce GTX 780, 579.762210515106, 69


Which should be a combined average of (266.78 + 579.76) × 12 ≈ 10,159 credits per 12-hour day -- pretty close to its RAC on the SETI page, which says 9654.
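
(For clarity, the Credit/Hour figure is just granted credit over run time, summed across tasks; a toy Perl illustration with invented numbers:)

use strict;
use warnings;

# Toy illustration of the Credit/Hour arithmetic (invented task data);
# each validated task contributes its granted credit and its run time.
my @tasks = ( [ 3600, 580 ], [ 3500, 560 ], [ 3700, 600 ] ); # [ seconds, credit ]

my ($secs, $credit) = (0, 0);
for my $t (@tasks) { $secs += $t->[0]; $credit += $t->[1]; }
printf("%.1f credit/hr\n", $credit / ($secs / 3600));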
ID: 1802822
Shaggie76
Joined: 9 Oct 09
Posts: 282
Credit: 271,858,118
RAC: 196
Canada
Message 1802826 - Posted: 16 Jul 2016, 0:22:45 UTC - in response to Message 1802777.  

Where (with a red X) would a random computer (with experimental software) be?

I assume you're talking about your monster?

It looks like an average of 1873 cr/hr per GPU -- looking at your RAC that's about right. I'm not really sure how you're managing to do it, but you're somehow out-performing the average by nearly a factor of 2.
ID: 1802826
Shaggie76
Joined: 9 Oct 09
Posts: 282
Credit: 271,858,118
RAC: 196
Canada
Message 1802834 - Posted: 16 Jul 2016, 0:38:28 UTC - in response to Message 1802805.  

Why does this interest you?

There has always been a lot of speculation about the number of systems that run stock, and the numbers that use the anonymous platform.
It's generally considered that anonymous platforms are only a very small percentage of the total number of active systems, but probably produce the largest amount of work (credit).


This is interesting -- as a programmer I'm puzzled why these optimizations wouldn't make their way back up into the main distribution.

I can maybe take a swing at this when I'm done with the GPU wrangling I have in progress -- if the server admins haven't gotten tired of my scripts abusing them, of course.
ID: 1802834
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13727
Credit: 208,696,464
RAC: 304
Australia
Message 1802841 - Posted: 16 Jul 2016, 1:00:06 UTC - in response to Message 1802834.  

This is interesting -- as a programmer I'm puzzled why these optimizations wouldn't make their way back up into the main distribution.

They do; most of the current GPU applications have been developed or tweaked by the Lunatics team.
However, a stock application needs to be suitable for a very wide range of hardware, and optimisations for one architecture often result in worse performance on another.


The most important thing for the stock general release is for the application to return valid work; next is support for the widest possible range of hardware & operating systems, with the minimum impact on system usability. Extracting the maximum possible performance from given hardware, software & drivers should be way down the list IMHO.

Ideally there would be just one stock application for each type of GPU (Intel, AMD, NVidia); as it is, there are almost half a dozen each for AMD & NVidia.
Those that want better performance would then make use of the anonymous platform and choose the application best suited to their architecture (eg, Kepler, Fermi, Maxwell, Pascal etc). They can go for the absolute maximum possible performance, which is only suitable for dedicated crunchers as it can often result in display & keyboard input lag that makes the system unusable for day-to-day work. Or they can detune it so that it's better than stock performance, but still suitable for a computer that is used daily.
Grant
Darwin NT
ID: 1802841
Stubbles
Volunteer tester
Joined: 29 Nov 99
Posts: 358
Credit: 5,909,255
RAC: 0
Canada
Message 1802878 - Posted: 16 Jul 2016, 4:09:49 UTC - in response to Message 1802834.  

Why does this interest you?
There has always been a lot of speculation about the number of systems that run stock, and the numbers that use the anonymous platform.
It's generally considered that anonymous platforms are only a very small percentage of the total number of active systems, but probably produce the largest amount of work (credit).
This is interesting -- as a programmer I'm puzzled why these optimizations wouldn't make their way back up into the main distribution.
I can maybe take a swing at this when I'm done with the GPU wrangling I have in progress -- if the server admins haven't gotten tired of my scripts abusing them, of course.

Here's a thread where I asked the question and then partially answered it by manually checking 6 pages (6 pages × 20 hosts/page = 120 hosts).
I was shocked at how low the results are:
hosts with "Anonymous platform"
host rank 1-20: 18
host rank 241-260: 9
host rank 741-760: 6
host rank 1241-1260: 5
host rank 2541-2560: 1
host rank 5000-5020: 2

It seems the general approach is to use brute force (buy better hardware) rather than to do things better with what you've got (such as installing Lunatics).

From my experience and that of a few others, I find that Lunatics ***should*** be better marketed. Here's one example (msg 1802750) from yesterday:
As for Lunatics 0.44, I really don't want to install third party programs yet. I don't know this guy. I'd rather stay with the official BOINC client.

If we could convince more SETIzens on the top 10k pages (participants & hosts) to use Lunatics, I think it would speed up the turnaround for the apps to make their way into stock...and more of us currently using Lunatics could then help with Beta testing.

If you can make a script work for the top 2,000 hosts, that should give us a much better picture.
Let me know if you're any more interested after my explanation and Grant's.
Cheers,
RobG :-D

PS1: I'm interested in helping you (if you need it)...but my limited Perl experience is from almost 20 years ago.

PS2: As for Grant's description just above, it's what I assumed (thanks for the details, Grant).
ID: 1802878
petri33
Volunteer tester
Joined: 6 Jun 02
Posts: 1668
Credit: 623,086,772
RAC: 156
Finland
Message 1802908 - Posted: 16 Jul 2016, 9:27:23 UTC - in response to Message 1802826.  

Where (with a red X) would a random computer (with experimental software) be?

I assume you're talking about your monster?

It looks like an average of 1873 cr/hr per GPU -- looking at your RAC that's about right. I'm not really sure how you're managing to do it, but you're somehow out-performing the average by nearly a factor of 2.


As a programmer I'm used to making the most out of any given hardware.

a) The guppi WUs do not have more work in them; they just happen to have a low AR (angle range) that keeps the current software from parallelizing the pulse-find calculations. I've fixed that.
b) CUDA streams can be used to utilize the GPU more efficiently.
c) The memory access pattern and cache utilization can be improved.
d) Instruction-level parallelism can be increased.
e) The autocorrelation can use the NVIDIA R2C FFT implementation more efficiently than the current C2C FFT.
f) As the performance grows, so does the heat production. My CPUs are running in the high 60s C even with an HVAC duct blower aimed directly at them.

Sat Jul 16 12:23:04 2016       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 367.27                 Driver Version: 367.27                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 980     On   | 0000:01:00.0      On |                  N/A |
|100%   67C    P0   165W / 230W |   1271MiB /  4036MiB |    100%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX 1080    On   | 0000:02:00.0     Off |                  N/A |
|100%   61C    P2   129W / 215W |   1116MiB /  8113MiB |     84%      Default |
+-------------------------------+----------------------+----------------------+
|   2  GeForce GTX 980     On   | 0000:03:00.0     Off |                  N/A |
|100%   71C    P0   153W / 230W |   1067MiB /  4037MiB |     87%      Default |
+-------------------------------+----------------------+----------------------+
|   3  GeForce GTX 780     On   | 0000:04:00.0     N/A |                  N/A |
|100%   65C    P0    N/A /  N/A |   1047MiB /  3020MiB |     N/A      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0       880    G   /usr/bin/X                                     150MiB |
|    0      1478    G   compiz                                          54MiB |
|    0     28730    C   ...thome_x41zc_x86_64-pc-linux-gnu_cuda65_v8  1061MiB |
|    1     28791    C   ...thome_x41zc_x86_64-pc-linux-gnu_cuda65_v8  1113MiB |
|    2     28660    C   ...thome_x41zc_x86_64-pc-linux-gnu_cuda65_v8  1061MiB |
|    3                  Not Supported                                         |
+-----------------------------------------------------------------------------+


To overcome Heisenbergs:
"You can't always get what you want / but if you try sometimes you just might find / you get what you need." -- Rolling Stones
ID: 1802908
Shaggie76
Joined: 9 Oct 09
Posts: 282
Credit: 271,858,118
RAC: 196
Canada
Message 1802925 - Posted: 16 Jul 2016, 12:49:32 UTC - in response to Message 1802878.  

I'm interested in helping you (if you need it)...but my limited Perl experience is from almost 20 years ago.


I can do it pretty easily - just give me a few evenings to finish the hacks I have in progress. Part of me wants to rule out anonymous-platform hosts because they'll mess with the averages I'm trying to get (especially given how much of a difference they can make).
ID: 1802925
Shaggie76
Joined: 9 Oct 09
Posts: 282
Credit: 271,858,118
RAC: 196
Canada
Message 1802928 - Posted: 16 Jul 2016, 12:53:35 UTC - in response to Message 1802908.  


a) The guppi WUs do not have more work in them; they just happen to have a low AR (angle range) that keeps the current software from parallelizing the pulse-find calculations. I've fixed that.
b) CUDA streams can be used to utilize the GPU more efficiently.
c) The memory access pattern and cache utilization can be improved.
d) Instruction-level parallelism can be increased.
e) The autocorrelation can use the NVIDIA R2C FFT implementation more efficiently than the current C2C FFT.


That's pretty impressive -- and looking at your dump, your 1080 isn't quite saturated yet (84% GPU-Util). It would be fantastic to get some of those optimizations integrated back into the main release.
ID: 1802928
jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1802930 - Posted: 16 Jul 2016, 13:11:32 UTC - in response to Message 1802928.  
Last modified: 16 Jul 2016, 13:15:02 UTC


a) The guppi WUs do not have more work in them; they just happen to have a low AR (angle range) that keeps the current software from parallelizing the pulse-find calculations. I've fixed that.
b) CUDA streams can be used to utilize the GPU more efficiently.
c) The memory access pattern and cache utilization can be improved.
d) Instruction-level parallelism can be increased.
e) The autocorrelation can use the NVIDIA R2C FFT implementation more efficiently than the current C2C FFT.


That's pretty impressive -- and looking at your dump, your 1080 isn't quite saturated yet (84% GPU-Util). It would be fantastic to get some of those optimizations integrated back into the main release.


Stock integration will happen -- more slowly than the 3rd-party test and final variants, because stock distribution has quite a few other considerations (like the small example of cooking poorly maintained systems, among other issues). From what I can see we're pretty close to 'advanced user' wide testing, depending on how much trouble the Windows + Mac builds give over the next few days (presuming Linux is fairly straightforward). There are other general issues to solve not specifically related to Petri's massive contribution, but those will likely have to come out of the woodwork on their own, since reliability is up (at least on the Linux variant so far).
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 1802930
Shaggie76
Joined: 9 Oct 09
Posts: 282
Credit: 271,858,118
RAC: 196
Canada
Message 1802958 - Posted: 16 Jul 2016, 14:27:25 UTC

I downloaded an updated hosts export and re-ran the scan with slightly relaxed constraints: it now takes GPUs that are less popular, but as you can see the standard deviation is much higher. The variance is amplified for AMD cards because not only are there relatively few of them in the stats, but I also can't discriminate specific models within a family (ie: the R9 Nano and R9 Pro Duo both report as "Fiji"). I also had to guess the TDP for AMD cards for the same reason (I chose optimistically, which may have been unfair). So take these numbers with a healthy grain of salt:

[charts]

NOTE: These are only SETI@Home v8 tasks and I should be filtering out multi-GPU setups.

I'm pleased to see preliminary results from GeForce 1070's -- hopefully I'll re-run this scan in a few weeks and get an even better picture.

I'm seeing more Ellesmere (RX 480) parts in the scan, but unfortunately there still aren't enough to make the cut.
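
The family lumping above is just a consequence of aggregating by the GPU name string the host reports; a toy sketch of that step (invented numbers, and the real script's bookkeeping differs):

use strict;
use warnings;

# Toy sketch: accumulate mean and standard deviation of credit/hr per
# reported GPU name. AMD hosts report family codenames, so "Fiji" lumps
# the R9 Nano and the R9 Pro Duo together.
my @tasks = (
    [ 'GeForce GTX 1080', 1020 ],
    [ 'GeForce GTX 1080',  980 ],
    [ 'Fiji',              610 ],
    [ 'Fiji',              350 ],  # a Nano? a Pro Duo? can't tell
);

my %stats;
for my $t (@tasks)
{
    my ($gpu, $rate) = @$t;
    $stats{$gpu}{n}    += 1;
    $stats{$gpu}{sum}  += $rate;
    $stats{$gpu}{sum2} += $rate * $rate;
}

for my $gpu (sort keys %stats)
{
    my $s = $stats{$gpu};
    my $mean = $s->{sum} / $s->{n};
    my $sd   = sqrt($s->{sum2} / $s->{n} - $mean * $mean);
    printf("%-18s mean %6.1f cr/hr, sd %5.1f (n=%d)\n", $gpu, $mean, $sd, $s->{n});
}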

ID: 1802958
Shaggie76
Joined: 9 Oct 09
Posts: 282
Credit: 271,858,118
RAC: 196
Canada
Message 1802959 - Posted: 16 Jul 2016, 14:30:38 UTC

I should add that this scan included more hosts, and I changed the average to only include the fastest 50% of hosts to try to eliminate hosts running multiple tasks at once; I'm not married to this approach and might try Winsorized means instead.
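
In case anyone's curious, here's the difference between the two approaches in a toy Perl example (made-up samples; a 10% Winsorized mean clamps the extremes instead of discarding the slow half outright):

use strict;
use warnings;
use List::Util qw(sum);

# Made-up per-host credit/hr samples, sorted ascending; the first two
# entries play the role of hosts running multiple tasks concurrently,
# whose individual tasks look much slower than they really are.
my @samples = sort { $a <=> $b } (80, 90, 480, 510, 530, 540, 560, 590, 610, 620);

# Current approach: mean of the fastest 50% of hosts.
my @top = @samples[ int(@samples / 2) .. $#samples ];
my $topMean = sum(@top) / @top;

# Alternative: 10% Winsorized mean -- clamp the lowest and highest 10%
# of samples to the nearest kept value, then average everything.
my @w = @samples;
my $k = int(0.1 * @w);
$w[$_] = $w[$k]       for 0 .. $k - 1;
$w[$_] = $w[$#w - $k] for $#w - $k + 1 .. $#w;
my $winsorMean = sum(@w) / @w;

printf("top-half mean %.1f cr/hr, winsorized mean %.1f cr/hr\n", $topMean, $winsorMean);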
ID: 1802959
jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1802968 - Posted: 16 Jul 2016, 15:47:16 UTC - in response to Message 1802959.  

I'm surprised but happy to see the Fermi class (4x0/5x0) hanging in there, considering NV may be deprecating support for them after CUDA 8. It would seem to confirm my suspicion that it may be too early for us to leave these behind, so some inventive means might have to be adopted when integrating the new code, so as to avoid losing them.
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 1802968
Al
Joined: 3 Apr 99
Posts: 1682
Credit: 477,343,364
RAC: 482
United States
Message 1803167 - Posted: 17 Jul 2016, 19:10:49 UTC

Shaggie, I think I've polluted my data pool; you can't go off it any longer for my GTX 1080, as I just installed the 980Ti FTW/Hybrid that I assembled on Friday evening into that system, and I believe you were looking for single-card setups for your charting. It appeared that my system's RAC had begun to plateau, and I had just finished the first of a couple of 980Ti Hybrid conversions, so I wanted to toss it in and check it out.

Ran into a big issue with the new Precision X OC: when they put it together, they apparently hadn't considered that an end user would be foolish enough to put a card as lowly as a GTX 980Ti FTW into the same system as the vaunted 10x0 series. After installation it was recognized fine everywhere, but when I went into Precision and clicked on the 980 in the list, it said: Sorry, this version of Precision only supports 1070 & 1080 cards. I thought, Well! What A Snob!, so I uninstalled it and tried installing the X 16 version in a different sub-directory, and then the OC version, so I could use the 'correct' version for each card.

Nope, the program is too smart for its own good: it deletes the old version before installing the new one. I contacted EVGA and asked them what was up with that; they said they hadn't thought anyone would do that, and that I am probably one of 10 people in the country attempting it. I said yeah, that may be true, but I am one of maybe 100 people in the country who actually have one of these cards, considering the supply constraints, and I can assure you that people are not going to toss their less-than-2-year-old, $450-600 video cards when they upgrade; as you finally get more product pushed out the door for sale, this will occur more often, trust me.

He logged my concerns and said that they were actually aware of it and supposedly working on it. Not sure how much weight I'd put in that statement, but there it is. I ended up using the latest version of the X 16 software; it somehow makes the 1080 run about 8-10 degrees hotter, which I believe is accurate from putting my hand on it, but it does seem to overclock it (according to the screen, anyway), and it obviously works fine with the 980. The information about the 1080, though, isn't nearly as good as in the OC version.

So anywho, just thought I'd let you know that the results from Friday evening onward are 'polluted' with 2 cards: the 980Ti Hybrid and the 1080 FTW.

ID: 1803167
Zalster
Volunteer tester
Joined: 27 May 99
Posts: 5517
Credit: 528,817,460
RAC: 242
United States
Message 1803175 - Posted: 17 Jul 2016, 20:16:06 UTC - in response to Message 1803167.  
Last modified: 17 Jul 2016, 20:17:04 UTC

Ran into a big issue with the new Precision X OC: when they put it together, they apparently hadn't considered that an end user would be foolish enough to put a card as lowly as a GTX 980Ti FTW into the same system as the vaunted 10x0 series. After installation it was recognized fine everywhere, but when I went into Precision and clicked on the 980 in the list, it said: Sorry, this version of Precision only supports 1070 & 1080 cards. I thought, Well! What A Snob!, so I uninstalled it and tried installing the X 16 version in a different sub-directory, and then the OC version, so I could use the 'correct' version for each card.


Welcome to my world, Al, lol.

I'm sure there are more than 10 people; the rest of us just didn't care to contact them only to hear that they won't do anything about it, lol.
ID: 1803175
Shaggie76
Joined: 9 Oct 09
Posts: 282
Credit: 271,858,118
RAC: 196
Canada
Message 1803185 - Posted: 17 Jul 2016, 21:19:22 UTC - in response to Message 1803167.  

Shaggie, I think I've polluted my data pool; you can't go off it any longer for my GTX 1080, as I just installed the 980Ti FTW/Hybrid that I assembled on Friday evening into that system, and I believe you were looking for single-card setups for your charting. It appeared that my system's RAC had begun to plateau, and I had just finished the first of a couple of 980Ti Hybrid conversions, so I wanted to toss it in and check it out.


Ok thanks, I can filter it out because the script logs the hardware in the host at the time of download.

Good luck with your system; I'll be sure to get matched cards when I upgrade!
ID: 1803185