GPU FLOPS: Theory vs Reality


I3APR
Joined: 23 Apr 16
Posts: 99
Credit: 70,717,488
RAC: 0
Italy
Message 1802721 - Posted: 15 Jul 2016, 16:19:55 UTC - in response to Message 1802617.  
The results weren't too crazy:
[charts]

First of all, Shaggie, really nice work we have here!! A big service to the community, as deciding what to use to crunch is easier now!!

But you scared me: my system has 3x GTX660ti, 1x GTX780ti and 1x GTX1080, so, by visually extrapolating my data from your graph:

- Worst scenario: 2700 WU/h
- Best scenario: 3010 WU/h

And my average credit per day now is about 1950, and that's with a medium OC!!!

Dunno what to say...

Care to run a report on the last 5 crunching days of my system, or is it too much work? http://setiathome.berkeley.edu/results.php?hostid=8035198

Anyway, thank you!!

A.
ID: 1802721
petri33
Volunteer tester
Joined: 6 Jun 02
Posts: 1668
Credit: 623,086,772
RAC: 156
Finland
Message 1802777 - Posted: 15 Jul 2016, 22:21:08 UTC - in response to Message 1802721.  
The results weren't too crazy:
[charts]

First of all, Shaggie, really nice work we have here!! A big service to the community, as deciding what to use to crunch is easier now!!

But you scared me: my system has 3x GTX660ti, 1x GTX780ti and 1x GTX1080, so, by visually extrapolating my data from your graph:

- Worst scenario: 2700 WU/h
- Best scenario: 3010 WU/h

And my average credit per day now is about 1950, and that's with a medium OC!!!

Dunno what to say...

Care to run a report on the last 5 crunching days of my system, or is it too much work? http://setiathome.berkeley.edu/results.php?hostid=8035198

Anyway, thank you!!

A.


Nice graphs! Thank You!

Where (with a red X) would a random computer (with experimental software) be?
To overcome Heisenbergs:
"You can't always get what you want / but if you try sometimes you just might find / you get what you need." -- Rolling Stones
ID: 1802777
Shaggie76
Joined: 9 Oct 09
Posts: 282
Credit: 271,858,118
RAC: 196
Canada
Message 1802797 - Posted: 15 Jul 2016, 23:42:11 UTC - in response to Message 1802679.  

Say Shaggie, if you're interested, I have a _ton_ more data that has been logged since I installed emfers program, just let me know and I'll send it to you to review.


Thanks, but I think it's probably better to cast a wider net -- I'm sure the scan is still picking up your results, though, since there aren't many 1080's out there yet.
ID: 1802797
Shaggie76
Joined: 9 Oct 09
Posts: 282
Credit: 271,858,118
RAC: 196
Canada
Message 1802801 - Posted: 15 Jul 2016, 23:48:05 UTC - in response to Message 1802718.  

How much knowledge would be needed to adapt your script to find out how many of the top 10,000 computers/hosts have an "anonymous platform" on the PC's "apps" page (at the bottom)?


A bit of Perl fu to wget down the leaderboard, then a bunch more wgets to pull down each page; maybe half an hour of farting around if you knew what you were doing.
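
Something like this rough, untested sketch -- the page names are standard BOINC, but the link pattern and the "Anonymous platform" text I'm matching on are assumptions:

#!/usr/bin/perl
use strict;
use warnings;

# Sketch: walk the top-hosts list 20 hosts per page and count how many
# report "Anonymous platform" on their application-details page.
my ($anon, $total) = (0, 0);
for (my $offset = 0; $offset < 10000; $offset += 20)
{
    my $list = `wget -qO- "http://setiathome.berkeley.edu/top_hosts.php?offset=$offset"`;
    while ($list =~ m/show_host_detail\.php\?hostid=(\d+)/g)
    {
        my $hostid = $1;
        my $apps = `wget -qO- "http://setiathome.berkeley.edu/host_app_versions.php?hostid=$hostid"`;
        ++$anon if $apps =~ /Anonymous platform/i;
        ++$total;
        sleep(1); # don't hammer the server
    }
}
printf("%d of %d hosts run the anonymous platform\n", $anon, $total);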

I'd be a little hesitant to inflict 10,000 queries on the server, though -- I feel bad enough with the hundreds I've already bounced off of it.

Why does this interest you?
ID: 1802801
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13727
Credit: 208,696,464
RAC: 304
Australia
Message 1802805 - Posted: 15 Jul 2016, 23:51:20 UTC - in response to Message 1802801.  

Why does this interest you?

There has always been a lot of speculation about the number of systems that run stock, and the numbers that use the anonymous platform.
It's generally considered that anonymous platforms are only a very small percentage of the total number of active systems, but probably produce the largest amount of work (credit).
Grant
Darwin NT
ID: 1802805
Shaggie76
Joined: 9 Oct 09
Posts: 282
Credit: 271,858,118
RAC: 196
Canada
Message 1802822 - Posted: 16 Jul 2016, 0:16:15 UTC - in response to Message 1802721.  


But you scared me: my system has 3x GTX660ti, 1x GTX780ti and 1x GTX1080, so, by visually extrapolating my data from your graph:

- Worst scenario: 2700 WU/h
- Best scenario: 3010 WU/h

And my average credit per day now is about 1950, and that's with a medium OC!!!

Dunno what to say...

Care to run a report on the last 5 crunching days of my system, or is it too much work? http://setiathome.berkeley.edu/results.php?hostid=8035198


I scanned your host 8035198 for GPU tasks, and because I can't tell which GPU did which task, it looks like your rig is averaging 158 credit/hr per GPU -- 790 cr/hr across all five cards, or about 18,960 cr/day.

The other factor that could skew the stats down is if you run more than one work unit concurrently on the same GPU -- individual tasks would look like they take longer, and I can't tell whether something else was running at the same time.

I also noticed that you're running a mix of AstroPulse tasks -- I've been limiting the scan to SETI v8 tasks for consistency.

Finally, your host total will include CPU credit -- I'd be surprised in your case if it were comparable to your GPUs, but it may account for some of the difference.

As a basis of comparison, I run this host exclusively for SETI exactly 12 hours a day, when the electricity is cheapest.

Host, Device, Credit/Hour, Work Units
8030900, Intel Core i7 970 @ 3.20GHz, 266.783505255511, 31
8030900, NVIDIA GeForce GTX 780, 579.762210515106, 69


Which should be a combined average of (266.78 + 579.76) × 12 ≈ 10,159 credits per 12-hour day -- pretty close to its RAC on the SETI page, which says 9654.
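
(For clarity, the Credit/Hour figure is just granted credit over run time, summed across tasks; a toy Perl illustration with invented numbers:)

use strict;
use warnings;

# Toy illustration of the Credit/Hour arithmetic (invented task data);
# each validated task contributes its granted credit and its run time.
my @tasks = ( [ 3600, 580 ], [ 3500, 560 ], [ 3700, 600 ] ); # [ seconds, credit ]

my ($secs, $credit) = (0, 0);
for my $t (@tasks) { $secs += $t->[0]; $credit += $t->[1]; }
printf("%.1f credit/hr\n", $credit / ($secs / 3600));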
ID: 1802822
Shaggie76
Joined: 9 Oct 09
Posts: 282
Credit: 271,858,118
RAC: 196
Canada
Message 1802826 - Posted: 16 Jul 2016, 0:22:45 UTC - in response to Message 1802777.  

Where (with a red X) would a random computer (with experimental software) be?

I assume you're talking about your monster?

It looks like an average of 1873 cr/hr per GPU -- looking at your RAC that's about right. I'm not really sure how you're managing to do it, but you're somehow out-performing the average by nearly a factor of 2.
ID: 1802826
Shaggie76
Joined: 9 Oct 09
Posts: 282
Credit: 271,858,118
RAC: 196
Canada
Message 1802834 - Posted: 16 Jul 2016, 0:38:28 UTC - in response to Message 1802805.  

Why does this interest you?

There has always been a lot of speculation about the number of systems that run stock, and the numbers that use the anonymous platform.
It's generally considered that anonymous platforms are only a very small percentage of the total number of active systems, but probably produce the largest amount of work (credit).


This is interesting -- as a programmer I'm puzzled why these optimizations wouldn't make their way back up into the main distribution.

I can maybe take a swing at this when I'm done with the GPU wrangling I have in progress -- if the server admins haven't gotten tired of my scripts abusing them, of course.
ID: 1802834
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13727
Credit: 208,696,464
RAC: 304
Australia
Message 1802841 - Posted: 16 Jul 2016, 1:00:06 UTC - in response to Message 1802834.  

This is interesting -- as a programmer I'm puzzled why these optimizations wouldn't make their way back up into the main distribution.

They do; most of the current GPU applications have been developed or tweaked by the Lunatics team.
However, a stock application needs to be suitable for a very wide range of hardware, and optimisations for one architecture often result in worse performance on another.


The most important thing for the stock general release is for the application to return valid work; next is support for the widest possible range of hardware & operating systems, with the minimum impact on system usability. Extracting the maximum possible performance from given hardware, software & drivers should be way down the list IMHO.

Ideally there would be just one stock application for each type of GPU (Intel, AMD, NVidia); as it is, there are almost half a dozen each for AMD & NVidia.
Those that want better performance would then make use of the anonymous platform and choose the application best suited to their architecture (eg, Kepler, Fermi, Maxwell, Pascal etc). They can go for the absolute maximum possible performance, which is only suitable for dedicated crunchers as it can often result in display & keyboard input lag that makes the system unusable for day-to-day work. Or they can detune it so that it's better than stock performance, but still suitable for a computer that is used daily.
Grant
Darwin NT
ID: 1802841
Stubbles
Volunteer tester
Joined: 29 Nov 99
Posts: 358
Credit: 5,909,255
RAC: 0
Canada
Message 1802878 - Posted: 16 Jul 2016, 4:09:49 UTC - in response to Message 1802834.  

Why does this interest you?
There has always been a lot of speculation about the number of systems that run stock, and the numbers that use the anonymous platform.
It's generally considered that anonymous platforms are only a very small percentage of the total number of active systems, but probably produce the largest amount of work (credit).
This is interesting -- as a programmer I'm puzzled why these optimizations wouldn't make their way back up into the main distribution.
I can maybe take a swing at this when I'm done with the GPU wrangling I have in progress -- if the server admins haven't gotten tired of my scripts abusing them, of course.

Here's a thread where I asked the question and then partially answered it by manually checking 6 pages (6 pages × 20 hosts/page = 120 hosts).
I was shocked at how low the results are:
hosts with "Anonymous platform"
host rank 1-20: 18
host rank 241-260: 9
host rank 741-760: 6
host rank 1241-1260: 5
host rank 2541-2560: 1
host rank 5000-5020: 2

It seems the general approach is to use brute force (buy better hardware) rather than to do things better with what you've got (such as installing Lunatics).

From my experience and that of a few others, I find that Lunatics ***should*** be better marketed. Here's one example (msg 1802750) from yesterday:
As for Lunatics 0.44, I really don't want to install third party programs yet. I don't know this guy. I'd rather stay with the official BOINC client.

If we could convince more SETIzens on the top 10k pages (participants & hosts) to use Lunatics, I think it would speed up the turnaround for the apps to make their way into stock...and more of us currently using Lunatics could then help with Beta testing.

If you can make a script work for the top 2,000 hosts, that should give us a much better picture.
Let me know if you're any more interested after my explanation and Grant's.
Cheers,
RobG :-D

PS1: I'm interested in helping you (if you need it)...but my limited Perl experience is from almost 20 years ago.

PS2: As for Grant's description just above, it's what I assumed (thanks for the details, Grant).
ID: 1802878
petri33
Volunteer tester
Joined: 6 Jun 02
Posts: 1668
Credit: 623,086,772
RAC: 156
Finland
Message 1802908 - Posted: 16 Jul 2016, 9:27:23 UTC - in response to Message 1802826.  

Where (with a red X) would a random computer (with experimental software) be?

I assume you're talking about your monster?

It looks like an average of 1873 cr/hr per GPU -- looking at your RAC that's about right. I'm not really sure how you're managing to do it, but you're somehow out-performing the average by nearly a factor of 2.


As a programmer I'm used to making the most out of any given hardware.

a) The guppi WUs do not have more work in them; they just happen to have a low AR (angle range) that keeps the current software from parallelizing the pulse-find calculations. I've fixed that.
b) CUDA streams can be used to utilize the GPU more efficiently.
c) The memory access pattern and cache utilization can be improved.
d) Instruction-level parallelism can be increased.
e) The autocorrelation can use the NVIDIA R2C FFT implementation more efficiently than the current C2C FFT.
f) As the performance grows, so does the heat production. My CPUs are running in the high 60s C even with an HVAC duct blower aimed directly at them.

Sat Jul 16 12:23:04 2016       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 367.27                 Driver Version: 367.27                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 980     On   | 0000:01:00.0      On |                  N/A |
|100%   67C    P0   165W / 230W |   1271MiB /  4036MiB |    100%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX 1080    On   | 0000:02:00.0     Off |                  N/A |
|100%   61C    P2   129W / 215W |   1116MiB /  8113MiB |     84%      Default |
+-------------------------------+----------------------+----------------------+
|   2  GeForce GTX 980     On   | 0000:03:00.0     Off |                  N/A |
|100%   71C    P0   153W / 230W |   1067MiB /  4037MiB |     87%      Default |
+-------------------------------+----------------------+----------------------+
|   3  GeForce GTX 780     On   | 0000:04:00.0     N/A |                  N/A |
|100%   65C    P0    N/A /  N/A |   1047MiB /  3020MiB |     N/A      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0       880    G   /usr/bin/X                                     150MiB |
|    0      1478    G   compiz                                          54MiB |
|    0     28730    C   ...thome_x41zc_x86_64-pc-linux-gnu_cuda65_v8  1061MiB |
|    1     28791    C   ...thome_x41zc_x86_64-pc-linux-gnu_cuda65_v8  1113MiB |
|    2     28660    C   ...thome_x41zc_x86_64-pc-linux-gnu_cuda65_v8  1061MiB |
|    3                  Not Supported                                         |
+-----------------------------------------------------------------------------+


To overcome Heisenbergs:
"You can't always get what you want / but if you try sometimes you just might find / you get what you need." -- Rolling Stones
ID: 1802908
Shaggie76
Joined: 9 Oct 09
Posts: 282
Credit: 271,858,118
RAC: 196
Canada
Message 1802925 - Posted: 16 Jul 2016, 12:49:32 UTC - in response to Message 1802878.  

I'm interested in helping you (if you need it)...but my limited Perl experience is from almost 20 years ago.


I can do it pretty easily - just give me a few evenings to finish the hacks I have in progress. Part of me wants to rule out anonymous-platform hosts because they'll mess with the averages I'm trying to get (especially given how much of a difference they can make).
ID: 1802925
Shaggie76
Joined: 9 Oct 09
Posts: 282
Credit: 271,858,118
RAC: 196
Canada
Message 1802928 - Posted: 16 Jul 2016, 12:53:35 UTC - in response to Message 1802908.  


a) The guppi WUs do not have more work in them; they just happen to have a low AR (angle range) that keeps the current software from parallelizing the pulse-find calculations. I've fixed that.
b) CUDA streams can be used to utilize the GPU more efficiently.
c) The memory access pattern and cache utilization can be improved.
d) Instruction-level parallelism can be increased.
e) The autocorrelation can use the NVIDIA R2C FFT implementation more efficiently than the current C2C FFT.


That's pretty impressive -- and looking at your dump, your 1080 isn't quite saturated yet (84% GPU-Util). It would be fantastic to get some of those optimizations integrated back into the main release.
ID: 1802928
jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1802930 - Posted: 16 Jul 2016, 13:11:32 UTC - in response to Message 1802928.  
Last modified: 16 Jul 2016, 13:15:02 UTC


a) The guppi WUs do not have more work in them; they just happen to have a low AR (angle range) that keeps the current software from parallelizing the pulse-find calculations. I've fixed that.
b) CUDA streams can be used to utilize the GPU more efficiently.
c) The memory access pattern and cache utilization can be improved.
d) Instruction-level parallelism can be increased.
e) The autocorrelation can use the NVIDIA R2C FFT implementation more efficiently than the current C2C FFT.


That's pretty impressive -- and looking at your dump, your 1080 isn't quite saturated yet (84% GPU-Util). It would be fantastic to get some of those optimizations integrated back into the main release.


Stock integration will happen -- more slowly than the 3rd-party test and final variants, because stock distribution has quite a few other considerations (like the small example of cooking poorly maintained systems, among other issues). From what I can see we're pretty close to 'advanced user' wide testing, depending on how much trouble the Windows + Mac builds give over the next few days (presuming Linux is fairly straightforward). There are other general issues to solve not specifically related to Petri's massive contribution, but those will likely have to come out of the woodwork on their own, since reliability is up (at least on the Linux variant so far).
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 1802930
Shaggie76
Joined: 9 Oct 09
Posts: 282
Credit: 271,858,118
RAC: 196
Canada
Message 1802958 - Posted: 16 Jul 2016, 14:27:25 UTC

I downloaded an updated hosts export and re-ran the scan with slightly relaxed constraints: it now takes GPUs that are less popular, but as you can see the standard deviation is much higher. The variance is amplified for AMD cards because not only are there relatively few of them in the stats, but I also can't discriminate specific models within a family (ie: the R9 Nano and R9 Pro Duo both report as "Fiji"). I also had to guess the TDP for AMD cards for the same reason (I chose optimistically, which may have been unfair). So take these numbers with a healthy grain of salt:

[charts]

NOTE: These are only SETI@Home v8 tasks and I should be filtering out multi-GPU setups.

I'm pleased to see preliminary results from GeForce 1070's -- hopefully I'll re-run this scan in a few weeks and get an even better picture.

I'm seeing more Ellesmere (RX 480) parts in the scan, but unfortunately there still aren't enough to make the cut.
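
The family lumping above is just a consequence of aggregating by the GPU name string the host reports; a toy sketch of that step (invented numbers, and the real script's bookkeeping differs):

use strict;
use warnings;

# Toy sketch: accumulate mean and standard deviation of credit/hr per
# reported GPU name. AMD hosts report family codenames, so "Fiji" lumps
# the R9 Nano and the R9 Pro Duo together.
my @tasks = (
    [ 'GeForce GTX 1080', 1020 ],
    [ 'GeForce GTX 1080',  980 ],
    [ 'Fiji',              610 ],
    [ 'Fiji',              350 ],  # a Nano? a Pro Duo? can't tell
);

my %stats;
for my $t (@tasks)
{
    my ($gpu, $rate) = @$t;
    $stats{$gpu}{n}    += 1;
    $stats{$gpu}{sum}  += $rate;
    $stats{$gpu}{sum2} += $rate * $rate;
}

for my $gpu (sort keys %stats)
{
    my $s = $stats{$gpu};
    my $mean = $s->{sum} / $s->{n};
    my $sd   = sqrt($s->{sum2} / $s->{n} - $mean * $mean);
    printf("%-18s mean %6.1f cr/hr, sd %5.1f (n=%d)\n", $gpu, $mean, $sd, $s->{n});
}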

ID: 1802958
Shaggie76
Joined: 9 Oct 09
Posts: 282
Credit: 271,858,118
RAC: 196
Canada
Message 1802959 - Posted: 16 Jul 2016, 14:30:38 UTC

I should add that this scan included more hosts, and I changed the average to only include the fastest 50% of hosts to try to eliminate hosts running multiple tasks at once; I'm not married to this approach and might try Winsorized means instead.
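
In case anyone's curious, here's the difference between the two approaches in a toy Perl example (made-up samples; a 10% Winsorized mean clamps the extremes instead of discarding the slow half outright):

use strict;
use warnings;
use List::Util qw(sum);

# Made-up per-host credit/hr samples, sorted ascending; the first two
# entries play the role of hosts running multiple tasks concurrently,
# whose individual tasks look much slower than they really are.
my @samples = sort { $a <=> $b } (80, 90, 480, 510, 530, 540, 560, 590, 610, 620);

# Current approach: mean of the fastest 50% of hosts.
my @top = @samples[ int(@samples / 2) .. $#samples ];
my $topMean = sum(@top) / @top;

# Alternative: 10% Winsorized mean -- clamp the lowest and highest 10%
# of samples to the nearest kept value, then average everything.
my @w = @samples;
my $k = int(0.1 * @w);
$w[$_] = $w[$k]       for 0 .. $k - 1;
$w[$_] = $w[$#w - $k] for $#w - $k + 1 .. $#w;
my $winsorMean = sum(@w) / @w;

printf("top-half mean %.1f cr/hr, winsorized mean %.1f cr/hr\n", $topMean, $winsorMean);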
ID: 1802959
jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1802968 - Posted: 16 Jul 2016, 15:47:16 UTC - in response to Message 1802959.  

I'm surprised but happy to see the Fermi class (4x0/5x0) hanging in there, considering NV may be deprecating support for them after CUDA 8. It would seem to confirm my suspicion that it may be too early for us to leave these behind, so some inventive means might have to be adopted when integrating the new code, so as to avoid losing them.
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 1802968
Al
Joined: 3 Apr 99
Posts: 1682
Credit: 477,343,364
RAC: 482
United States
Message 1803167 - Posted: 17 Jul 2016, 19:10:49 UTC

Shaggie, I think I've polluted my data pool; you can't go off it any longer for my GTX 1080, as I just installed the 980Ti FTW/Hybrid that I assembled on Friday evening into that system, and I believe you were looking for single-card setups for your charting. It appeared that my system's RAC had begun to plateau, and I had just finished the first of a couple of 980Ti Hybrid conversions, so I wanted to toss it in and check it out.

Ran into a big issue with the new Precision X OC: when they put it together, they apparently hadn't considered that an end user would be foolish enough to put a card as lowly as a GTX 980Ti FTW into the same system as the vaunted 10x0 series. After installation it was recognized fine everywhere, but when I went into Precision and clicked on the 980 in the list, it said: Sorry, this version of Precision only supports 1070 & 1080 cards. I thought, Well! What A Snob!, so I uninstalled it and tried installing the X 16 version in a different sub-directory, and then the OC version, so I could use the 'correct' version for each card.

Nope, the program is too smart for its own good: it deletes the old version before installing the new one. I contacted EVGA and asked them what was up with that; they said they hadn't thought anyone would do that, and that I am probably one of 10 people in the country attempting it. I said yeah, that may be true, but I am one of maybe 100 people in the country who actually have one of these cards, considering the supply constraints, and I can assure you that people are not going to toss their less-than-2-year-old, $450-600 video cards when they upgrade; as you finally get more product pushed out the door for sale, this will occur more often, trust me.

He logged my concerns and said that they were actually aware of it and supposedly working on it. Not sure how much weight I'd put in that statement, but there it is. I ended up using the latest version of the X 16 software; it somehow makes the 1080 run about 8-10 degrees hotter, which I believe is accurate from putting my hand on it, but it does seem to overclock it (according to the screen, anyway), and it obviously works fine with the 980. The information about the 1080, though, isn't nearly as good as in the OC version.

So anywho, just thought I'd let you know that the results from Friday evening onward are 'polluted' with 2 cards: the 980Ti Hybrid and the 1080 FTW.

ID: 1803167
Zalster
Volunteer tester
Joined: 27 May 99
Posts: 5517
Credit: 528,817,460
RAC: 242
United States
Message 1803175 - Posted: 17 Jul 2016, 20:16:06 UTC - in response to Message 1803167.  
Last modified: 17 Jul 2016, 20:17:04 UTC

Ran into a big issue with the new Precision X OC: when they put it together, they apparently hadn't considered that an end user would be foolish enough to put a card as lowly as a GTX 980Ti FTW into the same system as the vaunted 10x0 series. After installation it was recognized fine everywhere, but when I went into Precision and clicked on the 980 in the list, it said: Sorry, this version of Precision only supports 1070 & 1080 cards. I thought, Well! What A Snob!, so I uninstalled it and tried installing the X 16 version in a different sub-directory, and then the OC version, so I could use the 'correct' version for each card.


Welcome to my world, Al, lol.

I'm sure there are more than 10 people; the rest of us just didn't care to contact them only to hear that they won't do anything about it, lol.
ID: 1803175
Shaggie76
Joined: 9 Oct 09
Posts: 282
Credit: 271,858,118
RAC: 196
Canada
Message 1803185 - Posted: 17 Jul 2016, 21:19:22 UTC - in response to Message 1803167.  

Shaggie, I think I've polluted my data pool; you can't go off it any longer for my GTX 1080, as I just installed the 980Ti FTW/Hybrid that I assembled on Friday evening into that system, and I believe you were looking for single-card setups for your charting. It appeared that my system's RAC had begun to plateau, and I had just finished the first of a couple of 980Ti Hybrid conversions, so I wanted to toss it in and check it out.


Ok thanks, I can filter it out because the script logs the hardware in the host at the time of download.

Good luck with your system; I'll be sure to get matched cards when I upgrade!
ID: 1803185