GPU FLOPS: Theory vs Reality

Profile Rune Bjørge

Joined: 5 Feb 00
Posts: 45
Credit: 30,508,204
RAC: 5
Norway
Message 1821511 - Posted: 3 Oct 2016, 15:44:59 UTC - in response to Message 1820967.  



[Edit:] I am surprised to see the Nvidia 2xx series hanging in there on the charts. They weren't efficient when they were released :)


My rig, with an Nvidia card and a hefty Core 2 CPU, is chugging along for a whopping RAC of around 1200...
Micky Badgero

Joined: 26 Jul 16
Posts: 44
Credit: 21,373,673
RAC: 83
United States
Message 1821581 - Posted: 4 Oct 2016, 0:45:34 UTC

I have an i7-3770S with one NVIDIA GeForce GTX 1080 and Windows 10, and I get less than 20,000 average credit.

Any idea how this computer puts out so many average credits from four 1080s?

User ID: petri33
Computer ID: 7475713
Rank: 1
Avg. credit: 153,343.25
Total credit: 79,404,680
CPU: GenuineIntel Intel(R) Core(TM) i7-3930K CPU @ 3.20GHz [Family 6 Model 45 Stepping 6] (12 processors)

GPU: [4] NVIDIA GeForce GTX 1080 (4095MB) OpenCL: 1.2

Operating System: Linux 4.2.0-42-generic

Micky
woohoo
Volunteer tester

Joined: 30 Oct 13
Posts: 972
Credit: 165,671,404
RAC: 5
United States
Message 1821583 - Posted: 4 Oct 2016, 0:49:41 UTC

petri33 runs his own CUDA 8.0 app, which doesn't run under Windows.
Profile Zalster Special Project $250 donor
Volunteer tester
Joined: 27 May 99
Posts: 5517
Credit: 528,817,460
RAC: 242
United States
Message 1821584 - Posted: 4 Oct 2016, 0:49:57 UTC - in response to Message 1821581.  

Petri is running a special app (one that he wrote himself) that is much more efficient than the stock app is for most of us.

He's working with some of the volunteer developers to get the application ready for the rest of us.

Right now it's in alpha testing and he has permission to use it here on Main to see how it runs.
Profile Jimbocous Project Donor
Volunteer tester
Joined: 1 Apr 13
Posts: 1853
Credit: 268,616,081
RAC: 1,349
United States
Message 1821587 - Posted: 4 Oct 2016, 1:00:04 UTC - in response to Message 1821581.  

I have an i7-3770S with one NVIDIA GeForce GTX 1080 and Windows 10, and I get less than 20,000 average credit.

Any idea how this computer puts out so many average credits from four 1080s?

User ID: petri33
Computer ID: 7475713
Rank: 1
Avg. credit: 153,343.25
Total credit: 79,404,680
CPU: GenuineIntel Intel(R) Core(TM) i7-3930K CPU @ 3.20GHz [Family 6 Model 45 Stepping 6] (12 processors)

GPU: [4] NVIDIA GeForce GTX 1080 (4095MB) OpenCL: 1.2

Operating System: Linux 4.2.0-42-generic

Micky

So, he's running a hexa-core CPU with hyperthreading enabled, so 12 CPU threads working tasks in addition to the GPUs. At the same time, he's got four 1080s working, though it's unclear how many tasks per GPU he's running. I think I recall perhaps just one, from an earlier discussion, but don't quote me.
Also, the code he's running is an Anonymous platform (i.e. Lunatics) CUDA build (rather than OpenCL) that he has modified himself, presumably optimized for his particular setup.
He probably gets a bit of a boost from lower driver overhead on Linux vs. Windows, as well.
Profile jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1821608 - Posted: 4 Oct 2016, 3:08:26 UTC - in response to Message 1821587.  

So, he's running a hexa-core CPU with hyperthreading enabled, so 12 CPU threads working tasks in addition to the GPUs. At the same time, he's got four 1080s working, though it's unclear how many tasks per GPU he's running. I think I recall perhaps just one, from an earlier discussion, but don't quote me.
Also, the code he's running is an Anonymous platform (i.e. Lunatics) CUDA build (rather than OpenCL) that he has modified himself, presumably optimized for his particular setup.
He probably gets a bit of a boost from lower driver overhead on Linux vs. Windows, as well.


I'd say that's pretty close to the mark. I'm gradually getting ever so slightly closer to a Windows alpha that incorporates the key elements, at least for a specific set of GPUs. Being very situation-specific, there have been some hurdles to overcome in pushing toward wider applicability.

Those hurdles included improving the behaviour on a Windows system with an under-powered CPU (specifically to tackle display lag), as well as fighting with both Windows and Nvidia changes.

Fortunately the worst of that seems to be over with my GTX 980 host ( http://setiathome.berkeley.edu/show_host_detail.php?hostid=5631447 ). The zipa2 CUDA 6.5 build running there appears to have resolved prior validation issues, I've managed to further suppress CPU usage (which I intend to make an option, and the default), and I've identified some long-standing issues that will need attention.

Guppi performance on the 980 sees tasks run (one at a time), some in under 500 seconds with less than 20% usage of a Core 2 Duo core, while watching YouTube videos. There is headroom there, though I keep settings conservative to keep power and temps in check. I will likely apply such conservative settings as defaults, and issue some stern warning/at-your-own-risk messages along with notes on how to lift them. In other words, on mine I have to keep the utilisation lower than it could be, just to keep the machine usable for other tasks.

Several things remain to be done for Windows alpha, in that the new methods run slightly afoul of some boincapi limitations with respect to shutdown. That's familiar territory though.

Once alpha proceeds, the other aspects to be worked on relate to the end goal of integrating enough into a stock-worthy form that cards older than Big Kepler can be supported and see some improvement as well. On top of that, working towards a unified Windows/Linux/Mac build system is proving challenging; the aim is for future builds to happen in lockstep on all three platforms. At present all three use mismatched build systems, each with its own quirks, while modern cross-platform alternatives exist.

Probably the lower driver latencies on Linux are some part of the picture, but not a huge one. Under the new code CUDA streams are used extensively. These hide latencies at various levels, so in theory at least you could see all three platforms more or less equalise for the same GPU and clocks. Probably Windows is the easiest platform to overclock GPUs on, complete with artefact scanners, though I suspect the tools may mature on the other platforms too, since m$ is doing everything they can to drive gaming to other platforms.
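
To illustrate the idea (this is only a generic sketch of the pattern, not the actual app code; the kernel, sizes and names are made up): the work is split into chunks, and each chunk's copy-in, kernel and copy-out are issued on its own stream, so transfers for one chunk overlap with compute on another.

// Generic CUDA streams sketch (illustrative only, not the SETI app)
#include <cuda_runtime.h>

__global__ void scale(float *d, int n, float f)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) d[i] *= f;
}

int main(void)
{
    const int N = 1 << 20, CHUNKS = 4, CHUNK = N / CHUNKS;
    float *h = 0, *d = 0;
    cudaMallocHost((void **)&h, N * sizeof(float));  // pinned host memory, required for async copies
    cudaMalloc((void **)&d, N * sizeof(float));

    cudaStream_t s[CHUNKS];
    for (int c = 0; c < CHUNKS; ++c) cudaStreamCreate(&s[c]);

    for (int c = 0; c < CHUNKS; ++c) {
        int off = c * CHUNK;
        cudaMemcpyAsync(d + off, h + off, CHUNK * sizeof(float),
                        cudaMemcpyHostToDevice, s[c]);                       // copy in on stream c
        scale<<<(CHUNK + 255) / 256, 256, 0, s[c]>>>(d + off, CHUNK, 2.0f);  // compute on stream c
        cudaMemcpyAsync(h + off, d + off, CHUNK * sizeof(float),
                        cudaMemcpyDeviceToHost, s[c]);                       // copy out on stream c
    }
    cudaDeviceSynchronize();  // wait for all streams to finish

    for (int c = 0; c < CHUNKS; ++c) cudaStreamDestroy(&s[c]);
    cudaFreeHost(h);
    cudaFree(d);
    return 0;
}

The point is simply that while one stream's data is in flight over PCIe, another stream's kernel keeps the GPU busy, which is why driver latency matters less than it does with a single serial queue.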
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
Micky Badgero

Joined: 26 Jul 16
Posts: 44
Credit: 21,373,673
RAC: 83
United States
Message 1821641 - Posted: 4 Oct 2016, 4:55:58 UTC

Thanks for all the responses.

I am very much looking forward to trying a beta release of petri33's app. I am interested in CUDA programming.

I have been looking for ways to increase SETI@home crunching at lower cost than the i7 + 1080 I have now, which will be retasked to deep learning when I have the time. Sooner rather than later, I hope. My i7-3770S has four cores and eight threads. It runs all eight threads as separate SETI@home tasks. The CPU threads appear to be working with the GPU, not separately. Most of the inactive tasks say 'GPU suspended - computer is in use (0.51 CPU + 1 NVIDIA GPU)'. Some just say 'Ready to start'. I don't know if those are CPU-only tasks, but they all show 3+ hours of expected run time. The 'GPU suspended ...' tasks are all in minutes, varying from 5:55 to 8:34 right now, but they have been both higher and lower in the past.

Refurbished Xeon E5-2670 computers and chips are available now for low prices; they have eight cores (16 threads, which SETI@home sees as 16 processors), so my thinking is that a 2670 might be twice as fast (and it is by itself; see https://setiathome.berkeley.edu/cpu_list.php). I don't know how fast it would be with a new-model GPU. I have been looking at the AMD 470 and 480 (Ellesmere) and the GTX 1070 as possible companions. A refurb 2670 computer would be easy to change from Win 7 to Linux.

Best regards,

Micky
Profile jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1821644 - Posted: 4 Oct 2016, 5:04:41 UTC - in response to Message 1821641.  

The deep learning approach I'm very interested in, and hope we can bounce ideas around in the future :)
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
Micky Badgero

Joined: 26 Jul 16
Posts: 44
Credit: 21,373,673
RAC: 83
United States
Message 1821676 - Posted: 4 Oct 2016, 11:25:23 UTC - in response to Message 1821644.  

The deep learning approach I'm very interested in, and hope we can bounce ideas around in the future :)


Certainly :)
Profile ausymark

Joined: 9 Aug 99
Posts: 95
Credit: 10,175,128
RAC: 0
Australia
Message 1822007 - Posted: 5 Oct 2016, 20:15:14 UTC - in response to Message 1820983.  

Hi Team

Just a little update.

After a week of SETI crunching, it seems like my new Sapphire AMD Radeon 480 8GB is crunching away at around twice the throughput of my old Nvidia 570 GPU.

Naturally it will take a while for the final figure to stabilise, but it does make me wonder a little about the graphs showing the 570 at about 90% of the performance of the 480.

It might be that my original setup was sub-optimal, even though nothing else has changed software- or hardware-wise. Anyway, just some feedback on my early experience.

Cheers

Mark
TBar
Volunteer tester

Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1822021 - Posted: 5 Oct 2016, 21:00:41 UTC - in response to Message 1822007.  
Last modified: 5 Oct 2016, 21:01:51 UTC

...Sapphire AMD Radeon 480 8GB is crunching away at around twice the throughput of my old Nvidia 570 GPU....

Have you noticed it's not being reported by clinfo correctly? At least it's not showing up in the stderr_txt correctly. You might see what running clinfo shows:
https://setiathome.berkeley.edu/result.php?resultid=5194870297
Max compute units: 14
Max clock frequency: 555 MHz
It should be showing 36 compute units and a 1266 MHz clock.
I see the same readings on all the RX 480s running kernel 4.4.0-38. One person moved the card to Windows and it showed the correct numbers. It also became much faster.
My guess is it's a conflict between the AMD driver and kernel 4.4.0-38. You can't change the driver, but you can change the kernel.
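
If anyone wants to check outside of BOINC, those stderr figures come straight from the standard OpenCL device queries. A minimal sketch that prints the same two values, assuming the OpenCL headers and an ICD are installed, compiled against the OpenCL library (e.g. -lOpenCL):

/* Minimal sketch: print the compute-unit count and max clock that clinfo and
   the stderr_txt report, straight from the OpenCL device queries. */
#include <stdio.h>
#include <CL/cl.h>

int main(void)
{
    cl_platform_id plats[8];
    cl_uint nplat = 0;
    clGetPlatformIDs(8, plats, &nplat);

    for (cl_uint p = 0; p < nplat; ++p) {
        cl_device_id devs[8];
        cl_uint ndev = 0;
        if (clGetDeviceIDs(plats[p], CL_DEVICE_TYPE_GPU, 8, devs, &ndev) != CL_SUCCESS)
            continue;
        for (cl_uint i = 0; i < ndev; ++i) {
            char name[256];
            cl_uint cu = 0, mhz = 0;
            clGetDeviceInfo(devs[i], CL_DEVICE_NAME, sizeof(name), name, NULL);
            clGetDeviceInfo(devs[i], CL_DEVICE_MAX_COMPUTE_UNITS, sizeof(cu), &cu, NULL);
            clGetDeviceInfo(devs[i], CL_DEVICE_MAX_CLOCK_FREQUENCY, sizeof(mhz), &mhz, NULL);
            printf("%s: %u compute units, %u MHz\n", name, cu, mhz);
        }
    }
    return 0;
}

On an RX 480 with a correctly working driver that should print 36 compute units and 1266 MHz, matching the numbers above.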
Profile ausymark

Joined: 9 Aug 99
Posts: 95
Credit: 10,175,128
RAC: 0
Australia
Message 1822164 - Posted: 6 Oct 2016, 10:29:12 UTC - in response to Message 1822021.  

Hmmm, interesting.

Mine is showing the down-rated 14 compute units etc.

Ubuntu will be releasing version 16.10 within a couple of weeks, so I think I will wait until then, do the upgrade, and see what difference it makes. Also, by then I should have enough usable data as a baseline to see the difference between the two setups.

I will post results here when done.

Thx for the heads up Tbar :)
TBar
Volunteer tester

Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1822278 - Posted: 6 Oct 2016, 19:09:06 UTC - in response to Message 1822164.  
Last modified: 6 Oct 2016, 19:12:35 UTC

Ooookay, so you're going to install an OS that hasn't been tested with the AMD driver. Good luck with that.
I ran a search and found a couple of articles saying the current driver was mostly tested with kernel 4.2. You can get kernel 4.2 by installing Ubuntu 14.04.4, which just happens to be the OS mentioned in the AMD driver material. I also found an article that mentioned there were problems with kernel 4.4, which is what you are running now. I would suggest installing the software mentioned on the AMD driver page. Anything else and you're just playing beta tester... or maybe even alpha ;-)
Profile ausymark

Joined: 9 Aug 99
Posts: 95
Credit: 10,175,128
RAC: 0
Australia
Message 1822368 - Posted: 7 Oct 2016, 1:42:54 UTC - in response to Message 1822278.  
Last modified: 7 Oct 2016, 1:46:44 UTC

I am running the latest AMD driver for Ubuntu 16.04 from AMD's site: http://support.amd.com/en-us/kb-articles/Pages/AMD-Radeon-GPU-PRO-Linux-Beta-Driver%E2%80%93Release-Notes.aspx

Being a beta/alpha tester has never worried me in the past.

In fact I helped to test some of the early Lunatics builds back in the day....

As well as beta apps and OS's....

So, no biggy. Will let you know how I go ;)
TBar
Volunteer tester

Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1822374 - Posted: 7 Oct 2016, 2:05:11 UTC - in response to Message 1822368.  
Last modified: 7 Oct 2016, 2:16:47 UTC

https://www.google.com/search?q=Ellesmere+in+Ubuntu

It also doesn't work correctly with a Hawaii;
Note the CUs & Clock https://setiathome.berkeley.edu/result.php?resultid=5185878285
Look familiar?
Here's the driver: AMDGPU-Pro Driver Version 16.30 for Ubuntu 14.04.4.
Heck, I don't know if that setup will work either, but it's pretty obvious that 16.04 and the driver don't work correctly together.
Have fun.
Profile ausymark

Joined: 9 Aug 99
Posts: 95
Credit: 10,175,128
RAC: 0
Australia
Message 1822703 - Posted: 8 Oct 2016, 12:19:59 UTC - in response to Message 1822374.  

Just installed the 16.10 beta with the 4.8 kernel. It seems to be misreporting the compute units and max speed just as the 16.04 Ubuntu version did, so it is likely something in the AMD firmware blob. I will wait until they release an updated video driver and see if that fixes things.

The system still works; it's just performing under what it can do. No biggie, I am patient ;)
TBar
Volunteer tester

Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1822739 - Posted: 8 Oct 2016, 16:10:25 UTC - in response to Message 1822703.  

I had my own little episode. That big thunderstorm that moved up the other coast caused a few power outages that trashed the one Ubuntu machine with the 6800s. I decided to try a newer version of Ubuntu even though my favorite driver said it only works up to kernel 3.13. I installed Ubuntu 14.04.4 with kernel 4.2; nothing worked. The AMD drivers and the repository driver just gave a black screen. I had to boot into root and remove the driver just to get the X server back. Same with the 14.04.3 iso and kernel 3.19. With the 14.04.2 iso the repository driver worked, but ran the tasks slowly. So I went back to the 14.04.1 iso, which has kernel 3.13, and everything works again. For some reason the repository driver was behaving oddly on this machine, so I went back to the old AMD 14.20 driver. Now it's working fine again. If the AMD driver says it works with OS blah, blah, then it probably won't work very well with a different OS. Sometimes it doesn't even work with the OS it says it will. Fortunately the Ubuntu Nvidia drivers don't seem to be OS-dependent.
Profile ausymark

Joined: 9 Aug 99
Posts: 95
Credit: 10,175,128
RAC: 0
Australia
Message 1822895 - Posted: 9 Oct 2016, 3:37:23 UTC - in response to Message 1822739.  

When I installed my Sapphire Radeon RX 480 I did a fresh install, even a fresh home directory. I reinstalled all apps and then restored core data. One of the things I noticed was that my boot times were halved compared to my original Nvidia setup, which was about 4 years old and had been through about 7 OS upgrades/updates.

So I think there was a fair bit of 'garbage' from those years of messing around, including several versions of the Nvidia drivers etc.

So in a nutshell, assuming you didn't do it already, I would advise doing a fresh install, home directory included, to fully rule out conflicts with old setups.

Apologies if you did do the above, as I don't mean to seem condescending; I'm just trying to help.

Cheers

Mark
TBar
Volunteer tester

Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1822914 - Posted: 9 Oct 2016, 5:56:46 UTC - in response to Message 1822895.  

Yep, that machine of mine definitely had a load of software on it. It's the same machine used to build the two Apps at Crunchers Anonymous. Now I'm going to have to reinstall most of it, such a deal. After it became apparent it was toast, I decided to try out the 4 different systems. All of them were clean installs. I had a feeling I'd end up back where I started seeing as how they aren't making drivers for those cards anymore. As the older fglrx drivers don't work in 16.04, I imagine there are going to be a few people making the same decision. Oh well, 14.04 works just fine on my machines.

Look at that, the Linux CUDA 42 App has broken a Hundred. Something must have stirred interest again. Well, for the older CUDA cards it is the best App around ;-)
Arivald Ha'gel

Joined: 9 May 03
Posts: 14
Credit: 16,623,619
RAC: 2
Poland
Message 1823704 - Posted: 12 Oct 2016, 12:18:27 UTC - in response to Message 1820965.  


Another thing I found interesting is that the script now checks for traces of "-instances_per_device" -- from what I can tell no hosts I've scanned are running multiple instances (this is probably limited to people running Lunatics which I don't include in this scan). I know the detection works because it picks it up for my own hosts (stock, but 2 tasks per GPU).


Funny fact: I'm running SETI@home with 2 instances per device (I have a single Tahiti right now).
I use the original app, and I use app_config.xml to set it. So this graph might be a little skewed by such settings.
AMD Bonaire should have 1.5x the efficiency of AMD Fiji, and for AMD Fiji we can see great differences, so it seems that this script indeed doesn't "catch" some elusive parameters like those defined in app_config.xml (even if it does catch those defined on the command line).
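
For reference, app_config.xml is just the standard BOINC per-project override; a minimal sketch of the kind of file that runs two tasks per GPU looks something like the one below. The app name has to match what your client actually reports for the project (setiathome_v8 is assumed here), and the cpu_usage value is only an example.

<app_config>
  <app>
    <name>setiathome_v8</name>   <!-- assumed app name; use the one your client reports -->
    <gpu_versions>
      <gpu_usage>0.5</gpu_usage> <!-- 0.5 GPU per task = two tasks share one GPU -->
      <cpu_usage>0.5</cpu_usage> <!-- example value; some OpenCL builds want a full core -->
    </gpu_versions>
  </app>
</app_config>

The file goes in the project's folder under the BOINC data directory and takes effect after "Read config files" or a client restart.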