MB v8: CPU vs GPU (in terms of efficiency)

qbit
Volunteer tester
Joined: 19 Sep 04
Posts: 630
Credit: 6,868,528
RAC: 0
Austria
Message 1815733 - Posted: 8 Sep 2016, 14:41:41 UTC
Last modified: 8 Sep 2016, 14:43:58 UTC

The question about the efficiency of current GPU apps has been raised in another thread, but since it's quite off-topic there, I thought I'd make a new thread for it.

I run a (dedicated) main cruncher with a GTX 750, known to be one of the most efficient cards out there; its TDP is 55 watts. The CPU on this machine is not used for crunching; it's reserved for feeding the GPU.
http://setiathome.berkeley.edu/show_host_detail.php?hostid=7563243

From time to time I crunch a bit with one of my laptops, which has an Intel N3520, a CPU that is also known for a very good performance-per-watt ratio. Its TDP is 7.5 watts.
http://setiathome.berkeley.edu/show_host_detail.php?hostid=7433880

I run the apps from the latest Lunatics installer on the CPU and Raistmer's latest OpenCL build (r3525) on the GPU.

Seeing the task times on both machines, I was wondering which one is more efficient, so it's time to do the math. (I know that guppies are currently not handled very well by GPUs, so I'm talking about Arecibo tasks only.)

Here are 2 examples of tasks with a very similar angle range, the first one crunched on my lappy, the second one on my main machine:

http://setiathome.berkeley.edu/result.php?resultid=5143371920
WU true angle range is : 0.423457
5 hours 5 min 18 sec

http://setiathome.berkeley.edu/result.php?resultid=5141706081
WU true angle range is : 0.423120
42 min 12 sec

Now we need to take into account that my lappy runs 4 tasks at a time, while I only do 2 at a time on my GTX 750. That means we have to double the time from the GPU. So it takes my main cruncher ~84 min to do the same work my lappy does in ~305 min, which means the GPU crunches faster by a factor of 3.63. BUT the GPU uses much more power to do this; the factor here is 7.33 (55/7.5). So in the end, the CPU is more than twice (2.02x) as efficient as the GPU!
That's quite a surprise for me; AFAIR it was quite the opposite for v7. So at the moment, if efficiency is your main goal, you might think about using your CPU instead of your GPU, or building a CPU monster cruncher instead of a multi-GPU machine. (Of course, it all depends on the type of CPU/GPU.)
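For anyone who wants to redo this with their own numbers, here is a quick Python sketch of the calculation above. The figures are the ones from this post, and it simply assumes the '2 tasks vs 4 tasks' scaling I describe, i.e. that run times scale linearly with the number of concurrent tasks:

# Back-of-envelope CPU vs GPU efficiency comparison using the sample tasks above.
# Assumes run time scales linearly with the number of concurrent tasks.
cpu_time_min, cpu_tasks, cpu_tdp_w = 305, 4, 7.5   # N3520 lappy, 4 CPU tasks at once
gpu_time_min, gpu_tasks, gpu_tdp_w = 42, 2, 55.0   # GTX 750, 2 GPU tasks at once

gpu_time_scaled = gpu_time_min * (cpu_tasks / gpu_tasks)  # ~84 min for 4 tasks

speed_factor  = cpu_time_min / gpu_time_scaled  # ~3.63: GPU is this much faster
power_factor  = gpu_tdp_w / cpu_tdp_w           # ~7.33: GPU draws this much more power
cpu_advantage = power_factor / speed_factor     # ~2.02: CPU energy advantage per task

print(f"GPU {speed_factor:.2f}x faster, {power_factor:.2f}x the power, "
      f"CPU ~{cpu_advantage:.2f}x as efficient per task")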

PS: I will try to check other ARs soon.
ID: 1815733
AMDave
Volunteer tester

Joined: 9 Mar 01
Posts: 234
Credit: 11,671,730
RAC: 0
United States
Message 1815753 - Posted: 8 Sep 2016, 16:03:40 UTC - in response to Message 1815733.  

Now we need to take into account that my lappy runs 4 tasks at a time, while I only do 2 at a time on my GTX 750. That means we have to double the time from the GPU. So it takes my main cruncher ~84 min to do the same work my lappy does in ~305 min, which means the GPU crunches faster by a factor of 3.63. BUT the GPU uses much more power to do this; the factor here is 7.33 (55/7.5). So in the end, the CPU is more than twice (2.02x) as efficient as the GPU!

Your math is inaccurate. WU completion times are not linear when concurrency is increased. Assuming both machines run only GPU WUs with similar ARs (i.e. VLAR to VLAR, VHAR to VHAR, mid to mid), you can't state that doubling the completion time of 2 concurrent WUs run on machine A should equal the completion time of 4 concurrent WUs on machine B. This would be true even if both machines were identical, and here they are not.

More often than not, the gain from increasing concurrency is less than linear (< 1:1). For example, a single WU may complete in 20 min, 2 concurrent WUs may complete in 37 min, and 3 concurrent WUs may complete in 52 min. This holds up to a point, where the law of diminishing returns takes over. Because of the multitude of hardware configurations, where that point lies is different for every machine. If you read other threads, you'll see that for some with a GTX 750 Ti it is 3 WUs, while for some with a GTX 1060 it is 2 WUs.
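To put numbers on that, here is a small Python sketch using the illustrative 20/37/52 minute figures above (examples, not measurements). Throughput still grows with concurrency, but by less and less per added WU:

# Throughput vs. concurrency, using the illustrative batch times above.
examples = {1: 20, 2: 37, 3: 52}  # concurrent WUs -> minutes for the whole batch

for n, batch_min in examples.items():
    print(f"{n} concurrent: {n * 60 / batch_min:.2f} WUs/hour")

# Prints 3.00, 3.24 and 3.46 WUs/hour: each added WU buys less extra throughput,
# which is why you can't simply multiply or divide completion times by the task count.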
ID: 1815753
qbit
Volunteer tester
Joined: 19 Sep 04
Posts: 630
Credit: 6,868,528
RAC: 0
Austria
Message 1815768 - Posted: 8 Sep 2016, 18:10:04 UTC
Last modified: 8 Sep 2016, 18:12:10 UTC

Dave, I think you got me wrong there. Maybe that's my fault; I can't explain things in English as well as I can in German ;-)

I suppose with "concurrency" you mean running 4 tasks at a time on GPU instead of just 2, right? But that's not what I'm talking about.
The CPU in my lappy has 4 cores. So if I wanna use its full capacity, I can only run 4 tasks at a time. AFAIK it's not possible to use more than one core per task.
So I have 4 tasks on CPU and 2 tasks on GPU. If I wanna compare those, I have to scale. I can do this by either multiplying the times from the GPU by 2 or dividing the times from the lappy by 2.
Or, to put it another way, imagine I have two identical main crunchers with the same GTX 750. If I ran the same two tasks on each of those machines, it should take each of them exactly the same amount of time to finish (at least in theory; in practice they may differ by a few seconds).
ID: 1815768
AMDave
Volunteer tester

Joined: 9 Mar 01
Posts: 234
Credit: 11,671,730
RAC: 0
United States
Message 1815797 - Posted: 8 Sep 2016, 20:48:21 UTC - in response to Message 1815768.  

Dave, I think you got me wrong there. Maybe that's my fault; I can't explain things in English as well as I can in German ;-)

I suppose with "concurrency" you mean running 4 tasks at a time on GPU instead of just 2, right? But that's not what I'm talking about.

Yes, that's how I understood it.  Thanks for the clarification.
ID: 1815797
Profile HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 1815815 - Posted: 8 Sep 2016, 23:42:33 UTC - in response to Message 1815733.  
Last modified: 9 Sep 2016, 0:03:17 UTC

I normally calculate the watt-hours (Wh) per task for my devices to rank them by their efficiency. I have done a few posts about the method I use here & also here.
Using the TDP, run time, & # of tasks per device for your CPU & GPU, I get:
Note: I rounded the task times to the nearest minute.
Also, this calculation expects the run times to be from when the specified number of tasks were running.
Device     Watts   # Tasks   Run Time (min)   Task/hr     Task/day    Wh/Day   Wh/Task
GTX 750    55      2         42               2.857142    68.571428   1320     19.25
N3520      7.5     4         305              0.786885    18.885245    180      9.53

Which would make the N3520 ~2.02 times as efficient as the GTX 750, but it completes less than a third as much work in a given day.

However, actual GPU power consumption is typically lower than the rated TDP value when processing tasks. A good rule of thumb is to use ~80%.
Many have found that their GTX 750s run in the 40-45 W range. If we figure 80% of 55 W, that does happen to be 44 W, which would give the GTX 750 a Wh/Task of 15.4 instead of 19.25 & make the N3520 only ~1.62 times as efficient.
I should also add that my GTX 750 Ti FTW would complete two 0.42 AR tasks at once in ~25 min while using ~45 W, giving it a Wh/Task of 9.38.

This type of calculation does not take into account the whole power usage of the system the device is in. To do that you would really need to use a power meter, or UPS with a power usage display, and measure each system at idle and then when processing tasks. The delta could then be used to calculate Wh/Task.

Task/hr = (60/Run Time) * # Tasks
Task/day = Task/hr * 24
Wh/Day = Watts * 24
Wh/Task = (Wh/Day)/(Task/day)
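The same formulas as a small Python helper, fed with the numbers from the table and paragraphs above (the 80% figure is the rule of thumb mentioned earlier, not a measurement):

# Wh/Task from average watts, tasks run concurrently, and the batch run time in minutes.
def wh_per_task(watts, tasks_at_once, run_time_min):
    tasks_per_hour = (60 / run_time_min) * tasks_at_once
    tasks_per_day = tasks_per_hour * 24
    wh_per_day = watts * 24
    return wh_per_day / tasks_per_day

print(wh_per_task(55, 2, 42))        # GTX 750 at full TDP       -> 19.25
print(wh_per_task(55 * 0.8, 2, 42))  # GTX 750 at ~80% TDP       -> 15.4
print(wh_per_task(7.5, 4, 305))      # N3520                     -> ~9.53
print(wh_per_task(45, 2, 25))        # GTX 750 Ti FTW, measured  -> ~9.38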
SETI@home classic workunits: 93,865 CPU time: 863,447 hours
Join the BP6/VP6 User Group: http://tinyurl.com/8y46zvu
ID: 1815815
Kiska
Volunteer tester

Joined: 31 Mar 12
Posts: 302
Credit: 3,067,762
RAC: 0
Australia
Message 1815847 - Posted: 9 Sep 2016, 4:48:50 UTC
Last modified: 9 Sep 2016, 5:01:40 UTC

Let's take your N3520: it can do ~2.75 GFLOPS on average per core, and each core takes 1.875 W to operate, which gives us ~1.47 GFLOPS/W. The 750 does ~3 GFLOPS on average, and a 55 W TDP gives us 0.0545 GFLOPS/W...
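As a quick check of that arithmetic (assuming the 2.75 GFLOPS is per core and the 3 GFLOPS is the measured average for the whole card):

# GFLOPS per watt from the averaged (not peak) figures above.
cpu_gflops_per_core = 2.75
watts_per_core = 7.5 / 4           # 1.875 W per N3520 core
gpu_gflops, gpu_tdp_w = 3.0, 55.0  # measured average vs. rated TDP for the GTX 750

print(cpu_gflops_per_core / watts_per_core)  # ~1.47 GFLOPS/W
print(gpu_gflops / gpu_tdp_w)                # ~0.0545 GFLOPS/W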


I will link you to this message on the forum (link) and quote it here:

Actually I think it's lower. If we take a sample of 2 tasks from my GT 840M, I can see that the average is 11.26 GFLOPS or 13.35 GFLOPS, taken from the flop counter. Let's take the higher value: BOINC tells me that the device can peak at 863 GFLOPS, yet averaged over time my dGPU outputs 13.35 GFLOPS, which is about 1.6%.
5/09/2016 21:22:02 PM | | CUDA: NVIDIA GPU 0: GeForce 840M (driver version 362.00, CUDA version 8.0, compute capability 5.0, 2048MB, 1679MB available, 863 GFLOPS peak)



Task 1
Task 2


This 'test' is running 1 task at a time, according to the internal flop counter. And from the subsequent reply (link):

Last time I checked it was using as much as the TDP demanded. The GT 840M is a 35 W part (depending on which brand of laptop). Last I measured by hooking up probes (not by me, by my prof) it was hitting somewhere between 30 W and 34.8 W, so you can't get any more out of it. GPU-Z measures ~95% avg usage. Not sure how accurate that is, as I believe it measures only the first CU.

Yay for being a student at WSU
ID: 1815847
Profile M_M
Joined: 20 May 04
Posts: 76
Credit: 45,752,966
RAC: 8
Serbia
Message 1815848 - Posted: 9 Sep 2016, 4:50:56 UTC - in response to Message 1815815.  
Last modified: 9 Sep 2016, 5:16:27 UTC

Just to mention: if efficiency is a primary concern, undervolting and underclocking your GPU can significantly boost its power efficiency. For example, if you underclock your GPU by just 10% (and undervolt by another 10-15%, actually by as much as you can while still keeping it 100% stable), your GPU power usage will go down by 25-30%. This is essentially how mobile GPUs are selected: they are tested at slightly lower clocks and much lower voltages.
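A rough back-of-envelope for that claim, assuming dynamic power scales roughly with frequency times voltage squared (a common first-order approximation that ignores static/leakage power):

# First-order power estimate: dynamic power ~ frequency * voltage^2.
f_scale = 0.90                 # 10% underclock
for v_scale in (0.90, 0.85):   # 10% and 15% undervolt
    p_scale = f_scale * v_scale ** 2
    print(f"{(1 - p_scale) * 100:.0f}% lower power")  # ~27% and ~35%

# Roughly in line with the 25-30% figure above; actual savings depend on how much
# of the card's draw is leakage, memory, fans etc.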

On the other hand, this means that overclocking (especially with overvolting) significantly decreases power efficiency, which is nothing new, but people usually overlook it.

Also worth mentioning is that GPU apps are still far from their optimal efficiency, which is not so much the case for CPU apps. For example, Petri33's custom-optimized NV GPU application is 2-2.5x more efficient (and 2.5-3x faster) than the standard app, and he is convinced there is still room for further improvement.

The reason is that it is much harder to properly optimize GPU applications, due to GPUs' heavy parallelism and varied architectures.
ID: 1815848
Profile George 254
Volunteer tester

Joined: 25 Jul 99
Posts: 155
Credit: 16,507,264
RAC: 19
United Kingdom
Message 1815852 - Posted: 9 Sep 2016, 5:26:32 UTC - in response to Message 1815733.  

qbit
Thanks for your post.
Clicked on the task links and got:
No such task: 5141706081
No such task: 5143371920 ????
ID: 1815852
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13727
Credit: 208,696,464
RAC: 304
Australia
Message 1815859 - Posted: 9 Sep 2016, 6:58:14 UTC - in response to Message 1815733.  

http://setiathome.berkeley.edu/result.php?resultid=5141706081
WU true angle range is : 0.423120
42 min 12 sec


At present I'm running my GTX 750 Tis using the SoG application, with a modified version of one of the suggested command lines.
Running 1 WU at a time, they're crunching most Arecibo WUs in 13-14 min. The highest peak I've seen for power consumption is around 85% of TDP; generally it's around 70-75% (42-45 W).
Grant
Darwin NT
ID: 1815859
Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1815870 - Posted: 9 Sep 2016, 7:44:18 UTC - in response to Message 1815815.  
Last modified: 9 Sep 2016, 7:45:09 UTC


This type of calculation does not take into account the whole power usage of the system the device is in. To do that you would really need to use a power meter, or UPS with a power usage display, and measure each system at idle and then when processing tasks. The delta could then be used to calculate Wh/Task.

And that's quite an important part. If a device completes a task much faster, it needs the whole system powered on for a correspondingly shorter time. If a very low-power device takes much longer to complete the same task, it requires full system support (with all its energy-consuming overhead) for that whole, much longer time.

Hence, without accounting for whole-system energy consumption overhead, such CPU vs GPU efficiency comparisons are quite biased IMO and don't show the real benefits of fast GPU computing.
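To illustrate, here is the earlier Wh/Task calculation repeated with a purely hypothetical 30 W of 'rest of system' overhead (board, RAM, PSU losses and so on) added to both hosts; the real overhead will differ per machine:

# Wh/Task including an assumed whole-system overhead (the 30 W is hypothetical).
def wh_per_task(total_watts, tasks_at_once, run_time_min):
    return total_watts * (run_time_min / 60) / tasks_at_once

overhead_w = 30.0

gpu_host = wh_per_task(55 + overhead_w, 2, 42)     # ~29.8 Wh/task
cpu_host = wh_per_task(7.5 + overhead_w, 4, 305)   # ~47.7 Wh/task

print(gpu_host, cpu_host)  # with overhead included, the fast GPU host comes out ahead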
SETI apps news
We're not gonna fight them. We're gonna transcend them.
ID: 1815870
Profile -= Vyper =-
Volunteer tester
Joined: 5 Sep 99
Posts: 1652
Credit: 1,065,191,981
RAC: 2,537
Sweden
Message 1815876 - Posted: 9 Sep 2016, 8:44:50 UTC

In another thread I posted this, which reflects how much juice is required for the work:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 367.35 Driver Version: 367.35 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX 750 Ti Off | 0000:01:00.0 Off | N/A |
| 39% 57C P0 23W / 46W | 1016MiB / 1998MiB | 100% Default |
+-------------------------------+----------------------+----------------------+
| 1 GeForce GTX 750 Ti Off | 0000:02:00.0 Off | N/A |
| 40% 58C P0 27W / 46W | 1016MiB / 2000MiB | 99% Default |
+-------------------------------+----------------------+----------------------+
| 2 GeForce GTX 750 Ti Off | 0000:04:00.0 Off | N/A |
| 38% 53C P0 25W / 46W | 1016MiB / 2000MiB | 100% Default |
+-------------------------------+----------------------+----------------------+
| 3 GeForce GTX 750 Ti Off | 0000:05:00.0 Off | N/A |
| 37% 51C P0 24W / 46W | 1016MiB / 2000MiB | 98% Default |
+-------------------------------+----------------------+----------------------+

It seems like each card consumes about 25 W when crunching on my quad GTX 750 Ti host.
http://setiathome.berkeley.edu/show_host_detail.php?hostid=8053171

But then again, you need to take into account that you need a computer "around the cards" to drive them. I presume that computer consumes around 200 W at the wall, but I can't confirm it.

_________________________________________________________________________
Addicted to SETI crunching!
Founder of GPU Users Group
ID: 1815876
Profile HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 1816031 - Posted: 10 Sep 2016, 0:00:10 UTC - in response to Message 1815870.  


This type of calculation does not take into account the whole power usage of the system the device is in. To do that you would really need to use a power meter, or UPS with a power usage display, and measure each system at idle and then when processing tasks. The delta could then be used to calculate Wh/Task.

And that's quite an important part. If a device completes a task much faster, it needs the whole system powered on for a correspondingly shorter time. If a very low-power device takes much longer to complete the same task, it requires full system support (with all its energy-consuming overhead) for that whole, much longer time.

Hence, without accounting for whole-system energy consumption overhead, such CPU vs GPU efficiency comparisons are quite biased IMO and don't show the real benefits of fast GPU computing.

I find it most useful to calculate the CPU & GPU in the same system for each app or type of work from a project. Then I can use the information to determine which device in the system is most suited to running a given type of work.
SETI@home classic workunits: 93,865 CPU time: 863,447 hours
Join the BP6/VP6 User Group: http://tinyurl.com/8y46zvu
ID: 1816031
Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1816133 - Posted: 10 Sep 2016, 11:22:14 UTC - in response to Message 1816031.  


This type of calculation does not take into account the whole power usage of the system the device is in. To do that you would really need to use a power meter, or UPS with a power usage display, and measure each system at idle and then when processing tasks. The delta could then be used to calculate Wh/Task.

And that's quite an important part. If a device completes a task much faster, it needs the whole system powered on for a correspondingly shorter time. If a very low-power device takes much longer to complete the same task, it requires full system support (with all its energy-consuming overhead) for that whole, much longer time.

Hence, without accounting for whole-system energy consumption overhead, such CPU vs GPU efficiency comparisons are quite biased IMO and don't show the real benefits of fast GPU computing.

I find it most useful to calculate the CPU & GPU in the same system for each app or type of work from a project. Then I can use the information to determine which device in the system is most suited to running a given type of work.

Yes, if the system overhead power remains the same, the needed corrections can be derived from pure device power data. But they are still needed, especially if the throughput of the devices being compared differs by an order of magnitude or more.
SETI apps news
We're not gonna fight them. We're gonna transcend them.
ID: 1816133
Profile ML1
Volunteer moderator
Volunteer tester

Joined: 25 Nov 01
Posts: 20258
Credit: 7,508,002
RAC: 20
United Kingdom
Message 1816386 - Posted: 11 Sep 2016, 13:18:05 UTC

Very good theorizing...

However, the best test is to measure reality directly:

Measure the power consumed at your mains wall socket over, for example, 48 hours and divide by the total WUs completed or by RAC.

You should get some interesting numbers, especially comparing the WU and RAC values.



Happy efficient crunchin!
Martin
See new freedom: Mageia Linux
Take a look for yourself: Linux Format
The Future is what We all make IT (GPLv3)
ID: 1816386
qbit
Volunteer tester
Joined: 19 Sep 04
Posts: 630
Credit: 6,868,528
RAC: 0
Austria
Message 1816401 - Posted: 11 Sep 2016, 14:11:08 UTC

Just for clarification: I can't provide "real" numbers for the whole systems because I lack the tools to measure real power usage. And I have no plans to get some because the cheap ones are pretty inaccurate and the professional ones are too expensive to buy just for fun.

But if anybody has such tools, feel free to share your findings here.
ID: 1816401
Profile M_M
Joined: 20 May 04
Posts: 76
Credit: 45,752,966
RAC: 8
Serbia
Message 1816463 - Posted: 11 Sep 2016, 17:37:35 UTC - in response to Message 1816401.  
Last modified: 11 Sep 2016, 18:36:42 UTC

I think even the cheap power meters ($15-20) should be accurate enough to measure average power consumption, so why not try? My measurements at the wall socket are below (I also have an APC SmartUPS, which itself draws some 5% on top of the figures shown below).

My PC at idle (i.e. ordinary desktop work, web surfing etc.) with a 24" LCD is around 170 W (100 W at real idle with the monitor sleeping).
With S@H running just on the CPU, power draw is around 275 W (i7-2600K, overclocked to 4.5 GHz).
With S@H running on the CPU + GTX 1080, power draw is around 390 W. So the GTX 1080 is responsible for around 115 W of draw, which is around 64% of its TDP, close to the average power consumption that GPU-Z reports.
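For completeness, the deltas from those wall readings (the 180 W TDP for the GTX 1080 is the published reference figure; the ~5% UPS overhead is ignored here):

# Deltas from the wall-socket readings above (UPS overhead not subtracted).
idle_w, cpu_only_w, cpu_plus_gpu_w = 170, 275, 390
gtx1080_tdp_w = 180  # published reference TDP

cpu_crunch_delta = cpu_only_w - idle_w          # ~105 W for CPU crunching
gpu_crunch_delta = cpu_plus_gpu_w - cpu_only_w  # ~115 W for the GTX 1080

print(cpu_crunch_delta, gpu_crunch_delta, gpu_crunch_delta / gtx1080_tdp_w)  # 105 115 ~0.64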
ID: 1816463
