Energy Efficiency of ARM-based systems over x86 or GPU-based systems

Message boards : Number crunching : Energy Efficiency of ARM-based systems over x86 or GPU-based systems

Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 12990
Credit: 208,696,464
RAC: 690
Australia
Message 1552395 - Posted: 4 Aug 2014, 23:48:07 UTC - in response to Message 1552384.  
Last modified: 4 Aug 2014, 23:49:09 UTC

Doesn't GFLOPS refer to billions of floating-point operations per second? That is by definition a measurement of computational work per second, and should be usable to compare devices across a range of configurations. It of course isn't perfect, as each system build has its nuances.

Not all FLOPS are equal; different operations have different overheads.
That's why FLOPS, just like the number of WUs per hour (due to the different types of WUs), aren't a good indicator of actual performance.
My present video cards have a much higher FLOPS rating than the cards they replaced; however, the older cards can actually process more WUs per hour than the new ones, because the present applications aren't optimised for the new video cards.
However, I can run 3 of my new video cards for less power than one of the old ones used.


As badly screwed as Credit New is, and even with the very lagging nature of RAC, RAC is the best indicator of work done that we have.
Unfortunately it's not as good as it once was for comparing between different types of WU, and it's of no use at all for comparing between MB & AP, but it is very good at comparing between similar types of WU.


Can you make up for lower CPU performance with raw numbers and still maintain a good electrical footprint?

Without a doubt (as I mentioned with my new video cards). However, much as with games, there will be instances where more CPU power is required to keep the faster video cards busy.
AP WUs are a good example: many people running high-end (and multiple high-end) video cards have to leave 1, 2 or even more CPU cores free just to feed the GPUs.
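Leaving cores free to feed GPUs is typically done with BOINC's app_config.xml rather than by reducing the core count in preferences. A minimal sketch (the app name and values here are assumptions for illustration; the real app names are listed in the project's client_state.xml):

```xml
<app_config>
  <app>
    <!-- app name is an assumption; check client_state.xml for the real one -->
    <name>astropulse_v6</name>
    <gpu_versions>
      <gpu_usage>1.0</gpu_usage>
      <!-- reserve a full CPU core per running GPU task -->
      <cpu_usage>1.0</cpu_usage>
    </gpu_versions>
  </app>
</app_config>
```

The file goes in the project's directory under the BOINC data folder, and the client re-reads it on a "Read config files" or restart.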
Grant
Darwin NT
ivan
Volunteer tester

Joined: 5 Mar 01
Posts: 783
Credit: 348,560,338
RAC: 507
United Kingdom
Message 1552389 - Posted: 4 Aug 2014, 23:20:45 UTC - in response to Message 1552293.  

Compared to my Bay Trail-D system.
Application	GFLOPS	Cores	Total GFLOPS	System Watts	GFLOPS/Watt
SETI@home v7	 10.25	    4	 41.00		25		2.050
AstroPulse v6	 21.30	    4	 85.20		25		4.260


Hmm, my machine is running somewhat fewer FLOPS than yours for both MB and AP. I haven't worked out how to enable the iGPU for crunching under Linux yet.

I took delivery of an Nvidia Tegra K1 "Jetson" SDK tonight and should have all the bits needed to run it (HDMI->DVI cable, USB hub, keyboard + mouse) on next-day delivery tomorrow. First plan is to work out how it runs (it's an ARM version of Ubuntu) and install the latest CUDA libraries. Then, after I've got my hologram reconstructions running on the 192-core Kepler, I'll see if there are all the resources needed to compile BOINC & S@H on it. Watch, as they say, this space.

Must take my Wattmeter back into work next time I have to power down this rig (which is running 143 W ATM, it's usually around 250 W when the GPUs have APs to crunch).
mavrrick

Joined: 12 Apr 00
Posts: 17
Credit: 1,894,993
RAC: 9
United States
Message 1552384 - Posted: 4 Aug 2014, 23:11:45 UTC - in response to Message 1552368.  

So are you saying that field is a rating and not a measured value? The fact it is labelled an average would indicate some level of analysis is being done.

To me it looks more like an indication of how many GFLOPS that host is able to achieve when running that application.

The WUs-per-hour figure is useless for what I am getting at here. I am looking at actual computational performance rather than relating it to WUs. We all know not all WUs are the same. There has to be some way to calculate the amount of work done per second, and I suspect GFLOPS is it.

Doesn't GFLOPS refer to billions of floating-point operations per second? That is by definition a measurement of computational work per second, and should be usable to compare devices across a range of configurations. It of course isn't perfect, as each system build has its nuances.

Nothing is going to be exactly perfect. The math I presented assumes 100% use of each core. My expectation is that the application's figure represents one core, as each application runs as a single thread, so in my math the Average GFLOPS value is multiplied by the number of cores in the system. If any processing within the app spills beyond that core, then the number will fluctuate a little.

The point here is to evaluate the potential of low-powered systems compared to giant number crunchers. Can you make up for lower CPU performance with raw numbers and still maintain a good electrical footprint?
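The math described in this post can be sketched in a few lines of Python (a rough estimate only; the figures are the Ouya numbers quoted elsewhere in the thread):

```python
def gflops_per_watt(avg_gflops_per_core, cores, watts):
    """Estimate whole-system efficiency from an app's Average GFLOPS.

    Assumes one single-threaded app instance running flat out on each
    core, so total throughput = per-core GFLOPS * number of cores.
    """
    return avg_gflops_per_core * cores / watts

# Ouya: ~1.515 average GFLOPS per core, 4 cores, ~5 W
print(round(gflops_per_watt(1.515, 4, 5.0), 3))  # → 1.212
```

As noted above, anything the app does outside its own core (or any throttling) will move the real number around a little.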
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 12990
Credit: 208,696,464
RAC: 690
Australia
Message 1552368 - Posted: 4 Aug 2014, 22:12:37 UTC - in response to Message 1552292.  
Last modified: 4 Aug 2014, 22:16:13 UTC

If you take the "Average GFLOPS" as a way to indicate the processing speed of the device.

There's the problem. GFLOPs is an indicator, but a very, very poor one. Depending on the application being run, a card with a lower GFLOPS rating can process more work per hour than one with a much higher rating.
The number of WUs per hour is a better indicator, but the mix of WUs (VHARs, shorties) makes it difficult to compare things.
Average Processing Rate is a good one as it directly relates to the work being done; unfortunately it isn't accurate, as processing more than one WU at a time results in a lower APR, even though the work done per hour is much higher than when doing a single WU at a time.
RAC is probably the best indicator, however due to the nature of Credit New (almost completely borked) you can only compare MB to MB, AP to AP. And people that run a mix of the 2 can't really be compared to either (or even each other due to the different mixes).
Grant
Darwin NT
HAL9000
Volunteer tester

Joined: 11 Sep 99
Posts: 6533
Credit: 196,805,888
RAC: 130
United States
Message 1552293 - Posted: 4 Aug 2014, 19:01:09 UTC
Last modified: 4 Aug 2014, 19:10:41 UTC

Watts per FLOP, or FLOPS per watt, is the name of the game.
I think it might be best to calculate the performance of each app.

For one of my i5-4670K systems.
Application	GFLOPS	Cores	Total GFLOPS	System Watts	GFLOPS/Watt
SETI@home v7	 42.81	    4	171.24		90		1.903
AstroPulse v6	106.52	    4	426.08		90		4.734


Compared to my Bay Trail-D system.
Application	GFLOPS	Cores	Total GFLOPS	System Watts	GFLOPS/Watt
SETI@home v7	 10.25	    4	 41.00		25		2.050
AstroPulse v6	 21.30	    4	 85.20		25		4.260


For an ARM system I would do each app version for comparison. One might be better than the others performance-wise, and that information could be fed back to home base.

EDIT:
I will also add that my low-powered system is drawing more power than it should, as it has an oversized PSU that is not very efficient at the moment. Ideally it would be in the 10-20 W range for power consumption.
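For reference, the GFLOPS/Watt column in the tables above is just Total GFLOPS divided by System Watts; recomputing the i5-4670K rows:

```python
# (app, per-core GFLOPS, cores, system watts) for the i5-4670K
rows = [
    ("SETI@home v7", 42.81, 4, 90.0),
    ("AstroPulse v6", 106.52, 4, 90.0),
]
for app, gflops, cores, watts in rows:
    total = gflops * cores  # all cores running the same app flat out
    print(f"{app}: {total:.2f} total GFLOPS, {total / watts:.3f} GFLOPS/W")
# SETI@home v7: 171.24 total GFLOPS, 1.903 GFLOPS/W
# AstroPulse v6: 426.08 total GFLOPS, 4.734 GFLOPS/W
```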
SETI@home classic workunits: 93,865 CPU time: 863,447 hours
Join the BP6/VP6 User Group today!
mavrrick

Joined: 12 Apr 00
Posts: 17
Credit: 1,894,993
RAC: 9
United States
Message 1552292 - Posted: 4 Aug 2014, 18:54:39 UTC - in response to Message 1552274.  

A few things I thought of after posting.

1. Depending on the GPU, a significant hike in power usage can occur just by installing a dedicated GPU. The 4870 adds about 90-100 watts to the base system's power draw, so if the card was removed my desktop's power efficiency would increase a fair amount, up to around 0.193 GFLOPS per watt.

2. There are obviously more power-efficient CPUs now than my Phenom II 6-core CPU. It would be nice to get some comparable numbers from some newer, higher-end systems.


Here is an interesting point to think about too. If you take the "Average GFLOPS" as a way to indicate the processing speed of the device, then in theory with 8 Ouyas I could generate the same SETI@home processing power as my desktop. The Ouyas would only need 40 watts to do so. That is a fair amount of power savings, and possibly heat savings. And if doing the same work, they should receive the same RAC.
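The eight-Ouya claim checks out against the figures quoted in the opening post (1.515 GFLOPS per core × 4 cores per Ouya, versus 8.06 GFLOPS per core × 6 cores for the desktop CPU); a quick sanity check:

```python
ouya_total = 1.515 * 4   # one Ouya: ~6.06 GFLOPS at ~5 W
desktop_cpu = 8.06 * 6   # Phenom II X6: ~48.36 GFLOPS at ~360 W

n = 8  # number of Ouyas
print(round(n * ouya_total, 2))   # → 48.48 GFLOPS from eight Ouyas
print(round(desktop_cpu, 2))      # → 48.36 GFLOPS from the desktop CPU
print(n * 5)                      # → 40 (watts, total, for the Ouyas)
```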
mavrrick

Joined: 12 Apr 00
Posts: 17
Credit: 1,894,993
RAC: 9
United States
Message 1552274 - Posted: 4 Aug 2014, 17:49:04 UTC

I brought this up a few months ago and didn't get very far with the analysis. For some reason my interest in this topic was sparked again recently.

So since the last time I brought this up, I have been letting my Ouya run SETI@home full time, and it has racked up a fair amount of credit. I am not really interested in going crazy with it; I am just letting it run. The only thing I did was move the little box next to a system with an active cooling fan. It has run well, and for the little power it uses (4.5-5 watts) I am very impressed.

So now to the point of all this. While I was looking over the host information I found a figure for Average GFLOPS based on the app running. I also found what is labelled Device Peak FLOPS in the task information for my computer and the processed work units. Interestingly enough, the two aren't very close.

The one that seems to correlate best with the time a WU takes is the Average GFLOPS number. So my first question is: does anyone clearly understand what that number is, i.e. what is being used to create it?

So my math is pretty simple: take the Average GFLOPS value, multiply it by the number of cores the system has, then divide that by the approximate power usage of the device, measured as closely as I can.

I don't expect this to be exact, but it should be a fairly decent approximation of the performance per watt, or more specifically GFLOPS per watt.

Unfortunately I was only able to get fairly precise power numbers for a few systems: my desktop and the Ouya. My desktop is a Phenom II 6-core system with a Radeon 4870. Nothing cutting edge, but a big enough system to churn through some work if I want to.

So the numbers:
The Ouya got an average GFLOPS across the 4 apps it had used of 1.515; with 4 cores running at 5 watts, it was getting about 1.212 GFLOPS per watt.

My desktop got an average GFLOPS across 1 app of 8.06, and had 6 cores. The desktop ran at around 360 watts, which gives it 0.13433 GFLOPS per watt. The Radeon 4870 did an AstroPulse WU and achieved an average GFLOPS of 101.38 while using about 60 watts, giving it the best performance per watt at 1.689. The catch, though, is that you have to have the rest of the computer running: if you combine the performance values for the CPU and GPU, the performance per watt drops to 0.3565.
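A quick check of the arithmetic above (combining CPU and GPU means summing both the throughput and the power draw):

```python
cpu_gflops = 8.06 * 6   # Phenom II X6: ~48.36 GFLOPS total
gpu_gflops = 101.38     # Radeon 4870 on AstroPulse
cpu_watts, gpu_watts = 360.0, 60.0

print(round(cpu_gflops / cpu_watts, 5))   # → 0.13433 GFLOPS/W (CPU alone)
print(round(gpu_gflops / gpu_watts, 3))   # → 1.69 GFLOPS/W (GPU alone)
# combined: both throughputs over both power draws
print(round((cpu_gflops + gpu_gflops) / (cpu_watts + gpu_watts), 4))  # → 0.3565
```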

I suspect the most efficient option is a lower-power CPU with a really good GPU. As long as the CPU can feed the data needed to keep the GPU chugging away, that may produce the best performance-per-watt results.

Now of course this doesn't account for RAC at all, just the amount of processing work a device can do compared to the power consumed.



 