Energy Efficiency of ARM-Based Systems over x86- or GPU-Based Systems

ivan
Volunteer tester
Joined: 5 Mar 01
Posts: 783
Credit: 348,560,338
RAC: 223
United Kingdom
Message 1553089 - Posted: 6 Aug 2014, 23:04:20 UTC - in response to Message 1553085.  

I also like what you said about the Bay Trail-D system. If you don't mind my asking, which ones do you have?

Mine is just something eBuyer had on special a couple of months ago for £130: an Acer Aspire XC-603 desktop PC. It took a bit of effort to find something that would boot on it -- I ended up with CentOS 7. I could have tried to put corporate Windows 7 on it, but several comments said it was difficult to get the drivers right. I wouldn't mind putting the wattmeter on it too. As I mentioned earlier, I haven't found out whether it's possible to get the iGPU crunching too under Linux -- BOINC says there's no GPU.
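For the record, a couple of quick checks worth trying from the command line before giving up on the iGPU (clinfo comes from the distro package of the same name, and only reports devices if an OpenCL driver such as Beignet is installed -- I'm not sure of Beignet's Bay Trail status, so treat this as a sketch):

$ lspci | grep -i vga              # does the kernel see the Bay Trail graphics device at all?
$ clinfo | grep -i 'device name'   # lists any OpenCL devices an installed driver exposes

If clinfo shows nothing, BOINC won't see a GPU either.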
ID: 1553089
HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 1553091 - Posted: 6 Aug 2014, 23:30:32 UTC - in response to Message 1553085.  

Well, I was just looking over some docs and saw a peak power draw into the 40s of watts. I am not as much into the hardware as I used to be, and it got me thinking about what would probably be the biggest consumer of power. The peripherals are a given, as the doc I was reading brought up their power draw, but I was also wondering how they quantified a typical real-world workload.

I was just thinking that running a CUDA SETI@home app isn't a typical real-world load.

10 watts seems very reasonable to me.

The comment about a Jetson TK1 cluster was really about the two ways I see to increase efficiency. You either get a faster CPU/GPU that does the work in a shorter amount of time, or you get many smaller CPUs that don't use as much energy and do more WUs at one time, each just taking longer.

It goes back to the question of what is more efficient. If it only takes four TK1s to complete the same amount of work as a regular high-end desktop, using 1/5 the power to run those WUs, then that would be the most energy-efficient way to go. I was being sarcastic when I said it, but that was my thought. Someone on the TK1 developer forums has a cluster of about nine nodes set up.
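To put rough numbers on that: energy per WU is just power × run time. A quick back-of-the-envelope with made-up round figures (a 200 W desktop finishing a WU in 1 hour vs. a 10 W TK1 taking 4 hours -- both numbers are assumptions, not measurements):

$ echo "desktop: $((200 * 1)) Wh/WU, TK1: $((10 * 4)) Wh/WU"
desktop: 200 Wh/WU, TK1: 40 Wh/WU

On those figures the TK1s come out 5x more energy-efficient per WU, even though each one takes four times as long.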

I may be in the minority on this site, but I don't run my systems 24/7 anymore. I have one box at my house that does, and it is my server, so it has to be up. The rest are just clients, so I set up all the power-saving goodness on them and let them sleep normally when not in use. The second most used system is an HTPC, which was built to be rather power efficient, although I am sure I could do better now. This is what drives this for me. I would love to still contribute, but I want to do it in a rather green manner. Low-power ARM devices seem like possibly the best option. I'm just hoping to keep the power company from raiding my wallet. You could almost say this is research to see if I can find a way to get back into contributing again. :)

I also like what you said about the Bay Trail-D system. If you don't mind my asking, which ones do you have?

I have an ASRock Q1900-ITX. I put the specs for it in the most recent "show us your crunchers" thread here.
I like to get hardware that has the best performance per watt for the amount of money I want to spend, & is within the power usage I like. For my gaming system I go for cards that are 150 W. Most of the time the 150 W ATI cards are among the most efficient in PPW as well, so that works out nicely for me.
I had been looking at the other Silvermont processors to replace an older machine, but the Avoton server-based systems were just more than I wanted to spend. I was looking for the same or greater performance at 50% or less of the older machine's power consumption. The Bay Trail-D systems meet both of those requirements, the board was fairly cheap, & I was able to use some old memory I had lying around. Now I just need to find a good power supply for it that is in the right power range & is efficient.
SETI@home classic workunits: 93,865 CPU time: 863,447 hours
Join the [url=http://tinyurl.com/8y46zvu]BP6/VP6 User Group[/url]
ID: 1553091
ivan
Volunteer tester
Joined: 5 Mar 01
Posts: 783
Credit: 348,560,338
RAC: 223
United Kingdom
Message 1553101 - Posted: 7 Aug 2014, 0:18:42 UTC - in response to Message 1552992.  

This assumes you've installed the CUDA SDK and added the appropriate locations to your PATH and LD_LIBRARY_PATH environment variables, but that's well-covered in the Nvidia documentation.

As I've just been reminded, it's also a good idea to set up the system-wide pointer to the libs with ldconfig, in case you forget to set LD_LIBRARY_PATH or start up boinc from an account/shell that doesn't set it automatically -- in those cases the libraries aren't found and the jobs all die...
ubuntu@tegra-ubuntu:~/BOINC$ ldd projects/setiathome.berkeley.edu/setiathome_x41zc_armv7l-unknown-linux-gnu_cuda60
	libpthread.so.0 => /lib/arm-linux-gnueabihf/libpthread.so.0 (0xb66cd000)
	libcudart.so.6.0 => not found
	libcufft.so.6.0 => not found
	libstdc++.so.6 => /usr/lib/arm-linux-gnueabihf/libstdc++.so.6 (0xb6621000)
	libm.so.6 => /lib/arm-linux-gnueabihf/libm.so.6 (0xb65b5000)
	libgcc_s.so.1 => /lib/arm-linux-gnueabihf/libgcc_s.so.1 (0xb6594000)
	libc.so.6 => /lib/arm-linux-gnueabihf/libc.so.6 (0xb64ac000)
	/lib/ld-linux-armhf.so.3 (0xb6704000)
ubuntu@tegra-ubuntu:~/BOINC$ sudo nano /etc/ld.so.conf.d/cuda.conf
... [edit file here]
ubuntu@tegra-ubuntu:~/BOINC$ cat /etc/ld.so.conf.d/cuda.conf
# cuda default configuration
/usr/local/cuda/lib

ubuntu@tegra-ubuntu:~/BOINC$ sudo ldconfig
ubuntu@tegra-ubuntu:~/BOINC$ ldd projects/setiathome.berkeley.edu/setiathome_x41zc_armv7l-unknown-linux-gnu_cuda60
	libpthread.so.0 => /lib/arm-linux-gnueabihf/libpthread.so.0 (0xb6700000)
	libcudart.so.6.0 => /usr/local/cuda/lib/libcudart.so.6.0 (0xb66b6000)
	libcufft.so.6.0 => /usr/local/cuda/lib/libcufft.so.6.0 (0xb4b7f000)
	libstdc++.so.6 => /usr/lib/arm-linux-gnueabihf/libstdc++.so.6 (0xb4ad4000)
	libm.so.6 => /lib/arm-linux-gnueabihf/libm.so.6 (0xb4a68000)
	libgcc_s.so.1 => /lib/arm-linux-gnueabihf/libgcc_s.so.1 (0xb4a47000)
	libc.so.6 => /lib/arm-linux-gnueabihf/libc.so.6 (0xb495f000)
	/lib/ld-linux-armhf.so.3 (0xb6738000)
	libdl.so.2 => /lib/arm-linux-gnueabihf/libdl.so.2 (0xb4954000)
	librt.so.1 => /lib/arm-linux-gnueabihf/librt.so.1 (0xb4946000)

ID: 1553101
ivan
Volunteer tester
Joined: 5 Mar 01
Posts: 783
Credit: 348,560,338
RAC: 223
United Kingdom
Message 1553356 - Posted: 7 Aug 2014, 20:18:13 UTC - in response to Message 1552955.  
Last modified: 7 Aug 2014, 20:19:48 UTC

The first WU has just finished: run time 50 min 57 sec, CPU time 21 min 32 sec. Not validated yet. The run time is just about twice what I'm currently achieving with the 750 Ti, but that's running two at once.

Interestingly, about midnight last night I decided to test whether there's a memory-bandwidth problem within the Jetson by changing to run two WUs at a time. The interesting part is that for the next few WUs, from the same tape and for about the same credit, the run time per WU increased by 50% (not 100%) but the reported CPU time fell to 50%. I take the sublinear increase in real time to mean there's a bottleneck that's alleviated by crunching another thread while memory transfers(?) stall the first (remember this GPU doesn't have separate RAM; it uses part of system memory, AFAICT). The decrease in CPU time perhaps implies some busy-waiting -- the total CPU time for two instances is the same as for a single one.
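A quick back-of-the-envelope on throughput, taking the ~51-minute single-WU run time from my earlier post and the reported 50% increase (51 × 1.5 ≈ 76 min for two at once):

$ echo "scale=2; 60/51" | bc       # WU/hour, one at a time
1.17
$ echo "scale=2; 2*60/76" | bc     # WU/hour, two at once
1.57

So two-up is worth roughly a third more throughput, despite each WU taking longer.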
The run-times are starting to fluctuate as more varied WUs arrive. I think I'll let it stabilise for a while and then explore three simultaneous WUs, but cynicism and experience suggest that there's little more to be gained there.
ID: 1553356
mavrrick
Joined: 12 Apr 00
Posts: 17
Credit: 1,894,993
RAC: 4
United States
Message 1557399 - Posted: 15 Aug 2014, 17:32:43 UTC
Last modified: 15 Aug 2014, 17:37:21 UTC

Ivan,

Have you thought about compiling the apps to run on the CPU, to get some performance values for the individual Cortex-A15 cores?

**Never mind, I see you already have some CPU-based WUs starting to process on it now**
ID: 1557399
ivan
Volunteer tester
Joined: 5 Mar 01
Posts: 783
Credit: 348,560,338
RAC: 223
United Kingdom
Message 1558710 - Posted: 18 Aug 2014, 14:00:16 UTC - in response to Message 1557399.  

Ivan,

Have you thought about compiling the apps to run on the CPU, to get some performance values for the individual Cortex-A15 cores?

**Never mind, I see you already have some CPU-based WUs starting to process on it now**

I haven't had any luck with it. It runs OK standalone, but when I fire it up under BOINC it errors out.
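For anyone wanting to reproduce the standalone test: as far as I know the apps follow the usual SETI@home convention of reading a work_unit.sah from the current directory and writing a result.sah, so my sanity check looks roughly like this (the binary name and test-WU path below are placeholders, not the actual files):

$ cd projects/setiathome.berkeley.edu
$ cp ~/testWUs/work_unit.sah .     # a reference work unit; this path is hypothetical
$ ./setiathome-armv7l-cpu          # placeholder name for the locally built CPU binary
$ ls -l result.sah                 # standalone success = a plausible result file appears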
Now the SSD I was using to offload stress from the limited onboard storage has stopped working, so that project is out for the week (I have to go to a meeting Up North).
ID: 1558710
ivan
Volunteer tester
Joined: 5 Mar 01
Posts: 783
Credit: 348,560,338
RAC: 223
United Kingdom
Message 1559119 - Posted: 19 Aug 2014, 12:16:09 UTC - in response to Message 1558710.  

Now the SSD I was using to offload stress from the limited onboard storage has stopped working, so that project is out for the week (I have to go to a meeting Up North).

It turns out not to be the SSD; I just lost the mount points for it. I can manually mount the partitions -- I'll sort out a more permanent arrangement next week. Then I couldn't do name-server lookups, so the clock was way off and there were no connections to S@H -- I fixed that by manually configuring /etc/resolv.conf; Ubuntu is doing some weird things with dnsmasq.
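The "more permanent arrangement" will presumably just be an /etc/fstab entry, something like the lines below (device name, UUID, and mount point are placeholders; using the UUID from blkid is safer than /dev/sda1 since device names can move around, and nofail stops boot hanging if the SSD is absent):

$ sudo blkid /dev/sda1             # find the partition's UUID (device name is an example)
$ echo 'UUID=xxxx-xxxx  /mnt/ssd  ext4  defaults,nofail  0  2' | sudo tee -a /etc/fstab
$ sudo mount -a                    # sanity check: mounts everything listed in fstab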
ID: 1559119



 