Energy Efficiency of ARM-based systems over x86 or GPU-based systems

Message boards : Number crunching : Energy Efficiency of ARM-based systems over x86 or GPU-based systems



Profile ivan
Volunteer tester
Joined: 5 Mar 01
Posts: 783
Credit: 348,560,338
RAC: 507
United Kingdom
Message 1559119 - Posted: 19 Aug 2014, 12:16:09 UTC - in response to Message 1558710.  

Now the SSD I was using to offload the stress on the limited onboard storage has stopped working, so that project is out for the week (I have to go to a meeting Up North).

Turns out not to be the SSD; I just lost the mount points for it. I can manually mount the partitions -- I'll sort out a more permanent arrangement next week. Then I couldn't do name-server lookups, so the time was way out and there were no connections to S@H -- fixed that by manually configuring /etc/resolv.conf; Ubuntu is doing some weird things with dnsmasq.
Profile ivan
Volunteer tester
Joined: 5 Mar 01
Posts: 783
Credit: 348,560,338
RAC: 507
United Kingdom
Message 1558710 - Posted: 18 Aug 2014, 14:00:16 UTC - in response to Message 1557399.  

Ivan,

Have you thought about compiling the apps to run on the CPU, to get some performance values of the individual Cortex-A15 CPUs?

**Never mind, I see you already have some CPU-based WUs starting to process on it now**

I haven't had any luck with it. Runs OK standalone, but when I fire it up under BOINC it errors out.
Now the SSD I was using to offload the stress on the limited onboard storage has stopped working, so that project is out for the week (I have to go to a meeting Up North).
mavrrick

Joined: 12 Apr 00
Posts: 17
Credit: 1,894,993
RAC: 9
United States
Message 1557399 - Posted: 15 Aug 2014, 17:32:43 UTC
Last modified: 15 Aug 2014, 17:37:21 UTC

Ivan,

Have you thought about compiling the apps to run on the CPU, to get some performance values of the individual Cortex-A15 CPUs?

**Never mind, I see you already have some CPU-based WUs starting to process on it now**
Profile ivan
Volunteer tester
Joined: 5 Mar 01
Posts: 783
Credit: 348,560,338
RAC: 507
United Kingdom
Message 1553356 - Posted: 7 Aug 2014, 20:18:13 UTC - in response to Message 1552955.  
Last modified: 7 Aug 2014, 20:19:48 UTC

The first WU has just finished; Run time 50 min 57 sec, CPU time 21 min 32 sec. Not validated yet. Run time is just about twice what I'm currently achieving with the 750 Ti, but that's running two at once.

Interestingly, I decided about midnight last night to test whether there's a memory bandwidth problem within the Jetson by changing to run two WUs at a time. The interesting part is that for the next few WUs, from the same tape and for about the same credit, the run-time per WU increased by 50% (not 100%) but the reported CPU time fell to 50%. I take the sublinear increase in real time to mean a bottleneck that's alleviated by crunching another thread while memory transfers(?) stall a thread (remember this GPU doesn't have separate RAM, it uses part of system memory AFAICT). The decrease in CPU time perhaps implies some busy-waiting -- the CPU time for two instances is the same as for a single one.
The run-times are starting to fluctuate as more varied WUs arrive. I think I'll let it stabilise for a while and then explore three simultaneous WUs, but cynicism and experience suggest that there's little more to be gained there.
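For anyone wanting to reproduce the two-at-a-time experiment: on clients new enough to support it, the usual mechanism is an app_config.xml in the project directory. This is only a sketch -- the app name below assumes the stock v7 MB app, and must be changed to match whatever name your project (or app_info.xml) actually uses:

```xml
<!-- projects/setiathome.berkeley.edu/app_config.xml -->
<app_config>
    <app>
        <name>setiathome_v7</name>   <!-- assumed name; match your app -->
        <gpu_versions>
            <gpu_usage>0.5</gpu_usage> <!-- half a GPU per task, so two run at once -->
            <cpu_usage>0.5</cpu_usage> <!-- CPU fraction budgeted per GPU task -->
        </gpu_versions>
    </app>
</app_config>
```

Then have the client re-read its config files (or restart it). With a self-compiled anonymous-platform app like the one here, the same effect can also be had via the <count> field of the <coproc> block in app_info.xml.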
Profile ivan
Volunteer tester
Joined: 5 Mar 01
Posts: 783
Credit: 348,560,338
RAC: 507
United Kingdom
Message 1553101 - Posted: 7 Aug 2014, 0:18:42 UTC - in response to Message 1552992.  

This assumes you've installed the CUDA SDK and added the appropriate locations to your PATH and LD_LIBRARY_PATH environment variables, but that's well-covered in the Nvidia documentation.

As I've just been reminded, it's a good idea also to set up the system-wide pointer to the libs with ldconfig, in case you forget to set up LD_LIBRARY_PATH or start up boinc from an account/shell that doesn't set it automatically -- in these cases the libraries aren't found and the jobs all die...
ubuntu@tegra-ubuntu:~/BOINC$ ldd projects/setiathome.berkeley.edu/setiathome_x41zc_armv7l-unknown-linux-gnu_cuda60
	libpthread.so.0 => /lib/arm-linux-gnueabihf/libpthread.so.0 (0xb66cd000)
	libcudart.so.6.0 => not found
	libcufft.so.6.0 => not found
	libstdc++.so.6 => /usr/lib/arm-linux-gnueabihf/libstdc++.so.6 (0xb6621000)
	libm.so.6 => /lib/arm-linux-gnueabihf/libm.so.6 (0xb65b5000)
	libgcc_s.so.1 => /lib/arm-linux-gnueabihf/libgcc_s.so.1 (0xb6594000)
	libc.so.6 => /lib/arm-linux-gnueabihf/libc.so.6 (0xb64ac000)
	/lib/ld-linux-armhf.so.3 (0xb6704000)
ubuntu@tegra-ubuntu:~/BOINC$ sudo nano /etc/ld.so.conf.d/cuda.conf
... [edit file here]
ubuntu@tegra-ubuntu:~/BOINC$ cat /etc/ld.so.conf.d/cuda.conf
# cuda default configuration
/usr/local/cuda/lib

ubuntu@tegra-ubuntu:~/BOINC$ sudo ldconfig
ubuntu@tegra-ubuntu:~/BOINC$ ldd projects/setiathome.berkeley.edu/setiathome_x41zc_armv7l-unknown-linux-gnu_cuda60
	libpthread.so.0 => /lib/arm-linux-gnueabihf/libpthread.so.0 (0xb6700000)
	libcudart.so.6.0 => /usr/local/cuda/lib/libcudart.so.6.0 (0xb66b6000)
	libcufft.so.6.0 => /usr/local/cuda/lib/libcufft.so.6.0 (0xb4b7f000)
	libstdc++.so.6 => /usr/lib/arm-linux-gnueabihf/libstdc++.so.6 (0xb4ad4000)
	libm.so.6 => /lib/arm-linux-gnueabihf/libm.so.6 (0xb4a68000)
	libgcc_s.so.1 => /lib/arm-linux-gnueabihf/libgcc_s.so.1 (0xb4a47000)
	libc.so.6 => /lib/arm-linux-gnueabihf/libc.so.6 (0xb495f000)
	/lib/ld-linux-armhf.so.3 (0xb6738000)
	libdl.so.2 => /lib/arm-linux-gnueabihf/libdl.so.2 (0xb4954000)
	librt.so.1 => /lib/arm-linux-gnueabihf/librt.so.1 (0xb4946000)

Profile HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6533
Credit: 196,805,888
RAC: 130
United States
Message 1553091 - Posted: 6 Aug 2014, 23:30:32 UTC - in response to Message 1553085.  

Well, I was just looking over some docs and saw a peak power draw into the 40s. I am not as much into the hardware as I used to be, and it got me thinking about what would probably be the biggest consumer of power. The peripherals are a given, as that was brought up in the doc I was looking over about power, but I was also wondering how they quantified a typical real workload.

I was just thinking that running a CUDA SETI@home app isn't a typical real-world load.

10 Watts seems very reasonable to me

The comment about a Jetson TK1 cluster was really about the two ways I see to increase efficiency. You either get faster CPUs/GPUs that do the work in a shorter amount of time, or you get many smaller CPUs that don't use as much energy and do more WUs at one time, it's just that each takes longer.

It goes back to the question of what is more efficient. If it only takes 4 TK1s to complete the same amount of work as a regular high-end desktop, and they use 1/5 the power to run those WUs, then that would be the most energy-efficient way to go. I was being sarcastic, but that was my thought when I said it. Someone on the TK1 developer forums has a cluster of about 9 nodes set up.

I may be in the minority on this site, but I don't run my systems 24/7 anymore. I have one box at my house that is, and it is my server, so it has to be up. The rest are just clients, so I got all the power-saving goodness set up on them and let them sleep normally when not being used. The second most used system is an HTPC, which was built to be rather power efficient, although I am sure I could do better now. This is what drives this for me. I would love to still contribute, but want to do it in a rather green manner. ARM devices that are low power seem to be possibly the best option. Just hoping it can keep the power company from raiding my wallet. You could almost say this is research to see if I can find a way to get back into contributing again :)

I also like what you talked about with the Baytrail-D system. If you don't mind me asking which ones do you have?

I have an ASRock Q1900-ITX. I put the specs for it in the most recent show us your crunchers thread here.
I like to get hardware that has the best performance per watt, for the amount of money I want to spend, & is within the power usage I like. For my gaming system I go for cards that are 150 W. Most of the time the 150 W ATI cards are among the most efficient in PPW as well. So that works out nicely for me.
I had been looking at the other Silvermont processors to replace an older machine, but the Avoton server-based systems were just more than I wanted to spend. I was looking for the same or greater performance & power consumption of 50% or lower than the older machine. The Bay Trail-D systems meet both of those requirements, the board was fairly cheap, & I was able to use some old memory I had lying around. Now I just need to find a good power supply to use with it that is in the right power range & is efficient.
SETI@home classic workunits: 93,865 CPU time: 863,447 hours
Join the BP6/VP6 User Group today!
Profile ivan
Volunteer tester
Joined: 5 Mar 01
Posts: 783
Credit: 348,560,338
RAC: 507
United Kingdom
Message 1553089 - Posted: 6 Aug 2014, 23:04:20 UTC - in response to Message 1553085.  

I also like what you talked about with the Baytrail-D system. If you don't mind me asking which ones do you have?

Mine is just something eBuyer had on special a couple of months ago for £130: an Acer Aspire XC-603 desktop PC. It took a bit of effort to find something that would boot on it -- I ended up with CentOS 7. I could have tried to put corporate Windows 7 on it, but several comments said that it was difficult to get the drivers right. I wouldn't mind putting the wattmeter on it too. As I mentioned earlier, I've not found out if it's possible to get the iGPU crunching too under Linux -- BOINC says there's no GPU.
mavrrick

Joined: 12 Apr 00
Posts: 17
Credit: 1,894,993
RAC: 9
United States
Message 1553085 - Posted: 6 Aug 2014, 22:43:06 UTC - in response to Message 1553058.  

Well, I was just looking over some docs and saw a peak power draw into the 40s. I am not as much into the hardware as I used to be, and it got me thinking about what would probably be the biggest consumer of power. The peripherals are a given, as that was brought up in the doc I was looking over about power, but I was also wondering how they quantified a typical real workload.

I was just thinking that running a CUDA SETI@home app isn't a typical real-world load.

10 Watts seems very reasonable to me

The comment about a Jetson TK1 cluster was really about the two ways I see to increase efficiency. You either get faster CPUs/GPUs that do the work in a shorter amount of time, or you get many smaller CPUs that don't use as much energy and do more WUs at one time, it's just that each takes longer.

It goes back to the question of what is more efficient. If it only takes 4 TK1s to complete the same amount of work as a regular high-end desktop, and they use 1/5 the power to run those WUs, then that would be the most energy-efficient way to go. I was being sarcastic, but that was my thought when I said it. Someone on the TK1 developer forums has a cluster of about 9 nodes set up.
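As a quick sanity check on that scenario, here is a back-of-the-envelope sketch in Python. All the numbers are illustrative assumptions, not measurements: say a 250 W desktop finishes one WU per hour, while an 11 W (at the wall) TK1 takes four hours per WU.

```python
def wh_per_wu(watts, hours_per_wu):
    """Energy spent per work unit, in watt-hours."""
    return watts * hours_per_wu

desktop = wh_per_wu(250, 1.0)  # 250 Wh per work unit
tk1 = wh_per_wu(11, 4.0)       # 44 Wh per work unit (per node, so cluster size doesn't change it)
print(round(desktop / tk1, 2)) # -> 5.68
```

Under those assumed numbers, a cluster of four such boards would match the desktop's hourly output while drawing roughly 44 W, which is where the "about 1/5 the power" intuition comes from.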

I may be in the minority on this site, but I don't run my systems 24/7 anymore. I have one box at my house that is, and it is my server, so it has to be up. The rest are just clients, so I got all the power-saving goodness set up on them and let them sleep normally when not being used. The second most used system is an HTPC, which was built to be rather power efficient, although I am sure I could do better now. This is what drives this for me. I would love to still contribute, but want to do it in a rather green manner. ARM devices that are low power seem to be possibly the best option. Just hoping it can keep the power company from raiding my wallet. You could almost say this is research to see if I can find a way to get back into contributing again :)

I also like what you talked about with the Baytrail-D system. If you don't mind me asking which ones do you have?
Profile ivan
Volunteer tester
Joined: 5 Mar 01
Posts: 783
Credit: 348,560,338
RAC: 507
United Kingdom
Message 1553058 - Posted: 6 Aug 2014, 21:49:55 UTC - in response to Message 1553006.  

That is pretty awesome performance for a system that NVidia says is using 5 watts under real work loads.

The more I look at Nvidia's support page, the less sense this makes. I don't think this applies to pushing the CUDA cores to their limit. It would be interesting to get a power meter on it to see what its usage is.

The Jetson docs I was reading yesterday said that total at-the-wall consumption was (IIRC) 10.something W. Then it went through the chain describing the inefficiencies (20% loss in the power brick, etc...). It did stress that because it was a development chip the peripherals hadn't been chosen for low power consumption. Next time I power down my home system I'll remove my power meter and apply it to the Jetson instead.
Remember, though, that the Jetson runs its cores at a lower frequency than many PCI-e video cards and the memory bus is narrower, which drops power consumption.
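To make that inefficiency chain concrete, a tiny sketch (the 10.5 W figure is an assumed stand-in for the "10.something" reading; only the roughly 20% brick loss comes from the docs as recalled above):

```python
wall_watts = 10.5         # assumed stand-in for the "10.something" W at-the-wall figure
brick_efficiency = 0.80   # ~20% lost in the power brick, per the docs as recalled
board_watts = wall_watts * brick_efficiency  # power actually delivered past the brick
print(round(board_watts, 1))  # -> 8.4
```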
/home/ubuntu/CUDA-SDK/NVIDIA_CUDA-6.0_Samples/bin/armv7l/linux/release/gnueabihf/deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "GK20A"
  CUDA Driver Version / Runtime Version          6.0 / 6.0
  CUDA Capability Major/Minor version number:    3.2
  Total amount of global memory:                 1746 MBytes (1831051264 bytes)
  ( 1) Multiprocessors, (192) CUDA Cores/MP:     192 CUDA Cores
  GPU Clock rate:                                852 MHz (0.85 GHz)
  Memory Clock rate:                             924 Mhz
  Memory Bus Width:                              64-bit
  L2 Cache Size:                                 131072 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096)
  Maximum Layered 1D Texture Size, (num) layers  1D=(16384), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(16384, 16384), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total number of registers available per block: 32768
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  2048
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 1 copy engine(s)
  Run time limit on kernels:                     No
  Integrated GPU sharing Host Memory:            Yes
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Disabled
  Device supports Unified Addressing (UVA):      Yes
  Device PCI Bus ID / PCI location ID:           0 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 6.0, CUDA Runtime Version = 6.0, NumDevs = 1, Device0 = GK20A
Result = PASS

Profile HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6533
Credit: 196,805,888
RAC: 130
United States
Message 1553011 - Posted: 6 Aug 2014, 20:19:34 UTC - in response to Message 1553006.  
Last modified: 6 Aug 2014, 20:21:17 UTC

That is pretty awesome performance for a system that NVidia says is using 5 watts under real work loads.


The more I look at Nvidia's support page, the less sense this makes. I don't think this applies to pushing the CUDA cores to their limit. It would be interesting to get a power meter on it to see what its usage is.

"the Kepler GPU in Tegra K1 consists of 192 CUDA cores and consumes less than two watts*.
*Average power measured on GPU power rail while playing a collection of popular mobile games."

Under full load from SETI@home the GPU's average power consumption may be much more, as the load will be as varied as when playing a game.
I just did a quick test with an i5-3470. The iGPU averages 3.6 W in GPU-Z under full load from an app called HeavyLoad. While playing Flash games the average reading in GPU-Z is 0.4 W. I don't have much else game-wise to check on this system, as it is my cubicle system.
mavrrick

Joined: 12 Apr 00
Posts: 17
Credit: 1,894,993
RAC: 9
United States
Message 1553006 - Posted: 6 Aug 2014, 19:49:17 UTC - in response to Message 1552975.  

That is pretty awesome performance for a system that NVidia says is using 5 watts under real work loads.


The more I look at Nvidia's support page, the less sense this makes. I don't think this applies to pushing the CUDA cores to their limit. It would be interesting to get a power meter on it to see what its usage is.
Profile HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6533
Credit: 196,805,888
RAC: 130
United States
Message 1553003 - Posted: 6 Aug 2014, 19:32:19 UTC - in response to Message 1552975.  

I suspect he compiled it himself based on his previous posts.

I am really curious about how he set this up as well, and if he will share directions later for those adventurous enough to attempt it after him :). That is pretty awesome performance for a system that Nvidia says is using 5 watts under real workloads. It would be nice to see that validated, but even at 10 watts that is some awesome crunching.

This topic is actually kind of making me a little annoyed at my own gear now. I am seeing how truly bad my desktop really is and am to the point I would rather not turn it on at all. I'm considering just selling it off for parts to build something more energy-efficient.

Now all you need to do is set up about 8 of them as a cluster solution, and let it churn out some WUs.

A home-built cluster is not much of an advantage for SETI@home or BOINC, as you have to run the app on each node in the cluster.

Most of my computers are on for other reasons, so I run SETI@home on them. My past several system upgrades have been to increase my systems' efficiency more than to increase their performance.
Profile ivan
Volunteer tester
Joined: 5 Mar 01
Posts: 783
Credit: 348,560,338
RAC: 507
United Kingdom
Message 1552992 - Posted: 6 Aug 2014, 18:57:13 UTC - in response to Message 1552966.  


The first WU has just finished; Run time 50 min 57 sec, CPU time 21 min 32 sec. Not validated yet. Run time is just about twice what I'm currently achieving with the 750 Ti, but that's running two at once.

From your stderr out:

setiathome enhanced x41zc, Cuda 6.00


Where did you get this version from?

As mavrrick says, I compiled it myself. However, the hard part isn't s@h; the hard part is compiling BOINC. It has so many prerequisites. The basic instructions are here.
git clone git://boinc.berkeley.edu/boinc-v2.git boinc
cd boinc
git tag [Note the version corresponding to the latest recommendation.]
git checkout client_release/<required release>; git status
./_autosetup
./configure --disable-server --enable-manager
make -j n [where n is the number of cores/threads at your disposal]

The problems you will have are first finding the libraries and utilities that _autosetup wants, then ensuring that you have g++ installed, and then finding all the libraries and development packages that configure wants (you need the -dev packages for the header definition files). The final hurdle, if you want to use the boincmgr graphical interface, is getting wxWidgets. It tends not to be included in repositories for modern distributions now, so you have to try to compile it yourself -- which I haven't managed lately, as BOINC wants an old version which was (apparently) badly coded and gives lots of problems with the newest, smartest gcc/g++ compilers. You may need to just learn how to use the boinccmd command-line controller...
The simplest way to then compile s@h was detailed back in January, in this thread.
cd <directory your boinc directory is in>
svn checkout -r1921 https://setisvn.ssl.berkeley.edu/svn/branches/sah_v7_opt/Xbranch
cd Xbranch
[edit client/analyzeFuncs.h and add the line '#include <unistd.h>']
sh ./_autosetup
sh ./configure BOINCDIR=../boinc --enable-sse2 --enable-fast-math
make -j n

This assumes you've installed the CUDA SDK and added the appropriate locations to your PATH and LD_LIBRARY_PATH environment variables, but that's well-covered in the Nvidia documentation. As I alluded to above, you will probably have to edit the configure file too, to make sure obsolete gencode entries are removed and appropriate ones for your kit are included. Oh, and drop the --enable-sse2 if you're compiling for other than Intel/AMD CPUs.
mavrrick

Joined: 12 Apr 00
Posts: 17
Credit: 1,894,993
RAC: 9
United States
Message 1552975 - Posted: 6 Aug 2014, 17:32:51 UTC - in response to Message 1552966.  

I suspect he compiled it himself based on his previous posts.

I am really curious about how he set this up as well, and if he will share directions later for those adventurous enough to attempt it after him :). That is pretty awesome performance for a system that Nvidia says is using 5 watts under real workloads. It would be nice to see that validated, but even at 10 watts that is some awesome crunching.

This topic is actually kind of making me a little annoyed at my own gear now. I am seeing how truly bad my desktop really is and am to the point I would rather not turn it on at all. I'm considering just selling it off for parts to build something more energy-efficient.

Now all you need to do is set up about 8 of them as a cluster solution, and let it churn out some WUs.
qbit
Volunteer tester
Joined: 19 Sep 04
Posts: 630
Credit: 6,868,528
RAC: 0
Austria
Message 1552966 - Posted: 6 Aug 2014, 17:19:47 UTC - in response to Message 1552955.  


The first WU has just finished; Run time 50 min 57 sec, CPU time 21 min 32 sec. Not validated yet. Run time is just about twice what I'm currently achieving with the 750 Ti, but that's running two at once.

From your stderr out:

setiathome enhanced x41zc, Cuda 6.00


Where did you get this version from?
Profile ivan
Volunteer tester
Joined: 5 Mar 01
Posts: 783
Credit: 348,560,338
RAC: 507
United Kingdom
Message 1552955 - Posted: 6 Aug 2014, 16:53:29 UTC - in response to Message 1552687.  
Last modified: 6 Aug 2014, 16:54:28 UTC

Ah, there it is!
Well, I got both my hologram reconstruction and s@h compiled and running on the Jetson today. The holograms run about 10x slower than on my GTX 750 Ti (1.5 frames/sec for a 4Kx4K reconstruction). No real problems with the s@h, just the missing include I reported last January, and I had to edit the config file to remove the old compute capabilities that nvcc didn't like and put in 3.2 for the Tegra.
The first WU has just finished; Run time 50 min 57 sec, CPU time 21 min 32 sec. Not validated yet. Run time is just about twice what I'm currently achieving with the 750 Ti, but that's running two at once.
Profile ivan
Volunteer tester
Joined: 5 Mar 01
Posts: 783
Credit: 348,560,338
RAC: 507
United Kingdom
Message 1552687 - Posted: 5 Aug 2014, 20:52:09 UTC - in response to Message 1552676.  


05-Aug-2014 16:03:28 [---] This computer is not attached to any projects
05-Aug-2014 16:03:28 [---] Visit http://boinc.berkeley.edu for instructions
05-Aug-2014 16:03:29 Initialization completed
05-Aug-2014 16:03:29 [---] Suspending GPU computation - computer is in use
05-Aug-2014 16:04:00 [---] Received signal 2
05-Aug-2014 16:04:01 [---] Exit requested by user

Now to try to attach to S@H since the project is up again!

05-Aug-2014 20:31:55 [---] Suspending GPU computation - computer is in use
05-Aug-2014 20:39:33 [---] Running CPU benchmarks
05-Aug-2014 20:39:33 [---] Suspending computation - CPU benchmarks in progress
05-Aug-2014 20:39:33 [---] Running CPU benchmarks
05-Aug-2014 20:39:33 [---] Running CPU benchmarks
05-Aug-2014 20:39:33 [---] Running CPU benchmarks
05-Aug-2014 20:39:33 [---] Running CPU benchmarks
05-Aug-2014 20:40:05 [---] Benchmark results:
05-Aug-2014 20:40:05 [---]    Number of CPUs: 4
05-Aug-2014 20:40:05 [---]    966 floating point MIPS (Whetstone) per CPU
05-Aug-2014 20:40:05 [---]    6829 integer MIPS (Dhrystone) per CPU
05-Aug-2014 20:40:06 [---] Resuming computation
05-Aug-2014 20:40:12 [http://setiathome.berkeley.edu/] Master file download succeeded
05-Aug-2014 20:40:17 [---] Number of usable CPUs has changed from 4 to 1.
05-Aug-2014 20:40:17 [http://setiathome.berkeley.edu/] Sending scheduler request: Project initialization.
05-Aug-2014 20:40:17 [http://setiathome.berkeley.edu/] Requesting new tasks for CPU and NVIDIA
05-Aug-2014 20:40:22 [SETI@home] Scheduler request completed: got 0 new tasks
05-Aug-2014 20:40:22 [SETI@home] This project doesn't support computers of type armv7l-unknown-linux-gnueabihf
05-Aug-2014 20:40:24 [SETI@home] Started download of arecibo_181.png
05-Aug-2014 20:40:24 [SETI@home] Started download of sah_40.png
05-Aug-2014 20:40:27 [SETI@home] Finished download of arecibo_181.png
05-Aug-2014 20:40:27 [SETI@home] Finished download of sah_40.png
05-Aug-2014 20:40:27 [SETI@home] Started download of sah_banner_290.png
05-Aug-2014 20:40:27 [SETI@home] Started download of sah_ss_290.png
05-Aug-2014 20:40:29 [SETI@home] Finished download of sah_banner_290.png
05-Aug-2014 20:40:29 [SETI@home] Finished download of sah_ss_290.png
05-Aug-2014 20:43:41 [---] Resuming GPU computation
05-Aug-2014 20:44:27 [---] Suspending GPU computation - computer is in use
:-)
Ah, there it is!
Profile ivan
Volunteer tester
Joined: 5 Mar 01
Posts: 783
Credit: 348,560,338
RAC: 507
United Kingdom
Message 1552676 - Posted: 5 Aug 2014, 20:30:30 UTC - in response to Message 1552389.  

I took delivery of an Nvidia Tegra TK1 "Jetson" SDK tonight and should have all the bits needed to run it (HDMI->DVI cable, USB hub, Keyboard+mouse) on next-day delivery tomorrow. First plan is to work out how it runs (it's an ARM version of Ubuntu) and install the latest CUDA libraries. Then, after I've got my hologram reconstructions running on the 192-core Kepler, I'll see if there are all the resources needed to compile BOINC & S@H on it. Watch, as they say, this space.

Well, I've got this far so far:
05-Aug-2014 16:03:28 [---] cc_config.xml not found - using defaults
05-Aug-2014 16:03:28 [---] Starting BOINC client version 7.2.42 for armv7l-unknown-linux-gnueabihf
05-Aug-2014 16:03:28 [---] log flags: file_xfer, sched_ops, task
05-Aug-2014 16:03:28 [---] Libraries: libcurl/7.35.0 OpenSSL/1.0.1f zlib/1.2.8 libidn/1.28 librtmp/2.3
05-Aug-2014 16:03:28 [---] Data directory: /home/ubuntu/BOINC
05-Aug-2014 16:03:28 [---] CUDA: NVIDIA GPU 0: GK20A (driver version unknown, CUDA version 6.0, compute capability 3.2, 1746MB, 141MB available, 327 GFLOPS peak)
05-Aug-2014 16:03:28 [---] Host name: tegra-ubuntu
05-Aug-2014 16:03:28 [---] Processor: 1 ARM ARMv7 Processor rev 3 (v7l)
05-Aug-2014 16:03:28 [---] Processor features: swp half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt
05-Aug-2014 16:03:28 [---] OS: Linux: 3.10.24-g6a2d13a
05-Aug-2014 16:03:28 [---] Memory: 1.71 GB physical, 0 bytes virtual
05-Aug-2014 16:03:28 [---] Disk: 11.69 GB total, 5.63 GB free
05-Aug-2014 16:03:28 [---] Local time is UTC +0 hours
05-Aug-2014 16:03:28 [---] No general preferences found - using defaults
05-Aug-2014 16:03:28 [---] Preferences:
05-Aug-2014 16:03:28 [---]    max memory usage when active: 873.11MB
05-Aug-2014 16:03:28 [---]    max memory usage when idle: 1571.60MB
05-Aug-2014 16:03:28 [---]    max disk usage: 5.55GB
05-Aug-2014 16:03:28 [---]    don't use GPU while active
05-Aug-2014 16:03:28 [---]    suspend work if non-BOINC CPU load exceeds 25%
05-Aug-2014 16:03:28 [---]    (to change preferences, visit a project web site or select Preferences in the Manager)
05-Aug-2014 16:03:28 [---] Not using a proxy
05-Aug-2014 16:03:28 [---] This computer is not attached to any projects
05-Aug-2014 16:03:28 [---] Visit http://boinc.berkeley.edu for instructions
05-Aug-2014 16:03:29 Initialization completed
05-Aug-2014 16:03:29 [---] Suspending GPU computation - computer is in use
05-Aug-2014 16:04:00 [---] Received signal 2
05-Aug-2014 16:04:01 [---] Exit requested by user

As with the Celeron I bought recently, I had a lot of trouble with the graphics, especially finding the GL, GLU and GLUT libraries -- compounded by the fact that neither install (the Celeron is CentOS 7) had g++ by default, and ./configure doesn't really point that out to you. The big showstopper is wxWidgets. I need to compile it myself, and it looks like the BOINC code isn't compatible with anything past 2.8.3 -- but 2.8.3 won't compile with gcc 4.8.3, apparently. So I haven't got boincmgr running on either yet.
Now to try to attach to S@H since the project is up again!
Profile HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6533
Credit: 196,805,888
RAC: 130
United States
Message 1552454 - Posted: 5 Aug 2014, 3:09:14 UTC - in response to Message 1552389.  

Compared to my Bay Trail-D system.
Application	GFLOPS	Cores	Total GFLOPS	System Watts	GFLOPS/Watt
SETI@home v7	 10.25	    4	 41.00		25		2.050
AstroPulse v6	 21.30	    4	 85.20		25		4.260


Hmm, my machine is running somewhat fewer FLOPS than yours for both MB and AP. I haven't worked out how to enable the iGPU for crunching under Linux yet.

I took delivery of an Nvidia Tegra K1 "Jetson" SDK tonight and should have all the bits needed to run it (HDMI->DVI cable, USB hub, keyboard+mouse) on next-day delivery tomorrow. First plan is to work out how it runs (it's an ARM version of Ubuntu) and install the latest CUDA libraries. Then, after I've got my hologram reconstructions running on the 192-core Kepler, I'll see if there are all the resources needed to compile BOINC & S@H on it. Watch, as they say, this space.

Must take my Wattmeter back into work next time I have to power down this rig (which is running 143 W ATM, it's usually around 250 W when the GPUs have APs to crunch).


I am running optimized apps & my system seems to like to stay running at its Burst frequency all of the time. Either or both of those could be the reason for my system's higher numbers.
Profile HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6533
Credit: 196,805,888
RAC: 130
United States
Message 1552453 - Posted: 5 Aug 2014, 3:05:19 UTC - in response to Message 1552368.  

If you take the "Average GFLOPs" as a way to indicate the processing speed of the device.

There's the problem. GFLOPs is an indicator, but a very, very poor one. Depending on the application being run, a card with a lower GFLOPS rating can process more work per hour than one with a much higher rating.
The number of WUs per hour is a better indicator, but the mix of WUs (VHARs, shorties) makes it difficult to compare things.
Average Processing Rate is a good one as it directly relates to the work being done, unfortunately it isn't accurate as processing more than one WU at a time results in a lower APR, even though the work done per hour is much higher than doing a single WU at a time.
RAC is probably the best indicator, however due to the nature of Credit New (almost completely borked) you can only compare MB to MB, AP to AP. And people that run a mix of the 2 can't really be compared to either (or even each other due to the different mixes).

The way they stated "average GFLOPS", I figured they were talking about APR, which is displayed in GFLOPS. While it may not be the most accurate, it is a measure of the application output on that device. So it does seem to be a valid measure to use.
It will be lower when running more tasks on a device. However, (GFLOPS * instances) should reflect the increased output.
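That point can be sketched numerically (the figures below are made-up illustrations, not real SETI@home APR values):

```python
# Per-task APR (reported in GFLOPS) drops when a device runs two tasks at
# once, but APR * instances still reflects total throughput going up.
apr_single = 100.0             # hypothetical APR running one task at a time
apr_two_up = 65.0              # hypothetical per-task APR with two tasks at once

total_single = apr_single * 1  # effective output, one task at a time
total_two_up = apr_two_up * 2  # effective output, two tasks at once
print(total_two_up > total_single)  # -> True: lower APR, higher total output
```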



©2020 University of California
SETI@home and AstroPulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.