Linux (ARM processor) app and alternatives
Author | Message |
---|---|
Raistmer Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
I'm trying to launch SETI computations on a Parallella board. As I understand it, it has the armhf platform. The repository's boinc/seti package (installed via sudo apt-get install boinc-app-seti) has only the v7.28 MB app, so it will not get any work now. But we have a "Linux (ARM processor)" app in the MB v8.* apps list. Should it work with Parallella's ARMv7? Or does it require something other than the armhf platform? Also, are there any alternative binaries available to try? SETI apps news We're not gonna fight them. We're gonna transcend them. |
HAL9000 Joined: 11 Sep 99 Posts: 6534 Credit: 196,805,888 RAC: 57 |
Seems like armhf might be a better route to go, but it would probably need an app compiled to take advantage of it; I would guess the current Linux ARM apps do not. It does look like a new Linux ARM64 app was just pushed out on Beta about an hour ago. So maybe Eric is looking to expand the Linux ARM app base. SETI@home classic workunits: 93,865 CPU time: 863,447 hours Join the BP6/VP6 User Group (http://tinyurl.com/8y46zvu) |
Raistmer Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
Seems it fetches work now: SETI@home v8 8.02 arm-unknown-linux-gnueabihf http://setiathome.berkeley.edu/host_app_versions.php?hostid=8194984

I did a project detach and removed the pre-installed links to app_info and the V7 binary. After re-attaching, the stock app seems to have downloaded.

/var/lib/boinc-client/slots/0/stderr.txt 654/654 100%

    setiathome_v8 8.00 Revision: 3378 g++ (Raspbian 4.9.2-10) 4.9.2
    libboinc: BOINC 7.7.0
    Work Unit Info:
    ...............
    WU true angle range is : 0.448224
    features: half thumb fastmult vfp edsp neon vfpv3 tls vfpd32
    Optimal function choices:
    --------------------------------------------------------
                    name   timing    error
    --------------------------------------------------------
        v_BaseLineSmooth   (no other)
    vfp_GetPowerSpectrum   0.004314  0.00000
       fpu_opt_ChirpData   0.242348  0.00000
         fftwf_transpose   0.031481  0.00000
        opt NEON folding   0.012961  0.00000

To compare with Samsung J1 mini & Pine: https://setiathome.berkeley.edu/forum_thread.php?id=80803&postid=1840115

J1 mini:

    vfp_GetPowerSpectrum   0.001373  0.00000
           vfp_ChirpData   0.052066  0.00000
         fftwf_transpose   0.022859  0.00000
        opt NEON folding   0.014100  0.00000

Pine:

    vfp_GetPowerSpectrum   0.001141  0.00000
             v_ChirpData   0.081789  0.00000
         fftwf_transpose   0.024457  0.00000
        opt NEON folding   0.003022  0.00000

So, Parallella's ARM part doesn't look impressive nowadays at all. To be really useful, the Epiphany part is required... And just as with Pine, but to a much bigger degree, there is a definite reserve for software optimization. One of the slowest chirp routines was selected for some reason... |
Tom Rinehart Joined: 12 Dec 01 Posts: 113 Credit: 13,255,975 RAC: 6 |
Raistmer - I compiled the ARMHF Linux 8.02 app that is on main. It is a straight compile of the SETI client code against the libraries provided from the Debian (Raspbian) repository at the time. Claggy compiled the 8.03 app that is on beta. It is about 5% faster. It is the same except that Debian updated the FFTW library to properly use the NEON instructions. I've been trying to get Eric to put 8.03 on main. These apps work on ARMHF computers like Raspberry Pis, Orange Pis, Parallellas, etc.

On the chirp function selection, there is an issue with the VFP and NEON chirp functions. They work in Android but don't work in Linux. They are written in assembly. I don't program in assembly and don't know how to fix them. The other functions written in assembly, like get power spectrum and folding, work fine. The code is currently set up to just not use the VFP and NEON chirp functions on Linux.

I've been testing apps built with the latest version of FFTW. On ARM64, they are faster than the 8.00 version that just got put on beta (it is also a straight compile I did against the standard libraries). In my testing, the 8.00 version just works but isn't all that fast. According to Bench, the Linux ARMHF 8.03 app is about 5% faster on my ODROID C2 because it is using the VFP and NEON code. (Also beyond my ability to port to ARM64.) The apps using the latest FFTW libraries I compiled are 25-80% faster. I still have to do some testing, but if the fast one works well, I will send it to Eric for testing on beta.

Last night I built the ARMHF app against the latest FFTW. It is currently running in Bench compared to the 8.03 app. - Tom |
Raistmer Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
Hi Tom, thanks for the explanation. Could you post the commands you use for building the app binary and the FFTW lib, please? Also, are you cross-compiling (that is, building the ARM binary on an x86 desktop PC) or building natively (that is, on the ARM device itself)? I would like to learn how to build natively (on the Parallella board in my case). And another question: did you try Mateusz's modified sources? AFAIK he used an FFT library other than FFTW for his build. It was faster than stock in v7 times, but unfortunately those sources haven't been updated to v8 so far (it was also an ARM Android build). And where could I download the mentioned alternative binaries to try with my device?

Tom Rinehart wrote: "On the chirp function selection, there is an issue with the VFP and NEON chirp functions. They work in Android but don't work in Linux. They are written in assembly. I don't program in assembly and don't know how to fix them. The other functions written in assembly work fine like get power spectrum and folding. The code currently is set up to just not use the VFP and NEON chirp functions on Linux."

On x86 there are a number of intrinsics that are called in function-call notation but are translated by a capable compiler directly into the corresponding SIMD instruction. This avoids assembly while still achieving vector-unit-based computation. Maybe something similar is available for ARM's vector instructions too? |
Tom Rinehart Joined: 12 Dec 01 Posts: 113 Credit: 13,255,975 RAC: 6 |
I build natively - for ARMHF on a Raspberry Pi 2 running the latest Raspbian and for ARM64 on an ODROID C2 running the latest Hardkernel Ubuntu. The following are my instructions for building with the latest FFTW for ARMHF or ARM64:

Get Prerequisites

    sudo apt-get update
    sudo apt-get install git make m4 libtool autoconf pkg-config automake subversion libcurl4-openssl-dev libssl-dev gettext docbook2x docbook-xml libxml2-utils zlib1g-dev libsm-dev libice-dev libxmu-dev libxi-dev libx11-dev libnotify-dev freeglut3-dev libfcgi-dev libjpeg8-dev libxss-dev libxcb-util0-dev libxcb-dpms0-dev libxext-dev libstdc++6-4.7-dev

Get and Build FFTW (use make -j 2 instead of make -j 4 on Parallella)

    wget http://www.fftw.org/fftw-3.3.6-pl1.tar.gz
    tar xzvf fftw-3.3.6-pl1.tar.gz
    cd fftw-3.3.6-pl1
    ./configure --enable-single --enable-neon --with-slow-timer
    make -j 4
    sudo make install
    cd ~

Get and build the Boinc libraries - Current Version (remove the --with-boinc-platform option on ARM64; use make -j 2 on Parallella)

    git clone https://github.com/BOINC/boinc boinc
    cd boinc
    ./_autosetup
    ./configure --disable-server --disable-manager LDFLAGS=-static-libgcc --with-boinc-platform=arm-unknown-linux-gnueabihf
    make -j 4
    cd ~

Now get the Seti_boinc source, and build the app (edit the BOINCDIR path to the Boinc directory for your system; use make -j 2 on Parallella):

    svn checkout https://setisvn.ssl.berkeley.edu/svn/seti_boinc seti_boinc
    cd seti_boinc
    ./_autosetup
    ./configure CFLAGS="-O3" CXXFLAGS="-O3" BOINCDIR=/home/pi/boinc --enable-client --enable-static --disable-shared --disable-server --enable-fast-math
    make -j 4

Compilation will end with an error. It is ok, the binary has been built.

    ls -l client

Look for a file that starts with setiathome-8.0. This is the binary you just compiled.

As an example, on an ARMv7 device you can copy it to your home directory and make an app_info.xml file:

    cp ~/seti_boinc/client/setiathome-8.0.armv7l-unknown-linux-gnueabihf ~/
    cd ~
    nano app_info.xml

    <app_info>
      <app>
        <name>setiathome_v8</name>
        <user_friendly_name>SETI@home v8</user_friendly_name>
      </app>
      <file_info>
        <name>setiathome-8.0.armv7l-unknown-linux-gnueabihf</name>
        <executable/>
      </file_info>
      <app_version>
        <app_name>setiathome_v8</app_name>
        <version_num>800</version_num>
        <file_ref>
          <file_name>setiathome-8.0.armv7l-unknown-linux-gnueabihf</file_name>
          <main_program/>
        </file_ref>
      </app_version>
    </app_info>

Assuming you are already attached to the SETI project, copy the files to the SETI project directory and fix the ownership.

Copy the app and the app_info.xml files to the project folder:

    sudo cp setiathome-8.0.armv7l-unknown-linux-gnueabihf /var/lib/boinc-client/projects/setiathome.berkeley.edu/
    sudo cp app_info.xml /var/lib/boinc-client/projects/setiathome.berkeley.edu/

For Beta testing use:

    sudo cp setiathome-8.0.armv7l-unknown-linux-gnueabihf /var/lib/boinc-client/projects/setiweb.ssl.berkeley.edu_beta/
    sudo cp app_info.xml /var/lib/boinc-client/projects/setiweb.ssl.berkeley.edu_beta/

Change the file ownership to boinc:boinc:

    sudo chown boinc:boinc /var/lib/boinc-client/projects/setiathome.berkeley.edu/setiathome-8.0.armv7l-unknown-linux-gnueabihf
    sudo chown boinc:boinc /var/lib/boinc-client/projects/setiathome.berkeley.edu/app_info.xml

For Beta testing use:

    sudo chown boinc:boinc /var/lib/boinc-client/projects/setiweb.ssl.berkeley.edu_beta/setiathome-8.0.armv7l-unknown-linux-gnueabihf
    sudo chown boinc:boinc /var/lib/boinc-client/projects/setiweb.ssl.berkeley.edu_beta/app_info.xml

Finally, restart boinc-client so it will use the new app:

    sudo /etc/init.d/boinc-client restart

On your other questions, I have not seen Mateusz's modified sources, and in the past changing the compiler flags to use the NEON and VFP intrinsics didn't result in a speed improvement (see: https://setiweb.ssl.berkeley.edu/beta/forum_thread.php?id=2275#55845). - Tom |
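As an alternative to typing the app_info.xml by hand in nano, the file from the post above could be generated from the binary's file name. This is only a sketch: the function name `make_app_info` is invented here for illustration, and the XML simply mirrors what Tom posted (app name setiathome_v8, version_num 800).

```shell
#!/bin/sh
# make_app_info BINARY_NAME
# Hypothetical helper (not part of BOINC or the SETI sources): prints an
# app_info.xml like the one in the post above, with BINARY_NAME filled in
# as the executable's file name in both places where it is needed.
make_app_info() {
    cat <<EOF
<app_info>
  <app>
    <name>setiathome_v8</name>
    <user_friendly_name>SETI@home v8</user_friendly_name>
  </app>
  <file_info>
    <name>$1</name>
    <executable/>
  </file_info>
  <app_version>
    <app_name>setiathome_v8</app_name>
    <version_num>800</version_num>
    <file_ref>
      <file_name>$1</file_name>
      <main_program/>
    </file_ref>
  </app_version>
</app_info>
EOF
}
```

For example, `make_app_info setiathome-8.0.armv7l-unknown-linux-gnueabihf > app_info.xml` would write the file for the ARMv7 binary named in the instructions.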
Raistmer Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
Thanks a lot for such detailed instructions! In case you are interested in Mateusz's codebase, it's here: https://github.com/matszpk/native-boinc-for-android/tree/master/src His FFT lib of choice was FFTS, it seems. I don't understand the switching-to-intrinsics thing though. If intrinsics exist, the code should be rewritten to make use of them. This would allow moving away from assembly-coded functions to intrinsics-coded ones (with better portability between 32-bit and 64-bit ARM models, for example, or between different compilers if they exist for ARM platforms). Anyway, I need to get more familiar with the corresponding vector parts of the code (for stock, all such things are concentrated inside the vector subdirectory) to continue on this. |
Raistmer Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
Does all this mean that ARM doesn't have a cycle counter usable by FFTW? And having to plan separately for each task doesn't speed things up. For the opt codebase we implemented wisdom caching in the project directory, but for the stock codebase wisdom is still re-created with each new task, right? |
Tom Rinehart Joined: 12 Dec 01 Posts: 113 Credit: 13,255,975 RAC: 6 |
Yes. The ARM devices do not have a cycle counter that is available to user space. On ARMv8 (ARM64) devices there is a kernel module you can build that provides a cycle counter. On ARMv7 and ARMv8 devices, you can also recompile the kernel to provide one to user space. See the notes in the FFTW configure.ac file. Neither of these options is going to be available for most users.

On my ODROID C2 (ARM64), I compared the kernel module cycle counter, no cycle counter and the slow timer cycle counter:

    KWSN-Linux-MBbench v2.1.08
    Running on odroid64 at Wed 01 Feb 2017 05:04:26 AM UTC
    ----------------------------------------------------------------
    Starting benchmark run...
    ----------------------------------------------------------------
    Listing wu-file(s) in /testWUs :
    reference_work_unit_r3215.wu
    Listing executable(s) in /APPS :
    setiathome-8.kmcc.aarch64-unknown-linux-gnu
    setiathome-8.nocc.aarch64-unknown-linux-gnu
    setiathome-8.slowcc.aarch64-unknown-linux-gnu
    Listing executable in /REF_APPS :
    setiathome-8.0.aarch64-unknown-linux-gnu
    ----------------------------------------------------------------
    Current WU: reference_work_unit_r3215.wu
    ----------------------------------------------------------------
    Skipping default app setiathome-8.0.aarch64-unknown-linux-gnu, displaying saved result(s)
    Elapsed Time: ....................... 24732 seconds
    ----------------------------------------------------------------
    Running app with command : .......... setiathome-8.kmcc.aarch64-unknown-linux-gnu -verb
    Elapsed Time : ...................... 13863 seconds
    Speed compared to default : ......... 178 %
    -----------------
    Comparing results
    Result : Strongly similar, Q= 99.99%
    ----------------------------------------------------------------
    Running app with command : .......... setiathome-8.nocc.aarch64-unknown-linux-gnu -verb
    Elapsed Time : ...................... 19738 seconds
    Speed compared to default : ......... 125 %
    -----------------
    Comparing results
    Result : Strongly similar, Q= 99.99%
    ----------------------------------------------------------------
    Running app with command : .......... setiathome-8.slowcc.aarch64-unknown-linux-gnu -verb
    Elapsed Time : ...................... 13945 seconds
    Speed compared to default : ......... 177 %
    -----------------
    Comparing results
    Result : Strongly similar, Q= 99.98%
    ----------------------------------------------------------------
    Done with reference_work_unit_r3215.wu
    ====================================================================
    Done with Benchmark run! Removing temporary files!

The slow timer seems fast enough with the stock code and whatever it is doing with wisdom files.

On my Raspberry Pi 2, I compared no cycle counter and the slow cycle counter against the 8.03 ARMHF app:

    KWSN-Linux-MBbench v2.1.08
    Running on pitft at Wed 01 Feb 2017 06:43:10 AM UTC
    ----------------------------------------------------------------
    Starting benchmark run...
    ----------------------------------------------------------------
    Listing wu-file(s) in /testWUs :
    reference_work_unit_r3215.wu
    Listing executable(s) in /APPS :
    setiathome-8.nocc.arm-unknown-linux-gnueabihf
    setiathome-8.slowcc.arm-unknown-linux-gnueabihf
    Listing executable in /REF_APPS :
    setiathome_8.03_arm-unknown-linux-gnueabihf
    ----------------------------------------------------------------
    Current WU: reference_work_unit_r3215.wu
    ----------------------------------------------------------------
    Running default app with command :... setiathome_8.03_arm-unknown-linux-gnueabihf -verb
    Elapsed Time: ....................... 45756 seconds
    ----------------------------------------------------------------
    Running app with command : .......... setiathome-8.nocc.arm-unknown-linux-gnueabihf -verb
    Elapsed Time : ...................... 43282 seconds
    Speed compared to default : ......... 105 %
    -----------------
    Comparing results
    Result : Strongly similar, Q= 100.0%
    ----------------------------------------------------------------
    Running app with command : .......... setiathome-8.slowcc.arm-unknown-linux-gnueabihf -verb
    Elapsed Time : ...................... 35679 seconds
    Speed compared to default : ......... 128 %
    -----------------
    Comparing results
    Result : Strongly similar, Q= 99.99%
    ----------------------------------------------------------------
    Done with reference_work_unit_r3215.wu
    ====================================================================
    Done with Benchmark run! Removing temporary files!

I'm going to send the slowcc apps to Eric for testing on Beta. - Tom |
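The "Speed compared to default" figures in the benchmark output above appear to be the reference app's elapsed time divided by the tested app's elapsed time, shown as a whole-number percentage. A minimal sketch of that arithmetic follows, assuming truncating integer division (an assumption that happens to reproduce all five figures quoted in this thread). The function name `speed_pct` is made up for illustration and is not part of KWSN-Linux-MBbench.

```shell
#!/bin/sh
# speed_pct REF_SECONDS APP_SECONDS
# Hypothetical helper (not taken from the bench script): prints the reference
# elapsed time as a percentage of the tested app's elapsed time, truncated
# to an integer by shell integer arithmetic.
speed_pct() {
    echo $(( $1 * 100 / $2 ))
}

# ODROID C2 (ARM64): the reference 8.0 app took 24732 s
speed_pct 24732 13863   # kernel-module cycle counter build
speed_pct 24732 19738   # no cycle counter build
speed_pct 24732 13945   # slow timer build

# Raspberry Pi 2 (ARMHF): the reference 8.03 app took 45756 s
speed_pct 45756 43282   # no cycle counter build
speed_pct 45756 35679   # slow timer build
```

Read this way, a value above 100 means the tested build finished the reference work unit faster than the default app.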
Raistmer Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
I suppose the FFT lib is statically linked into the app binary on Linux? That is, no shared object (DLL) is needed? I got the binary. Now I need to set up KWSN benchmarking to test it. Regarding vectorising the ARM functions: it seems the assembly ones (*.S) are Mateusz's. That is, some attempt to integrate his code into stock was made. But in the functions vector I see the neon_ChirpData function and don't see its code anywhere (maybe I missed it; I need to run a full-project search on a Windows host tomorrow). But if this function is indeed missing, that means there is no(!) SIMD chirping for ARM at all. And regarding the *.S files: the preprocessor directives there imply that hard float is supported (I suppose !defined(__SOFTFP__) means hard float). What tool can compile *.S into object files on Linux? That is, what is the GNU assembler tool? EDIT: this one: http://tigcc.ticalc.org/doc/gnuasm.html So the next step will be an attempt to compile the .S files into object files. |
Raistmer Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
And there are NEON intrinsics indeed: http://gcc.gnu.org/onlinedocs/gcc-4.6.1/gcc/ARM-NEON-Intrinsics.html Perhaps Mateusz is too familiar with asm to care to use them instead of raw assembly. |
Tom Rinehart Joined: 12 Dec 01 Posts: 113 Credit: 13,255,975 RAC: 6 |
It is not selected on line 413 in analyzeFuncs_vector.cpp. The Android define keeps it from being used. If you remove the define, the app crashes when it tries to run the tests. There is a link in the thread from Beta above where Claggy describes the issue. |
Raistmer Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
Tom Rinehart wrote: "Neither of these options are going to be available for most users. On my ODROID C2 (ARM64), I compared the kernel module cycle counter, no cycle counter and slow timer cycle counter:"

Are you sure that, within a single bench run, the wisdom file was deleted between launches of the different binaries? If not, your "no counter" and "slow counter" binaries could just re-use the well-prepared FFTW wisdom file from the first "with counter" binary run. |
Raistmer Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
Tom Rinehart wrote: "It is not selected on line 413 in analyzeFuncs_vector.cpp The android def keeps it from being used. The app crashes if you remove it when it tries to run the tests. There is a link in the thread from Beta above where Claggy describes the issue."

Yep, I understand that it is currently disabled. But that means the ARM SIMD unit is currently almost unused. Just imagine what an x86 build would be like without SSE (not even AVX, any SSE)... So the first optimization attempt for the current state of things should be to enable all the SIMD code already written. Next step: write additional SIMD code (or rewrite the hopelessly broken parts). BTW, that ANDROID define disables not only the NEON version; it seems to disable the VFP version too. That is, no "real" floating point for chirping at all??? If that is really so, it would be just veeeeery slow.

EDIT: though the FPU is perhaps still used. My stderr shows fpu_opt_ChirpData selected. It has the tag BA_ANY, but that hardly means it will not use the FPU on ARM... I need to understand what the difference between vfp_ and neon_ChirpData is then...

EDIT2: on the other hand, it uses double. How does ARM handle the double type? |
Tom Rinehart Joined: 12 Dec 01 Posts: 113 Credit: 13,255,975 RAC: 6 |
If I remember correctly, I tried the VFP chirp and it didn't work either. |
Tom Rinehart Joined: 12 Dec 01 Posts: 113 Credit: 13,255,975 RAC: 6 |
I will try running each app separately in Bench to see if it is using the well-prepared wisdom files from previous runs. |
Claggy Joined: 5 Jul 99 Posts: 4654 Credit: 47,537,079 RAC: 4 |
Raistmer wrote: "EDIT: though FPU still used perhaps. My stderr shows fpu_opt_ChirpData selected."

Don't forget to supply the -verb cmdline entry, either in the bench or via app_config.xml, to see all the timings. Claggy |
Raistmer Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
Unfortunately, I just confirmed this. The current stock ARM binary saves wisdom.sah in the slot directory, not in the project directory. That is, it will re-create wisdom for each new task, and all the wisdom-creation overhead (I'll soon post what that overhead is) will apply to each and every task. |
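To check on a given host where wisdom is actually being written, something like the following could be used. It is only a sketch: the default data directory /var/lib/boinc-client and the file name wisdom.sah are taken from this thread, while the function name `list_wisdom` is invented here for illustration.

```shell
#!/bin/sh
# list_wisdom [BOINC_DATA_DIR]
# Hypothetical helper: prints the path of every wisdom.sah file found under
# the slots/ and projects/ subdirectories of the BOINC data directory.
# Assumes the Debian-style default location when no argument is given.
list_wisdom() {
    data_dir="${1:-/var/lib/boinc-client}"
    for sub in slots projects; do
        # Skip subdirectories that do not exist on this host.
        if [ -d "$data_dir/$sub" ]; then
            find "$data_dir/$sub" -type f -name wisdom.sah
        fi
    done
}
```

A path under slots/ would match what is described above, where the wisdom lasts only as long as the task in that slot; a path under projects/ would be the cached variant mentioned earlier for the opt codebase. On a real host the function may need to be run with sudo to read the boinc-owned directories.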
Raistmer Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
It seems the Parallella is missing some prerequisites for the benchmark? Any suggestions on how to fix this? Hm... https://debian.pro/927 |
Raistmer Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
KWSN doesn't stop/suspend BOINC automatically. Pity. |