Linux (ARM processor) app and alternatives

Tom Rinehart
Volunteer tester

Joined: 12 Dec 01
Posts: 113
Credit: 13,255,975
RAC: 6
United States
Message 1849088 - Posted: 16 Feb 2017, 15:26:44 UTC - in response to Message 1848794.  

Great, it saves me from FFTW patching because Parallella prefers NEON anyway.


I built the app and successfully ran it on a Raspberry Pi 2 (ARMv7). It chose the VFP chirp function as fastest:

setiathome_v8 8.00 Revision: 3633 g++ (Raspbian 4.9.2-10) 4.9.2
libboinc: BOINC 7.7.0

Work Unit Info:
...............
WU true angle range is :  0.008955
Getting CPU Capabilities from /proc/cpuinfo
features:  half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt vfpd32 lpae evtstrm 
 
Optimal function choices:
--------------------------------------------------------
                            name   timing   error
--------------------------------------------------------
                v_BaseLineSmooth (no other)

              v_GetPowerSpectrum 0.002819 0.00000  test
            vfp_GetPowerSpectrum 0.001020 0.00000  test
           neon_GetPowerSpectrum 0.002269 0.00000  test
            vfp_GetPowerSpectrum 0.001020 0.00000  choice

                     v_ChirpData 0.161951 0.00000  test
                   fpu_ChirpData 0.171985 1.51106  test
               fpu_opt_ChirpData 0.182788 0.00000  test
                   vfp_ChirpData 0.070567 0.00000  test
                  neon_ChirpData 0.074742 0.00000  test
                   vfp_ChirpData 0.070567 0.00000  choice

                     v_Transpose 0.107684 0.00000  test
                    v_Transpose2 0.055372 0.00000  test
                    v_Transpose4 0.032668 0.00000  test
                    v_Transpose8 0.060820 0.00000  test
                 fftwf_transpose 0.026449 0.00000  test
                  v_pfTranspose2 0.051654 0.00000  test
                  v_pfTranspose4 0.030063 0.00000  test
                  v_pfTranspose8 0.052994 0.00000  test
                 v_vfpTranspose2 0.054142 0.00000  test
                 fftwf_transpose 0.026449 0.00000  choice

                 FPU opt folding 0.023844 0.00000  test
                 opt VFP folding 0.018468 0.20945  test
                opt NEON folding 0.015297 0.00000  test
                opt NEON folding 0.015297 0.00000  choice

                   Test duration    35.52 seconds
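
The NEON variants are only testable when the CPU advertises the feature in /proc/cpuinfo. A minimal Python sketch of that kind of check, using the features line printed above. The helper names `parse_features` and `supports` are hypothetical, and this is an illustration rather than the app's own C++ code:

```python
def parse_features(line):
    """Split a 'features: a b c' line into a set of lowercase flag names."""
    _, _, flags = line.partition(":")
    return set(flags.lower().split())


def supports(line, flag):
    """True if the given flag appears in the features line."""
    return flag in parse_features(line)


# Features line as printed in the Raspberry Pi 2 run above.
PI2 = ("features:  half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 "
       "idiva idivt vfpd32 lpae evtstrm")

print(supports(PI2, "neon"))  # True
print(supports(PI2, "vfp"))   # True
```

A CPU whose features line lacks `neon` would return False here, which corresponds to the "not supported on CPU" entries the benchmark prints on such hosts.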


It adds maybe 5% over 8.04/8.05:

KWSN-Linux-MBbench v2.1.08
Running on pitft at Thu 16 Feb 2017 07:59:02 AM UTC
----------------------------------------------------------------
Starting benchmark run...
----------------------------------------------------------------
Listing wu-file(s) in /testWUs :
PG0009_v8.wu

Listing executable(s) in /APPS :
setiathome-8.neonvfpchirp.arm-unknown-linux-gnueabihf

Listing executable in /REF_APPS :
setiathome_8.04_arm-unknown-linux-gnueabihf
----------------------------------------------------------------
Current WU: PG0009_v8.wu

----------------------------------------------------------------
Skipping default app setiathome_8.04_arm-unknown-linux-gnueabihf, displaying saved result(s)
Elapsed Time: ....................... 6542 seconds
----------------------------------------------------------------
Running app with command : .......... setiathome-8.neonvfpchirp.arm-unknown-linux-gnueabihf -verb
Elapsed Time : ...................... 6181 seconds
Speed compared to default : ......... 105 %
-----------------
Comparing results
Result      : Strongly similar,  Q= 99.54%

----------------------------------------------------------------
Done with PG0009_v8.wu
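
The "105 %" figure can be reproduced from the two elapsed times in the log. A small Python sketch of the arithmetic; that the benchmark script drops the fractional part is an assumption, not something the log states:

```python
def speed_percent(reference_seconds, test_seconds):
    """Speed of the tested app relative to the reference app, in percent."""
    return 100.0 * reference_seconds / test_seconds


ref_804 = 6542  # elapsed seconds, setiathome_8.04 (saved result)
new_app = 6181  # elapsed seconds, NEON/VFP chirp build

pct = speed_percent(ref_804, new_app)
print(f"{pct:.1f} %")  # 105.8 %
print(int(pct))        # 105 (fractional part dropped)
```

So the measured gain on this work unit is a little under 6 %, consistent with the "maybe 5%" estimate.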


I'm going to test it on my Raspberry Pi 1 (ARMv6). If it works, which I expect it will, I will send it to Eric as 8.06.

- Tom
ID: 1849088
Claggy
Volunteer tester

Joined: 5 Jul 99
Posts: 4654
Credit: 47,537,079
RAC: 4
United Kingdom
Message 1849109 - Posted: 16 Feb 2017, 17:16:20 UTC - in response to Message 1849088.  
Last modified: 16 Feb 2017, 17:18:24 UTC

I wonder if we can fix the fpu_ChirpData and opt VFP folding now, the opt VFP folding being of more importance as that'll speed up the Pi 1.

Claggy
ID: 1849109
Tom Rinehart
Volunteer tester

Joined: 12 Dec 01
Posts: 113
Credit: 13,255,975
RAC: 6
United States
Message 1850422 - Posted: 21 Feb 2017, 14:23:53 UTC
Last modified: 21 Feb 2017, 15:10:47 UTC

I've been running the app with NEON and VFP chirp on two computers and noticed that it only reports about half the memory usage:

name: 29ja16ad.26537.3748.6.40.190_0
WU name: 29ja16ad.26537.3748.6.40.190
project URL: http://setiweb.ssl.berkeley.edu/beta/
report deadline: Mon May 1 23:53:00 2017
ready to report: no
got server ack: no
final CPU time: 0.000000
state: downloaded
scheduler state: scheduled
exit_status: 0
signal: 0
suspended via GUI: no
active_task_state: EXECUTING
app version num: 806
checkpoint CPU time: 969.970000
current CPU time: 970.960000
fraction done: 0.003944
swap size: 40 MB
working set size: 39 MB

estimated CPU time remaining: 547136.902713

versus with 8.04:

name: 29ja16ad.26537.2930.6.40.36.vlar_2
WU name: 29ja16ad.26537.2930.6.40.36.vlar
project URL: http://setiweb.ssl.berkeley.edu/beta/
report deadline: Sat Apr 15 07:13:30 2017
ready to report: no
got server ack: no
final CPU time: 0.000000
state: downloaded
scheduler state: scheduled
exit_status: 0
signal: 0
suspended via GUI: no
active_task_state: EXECUTING
app version num: 804
checkpoint CPU time: 21836.390000
current CPU time: 21875.590000
fraction done: 0.085344
swap size: 69 MB
working set size: 68 MB

estimated CPU time remaining: 135688.838572
ID: 1850422
Tom Rinehart
Volunteer tester

Joined: 12 Dec 01
Posts: 113
Credit: 13,255,975
RAC: 6
United States
Message 1850429 - Posted: 21 Feb 2017, 15:17:17 UTC

I've also been testing the NEON and VFP chirp app on a Raspberry Pi 1 (ARMv6). It has done a few WUs on Beta and seems to work well. This is the test info:

setiathome_v8 8.00 Revision: 3633 g++ (Raspbian 4.9.2-10) 4.9.2
libboinc: BOINC 7.7.0

Work Unit Info:
...............
WU true angle range is :  0.015056
features: half thumb fastmult vfp edsp java tls 
Optimal function choices:
--------------------------------------------------------
                            name   timing   error
--------------------------------------------------------
                v_BaseLineSmooth (no other)

              v_GetPowerSpectrum 0.009036 0.00000  test
            vfp_GetPowerSpectrum 0.003225 0.00000  test
           neon_GetPowerSpectrum not supported on CPU
            vfp_GetPowerSpectrum 0.003225 0.00000  choice

                     v_ChirpData 0.392410 0.00000  test
                   fpu_ChirpData 0.252939 0.94721  test
               fpu_opt_ChirpData 0.407524 0.00000  test
                   vfp_ChirpData 0.105006 0.00000  test
                  neon_ChirpData not supported on CPU
                   vfp_ChirpData 0.105006 0.00000  choice

                     v_Transpose 0.036693 0.00000  test
                    v_Transpose2 0.035889 0.00000  test
                    v_Transpose4 0.036924 0.00000  test
                    v_Transpose8 0.077224 0.00000  test
                 fftwf_transpose 0.039018 0.00000  test
                  v_pfTranspose2 0.096429 0.00000  test
                  v_pfTranspose4 0.063666 0.00000  test
                  v_pfTranspose8 0.115470 0.00000  test
                 v_vfpTranspose2 0.033947 0.00000  test
                 v_vfpTranspose2 0.033947 0.00000  choice

                 FPU opt folding 0.084852 0.00000  test
                 opt VFP folding 0.064964 0.20972  test
                opt NEON folding not supported on CPU
                 FPU opt folding 0.084852 0.00000  choice

                   Test duration    48.22 seconds
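
The folding result above shows the selection behaviour: "opt VFP folding" is faster than "FPU opt folding" but reports a non-zero error, so the slower, accurate function is chosen. A Python sketch of that rule, assuming the app picks the fastest candidate whose error does not exceed a tolerance. The tolerance of 0.0 and the function name `choose` are assumptions for illustration; the app's actual criterion is not shown in this thread:

```python
def choose(candidates, max_error=0.0):
    """Return the name of the fastest candidate with error <= max_error.

    candidates: list of (name, timing_seconds, error). Timing is None when
    the function is not supported on the CPU.
    """
    usable = [c for c in candidates if c[1] is not None and c[2] <= max_error]
    return min(usable, key=lambda c: c[1])[0] if usable else None


# Folding results from the Raspberry Pi 1 table above.
folding = [
    ("FPU opt folding", 0.084852, 0.00000),
    ("opt VFP folding", 0.064964, 0.20972),
    ("opt NEON folding", None, None),  # not supported on CPU
]

print(choose(folding))  # FPU opt folding
```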
ID: 1850429
Claggy
Volunteer tester

Joined: 5 Jul 99
Posts: 4654
Credit: 47,537,079
RAC: 4
United Kingdom
Message 1850499 - Posted: 22 Feb 2017, 11:40:03 UTC

I've built an 8.06-level app, but with FFTW 3.3.4 (and without the longer wisdom generation) as a comparison. I'm running it on my Pi 2 with the normal PG set, alongside 8.02, 8.03 and 8.04 for comparison.

Claggy
ID: 1850499
Raistmer
Volunteer developer
Volunteer tester

Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1850602 - Posted: 22 Feb 2017, 19:05:32 UTC - in response to Message 1850422.  
Last modified: 22 Feb 2017, 19:10:19 UTC

Well, the change in size should not be puzzling.
The older chirp used precomputed trigonometry arrays, while the optimized ones compute sin/cos in a more efficient way.
Hence the saving from not creating the TrigArray array.
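
To illustrate the memory trade-off described here, a minimal Python sketch comparing a table-based chirp with one that computes the rotation per sample. The phase formula is a generic linear chirp chosen for illustration, and the names `chirp_with_table` and `chirp_on_the_fly` are hypothetical; none of this is taken from the SETI@home source, whose optimized routines compute sin/cos differently:

```python
import math


def chirp_with_table(data, rate, fs):
    """Precompute cos/sin for every sample, then rotate the data."""
    n = len(data)
    cos_tab = [math.cos(math.pi * rate * (i / fs) ** 2) for i in range(n)]
    sin_tab = [math.sin(math.pi * rate * (i / fs) ** 2) for i in range(n)]
    out = [complex(cos_tab[i], sin_tab[i]) * data[i] for i in range(n)]
    return out, 2 * n  # 2*n extra floats are held in the tables


def chirp_on_the_fly(data, rate, fs):
    """Compute the rotation per sample; no trig table is kept."""
    out = []
    for i, x in enumerate(data):
        angle = math.pi * rate * (i / fs) ** 2
        out.append(complex(math.cos(angle), math.sin(angle)) * x)
    return out, 0


data = [complex(1.0, 0.0)] * 8
a, extra_a = chirp_with_table(data, rate=0.5, fs=8.0)
b, extra_b = chirp_on_the_fly(data, rate=0.5, fs=8.0)
print(extra_a, extra_b)  # 16 0
```

Both versions produce the same output; only the table version pays for storage proportional to the data length, which is the kind of saving that would show up as a smaller working set.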

I hope we can get an updated build on Beta soon.

Regarding the broken VFP folding: I haven't spotted any obvious issues so far. I need to compare it line by line with the original code.
On the other hand, the Android builds are made from this new codebase and they work ("opt VFP" in some of the stderr outputs confirms this). So it is something more complex than an obvious typo...
SETI apps news
We're not gonna fight them. We're gonna transcend them.
ID: 1850602
Tom Rinehart
Volunteer tester

Joined: 12 Dec 01
Posts: 113
Credit: 13,255,975
RAC: 6
United States
Message 1850612 - Posted: 22 Feb 2017, 19:50:19 UTC - in response to Message 1850602.  

I hope we can get an updated build on Beta soon.

I e-mailed Eric again and he sent me a note saying he thinks he might be able to put it on Beta today.
ID: 1850612
Tom Rinehart
Volunteer tester

Joined: 12 Dec 01
Posts: 113
Credit: 13,255,975
RAC: 6
United States
Message 1850619 - Posted: 22 Feb 2017, 20:49:15 UTC - in response to Message 1850612.  

Linux ARM 8.06 app is on Beta now!
ID: 1850619
Claggy
Volunteer tester

Joined: 5 Jul 99
Posts: 4654
Credit: 47,537,079
RAC: 4
United Kingdom
Message 1850751 - Posted: 23 Feb 2017, 12:16:13 UTC - in response to Message 1850602.  

If you build the app without the fast-math option, fpu_ChirpData works correctly; there is no change with the VFP folding, though.

Claggy
ID: 1850751
Raistmer
Volunteer developer
Volunteer tester

Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1850762 - Posted: 23 Feb 2017, 13:19:45 UTC - in response to Message 1850751.  

Are there any proven cases of fpu_chirp being chosen on hosts where it works correctly?
If it never gets selected, and a baseline replacement for it (like v_Chirp) exists, I see no sense in keeping it in the benchmark at all.
SETI apps news
We're not gonna fight them. We're gonna transcend them.
ID: 1850762
Tom Rinehart
Volunteer tester

Joined: 12 Dec 01
Posts: 113
Credit: 13,255,975
RAC: 6
United States
Message 1850765 - Posted: 23 Feb 2017, 13:39:03 UTC - in response to Message 1849109.  

I wonder if we can fix the fpu_ChirpData and opt VFP folding now, the opt VFP folding being of more importance as that'll speed up the Pi 1.

Claggy


The fpu_ChirpData bug is a simple fix. On line 78 of analyzeFuncs_fpu.cpp, remove:

 || defined (__arm__)


Line 79 has a comment that says: // TODO: ADD CHECK THAT THIS WORKS

It doesn't work.

Raistmer -

Can you make this fix to the code and upload it to the SVN site?

- Tom
ID: 1850765
Raistmer
Volunteer developer
Volunteer tester

Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1850838 - Posted: 23 Feb 2017, 19:54:55 UTC - in response to Message 1850765.  

I'll look at that.
SETI apps news
We're not gonna fight them. We're gonna transcend them.
ID: 1850838
Raistmer
Volunteer developer
Volunteer tester

Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1850868 - Posted: 23 Feb 2017, 22:37:20 UTC - in response to Message 1850838.  

Done. At revision: 3643
SETI apps news
We're not gonna fight them. We're gonna transcend them.
ID: 1850868
Tom Rinehart
Volunteer tester

Joined: 12 Dec 01
Posts: 113
Credit: 13,255,975
RAC: 6
United States
Message 1852009 - Posted: 28 Feb 2017, 5:32:26 UTC

Now that 8.06 has been up for five days, it is interesting to see some of the results. My three Raspberry Pi 2's (Broadcom BCM2836 ARM Cortex-A7 at 900 MHz) all test the functions the same and always choose:

vfp_GetPowerSpectrum
vfp_ChirpData
fftwf_transpose
opt NEON folding

with vfp_ChirpData testing a little faster than neon_ChirpData. The 8.06 app shows an average processing rate of around 1.40 GFLOPS; 8.03 was around 1.06, and 8.02 was around 1.01.

I also have an Orange Pi One (AllWinner H3 ARM Cortex-A7 at 1.2 GHz) https://setiweb.ssl.berkeley.edu/beta/results.php?hostid=81609.

It is not consistent. It typically chooses:

vfp_GetPowerSpectrum
neon_ChirpData
fftwf_transpose
opt NEON folding

with neon_ChirpData testing a little faster than vfp_ChirpData.

Sometimes it chooses vfp_ChirpData instead, which then tests a little faster than neon_ChirpData, and sometimes quite a bit faster. Sometimes it chooses v_pfTranspose4, which tests a little faster than fftwf_transpose.

If the opt VFP folding function were working, it would be interesting to see how it would test on various computers. I guess all this means that it is good to have many different function options that get tested at the beginning, since some computers will use different ones.

I'm getting an ODROID XU4, which has an octa-core processor with four ARM Cortex-A15 and four ARM Cortex-A7 cores. It will be interesting to see what it does running the 8.06 app. I also wonder if I will be able to compile the OpenCL app to run on its Mali-T628 MP6 GPU (it supports OpenCL 1.1 Full profile).
ID: 1852009
Raistmer
Volunteer developer
Volunteer tester

Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1852019 - Posted: 28 Feb 2017, 8:14:51 UTC - in response to Message 1852009.  

A similar situation with the Windows x64 builds led me to a different conclusion: the embedded benchmark, especially on multicore hosts, is a very unstable thing that can hardly be trusted.
To distinguish between these two explanations, I propose the following:
make two builds, one with vfp_Chirp disabled and one with neon_Chirp disabled, identical otherwise.
Run a few tasks on each, logging the results and the task AR. Then compare their performance with each other _AND_ with the "versatile" build that has both.
If the switching between chirp selections is "real", not just a bench artifact, we will see that the "versatile" build is faster on average than both fixed ones.
Otherwise we will see which chirp is really preferable on a particular host.
I'm afraid this could be a very long experiment, though, due to the low performance of the ARM cores.

Something similar could be done in the more controlled environment of the PG set benchmark. But again, one needs to reproduce the real conditions of multicore processing (bench with both/all cores busy).
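
The comparison proposed in this post could be scripted roughly as follows. This is a Python sketch only: the build names and the elapsed times are placeholders, NOT measurements, and the helper `compare_builds` is hypothetical. It also assumes the tasks fed to each build have similar AR, so that mean elapsed time is a fair metric:

```python
from statistics import mean


def compare_builds(runs):
    """Rank build names from fastest to slowest mean elapsed time.

    runs maps a build name to a list of elapsed seconds for comparable tasks.
    """
    return sorted(runs, key=lambda name: mean(runs[name]))


# Placeholder numbers for illustration only, not real results.
runs = {
    "vfp_only": [100.0, 102.0, 101.0],
    "neon_only": [103.0, 104.0, 102.0],
    "versatile": [101.0, 103.0, 102.0],
}

ranking = compare_builds(runs)
print(ranking)  # ['vfp_only', 'versatile', 'neon_only']
if ranking[0] == "versatile":
    print("run-time switching appears to help")
else:
    print("likely a bench artifact; fixed build", ranking[0], "is fastest")
```

With so few runs per build, the difference between means would also need to be weighed against the run-to-run spread before drawing a conclusion.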
SETI apps news
We're not gonna fight them. We're gonna transcend them.
ID: 1852019
Raistmer
Volunteer developer
Volunteer tester

Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1852021 - Posted: 28 Feb 2017, 8:18:31 UTC - in response to Message 1852009.  

I also wonder if I will be able to compile the OpenCL app to run on its Mali-T628 MP6 GPU (it supports OpenCL 1.1 Full profile).

That would be interesting indeed. AFAIK Urs also made some experiments with Mali. Maybe he could provide some hints here.
SETI apps news
We're not gonna fight them. We're gonna transcend them.
ID: 1852021
jason_gee
Volunteer developer
Volunteer tester

Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1852032 - Posted: 28 Feb 2017, 12:31:02 UTC - in response to Message 1852019.  
Last modified: 28 Feb 2017, 12:39:50 UTC

Indeed. Most people will be unaware that modern cache and prefetch implementations include statistically based AI components, and are therefore non-deterministic. Instead of providing absolute data points, any particular code path will provide a distribution over many runs rather than a constant performance figure. There are formal/effective ways to deal with that, though I tend to try to provide fewer options with more definite spacing, as opposed to having to build a massive knowledge base equipped to 'split hairs' over months or years to reach an answer. Is there a provably optimal answer to the best choices? Yes, there is, but only with the benefit of hindsight, and even then not all the runtime conditions are known.

One popular and effective engineering strategy is to try to make each choice better than the others by 2x in some specific metric. It leads to a limited set of solid, rational choices, much lower maintenance/overhead, and therefore less confusion or likelihood of settling on a wrong answer.
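
As a rough illustration of the 2x rule, here is a Python sketch that treats two options as distinct only when one beats the other by at least a factor of two. The function name `meaningfully_faster` and the default factor are assumptions for illustration; the timings are the ChirpData values from the Raspberry Pi 2 benchmark table earlier in this thread:

```python
def meaningfully_faster(baseline, candidate, factor=2.0):
    """True if the candidate's time is at least `factor` times shorter."""
    return baseline / candidate >= factor


# ChirpData timings (seconds) from the Raspberry Pi 2 table in this thread.
v_chirp, vfp_chirp, neon_chirp = 0.161951, 0.070567, 0.074742

print(meaningfully_faster(v_chirp, vfp_chirp))     # True  (about 2.3x)
print(meaningfully_faster(neon_chirp, vfp_chirp))  # False (about 1.06x)
```

Under this rule the VFP and NEON chirps would count as one option, and run-to-run flips between them would not be treated as significant.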

[Edit:] There are pros and cons to each of compile-time, install-time, and run-time/dynamic optimisation. The problems discussed may be related to overlap between these methods, and to be clear, many techniques are not completely refined, as evidenced by the changing mobile market using all three. Square pegs, round holes, and triangular windows.
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 1852032
Tom Rinehart
Volunteer tester

Joined: 12 Dec 01
Posts: 113
Credit: 13,255,975
RAC: 6
United States
Message 1852078 - Posted: 1 Mar 2017, 4:54:59 UTC - in response to Message 1848556.  

There's a further posting on Beta about it. Once that kernel or a later version comes out as a production kernel, I'll get the Pi News thread unlocked and post there too.

Claggy



I just updated my Pi's today and 4.4.48 is out as a production kernel now.
ID: 1852078
Claggy
Volunteer tester

Joined: 5 Jul 99
Posts: 4654
Credit: 47,537,079
RAC: 4
United Kingdom
Message 1852085 - Posted: 1 Mar 2017, 6:59:19 UTC - in response to Message 1852078.  

I was doing that on my Pi this morning, although it hasn't had a reboot yet.

Claggy
ID: 1852085
Claggy
Volunteer tester

Joined: 5 Jul 99
Posts: 4654
Credit: 47,537,079
RAC: 4
United Kingdom
Message 1852379 - Posted: 2 Mar 2017, 21:13:11 UTC - in response to Message 1852078.  

Posted in News.

Claggy
ID: 1852379


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.