Linux (ARM processor) app and alternatives

Message boards : Number crunching : Linux (ARM processor) app and alternatives

Profile Raistmer
Volunteer developer
Volunteer tester
Avatar

Send message
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1846902 - Posted: 5 Feb 2017, 22:16:05 UTC - in response to Message 1846872.  
Last modified: 5 Feb 2017, 22:21:48 UTC

Yes, I already established that by comparing the C code with its asm.
Now attempting to re-code.

EDIT: though the sp[] notation looks more like stack than integer registers.
On the other hand, chirp has no floats. It has pointers, ints and doubles.
SETI apps news
We're not gonna fight them. We're gonna transcend them.
ID: 1846902 · Report as offensive
Profile Raistmer
Volunteer developer
Volunteer tester
Avatar

Send message
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1846910 - Posted: 5 Feb 2017, 22:57:05 UTC - in response to Message 1846902.  
Last modified: 5 Feb 2017, 22:57:25 UTC

LoL, have some reading for the next coding window http://www.peter-cockerell.net/aalp/resources/pdf/all.pdf
SETI apps news
We're not gonna fight them. We're gonna transcend them.
ID: 1846910 · Report as offensive
Profile Raistmer
Volunteer developer
Volunteer tester
Avatar

Send message
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1846915 - Posted: 5 Feb 2017, 23:19:15 UTC
Last modified: 5 Feb 2017, 23:19:37 UTC

It produces garbage now, but at least it does not fail immediately:
before Chirp test:                  neon_ChirpData 
in[0].xy=(1.00000,0.00000)]
in[1].xy=(1.00000,0.00000)]
in[2].xy=(1.00000,0.00001)]
after Chirp test:                  neon_ChirpData 
out[0].xy=(1.00000,0.00000)]
out[1].xy=(0.85399,0.52029)]
out[2].xy=(-0.57938,0.81506)]
                  neon_ChirpData 0.071992 1.44774  test
                     v_ChirpData 0.173869 0.00000  choice

SETI apps news
We're not gonna fight them. We're gonna transcend them.
ID: 1846915 · Report as offensive
Tom Rinehart
Volunteer tester

Send message
Joined: 12 Dec 01
Posts: 113
Credit: 13,255,975
RAC: 6
United States
Message 1847156 - Posted: 7 Feb 2017, 1:34:38 UTC - in response to Message 1846017.  

I have attached two of my ARMv7 computers (Raspberry Pi 2 and Orange Pi One) to Beta running the ARMHF app compiled with FFTW 3.3.6-pl1. They are both showing a ~25% increase in speed on the various Beta tasks over the setiathome_8.03_arm-unknown-linux-gnueabihf app. This matches the Bench results.
ID: 1847156 · Report as offensive
Tom Rinehart
Volunteer tester

Send message
Joined: 12 Dec 01
Posts: 113
Credit: 13,255,975
RAC: 6
United States
Message 1847160 - Posted: 7 Feb 2017, 1:43:59 UTC - in response to Message 1846915.  

Can the same changes be made to the VFP code? On my computers, the vfp_GetPowerSpectrum and opt NEON folding functions are the fastest, so it is not always the NEON functions that win. I haven't tried the app on my Raspberry Pi 1 recently, but I remember that one of the v_pfTranspose functions was faster than the FFTW one. That computer is an ARMv6 with no NEON, and it doesn't work well with the seti app due to a kernel bug.

Optimal function choices:
--------------------------------------------------------
                            name   timing   error
--------------------------------------------------------
                v_BaseLineSmooth (no other)

              v_GetPowerSpectrum 0.003291 0.00000  test
            vfp_GetPowerSpectrum 0.001460 0.00000  test
           neon_GetPowerSpectrum 0.002423 0.00000  test
            vfp_GetPowerSpectrum 0.001460 0.00000  choice

                     v_ChirpData 0.163963 0.00000  test
                   fpu_ChirpData 0.134297 0.94721  test
               fpu_opt_ChirpData 0.195064 0.00000  test
                     v_ChirpData 0.163963 0.00000  choice

                     v_Transpose 0.165621 0.00000  test
                    v_Transpose2 0.084227 0.00000  test
                    v_Transpose4 0.047448 0.00000  test
                    v_Transpose8 0.083159 0.00000  test
                 fftwf_transpose 0.026780 0.00000  test
                  v_pfTranspose2 0.062298 0.00000  test
                  v_pfTranspose4 0.035013 0.00000  test
                  v_pfTranspose8 0.063770 0.00000  test
                 v_vfpTranspose2 0.084038 0.00000  test
                 fftwf_transpose 0.026780 0.00000  choice

                 FPU opt folding 0.005288 0.00000  test
                 opt VFP folding 0.004453 0.15613  test
                opt NEON folding 0.003941 0.00000  test
                opt NEON folding 0.003941 0.00000  choice

                   Test duration    30.59 seconds
ID: 1847160 · Report as offensive
Profile Raistmer
Volunteer developer
Volunteer tester
Avatar

Send message
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1847302 - Posted: 8 Feb 2017, 10:10:52 UTC - in response to Message 1847160.  

Yes, but until neon_Chirp gives valid results there is not much sense in changing the vfp one.
I assume the processing chain in the function works OK and only the memory location references are broken, because of the different stack size for soft and hard float.
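
To make the soft-float versus hard-float difference concrete, here is a rough C sketch that models where arguments land under the two 32-bit ARM calling conventions. It is a simplified model, not code from the app: it handles only 4-byte ints/pointers and 8-byte doubles, and the name locate_args and the example signature (two pointers, an int, a double, an int, a double) are assumptions for illustration. The thread only says that chirp takes pointers, ints and doubles.

```c
#include <stdio.h>

/* Argument kinds mentioned in the thread: pointers, ints and doubles. */
typedef enum { ARG_PTR, ARG_INT, ARG_DOUBLE } arg_kind;

/*
 * Hypothetical helper: write the location of each argument into out[i].
 * hard_float = 0 models the base (soft-float / softfp) convention,
 * hard_float = 1 models the VFP (armhf) convention.
 * Simplified: no structs, no varargs, no single-precision floats.
 * Stack offsets are relative to sp at function entry.
 */
void locate_args(const arg_kind *args, int n, int hard_float, char out[][16])
{
    int ncrn = 0;  /* next core register number (r0..r3) */
    int nd = 0;    /* next free VFP double register (d0..d7) */
    int nsaa = 0;  /* next stacked argument offset */

    for (int i = 0; i < n; i++) {
        if (args[i] == ARG_DOUBLE && hard_float) {
            if (nd < 8) {
                snprintf(out[i], 16, "d%d", nd++);
            } else {
                nsaa = (nsaa + 7) & ~7;      /* doubles are 8-byte aligned */
                snprintf(out[i], 16, "[sp,#%d]", nsaa);
                nsaa += 8;
            }
        } else if (args[i] == ARG_DOUBLE) {
            ncrn = (ncrn + 1) & ~1;          /* must start at an even register */
            if (ncrn + 2 <= 4) {
                snprintf(out[i], 16, "r%d:r%d", ncrn, ncrn + 1);
                ncrn += 2;
            } else {
                ncrn = 4;                    /* later args cannot use r0..r3 */
                nsaa = (nsaa + 7) & ~7;
                snprintf(out[i], 16, "[sp,#%d]", nsaa);
                nsaa += 8;
            }
        } else {
            if (ncrn < 4) {
                snprintf(out[i], 16, "r%d", ncrn++);
            } else {
                snprintf(out[i], 16, "[sp,#%d]", nsaa);
                nsaa += 4;
            }
        }
    }
}
```

For the assumed signature, the soft-float model gives r0, r1, r2, [sp,#0], [sp,#8], [sp,#16], while the hard-float model gives r0, r1, r2, d0, r3, d1. If the real signature is similar, asm written for soft float would read its doubles from stack slots that a hard-float caller never fills. Any registers pushed in the function prologue shift the sp offsets further.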
SETI apps news
We're not gonna fight them. We're gonna transcend them.
ID: 1847302 · Report as offensive
MarkJ Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Avatar

Send message
Joined: 17 Feb 08
Posts: 1139
Credit: 80,854,192
RAC: 5
Australia
Message 1847312 - Posted: 8 Feb 2017, 11:49:17 UTC
Last modified: 8 Feb 2017, 11:50:26 UTC

Have attached a single Pi2 and a single Pi3 to Seti Beta to help out with testing. They've both picked up the 8.03 app and have four work units running.
BOINC blog
ID: 1847312 · Report as offensive
Tom Rinehart
Volunteer tester

Send message
Joined: 12 Dec 01
Posts: 113
Credit: 13,255,975
RAC: 6
United States
Message 1847333 - Posted: 8 Feb 2017, 15:13:35 UTC - in response to Message 1847302.  

My thinking is that the VFP code is simpler and might be easier to fix. I think it would be worth trying the changes you made to the NEON code that let it run without crashing on the VFP code. If you PM me the changes I can try them if you want.
ID: 1847333 · Report as offensive
Profile Raistmer
Volunteer developer
Volunteer tester
Avatar

Send message
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1847345 - Posted: 8 Feb 2017, 16:54:36 UTC - in response to Message 1847333.  

My thinking is that the VFP code is simpler and might be easier to fix. I think it would be worth trying the changes you made to the NEON code that let it run without crashing on the VFP code. If you PM me the changes I can try them if you want.

Here they are: https://cloud.mail.ru/public/AGPM/QAy4nr3Wd
Just uncomment line 413 in analyzeFuncs_vector.cpp.
Maybe you or someone else could spot a flaw in my changes or extend them into working code. I'll try again closer to the weekend.
SETI apps news
We're not gonna fight them. We're gonna transcend them.
ID: 1847345 · Report as offensive
Profile Raistmer
Volunteer developer
Volunteer tester
Avatar

Send message
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1847348 - Posted: 8 Feb 2017, 17:14:48 UTC

BTW, has anyone attempted to build Android for Parallella or Pi?
Would be interesting to test existing soft fp apps under it.
SETI apps news
We're not gonna fight them. We're gonna transcend them.
ID: 1847348 · Report as offensive
Tom Rinehart
Volunteer tester

Send message
Joined: 12 Dec 01
Posts: 113
Credit: 13,255,975
RAC: 6
United States
Message 1847457 - Posted: 9 Feb 2017, 4:15:44 UTC - in response to Message 1847345.  

My thinking is that the VFP code is simpler and might be easier to fix. I think it would be worth trying the changes you made to the NEON code that let it run without crashing on the VFP code. If you PM me the changes I can try them if you want.

Here they are: https://cloud.mail.ru/public/AGPM/QAy4nr3Wd
Just uncomment line 413 in analyzeFuncs_vector.cpp.
Maybe you or someone else could spot a flaw in my changes or extend them into working code. I'll try again closer to the weekend.


Thanks for the code. Your changes and comments help make it more understandable. There are a few more places where sp is used and where d0, d1, or r3 should be referenced instead. I'm working on copying your changes into analyzeFuncs_vfp.S
ID: 1847457 · Report as offensive
MarkJ Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Avatar

Send message
Joined: 17 Feb 08
Posts: 1139
Credit: 80,854,192
RAC: 5
Australia
Message 1847488 - Posted: 9 Feb 2017, 10:51:58 UTC - in response to Message 1847312.  
Last modified: 9 Feb 2017, 10:52:22 UTC

Have attached a single Pi2 and a single Pi3 to Seti Beta to help out with testing. They've both picked up the 8.03 app and have four work units running.

Got blc05 vlar work on both Pi2 and Pi3.

The Pi3 is 97% done after 23 hours and 6 mins, so it looks like it will take around 24 hours.
The Pi2 is 30% done after 23 hours and 8 mins, so it looks like 6 days.
BOINC blog
ID: 1847488 · Report as offensive
Claggy
Volunteer tester

Send message
Joined: 5 Jul 99
Posts: 4654
Credit: 47,537,079
RAC: 4
United Kingdom
Message 1847511 - Posted: 9 Feb 2017, 13:47:58 UTC - in response to Message 1847160.  

I haven't tried the app on my Raspberry Pi 1 recently, but I remember that one of the v_pfTranspose functions was faster than the FFTW one. That computer is an ARMv6 with no NEON, and it doesn't work well with the seti app due to a kernel bug.

We have a possible fix for that, hopefully someone will apply the patch to the kernel:

https://github.com/raspberrypi/linux/issues/600

http://lists.infradead.org/pipermail/linux-arm-kernel/2015-March/332633.html

Claggy
ID: 1847511 · Report as offensive
Profile jason_gee
Volunteer developer
Volunteer tester
Avatar

Send message
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1847515 - Posted: 9 Feb 2017, 14:02:14 UTC - in response to Message 1847511.  

At this rate I estimate ARM may overtake x86/x86_64 within 3 years :D
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 1847515 · Report as offensive
Tom Rinehart
Volunteer tester

Send message
Joined: 12 Dec 01
Posts: 113
Credit: 13,255,975
RAC: 6
United States
Message 1847565 - Posted: 9 Feb 2017, 17:39:52 UTC - in response to Message 1847457.  

My thinking is that the VFP code is simpler and might be easier to fix. I think it would be worth trying the changes you made to the NEON code that let it run without crashing on the VFP code. If you PM me the changes I can try them if you want.

Here they are: https://cloud.mail.ru/public/AGPM/QAy4nr3Wd
Just uncomment line 413 in analyzeFuncs_vector.cpp.
Maybe you or someone else could spot a flaw in my changes or extend them into working code. I'll try again closer to the weekend.


Thanks for the code. Your changes and comments help make it more understandable. There are a few more places where sp is used and where d0, d1, or r3 might be referenced instead, or maybe something else. I'm working on copying your changes into analyzeFuncs_vfp.S


I made the changes to analyzeFuncs_vfp.S. To get it to compile, I had to change the vmov to vmov.f64 here:

#else
                mov r2,r12
                vmov.f64 d9, d0
                vmov.f64 d10, d1
#endif


The test runs, but error is nan so it is not used.

Optimal function choices:
--------------------------------------------------------
                            name   timing   error
--------------------------------------------------------
                v_BaseLineSmooth (no other)

              v_GetPowerSpectrum 0.003465 0.00000  test
            vfp_GetPowerSpectrum 0.001584 0.00000  test
           neon_GetPowerSpectrum 0.002287 0.00000  test
            vfp_GetPowerSpectrum 0.001584 0.00000  choice

                     v_ChirpData 0.192698 0.00000  test
                   fpu_ChirpData 0.179985 1.51106  test
               fpu_opt_ChirpData 0.206817 0.00000  test
                   vfp_ChirpData 0.074339     nan  test
                     v_ChirpData 0.192698 0.00000  choice

                     v_Transpose 0.179137 0.00000  test
                    v_Transpose2 0.089267 0.00000  test
                    v_Transpose4 0.051027 0.00000  test
                    v_Transpose8 0.085502 0.00000  test
                 fftwf_transpose 0.029702 0.00000  test
                  v_pfTranspose2 0.073391 0.00000  test
                  v_pfTranspose4 0.045184 0.00000  test
                  v_pfTranspose8 0.069477 0.00000  test
                 v_vfpTranspose2 0.084667 0.00000  test
                 fftwf_transpose 0.029702 0.00000  choice

                 FPU opt folding 0.009317 0.00000  test
                 opt VFP folding 0.007850 0.16100  test
                opt NEON folding 0.006979 0.00000  test
                opt NEON folding 0.006979 0.00000  choice

                   Test duration    38.97 seconds
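
The table above is consistent with a "fastest candidate whose error is acceptable" rule: vfp_ChirpData has the best timing but a nan error, so v_ChirpData stays the choice. Below is a rough C sketch of such a rule. It is a guess at the selection logic, not the app's actual code; the names candidate and pick_fastest_valid and the tolerance passed in are hypothetical.

```c
#include <math.h>
#include <stddef.h>

/* One row of the benchmark table: function name, timing, error. */
typedef struct {
    const char *name;
    double timing;
    double error;
} candidate;

/*
 * Hypothetical selection rule: return the index of the fastest candidate
 * whose error is a number no larger than tol, or -1 if none qualifies.
 */
int pick_fastest_valid(const candidate *c, size_t n, double tol)
{
    int best = -1;
    for (size_t i = 0; i < n; i++) {
        /* Reject nan explicitly; a nan error means the output is unusable. */
        if (isnan(c[i].error) || c[i].error > tol)
            continue;
        if (best < 0 || c[i].timing < c[best].timing)
            best = (int)i;
    }
    return best;
}
```

Fed the four ChirpData rows above with an assumed tolerance of 1e-5, this returns the index of v_ChirpData: fpu_ChirpData and vfp_ChirpData are rejected on error, and fpu_opt_ChirpData is slower.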
ID: 1847565 · Report as offensive
Profile Raistmer
Volunteer developer
Volunteer tester
Avatar

Send message
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1847597 - Posted: 9 Feb 2017, 20:02:28 UTC - in response to Message 1847565.  
Last modified: 9 Feb 2017, 20:03:09 UTC


I made the changes to analyzeFuncs_vfp.S. To get it to compile, I had to change the vmov to vmov.f64 here:

#else
                mov r2,r12
                vmov.f64 d9, d0
                vmov.f64 d10, d1
#endif


From what I read, vmov.f64 would be the correct syntax, but when I tried it I added a space between vmov and the dot, so I got an error.
Plain vmov assembled OK (but maybe produces garbage).
I'll try the correct syntax.

It seems more work is required to get correct results.
SETI apps news
We're not gonna fight them. We're gonna transcend them.
ID: 1847597 · Report as offensive
Tom Rinehart
Volunteer tester

Send message
Joined: 12 Dec 01
Posts: 113
Credit: 13,255,975
RAC: 6
United States
Message 1847985 - Posted: 11 Feb 2017, 6:13:17 UTC

I found another ARM assembly programming guide:

http://www.coranac.com/tonc/text/asm.htm - In section 23.3.1, it explains the registers and that sp is the stack pointer. It is also r13. It looks like r12 is the Intra-Procedure-call scratch. I'm trying the VFP code using r10 instead of r12 to see if it makes a difference.
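
As a quick reference for the register roles mentioned here, this is a small C sketch of the 32-bit ARM procedure call standard as I understand it. The function names reg_alias and must_preserve are made up for illustration, and r9 is treated as callee-saved, which is an assumption that depends on the platform variant.

```c
/*
 * Hypothetical helper: conventional alias of core register r (0..15),
 * or an empty string if it has no special name.
 */
const char *reg_alias(int r)
{
    switch (r) {
    case 11: return "fp";  /* frame pointer by convention */
    case 12: return "ip";  /* intra-procedure-call scratch */
    case 13: return "sp";  /* stack pointer */
    case 14: return "lr";  /* link register */
    case 15: return "pc";  /* program counter */
    default: return "";
    }
}

/*
 * Hypothetical helper: 1 if a subroutine must preserve register r across
 * the call (r4-r11 and sp), 0 otherwise. r9 is assumed callee-saved here.
 */
int must_preserve(int r)
{
    return (r >= 4 && r <= 11) || r == 13;
}
```

One consequence for the experiment above: r12 may be clobbered freely, but r10 is callee-saved, so code that switches from r12 to r10 also has to save and restore r10 around its use.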
ID: 1847985 · Report as offensive
Claggy
Volunteer tester

Send message
Joined: 5 Jul 99
Posts: 4654
Credit: 47,537,079
RAC: 4
United Kingdom
Message 1848040 - Posted: 11 Feb 2017, 14:20:08 UTC - in response to Message 1847515.  

At this rate I estimate ARM may overtake x86/x86_64 within 3 years :D

Well, I went and did it: forked the Raspberry Pi kernel source tree, updated vfpmodule.c and opened a pull request:

https://github.com/Claggy3/linux/commit/3bac5778aaa13b16d0dbb9d9dfa605bd15514154

It's been two and a half years since the bug was reported; we'll see if it gets accepted soon, and if it works.

Claggy
ID: 1848040 · Report as offensive
Claggy
Volunteer tester

Send message
Joined: 5 Jul 99
Posts: 4654
Credit: 47,537,079
RAC: 4
United Kingdom
Message 1848055 - Posted: 11 Feb 2017, 15:35:48 UTC
Last modified: 11 Feb 2017, 15:38:21 UTC

The 8.04 app hit Seti Beta last night.

Function choices on my Pi 2 under load:
setiathome_v8 8.00 Revision: 3618 g++ (Raspbian 4.9.2-10) 4.9.2
libboinc: BOINC 7.7.0

Work Unit Info:
...............
WU true angle range is :  0.013040
features: half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt vfpd32 lpae evtstrm 
Optimal function choices:
--------------------------------------------------------
                            name   timing   error
--------------------------------------------------------
                v_BaseLineSmooth (no other)

              v_GetPowerSpectrum 0.003757 0.00000  test
            vfp_GetPowerSpectrum 0.001644 0.00000  test
           neon_GetPowerSpectrum 0.002899 0.00000  test
            vfp_GetPowerSpectrum 0.001644 0.00000  choice

                     v_ChirpData 0.220993 0.00000  test
                   fpu_ChirpData 0.188798 0.94721  test
               fpu_opt_ChirpData 0.254211 0.00000  test
                     v_ChirpData 0.220993 0.00000  choice

                     v_Transpose 0.195790 0.00000  test
                    v_Transpose2 0.081458 0.00000  test
                    v_Transpose4 0.053155 0.00000  test
                    v_Transpose8 0.098426 0.00000  test
                 fftwf_transpose 0.032448 0.00000  test
                  v_pfTranspose2 0.092149 0.00000  test
                  v_pfTranspose4 0.051635 0.00000  test
                  v_pfTranspose8 0.087617 0.00000  test
                 v_vfpTranspose2 0.108129 0.00000  test
                 fftwf_transpose 0.032448 0.00000  choice

                 FPU opt folding 0.030280 0.00000  test
                 opt VFP folding 0.024221 0.20978  test
                opt NEON folding 0.021037 0.00000  test
                opt NEON folding 0.021037 0.00000  choice

                   Test duration    38.93 seconds
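
The features line in the output above is what tells you whether NEON or VFP candidates can run on a given board at all. A minimal C sketch of a whole-token check on such a line follows. The function name has_feature is hypothetical and this is not how the app itself does the detection, as far as I know.

```c
#include <string.h>

/*
 * Hypothetical helper: return 1 if name appears as a whole
 * whitespace-separated token in features, 0 otherwise.
 * Matching whole tokens keeps "vfp" from matching inside "vfpv3".
 */
int has_feature(const char *features, const char *name)
{
    size_t len = strlen(name);
    const char *p = features;

    if (len == 0)
        return 0;
    while (*p) {
        while (*p == ' ' || *p == '\t')
            p++;                                /* skip separators */
        const char *start = p;
        while (*p && *p != ' ' && *p != '\t')
            p++;                                /* advance to token end */
        if ((size_t)(p - start) == len && strncmp(start, name, len) == 0)
            return 1;
    }
    return 0;
}
```

With the Pi 2 line above, has_feature(line, "neon") and has_feature(line, "vfpv4") are both true, while an ARMv6 board like the Pi 1 would be expected to lack the neon token.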

Claggy
ID: 1848055 · Report as offensive
Tom Rinehart
Volunteer tester

Send message
Joined: 12 Dec 01
Posts: 113
Credit: 13,255,975
RAC: 6
United States
Message 1848066 - Posted: 11 Feb 2017, 16:43:51 UTC - in response to Message 1848055.  
Last modified: 11 Feb 2017, 16:59:45 UTC

8.04 is the app without the NEON or VFP chirp, compiled with FFTW 3.3.6-pl1 using the slow timer:

./configure --enable-single --enable-neon --with-slow-timer


Basically, it is 8.03 with the new FFTW. In my testing with Bench, it runs at about 125% of the speed of 8.03.
ID: 1848066 · Report as offensive


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.