Linux (ARM processor) app and alternatives
Author | Message |
---|---|
Raistmer Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
Yes, I established that already by comparing the C code and its asm. Now attempting to re-code. EDIT: though the sp[] notation looks more like stack than integer registers. On the other hand, chirp has no float; it has pointers, ints and doubles. SETI apps news We're not gonna fight them. We're gonna transcend them. |
Raistmer Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
LoL, have some reading for the next coding window http://www.peter-cockerell.net/aalp/resources/pdf/all.pdf SETI apps news We're not gonna fight them. We're gonna transcend them. |
Raistmer Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
It produces garbage now, but at least it does not fail immediately:
before Chirp test: neon_ChirpData
in[0].xy=(1.00000,0.00000)]
in[1].xy=(1.00000,0.00000)]
in[2].xy=(1.00000,0.00001)]
after Chirp test: neon_ChirpData
out[0].xy=(1.00000,0.00000)]
out[1].xy=(0.85399,0.52029)]
out[2].xy=(-0.57938,0.81506)]
neon_ChirpData 0.071992 1.44774 test
v_ChirpData 0.173869 0.00000 choice
SETI apps news We're not gonna fight them. We're gonna transcend them. |
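For readers following the chirp test output above, here is a minimal Python sketch of what a reference chirp and an error check might look like. It assumes the textbook form of a chirp (each complex sample is rotated by a phase that grows with the square of time); the exact formula, the names `chirp_data` and `max_abs_error`, and the error metric are assumptions for illustration, not the app's actual code.

```python
import math

def chirp_data(samples, chirp_rate, sample_rate):
    """Rotate each complex sample by a quadratic phase (assumed chirp form)."""
    out = []
    for i, z in enumerate(samples):
        t = i / sample_rate
        frac = 0.5 * chirp_rate * t * t   # phase in turns
        frac -= math.floor(frac)          # keep only the fractional turn
        ang = 2.0 * math.pi * frac
        out.append(z * complex(math.cos(ang), math.sin(ang)))
    return out

def max_abs_error(candidate, reference):
    """Hypothetical error metric: largest per-sample difference."""
    return max(abs(c - r) for c, r in zip(candidate, reference))

# A pure rotation never changes a sample's magnitude.
rotated = chirp_data([1 + 0j] * 8, chirp_rate=3.0, sample_rate=4.0)
magnitudes_ok = all(abs(abs(z) - 1.0) < 1e-12 for z in rotated)
```

The posted NEON outputs also have magnitude very close to 1, which would be consistent with the rotation step running while the angle it is fed is wrong, though that is only an inference from three samples.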
Tom Rinehart Send message Joined: 12 Dec 01 Posts: 113 Credit: 13,255,975 RAC: 6 |
I have attached two of my ARMv7 computers (Raspberry Pi 2 and Orange Pi One) to Beta running the ARMHF app compiled with FFTW 3.3.6-pl1. They are both showing a ~25% increase in speed on the various Beta tasks over the setiathome_8.03_arm-unknown-linux-gnueabihf app. This matches the Bench results. |
Tom Rinehart Send message Joined: 12 Dec 01 Posts: 113 Credit: 13,255,975 RAC: 6 |
Can the same changes be made to the VFP code? On my computers, the vfp_GetPowerSpectrum and opt NEON folding functions are the fastest. It is not always the NEON functions. I haven't tried the app on my Raspberry Pi 1 recently, but I remember that one of the v_pfTranspose functions was faster than the FFTW one. That computer is an ARMv6 with no NEON and it doesn't work well with the seti app due to a kernel bug.
Optimal function choices:
--------------------------------------------------------
name                     timing    error
--------------------------------------------------------
v_BaseLineSmooth (no other)
v_GetPowerSpectrum       0.003291  0.00000  test
vfp_GetPowerSpectrum     0.001460  0.00000  test
neon_GetPowerSpectrum    0.002423  0.00000  test
vfp_GetPowerSpectrum     0.001460  0.00000  choice
v_ChirpData              0.163963  0.00000  test
fpu_ChirpData            0.134297  0.94721  test
fpu_opt_ChirpData        0.195064  0.00000  test
v_ChirpData              0.163963  0.00000  choice
v_Transpose              0.165621  0.00000  test
v_Transpose2             0.084227  0.00000  test
v_Transpose4             0.047448  0.00000  test
v_Transpose8             0.083159  0.00000  test
fftwf_transpose          0.026780  0.00000  test
v_pfTranspose2           0.062298  0.00000  test
v_pfTranspose4           0.035013  0.00000  test
v_pfTranspose8           0.063770  0.00000  test
v_vfpTranspose2          0.084038  0.00000  test
fftwf_transpose          0.026780  0.00000  choice
FPU opt folding          0.005288  0.00000  test
opt VFP folding          0.004453  0.15613  test
opt NEON folding         0.003941  0.00000  test
opt NEON folding         0.003941  0.00000  choice
Test duration 30.59 seconds |
Raistmer Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
Yes, but until neon_Chirp gives valid results there is not much sense in changing the vfp one. I assume the processing chain in the function works OK; only the memory location references are broken because of the different stack size for soft and hard float. SETI apps news We're not gonna fight them. We're gonna transcend them. |
MarkJ Send message Joined: 17 Feb 08 Posts: 1139 Credit: 80,854,192 RAC: 5 |
Have attached a single Pi2 and a single Pi3 to Seti Beta to help out with testing. They've both picked up the 8.03 app and have four work units running. BOINC blog |
Tom Rinehart Send message Joined: 12 Dec 01 Posts: 113 Credit: 13,255,975 RAC: 6 |
My thinking is that the VFP code is simpler and might be easier to fix. I think it would be worth trying the changes you made to the NEON code that let it run without crashing on the VFP code. If you PM me the changes I can try them if you want. |
Raistmer Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
"My thinking is that the VFP code is simpler and might be easier to fix. I think it would be worth trying the changes you made to the NEON code that let it run without crashing on the VFP code. If you PM me the changes I can try them if you want." Here they are: https://cloud.mail.ru/public/AGPM/QAy4nr3Wd Just uncomment line 413 in analyzeFuncs_vector.cpp. Maybe you or someone else could spot a flaw in my changes or extend them into working code. I'll try again closer to the weekend. SETI apps news We're not gonna fight them. We're gonna transcend them. |
Raistmer Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
BTW, has anyone attempted to build Android for Parallella or Pi? It would be interesting to test the existing soft fp apps under it. SETI apps news We're not gonna fight them. We're gonna transcend them. |
Tom Rinehart Send message Joined: 12 Dec 01 Posts: 113 Credit: 13,255,975 RAC: 6 |
Thanks for the code. Your changes and comments help make it more understandable. There are a few more places where sp is used and where d0, d1, or r3 should be referenced instead. I'm working on copying your changes into analyzeFuncs_vfp.S. |
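The sp versus d0/d1/r3 point above can be illustrated with a small Python model of how the ARM procedure call standard (AAPCS) places arguments under the soft-float and hard-float variants. This is a simplified sketch that only handles 4-byte ints/pointers and 8-byte doubles. The six-argument list (two pointers, int, double, int, double) is an assumption based on the earlier remark that chirp takes pointers, ints and doubles, and `assign_args` is a hypothetical helper, not part of the app.

```python
def assign_args(arg_types, hard_float):
    """Map each argument to a register or stack slot (simplified AAPCS model).

    Only 4-byte 'int'/'ptr' and 8-byte 'double' arguments are modelled.
    Stack offsets are relative to sp at function entry.
    """
    locations = []
    ncrn = 0    # next core register number, r0..r3
    next_d = 0  # next free VFP register d0..d7 (hard float only)
    nsaa = 0    # next stacked argument offset from sp

    def push(size):
        nonlocal nsaa
        nsaa = (nsaa + size - 1) // size * size  # natural alignment
        slot = f"[sp, #{nsaa}]"
        nsaa += size
        return slot

    for kind in arg_types:
        if kind == "double" and hard_float:
            # Hard float: doubles use VFP registers first, then the stack.
            if next_d < 8:
                locations.append(f"d{next_d}")
                next_d += 1
            else:
                locations.append(push(8))
            continue
        words = 2 if kind == "double" else 1
        if words == 2:
            ncrn = (ncrn + 1) & ~1  # a double starts at an even register
        if ncrn + words <= 4:
            regs = [f"r{ncrn + i}" for i in range(words)]
            locations.append(":".join(regs))
            ncrn += words
        else:
            ncrn = 4  # after the first spill, core registers are not used
            locations.append(push(4 * words))
    return locations

# Assumed chirp-like argument list: in, out, index, rate, count, sample rate.
chirp_args = ["ptr", "ptr", "int", "double", "int", "double"]
soft = assign_args(chirp_args, hard_float=False)
hard = assign_args(chirp_args, hard_float=True)
```

Under these assumptions the soft-float call puts the last three arguments on the stack, while the hard-float call puts them in d0, r3 and d1, which matches the places where sp references would need replacing.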
MarkJ Send message Joined: 17 Feb 08 Posts: 1139 Credit: 80,854,192 RAC: 5 |
"Have attached a single Pi2 and a single Pi3 to Seti Beta to help out with testing. They've both picked up the 8.03 app and have four work units running." Got blc05 vlar work on both the Pi2 and the Pi3. The Pi3 is 97% done after 23 hours and 6 mins, so it looks like it will take around 24 hours. The Pi2 is 30% done after 23 hours and 8 mins, so it looks like 6 days. BOINC blog |
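As a rough cross-check of the run-time estimates above, a simple linear extrapolation from elapsed time and percent done can be sketched as follows. The helper name is hypothetical, and the sketch assumes progress advances at a constant rate, which BOINC's reported percentage does not guarantee.

```python
def estimate_total_hours(elapsed_hours, fraction_done):
    """Linear extrapolation: total time = elapsed time / fraction completed."""
    return elapsed_hours / fraction_done

pi3_hours = estimate_total_hours(23 + 6 / 60, 0.97)  # roughly 23.8 hours
pi2_hours = estimate_total_hours(23 + 8 / 60, 0.30)  # roughly 77.1 hours
pi2_days = pi2_hours / 24                            # roughly 3.2 days
```

The Pi3 figure agrees with the "around 24 hours" in the post. The linear estimate for the Pi2 comes out nearer 3.2 days than 6, so the posted figure may rest on a different estimate or on progress that is not linear.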
Claggy Send message Joined: 5 Jul 99 Posts: 4654 Credit: 47,537,079 RAC: 4 |
"I haven't tried the app on my Raspberry Pi 1 recently, but I remember that one of the v_pfTranspose functions was faster than the FFTW one. That computer is an ARMv6 with no NEON and it doesn't work well with the seti app due to a kernel bug." We have a possible fix for that, hopefully someone will apply the patch to the kernel:
https://github.com/raspberrypi/linux/issues/600
http://lists.infradead.org/pipermail/linux-arm-kernel/2015-March/332633.html
Claggy |
jason_gee Send message Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
At this rate I estimate ARM may overtake x86/x86_64 within 3 years :D "Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions. |
Tom Rinehart Send message Joined: 12 Dec 01 Posts: 113 Credit: 13,255,975 RAC: 6 |
I made the changes to analyzeFuncs_vfp.S. To get it to compile, I had to change the vmov to vmov.f64 here:
#else
mov r2,r12
vmov.f64 d9, d0
vmov.f64 d10, d1
#endif
The test runs, but the error is nan so it is not used.
Optimal function choices:
--------------------------------------------------------
name                     timing    error
--------------------------------------------------------
v_BaseLineSmooth (no other)
v_GetPowerSpectrum       0.003465  0.00000  test
vfp_GetPowerSpectrum     0.001584  0.00000  test
neon_GetPowerSpectrum    0.002287  0.00000  test
vfp_GetPowerSpectrum     0.001584  0.00000  choice
v_ChirpData              0.192698  0.00000  test
fpu_ChirpData            0.179985  1.51106  test
fpu_opt_ChirpData        0.206817  0.00000  test
vfp_ChirpData            0.074339  nan      test
v_ChirpData              0.192698  0.00000  choice
v_Transpose              0.179137  0.00000  test
v_Transpose2             0.089267  0.00000  test
v_Transpose4             0.051027  0.00000  test
v_Transpose8             0.085502  0.00000  test
fftwf_transpose          0.029702  0.00000  test
v_pfTranspose2           0.073391  0.00000  test
v_pfTranspose4           0.045184  0.00000  test
v_pfTranspose8           0.069477  0.00000  test
v_vfpTranspose2          0.084667  0.00000  test
fftwf_transpose          0.029702  0.00000  choice
FPU opt folding          0.009317  0.00000  test
opt VFP folding          0.007850  0.16100  test
opt NEON folding         0.006979  0.00000  test
opt NEON folding         0.006979  0.00000  choice
Test duration 38.97 seconds |
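The benchmark output above suggests that each group's "choice" is the fastest variant whose error is acceptable. A minimal Python sketch of that selection rule follows; the function name and the `max_error` threshold are assumptions, since the thread does not show the app's actual cutoff.

```python
import math

def choose_fastest_valid(results, max_error=1e-5):
    """Pick the fastest variant whose error is within the assumed tolerance.

    A nan error fails the comparison, so that variant is rejected too.
    """
    valid = [(timing, name) for name, timing, error in results
             if error <= max_error]
    return min(valid)[1]

# (name, timing, error) rows copied from the ChirpData group of the post.
chirp_results = [
    ("v_ChirpData", 0.192698, 0.0),
    ("fpu_ChirpData", 0.179985, 1.51106),
    ("fpu_opt_ChirpData", 0.206817, 0.0),
    ("vfp_ChirpData", 0.074339, math.nan),
]
chirp_choice = choose_fastest_valid(chirp_results)
print(chirp_choice)
```

With these rows the sketch selects v_ChirpData, the same choice the benchmark reports, even though vfp_ChirpData has by far the lowest timing.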
Raistmer Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
From what I read, vmov.f64 would be the correct syntax, but when I tried it I added a space between vmov and the dot, so I got an error. Plain vmov assembled OK (but may produce garbage). I'll try the correct syntax. It seems more work is required to get correct results. SETI apps news We're not gonna fight them. We're gonna transcend them. |
Tom Rinehart Send message Joined: 12 Dec 01 Posts: 113 Credit: 13,255,975 RAC: 6 |
I found another ARM assembly programming guide: http://www.coranac.com/tonc/text/asm.htm - In section 23.3.1, it explains the registers and that sp is the stack pointer. It is also r13. It looks like r12 is the Intra-Procedure-call scratch register. I'm trying the VFP code using r10 instead of r12 to see if it makes a difference. |
Claggy Send message Joined: 5 Jul 99 Posts: 4654 Credit: 47,537,079 RAC: 4 |
"At this rate I estimate ARM may overtake x86/x86_64 within 3 years :D" Well, I went and did it: forked the Raspberry Pi kernel source tree, updated vfpmodule.c and made a pull request: https://github.com/Claggy3/linux/commit/3bac5778aaa13b16d0dbb9d9dfa605bd15514154 It's been two and a half years since the bug was reported; we'll see if it gets accepted soon, and if it works. Claggy |
Claggy Send message Joined: 5 Jul 99 Posts: 4654 Credit: 47,537,079 RAC: 4 |
The 8.04 app hit Seti Beta last night. Function choices on my Pi 2 under load:
setiathome_v8 8.00 Revision: 3618 g++ (Raspbian 4.9.2-10) 4.9.2
libboinc: BOINC 7.7.0
Work Unit Info:
...............
WU true angle range is : 0.013040
features: half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt vfpd32 lpae evtstrm
Optimal function choices:
--------------------------------------------------------
name                     timing    error
--------------------------------------------------------
v_BaseLineSmooth (no other)
v_GetPowerSpectrum       0.003757  0.00000  test
vfp_GetPowerSpectrum     0.001644  0.00000  test
neon_GetPowerSpectrum    0.002899  0.00000  test
vfp_GetPowerSpectrum     0.001644  0.00000  choice
v_ChirpData              0.220993  0.00000  test
fpu_ChirpData            0.188798  0.94721  test
fpu_opt_ChirpData        0.254211  0.00000  test
v_ChirpData              0.220993  0.00000  choice
v_Transpose              0.195790  0.00000  test
v_Transpose2             0.081458  0.00000  test
v_Transpose4             0.053155  0.00000  test
v_Transpose8             0.098426  0.00000  test
fftwf_transpose          0.032448  0.00000  test
v_pfTranspose2           0.092149  0.00000  test
v_pfTranspose4           0.051635  0.00000  test
v_pfTranspose8           0.087617  0.00000  test
v_vfpTranspose2          0.108129  0.00000  test
fftwf_transpose          0.032448  0.00000  choice
FPU opt folding          0.030280  0.00000  test
opt VFP folding          0.024221  0.20978  test
opt NEON folding         0.021037  0.00000  test
opt NEON folding         0.021037  0.00000  choice
Test duration 38.93 seconds
Claggy |
Tom Rinehart Send message Joined: 12 Dec 01 Posts: 113 Credit: 13,255,975 RAC: 6 |
8.04 is the app without NEON or VFP chirp, compiled with FFTW 3.3.6-pl1 with the slow timer:
./configure --enable-single --enable-neon --with-slow-timer
Basically 8.03 with the new FFTW. In my testing with Bench, it runs at about 125% of 8.03. |