Linux (ARM processor) app and alternatives
Author | Message |
---|---|
Raistmer Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
I found another ARM assembly programming guide: Both r10 and r12 are saved on the stack at the beginning of the function (neon):
    push {r4,r5,r6,r7,r8,r9,r10,r11,r12,lr}
but r10 is used later in Mateusz's original code:
    adr r10,.Lcosapprox
So it does not seem safe to use r10 (for neon) without additional register reassignment. Still invalid results with .f64 added to vmov, looking further...
EDIT: here is some stack arithmetic:
    sub sp,sp,#16+64
    add r7,sp,#16
    add r11,sp,#16+32
    fstmiad sp,{d9,d10}
So some register spilling takes place here: the processed parameters are stored back on the stack. But since the stack frame now has a different size, I suppose all sp-based offsets should be changed. One 32-bit and two 64-bit parameters are now passed in registers, so the new stack frame should be 20 bytes shorter. SETI apps news We're not gonna fight them. We're gonna transcend them. |
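The frame-size arithmetic in the post above can be checked with a few lines of C++. This is only a sketch: the constant and function names are made up for illustration, and it assumes, as the post suggests, that the parameters which moved into registers are one 4-byte and two 8-byte values.

```cpp
#include <cstddef>

// Hypothetical names; the sizes are taken from the post above.
constexpr std::size_t kOldFrameBytes = 16 + 64;    // sub sp,sp,#16+64
constexpr std::size_t kArgsNowInRegs = 4 + 8 + 8;  // one 32-bit + two 64-bit parameters

// Size of the local frame once the given number of bytes no longer lives on the stack.
constexpr std::size_t shrink_frame(std::size_t old_bytes, std::size_t moved_bytes) {
    return old_bytes - moved_bytes;
}

constexpr std::size_t kNewFrameBytes = shrink_frame(kOldFrameBytes, kArgsNowInRegs);

static_assert(kArgsNowInRegs == 20, "20 bytes of parameters left the stack");
static_assert(kNewFrameBytes == 16 + 44, "the adjusted frame is 60 bytes");
```

The result, 60 bytes, corresponds to an immediate of #16+44. Which sp-relative offsets inside the function have to move along with it depends on the frame layout, which this sketch does not model.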
Tom Rinehart Send message Joined: 12 Dec 01 Posts: 113 Credit: 13,255,975 RAC: 6 |
I tried r10 with the VFP code last night and it makes no difference. I still get nan for the error. |
Raistmer Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
Tom Rinehart wrote: I tried r10 with the VFP code last night and it makes no difference. I still get nan for the error.
I print part of the input and output buffers to see whether the issue is solved. After changing sub sp,sp,#16+64 to sub sp,sp,#16+44:
before Chirp test: neon_ChirpData
in[0].xy=(1.00000,0.00000)]
in[1].xy=(1.00000,0.00000)]
in[2].xy=(1.00000,0.00001)]
after Chirp test: neon_ChirpData
out[0].xy=( nan, nan)]
out[1].xy=( nan, nan)]
out[2].xy=( nan, nan)]
before Chirp test: neon_ChirpData
in[0].xy=(1.00000,0.00000)]
in[1].xy=(1.00000,0.00000)]
in[2].xy=(1.00000,0.00001)]
after Chirp test: neon_ChirpData
out[0].xy=(1.00000,0.00000)]
out[1].xy=(-0.94075,0.33910)]
out[2].xy=(0.18588,-0.98257)]
So it is not only different from the correct result, it is inconsistent: same input, different output. Usually that means data was read from the wrong place, outside the function's real memory, so some junk was picked up. The input/output pointers were already in registers before, so nothing changed there. Most probably something is still wrong with the stack pointer. Maybe my corrections are wrong.
In case you want to enable debug printing too (near line 910 of analyzeFuncs_vector.cpp):
    int ind=TESTCHIRPIND;
    while ((j<100) && ((j<20) || (timing<(10*timer.resolution())))) {
        memset(outdata,0,NumDataPoints*sizeof(sah_complex));
    #if 1
        fprintf(stderr,"before Chirp test:%32s \n",ChirpDataFuncs[i].nom);
        for(int k=0;k<3;k++){
            fprintf(stderr,"in[%d].xy=(%7.5f,%7.5f)]\n",k,indata[k][0],indata[k][1]);
        }
    #endif
        timer.start();
        rv=ChirpDataFuncs[i].func(indata,outdata,ind,MinChirpStep*ind,NumDataPoints,swi.subband_sample_rate);
        onetime=timer.stop();
    #if 1
        fprintf(stderr,"after Chirp test:%32s \n",ChirpDataFuncs[i].nom);
        for(int k=0;k<3;k++){
            fprintf(stderr,"out[%d].xy=(%7.5f,%7.5f)]\n",k,outdata[k][0],outdata[k][1]);
        }
    #endif
SETI apps news We're not gonna fight them. We're gonna transcend them. |
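The "same input, different output" symptom described above can be tested for automatically instead of by reading printed values. The following is a minimal sketch, not code from the SETI@home sources: `Sample` is a stand-in for `sah_complex`, the `Kernel` signature is simplified, and `copy_kernel` and `flaky_kernel` are hypothetical demo functions.

```cpp
#include <algorithm>
#include <cmath>
#include <complex>
#include <cstddef>
#include <cstring>
#include <vector>

using Sample = std::complex<float>;  // stand-in for sah_complex (assumption)
using Kernel = void (*)(const Sample* in, Sample* out, std::size_t n);

// True if any real or imaginary component is NaN.
bool has_nan(const std::vector<Sample>& v) {
    for (const Sample& s : v)
        if (std::isnan(s.real()) || std::isnan(s.imag())) return true;
    return false;
}

// Runs the kernel several times on the same input and compares the raw output
// bytes with the first run. Bytes are compared because NaN != NaN as a value.
bool is_repeatable(Kernel kernel, const std::vector<Sample>& in, int runs = 3) {
    if (in.empty()) return true;
    std::vector<Sample> first(in.size()), next(in.size());
    kernel(in.data(), first.data(), in.size());
    for (int r = 1; r < runs; ++r) {
        std::fill(next.begin(), next.end(), Sample{});
        kernel(in.data(), next.data(), in.size());
        if (std::memcmp(first.data(), next.data(), in.size() * sizeof(Sample)) != 0)
            return false;
    }
    return true;
}

// Demo kernel: deterministic copy.
void copy_kernel(const Sample* in, Sample* out, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) out[i] = in[i];
}

// Demo kernel: the result depends on hidden state, imitating a read of stale memory.
void flaky_kernel(const Sample* in, Sample* out, std::size_t n) {
    static int call = 0;
    ++call;
    for (std::size_t i = 0; i < n; ++i) out[i] = in[i] * static_cast<float>(call);
}
```

A kernel that fails `is_repeatable` is reading something other than its declared inputs, which is the same conclusion the post draws about the stack pointer.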
Raistmer Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
parallella@parallella:~/seti_boinc/client$ readelf -A setiathome-8.0.armv7l-unknown-linux-gnueabihf Attribute Section: aeabi File Attributes Tag_CPU_name: "7-A" Tag_CPU_arch: v7 Tag_CPU_arch_profile: Application Tag_ARM_ISA_use: Yes Tag_THUMB_ISA_use: Thumb-2 Tag_FP_arch: VFPv3 Tag_Advanced_SIMD_arch: NEONv1 Tag_ABI_PCS_wchar_t: 4 Tag_ABI_FP_rounding: Needed Tag_ABI_FP_denormal: Needed Tag_ABI_FP_exceptions: Needed Tag_ABI_FP_number_model: IEEE 754 Tag_ABI_align_needed: 8-byte Tag_ABI_align_preserved: 8-byte, except leaf SP Tag_ABI_enum_size: int Tag_ABI_HardFP_use: SP and DP Tag_ABI_VFP_args: VFP registers Tag_CPU_unaligned_access: v6 parallella@parallella:~/seti_boinc/client$ readelf -A setiathome_7.0_arm-android-linux-gnu Attribute Section: aeabi File Attributes Tag_CPU_name: "7-A" Tag_CPU_arch: v7 Tag_CPU_arch_profile: Application Tag_ARM_ISA_use: Yes Tag_THUMB_ISA_use: Thumb-2 Tag_FP_arch: VFPv3 Tag_Advanced_SIMD_arch: NEONv1 Tag_ABI_PCS_wchar_t: 4 Tag_ABI_FP_denormal: Needed Tag_ABI_FP_exceptions: Needed Tag_ABI_FP_number_model: IEEE 754 Tag_ABI_align_needed: 8-byte Tag_ABI_enum_size: int Tag_ABI_HardFP_use: SP and DP Tag_ABI_optimization_goals: Aggressive Size Tag_DIV_use: Not allowed Maybe useful: http://infocenter.arm.com/help/topic/com.arm.doc.ihi0042f/IHI0042F_aapcs.pdf SETI apps news We're not gonna fight them. We're gonna transcend them. |
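One visible difference between the two listings is that only the gnueabihf build carries `Tag_ABI_VFP_args: VFP registers`, which marks the hard-float calling convention where floating-point arguments travel in VFP registers. The same information is available at compile time through predefined macros. The sketch below uses the ACLE macros `__ARM_PCS_VFP`, `__ARM_NEON` and `__ARM_ARCH`; the `BuildTarget` struct and the function names are hypothetical.

```cpp
#include <string>

// Hypothetical summary of how the current translation unit is being compiled.
struct BuildTarget {
    bool hard_float_abi;  // floating-point arguments passed in VFP registers
    bool neon_enabled;    // compiler is allowed to emit NEON instructions
    int arm_arch;         // 0 when not compiling for ARM
};

inline BuildTarget build_target() {
    BuildTarget t{false, false, 0};
#if defined(__ARM_PCS_VFP)
    t.hard_float_abi = true;
#endif
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    t.neon_enabled = true;
#endif
#if defined(__ARM_ARCH)
    t.arm_arch = __ARM_ARCH;
#endif
    return t;
}

inline std::string describe(const BuildTarget& t) {
    return "arch=" + std::to_string(t.arm_arch) +
           " abi=" + (t.hard_float_abi ? "hard-float" : "soft/softfp") +
           " neon=" + (t.neon_enabled ? "yes" : "no");
}
```

Hand-written assembly that expects its floating-point parameters on the stack would need a different prologue when `hard_float_abi` is true, which fits the stack adjustments discussed earlier in the thread.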
Claggy Send message Joined: 5 Jul 99 Posts: 4654 Credit: 47,537,079 RAC: 4 |
At this rate I estimate ARM may overtake x86/x86_64 within 3 years :D It's been committed to the rpi-4.4.y, 4.9.y and 4.10.y branches, now just waiting for an update. Claggy |
Raistmer Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
It seems I found the error in my changes. The stores of the double registers were misplaced and executed only for zero chirp. I moved them and now get:
before Chirp test: neon_ChirpData
in[0].xy=(1.00000,0.00000)]
in[1].xy=(1.00000,0.00000)]
in[2].xy=(1.00000,0.00001)]
after Chirp test: neon_ChirpData
out[0].xy=(1.00000,0.00000)]
out[1].xy=(1.00000,0.00000)]
out[2].xy=(1.00000,0.00001)]
That is, the same as with the other functions. Now I need to test on something bigger.
neon_ChirpData 0.072044 0.00000 test
neon_ChirpData 0.072044 0.00000 choice
SETI apps news We're not gonna fight them. We're gonna transcend them. |
Raistmer Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
KWSN-Linux-MBbench v3.0 cache-keeping edition SETI apps news We're not gonna fight them. We're gonna transcend them. |
Raistmer Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
So, now I'm removing the debug printing and starting a direct comparison against beta's 8.04 on the PG set. This will take all night.
My build:
Optimal function choices:
--------------------------------------------------------
name timing error
--------------------------------------------------------
v_BaseLineSmooth (no other)
vfp_GetPowerSpectrum 0.003225 0.00000
neon_ChirpData 0.071909 0.00000
fftwf_transpose 0.026132 0.00000
8.04:
Optimal function choices:
--------------------------------------------------------
name timing error
--------------------------------------------------------
v_BaseLineSmooth (no other)
vfp_GetPowerSpectrum 0.003151 0.00000
v_ChirpData 0.176926 0.00000
fftwf_transpose 0.026170 0.00000
(The wisgen task has no PulseFind, so no folding test is included.) SETI apps news We're not gonna fight them. We're gonna transcend them. |
Tom Rinehart Send message Joined: 12 Dec 01 Posts: 113 Credit: 13,255,975 RAC: 6 |
That looks good. Can you share your code again? Or since it is working upload it to the SETI svn site? I can apply the changes to the VFP code if that helps. |
Raistmer Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
Tom Rinehart wrote: That looks good. Can you share your code again? Or since it is working upload it to the SETI svn site? I can apply the changes to the VFP code if that helps.
I can't be sure that it's 100% working yet, but it looks OK enough to commit. Will do in a few minutes. SETI apps news We're not gonna fight them. We're gonna transcend them. |
Tom Rinehart Send message Joined: 12 Dec 01 Posts: 113 Credit: 13,255,975 RAC: 6 |
Great! I will test it tonight also. |
Raistmer Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
|
Raistmer Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
Looking good so far: KWSN-Linux-MBbench v3.0 cache-keeping edition Running on parallella at Sat 11 Feb 2017 10:47:57 PM UTC ---------------------------------------------------------------- Starting benchmark run... ---------------------------------------------------------------- Suspending BOINC Listing wu-file(s) in /testWUs : #WisGen1_v8.wu #WisGen2_v8.wu PG0009_v8.wu PG0395_v8.wu PG0444_v8.wu PG1327_v8.wu Listing executable(s) in /APPS : setiathome-8.0.armv7l-unknown-linux-gnueabihf_R_NEONchirp setiathome_8.04_arm-unknown-linux-gnueabihf Listing executable in /REF_APPS : setiathome_8.02_arm-unknown-linux-gnueabihf ---------------------------------------------------------------- Current WU: #WisGen1_v8.wu ---------------------------------------------------------------- Skipping default app setiathome_8.02_arm-unknown-linux-gnueabihf, displaying saved result(s) Elapsed Time: ....................... 43 seconds ---------------------------------------------------------------- Running app with command : .......... setiathome-8.0.armv7l-unknown-linux-gnueabihf_R_NEONchirp ./setiathome-8.0.armv7l-unknown-linux-gnueabihf_R_NEONchirp 389.29 sec 381.62 sec 5.07 sec Elapsed Time : ...................... 389 seconds Speed compared to default : ......... 11 % ----------------- Comparing results Result : Strongly similar, Q= 99.99% ---------------------------------------------------------------- Running app with command : .......... setiathome_8.04_arm-unknown-linux-gnueabihf ./setiathome_8.04_arm-unknown-linux-gnueabihf 100.30 sec 95.33 sec 2.95 sec Elapsed Time : ...................... 101 seconds Speed compared to default : ......... 
42 % ----------------- Comparing results Result : Strongly similar, Q= 99.99% ---------------------------------------------------------------- Done with #WisGen1_v8.wu ==================================================================== Current WU: #WisGen2_v8.wu ---------------------------------------------------------------- Skipping default app setiathome_8.02_arm-unknown-linux-gnueabihf, displaying saved result(s) Elapsed Time: ....................... 41 seconds ---------------------------------------------------------------- Running app with command : .......... setiathome-8.0.armv7l-unknown-linux-gnueabihf_R_NEONchirp ./setiathome-8.0.armv7l-unknown-linux-gnueabihf_R_NEONchirp 100.69 sec 95.69 sec 2.74 sec Elapsed Time : ...................... 101 seconds Speed compared to default : ......... 40 % ----------------- Comparing results Result : Strongly similar, Q= 99.99% ---------------------------------------------------------------- Running app with command : .......... setiathome_8.04_arm-unknown-linux-gnueabihf ./setiathome_8.04_arm-unknown-linux-gnueabihf 100.89 sec 95.32 sec 2.73 sec Elapsed Time : ...................... 101 seconds Speed compared to default : ......... 40 % ----------------- Comparing results Result : Strongly similar, Q= 99.99% ---------------------------------------------------------------- Done with #WisGen2_v8.wu ==================================================================== Current WU: PG0009_v8.wu ---------------------------------------------------------------- Skipping default app setiathome_8.02_arm-unknown-linux-gnueabihf, displaying saved result(s) Elapsed Time: ....................... 9511 seconds ---------------------------------------------------------------- Running app with command : .......... setiathome-8.0.armv7l-unknown-linux-gnueabihf_R_NEONchirp ./setiathome-8.0.armv7l-unknown-linux-gnueabihf_R_NEONchirp 7646.33 sec 7573.52 sec 69.12 sec Elapsed Time : ...................... 
7647 seconds Speed compared to default : ......... 124 % ----------------- Comparing results Result : Strongly similar, Q= 99.54% ---------------------------------------------------------------- Running app with command : .......... setiathome_8.04_arm-unknown-linux-gnueabihf ./setiathome_8.04_arm-unknown-linux-gnueabihf 8072.08 sec 7990.89 sec 76.75 sec Elapsed Time : ...................... 8072 seconds Speed compared to default : ......... 117 % ----------------- Comparing results Result : Strongly similar, Q= 100.0% ---------------------------------------------------------------- Done with PG0009_v8.wu ==================================================================== Current WU: PG0395_v8.wu ---------------------------------------------------------------- Skipping default app setiathome_8.02_arm-unknown-linux-gnueabihf, displaying saved result(s) Elapsed Time: ....................... 10131 seconds ---------------------------------------------------------------- Running app with command : .......... setiathome-8.0.armv7l-unknown-linux-gnueabihf_R_NEONchirp ./setiathome-8.0.armv7l-unknown-linux-gnueabihf_R_NEONchirp 8069.84 sec 7998.07 sec 68.46 sec Elapsed Time : ...................... 8069 seconds Speed compared to default : ......... 125 % ----------------- Comparing results Result : Strongly similar, Q= 99.89% ---------------------------------------------------------------- Running app with command : .......... setiathome_8.04_arm-unknown-linux-gnueabihf ./setiathome_8.04_arm-unknown-linux-gnueabihf 8430.43 sec 8354.50 sec 70.59 sec Elapsed Time : ...................... 8430 seconds Speed compared to default : ......... 120 % ----------------- Comparing results Result : Strongly similar, Q= 99.99% ---------------------------------------------------------------- Done with PG0395_v8.wu ==================================================================== SETI apps news We're not gonna fight them. We're gonna transcend them. |
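The "Speed compared to default" figures in this log are consistent with the reference time divided by the candidate time, expressed as a percentage and truncated to an integer. The benchmark script itself is not shown in the thread, so the helper below is an assumption that merely reproduces the printed numbers; the function name is hypothetical.

```cpp
#include <cassert>

// Hypothetical helper: reference time relative to candidate time, in percent,
// truncated toward zero. Values above 100 mean the candidate is faster.
long speed_percent(long reference_seconds, long candidate_seconds) {
    assert(candidate_seconds > 0);
    return 100 * reference_seconds / candidate_seconds;
}
```

For PG0009, for example, the saved 8.02 result took 9511 seconds and the NEON chirp build took 7647 seconds, which gives the 124 % shown in the log. The WisGen runs come out far below 100 % because the saved reference time (43 and 41 seconds) is much shorter than the measured runs.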
Tom Rinehart Send message Joined: 12 Dec 01 Posts: 113 Credit: 13,255,975 RAC: 6 |
I ran it overnight as well, but it looks like it is not working for me. It doesn't test correctly. Did your test work?
Work Unit Info: ............... WU true angle range is : 0.008955
Getting CPU Capabilities from /proc/cpuinfo
features: half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt vfpd32 lpae evtstrm
Optimal function choices:
--------------------------------------------------------
name timing error
--------------------------------------------------------
v_BaseLineSmooth (no other)
v_GetPowerSpectrum 0.002766 0.00000 test
vfp_GetPowerSpectrum 0.001029 0.00000 test
neon_GetPowerSpectrum 0.002034 0.00000 test
vfp_GetPowerSpectrum 0.001029 0.00000 choice
v_ChirpData 0.162285 0.00000 test
fpu_ChirpData 0.171281 1.51106 test
fpu_opt_ChirpData 0.183292 0.00000 test
vfp_ChirpData 0.069825 nan test
neon_ChirpData 0.074759 1.67885 test
v_ChirpData 0.162285 0.00000 choice
v_Transpose 0.103967 0.00000 test
v_Transpose2 0.053552 0.00000 test
v_Transpose4 0.032033 0.00000 test
v_Transpose8 0.059765 0.00000 test
fftwf_transpose 0.026291 0.00000 test
v_pfTranspose2 0.050261 0.00000 test
v_pfTranspose4 0.029482 0.00000 test
v_pfTranspose8 0.052212 0.00000 test
v_vfpTranspose2 0.053029 0.00000 test
fftwf_transpose 0.026291 0.00000 choice
FPU opt folding 0.023734 0.00000 test
opt VFP folding 0.018469 0.20945 test
opt NEON folding 0.015369 0.00000 test
opt NEON folding 0.015369 0.00000 choice
Test duration 35.37 seconds
For some reason mine is also 5% faster than 8.04. They should be the same.
KWSN-Linux-MBbench v2.1.08
Running on pitft at Sun 12 Feb 2017 05:55:14 AM UTC
----------------------------------------------------------------
Starting benchmark run...
---------------------------------------------------------------- Listing wu-file(s) in /testWUs : PG0009_v8.wu Listing executable(s) in /APPS : setiathome-8.neonvfp.armv7l-unknown-linux-gnueabihf Listing executable in /REF_APPS : setiathome_8.04_arm-unknown-linux-gnueabihf ---------------------------------------------------------------- Current WU: PG0009_v8.wu ---------------------------------------------------------------- Running default app with command :... setiathome_8.04_arm-unknown-linux-gnueabihf -verb Elapsed Time: ....................... 6542 seconds ---------------------------------------------------------------- Running app with command : .......... setiathome-8.neonvfp.armv7l-unknown-linux-gnueabihf -verb Elapsed Time : ...................... 6216 seconds Speed compared to default : ......... 105 % ----------------- Comparing results Result : Strongly similar, Q= 99.54% ---------------------------------------------------------------- Done with PG0009_v8.wu ==================================================================== Done with Benchmark run! Removing temporary files! |
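The "test" and "choice" rows in the function table above behave like a selection of the fastest candidate whose error is acceptable: vfp_ChirpData and neon_ChirpData are quicker than v_ChirpData but report nan or a non-zero error, so v_ChirpData is chosen. The sketch below illustrates that rule. It is not the application's actual selection code, and the `Candidate` type, the function names and the error threshold are assumptions.

```cpp
#include <cmath>
#include <cstddef>
#include <limits>
#include <string>
#include <vector>

// Hypothetical record for one row of the function table.
struct Candidate {
    std::string name;
    double timing;
    double error;
};

// Index of the fastest candidate whose error is finite and at most max_error,
// or -1 if no candidate qualifies. NaN and infinite errors are always rejected.
int pick_fastest_valid(const std::vector<Candidate>& c, double max_error) {
    int best = -1;
    for (std::size_t i = 0; i < c.size(); ++i) {
        if (!std::isfinite(c[i].error) || c[i].error > max_error) continue;
        if (best < 0 || c[i].timing < c[static_cast<std::size_t>(best)].timing)
            best = static_cast<int>(i);
    }
    return best;
}

// The ChirpData rows from the log above.
std::vector<Candidate> chirp_candidates_from_log() {
    const double nan = std::numeric_limits<double>::quiet_NaN();
    return {
        {"v_ChirpData", 0.162285, 0.0},
        {"fpu_ChirpData", 0.171281, 1.51106},
        {"fpu_opt_ChirpData", 0.183292, 0.0},
        {"vfp_ChirpData", 0.069825, nan},
        {"neon_ChirpData", 0.074759, 1.67885},
    };
}
```

With any small threshold, this picks v_ChirpData for the rows above, matching the "choice" line in the log.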
Raistmer Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
I'll check function choices: WU true angle range is : 0.008955 vfp_GetPowerSpectrum 0.003188 0.00000 neon_ChirpData 0.071891 0.00000 fftwf_transpose 0.026243 0.00000 opt NEON folding 0.021267 0.00000 WU true angle range is : 0.394768 vfp_GetPowerSpectrum 0.003558 0.00000 neon_ChirpData 0.079603 0.00000 fftwf_transpose 0.029865 0.00000 opt NEON folding 0.012020 0.00000 WU true angle range is : 0.444184 vfp_GetPowerSpectrum 0.003262 0.00000 neon_ChirpData 0.071950 0.00000 fftwf_transpose 0.027387 0.00000 opt NEON folding 0.014910 0.00000 WU true angle range is : 1.326684 v_BaseLineSmooth (no other) vfp_GetPowerSpectrum 0.003539 0.00000 neon_ChirpData 0.079539 0.00000 fftwf_transpose 0.030009 0.00000 opt NEON folding 0.007281 0.00000 And one more finished task: Current WU: PG0444_v8.wu ---------------------------------------------------------------- Skipping default app setiathome_8.02_arm-unknown-linux-gnueabihf, displaying saved result(s) Elapsed Time: ....................... 9466 seconds ---------------------------------------------------------------- Running app with command : .......... setiathome-8.0.armv7l-unknown-linux-gnueabihf_R_NEONchirp ./setiathome-8.0.armv7l-unknown-linux-gnueabihf_R_NEONchirp 7454.49 sec 7376.35 sec 70.34 sec Elapsed Time : ...................... 7454 seconds Speed compared to default : ......... 126 % ----------------- Comparing results Result : Strongly similar, Q= 99.78% ---------------------------------------------------------------- Running app with command : .......... setiathome_8.04_arm-unknown-linux-gnueabihf ./setiathome_8.04_arm-unknown-linux-gnueabihf 7778.20 sec 7700.31 sec 68.62 sec Elapsed Time : ...................... 7779 seconds Speed compared to default : ......... 121 % ----------------- Comparing results Result : Strongly similar, Q= 99.98% ---------------------------------------------------------------- Done with PG0444_v8.wu Looks like it works on Parallella so far. All PG tasks are finished correctly. 
SETI apps news We're not gonna fight them. We're gonna transcend them. |
Raistmer Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
And the last one:
Current WU: PG1327_v8.wu
----------------------------------------------------------------
Skipping default app setiathome_8.02_arm-unknown-linux-gnueabihf, displaying saved result(s)
Elapsed Time: ....................... 11311 seconds
----------------------------------------------------------------
Running app with command : .......... setiathome-8.0.armv7l-unknown-linux-gnueabihf_R_NEONchirp
./setiathome-8.0.armv7l-unknown-linux-gnueabihf_R_NEONchirp 6867.93 sec 6698.40 sec 161.34 sec
Elapsed Time : ...................... 6868 seconds
Speed compared to default : ......... 164 %
-----------------
Comparing results
Result : Strongly similar, Q= 99.29%
----------------------------------------------------------------
Running app with command : .......... setiathome_8.04_arm-unknown-linux-gnueabihf
./setiathome_8.04_arm-unknown-linux-gnueabihf 7999.04 sec 7824.00 sec 166.34 sec
Elapsed Time : ...................... 7999 seconds
Speed compared to default : ......... 141 %
-----------------
Comparing results
Result : Strongly similar, Q= 100.0%
----------------------------------------------------------------
Done with PG1327_v8.wu
Here the new build has the largest effect, because chirping takes a bigger share of the total processing time for VHAR tasks. I prepared a "tiny" PG set, so I will now repeat the run with -verb enabled (without the FFTW 3.3.4 based build, so as not to break the wisdom each time). SETI apps news We're not gonna fight them. We're gonna transcend them. |
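The remark about the share of chirping can be made quantitative with Amdahl's law. The sketch below is only a rough estimate under stated assumptions: it takes the chirp kernel speedup from the microbenchmark timings posted earlier (v_ChirpData 0.176926 against neon_ChirpData 0.071909), which were measured on a different task, and it treats chirping as the only difference between the two builds. The function names are hypothetical.

```cpp
#include <cassert>

// Amdahl's law: overall speedup when a fraction p of the run time is sped up s times.
double overall_speedup(double p, double s) {
    return 1.0 / ((1.0 - p) + p / s);
}

// Inverse of the above: the fraction p implied by an observed overall speedup.
double accelerated_fraction(double overall, double s) {
    assert(s > 1.0 && overall >= 1.0);
    return (1.0 - 1.0 / overall) / (1.0 - 1.0 / s);
}

// Figures quoted in the thread (see the assumptions stated above).
const double kChirpSpeedup = 0.176926 / 0.071909;  // v_ChirpData vs neon_ChirpData
const double kPg1327Speedup = 7999.0 / 6868.0;     // 8.04 vs NEON chirp build, elapsed seconds
```

Calling `accelerated_fraction(kPg1327Speedup, kChirpSpeedup)` suggests that chirping accounted for a little under a quarter of the 8.04 run time on PG1327. The same calculation on the lower angle range tasks yields a smaller fraction, in line with the explanation in the post.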
Claggy Send message Joined: 5 Jul 99 Posts: 4654 Credit: 47,537,079 RAC: 4 |
8.04 errored on my Pi 1 Model B. I don't know if it was because I was working on it at the time; I will get another WU in a while: http://setiweb.ssl.berkeley.edu/beta/result.php?resultid=26405085 <core_client_version>7.6.33</core_client_version> Claggy |
Raistmer Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
http://stackoverflow.com/questions/8556447/linux-core-dump-about-signal-4 SETI apps news We're not gonna fight them. We're gonna transcend them. |
sorcrosc Send message Joined: 13 Dec 08 Posts: 3 Credit: 2,374,066 RAC: 0 |
Same here on an RPi 1: <core_client_version>7.6.33</core_client_version> <![CDATA[ <message> process got signal 4 </message> <stderr_txt> setiathome_v8 8.00 Revision: 3618 g++ (Raspbian 4.9.2-10) 4.9.2 libboinc: BOINC 7.7.0 Work Unit Info: ............... WU true angle range is : 0.011565 </stderr_txt> ]]> http://setiweb.ssl.berkeley.edu/beta/result.php?resultid=26404951 It is an ARMv7 application and should also be released with the neon plan class. Am I wrong? lo@raspberrypi /var/lib/boinc-client/projects/setiweb.ssl.berkeley.edu_beta $ readelf -A setiathome_8.04_arm-unknown-linux-gnueabihf Attribute Section: aeabi File Attributes Tag_CPU_name: "7-A" Tag_CPU_arch: v7 Tag_CPU_arch_profile: Application Tag_ARM_ISA_use: Yes Tag_THUMB_ISA_use: Thumb-2 Tag_FP_arch: VFPv3 Tag_Advanced_SIMD_arch: NEONv1 Tag_ABI_PCS_wchar_t: 4 Tag_ABI_FP_rounding: Needed Tag_ABI_FP_denormal: Needed Tag_ABI_FP_exceptions: Needed Tag_ABI_FP_number_model: IEEE 754 Tag_ABI_align_needed: 8-byte Tag_ABI_align_preserved: 8-byte, except leaf SP Tag_ABI_enum_size: int Tag_ABI_HardFP_use: SP and DP Tag_ABI_VFP_args: VFP registers Tag_CPU_unaligned_access: v6 |
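Whether a host can run a NEON build can be read from the feature list that the application already prints ("Getting CPU Capabilities from /proc/cpuinfo"). The sketch below checks such a line for a whole-word feature name. The function name is hypothetical, and the second sample line in the usage note is an illustrative ARMv6-style list, not output captured in this thread.

```cpp
#include <sstream>
#include <string>

// True if `feature` appears as a complete whitespace-separated token in the
// features line. Whole-token matching keeps "vfp" from matching "vfpv3".
bool has_cpu_feature(const std::string& features_line, const std::string& feature) {
    std::istringstream tokens(features_line);
    std::string tok;
    while (tokens >> tok) {
        if (tok == feature) return true;
    }
    return false;
}
```

For the feature line posted earlier in the thread ("half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt vfpd32 lpae evtstrm") the check for "neon" succeeds. A line such as "half thumb fastmult vfp edsp java tls" would fail it. The Raspberry Pi 1 uses an ARMv6 core without NEON, so a binary built for ARMv7 would be expected to fail there whatever plan class it is released under.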
Raistmer Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
Ideally it should use NEON only inside a few functions, and the benchmark should detect that they fail and fall back to VFP only. From stderr it is hard to say whether the compiler inserted a NEON instruction outside those functions or the app failed to catch the exception from the benchmark. The first is more probable, because the benchmark includes the baseline function, which doesn't require any testing, but the log ends before it was reached. That is, the compiler inserted a NEON instruction on its own. SETI apps news We're not gonna fight them. We're gonna transcend them. |
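"Catching the exception from the benchmark" could look roughly like the POSIX sketch below: a candidate function runs under a temporary SIGILL handler, and an illegal instruction turns into a `false` return instead of killing the process. This is an illustration of the general technique, not the application's actual code. All names are hypothetical, and `simulated_illegal_instruction` uses `raise(SIGILL)` in place of a real unsupported instruction.

```cpp
#include <setjmp.h>
#include <signal.h>

static sigjmp_buf g_probe_env;  // jump target for the handler (not thread-safe)

extern "C" void on_sigill(int) {
    siglongjmp(g_probe_env, 1);  // abandon the candidate and return to the probe
}

// Runs `candidate` once; returns false if it raised SIGILL (signal 4).
bool runs_without_sigill(void (*candidate)()) {
    struct sigaction sa{}, old{};
    sa.sa_handler = on_sigill;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGILL, &sa, &old);

    bool ok;
    if (sigsetjmp(g_probe_env, 1) == 0) {  // 1: restore the signal mask on the jump
        candidate();
        ok = true;
    } else {
        ok = false;  // reached via siglongjmp from the handler
    }

    sigaction(SIGILL, &old, nullptr);  // put the previous handler back
    return ok;
}

// Demo candidates.
void harmless() {}
void simulated_illegal_instruction() { raise(SIGILL); }
```

Such a probe only protects the call it wraps. If the compiler emits NEON instructions elsewhere in the program, as the post suspects, the process still dies with signal 4 before any probe runs. Jumping out of a handler also skips C++ destructors in the abandoned frames, so the probed functions should be plain computational kernels.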
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.