Message boards :
Number crunching :
AVX Extensions - Ongoing development?
Author | Message |
---|---|
Orioneti Send message Joined: 22 Oct 07 Posts: 21 Credit: 23,642,634 RAC: 0 |
Results from the revised test on a 2600k@4488Mhz w7sp1 ... ========================================================= Ftst_v7_J29 started. Optimal function choices: -------------------------------------------------------- name timing error -------------------------------------------------------- v_BaseLineSmooth (no other) v_GetPowerSpectrum 0.000089 0.00000 test v_vGetPowerSpectrum 0.000045 0.00000 test v_vGetPowerSpectrum2 0.000054 0.00000 test v_vGetPowerSpectrumUnrolled 0.000041 0.00000 test v_vGetPowerSpectrumUnrolled2 0.000056 0.00000 test v_avxGetPowerSpectrum 0.000037 0.00000 test v_avxGetPowerSpectrum 0.000037 0.00000 choice v_ChirpData 0.003712 0.00000 test fpu_ChirpData 0.008928 0.00000 test fpu_opt_ChirpData 0.003713 0.00000 test v_vChirpData_x86_64 0.042927 0.00000 test sse1_ChirpData_ak 0.005286 0.00000 test sse2_ChirpData_ak 0.004975 0.00000 test sse3_ChirpData_ak 0.004841 0.00000 test avx_ChirpData_a 0.001872 0.90046 test avx_ChirpData_b 0.001968 0.90046 test v_ChirpData 0.003712 0.00000 choice v_Transpose 0.002283 0.00000 test v_Transpose2 0.002477 0.00000 test v_Transpose4 0.001261 0.00000 test v_Transpose8 0.002309 0.00000 test v_pfTranspose2 0.001363 0.00000 test v_pfTranspose4 0.001262 0.00000 test v_pfTranspose8 0.002642 0.00000 test v_vTranspose4 0.000735 0.00000 test v_vTranspose4np 0.000965 0.00000 test v_vTranspose4ntw 0.006052 0.00000 test v_vTranspose4x8ntw 0.002501 0.00000 test v_vTranspose4x16ntw 0.000702 0.00000 test v_vpfTranspose8x4ntw 0.006024 0.00000 test v_avxTranspose8x4ntw 0.002515 0.00000 test v_avxTranspose8x8ntw_a 0.001991 0.00000 test v_avxTranspose8x8ntw_b 0.002337 0.00000 test v_vTranspose4x16ntw 0.000702 0.00000 choice FPU opt folding 0.001727 0.00000 test AK SSE folding 0.000379 0.00000 test BH SSE folding 0.000369 0.00000 test BH SSE folding 0.000369 0.00000 choice Test duration 2.16 seconds Ftst_v7 completed successfully. Thought you might need this to be run on stock clocks, so I did it again (3.4GHz) ... 
========================================================= Ftst_v7_J29 started. Optimal function choices: -------------------------------------------------------- name timing error -------------------------------------------------------- v_BaseLineSmooth (no other) v_GetPowerSpectrum 0.000114 0.00000 test v_vGetPowerSpectrum 0.000057 0.00000 test v_vGetPowerSpectrum2 0.000069 0.00000 test v_vGetPowerSpectrumUnrolled 0.000053 0.00000 test v_vGetPowerSpectrumUnrolled2 0.000072 0.00000 test v_avxGetPowerSpectrum 0.000047 0.00000 test v_avxGetPowerSpectrum 0.000047 0.00000 choice v_ChirpData 0.004167 0.00000 test fpu_ChirpData 0.011857 0.00000 test fpu_opt_ChirpData 0.004137 0.00000 test v_vChirpData_x86_64 0.055058 0.00000 test sse1_ChirpData_ak 0.006802 0.00000 test sse2_ChirpData_ak 0.006394 0.00000 test sse3_ChirpData_ak 0.006221 0.00000 test avx_ChirpData_a 0.002398 0.90046 test avx_ChirpData_b 0.002520 0.90046 test fpu_opt_ChirpData 0.004137 0.00000 choice v_Transpose 0.002908 0.00000 test v_Transpose2 0.003244 0.00000 test v_Transpose4 0.001616 0.00000 test v_Transpose8 0.002958 0.00000 test v_pfTranspose2 0.001697 0.00000 test v_pfTranspose4 0.001614 0.00000 test v_pfTranspose8 0.003371 0.00000 test v_vTranspose4 0.000899 0.00000 test v_vTranspose4np 0.001236 0.00000 test v_vTranspose4ntw 0.007451 0.00000 test v_vTranspose4x8ntw 0.003057 0.00000 test v_vTranspose4x16ntw 0.000879 0.00000 test v_vpfTranspose8x4ntw 0.007433 0.00000 test v_avxTranspose8x4ntw 0.003085 0.00000 test v_avxTranspose8x8ntw_a 0.002436 0.00000 test v_avxTranspose8x8ntw_b 0.002865 0.00000 test v_vTranspose4x16ntw 0.000879 0.00000 choice FPU opt folding 0.002219 0.00000 test AK SSE folding 0.000487 0.00000 test BH SSE folding 0.000474 0.00000 test BH SSE folding 0.000474 0.00000 choice Test duration 2.74 seconds Ftst_v7 completed successfully. |
Josef W. Segur Send message Joined: 30 Oct 99 Posts: 4504 Credit: 1,414,761 RAC: 0 |
Orioneti wrote: Results from the revised test on a 2600k@4488Mhz w7sp1 ... Thank you! Obviously there's another issue with the chirp accuracy to find, so I'll have another version in a day or two. Joe |
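The chirp accuracy issue shows up in the "error" column of the results above: both AVX chirp routines are the fastest, yet the row marked "choice" is a slower routine with zero error. The thread does not show how Ftst actually makes its selection, so the following is only a minimal sketch of a rule that is consistent with the posted output: take the fastest routine whose error is within a tolerance. The names `Result`, `pick_fastest_accurate`, and the `max_error` threshold are hypothetical; the numbers are copied from the 4488 MHz J29 chirp results.

```python
from typing import NamedTuple


class Result(NamedTuple):
    name: str
    timing: float  # "timing" column of the posted output
    error: float   # "error" column; 0.0 means the output matched the reference


def pick_fastest_accurate(results, max_error=1e-5):
    """Return the fastest result whose error is within max_error.

    max_error is a hypothetical tolerance; the real test's threshold is not
    given in the thread.
    """
    accurate = [r for r in results if r.error <= max_error]
    if not accurate:
        raise ValueError("no routine produced an accurate result")
    return min(accurate, key=lambda r: r.timing)


# Chirp rows from the 2600k@4488Mhz Ftst_v7_J29 run posted above.
chirp_results = [
    Result("v_ChirpData", 0.003712, 0.0),
    Result("fpu_ChirpData", 0.008928, 0.0),
    Result("fpu_opt_ChirpData", 0.003713, 0.0),
    Result("v_vChirpData_x86_64", 0.042927, 0.0),
    Result("sse1_ChirpData_ak", 0.005286, 0.0),
    Result("sse2_ChirpData_ak", 0.004975, 0.0),
    Result("sse3_ChirpData_ak", 0.004841, 0.0),
    Result("avx_ChirpData_a", 0.001872, 0.90046),
    Result("avx_ChirpData_b", 0.001968, 0.90046),
]

fastest_overall = min(chirp_results, key=lambda r: r.timing)
choice = pick_fastest_accurate(chirp_results)
print("fastest overall:", fastest_overall.name)
print("fastest accurate:", choice.name)
```

Under this assumed rule the AVX routines are rejected for their 0.90046 error, which matches the "choice" line in the posted J29 output.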
Josef W. Segur Send message Joined: 30 Oct 99 Posts: 4504 Credit: 1,414,761 RAC: 0 |
Ftst_v7_J32 is attached to http://lunatics.kwsn.net/1-discussion-forum/avx-optimized-app-development.msg37398.html#msg37398. Fixed a problem in the AVX chirp tests, hope that's the last one. Also added additional transpose testing, partly looking for the most effective AVX version, partly to provide data concerning how different systems react at different FFT lengths. Joe |
Josef W. Segur Send message Joined: 30 Oct 99 Posts: 4504 Credit: 1,414,761 RAC: 0 |
Gah! Ftst_v7_J32 is withdrawn until I figure out more problems. The chirps still aren't right, though they do run, and the first of the new transposes crashes on an i7 2600 with W7 64 SP1. Joe |
Josef W. Segur Send message Joined: 30 Oct 99 Posts: 4504 Credit: 1,414,761 RAC: 0 |
Ftst_v7_J34 is attached to http://lunatics.kwsn.net/1-discussion-forum/avx-optimized-app-development.msg37407.html#msg37407. The new transpose functions should no longer crash, and I have high hopes the chirp functions will finally work accurately. Joe |
-BeNt- Send message Joined: 17 Oct 99 Posts: 1234 Credit: 10,116,112 RAC: 0 |
Forgive me for not understanding what you posted above, but on average how much more efficient is the chip with AVX? By efficient I mean faster. ;P Traveling through space at ~67,000mph! |
Orioneti Send message Joined: 22 Oct 07 Posts: 21 Credit: 23,642,634 RAC: 0 |
2600k@4488Mhz w7sp1 ... Ftst_v7_J34 started. Optimal function choices: -------------------------------------------------------- name timing error -------------------------------------------------------- v_BaseLineSmooth (no other) v_GetPowerSpectrum 0.000089 0.00000 test v_vGetPowerSpectrum 0.000044 0.00000 test v_vGetPowerSpectrum2 0.000053 0.00000 test v_vGetPowerSpectrumUnrolled 0.000041 0.00000 test v_vGetPowerSpectrumUnrolled2 0.000056 0.00000 test v_avxGetPowerSpectrum 0.000036 0.00000 test v_avxGetPowerSpectrum 0.000036 0.00000 choice v_ChirpData 0.003722 0.00000 test fpu_ChirpData 0.008926 0.00000 test fpu_opt_ChirpData 0.003717 0.00000 test v_vChirpData_x86_64 0.042824 0.00000 test sse1_ChirpData_ak 0.005276 0.00000 test sse2_ChirpData_ak 0.004986 0.00000 test sse2_ChirpData_ak8 0.003340 0.00000 test sse3_ChirpData_ak 0.004876 0.00000 test avx_ChirpData_a 0.001531 0.00000 test avx_ChirpData_b 0.001708 0.00000 test avx_ChirpData_a 0.001531 0.00000 choice v_Transpose 0.002715 0.00000 test v_Transpose2 0.002744 0.00000 test v_Transpose4 0.001676 0.00000 test v_Transpose8 0.002953 0.00000 test v_pfTranspose2 0.002846 0.00000 test v_pfTranspose4 0.002017 0.00000 test v_pfTranspose8 0.003382 0.00000 test v_vTranspose4 0.001384 0.00000 test v_vTranspose4np 0.000915 0.00000 test v_vTranspose4ntw 0.004736 0.00000 test v_vTranspose4x8ntw 0.002408 0.00000 test v_vTranspose4x16ntw 0.000556 0.00000 test v_vpfTranspose8x4ntw 0.005043 0.00000 test v_avxTranspose4x8ntw 0.002066 0.00000 test v_avxTranspose4x16ntw 0.000536 0.00000 test v_avxTranspose8x4ntw 0.004097 6855598.48924 test v_avxTranspose8x8ntw_a 0.002547 0.00000 test v_avxTranspose8x8ntw_b 0.002697 0.00000 test v_avxTranspose4x16ntw 0.000536 0.00000 choice v_Transpose 0.002271 0.00000 test v_Transpose2 0.002476 0.00000 test v_Transpose4 0.001258 0.00000 test v_Transpose8 0.002341 0.00000 test v_pfTranspose2 0.001355 0.00000 test v_pfTranspose4 0.001307 0.00000 test v_pfTranspose8 0.002332 0.00000 test 
v_vTranspose4 0.000731 0.00000 test v_vTranspose4np 0.000965 0.00000 test v_vTranspose4ntw 0.006015 0.00000 test v_vTranspose4x8ntw 0.002499 0.00000 test v_vTranspose4x16ntw 0.000707 0.00000 test v_vpfTranspose8x4ntw 0.005977 0.00000 test v_avxTranspose4x8ntw 0.002483 0.00000 test v_avxTranspose4x16ntw 0.000592 0.00000 test v_avxTranspose8x4ntw 0.006115 6847381.92511 test v_avxTranspose8x8ntw_a 0.001990 0.00000 test v_avxTranspose8x8ntw_b 0.002335 0.00000 test v_avxTranspose4x16ntw 0.000592 0.00000 choice FPU opt folding 0.001727 0.00000 test AK SSE folding 0.000380 0.00000 test BH SSE folding 0.000369 0.00000 test BH SSE folding 0.000369 0.00000 choice Test duration 2.87 seconds Ftst_v7 completed successfully. 2600k@stock w7sp1 ... Ftst_v7_J34 started. Optimal function choices: -------------------------------------------------------- name timing error -------------------------------------------------------- v_BaseLineSmooth (no other) v_GetPowerSpectrum 0.000114 0.00000 test v_vGetPowerSpectrum 0.000057 0.00000 test v_vGetPowerSpectrum2 0.000069 0.00000 test v_vGetPowerSpectrumUnrolled 0.000053 0.00000 test v_vGetPowerSpectrumUnrolled2 0.000072 0.00000 test v_avxGetPowerSpectrum 0.000047 0.00000 test v_avxGetPowerSpectrum 0.000047 0.00000 choice v_ChirpData 0.004160 0.00000 test fpu_ChirpData 0.011392 0.00000 test fpu_opt_ChirpData 0.004138 0.00000 test v_vChirpData_x86_64 0.054736 0.00000 test sse1_ChirpData_ak 0.006747 0.00000 test sse2_ChirpData_ak 0.006364 0.00000 test sse2_ChirpData_ak8 0.004258 0.00000 test sse3_ChirpData_ak 0.006223 0.00000 test avx_ChirpData_a 0.001948 0.00000 test avx_ChirpData_b 0.002180 0.00000 test avx_ChirpData_a 0.001948 0.00000 choice v_Transpose 0.004159 0.00000 test v_Transpose2 0.004275 0.00000 test v_Transpose4 0.002162 0.00000 test v_Transpose8 0.003819 0.00000 test v_pfTranspose2 0.003754 0.00000 test v_pfTranspose4 0.002643 0.00000 test v_pfTranspose8 0.004395 0.00000 test v_vTranspose4 0.001801 0.00000 test v_vTranspose4np 
0.001373 0.00000 test v_vTranspose4ntw 0.005843 0.00000 test v_vTranspose4x8ntw 0.002913 0.00000 test v_vTranspose4x16ntw 0.000664 0.00000 test v_vpfTranspose8x4ntw 0.006182 0.00000 test v_avxTranspose4x8ntw 0.002525 0.00000 test v_avxTranspose4x16ntw 0.000646 0.00000 test v_avxTranspose8x4ntw 0.005015 6855598.48924 test v_avxTranspose8x8ntw_a 0.002995 0.00000 test v_avxTranspose8x8ntw_b 0.003235 0.00000 test v_avxTranspose4x16ntw 0.000646 0.00000 choice v_Transpose 0.002892 0.00000 test v_Transpose2 0.003164 0.00000 test v_Transpose4 0.001605 0.00000 test v_Transpose8 0.002977 0.00000 test v_pfTranspose2 0.001691 0.00000 test v_pfTranspose4 0.001695 0.00000 test v_pfTranspose8 0.003038 0.00000 test v_vTranspose4 0.000954 0.00000 test v_vTranspose4np 0.001273 0.00000 test v_vTranspose4ntw 0.007372 0.00000 test v_vTranspose4x8ntw 0.002954 0.00000 test v_vTranspose4x16ntw 0.000870 0.00000 test v_vpfTranspose8x4ntw 0.007294 0.00000 test v_avxTranspose4x8ntw 0.003084 0.00000 test v_avxTranspose4x16ntw 0.000721 0.00000 test v_avxTranspose8x4ntw 0.007651 6847381.92511 test v_avxTranspose8x8ntw_a 0.002454 0.00000 test v_avxTranspose8x8ntw_b 0.002877 0.00000 test v_avxTranspose4x16ntw 0.000721 0.00000 choice FPU opt folding 0.002209 0.00000 test AK SSE folding 0.000487 0.00000 test BH SSE folding 0.000473 0.00000 test BH SSE folding 0.000473 0.00000 choice Test duration 3.62 seconds Ftst_v7 completed successfully. |
ML1 Send message Joined: 25 Nov 01 Posts: 21233 Credit: 7,508,002 RAC: 20 |
OK, some nice speedups developing there. Happy fast crunchin', Martin See new freedom: Mageia Linux Take a look for yourself: Linux Format The Future is what We all make IT (GPLv3) |
Josef W. Segur Send message Joined: 30 Oct 99 Posts: 4504 Credit: 1,414,761 RAC: 0 |
I've attached a new version of the quick test to http://lunatics.kwsn.net/1-discussion-forum/avx-optimized-app-development.msg37554.html#msg37554. It includes a fix for the 8x4 AVX transpose, plus my first attempt at AVX folding subroutines to speed up pulse finding. -BeNt- wrote: Forgive me for not understanding what you posted above. But on average how much more efficient is the chip performing with AVX? By efficient I mean faster. ;P Sorry about the delayed reply. The simplest approximation is that because AVX can process 8 single-precision floats at once rather than the 4 handled by SSEx, parts of the code can be made twice as fast. The various samples that Intel engineers have published range from about 1.79x to 2.53x. In terms of what I've coded, this pair from the stock-clock 2600K run shows a similar speed boost (about 2.19x): sse2_ChirpData_ak8 0.004258 0.00000 test; avx_ChirpData_a 0.001948 0.00000 test. As chirping accounts for less than 15% of run time, no miraculous overall speedup is expected. I'm just trying to take advantage of the new capabilities to increase efficiency. Large parts of the code are effectively limited by how fast data can be fetched and stored. AVX doesn't help there, though of course newer hardware does have faster memory access. Pulse finding, on the other hand, is limited more by processing speed than by memory access, and it also accounts for a sizable fraction of run time. But even if I achieve a speed doubling in the folding subroutines it uses, there's a lot of surrounding logic which can't be vectorized, so again the overall gains won't be huge. Joe |
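The point about limited overall gains can be made concrete with Amdahl's law. The sketch below is only an illustration, not part of the test program: it takes the two chirp timings quoted in the post, treats the "less than 15% of run time" figure as an upper bound on the chirp fraction, and computes the best-case effect on the whole run. The function name `overall_speedup` is hypothetical.

```python
def overall_speedup(fraction, local_speedup):
    """Amdahl's law: whole-run speedup when only `fraction` of the
    run time is accelerated by a factor of `local_speedup`."""
    return 1.0 / ((1.0 - fraction) + fraction / local_speedup)


# Timings quoted in the post (stock-clock 2600K run).
sse2_time = 0.004258  # sse2_ChirpData_ak8
avx_time = 0.001948   # avx_ChirpData_a

chirp_speedup = sse2_time / avx_time

# "Less than 15% of run time" is used here as an upper bound, so the
# result is an optimistic estimate of the gain from chirping alone.
chirp_fraction = 0.15

total = overall_speedup(chirp_fraction, chirp_speedup)
print(f"chirp speedup:   {chirp_speedup:.2f}x")
print(f"overall speedup: {total:.3f}x")
```

With these inputs the chirp routine itself is about 2.19x faster, but the whole run improves by at most roughly 1.09x, which is why no miraculous overall speedup is expected from the chirp change alone.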
Orioneti Send message Joined: 22 Oct 07 Posts: 21 Credit: 23,642,634 RAC: 0 |
2600k@4488Mhz w7sp1 ... Ftst_v7_J34 started. Optimal function choices: -------------------------------------------------------- name timing error -------------------------------------------------------- v_BaseLineSmooth (no other) v_GetPowerSpectrum 0.000089 0.00000 test v_vGetPowerSpectrum 0.000044 0.00000 test v_vGetPowerSpectrum2 0.000054 0.00000 test v_vGetPowerSpectrumUnrolled 0.000041 0.00000 test v_vGetPowerSpectrumUnrolled2 0.000056 0.00000 test v_avxGetPowerSpectrum 0.000037 0.00000 test v_avxGetPowerSpectrum 0.000037 0.00000 choice v_ChirpData 0.003721 0.00000 test fpu_ChirpData 0.008925 0.00000 test fpu_opt_ChirpData 0.003715 0.00000 test v_vChirpData_x86_64 0.042826 0.00000 test sse1_ChirpData_ak 0.005291 0.00000 test sse2_ChirpData_ak 0.004984 0.00000 test sse2_ChirpData_ak8 0.003334 0.00000 test sse3_ChirpData_ak 0.004858 0.00000 test avx_ChirpData_a 0.001532 0.00000 test avx_ChirpData_b 0.001710 0.00000 test avx_ChirpData_a 0.001532 0.00000 choice v_Transpose 0.002279 0.00000 test v_Transpose2 0.002531 0.00000 test v_Transpose4 0.001259 0.00000 test v_Transpose8 0.002332 0.00000 test v_pfTranspose2 0.001351 0.00000 test v_pfTranspose4 0.001309 0.00000 test v_pfTranspose8 0.002333 0.00000 test v_vTranspose4 0.000728 0.00000 test v_vTranspose4np 0.000979 0.00000 test v_vTranspose4ntw 0.005982 0.00000 test v_vTranspose4x8ntw 0.002520 0.00000 test v_vTranspose4x16ntw 0.000708 0.00000 test v_vpfTranspose8x4ntw 0.005973 0.00000 test v_avxTranspose4x8ntw 0.002510 0.00000 test v_avxTranspose4x16ntw 0.000596 0.00000 test v_avxTranspose8x4ntw 0.006122 0.00000 test v_avxTranspose8x8ntw_a 0.002010 0.00000 test v_avxTranspose8x8ntw_b 0.002378 0.00000 test v_avxTranspose4x16ntw 0.000596 0.00000 choice FPU opt folding 0.001725 0.00000 test AK SSE folding 0.000381 0.00000 test BH SSE folding 0.000369 0.00000 test JS AVX folding 0.000345 0.00000 test JS AVX folding 0.000345 0.00000 choice Test duration 2.36 seconds Ftst_v7 completed successfully. 
2600k@stock w7sp1 ... Ftst_v7_J34 started. Optimal function choices: -------------------------------------------------------- name timing error -------------------------------------------------------- v_BaseLineSmooth (no other) v_GetPowerSpectrum 0.000114 0.00000 test v_vGetPowerSpectrum 0.000057 0.00000 test v_vGetPowerSpectrum2 0.000069 0.00000 test v_vGetPowerSpectrumUnrolled 0.000053 0.00000 test v_vGetPowerSpectrumUnrolled2 0.000072 0.00000 test v_avxGetPowerSpectrum 0.000047 0.00000 test v_avxGetPowerSpectrum 0.000047 0.00000 choice v_ChirpData 0.004194 0.00000 test fpu_ChirpData 0.011392 0.00000 test fpu_opt_ChirpData 0.004141 0.00000 test v_vChirpData_x86_64 0.054739 0.00000 test sse1_ChirpData_ak 0.006759 0.00000 test sse2_ChirpData_ak 0.006369 0.00000 test sse2_ChirpData_ak8 0.004255 0.00000 test sse3_ChirpData_ak 0.006217 0.00000 test avx_ChirpData_a 0.001947 0.00000 test avx_ChirpData_b 0.002183 0.00000 test avx_ChirpData_a 0.001947 0.00000 choice v_Transpose 0.002886 0.00000 test v_Transpose2 0.003163 0.00000 test v_Transpose4 0.001607 0.00000 test v_Transpose8 0.002988 0.00000 test v_pfTranspose2 0.001688 0.00000 test v_pfTranspose4 0.001670 0.00000 test v_pfTranspose8 0.002975 0.00000 test v_vTranspose4 0.000893 0.00000 test v_vTranspose4np 0.001229 0.00000 test v_vTranspose4ntw 0.007519 0.00000 test v_vTranspose4x8ntw 0.003064 0.00000 test v_vTranspose4x16ntw 0.000866 0.00000 test v_vpfTranspose8x4ntw 0.007394 0.00000 test v_avxTranspose4x8ntw 0.003049 0.00000 test v_avxTranspose4x16ntw 0.000716 0.00000 test v_avxTranspose8x4ntw 0.007558 0.00000 test v_avxTranspose8x8ntw_a 0.002434 0.00000 test v_avxTranspose8x8ntw_b 0.002864 0.00000 test v_avxTranspose4x16ntw 0.000716 0.00000 choice FPU opt folding 0.002207 0.00000 test AK SSE folding 0.000487 0.00000 test BH SSE folding 0.000472 0.00000 test JS AVX folding 0.000438 0.00000 test JS AVX folding 0.000438 0.00000 choice Test duration 2.96 seconds Ftst_v7 completed successfully. |
Orioneti Send message Joined: 22 Oct 07 Posts: 21 Credit: 23,642,634 RAC: 0 |
2600k@4488Mhz w7sp1 ... Ftst_v7_J39 started. Optimal function choices: -------------------------------------------------------- name timing error -------------------------------------------------------- v_BaseLineSmooth (no other) v_GetPowerSpectrum 0.000089 0.00000 test v_vGetPowerSpectrum 0.000044 0.00000 test v_vGetPowerSpectrum2 0.000053 0.00000 test v_vGetPowerSpectrumUnrolled 0.000041 0.00000 test v_vGetPowerSpectrumUnrolled2 0.000056 0.00000 test v_avxGetPowerSpectrum 0.000036 0.00000 test v_avxGetPowerSpectrum 0.000036 0.00000 choice v_ChirpData 0.003722 0.00000 test fpu_ChirpData 0.008986 0.00000 test fpu_opt_ChirpData 0.003729 0.00000 test v_vChirpData_x86_64 0.042929 0.00000 test sse1_ChirpData_ak 0.005384 0.00000 test sse2_ChirpData_ak 0.004980 0.00000 test sse2_ChirpData_ak8 0.003337 0.00000 test sse3_ChirpData_ak 0.004873 0.00000 test avx_ChirpData_a 0.001528 0.00000 test avx_ChirpData_b 0.001712 0.00000 test avx_ChirpData_c 0.001542 0.00000 test avx_ChirpData_a 0.001528 0.00000 choice v_Transpose 0.002273 0.00000 test v_Transpose2 0.002478 0.00000 test v_Transpose4 0.001262 0.00000 test v_Transpose8 0.002308 0.00000 test v_pfTranspose2 0.001353 0.00000 test v_pfTranspose4 0.001261 0.00000 test v_pfTranspose8 0.002629 0.00000 test v_vTranspose4 0.000732 0.00000 test v_vTranspose4np 0.000966 0.00000 test v_vTranspose4ntw 0.005996 0.00000 test v_vTranspose4x8ntw 0.002518 0.00000 test v_vTranspose4x16ntw 0.000711 0.00000 test v_vpfTranspose8x4ntw 0.006027 0.00000 test v_avxTranspose4x8ntw 0.002516 0.00000 test v_avxTranspose4x16ntw 0.000599 0.00000 test v_avxTranspose8x4ntw 0.006136 0.00000 test v_avxTranspose8x8ntw_a 0.002010 0.00000 test v_avxTranspose8x8ntw_b 0.002358 0.00000 test v_avxTranspose4x16ntw 0.000599 0.00000 choice FPU opt folding 0.001726 0.00000 test AK SSE folding 0.000379 0.00000 test BH SSE folding 0.000369 0.00000 test JS AVX_a folding 0.000327 0.00000 test JS_AVX_b folding 0.000426 0.00000 test JS AVX_a folding 0.000327 0.00000 
choice Test duration 2.48 seconds Ftst_v7 completed successfully. 2600k@stock w7sp1 ... Ftst_v7_J39 started. Optimal function choices: -------------------------------------------------------- name timing error -------------------------------------------------------- v_BaseLineSmooth (no other) v_GetPowerSpectrum 0.000114 0.00000 test v_vGetPowerSpectrum 0.000057 0.00000 test v_vGetPowerSpectrum2 0.000068 0.00000 test v_vGetPowerSpectrumUnrolled 0.000053 0.00000 test v_vGetPowerSpectrumUnrolled2 0.000072 0.00000 test v_avxGetPowerSpectrum 0.000047 0.00000 test v_avxGetPowerSpectrum 0.000047 0.00000 choice v_ChirpData 0.004161 0.00000 test fpu_ChirpData 0.011387 0.00000 test fpu_opt_ChirpData 0.004181 0.00000 test v_vChirpData_x86_64 0.055388 0.00000 test sse1_ChirpData_ak 0.006907 0.00000 test sse2_ChirpData_ak 0.006409 0.00000 test sse2_ChirpData_ak8 0.004268 0.00000 test sse3_ChirpData_ak 0.006316 0.00000 test avx_ChirpData_a 0.001956 0.00000 test avx_ChirpData_b 0.002206 0.00000 test avx_ChirpData_c 0.001965 0.00000 test avx_ChirpData_a 0.001956 0.00000 choice v_Transpose 0.002908 0.00000 test v_Transpose2 0.003267 0.00000 test v_Transpose4 0.001625 0.00000 test v_Transpose8 0.002959 0.00000 test v_pfTranspose2 0.001712 0.00000 test v_pfTranspose4 0.001617 0.00000 test v_pfTranspose8 0.003370 0.00000 test v_vTranspose4 0.000914 0.00000 test v_vTranspose4np 0.001253 0.00000 test v_vTranspose4ntw 0.007542 0.00000 test v_vTranspose4x8ntw 0.003071 0.00000 test v_vTranspose4x16ntw 0.000879 0.00000 test v_vpfTranspose8x4ntw 0.007471 0.00000 test v_avxTranspose4x8ntw 0.003083 0.00000 test v_avxTranspose4x16ntw 0.000727 0.00000 test v_avxTranspose8x4ntw 0.007655 0.00000 test v_avxTranspose8x8ntw_a 0.002489 0.00000 test v_avxTranspose8x8ntw_b 0.002885 0.00000 test v_avxTranspose4x16ntw 0.000727 0.00000 choice FPU opt folding 0.002209 0.00000 test AK SSE folding 0.000485 0.00000 test BH SSE folding 0.000473 0.00000 test JS AVX_a folding 0.000419 0.00000 test JS_AVX_b 
folding 0.000544 0.00000 test JS AVX_a folding 0.000419 0.00000 choice Test duration 3.13 seconds Ftst_v7 completed successfully. |
Josef W. Segur Send message Joined: 30 Oct 99 Posts: 4504 Credit: 1,414,761 RAC: 0 |
Thanks for testing, even though I ran out of energy before noting here that a new version was attached to http://lunatics.kwsn.net/1-discussion-forum/avx-optimized-app-development.msg37595.html#msg37595. Although my attempts to improve speed actually degraded it, that at least indicates the kind of adjustments I made do affect speed. The compiler didn't do what I expected, so I'll see if I can tell it what I want more clearly. Joe |
Stewart Send message Joined: 28 Aug 07 Posts: 4 Credit: 829,029 RAC: 0 |
i5-2500K at 4.4GHz, Win7 SP1 ========================================================= Ftst_v7_J39 started. Optimal function choices: -------------------------------------------------------- name timing error -------------------------------------------------------- v_BaseLineSmooth (no other) v_GetPowerSpectrum 0.000091 0.00000 test v_vGetPowerSpectrum 0.000045 0.00000 test v_vGetPowerSpectrum2 0.000054 0.00000 test v_vGetPowerSpectrumUnrolled 0.000042 0.00000 test v_vGetPowerSpectrumUnrolled2 0.000057 0.00000 test v_avxGetPowerSpectrum 0.000037 0.00000 test v_avxGetPowerSpectrum 0.000037 0.00000 choice v_ChirpData 0.003719 0.00000 test fpu_ChirpData 0.009117 0.00000 test fpu_opt_ChirpData 0.003694 0.00000 test v_vChirpData_x86_64 0.043880 0.00000 test sse1_ChirpData_ak 0.005113 0.00000 test sse2_ChirpData_ak 0.004948 0.00000 test sse2_ChirpData_ak8 0.003158 0.00000 test sse3_ChirpData_ak 0.004858 0.00000 test avx_ChirpData_a 0.001557 0.00000 test avx_ChirpData_b 0.001550 0.00000 test avx_ChirpData_c 0.001571 0.00000 test avx_ChirpData_b 0.001550 0.00000 choice v_Transpose 0.002614 0.00000 test v_Transpose2 0.002679 0.00000 test v_Transpose4 0.001356 0.00000 test v_Transpose8 0.002427 0.00000 test v_pfTranspose2 0.001651 0.00000 test v_pfTranspose4 0.001478 0.00000 test v_pfTranspose8 0.002924 0.00000 test v_vTranspose4 0.000895 0.00000 test v_vTranspose4np 0.001058 0.00000 test v_vTranspose4ntw 0.006109 0.00000 test v_vTranspose4x8ntw 0.002570 0.00000 test v_vTranspose4x16ntw 0.000794 0.00000 test v_vpfTranspose8x4ntw 0.006116 0.00000 test v_avxTranspose4x8ntw 0.002528 0.00000 test v_avxTranspose4x16ntw 0.000682 0.00000 test v_avxTranspose8x4ntw 0.006140 0.00000 test v_avxTranspose8x8ntw_a 0.002067 0.00000 test v_avxTranspose8x8ntw_b 0.002383 0.00000 test v_avxTranspose4x16ntw 0.000682 0.00000 choice FPU opt folding 0.001762 0.00000 test AK SSE folding 0.000387 0.00000 test BH SSE folding 0.000377 0.00000 test JS AVX_a folding 0.000333 0.00000 test JS_AVX_b 
folding 0.000433 0.00000 test JS AVX_a folding 0.000333 0.00000 choice Test duration 2.55 seconds Ftst_v7 completed successfully. |
Josef W. Segur Send message Joined: 30 Oct 99 Posts: 4504 Credit: 1,414,761 RAC: 0 |
Thanks all for the testing. The next version (J40) is attached to http://lunatics.kwsn.net/1-discussion-forum/avx-optimized-app-development.msg37635.html#msg37635. Joe |
Orioneti Send message Joined: 22 Oct 07 Posts: 21 Credit: 23,642,634 RAC: 0 |
2600k@4488Mhz w7sp1 ... Ftst_v7_J40 started. Optimal function choices: -------------------------------------------------------- name timing error -------------------------------------------------------- v_BaseLineSmooth (no other) v_GetPowerSpectrum 0.000089 0.00000 test v_vGetPowerSpectrum 0.000044 0.00000 test v_vGetPowerSpectrum2 0.000053 0.00000 test v_vGetPowerSpectrumUnrolled 0.000041 0.00000 test v_vGetPowerSpectrumUnrolled2 0.000056 0.00000 test v_avxGetPowerSpectrum 0.000036 0.00000 test v_avxGetPowerSpectrum 0.000036 0.00000 choice v_ChirpData 0.003966 0.00000 test fpu_ChirpData 0.008931 0.00000 test fpu_opt_ChirpData 0.003708 0.00000 test v_vChirpData_x86_64 0.042818 0.00000 test sse1_ChirpData_ak 0.005287 0.00000 test sse1_ChirpData_ak8e 0.004280 0.00000 test sse1_ChirpData_ak8h 0.004478 0.00000 test sse2_ChirpData_ak 0.004971 0.00000 test sse2_ChirpData_ak8 0.003352 0.00000 test sse3_ChirpData_ak 0.004857 0.00000 test sse3_ChirpData_ak8 0.003274 0.00000 test avx_ChirpData_a 0.001561 0.00000 test avx_ChirpData_b 0.001711 0.00000 test avx_ChirpData_c 0.001543 0.00000 test avx_ChirpData_c 0.001543 0.00000 choice v_Transpose 0.002278 0.00000 test v_Transpose2 0.002478 0.00000 test v_Transpose4 0.001261 0.00000 test v_Transpose8 0.002316 0.00000 test v_pfTranspose2 0.001361 0.00000 test v_pfTranspose4 0.001301 0.00000 test v_pfTranspose8 0.002334 0.00000 test v_vTranspose4 0.000737 0.00000 test v_vTranspose4np 0.000965 0.00000 test v_vTranspose4ntw 0.006016 0.00000 test v_vTranspose4x8ntw 0.002498 0.00000 test v_vTranspose4x16ntw 0.000707 0.00000 test v_vpfTranspose8x4ntw 0.005960 0.00000 test v_avxTranspose4x8ntw 0.002493 0.00000 test v_avxTranspose4x16ntw 0.000603 0.00000 test v_avxTranspose8x4ntw 0.006124 0.00000 test v_avxTranspose8x8ntw_a 0.002015 0.00000 test v_avxTranspose8x8ntw_b 0.002370 0.00000 test v_avxTranspose4x16ntw 0.000603 0.00000 choice FPU opt folding 0.001728 0.00000 test AK SSE folding 0.000380 0.00000 test BH SSE folding 0.000370 
0.00000 test JS AVX_a folding 0.000328 0.00000 test JS_AVX_b folding 0.000358 0.00000 test JS AVX_a folding 0.000328 0.00000 choice Test duration 2.89 seconds Ftst_v7 completed successfully. 2600k@stock w7sp1 ... Ftst_v7_J40 started. Optimal function choices: -------------------------------------------------------- name timing error -------------------------------------------------------- v_BaseLineSmooth (no other) v_GetPowerSpectrum 0.000114 0.00000 test v_vGetPowerSpectrum 0.000057 0.00000 test v_vGetPowerSpectrum2 0.000069 0.00000 test v_vGetPowerSpectrumUnrolled 0.000053 0.00000 test v_vGetPowerSpectrumUnrolled2 0.000072 0.00000 test v_avxGetPowerSpectrum 0.000047 0.00000 test v_avxGetPowerSpectrum 0.000047 0.00000 choice v_ChirpData 0.004156 0.00000 test fpu_ChirpData 0.011398 0.00000 test fpu_opt_ChirpData 0.004134 0.00000 test v_vChirpData_x86_64 0.054842 0.00000 test sse1_ChirpData_ak 0.006761 0.00000 test sse1_ChirpData_ak8e 0.005481 0.00000 test sse1_ChirpData_ak8h 0.005724 0.00000 test sse2_ChirpData_ak 0.006361 0.00000 test sse2_ChirpData_ak8 0.004275 0.00000 test sse3_ChirpData_ak 0.006209 0.00000 test sse3_ChirpData_ak8 0.004138 0.00000 test avx_ChirpData_a 0.001948 0.00000 test avx_ChirpData_b 0.002182 0.00000 test avx_ChirpData_c 0.001962 0.00000 test avx_ChirpData_a 0.001948 0.00000 choice v_Transpose 0.002902 0.00000 test v_Transpose2 0.003234 0.00000 test v_Transpose4 0.001617 0.00000 test v_Transpose8 0.002946 0.00000 test v_pfTranspose2 0.001693 0.00000 test v_pfTranspose4 0.001663 0.00000 test v_pfTranspose8 0.002976 0.00000 test v_vTranspose4 0.000900 0.00000 test v_vTranspose4np 0.001248 0.00000 test v_vTranspose4ntw 0.007557 0.00000 test v_vTranspose4x8ntw 0.003052 0.00000 test v_vTranspose4x16ntw 0.000882 0.00000 test v_vpfTranspose8x4ntw 0.007416 0.00000 test v_avxTranspose4x8ntw 0.003063 0.00000 test v_avxTranspose4x16ntw 0.000731 0.00000 test v_avxTranspose8x4ntw 0.007635 0.00000 test v_avxTranspose8x8ntw_a 0.002455 0.00000 test 
v_avxTranspose8x8ntw_b 0.002884 0.00000 test v_avxTranspose4x16ntw 0.000731 0.00000 choice FPU opt folding 0.002206 0.00000 test AK SSE folding 0.000486 0.00000 test BH SSE folding 0.000473 0.00000 test JS AVX_a folding 0.000421 0.00000 test JS_AVX_b folding 0.000458 0.00000 test JS AVX_a folding 0.000421 0.00000 choice Test duration 3.63 seconds Ftst_v7 completed successfully. |
Stewart Send message Joined: 28 Aug 07 Posts: 4 Credit: 829,029 RAC: 0 |
i5-2500K at 4.4GHz, Win7 SP1 ========================================================= Ftst_v7_J40 started. Optimal function choices: -------------------------------------------------------- name timing error -------------------------------------------------------- v_BaseLineSmooth (no other) v_GetPowerSpectrum 0.000091 0.00000 test v_vGetPowerSpectrum 0.000045 0.00000 test v_vGetPowerSpectrum2 0.000054 0.00000 test v_vGetPowerSpectrumUnrolled 0.000042 0.00000 test v_vGetPowerSpectrumUnrolled2 0.000057 0.00000 test v_avxGetPowerSpectrum 0.000037 0.00000 test v_avxGetPowerSpectrum 0.000037 0.00000 choice v_ChirpData 0.003716 0.00000 test fpu_ChirpData 0.009133 0.00000 test fpu_opt_ChirpData 0.003701 0.00000 test v_vChirpData_x86_64 0.043767 0.00000 test sse1_ChirpData_ak 0.005115 0.00000 test sse1_ChirpData_ak8e 0.004207 0.00000 test sse1_ChirpData_ak8h 0.004308 0.00000 test sse2_ChirpData_ak 0.004971 0.00000 test sse2_ChirpData_ak8 0.003152 0.00000 test sse3_ChirpData_ak 0.004826 0.00000 test sse3_ChirpData_ak8 0.003110 0.00000 test avx_ChirpData_a 0.001555 0.00000 test avx_ChirpData_b 0.001556 0.00000 test avx_ChirpData_c 0.001572 0.00000 test avx_ChirpData_a 0.001555 0.00000 choice v_Transpose 0.002477 0.00000 test v_Transpose2 0.002631 0.00000 test v_Transpose4 0.001317 0.00000 test v_Transpose8 0.002415 0.00000 test v_pfTranspose2 0.001603 0.00000 test v_pfTranspose4 0.001496 0.00000 test v_pfTranspose8 0.002643 0.00000 test v_vTranspose4 0.000872 0.00000 test v_vTranspose4np 0.001049 0.00000 test v_vTranspose4ntw 0.006032 0.00000 test v_vTranspose4x8ntw 0.002527 0.00000 test v_vTranspose4x16ntw 0.000753 0.00000 test v_vpfTranspose8x4ntw 0.005966 0.00000 test v_avxTranspose4x8ntw 0.002506 0.00000 test v_avxTranspose4x16ntw 0.000663 0.00000 test v_avxTranspose8x4ntw 0.006207 0.00000 test v_avxTranspose8x8ntw_a 0.002072 0.00000 test v_avxTranspose8x8ntw_b 0.002374 0.00000 test v_avxTranspose4x16ntw 0.000663 0.00000 choice FPU opt folding 0.001762 0.00000 test AK 
SSE folding 0.000388 0.00000 test BH SSE folding 0.000377 0.00000 test JS AVX_a folding 0.000336 0.00000 test JS_AVX_b folding 0.000366 0.00000 test JS AVX_a folding 0.000336 0.00000 choice Test duration 2.94 seconds Ftst_v7 completed successfully. |
Josef W. Segur Send message Joined: 30 Oct 99 Posts: 4504 Credit: 1,414,761 RAC: 0 |
Thanks for the testing. I'm at least getting a better idea of what approaches aren't effective. Another new version is attached to http://lunatics.kwsn.net/1-discussion-forum/avx-optimized-app-development.msg37707.html#msg37707. Joe |
Stewart Send message Joined: 28 Aug 07 Posts: 4 Credit: 829,029 RAC: 0 |
i5-2500K at 4.4GHz, Win7 SP1
=========================================================
Ftst_v7_J43 started.

Optimal function choices:
--------------------------------------------------------
name timing error
--------------------------------------------------------
v_BaseLineSmooth (no other)

v_GetPowerSpectrum 0.000091 0.00000 test
v_vGetPowerSpectrum 0.000045 0.00000 test
v_vGetPowerSpectrum2 0.000054 0.00000 test
v_vGetPowerSpectrumUnrolled 0.000042 0.00000 test
v_vGetPowerSpectrumUnrolled2 0.000057 0.00000 test
v_avxGetPowerSpectrum 0.000037 0.00000 test
v_avxGetPowerSpectrum 0.000037 0.00000 choice

v_ChirpData 0.003716 0.00000 test
fpu_ChirpData 0.009111 0.00000 test
fpu_opt_ChirpData 0.003707 0.00000 test
v_vChirpData_x86_64 0.043837 0.00000 test
sse1_ChirpData_ak 0.005140 0.00000 test
sse1_ChirpData_ak8e 0.004165 0.00000 test
sse1_ChirpData_ak8h 0.004342 0.00000 test
sse2_ChirpData_ak 0.004914 0.00000 test
sse2_ChirpData_ak8 0.003157 0.00000 test
sse3_ChirpData_ak 0.004859 0.00000 test
sse3_ChirpData_ak8 0.003101 0.00000 test
avx_ChirpData_a 0.001554 0.00000 test
avx_ChirpData_b 0.001532 0.00000 test
avx_ChirpData_c 0.001571 0.00000 test
avx_ChirpData_b 0.001532 0.00000 choice

v_Transpose 0.002620 0.00000 test
v_Transpose2 0.002638 0.00000 test
v_Transpose4 0.001350 0.00000 test
v_Transpose8 0.002489 0.00000 test
v_pfTranspose2 0.001674 0.00000 test
v_pfTranspose4 0.001522 0.00000 test
v_pfTranspose8 0.002960 0.00000 test
v_vTranspose4 0.000899 0.00000 test
v_vTranspose4np 0.001079 0.00000 test
v_vTranspose4ntw 0.006218 0.00000 test
v_vTranspose4x8ntw 0.002560 0.00000 test
v_vTranspose4x16ntw 0.000766 0.00000 test
v_vpfTranspose8x4ntw 0.006250 0.00000 test
v_avxTranspose4x8ntw 0.002538 0.00000 test
v_avxTranspose4x16ntw 0.000670 0.00000 test
v_avxTranspose8x4ntw 0.006220 0.00000 test
v_avxTranspose8x8ntw_a 0.002059 0.00000 test
v_avxTranspose8x8ntw_b 0.002401 0.00000 test
v_avxTranspose4x16ntw 0.000670 0.00000 choice

FPU opt folding 0.001762 0.00000 test
AK SSE folding 0.000391 0.00000 test
BH SSE folding 0.000377 0.00000 test
JS AVX_a folding 0.000337 0.00000 test
JS AVX_c folding 0.000330 0.00000 test
JS AVX_c folding 0.000330 0.00000 choice

Test duration 2.95 seconds
Ftst_v7 completed successfully. |
Orioneti Joined: 22 Oct 07 Posts: 21 Credit: 23,642,634 RAC: 0 |
2600k@4488Mhz w7sp1 ...

Ftst_v7_J43 started.

Optimal function choices:
--------------------------------------------------------
name timing error
--------------------------------------------------------
v_BaseLineSmooth (no other)

v_GetPowerSpectrum 0.000089 0.00000 test
v_vGetPowerSpectrum 0.000044 0.00000 test
v_vGetPowerSpectrum2 0.000054 0.00000 test
v_vGetPowerSpectrumUnrolled 0.000041 0.00000 test
v_vGetPowerSpectrumUnrolled2 0.000056 0.00000 test
v_avxGetPowerSpectrum 0.000037 0.00000 test
v_avxGetPowerSpectrum 0.000037 0.00000 choice

v_ChirpData 0.003728 0.00000 test
fpu_ChirpData 0.008926 0.00000 test
fpu_opt_ChirpData 0.003711 0.00000 test
v_vChirpData_x86_64 0.042929 0.00000 test
sse1_ChirpData_ak 0.005311 0.00000 test
sse1_ChirpData_ak8e 0.004280 0.00000 test
sse1_ChirpData_ak8h 0.004433 0.00000 test
sse2_ChirpData_ak 0.004987 0.00000 test
sse2_ChirpData_ak8 0.003345 0.00000 test
sse3_ChirpData_ak 0.004840 0.00000 test
sse3_ChirpData_ak8 0.003259 0.00000 test
avx_ChirpData_a 0.001531 0.00000 test
avx_ChirpData_b 0.001707 0.00000 test
avx_ChirpData_c 0.001543 0.00000 test
avx_ChirpData_a 0.001531 0.00000 choice

v_Transpose 0.002286 0.00000 test
v_Transpose2 0.002533 0.00000 test
v_Transpose4 0.001261 0.00000 test
v_Transpose8 0.002323 0.00000 test
v_pfTranspose2 0.001373 0.00000 test
v_pfTranspose4 0.001263 0.00000 test
v_pfTranspose8 0.002634 0.00000 test
v_vTranspose4 0.000748 0.00000 test
v_vTranspose4np 0.000965 0.00000 test
v_vTranspose4ntw 0.006022 0.00000 test
v_vTranspose4x8ntw 0.002494 0.00000 test
v_vTranspose4x16ntw 0.000700 0.00000 test
v_vpfTranspose8x4ntw 0.005997 0.00000 test
v_avxTranspose4x8ntw 0.002484 0.00000 test
v_avxTranspose4x16ntw 0.000584 0.00000 test
v_avxTranspose8x4ntw 0.006115 0.00000 test
v_avxTranspose8x8ntw_a 0.001997 0.00000 test
v_avxTranspose8x8ntw_b 0.002337 0.00000 test
v_avxTranspose4x16ntw 0.000584 0.00000 choice

FPU opt folding 0.001727 0.00000 test
AK SSE folding 0.000383 0.00000 test
BH SSE folding 0.000369 0.00000 test
JS AVX_a folding 0.000329 0.00000 test
JS AVX_c folding 0.000325 0.00000 test
JS AVX_c folding 0.000325 0.00000 choice

Test duration 2.89 seconds
Ftst_v7 completed successfully.

2600k@stock w7sp1 ...

Ftst_v7_J43 started.

Optimal function choices:
--------------------------------------------------------
name timing error
--------------------------------------------------------
v_BaseLineSmooth (no other)

v_GetPowerSpectrum 0.000114 0.00000 test
v_vGetPowerSpectrum 0.000057 0.00000 test
v_vGetPowerSpectrum2 0.000068 0.00000 test
v_vGetPowerSpectrumUnrolled 0.000053 0.00000 test
v_vGetPowerSpectrumUnrolled2 0.000072 0.00000 test
v_avxGetPowerSpectrum 0.000047 0.00000 test
v_avxGetPowerSpectrum 0.000047 0.00000 choice

v_ChirpData 0.004162 0.00000 test
fpu_ChirpData 0.011393 0.00000 test
fpu_opt_ChirpData 0.004143 0.00000 test
v_vChirpData_x86_64 0.055084 0.00000 test
sse1_ChirpData_ak 0.006787 0.00000 test
sse1_ChirpData_ak8e 0.005488 0.00000 test
sse1_ChirpData_ak8h 0.005771 0.00000 test
sse2_ChirpData_ak 0.006370 0.00000 test
sse2_ChirpData_ak8 0.004276 0.00000 test
sse3_ChirpData_ak 0.006209 0.00000 test
sse3_ChirpData_ak8 0.004139 0.00000 test
avx_ChirpData_a 0.001947 0.00000 test
avx_ChirpData_b 0.002176 0.00000 test
avx_ChirpData_c 0.001964 0.00000 test
avx_ChirpData_a 0.001947 0.00000 choice

v_Transpose 0.002900 0.00000 test
v_Transpose2 0.003255 0.00000 test
v_Transpose4 0.001623 0.00000 test
v_Transpose8 0.002978 0.00000 test
v_pfTranspose2 0.001684 0.00000 test
v_pfTranspose4 0.001608 0.00000 test
v_pfTranspose8 0.003384 0.00000 test
v_vTranspose4 0.000887 0.00000 test
v_vTranspose4np 0.001248 0.00000 test
v_vTranspose4ntw 0.007595 0.00000 test
v_vTranspose4x8ntw 0.003087 0.00000 test
v_vTranspose4x16ntw 0.000878 0.00000 test
v_vpfTranspose8x4ntw 0.007408 0.00000 test
v_avxTranspose4x8ntw 0.003043 0.00000 test
v_avxTranspose4x16ntw 0.000721 0.00000 test
v_avxTranspose8x4ntw 0.007547 0.00000 test
v_avxTranspose8x8ntw_a 0.002446 0.00000 test
v_avxTranspose8x8ntw_b 0.002860 0.00000 test
v_avxTranspose4x16ntw 0.000721 0.00000 choice

FPU opt folding 0.002209 0.00000 test
AK SSE folding 0.000490 0.00000 test
BH SSE folding 0.000472 0.00000 test
JS AVX_a folding 0.000415 0.00000 test
JS AVX_c folding 0.000414 0.00000 test
JS AVX_c folding 0.000414 0.00000 choice

Test duration 3.63 seconds
Ftst_v7 completed successfully. |
Josef W. Segur Joined: 30 Oct 99 Posts: 4504 Credit: 1,414,761 RAC: 0 |
Again thanks for the testing. There's another updated test version attached to http://lunatics.kwsn.net/1-discussion-forum/avx-optimized-app-development.msg37870.html#msg37870. Joe |
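For anyone reading these logs later: in every group posted in this thread, the "choice" row matches the fastest "test" row whose error column is zero. The sketch below reproduces that reading of the output in Python. It is only an interpretation of the printed results, not the test program's actual selection code; the names `Row`, `parse_rows`, `pick_choice` and the `max_error` threshold are hypothetical.

```python
from typing import NamedTuple, Optional


class Row(NamedTuple):
    name: str
    timing: float  # as printed in the "timing" column
    error: float   # as printed in the "error" column


def parse_rows(lines):
    """Collect rows of the form '<name> <timing> <error> test'."""
    rows = []
    for line in lines:
        parts = line.split()
        if len(parts) < 4 or parts[-1] != "test":
            continue  # skip headers, separators, and 'choice' rows
        try:
            timing, error = float(parts[-3]), float(parts[-2])
        except ValueError:
            continue
        # Names may contain spaces, e.g. "JS AVX_c folding".
        rows.append(Row(" ".join(parts[:-3]), timing, error))
    return rows


def pick_choice(rows, max_error=0.0) -> Optional[Row]:
    """Assumed rule: fastest row whose error does not exceed max_error."""
    accurate = [r for r in rows if r.error <= max_error]
    return min(accurate, key=lambda r: r.timing) if accurate else None


# A few ChirpData rows copied from the i5-2500K run in this thread.
chirp_lines = [
    "v_ChirpData 0.003716 0.00000 test",
    "sse3_ChirpData_ak8 0.003101 0.00000 test",
    "avx_ChirpData_a 0.001554 0.00000 test",
    "avx_ChirpData_b 0.001532 0.00000 test",
    "avx_ChirpData_c 0.001571 0.00000 test",
    "avx_ChirpData_b 0.001532 0.00000 choice",
]

best = pick_choice(parse_rows(chirp_lines))
print(best.name)  # avx_ChirpData_b
```

Under this rule a variant that reports a non-zero error is skipped even if it is the fastest, which is consistent with the earlier J29 run where the AVX ChirpData variants showed an error of 0.90046 and v_ChirpData was chosen instead.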