AVX Extensions - Ongoing development?

Orioneti

Joined: 22 Oct 07
Posts: 21
Credit: 23,642,634
RAC: 0
Finland
Message 1102374 - Posted: 1 May 2011, 7:14:55 UTC - in response to Message 1102351.  
Last modified: 1 May 2011, 7:45:32 UTC

Results from the revised test on a 2600k@4488Mhz w7sp1 ...




=========================================================
Ftst_v7_J29 started.

Optimal function choices:
--------------------------------------------------------
                            name   timing   error
--------------------------------------------------------
               v_BaseLineSmooth (no other)

              v_GetPowerSpectrum 0.000089 0.00000  test
             v_vGetPowerSpectrum 0.000045 0.00000  test
            v_vGetPowerSpectrum2 0.000054 0.00000  test
     v_vGetPowerSpectrumUnrolled 0.000041 0.00000  test
    v_vGetPowerSpectrumUnrolled2 0.000056 0.00000  test
           v_avxGetPowerSpectrum 0.000037 0.00000  test
           v_avxGetPowerSpectrum 0.000037 0.00000  choice

                     v_ChirpData 0.003712 0.00000  test
                   fpu_ChirpData 0.008928 0.00000  test
               fpu_opt_ChirpData 0.003713 0.00000  test
             v_vChirpData_x86_64 0.042927 0.00000  test
               sse1_ChirpData_ak 0.005286 0.00000  test
               sse2_ChirpData_ak 0.004975 0.00000  test
               sse3_ChirpData_ak 0.004841 0.00000  test
                 avx_ChirpData_a 0.001872 0.90046  test
                 avx_ChirpData_b 0.001968 0.90046  test
                     v_ChirpData 0.003712 0.00000  choice

                     v_Transpose 0.002283 0.00000  test
                    v_Transpose2 0.002477 0.00000  test
                    v_Transpose4 0.001261 0.00000  test
                    v_Transpose8 0.002309 0.00000  test
                  v_pfTranspose2 0.001363 0.00000  test
                  v_pfTranspose4 0.001262 0.00000  test
                  v_pfTranspose8 0.002642 0.00000  test
                   v_vTranspose4 0.000735 0.00000  test
                 v_vTranspose4np 0.000965 0.00000  test
                v_vTranspose4ntw 0.006052 0.00000  test
              v_vTranspose4x8ntw 0.002501 0.00000  test
             v_vTranspose4x16ntw 0.000702 0.00000  test
            v_vpfTranspose8x4ntw 0.006024 0.00000  test
            v_avxTranspose8x4ntw 0.002515 0.00000  test
          v_avxTranspose8x8ntw_a 0.001991 0.00000  test
          v_avxTranspose8x8ntw_b 0.002337 0.00000  test
             v_vTranspose4x16ntw 0.000702 0.00000  choice

                 FPU opt folding 0.001727 0.00000  test
                  AK SSE folding 0.000379 0.00000  test
                  BH SSE folding 0.000369 0.00000  test
                  BH SSE folding 0.000369 0.00000  choice

                   Test duration     2.16 seconds

Ftst_v7 completed successfully.


Thought you might need this run at stock clocks, so I did it again (3.4 GHz) ...


=========================================================
Ftst_v7_J29 started.

Optimal function choices:
--------------------------------------------------------
                            name   timing   error
--------------------------------------------------------
                v_BaseLineSmooth (no other)

              v_GetPowerSpectrum 0.000114 0.00000  test
             v_vGetPowerSpectrum 0.000057 0.00000  test
            v_vGetPowerSpectrum2 0.000069 0.00000  test
     v_vGetPowerSpectrumUnrolled 0.000053 0.00000  test
    v_vGetPowerSpectrumUnrolled2 0.000072 0.00000  test
           v_avxGetPowerSpectrum 0.000047 0.00000  test
           v_avxGetPowerSpectrum 0.000047 0.00000  choice

                     v_ChirpData 0.004167 0.00000  test
                   fpu_ChirpData 0.011857 0.00000  test
               fpu_opt_ChirpData 0.004137 0.00000  test
             v_vChirpData_x86_64 0.055058 0.00000  test
               sse1_ChirpData_ak 0.006802 0.00000  test
               sse2_ChirpData_ak 0.006394 0.00000  test
               sse3_ChirpData_ak 0.006221 0.00000  test
                 avx_ChirpData_a 0.002398 0.90046  test
                 avx_ChirpData_b 0.002520 0.90046  test
               fpu_opt_ChirpData 0.004137 0.00000  choice

                     v_Transpose 0.002908 0.00000  test
                    v_Transpose2 0.003244 0.00000  test
                    v_Transpose4 0.001616 0.00000  test
                    v_Transpose8 0.002958 0.00000  test
                  v_pfTranspose2 0.001697 0.00000  test
                  v_pfTranspose4 0.001614 0.00000  test
                  v_pfTranspose8 0.003371 0.00000  test
                   v_vTranspose4 0.000899 0.00000  test
                 v_vTranspose4np 0.001236 0.00000  test
                v_vTranspose4ntw 0.007451 0.00000  test
              v_vTranspose4x8ntw 0.003057 0.00000  test
             v_vTranspose4x16ntw 0.000879 0.00000  test
            v_vpfTranspose8x4ntw 0.007433 0.00000  test
            v_avxTranspose8x4ntw 0.003085 0.00000  test
          v_avxTranspose8x8ntw_a 0.002436 0.00000  test
          v_avxTranspose8x8ntw_b 0.002865 0.00000  test
             v_vTranspose4x16ntw 0.000879 0.00000  choice

                 FPU opt folding 0.002219 0.00000  test
                  AK SSE folding 0.000487 0.00000  test
                  BH SSE folding 0.000474 0.00000  test
                  BH SSE folding 0.000474 0.00000  choice

                   Test duration     2.74 seconds

Ftst_v7 completed successfully.
Josef W. Segur
Volunteer developer
Volunteer tester

Joined: 30 Oct 99
Posts: 4504
Credit: 1,414,761
RAC: 0
United States
Message 1102456 - Posted: 1 May 2011, 15:03:05 UTC - in response to Message 1102374.  

Results from the revised test on a 2600k@4488Mhz w7sp1 ...
...

Thought you might need this run at stock clocks, so I did it again (3.4 GHz) ...

Thank you!

Obviously there's another issue with the chirp accuracy to find, so I'll have another version in a day or two.
                                                                  Joe
Josef W. Segur
Volunteer developer
Volunteer tester

Joined: 30 Oct 99
Posts: 4504
Credit: 1,414,761
RAC: 0
United States
Message 1102838 - Posted: 2 May 2011, 20:49:34 UTC

Ftst_v7_J32 is attached to http://lunatics.kwsn.net/1-discussion-forum/avx-optimized-app-development.msg37398.html#msg37398. Fixed a problem in the AVX chirp tests, hope that's the last one. Also added additional transpose testing, partly looking for the most effective AVX version, partly to provide data concerning how different systems react at different FFT lengths.
                                                                    Joe
Josef W. Segur
Volunteer developer
Volunteer tester

Joined: 30 Oct 99
Posts: 4504
Credit: 1,414,761
RAC: 0
United States
Message 1102876 - Posted: 2 May 2011, 22:52:10 UTC - in response to Message 1102838.  

Gah! Ftst_v7_J32 is withdrawn until I figure out more problems. The chirps still aren't right, though they do run, and the first of the new transposes crashes on an i7 2600 w/W7 64 SP1.
                                                                  Joe
Josef W. Segur
Volunteer developer
Volunteer tester

Joined: 30 Oct 99
Posts: 4504
Credit: 1,414,761
RAC: 0
United States
Message 1103126 - Posted: 4 May 2011, 3:04:30 UTC

Ftst_v7_J34 is attached to http://lunatics.kwsn.net/1-discussion-forum/avx-optimized-app-development.msg37407.html#msg37407. The new transpose functions should no longer crash, and I have high hopes the chirp functions will finally work accurately.
                                                                 Joe
-BeNt-

Joined: 17 Oct 99
Posts: 1234
Credit: 10,116,112
RAC: 0
United States
Message 1103204 - Posted: 4 May 2011, 6:26:11 UTC

Forgive me for not understanding what you posted above, but on average how much more efficient is the chip with AVX? By efficient I mean faster. ;P
Traveling through space at ~67,000mph!
Orioneti

Joined: 22 Oct 07
Posts: 21
Credit: 23,642,634
RAC: 0
Finland
Message 1103257 - Posted: 4 May 2011, 12:18:44 UTC - in response to Message 1103204.  

2600k@4488Mhz w7sp1 ...


Ftst_v7_J34 started.
Optimal function choices:
--------------------------------------------------------
                            name   timing   error
--------------------------------------------------------
                v_BaseLineSmooth (no other)

              v_GetPowerSpectrum 0.000089 0.00000  test
             v_vGetPowerSpectrum 0.000044 0.00000  test
            v_vGetPowerSpectrum2 0.000053 0.00000  test
     v_vGetPowerSpectrumUnrolled 0.000041 0.00000  test
    v_vGetPowerSpectrumUnrolled2 0.000056 0.00000  test
           v_avxGetPowerSpectrum 0.000036 0.00000  test
           v_avxGetPowerSpectrum 0.000036 0.00000  choice

                     v_ChirpData 0.003722 0.00000  test
                   fpu_ChirpData 0.008926 0.00000  test
               fpu_opt_ChirpData 0.003717 0.00000  test
             v_vChirpData_x86_64 0.042824 0.00000  test
               sse1_ChirpData_ak 0.005276 0.00000  test
               sse2_ChirpData_ak 0.004986 0.00000  test
              sse2_ChirpData_ak8 0.003340 0.00000  test
               sse3_ChirpData_ak 0.004876 0.00000  test
                 avx_ChirpData_a 0.001531 0.00000  test
                 avx_ChirpData_b 0.001708 0.00000  test
                 avx_ChirpData_a 0.001531 0.00000  choice

                     v_Transpose 0.002715 0.00000  test
                    v_Transpose2 0.002744 0.00000  test
                    v_Transpose4 0.001676 0.00000  test
                    v_Transpose8 0.002953 0.00000  test
                  v_pfTranspose2 0.002846 0.00000  test
                  v_pfTranspose4 0.002017 0.00000  test
                  v_pfTranspose8 0.003382 0.00000  test
                   v_vTranspose4 0.001384 0.00000  test
                 v_vTranspose4np 0.000915 0.00000  test
                v_vTranspose4ntw 0.004736 0.00000  test
              v_vTranspose4x8ntw 0.002408 0.00000  test
             v_vTranspose4x16ntw 0.000556 0.00000  test
            v_vpfTranspose8x4ntw 0.005043 0.00000  test
            v_avxTranspose4x8ntw 0.002066 0.00000  test
           v_avxTranspose4x16ntw 0.000536 0.00000  test
            v_avxTranspose8x4ntw 0.004097 6855598.48924  test
          v_avxTranspose8x8ntw_a 0.002547 0.00000  test
          v_avxTranspose8x8ntw_b 0.002697 0.00000  test
           v_avxTranspose4x16ntw 0.000536 0.00000  choice

                     v_Transpose 0.002271 0.00000  test
                    v_Transpose2 0.002476 0.00000  test
                    v_Transpose4 0.001258 0.00000  test
                    v_Transpose8 0.002341 0.00000  test
                  v_pfTranspose2 0.001355 0.00000  test
                  v_pfTranspose4 0.001307 0.00000  test
                  v_pfTranspose8 0.002332 0.00000  test
                   v_vTranspose4 0.000731 0.00000  test
                 v_vTranspose4np 0.000965 0.00000  test
                v_vTranspose4ntw 0.006015 0.00000  test
              v_vTranspose4x8ntw 0.002499 0.00000  test
             v_vTranspose4x16ntw 0.000707 0.00000  test
            v_vpfTranspose8x4ntw 0.005977 0.00000  test
            v_avxTranspose4x8ntw 0.002483 0.00000  test
           v_avxTranspose4x16ntw 0.000592 0.00000  test
            v_avxTranspose8x4ntw 0.006115 6847381.92511  test
          v_avxTranspose8x8ntw_a 0.001990 0.00000  test
          v_avxTranspose8x8ntw_b 0.002335 0.00000  test
           v_avxTranspose4x16ntw 0.000592 0.00000  choice

                 FPU opt folding 0.001727 0.00000  test
                  AK SSE folding 0.000380 0.00000  test
                  BH SSE folding 0.000369 0.00000  test
                  BH SSE folding 0.000369 0.00000  choice

                   Test duration     2.87 seconds

Ftst_v7 completed successfully.



2600k@stock w7sp1 ...


Ftst_v7_J34 started.
Optimal function choices:
--------------------------------------------------------
                            name   timing   error
--------------------------------------------------------
                v_BaseLineSmooth (no other)

              v_GetPowerSpectrum 0.000114 0.00000  test
             v_vGetPowerSpectrum 0.000057 0.00000  test
            v_vGetPowerSpectrum2 0.000069 0.00000  test
     v_vGetPowerSpectrumUnrolled 0.000053 0.00000  test
    v_vGetPowerSpectrumUnrolled2 0.000072 0.00000  test
           v_avxGetPowerSpectrum 0.000047 0.00000  test
           v_avxGetPowerSpectrum 0.000047 0.00000  choice

                     v_ChirpData 0.004160 0.00000  test
                   fpu_ChirpData 0.011392 0.00000  test
               fpu_opt_ChirpData 0.004138 0.00000  test
             v_vChirpData_x86_64 0.054736 0.00000  test
               sse1_ChirpData_ak 0.006747 0.00000  test
               sse2_ChirpData_ak 0.006364 0.00000  test
              sse2_ChirpData_ak8 0.004258 0.00000  test
               sse3_ChirpData_ak 0.006223 0.00000  test
                 avx_ChirpData_a 0.001948 0.00000  test
                 avx_ChirpData_b 0.002180 0.00000  test
                 avx_ChirpData_a 0.001948 0.00000  choice

                     v_Transpose 0.004159 0.00000  test
                    v_Transpose2 0.004275 0.00000  test
                    v_Transpose4 0.002162 0.00000  test
                    v_Transpose8 0.003819 0.00000  test
                  v_pfTranspose2 0.003754 0.00000  test
                  v_pfTranspose4 0.002643 0.00000  test
                  v_pfTranspose8 0.004395 0.00000  test
                   v_vTranspose4 0.001801 0.00000  test
                 v_vTranspose4np 0.001373 0.00000  test
                v_vTranspose4ntw 0.005843 0.00000  test
              v_vTranspose4x8ntw 0.002913 0.00000  test
             v_vTranspose4x16ntw 0.000664 0.00000  test
            v_vpfTranspose8x4ntw 0.006182 0.00000  test
            v_avxTranspose4x8ntw 0.002525 0.00000  test
           v_avxTranspose4x16ntw 0.000646 0.00000  test
            v_avxTranspose8x4ntw 0.005015 6855598.48924  test
          v_avxTranspose8x8ntw_a 0.002995 0.00000  test
          v_avxTranspose8x8ntw_b 0.003235 0.00000  test
           v_avxTranspose4x16ntw 0.000646 0.00000  choice

                     v_Transpose 0.002892 0.00000  test
                    v_Transpose2 0.003164 0.00000  test
                    v_Transpose4 0.001605 0.00000  test
                    v_Transpose8 0.002977 0.00000  test
                  v_pfTranspose2 0.001691 0.00000  test
                  v_pfTranspose4 0.001695 0.00000  test
                  v_pfTranspose8 0.003038 0.00000  test
                   v_vTranspose4 0.000954 0.00000  test
                 v_vTranspose4np 0.001273 0.00000  test
                v_vTranspose4ntw 0.007372 0.00000  test
              v_vTranspose4x8ntw 0.002954 0.00000  test
             v_vTranspose4x16ntw 0.000870 0.00000  test
            v_vpfTranspose8x4ntw 0.007294 0.00000  test
            v_avxTranspose4x8ntw 0.003084 0.00000  test
           v_avxTranspose4x16ntw 0.000721 0.00000  test
            v_avxTranspose8x4ntw 0.007651 6847381.92511  test
          v_avxTranspose8x8ntw_a 0.002454 0.00000  test
          v_avxTranspose8x8ntw_b 0.002877 0.00000  test
           v_avxTranspose4x16ntw 0.000721 0.00000  choice

                 FPU opt folding 0.002209 0.00000  test
                  AK SSE folding 0.000487 0.00000  test
                  BH SSE folding 0.000473 0.00000  test
                  BH SSE folding 0.000473 0.00000  choice

                   Test duration     3.62 seconds

Ftst_v7 completed successfully.

ML1
Volunteer moderator
Volunteer tester

Joined: 25 Nov 01
Posts: 21219
Credit: 7,508,002
RAC: 20
United Kingdom
Message 1103277 - Posted: 4 May 2011, 13:39:19 UTC - in response to Message 1103257.  

OK, some nice speedups developing there.

Happy fast crunchin',
Martin

See new freedom: Mageia Linux
Take a look for yourself: Linux Format
The Future is what We all make IT (GPLv3)
Josef W. Segur
Volunteer developer
Volunteer tester

Joined: 30 Oct 99
Posts: 4504
Credit: 1,414,761
RAC: 0
United States
Message 1105106 - Posted: 11 May 2011, 6:50:03 UTC - in response to Message 1103204.  

I've attached a new version of the quick test to http://lunatics.kwsn.net/1-discussion-forum/avx-optimized-app-development.msg37554.html#msg37554. It includes a fix for the 8x4 AVX transpose, plus my first attempt at AVX folding subroutines to speed up pulse finding.


-BeNt- wrote:
Forgive me for not understanding what you posted above, but on average how much more efficient is the chip with AVX? By efficient I mean faster. ;P

Sorry about the delayed reply. The simplest approximation is that because 8 single precision floats can be done at once rather than 4 with SSEx, parts of the code can be made twice as fast. The various samples that Intel engineers have published range from about 1.79x to 2.53x. In terms of what I've coded, this pair shows a similar speed boost:
sse2_ChirpData_ak8  0.004258 0.00000 test
avx_ChirpData_a     0.001948 0.00000 test

As chirping accounts for less than 15% of run time, no miraculous overall speedup is expected. I'm just trying to take advantage of the new capabilities to increase efficiency.
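
To put rough numbers on that, here is a minimal Python sketch (not part of the test program). It computes the chirp speedup from the two timings quoted above and then applies Amdahl's law. The 0.15 fraction is the upper bound mentioned above, treated here as if it were exact, and the function names are made up for illustration.

```python
def speedup(old_time, new_time):
    """Ratio of two timings for the same work; > 1 means the new one is faster."""
    return old_time / new_time


def overall_speedup(fraction, local_speedup):
    """Amdahl's law: whole-program speedup when only `fraction` of the
    run time is accelerated by a factor of `local_speedup`."""
    return 1.0 / ((1.0 - fraction) + fraction / local_speedup)


# Timings copied from the stock-clock 2600K run quoted above.
SSE2_AK8 = 0.004258   # sse2_ChirpData_ak8
AVX_A = 0.001948      # avx_ChirpData_a

chirp_gain = speedup(SSE2_AK8, AVX_A)

# Assumption: chirping is exactly 15% of run time. The post only says
# "less than 15%", so this is an upper bound on the overall gain.
total_gain = overall_speedup(0.15, chirp_gain)

print(f"chirp speedup:   {chirp_gain:.2f}x")
print(f"overall speedup: {total_gain:.3f}x")
```

Under those assumptions the chirp routine comes out about 2.19x faster, but the whole run gains only about 9%, which is why a doubling in one routine does not translate into a doubling overall.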

Large parts of the code are effectively limited by how fast data can be fetched and stored. AVX doesn't help there, though of course newer hardware does have faster memory access. Pulse finding, though, is another area that is limited more by processing speed than by memory access, and it also accounts for a sizable fraction of run time. But even if I achieve a speed doubling in the folding subroutines it uses, there's a lot of surrounding logic which can't be vectorized, so again the gains won't be huge overall.
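
For readers unfamiliar with the term, the following is only a conceptual Python sketch of folding. It assumes folding means adding together consecutive segments of a power array, one candidate period apart, so that a repeating pulse piles up in the same bin. The function name, arguments, and data are hypothetical, and the real subroutines in the application are not shown in this thread and are certainly organized differently.

```python
def fold(power, period, folds):
    """Add `folds` consecutive segments of length `period`, element by element.

    A pulse that repeats every `period` samples lands in the same output bin
    each time, so it stands out in the folded sum.
    """
    if period <= 0 or folds <= 0:
        raise ValueError("period and folds must be positive")
    if period * folds > len(power):
        raise ValueError("not enough samples for the requested fold")
    # The inner sum is independent for every output bin i, which is the kind
    # of loop a SIMD unit can run on several adjacent bins at once.
    return [sum(power[k * period + i] for k in range(folds))
            for i in range(period)]


# Toy data (invented): a weak pulse every 4 samples, in bin 1.
samples = [0.0, 1.0, 0.0, 0.0,
           0.0, 1.0, 0.0, 0.0,
           0.0, 1.0, 0.0, 0.0]
print(fold(samples, period=4, folds=3))
```

The additions themselves are the vectorizable part. Choosing periods, checking thresholds, and reporting candidates correspond to the surrounding logic that does not benefit from wider registers.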
                                                               Joe
Orioneti

Joined: 22 Oct 07
Posts: 21
Credit: 23,642,634
RAC: 0
Finland
Message 1105269 - Posted: 11 May 2011, 20:54:15 UTC - in response to Message 1105106.  

2600k@4488Mhz w7sp1 ...

Ftst_v7_J34 started.

Optimal function choices:
--------------------------------------------------------
                            name   timing   error
--------------------------------------------------------
                v_BaseLineSmooth (no other)

              v_GetPowerSpectrum 0.000089 0.00000  test
             v_vGetPowerSpectrum 0.000044 0.00000  test
            v_vGetPowerSpectrum2 0.000054 0.00000  test
     v_vGetPowerSpectrumUnrolled 0.000041 0.00000  test
    v_vGetPowerSpectrumUnrolled2 0.000056 0.00000  test
           v_avxGetPowerSpectrum 0.000037 0.00000  test
           v_avxGetPowerSpectrum 0.000037 0.00000  choice

                     v_ChirpData 0.003721 0.00000  test
                   fpu_ChirpData 0.008925 0.00000  test
               fpu_opt_ChirpData 0.003715 0.00000  test
             v_vChirpData_x86_64 0.042826 0.00000  test
               sse1_ChirpData_ak 0.005291 0.00000  test
               sse2_ChirpData_ak 0.004984 0.00000  test
              sse2_ChirpData_ak8 0.003334 0.00000  test
               sse3_ChirpData_ak 0.004858 0.00000  test
                 avx_ChirpData_a 0.001532 0.00000  test
                 avx_ChirpData_b 0.001710 0.00000  test
                 avx_ChirpData_a 0.001532 0.00000  choice

                     v_Transpose 0.002279 0.00000  test
                    v_Transpose2 0.002531 0.00000  test
                    v_Transpose4 0.001259 0.00000  test
                    v_Transpose8 0.002332 0.00000  test
                  v_pfTranspose2 0.001351 0.00000  test
                  v_pfTranspose4 0.001309 0.00000  test
                  v_pfTranspose8 0.002333 0.00000  test
                   v_vTranspose4 0.000728 0.00000  test
                 v_vTranspose4np 0.000979 0.00000  test
                v_vTranspose4ntw 0.005982 0.00000  test
              v_vTranspose4x8ntw 0.002520 0.00000  test
             v_vTranspose4x16ntw 0.000708 0.00000  test
            v_vpfTranspose8x4ntw 0.005973 0.00000  test
            v_avxTranspose4x8ntw 0.002510 0.00000  test
           v_avxTranspose4x16ntw 0.000596 0.00000  test
            v_avxTranspose8x4ntw 0.006122 0.00000  test
          v_avxTranspose8x8ntw_a 0.002010 0.00000  test
          v_avxTranspose8x8ntw_b 0.002378 0.00000  test
           v_avxTranspose4x16ntw 0.000596 0.00000  choice

                 FPU opt folding 0.001725 0.00000  test
                  AK SSE folding 0.000381 0.00000  test
                  BH SSE folding 0.000369 0.00000  test
                  JS AVX folding 0.000345 0.00000  test
                  JS AVX folding 0.000345 0.00000  choice

                   Test duration     2.36 seconds

Ftst_v7 completed successfully.


2600k@stock w7sp1 ...

Ftst_v7_J34 started.

Optimal function choices:
--------------------------------------------------------
                            name   timing   error
--------------------------------------------------------
                v_BaseLineSmooth (no other)

              v_GetPowerSpectrum 0.000114 0.00000  test
             v_vGetPowerSpectrum 0.000057 0.00000  test
            v_vGetPowerSpectrum2 0.000069 0.00000  test
     v_vGetPowerSpectrumUnrolled 0.000053 0.00000  test
    v_vGetPowerSpectrumUnrolled2 0.000072 0.00000  test
           v_avxGetPowerSpectrum 0.000047 0.00000  test
           v_avxGetPowerSpectrum 0.000047 0.00000  choice

                     v_ChirpData 0.004194 0.00000  test
                   fpu_ChirpData 0.011392 0.00000  test
               fpu_opt_ChirpData 0.004141 0.00000  test
             v_vChirpData_x86_64 0.054739 0.00000  test
               sse1_ChirpData_ak 0.006759 0.00000  test
               sse2_ChirpData_ak 0.006369 0.00000  test
              sse2_ChirpData_ak8 0.004255 0.00000  test
               sse3_ChirpData_ak 0.006217 0.00000  test
                 avx_ChirpData_a 0.001947 0.00000  test
                 avx_ChirpData_b 0.002183 0.00000  test
                 avx_ChirpData_a 0.001947 0.00000  choice

                     v_Transpose 0.002886 0.00000  test
                    v_Transpose2 0.003163 0.00000  test
                    v_Transpose4 0.001607 0.00000  test
                    v_Transpose8 0.002988 0.00000  test
                  v_pfTranspose2 0.001688 0.00000  test
                  v_pfTranspose4 0.001670 0.00000  test
                  v_pfTranspose8 0.002975 0.00000  test
                   v_vTranspose4 0.000893 0.00000  test
                 v_vTranspose4np 0.001229 0.00000  test
                v_vTranspose4ntw 0.007519 0.00000  test
              v_vTranspose4x8ntw 0.003064 0.00000  test
             v_vTranspose4x16ntw 0.000866 0.00000  test
            v_vpfTranspose8x4ntw 0.007394 0.00000  test
            v_avxTranspose4x8ntw 0.003049 0.00000  test
           v_avxTranspose4x16ntw 0.000716 0.00000  test
            v_avxTranspose8x4ntw 0.007558 0.00000  test
          v_avxTranspose8x8ntw_a 0.002434 0.00000  test
          v_avxTranspose8x8ntw_b 0.002864 0.00000  test
           v_avxTranspose4x16ntw 0.000716 0.00000  choice

                 FPU opt folding 0.002207 0.00000  test
                  AK SSE folding 0.000487 0.00000  test
                  BH SSE folding 0.000472 0.00000  test
                  JS AVX folding 0.000438 0.00000  test
                  JS AVX folding 0.000438 0.00000  choice

                   Test duration     2.96 seconds

Ftst_v7 completed successfully.

Orioneti

Joined: 22 Oct 07
Posts: 21
Credit: 23,642,634
RAC: 0
Finland
Message 1105621 - Posted: 13 May 2011, 5:21:41 UTC - in response to Message 1105269.  

2600k@4488Mhz w7sp1 ...

Ftst_v7_J39 started.

Optimal function choices:
--------------------------------------------------------
                            name   timing   error
--------------------------------------------------------
                v_BaseLineSmooth (no other)

              v_GetPowerSpectrum 0.000089 0.00000  test
             v_vGetPowerSpectrum 0.000044 0.00000  test
            v_vGetPowerSpectrum2 0.000053 0.00000  test
     v_vGetPowerSpectrumUnrolled 0.000041 0.00000  test
    v_vGetPowerSpectrumUnrolled2 0.000056 0.00000  test
           v_avxGetPowerSpectrum 0.000036 0.00000  test
           v_avxGetPowerSpectrum 0.000036 0.00000  choice

                     v_ChirpData 0.003722 0.00000  test
                   fpu_ChirpData 0.008986 0.00000  test
               fpu_opt_ChirpData 0.003729 0.00000  test
             v_vChirpData_x86_64 0.042929 0.00000  test
               sse1_ChirpData_ak 0.005384 0.00000  test
               sse2_ChirpData_ak 0.004980 0.00000  test
              sse2_ChirpData_ak8 0.003337 0.00000  test
               sse3_ChirpData_ak 0.004873 0.00000  test
                 avx_ChirpData_a 0.001528 0.00000  test
                 avx_ChirpData_b 0.001712 0.00000  test
                 avx_ChirpData_c 0.001542 0.00000  test
                 avx_ChirpData_a 0.001528 0.00000  choice

                     v_Transpose 0.002273 0.00000  test
                    v_Transpose2 0.002478 0.00000  test
                    v_Transpose4 0.001262 0.00000  test
                    v_Transpose8 0.002308 0.00000  test
                  v_pfTranspose2 0.001353 0.00000  test
                  v_pfTranspose4 0.001261 0.00000  test
                  v_pfTranspose8 0.002629 0.00000  test
                   v_vTranspose4 0.000732 0.00000  test
                 v_vTranspose4np 0.000966 0.00000  test
                v_vTranspose4ntw 0.005996 0.00000  test
              v_vTranspose4x8ntw 0.002518 0.00000  test
             v_vTranspose4x16ntw 0.000711 0.00000  test
            v_vpfTranspose8x4ntw 0.006027 0.00000  test
            v_avxTranspose4x8ntw 0.002516 0.00000  test
           v_avxTranspose4x16ntw 0.000599 0.00000  test
            v_avxTranspose8x4ntw 0.006136 0.00000  test
          v_avxTranspose8x8ntw_a 0.002010 0.00000  test
          v_avxTranspose8x8ntw_b 0.002358 0.00000  test
           v_avxTranspose4x16ntw 0.000599 0.00000  choice

                 FPU opt folding 0.001726 0.00000  test
                  AK SSE folding 0.000379 0.00000  test
                  BH SSE folding 0.000369 0.00000  test
                JS AVX_a folding 0.000327 0.00000  test
                JS_AVX_b folding 0.000426 0.00000  test
                JS AVX_a folding 0.000327 0.00000  choice

                   Test duration     2.48 seconds

Ftst_v7 completed successfully.


2600k@stock w7sp1 ...

Ftst_v7_J39 started.

Optimal function choices:
--------------------------------------------------------
                            name   timing   error
--------------------------------------------------------
                v_BaseLineSmooth (no other)

              v_GetPowerSpectrum 0.000114 0.00000  test
             v_vGetPowerSpectrum 0.000057 0.00000  test
            v_vGetPowerSpectrum2 0.000068 0.00000  test
     v_vGetPowerSpectrumUnrolled 0.000053 0.00000  test
    v_vGetPowerSpectrumUnrolled2 0.000072 0.00000  test
           v_avxGetPowerSpectrum 0.000047 0.00000  test
           v_avxGetPowerSpectrum 0.000047 0.00000  choice

                     v_ChirpData 0.004161 0.00000  test
                   fpu_ChirpData 0.011387 0.00000  test
               fpu_opt_ChirpData 0.004181 0.00000  test
             v_vChirpData_x86_64 0.055388 0.00000  test
               sse1_ChirpData_ak 0.006907 0.00000  test
               sse2_ChirpData_ak 0.006409 0.00000  test
              sse2_ChirpData_ak8 0.004268 0.00000  test
               sse3_ChirpData_ak 0.006316 0.00000  test
                 avx_ChirpData_a 0.001956 0.00000  test
                 avx_ChirpData_b 0.002206 0.00000  test
                 avx_ChirpData_c 0.001965 0.00000  test
                 avx_ChirpData_a 0.001956 0.00000  choice

                     v_Transpose 0.002908 0.00000  test
                    v_Transpose2 0.003267 0.00000  test
                    v_Transpose4 0.001625 0.00000  test
                    v_Transpose8 0.002959 0.00000  test
                  v_pfTranspose2 0.001712 0.00000  test
                  v_pfTranspose4 0.001617 0.00000  test
                  v_pfTranspose8 0.003370 0.00000  test
                   v_vTranspose4 0.000914 0.00000  test
                 v_vTranspose4np 0.001253 0.00000  test
                v_vTranspose4ntw 0.007542 0.00000  test
              v_vTranspose4x8ntw 0.003071 0.00000  test
             v_vTranspose4x16ntw 0.000879 0.00000  test
            v_vpfTranspose8x4ntw 0.007471 0.00000  test
            v_avxTranspose4x8ntw 0.003083 0.00000  test
           v_avxTranspose4x16ntw 0.000727 0.00000  test
            v_avxTranspose8x4ntw 0.007655 0.00000  test
          v_avxTranspose8x8ntw_a 0.002489 0.00000  test
          v_avxTranspose8x8ntw_b 0.002885 0.00000  test
           v_avxTranspose4x16ntw 0.000727 0.00000  choice

                 FPU opt folding 0.002209 0.00000  test
                  AK SSE folding 0.000485 0.00000  test
                  BH SSE folding 0.000473 0.00000  test
                JS AVX_a folding 0.000419 0.00000  test
                JS_AVX_b folding 0.000544 0.00000  test
                JS AVX_a folding 0.000419 0.00000  choice

                   Test duration     3.13 seconds

Ftst_v7 completed successfully.
Josef W. Segur
Volunteer developer
Volunteer tester

Joined: 30 Oct 99
Posts: 4504
Credit: 1,414,761
RAC: 0
United States
Message 1105680 - Posted: 13 May 2011, 13:12:16 UTC - in response to Message 1105621.  

Thanks for testing, even though I ran out of energy before noting here that a new version was attached to http://lunatics.kwsn.net/1-discussion-forum/avx-optimized-app-development.msg37595.html#msg37595.

Although my attempts to improve speed actually degraded it, that indicates the kind of adjustments I made do affect speed. The compiler didn't do what I expected, so I'll see if I can tell it what I want more clearly.
                                                                  Joe
Stewart

Joined: 28 Aug 07
Posts: 4
Credit: 829,029
RAC: 0
United States
Message 1105798 - Posted: 13 May 2011, 21:01:34 UTC

i5-2500K at 4.4GHz, Win7 SP1
=========================================================
Ftst_v7_J39 started.

Optimal function choices:
--------------------------------------------------------
                            name   timing   error
--------------------------------------------------------
                v_BaseLineSmooth (no other)

              v_GetPowerSpectrum 0.000091 0.00000  test
             v_vGetPowerSpectrum 0.000045 0.00000  test
            v_vGetPowerSpectrum2 0.000054 0.00000  test
     v_vGetPowerSpectrumUnrolled 0.000042 0.00000  test
    v_vGetPowerSpectrumUnrolled2 0.000057 0.00000  test
           v_avxGetPowerSpectrum 0.000037 0.00000  test
           v_avxGetPowerSpectrum 0.000037 0.00000  choice

                     v_ChirpData 0.003719 0.00000  test
                   fpu_ChirpData 0.009117 0.00000  test
               fpu_opt_ChirpData 0.003694 0.00000  test
             v_vChirpData_x86_64 0.043880 0.00000  test
               sse1_ChirpData_ak 0.005113 0.00000  test
               sse2_ChirpData_ak 0.004948 0.00000  test
              sse2_ChirpData_ak8 0.003158 0.00000  test
               sse3_ChirpData_ak 0.004858 0.00000  test
                 avx_ChirpData_a 0.001557 0.00000  test
                 avx_ChirpData_b 0.001550 0.00000  test
                 avx_ChirpData_c 0.001571 0.00000  test
                 avx_ChirpData_b 0.001550 0.00000  choice

                     v_Transpose 0.002614 0.00000  test
                    v_Transpose2 0.002679 0.00000  test
                    v_Transpose4 0.001356 0.00000  test
                    v_Transpose8 0.002427 0.00000  test
                  v_pfTranspose2 0.001651 0.00000  test
                  v_pfTranspose4 0.001478 0.00000  test
                  v_pfTranspose8 0.002924 0.00000  test
                   v_vTranspose4 0.000895 0.00000  test
                 v_vTranspose4np 0.001058 0.00000  test
                v_vTranspose4ntw 0.006109 0.00000  test
              v_vTranspose4x8ntw 0.002570 0.00000  test
             v_vTranspose4x16ntw 0.000794 0.00000  test
            v_vpfTranspose8x4ntw 0.006116 0.00000  test
            v_avxTranspose4x8ntw 0.002528 0.00000  test
           v_avxTranspose4x16ntw 0.000682 0.00000  test
            v_avxTranspose8x4ntw 0.006140 0.00000  test
          v_avxTranspose8x8ntw_a 0.002067 0.00000  test
          v_avxTranspose8x8ntw_b 0.002383 0.00000  test
           v_avxTranspose4x16ntw 0.000682 0.00000  choice

                 FPU opt folding 0.001762 0.00000  test
                  AK SSE folding 0.000387 0.00000  test
                  BH SSE folding 0.000377 0.00000  test
                JS AVX_a folding 0.000333 0.00000  test
                JS_AVX_b folding 0.000433 0.00000  test
                JS AVX_a folding 0.000333 0.00000  choice

                   Test duration     2.55 seconds

Ftst_v7 completed successfully.
ID: 1105798
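
For anyone comparing these logs by hand, here is a minimal Python sketch of how one block of the output can be read. It is not part of Ftst_v7, and the selection rule it uses (the fastest "test" row whose error is zero) is only an assumption inferred from the logs in this thread; the real harness may decide differently. The names ROW, SAMPLE, parse_rows and pick_fastest are hypothetical, and SAMPLE is copied from the folding block of the J39 run above.

```python
import re

# One result row: name, timing, error, then the tag "test" or "choice".
ROW = re.compile(r"^\s*(.+?)\s+(\d+\.\d+)\s+(\d+\.\d+)\s+(test|choice)\s*$")

# Folding block copied from the J39 log posted above.
SAMPLE = """\
                 FPU opt folding 0.001762 0.00000  test
                  AK SSE folding 0.000387 0.00000  test
                  BH SSE folding 0.000377 0.00000  test
                JS AVX_a folding 0.000333 0.00000  test
                JS_AVX_b folding 0.000433 0.00000  test
                JS AVX_a folding 0.000333 0.00000  choice
"""


def parse_rows(text):
    """Return (name, timing, error, tag) tuples for every result row."""
    rows = []
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            rows.append((m.group(1), float(m.group(2)), float(m.group(3)), m.group(4)))
    return rows


def pick_fastest(rows, max_error=0.0):
    """Assumed rule: fastest "test" row whose error does not exceed max_error."""
    ok = [r for r in rows if r[3] == "test" and r[2] <= max_error]
    return min(ok, key=lambda r: r[1])


rows = parse_rows(SAMPLE)
best = pick_fastest(rows)
reported = next(r for r in rows if r[3] == "choice")
print(best[0], best[1], "matches reported choice:", best[0] == reported[0])
```

Pasting a different block into SAMPLE checks whether the same rule reproduces the "choice" row of that block.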
Josef W. Segur
Volunteer developer
Volunteer tester

Joined: 30 Oct 99
Posts: 4504
Credit: 1,414,761
RAC: 0
United States
Message 1106303 - Posted: 15 May 2011, 1:17:25 UTC

Thanks, all, for the testing. The next version (J40) is attached to http://lunatics.kwsn.net/1-discussion-forum/avx-optimized-app-development.msg37635.html#msg37635.
                                                                   Joe
ID: 1106303
Orioneti

Joined: 22 Oct 07
Posts: 21
Credit: 23,642,634
RAC: 0
Finland
Message 1106396 - Posted: 15 May 2011, 7:30:25 UTC - in response to Message 1106303.  

2600k@4488Mhz w7sp1 ...


Ftst_v7_J40 started.

Optimal function choices:
--------------------------------------------------------
                            name   timing   error
--------------------------------------------------------
                v_BaseLineSmooth (no other)

              v_GetPowerSpectrum 0.000089 0.00000  test
             v_vGetPowerSpectrum 0.000044 0.00000  test
            v_vGetPowerSpectrum2 0.000053 0.00000  test
     v_vGetPowerSpectrumUnrolled 0.000041 0.00000  test
    v_vGetPowerSpectrumUnrolled2 0.000056 0.00000  test
           v_avxGetPowerSpectrum 0.000036 0.00000  test
           v_avxGetPowerSpectrum 0.000036 0.00000  choice

                     v_ChirpData 0.003966 0.00000  test
                   fpu_ChirpData 0.008931 0.00000  test
               fpu_opt_ChirpData 0.003708 0.00000  test
             v_vChirpData_x86_64 0.042818 0.00000  test
               sse1_ChirpData_ak 0.005287 0.00000  test
             sse1_ChirpData_ak8e 0.004280 0.00000  test
             sse1_ChirpData_ak8h 0.004478 0.00000  test
               sse2_ChirpData_ak 0.004971 0.00000  test
              sse2_ChirpData_ak8 0.003352 0.00000  test
               sse3_ChirpData_ak 0.004857 0.00000  test
              sse3_ChirpData_ak8 0.003274 0.00000  test
                 avx_ChirpData_a 0.001561 0.00000  test
                 avx_ChirpData_b 0.001711 0.00000  test
                 avx_ChirpData_c 0.001543 0.00000  test
                 avx_ChirpData_c 0.001543 0.00000  choice

                     v_Transpose 0.002278 0.00000  test
                    v_Transpose2 0.002478 0.00000  test
                    v_Transpose4 0.001261 0.00000  test
                    v_Transpose8 0.002316 0.00000  test
                  v_pfTranspose2 0.001361 0.00000  test
                  v_pfTranspose4 0.001301 0.00000  test
                  v_pfTranspose8 0.002334 0.00000  test
                   v_vTranspose4 0.000737 0.00000  test
                 v_vTranspose4np 0.000965 0.00000  test
                v_vTranspose4ntw 0.006016 0.00000  test
              v_vTranspose4x8ntw 0.002498 0.00000  test
             v_vTranspose4x16ntw 0.000707 0.00000  test
            v_vpfTranspose8x4ntw 0.005960 0.00000  test
            v_avxTranspose4x8ntw 0.002493 0.00000  test
           v_avxTranspose4x16ntw 0.000603 0.00000  test
            v_avxTranspose8x4ntw 0.006124 0.00000  test
          v_avxTranspose8x8ntw_a 0.002015 0.00000  test
          v_avxTranspose8x8ntw_b 0.002370 0.00000  test
           v_avxTranspose4x16ntw 0.000603 0.00000  choice

                 FPU opt folding 0.001728 0.00000  test
                  AK SSE folding 0.000380 0.00000  test
                  BH SSE folding 0.000370 0.00000  test
                JS AVX_a folding 0.000328 0.00000  test
                JS_AVX_b folding 0.000358 0.00000  test
                JS AVX_a folding 0.000328 0.00000  choice

                   Test duration     2.89 seconds

Ftst_v7 completed successfully.



2600k@stock w7sp1 ...


Ftst_v7_J40 started.

Optimal function choices:
--------------------------------------------------------
                            name   timing   error
--------------------------------------------------------
                v_BaseLineSmooth (no other)

              v_GetPowerSpectrum 0.000114 0.00000  test
             v_vGetPowerSpectrum 0.000057 0.00000  test
            v_vGetPowerSpectrum2 0.000069 0.00000  test
     v_vGetPowerSpectrumUnrolled 0.000053 0.00000  test
    v_vGetPowerSpectrumUnrolled2 0.000072 0.00000  test
           v_avxGetPowerSpectrum 0.000047 0.00000  test
           v_avxGetPowerSpectrum 0.000047 0.00000  choice

                     v_ChirpData 0.004156 0.00000  test
                   fpu_ChirpData 0.011398 0.00000  test
               fpu_opt_ChirpData 0.004134 0.00000  test
             v_vChirpData_x86_64 0.054842 0.00000  test
               sse1_ChirpData_ak 0.006761 0.00000  test
             sse1_ChirpData_ak8e 0.005481 0.00000  test
             sse1_ChirpData_ak8h 0.005724 0.00000  test
               sse2_ChirpData_ak 0.006361 0.00000  test
              sse2_ChirpData_ak8 0.004275 0.00000  test
               sse3_ChirpData_ak 0.006209 0.00000  test
              sse3_ChirpData_ak8 0.004138 0.00000  test
                 avx_ChirpData_a 0.001948 0.00000  test
                 avx_ChirpData_b 0.002182 0.00000  test
                 avx_ChirpData_c 0.001962 0.00000  test
                 avx_ChirpData_a 0.001948 0.00000  choice

                     v_Transpose 0.002902 0.00000  test
                    v_Transpose2 0.003234 0.00000  test
                    v_Transpose4 0.001617 0.00000  test
                    v_Transpose8 0.002946 0.00000  test
                  v_pfTranspose2 0.001693 0.00000  test
                  v_pfTranspose4 0.001663 0.00000  test
                  v_pfTranspose8 0.002976 0.00000  test
                   v_vTranspose4 0.000900 0.00000  test
                 v_vTranspose4np 0.001248 0.00000  test
                v_vTranspose4ntw 0.007557 0.00000  test
              v_vTranspose4x8ntw 0.003052 0.00000  test
             v_vTranspose4x16ntw 0.000882 0.00000  test
            v_vpfTranspose8x4ntw 0.007416 0.00000  test
            v_avxTranspose4x8ntw 0.003063 0.00000  test
           v_avxTranspose4x16ntw 0.000731 0.00000  test
            v_avxTranspose8x4ntw 0.007635 0.00000  test
          v_avxTranspose8x8ntw_a 0.002455 0.00000  test
          v_avxTranspose8x8ntw_b 0.002884 0.00000  test
           v_avxTranspose4x16ntw 0.000731 0.00000  choice

                 FPU opt folding 0.002206 0.00000  test
                  AK SSE folding 0.000486 0.00000  test
                  BH SSE folding 0.000473 0.00000  test
                JS AVX_a folding 0.000421 0.00000  test
                JS_AVX_b folding 0.000458 0.00000  test
                JS AVX_a folding 0.000421 0.00000  choice

                   Test duration     3.63 seconds

Ftst_v7 completed successfully.
ID: 1106396
Stewart

Joined: 28 Aug 07
Posts: 4
Credit: 829,029
RAC: 0
United States
Message 1106898 - Posted: 16 May 2011, 23:29:32 UTC

i5-2500K at 4.4GHz, Win7 SP1
=========================================================
Ftst_v7_J40 started.

Optimal function choices:
--------------------------------------------------------
                            name   timing   error
--------------------------------------------------------
                v_BaseLineSmooth (no other)

              v_GetPowerSpectrum 0.000091 0.00000  test
             v_vGetPowerSpectrum 0.000045 0.00000  test
            v_vGetPowerSpectrum2 0.000054 0.00000  test
     v_vGetPowerSpectrumUnrolled 0.000042 0.00000  test
    v_vGetPowerSpectrumUnrolled2 0.000057 0.00000  test
           v_avxGetPowerSpectrum 0.000037 0.00000  test
           v_avxGetPowerSpectrum 0.000037 0.00000  choice

                     v_ChirpData 0.003716 0.00000  test
                   fpu_ChirpData 0.009133 0.00000  test
               fpu_opt_ChirpData 0.003701 0.00000  test
             v_vChirpData_x86_64 0.043767 0.00000  test
               sse1_ChirpData_ak 0.005115 0.00000  test
             sse1_ChirpData_ak8e 0.004207 0.00000  test
             sse1_ChirpData_ak8h 0.004308 0.00000  test
               sse2_ChirpData_ak 0.004971 0.00000  test
              sse2_ChirpData_ak8 0.003152 0.00000  test
               sse3_ChirpData_ak 0.004826 0.00000  test
              sse3_ChirpData_ak8 0.003110 0.00000  test
                 avx_ChirpData_a 0.001555 0.00000  test
                 avx_ChirpData_b 0.001556 0.00000  test
                 avx_ChirpData_c 0.001572 0.00000  test
                 avx_ChirpData_a 0.001555 0.00000  choice

                     v_Transpose 0.002477 0.00000  test
                    v_Transpose2 0.002631 0.00000  test
                    v_Transpose4 0.001317 0.00000  test
                    v_Transpose8 0.002415 0.00000  test
                  v_pfTranspose2 0.001603 0.00000  test
                  v_pfTranspose4 0.001496 0.00000  test
                  v_pfTranspose8 0.002643 0.00000  test
                   v_vTranspose4 0.000872 0.00000  test
                 v_vTranspose4np 0.001049 0.00000  test
                v_vTranspose4ntw 0.006032 0.00000  test
              v_vTranspose4x8ntw 0.002527 0.00000  test
             v_vTranspose4x16ntw 0.000753 0.00000  test
            v_vpfTranspose8x4ntw 0.005966 0.00000  test
            v_avxTranspose4x8ntw 0.002506 0.00000  test
           v_avxTranspose4x16ntw 0.000663 0.00000  test
            v_avxTranspose8x4ntw 0.006207 0.00000  test
          v_avxTranspose8x8ntw_a 0.002072 0.00000  test
          v_avxTranspose8x8ntw_b 0.002374 0.00000  test
           v_avxTranspose4x16ntw 0.000663 0.00000  choice

                 FPU opt folding 0.001762 0.00000  test
                  AK SSE folding 0.000388 0.00000  test
                  BH SSE folding 0.000377 0.00000  test
                JS AVX_a folding 0.000336 0.00000  test
                JS_AVX_b folding 0.000366 0.00000  test
                JS AVX_a folding 0.000336 0.00000  choice

                   Test duration     2.94 seconds

Ftst_v7 completed successfully.
ID: 1106898
Josef W. Segur
Volunteer developer
Volunteer tester

Joined: 30 Oct 99
Posts: 4504
Credit: 1,414,761
RAC: 0
United States
Message 1107138 - Posted: 18 May 2011, 1:28:04 UTC

Thanks for the testing. I'm at least getting a better idea of what approaches aren't effective. Another new version is attached to http://lunatics.kwsn.net/1-discussion-forum/avx-optimized-app-development.msg37707.html#msg37707.
                                                                  Joe
ID: 1107138
Stewart

Joined: 28 Aug 07
Posts: 4
Credit: 829,029
RAC: 0
United States
Message 1107247 - Posted: 18 May 2011, 10:35:05 UTC

i5-2500K at 4.4GHz, Win7 SP1
=========================================================
Ftst_v7_J43 started.

Optimal function choices:
--------------------------------------------------------
                            name   timing   error
--------------------------------------------------------
                v_BaseLineSmooth (no other)

              v_GetPowerSpectrum 0.000091 0.00000  test
             v_vGetPowerSpectrum 0.000045 0.00000  test
            v_vGetPowerSpectrum2 0.000054 0.00000  test
     v_vGetPowerSpectrumUnrolled 0.000042 0.00000  test
    v_vGetPowerSpectrumUnrolled2 0.000057 0.00000  test
           v_avxGetPowerSpectrum 0.000037 0.00000  test
           v_avxGetPowerSpectrum 0.000037 0.00000  choice

                     v_ChirpData 0.003716 0.00000  test
                   fpu_ChirpData 0.009111 0.00000  test
               fpu_opt_ChirpData 0.003707 0.00000  test
             v_vChirpData_x86_64 0.043837 0.00000  test
               sse1_ChirpData_ak 0.005140 0.00000  test
             sse1_ChirpData_ak8e 0.004165 0.00000  test
             sse1_ChirpData_ak8h 0.004342 0.00000  test
               sse2_ChirpData_ak 0.004914 0.00000  test
              sse2_ChirpData_ak8 0.003157 0.00000  test
               sse3_ChirpData_ak 0.004859 0.00000  test
              sse3_ChirpData_ak8 0.003101 0.00000  test
                 avx_ChirpData_a 0.001554 0.00000  test
                 avx_ChirpData_b 0.001532 0.00000  test
                 avx_ChirpData_c 0.001571 0.00000  test
                 avx_ChirpData_b 0.001532 0.00000  choice

                     v_Transpose 0.002620 0.00000  test
                    v_Transpose2 0.002638 0.00000  test
                    v_Transpose4 0.001350 0.00000  test
                    v_Transpose8 0.002489 0.00000  test
                  v_pfTranspose2 0.001674 0.00000  test
                  v_pfTranspose4 0.001522 0.00000  test
                  v_pfTranspose8 0.002960 0.00000  test
                   v_vTranspose4 0.000899 0.00000  test
                 v_vTranspose4np 0.001079 0.00000  test
                v_vTranspose4ntw 0.006218 0.00000  test
              v_vTranspose4x8ntw 0.002560 0.00000  test
             v_vTranspose4x16ntw 0.000766 0.00000  test
            v_vpfTranspose8x4ntw 0.006250 0.00000  test
            v_avxTranspose4x8ntw 0.002538 0.00000  test
           v_avxTranspose4x16ntw 0.000670 0.00000  test
            v_avxTranspose8x4ntw 0.006220 0.00000  test
          v_avxTranspose8x8ntw_a 0.002059 0.00000  test
          v_avxTranspose8x8ntw_b 0.002401 0.00000  test
           v_avxTranspose4x16ntw 0.000670 0.00000  choice

                 FPU opt folding 0.001762 0.00000  test
                  AK SSE folding 0.000391 0.00000  test
                  BH SSE folding 0.000377 0.00000  test
                JS AVX_a folding 0.000337 0.00000  test
                JS AVX_c folding 0.000330 0.00000  test
                JS AVX_c folding 0.000330 0.00000  choice

                   Test duration     2.95 seconds

Ftst_v7 completed successfully.
ID: 1107247
Orioneti

Joined: 22 Oct 07
Posts: 21
Credit: 23,642,634
RAC: 0
Finland
Message 1107277 - Posted: 18 May 2011, 14:51:41 UTC - in response to Message 1107138.  

2600k@4488Mhz w7sp1 ...

Ftst_v7_J43 started.

Optimal function choices:
--------------------------------------------------------
                            name   timing   error
--------------------------------------------------------
                v_BaseLineSmooth (no other)

              v_GetPowerSpectrum 0.000089 0.00000  test
             v_vGetPowerSpectrum 0.000044 0.00000  test
            v_vGetPowerSpectrum2 0.000054 0.00000  test
     v_vGetPowerSpectrumUnrolled 0.000041 0.00000  test
    v_vGetPowerSpectrumUnrolled2 0.000056 0.00000  test
           v_avxGetPowerSpectrum 0.000037 0.00000  test
           v_avxGetPowerSpectrum 0.000037 0.00000  choice

                     v_ChirpData 0.003728 0.00000  test
                   fpu_ChirpData 0.008926 0.00000  test
               fpu_opt_ChirpData 0.003711 0.00000  test
             v_vChirpData_x86_64 0.042929 0.00000  test
               sse1_ChirpData_ak 0.005311 0.00000  test
             sse1_ChirpData_ak8e 0.004280 0.00000  test
             sse1_ChirpData_ak8h 0.004433 0.00000  test
               sse2_ChirpData_ak 0.004987 0.00000  test
              sse2_ChirpData_ak8 0.003345 0.00000  test
               sse3_ChirpData_ak 0.004840 0.00000  test
              sse3_ChirpData_ak8 0.003259 0.00000  test
                 avx_ChirpData_a 0.001531 0.00000  test
                 avx_ChirpData_b 0.001707 0.00000  test
                 avx_ChirpData_c 0.001543 0.00000  test
                 avx_ChirpData_a 0.001531 0.00000  choice

                     v_Transpose 0.002286 0.00000  test
                    v_Transpose2 0.002533 0.00000  test
                    v_Transpose4 0.001261 0.00000  test
                    v_Transpose8 0.002323 0.00000  test
                  v_pfTranspose2 0.001373 0.00000  test
                  v_pfTranspose4 0.001263 0.00000  test
                  v_pfTranspose8 0.002634 0.00000  test
                   v_vTranspose4 0.000748 0.00000  test
                 v_vTranspose4np 0.000965 0.00000  test
                v_vTranspose4ntw 0.006022 0.00000  test
              v_vTranspose4x8ntw 0.002494 0.00000  test
             v_vTranspose4x16ntw 0.000700 0.00000  test
            v_vpfTranspose8x4ntw 0.005997 0.00000  test
            v_avxTranspose4x8ntw 0.002484 0.00000  test
           v_avxTranspose4x16ntw 0.000584 0.00000  test
            v_avxTranspose8x4ntw 0.006115 0.00000  test
          v_avxTranspose8x8ntw_a 0.001997 0.00000  test
          v_avxTranspose8x8ntw_b 0.002337 0.00000  test
           v_avxTranspose4x16ntw 0.000584 0.00000  choice

                 FPU opt folding 0.001727 0.00000  test
                  AK SSE folding 0.000383 0.00000  test
                  BH SSE folding 0.000369 0.00000  test
                JS AVX_a folding 0.000329 0.00000  test
                JS AVX_c folding 0.000325 0.00000  test
                JS AVX_c folding 0.000325 0.00000  choice

                   Test duration     2.89 seconds

Ftst_v7 completed successfully.



2600k@stock w7sp1 ...


Ftst_v7_J43 started.

Optimal function choices:
--------------------------------------------------------
                            name   timing   error
--------------------------------------------------------
                v_BaseLineSmooth (no other)

              v_GetPowerSpectrum 0.000114 0.00000  test
             v_vGetPowerSpectrum 0.000057 0.00000  test
            v_vGetPowerSpectrum2 0.000068 0.00000  test
     v_vGetPowerSpectrumUnrolled 0.000053 0.00000  test
    v_vGetPowerSpectrumUnrolled2 0.000072 0.00000  test
           v_avxGetPowerSpectrum 0.000047 0.00000  test
           v_avxGetPowerSpectrum 0.000047 0.00000  choice

                     v_ChirpData 0.004162 0.00000  test
                   fpu_ChirpData 0.011393 0.00000  test
               fpu_opt_ChirpData 0.004143 0.00000  test
             v_vChirpData_x86_64 0.055084 0.00000  test
               sse1_ChirpData_ak 0.006787 0.00000  test
             sse1_ChirpData_ak8e 0.005488 0.00000  test
             sse1_ChirpData_ak8h 0.005771 0.00000  test
               sse2_ChirpData_ak 0.006370 0.00000  test
              sse2_ChirpData_ak8 0.004276 0.00000  test
               sse3_ChirpData_ak 0.006209 0.00000  test
              sse3_ChirpData_ak8 0.004139 0.00000  test
                 avx_ChirpData_a 0.001947 0.00000  test
                 avx_ChirpData_b 0.002176 0.00000  test
                 avx_ChirpData_c 0.001964 0.00000  test
                 avx_ChirpData_a 0.001947 0.00000  choice

                     v_Transpose 0.002900 0.00000  test
                    v_Transpose2 0.003255 0.00000  test
                    v_Transpose4 0.001623 0.00000  test
                    v_Transpose8 0.002978 0.00000  test
                  v_pfTranspose2 0.001684 0.00000  test
                  v_pfTranspose4 0.001608 0.00000  test
                  v_pfTranspose8 0.003384 0.00000  test
                   v_vTranspose4 0.000887 0.00000  test
                 v_vTranspose4np 0.001248 0.00000  test
                v_vTranspose4ntw 0.007595 0.00000  test
              v_vTranspose4x8ntw 0.003087 0.00000  test
             v_vTranspose4x16ntw 0.000878 0.00000  test
            v_vpfTranspose8x4ntw 0.007408 0.00000  test
            v_avxTranspose4x8ntw 0.003043 0.00000  test
           v_avxTranspose4x16ntw 0.000721 0.00000  test
            v_avxTranspose8x4ntw 0.007547 0.00000  test
          v_avxTranspose8x8ntw_a 0.002446 0.00000  test
          v_avxTranspose8x8ntw_b 0.002860 0.00000  test
           v_avxTranspose4x16ntw 0.000721 0.00000  choice

                 FPU opt folding 0.002209 0.00000  test
                  AK SSE folding 0.000490 0.00000  test
                  BH SSE folding 0.000472 0.00000  test
                JS AVX_a folding 0.000415 0.00000  test
                JS AVX_c folding 0.000414 0.00000  test
                JS AVX_c folding 0.000414 0.00000  choice

                   Test duration     3.63 seconds

Ftst_v7 completed successfully.
ID: 1107277
Josef W. Segur
Volunteer developer
Volunteer tester

Joined: 30 Oct 99
Posts: 4504
Credit: 1,414,761
RAC: 0
United States
Message 1108596 - Posted: 22 May 2011, 6:38:34 UTC

Again, thanks for the testing. There's another updated test version attached to http://lunatics.kwsn.net/1-discussion-forum/avx-optimized-app-development.msg37870.html#msg37870.
                                                                  Joe
ID: 1108596
