SIMD-level tailored Windows FFTW 3.3.4 DLL builds testing
(Joined: 16 Jun 01; Posts: 6325; Credit: 106,370,077; RAC: 121)
There are a few alternative FFTW DLL binaries available, built by Marco Franceschini (link at "my fftw 3.3.4 versions...."). It would be good to collect some speedup info on these binaries here.
Ulrich Metzner (Joined: 3 Jul 02; Posts: 1256; Credit: 13,565,513; RAC: 13)
The new thread is a good idea! I can't contribute benchmark results, but I've been using Marco's libraries successfully since they became available. There have been no errors and no invalids since I started using them, although a lot of "timed out" results will show up at the end of February, because the first attempt at SSE 4.1 had a compile fault that trashed ~150 SETI V8 WUs in the blink of an eye on my main rig, a Core2Quad. ;) Aloha, Uli
(Joined: 17 Feb 01; Posts: 34504; Credit: 79,922,639; RAC: 80)
I wanted to test it anyway. Maybe at the weekend, when I feel a little better.
With each crime and every kindness we birth our future.
(Joined: 16 Jun 01; Posts: 6325; Credit: 106,370,077; RAC: 121)
APU idle
CPU: Number of processors 1, Number of cores 4 (max 4), Specification AMD A10-5700 APU with Radeon(tm) HD Graphics, Codename Trinity, Core Stepping TN-A1, Technology 32 nm, Stock frequency 3400 MHz
------------
Chipset: Northbridge AMD A55/A60M FCH rev. 00, Southbridge AMD A55/A60M rev. 11
------------
RAM: Memory Type DDR3, Memory Size 8192 MBytes, Max bandwidth PC3-10700 (667 MHz), CAS# latency (CL) 9.0, RAS# Precharge (tRP) 9, Cycle Time (tRAS) 24
------------
OS: Windows Version Microsoft Windows Server 2008 64-bit Service Pack 2 (Build 6002)

default run (both ref and test with default FFTW lib):
WU : PG0009_v8.wu
MB8_win_x64_SSE3_VS2008_r3330.exe -verb -nog : Elapsed 433.276 secs CPU 432.123 secs
MB8_win_x64_SSE3_VS2008_r3330.exe : Elapsed 427.865 secs, speedup: 1.25% ratio: 1.01x CPU 428.691 secs, speedup: 0.79% ratio: 1.01x
WU : PG0395_v8.wu
MB8_win_x64_SSE3_VS2008_r3330.exe -verb -nog : Elapsed 468.985 secs CPU 470.125 secs
MB8_win_x64_SSE3_VS2008_r3330.exe : Elapsed 467.368 secs, speedup: 0.34% ratio: 1.00x CPU 466.802 secs, speedup: 0.71% ratio: 1.01x
WU : PG0444_v8.wu
MB8_win_x64_SSE3_VS2008_r3330.exe -verb -nog : Elapsed 443.546 secs CPU 444.119 secs
MB8_win_x64_SSE3_VS2008_r3330.exe : Elapsed 446.022 secs, speedup: -0.56% ratio: 0.99x CPU 445.991 secs, speedup: -0.42% ratio: 1.00x
WU : PG1327_v8.wu
MB8_win_x64_SSE3_VS2008_r3330.exe -verb -nog : Elapsed 440.019 secs CPU 440.531 secs
MB8_win_x64_SSE3_VS2008_r3330.exe : Elapsed 440.230 secs, speedup: -0.05% ratio: 1.00x CPU 439.720 secs, speedup: 0.18% ratio: 1.00x
------------
test with SSSE3 x64 FFTW lib:
WU : PG0009_v8.wu
MB8_win_x64_SSE3_VS2008_r3330.exe -verb -nog : Elapsed 433.276 secs CPU 432.123 secs
MB8_win_x64_SSE3_VS2008_r3330.exe : Elapsed 425.595 secs, speedup: 1.77% ratio: 1.02x CPU 424.354 secs, speedup: 1.80% ratio: 1.02x
WU : PG0395_v8.wu
MB8_win_x64_SSE3_VS2008_r3330.exe -verb -nog : Elapsed 468.985 secs CPU 470.125 secs
MB8_win_x64_SSE3_VS2008_r3330.exe : Elapsed 465.732 secs, speedup: 0.69% ratio: 1.01x CPU 466.022 secs, speedup: 0.87% ratio: 1.01x
WU : PG0444_v8.wu
MB8_win_x64_SSE3_VS2008_r3330.exe -verb -nog : Elapsed 443.546 secs CPU 444.119 secs
MB8_win_x64_SSE3_VS2008_r3330.exe : Elapsed 438.575 secs, speedup: 1.12% ratio: 1.01x CPU 437.052 secs, speedup: 1.59% ratio: 1.02x
WU : PG1327_v8.wu
MB8_win_x64_SSE3_VS2008_r3330.exe -verb -nog : Elapsed 440.019 secs CPU 440.531 secs
MB8_win_x64_SSE3_VS2008_r3330.exe : Elapsed 432.328 secs, speedup: 1.75% ratio: 1.02x CPU 431.639 secs, speedup: 2.02% ratio: 1.02x

AVX and AVX2 failed on this host due to incompatibility.
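The bench output reports each test run as a "speedup" percentage and a "ratio". Judging from the posted numbers (for example PG0009_v8.wu with the SSSE3 lib: 433.276 s reference, 425.595 s test, reported as 1.77% and 1.02x), speedup appears to be the time saved relative to the reference run, and ratio the reference time divided by the test time. The Python sketch below reproduces those two figures; the formulas are inferred from the posted output rather than taken from the bench tool itself, and `bench_figures` is a hypothetical helper name.

```python
def bench_figures(ref_secs: float, test_secs: float) -> tuple[float, float]:
    """Return (speedup in percent, ratio) for one reference/test pair.

    Assumed definitions, inferred from the posted bench output:
      speedup = time saved relative to the reference run, in percent
      ratio   = reference time divided by test time
    """
    if ref_secs <= 0 or test_secs <= 0:
        raise ValueError("run times must be positive")
    speedup = (ref_secs - test_secs) / ref_secs * 100.0
    ratio = ref_secs / test_secs
    return speedup, ratio


if __name__ == "__main__":
    # PG0009_v8.wu elapsed times from the SSSE3 x64 FFTW lib run above.
    speedup, ratio = bench_figures(433.276, 425.595)
    print(f"speedup: {speedup:.2f}% ratio: {ratio:.2f}x")  # speedup: 1.77% ratio: 1.02x
```

A negative speedup (as for PG0444_v8.wu in the default run, -0.56% and 0.99x) simply means the test run was slower than the reference.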
Chris Adamek (Joined: 15 May 99; Posts: 251; Credit: 434,772,072; RAC: 236)
Where would one get the test WUs, and what method would you like us to use? Thanks, Chris
Chris Adamek (Joined: 15 May 99; Posts: 251; Credit: 434,772,072; RAC: 236)
And I assume there's no way to try this on Macs? Are the FFT functions built into the app for those? Thanks, Chris
Marco Franceschini (Joined: 4 Jul 01; Posts: 54; Credit: 69,877,354; RAC: 135)
Hi. My own FFTW 3.3.4 builds were cross-compiled under Ubuntu with gcc 5.2 and the maximum optimization switch (i.e. -O3). The AVX and AVX2 versions were built with --enable-fma to enable the fused multiply-add hardware in Haswell and newer architectures. Further speedup over the standard version built by Frigo could only be achieved with Intel compilers (used natively under Windows). Marco.
Marco Franceschini (Joined: 4 Jul 01; Posts: 54; Credit: 69,877,354; RAC: 135)
Quote: "And I assume there's no way to try this on Macs? Are the FFT functions built into the app for those?"
Hi Chris. The FFTW library can be compiled under OS X too: http://dasher.wustl.edu/ffe/distribution/fftw/0README
Marco.
Chris Adamek (Joined: 15 May 99; Posts: 251; Credit: 434,772,072; RAC: 236)
Great, thanks. The only other oddity is that the Mac apps don't seem to use these; at least it's not called out in the app_info file, and there's no existing version in the SETI folder. I'll start with making Mac versions, though. :) Thanks, Chris
Urs Echternacht (Joined: 15 May 99; Posts: 692; Credit: 135,197,781; RAC: 211)
Quote: "Great, thanks. The only other oddity is that the Mac apps don't seem to use these; at least it's not called out in the app_info file, and there's no existing version in the SETI folder. I'll start with making Mac versions, though. :)"
You can't find the FFTW lib for OS X because it is contained in the executable (statically linked in). The same applies to other *nix platforms, too.
_\|/_ U r s
(Joined: 16 Jun 01; Posts: 6325; Credit: 106,370,077; RAC: 121)
Quote: "Where would one get the test WUs, and what method would you like us to use?"
http://lunatics.kwsn.info/index.php?action=downloads;cat=5
Chris Adamek (Joined: 15 May 99; Posts: 251; Credit: 434,772,072; RAC: 236)
Quote: "You can't find the FFTW lib for OS X because it is contained in the executable (statically linked in). The same applies to other *nix platforms, too."
OK, that's what I thought. Thanks, Chris
(Joined: 29 Apr 01; Posts: 13164; Credit: 1,160,866,277; RAC: 1,873)
Didn't have much luck with the alternative FFTW libraries. I ran the AVX DLL on my AMD chip for about 3 hours with seemingly good results, until I noticed that the majority of tasks were getting "postponed because of impossible autocorr results", so I thought I had better try one of the other alternatives. I tried the SSE41 DLL, but unfortunately it trashed all 100 of my CPU tasks. To make matters worse, a few minutes later the SETI web servers went kaput and I couldn't get a replacement scheduler master fetch list, so I have only been able to crunch GPU work for the last half day. Now that the web servers are back online, I still haven't been able to get any MB8 work because of the errored-tasks penalty. I reverted to the stock DLL and have been able to pick up 1 MB7 task, so at least 1 of my 4 cores is working again. It looks like the alternative _X86 DLL is working fine for the AP7 OpenCL tasks, though. I don't know why things blew up. I never got a chance to try the SSSE3 DLL because I chickened out and reverted to stock.
Seti@Home classic workunits: 20,676 CPU time: 74,226 hours
A proud member of the OFA (Old Farts Association)
(Joined: 17 Feb 01; Posts: 34504; Credit: 79,922,639; RAC: 80)
Here is a bench on my FX CPU. All 3 tasks were running in parallel.

Stock lib
WU : new_reference.wu
setiathome_8.00_windows_intelx86.exe : Elapsed 3614.091 secs CPU 3601.985 secs
MB8_win_x64_AVX_VS2010_r3330.exe : Elapsed 1889.470 secs, speedup: 47.72% ratio: 1.91x CPU 1884.695 secs, speedup: 47.68% ratio: 1.91x
WU : PG0009_v7.wu
setiathome_8.00_windows_intelx86.exe : Elapsed 513.062 secs CPU 509.296 secs
MB8_win_x64_AVX_VS2010_r3330.exe : Elapsed 338.079 secs, speedup: 34.11% ratio: 1.52x CPU 335.605 secs, speedup: 34.10% ratio: 1.52x
WU : PG0395_v7.wu
setiathome_8.00_windows_intelx86.exe : Elapsed 681.873 secs CPU 678.214 secs
MB8_win_x64_AVX_VS2010_r3330.exe : Elapsed 370.011 secs, speedup: 45.74% ratio: 1.84x CPU 367.554 secs, speedup: 45.81% ratio: 1.85x
WU : PG0444_v7.wu
setiathome_8.00_windows_intelx86.exe : Elapsed 627.424 secs CPU 623.801 secs
MB8_win_x64_AVX_VS2010_r3330.exe : Elapsed 349.449 secs, speedup: 44.30% ratio: 1.80x CPU 346.868 secs, speedup: 44.39% ratio: 1.80x
WU : PG1327_v7.wu
setiathome_8.00_windows_intelx86.exe : Elapsed 503.851 secs CPU 499.843 secs
MB8_win_x64_AVX_VS2010_r3330.exe : Elapsed 387.807 secs, speedup: 23.03% ratio: 1.30x CPU 384.496 secs, speedup: 23.08% ratio: 1.30x

AVX.lib
WU : new_reference.wu
setiathome_8.00_windows_intelx86.exe -verb -nog : Elapsed 3548.817 secs CPU 3543.937 secs
MB8_win_x64_AVX_VS2010_r3330.exe : Elapsed 1866.859 secs, speedup: 47.39% ratio: 1.90x CPU 1862.278 secs, speedup: 47.45% ratio: 1.90x
WU : PG0009_v7.wu
setiathome_8.00_windows_intelx86.exe -verb -nog : Elapsed 513.062 secs CPU 509.296 secs
MB8_win_x64_AVX_VS2010_r3330.exe : Elapsed 343.498 secs, speedup: 33.05% ratio: 1.49x CPU 341.034 secs, speedup: 33.04% ratio: 1.49x
WU : PG0395_v7.wu
setiathome_8.00_windows_intelx86.exe -verb -nog : Elapsed 681.873 secs CPU 678.214 secs
MB8_win_x64_AVX_VS2010_r3330.exe : Elapsed 367.312 secs, speedup: 46.13% ratio: 1.86x CPU 364.824 secs, speedup: 46.21% ratio: 1.86x
WU : PG0444_v7.wu
setiathome_8.00_windows_intelx86.exe -verb -nog : Elapsed 627.424 secs CPU 623.801 secs
MB8_win_x64_AVX_VS2010_r3330.exe : Elapsed 349.259 secs, speedup: 44.33% ratio: 1.80x CPU 346.712 secs, speedup: 44.42% ratio: 1.80x
WU : PG1327_v7.wu
setiathome_8.00_windows_intelx86.exe -verb -nog : Elapsed 503.851 secs CPU 499.843 secs
MB8_win_x64_AVX_VS2010_r3330.exe : Elapsed 343.014 secs, speedup: 31.92% ratio: 1.47x CPU 340.363 secs, speedup: 31.91% ratio: 1.47x

SSSE3.lib
WU : new_reference.wu
setiathome_8.00_windows_intelx86.exe : Elapsed 3614.091 secs CPU 3601.985 secs
MB8_win_x64_AVX_VS2010_r3330.exe : Elapsed 1899.191 secs, speedup: 47.45% ratio: 1.90x CPU 1894.398 secs, speedup: 47.41% ratio: 1.90x
WU : PG0009_v7.wu
setiathome_8.00_windows_intelx86.exe : Elapsed 513.062 secs CPU 509.296 secs
MB8_win_x64_AVX_VS2010_r3330.exe : Elapsed 347.895 secs, speedup: 32.19% ratio: 1.47x CPU 345.371 secs, speedup: 32.19% ratio: 1.47x
WU : PG0395_v7.wu
setiathome_8.00_windows_intelx86.exe : Elapsed 681.873 secs CPU 678.214 secs
MB8_win_x64_AVX_VS2010_r3330.exe : Elapsed 378.366 secs, speedup: 44.51% ratio: 1.80x CPU 375.775 secs, speedup: 44.59% ratio: 1.80x
WU : PG0444_v7.wu
setiathome_8.00_windows_intelx86.exe : Elapsed 627.424 secs CPU 623.801 secs
MB8_win_x64_AVX_VS2010_r3330.exe : Elapsed 358.406 secs, speedup: 42.88% ratio: 1.75x CPU 355.791 secs, speedup: 42.96% ratio: 1.75x
WU : PG1327_v7.wu
setiathome_8.00_windows_intelx86.exe : Elapsed 503.851 secs CPU 499.843 secs
MB8_win_x64_AVX_VS2010_r3330.exe : Elapsed 388.962 secs, speedup: 22.80% ratio: 1.30x CPU 385.900 secs, speedup: 22.80% ratio: 1.30x
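Per-WU figures like these are easier to compare across the three libraries once summarized. A minimal sketch, assuming the five WUs are weighted equally and the three parallel runs are directly comparable: it takes the elapsed times posted above, computes the arithmetic mean of the per-WU speedups and the geometric mean of the per-WU ratios (the usual way to average ratios), and ranks the libraries. `RUNS`, `rank_libs` and the other names are illustrative only, not part of the bench tool, and differences of a percent or two between libraries may be within run-to-run noise.

```python
from statistics import geometric_mean, mean

# Elapsed seconds as (reference run, test run) per WU, copied from the FX bench
# above, in the order: new_reference, PG0009_v7, PG0395_v7, PG0444_v7, PG1327_v7.
RUNS = {
    "stock": [(3614.091, 1889.470), (513.062, 338.079), (681.873, 370.011),
              (627.424, 349.449), (503.851, 387.807)],
    "AVX": [(3548.817, 1866.859), (513.062, 343.498), (681.873, 367.312),
            (627.424, 349.259), (503.851, 343.014)],
    "SSSE3": [(3614.091, 1899.191), (513.062, 347.895), (681.873, 378.366),
              (627.424, 358.406), (503.851, 388.962)],
}


def speedups(pairs):
    """Per-WU speedup in percent: time saved relative to the reference run."""
    return [(ref - test) / ref * 100.0 for ref, test in pairs]


def ratios(pairs):
    """Per-WU ratio: reference time divided by test time."""
    return [ref / test for ref, test in pairs]


def rank_libs(runs):
    """Return (lib, geometric-mean ratio, mean speedup %) tuples, best ratio first."""
    summary = [
        (lib, geometric_mean(ratios(pairs)), mean(speedups(pairs)))
        for lib, pairs in runs.items()
    ]
    return sorted(summary, key=lambda row: row[1], reverse=True)


if __name__ == "__main__":
    for lib, gm_ratio, avg_speedup in rank_libs(RUNS):
        print(f"{lib:6s} geo-mean ratio {gm_ratio:.3f}x, mean speedup {avg_speedup:.2f}%")
```

On the posted data this ranks the AVX lib first, ahead of the stock lib and then the SSSE3 lib, which is consistent with the reading that the AVX lib has the advantage on this FX processor.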
(Joined: 29 Apr 01; Posts: 13164; Credit: 1,160,866,277; RAC: 1,873)
Thanks for posting results for both the AVX and SSSE3 libs, Mike. It seems the AVX one has the advantage on the FX processor. I have recovered from yesterday morning's stumbles and have MB8 work again. I'm also running the AVX app and DLL now with no apparent issues. I think my autocorr issues were caused by a too-high FSB clock combined with load-line calibration settings that weren't high enough to compensate for Vdroop when the CPU was loaded.
©2025 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.