SIMD-level tailored Windows FFTW 3.3.4 DLL builds testing

Message boards : Number crunching : SIMD-level tailored Windows FFTW 3.3.4 DLL builds testing

Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1766034 - Posted: 18 Feb 2016, 14:12:03 UTC

There are a few alternative FFTW DLL binaries available, built by Marco Franceschini:

Link to my FFTW 3.3.4 versions:

https://drive.google.com/open?id=0B9iU4E_jpim0X3ZyQmo3dUUxZFE

Whichever of the three versions (avx, avx2, ssse3) you use must be renamed to libfftw3f-3-3-4_x64.
Erase any old .wisdom file and the standard FFTW DLL.
The avx2 version was compiled with the --enable-fma switch (i.e. for Haswell FMA3 and newer).
Personally, I also use the optimized libfftw3f-3-3-4_x86 for the OpenCL GPU apps.

Marco.
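Marco's steps (pick one build, give it the stock file name, delete old wisdom) can be scripted. Below is a minimal Python sketch, assuming the builds are already downloaded and that app_dir is the folder holding the MB8 executable and its stock DLL. The helper name install_variant, the ".dll" extension on the target name and the ".stock" backup copy are hypothetical additions, not part of Marco's instructions.

```python
import shutil
from pathlib import Path

# Name from the post; the ".dll" extension is an assumption.
TARGET = "libfftw3f-3-3-4_x64.dll"

def install_variant(app_dir, variant_file) -> Path:
    """Replace the stock FFTW DLL with one SIMD build and delete old wisdom."""
    app_dir = Path(app_dir)
    target = app_dir / TARGET
    # Hypothetical extra step: keep one copy of the stock DLL for reverting.
    backup = app_dir / (TARGET + ".stock")
    if target.exists() and not backup.exists():
        shutil.copyfile(target, backup)
    # "Erase any old .wisdom file", as the post instructs.
    for wisdom in app_dir.glob("*.wisdom"):
        wisdom.unlink()
    # The chosen build (avx, avx2 or ssse3) takes over the expected file name.
    shutil.copyfile(variant_file, target)
    return target
```

Keeping a copy of the stock DLL makes it a one-step job to go back if a build misbehaves on a given host.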


Your link requires permission to be granted for access.


Try this...


https://drive.google.com/folderview?id=0B9iU4E_jpim0X3ZyQmo3dUUxZFE&usp=sharing


It would be good to collect some speedup data for these binaries here.
ID: 1766034
Ulrich Metzner
Volunteer tester
Joined: 3 Jul 02
Posts: 1256
Credit: 13,565,513
RAC: 13
Germany
Message 1766048 - Posted: 18 Feb 2016, 15:16:27 UTC

The new thread is a good idea!

I cannot contribute benchmark results, but I have been successfully using Marco's libraries since they became available. There have been no errors and no invalids since then, although a lot of "timed out" tasks will show up at the end of February, because the first SSE 4.1 attempt had a compile fault that trashed ~150 SETI v8 WUs in the blink of an eye on my main rig, a Core2Quad. ;)
Aloha, Uli

ID: 1766048
Mike · Special Project $75 donor
Volunteer tester
Joined: 17 Feb 01
Posts: 34502
Credit: 79,922,639
RAC: 80
Germany
Message 1766049 - Posted: 18 Feb 2016, 15:16:57 UTC

I wanted to test it anyway.
Maybe on the weekend, when I feel a little better again.
With each crime and every kindness we birth our future.
ID: 1766049
Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1766060 - Posted: 18 Feb 2016, 16:54:31 UTC
Last modified: 18 Feb 2016, 16:59:34 UTC

APU idle
CPU:
Number of processors 1
Number of cores 4 (max 4)
Specification AMD A10-5700 APU with Radeon(tm) HD Graphics
Codename Trinity
Core Stepping TN-A1
Technology 32 nm
Stock frequency 3400 MHz
------------
Chipset:
Northbridge AMD A55/A60M FCH rev. 00
Southbridge AMD A55/A60M rev. 11
------------
RAM:
Memory Type DDR3
Memory Size 8192 MBytes
Max bandwidth PC3-10700 (667 MHz)
CAS# latency (CL) 9.0
RAS# Precharge (tRP) 9
Cycle Time (tRAS) 24
------------
OS:
Windows Version Microsoft Windows Server 2008 64-bit Service Pack 2 (Build 6002)

default run (both ref and test with default FFTW lib):

WU : PG0009_v8.wu
MB8_win_x64_SSE3_VS2008_r3330.exe -verb -nog :
Elapsed 433.276 secs
CPU 432.123 secs
MB8_win_x64_SSE3_VS2008_r3330.exe :
Elapsed 427.865 secs, speedup: 1.25% ratio: 1.01x
CPU 428.691 secs, speedup: 0.79% ratio: 1.01x

WU : PG0395_v8.wu
MB8_win_x64_SSE3_VS2008_r3330.exe -verb -nog :
Elapsed 468.985 secs
CPU 470.125 secs
MB8_win_x64_SSE3_VS2008_r3330.exe :
Elapsed 467.368 secs, speedup: 0.34% ratio: 1.00x
CPU 466.802 secs, speedup: 0.71% ratio: 1.01x

WU : PG0444_v8.wu
MB8_win_x64_SSE3_VS2008_r3330.exe -verb -nog :
Elapsed 443.546 secs
CPU 444.119 secs
MB8_win_x64_SSE3_VS2008_r3330.exe :
Elapsed 446.022 secs, speedup: -0.56% ratio: 0.99x
CPU 445.991 secs, speedup: -0.42% ratio: 1.00x

WU : PG1327_v8.wu
MB8_win_x64_SSE3_VS2008_r3330.exe -verb -nog :
Elapsed 440.019 secs
CPU 440.531 secs
MB8_win_x64_SSE3_VS2008_r3330.exe :
Elapsed 440.230 secs, speedup: -0.05% ratio: 1.00x
CPU 439.720 secs, speedup: 0.18% ratio: 1.00x
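The speedup and ratio columns in these reports appear to be computed from the reference and test times as below. This is inferred by checking the printed numbers, not taken from the bench script itself; a minimal Python sketch:

```python
def speedup_pct(ref_secs: float, test_secs: float) -> float:
    """Percentage of the reference time saved by the test run."""
    return (ref_secs - test_secs) / ref_secs * 100.0

def ratio(ref_secs: float, test_secs: float) -> float:
    """How many times faster the test run is than the reference run."""
    return ref_secs / test_secs

# PG0009_v8.wu, Elapsed times from the default run above.
ref, test = 433.276, 427.865
print(f"speedup: {speedup_pct(ref, test):.2f}% ratio: {ratio(ref, test):.2f}x")
# prints "speedup: 1.25% ratio: 1.01x"
```

A negative speedup (as for PG0444_v8.wu) just means the test run was slower than the reference; the ratio then drops below 1.00x.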

------------

test with SSSE3 x64 FFTW lib:

WU : PG0009_v8.wu
MB8_win_x64_SSE3_VS2008_r3330.exe -verb -nog :
Elapsed 433.276 secs
CPU 432.123 secs
MB8_win_x64_SSE3_VS2008_r3330.exe :
Elapsed 425.595 secs, speedup: 1.77% ratio: 1.02x
CPU 424.354 secs, speedup: 1.80% ratio: 1.02x

WU : PG0395_v8.wu
MB8_win_x64_SSE3_VS2008_r3330.exe -verb -nog :
Elapsed 468.985 secs
CPU 470.125 secs
MB8_win_x64_SSE3_VS2008_r3330.exe :
Elapsed 465.732 secs, speedup: 0.69% ratio: 1.01x
CPU 466.022 secs, speedup: 0.87% ratio: 1.01x

WU : PG0444_v8.wu
MB8_win_x64_SSE3_VS2008_r3330.exe -verb -nog :
Elapsed 443.546 secs
CPU 444.119 secs
MB8_win_x64_SSE3_VS2008_r3330.exe :
Elapsed 438.575 secs, speedup: 1.12% ratio: 1.01x
CPU 437.052 secs, speedup: 1.59% ratio: 1.02x

WU : PG1327_v8.wu
MB8_win_x64_SSE3_VS2008_r3330.exe -verb -nog :
Elapsed 440.019 secs
CPU 440.531 secs
MB8_win_x64_SSE3_VS2008_r3330.exe :
Elapsed 432.328 secs, speedup: 1.75% ratio: 1.02x
CPU 431.639 secs, speedup: 2.02% ratio: 1.02x


AVX and AVX2 failed on this host due to incompatibility.
ID: 1766060
Chris Adamek
Volunteer tester
Joined: 15 May 99
Posts: 251
Credit: 434,772,072
RAC: 236
United States
Message 1766078 - Posted: 18 Feb 2016, 18:23:06 UTC - in response to Message 1766060.  

Where would one get the test WUs, and what method would you like us to use?

Thanks,

Chris
ID: 1766078
Chris Adamek
Volunteer tester
Joined: 15 May 99
Posts: 251
Credit: 434,772,072
RAC: 236
United States
Message 1766081 - Posted: 18 Feb 2016, 18:40:51 UTC - in response to Message 1766078.  

And I assume there is no way to try this on Macs? Are the FFT functions built into the app there?

Thanks,

Chris
ID: 1766081
Marco Franceschini
Volunteer tester
Joined: 4 Jul 01
Posts: 54
Credit: 69,877,354
RAC: 135
Italy
Message 1766090 - Posted: 18 Feb 2016, 19:10:41 UTC

Hi....
My own FFTW 3.3.4 builds were cross-compiled under Ubuntu with GCC 5.2 and the maximum optimization switch (i.e. -O3).
The AVX and AVX2 versions were built with --enable-fma to enable the fused multiply-add hardware in Haswell and later architectures.
Further speedup over the standard version made by Frigo could only be achieved with the Intel compilers (used natively under Windows).

Marco.
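Going by the build notes above (AVX2 build aimed at Haswell-class FMA3 CPUs, both AVX builds compiled with --enable-fma), a cautious way to choose a build is to require FMA for both AVX variants. The sketch below is hypothetical: the function name, the lowercase flag spelling (as in Linux /proc/cpuinfo) and the choice to require FMA for the plain AVX build are assumptions, and a CPU flag check alone may not catch every incompatibility on a given host.

```python
def pick_fftw_build(flags: set[str]) -> str:
    """Return 'avx2', 'avx', 'ssse3' or 'stock' for a set of CPU feature flags.

    Hypothetical helper; flag names follow the lowercase /proc/cpuinfo style.
    """
    # Both AVX builds were compiled with --enable-fma, so this sketch
    # conservatively requires the FMA flag for each of them.
    if {"avx2", "fma"} <= flags:
        return "avx2"
    if {"avx", "fma"} <= flags:
        return "avx"
    if "ssse3" in flags:
        return "ssse3"
    # Nothing suitable: stay with the standard FFTW DLL.
    return "stock"
```

Falling back to "stock" keeps the standard library on any CPU where the feature check is inconclusive.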
ID: 1766090
Marco Franceschini
Volunteer tester
Joined: 4 Jul 01
Posts: 54
Credit: 69,877,354
RAC: 135
Italy
Message 1766092 - Posted: 18 Feb 2016, 19:16:23 UTC - in response to Message 1766081.  

Hi Chris.
The FFTW library can be compiled under OS X too:

http://dasher.wustl.edu/ffe/distribution/fftw/0README

Marco.
ID: 1766092
Chris Adamek
Volunteer tester
Joined: 15 May 99
Posts: 251
Credit: 434,772,072
RAC: 236
United States
Message 1766102 - Posted: 18 Feb 2016, 20:08:04 UTC - in response to Message 1766092.  

Great, thanks. The only other oddity is that the Mac apps don't seem to use these; at least, the library isn't called out in the app_info file, nor is there an existing version in the SETI folder. I'll start by making Mac versions, though. :)

Thanks,

Chris
ID: 1766102
Urs Echternacht
Volunteer tester
Joined: 15 May 99
Posts: 692
Credit: 135,197,781
RAC: 211
Germany
Message 1766105 - Posted: 18 Feb 2016, 20:55:49 UTC - in response to Message 1766102.  

You can't find the FFTW library for OS X because it is linked into the executable (statically). The same applies to other *nix platforms.
_\|/_
U r s
ID: 1766105
Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1766114 - Posted: 18 Feb 2016, 21:12:12 UTC - in response to Message 1766078.  

http://lunatics.kwsn.info/index.php?action=downloads;cat=5
ID: 1766114
Chris Adamek
Volunteer tester
Joined: 15 May 99
Posts: 251
Credit: 434,772,072
RAC: 236
United States
Message 1766120 - Posted: 18 Feb 2016, 21:55:22 UTC - in response to Message 1766105.  

OK, that's what I thought.

Thanks,

Chris
ID: 1766120
Keith Myers · Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1766332 - Posted: 19 Feb 2016, 23:07:05 UTC - in response to Message 1766120.  

Didn't have much luck with the alternative FFTW libraries. I ran the AVX DLL on my AMD chip for about 3 hours with seemingly good luck, until I noticed that the majority of tasks were getting "postponed because of impossible autocorr results", so I thought I had better try one of the other alternatives. I tried the SSE41 DLL, but unfortunately it trashed all 100 of my CPU tasks. To make matters worse, a few minutes later the SETI web servers went kaput and I couldn't get a replacement scheduler master fetch list, so I have only been able to crunch GPU work for the last half day. Now that the web servers are back online, I still haven't been able to get any MB8 work because of the errored-tasks penalty. I reverted to the stock DLL and have been able to pick up 1 MB7 task, so at least 1 of 4 cores is working again. The alternative _X86 DLL does seem to work fine for the AP7 OpenCL tasks, though. I don't know why things blew up. I never got a chance to try the SSSE3 DLL because I chickened out and reverted to stock.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1766332
Mike · Special Project $75 donor
Volunteer tester
Joined: 17 Feb 01
Posts: 34502
Credit: 79,922,639
RAC: 80
Germany
Message 1766337 - Posted: 19 Feb 2016, 23:21:24 UTC

Here is a benchmark on my FX CPU.
All 3 tasks were running in parallel.

Stock lib.

WU : new_reference.wu
setiathome_8.00_windows_intelx86.exe :
Elapsed 3614.091 secs
CPU 3601.985 secs
MB8_win_x64_AVX_VS2010_r3330.exe :
Elapsed 1889.470 secs, speedup: 47.72% ratio: 1.91x
CPU 1884.695 secs, speedup: 47.68% ratio: 1.91x

WU : PG0009_v7.wu
setiathome_8.00_windows_intelx86.exe :
Elapsed 513.062 secs
CPU 509.296 secs
MB8_win_x64_AVX_VS2010_r3330.exe :
Elapsed 338.079 secs, speedup: 34.11% ratio: 1.52x
CPU 335.605 secs, speedup: 34.10% ratio: 1.52x

WU : PG0395_v7.wu
setiathome_8.00_windows_intelx86.exe :
Elapsed 681.873 secs
CPU 678.214 secs
MB8_win_x64_AVX_VS2010_r3330.exe :
Elapsed 370.011 secs, speedup: 45.74% ratio: 1.84x
CPU 367.554 secs, speedup: 45.81% ratio: 1.85x

WU : PG0444_v7.wu
setiathome_8.00_windows_intelx86.exe :
Elapsed 627.424 secs
CPU 623.801 secs
MB8_win_x64_AVX_VS2010_r3330.exe :
Elapsed 349.449 secs, speedup: 44.30% ratio: 1.80x
CPU 346.868 secs, speedup: 44.39% ratio: 1.80x

WU : PG1327_v7.wu
setiathome_8.00_windows_intelx86.exe :
Elapsed 503.851 secs
CPU 499.843 secs
MB8_win_x64_AVX_VS2010_r3330.exe :
Elapsed 387.807 secs, speedup: 23.03% ratio: 1.30x
CPU 384.496 secs, speedup: 23.08% ratio: 1.30x

AVX.lib

WU : new_reference.wu
setiathome_8.00_windows_intelx86.exe -verb -nog :
Elapsed 3548.817 secs
CPU 3543.937 secs
MB8_win_x64_AVX_VS2010_r3330.exe :
Elapsed 1866.859 secs, speedup: 47.39% ratio: 1.90x
CPU 1862.278 secs, speedup: 47.45% ratio: 1.90x

WU : PG0009_v7.wu
setiathome_8.00_windows_intelx86.exe -verb -nog :
Elapsed 513.062 secs
CPU 509.296 secs
MB8_win_x64_AVX_VS2010_r3330.exe :
Elapsed 343.498 secs, speedup: 33.05% ratio: 1.49x
CPU 341.034 secs, speedup: 33.04% ratio: 1.49x

WU : PG0395_v7.wu
setiathome_8.00_windows_intelx86.exe -verb -nog :
Elapsed 681.873 secs
CPU 678.214 secs
MB8_win_x64_AVX_VS2010_r3330.exe :
Elapsed 367.312 secs, speedup: 46.13% ratio: 1.86x
CPU 364.824 secs, speedup: 46.21% ratio: 1.86x

WU : PG0444_v7.wu
setiathome_8.00_windows_intelx86.exe -verb -nog :
Elapsed 627.424 secs
CPU 623.801 secs
MB8_win_x64_AVX_VS2010_r3330.exe :
Elapsed 349.259 secs, speedup: 44.33% ratio: 1.80x
CPU 346.712 secs, speedup: 44.42% ratio: 1.80x

WU : PG1327_v7.wu
setiathome_8.00_windows_intelx86.exe -verb -nog :
Elapsed 503.851 secs
CPU 499.843 secs
MB8_win_x64_AVX_VS2010_r3330.exe :
Elapsed 343.014 secs, speedup: 31.92% ratio: 1.47x
CPU 340.363 secs, speedup: 31.91% ratio: 1.47x

SSSE3.lib

WU : new_reference.wu
setiathome_8.00_windows_intelx86.exe :
Elapsed 3614.091 secs
CPU 3601.985 secs
MB8_win_x64_AVX_VS2010_r3330.exe :
Elapsed 1899.191 secs, speedup: 47.45% ratio: 1.90x
CPU 1894.398 secs, speedup: 47.41% ratio: 1.90x

WU : PG0009_v7.wu
setiathome_8.00_windows_intelx86.exe :
Elapsed 513.062 secs
CPU 509.296 secs
MB8_win_x64_AVX_VS2010_r3330.exe :
Elapsed 347.895 secs, speedup: 32.19% ratio: 1.47x
CPU 345.371 secs, speedup: 32.19% ratio: 1.47x

WU : PG0395_v7.wu
setiathome_8.00_windows_intelx86.exe :
Elapsed 681.873 secs
CPU 678.214 secs
MB8_win_x64_AVX_VS2010_r3330.exe :
Elapsed 378.366 secs, speedup: 44.51% ratio: 1.80x
CPU 375.775 secs, speedup: 44.59% ratio: 1.80x

WU : PG0444_v7.wu
setiathome_8.00_windows_intelx86.exe :
Elapsed 627.424 secs
CPU 623.801 secs
MB8_win_x64_AVX_VS2010_r3330.exe :
Elapsed 358.406 secs, speedup: 42.88% ratio: 1.75x
CPU 355.791 secs, speedup: 42.96% ratio: 1.75x

WU : PG1327_v7.wu
setiathome_8.00_windows_intelx86.exe :
Elapsed 503.851 secs
CPU 499.843 secs
MB8_win_x64_AVX_VS2010_r3330.exe :
Elapsed 388.962 secs, speedup: 22.80% ratio: 1.30x
CPU 385.900 secs, speedup: 22.80% ratio: 1.30x
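Since every run above uses the same test app (MB8_win_x64_AVX_VS2010_r3330.exe) on the same five WUs, one way to compare the three libraries is to add up the test app's times per run. A minimal Python sketch, assuming one run's output is pasted as plain text in the format shown; the function names are hypothetical.

```python
import re

# Matches test-app result lines such as
#   "Elapsed 1889.470 secs, speedup: 47.72% ratio: 1.91x"
# Reference-app lines ("Elapsed 3614.091 secs") carry no speedup and are skipped.
RESULT = re.compile(
    r"^(Elapsed|CPU)\s+([\d.]+) secs, speedup: (-?[\d.]+)% ratio: ([\d.]+)x$"
)

def parse_results(text: str) -> list[tuple[str, float, float, float]]:
    """Return (kind, seconds, speedup_pct, ratio) for every test-app line."""
    rows = []
    for line in text.splitlines():
        m = RESULT.match(line.strip())
        if m:
            kind, secs, speedup, ratio = m.groups()
            rows.append((kind, float(secs), float(speedup), float(ratio)))
    return rows

def total_test_seconds(text: str, kind: str = "Elapsed") -> float:
    """Sum the test app's Elapsed (or CPU) time over all WUs in one run."""
    return sum(secs for k, secs, _, _ in parse_results(text) if k == kind)
```

Summed over the five WUs above, the test app's Elapsed totals come to 3334.816 s with the stock lib, 3269.942 s with the AVX lib and 3372.820 s with the SSSE3 lib.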
With each crime and every kindness we birth our future.
ID: 1766337
Keith Myers · Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1766454 - Posted: 20 Feb 2016, 9:41:41 UTC - in response to Message 1766337.  

Thanks for posting results for both the AVX and SSSE3 libraries, Mike. It seems the AVX one has the advantage on the FX processor. I have recovered from yesterday morning's stumbles and have MB8 work again. I'm also running the AVX app and DLL now with no apparent issues. I think my autocorr issues were caused by a too-high FSB clock and load-line calibration settings that were not high enough to compensate for Vdroop when the CPU was loaded.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1766454



 
©2025 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.