compile a faster linux client using ipp instead of fftw

Message boards : Number crunching : compile a faster linux client using ipp instead of fftw
Tetsuji Maverick Rai
Volunteer tester
Joined: 25 Apr 99
Posts: 518
Credit: 90,863
RAC: 0
Japan
Message 114839 - Posted: 25 May 2005, 19:18:39 UTC
Last modified: 25 May 2005, 20:10:13 UTC

Hi all crunching maniacs :),

I just happened to find an FFT benchmark page (see the single-precision 1D complex, powers-of-two chart), and found that intel-ipps was faster than fftw3. About IPP, see this page.

So I compiled the seti client with the Intel compiler, using IPP FFT functions instead of fftw. But damn! At first it was slower than my fftw client.... I was wondering why: the IPP FFT function is faster than fftw, so why is seti with IPP slower? Then it occurred to me that the patch for fftw held a clue. The patched files are derived from an old seti client source package (maybe last year's), and I actually found several differences from the recent sources. So I downloaded the oldest seti source, i.e. the Jan-01 nightly, took 4 files from it (analyzeFuncs.cpp, analyzeFuncs.h, seti.cpp, seti.h) and put them into my May-08 seti client directory. I built a seti client with these 4 old files and compared that "old source included" client with the "current" client, both without any additional FFT functions and built under the same conditions, and found the old one (only 4 files are old) was 5-10% faster!! You may think the boost is because of fftw, but the old source is another big factor, even with the default dft (not fft by default) function!!

Naturally next I modified the source so that it uses ipp fft functions.

Now the benchmark is under way....a 1-to-1 comparison with the fftw-linked binary (which takes only 40-46% of the time of the official one).

Currently ipp 78.2% done, fftw 70.7% done.

Maybe this is the fastest seti client in the world for Linux. But benchmark hasn't finished, and validation requirements must be confirmed.

Fortunately, Intel compiler products and IPP library-linked binaries for Linux are both freely distributable under non-commercial licenses for non-profit purposes. I'll put links to binaries for the available processors here so any Linux user can use them forever. But I'm not sure they work with AMD processors....

BTW I tried linking seti with the Intel MKL FFT math library (which was not in the chart; only its DFT was) and confirmed it's slightly slower than IPP. After all, IPP is for multimedia and signal processing....and seti processes electromagnetic waves from space. The best fit!

I've heard the words from Ben Kenobi...."May the Source be with you!" and "Use the source, Maverick...."

Now I'm a maniac.....soon I'll return to normal if possible :) It's my father's will: "once done, recede asap."
Luckiest in the world. WMD = Weapon of Mass Distraction.
Ned Slider

Joined: 12 Oct 01
Posts: 668
Credit: 4,375,315
RAC: 0
United Kingdom
Message 114924 - Posted: 25 May 2005, 22:05:58 UTC

Tetsuji Maverick Rai
Volunteer tester
Joined: 25 Apr 99
Posts: 518
Credit: 90,863
RAC: 0
Japan
Message 114938 - Posted: 25 May 2005, 22:29:25 UTC
Last modified: 25 May 2005, 22:47:17 UTC

Thanks....benchmark finished....

6759 seconds (ipp) vs 7347 seconds (fftw3). This was done by running both clients at once on an HT-enabled machine, so it's relatively reliable. And according to my unreliable benchmark record :), the official 4.02 took 15985 seconds (a very old measurement, and the conditions were not stable; unlike Metod, I benchmark in HT mode with other applications running)....so the crunching times are approximately 42% vs 46% of the official client's. Is it fast or not? It's fast to me :)
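
A quick check of those ratios, as a minimal sketch that uses only the three timings quoted above (the variable and function names are just illustrative):

```python
# Timings quoted in the post, in seconds.
ipp_seconds = 6759
fftw_seconds = 7347
official_seconds = 15985  # official 4.02, from an older and less controlled run


def percent_of(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


# Crunching time of each build relative to the official client.
ipp_ratio = percent_of(ipp_seconds, official_seconds)
fftw_ratio = percent_of(fftw_seconds, official_seconds)

# How much longer the fftw3 build takes compared with the ipp build.
fftw_extra = percent_of(fftw_seconds - ipp_seconds, ipp_seconds)

print(f"ipp:   {ipp_ratio:.1f}% of the official time")
print(f"fftw3: {fftw_ratio:.1f}% of the official time")
print(f"fftw3 takes {fftw_extra:.1f}% longer than ipp")
```

The first two figures round to the 42% and 46% quoted above. The last one compares the two optimized builds directly, which is the fairer comparison since they ran side by side on the same machine.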

I didn't know the seti client source had slowed the client down since the beginning of this year....

The result is within validation limits :) sleepy......go to bed again..:)

Ned Slider

Joined: 12 Oct 01
Posts: 668
Credit: 4,375,315
RAC: 0
United Kingdom
Message 114979 - Posted: 26 May 2005, 0:03:36 UTC
Last modified: 26 May 2005, 0:04:52 UTC

Nice increase :)

I don't know about the newer sources as I've always used the (older) seti nightly source from 1 Dec 2004 for my AMD-based clients. I did hear Eric Korpela made some changes around Feb 2005 time.

Can you tell us a little more about IPP? Is it Intel-specific (only for Intel processors)? Also, is it distributed as binaries or as source? I'm wondering if I could try using IPP instead of fftw with gcc for my AMD clients.

I'll go have a look at the Intel page now :)

Ned



*** My Guide to Compiling Optimised BOINC and SETI Clients ***
*** Download Optimised BOINC and SETI Clients for Linux Here ***
Tetsuji Maverick Rai
Volunteer tester
Joined: 25 Apr 99
Posts: 518
Credit: 90,863
RAC: 0
Japan
Message 114991 - Posted: 26 May 2005, 0:37:57 UTC - in response to Message 114979.  
Last modified: 26 May 2005, 1:09:13 UTC

Oh. It's distributed in binary form, and I hope it will work with AMD processors, though I don't know about the performance. For non-commercial use, anyone can download it for free just by registering an email address, and it can be linked with gcc.

And if I send you my binaries, will you host them?
Ned Slider

Joined: 12 Oct 01
Posts: 668
Credit: 4,375,315
RAC: 0
United Kingdom
Message 115005 - Posted: 26 May 2005, 1:09:53 UTC
Last modified: 26 May 2005, 1:12:16 UTC

Yes, sure - I'm happy to host anything for you :)

I'm pretty familiar with gcc compiler options so I'd like to try using these IPP libs with gcc in place of fftw3 if they're genuinely faster. If you can show me your source patches for them, I can do the rest :)

Which file(s) did you need to patch to use IPP?

I've just downloaded the tarball from Intel, but was a little unsure which version I should download for an Athlon XP. I downloaded the version for "Intel® IPP for Linux* on Intel® Pentium® and Itanium® Processors" hoping this will work on my Athlon XP (I don't have any Intel based machines if it doesn't).

So presumably if I can install the Intel IPP libs on my machine (and they're compatible with AMD processors) then I should be able to use your patched source file for IPP and compile using gcc for any architecture (although I'd prefer to just concentrate on AMD and a generic i686). Either Metod or yourself could do the same for Intel CPUs using icc.

One note: because it's not open source, SETI (Berkeley) will never officially incorporate the use of IPP libs into the official client, whereas fftw3 is open source so could be officially incorporated one day. No harm in us trying though :)

Ned

Tetsuji Maverick Rai
Volunteer tester
Joined: 25 Apr 99
Posts: 518
Credit: 90,863
RAC: 0
Japan
Message 115012 - Posted: 26 May 2005, 1:22:23 UTC - in response to Message 115005.  
Last modified: 26 May 2005, 1:47:30 UTC

okay!! I will send you the patched file (only analyzeFuncs.cpp is patched; the other 3 files you maybe don't need, but just in case) and a simple README. Since you know gcc options very well and can make fast binaries, you'll find the proper options very easily if you take a look at the README. It was originally a memo for myself, because I have too many revisions on my hard disk :) So far I have made more than 20 variants: different compilers, compiler switches, libraries, patches, optimizations... Now I'm tired of compiling for various CPUs, so please compile with IPP and distribute the results as you like (source or binaries or whatever; you are good at compiling fast binaries) :) Since my glibc and other libraries are compiled with the -march=pentium4 option (I'm using Gentoo now and compiled all my packages myself), I cannot make static binaries for other CPUs.

I think IPP for Pentium is right for AMD too (the package also contains libraries for Itanium). At least the library supports P3, so it should work with Athlon.

And yes, I hope neither we nor other users run into any legal issues.
Ned Slider

Joined: 12 Oct 01
Posts: 668
Credit: 4,375,315
RAC: 0
United Kingdom
Message 115021 - Posted: 26 May 2005, 1:52:45 UTC

Ahh, the joys of gentoo :)

Yes, you could only make static clients for P4.

I will see if the libs are compatible when I try them. An Athlon XP is theoretically compatible with P3, but not always when compiling with processor specific optimizations.

There should be no legal issues with this. SETI is not a profit making organisation, and the licence is for me anyway, not SETI, and I sure don't make any profit from it.

I don't know how much longer I'm going to be able to keep up my effort on this anyway. I start a new job next week and will have far less time to spend on this (if any). The time commitments to stay on top of this are huge and there's always new things to test or try (like this IPP). Being semi-retired for the last 18 months has been nice, but now I must work to bring in some income :(

It's been a lot of fun though, and I've learned a lot about compiling software along the way, which was my main intention from the outset.

Ned



Ned Slider

Joined: 12 Oct 01
Posts: 668
Credit: 4,375,315
RAC: 0
United Kingdom
Message 115024 - Posted: 26 May 2005, 1:56:06 UTC - in response to Message 115012.  


Oh, and could you send the original unpatched source file too so I can diff for the changes, just in case my (slightly older) version of analyzeFuncs.cpp is different from yours. I've also made some other small changes to analyzeFuncs.cpp that I'd like to keep and keeping track of all the changes is becoming hard!

Ned

Tetsuji Maverick Rai
Volunteer tester
Joined: 25 Apr 99
Posts: 518
Credit: 90,863
RAC: 0
Japan
Message 115026 - Posted: 26 May 2005, 2:09:31 UTC - in response to Message 115024.  
Last modified: 26 May 2005, 3:05:39 UTC


Yes...but now I have a major problem!! The new cruncher has crunched 4 WUs, but two are invalid (the two valid ones had been half done by the prior version). Since my queue is very long, I can see the results very quickly. Incredible results.....I will send you the result.sah made from reference_work_unit.sah, so will you check it? I've never got "Invalid" before, so it's incredible to me.

There is another choice: IPP's FFT has 3 options (fast/accurate/none). I chose "fast", but I hope "none" or "accurate" will work. Now I'm trying "none".
Ned Slider

Joined: 12 Oct 01
Posts: 668
Credit: 4,375,315
RAC: 0
United Kingdom
Message 115030 - Posted: 26 May 2005, 2:32:25 UTC

Ouch - I've never had an invalid result with an optimized client before.

I'll take a look at your result.sah and compare to a reference result file.

By not using the fast option, I suspect you may lose what little gains you had. +4% (42 -> 46%) is certainly worth having, but not at the expense of invalid results.

Ned

Tetsuji Maverick Rai
Volunteer tester
Joined: 25 Apr 99
Posts: 518
Credit: 90,863
RAC: 0
Japan
Message 115044 - Posted: 26 May 2005, 3:10:35 UTC
Last modified: 26 May 2005, 3:19:54 UTC

LOL!!! I found my error and the solution. It's not due to the fast option. It's because of the wrong FFT direction (time -> freq vs. freq -> time)!! In result.sah the frequencies are actually out of validation limits, and my MKL (Intel's Math Kernel Library) version produced the same errors....my BAD!! Sorry for the confusion.

I used the "forward" function instead of the "inverse" function :) It sometimes happens with FFT. I made the same mistake in another application around 20 years ago.

Now I've made a revised version with the "fast" option and am crunching the reference WU. So far there are NO frequency errors in result.sah, which were present before.

But I never thought the wrong direction of the Fourier transform would produce such a similar result.sah!!
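
A possible explanation for why the output looked so similar: for the same input, the forward and the unnormalised inverse transform differ only in the sign of the exponent, so swapping them keeps the powers but mirrors the frequency axis (bin k shows up at bin N - k). The sketch below illustrates this with a naive pure-Python DFT. It is not the seti or IPP code; `dft`, `peak_bin`, `N` and `TONE_BIN` are hypothetical names chosen for the illustration.

```python
import cmath


def dft(samples, sign):
    """Naive O(N^2) discrete Fourier transform.

    sign=-1 is the usual forward transform (time -> frequency);
    sign=+1 is the unnormalised inverse (frequency -> time).
    """
    n = len(samples)
    return [
        sum(x * cmath.exp(sign * 2j * cmath.pi * k * t / n)
            for t, x in enumerate(samples))
        for k in range(n)
    ]


def peak_bin(spectrum):
    """Index of the bin holding the most power."""
    powers = [abs(c) ** 2 for c in spectrum]
    return powers.index(max(powers))


N = 16
TONE_BIN = 3  # hypothetical test tone: 3 cycles per N samples
signal = [cmath.exp(2j * cmath.pi * TONE_BIN * t / N) for t in range(N)]

one_direction = peak_bin(dft(signal, sign=-1))    # peak lands in bin 3
other_direction = peak_bin(dft(signal, sign=+1))  # peak lands in bin N - 3

print(one_direction, other_direction)
```

The peak has the same height either way and only its position changes, which would fit a result.sah that looks plausible apart from the reported frequencies.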

regards,

EDIT: now I'm crunching the real WU's downloaded from Berkeley.
Tetsuji Maverick Rai
Volunteer tester
Joined: 25 Apr 99
Posts: 518
Credit: 90,863
RAC: 0
Japan
Message 115057 - Posted: 26 May 2005, 5:40:07 UTC
Last modified: 26 May 2005, 5:57:14 UTC

okay!! I was correct. My mistake was the direction of the FFT!! My cruncher finished 2 WUs, sent the results back to the server and got validation for both!! (With the fast option, as a matter of course!!)

this and this, but I don't know whether other people can see these...

Actually faster than before. I will send the final patch to Ned.....

Now I follow my father's will, "once done, recede asap"....ahh, I forgot the benchmark, but it must be the same as with the forward fft..
spacemeat
Joined: 4 Oct 99
Posts: 239
Credit: 8,425,288
RAC: 0
United States
Message 115111 - Posted: 26 May 2005, 13:31:57 UTC

Ned, is your modified source available? i never had luck patching my own. i'd still like to optimize for sparc platform, and i have a FreeBSD disk that i can probably work on as well. also may be able to help get boinc/seti into the gentoo portage tree which has been a very slow process so far
Ned Slider

Joined: 12 Oct 01
Posts: 668
Credit: 4,375,315
RAC: 0
United Kingdom
Message 115132 - Posted: 26 May 2005, 15:00:26 UTC - in response to Message 115111.  

Yes, of course:

http://www.pperry.f2s.com/files/seti_boinc-client-fftw3-2004-12-01.tar.gz

I also described in detail how to build an optimized seti client using fftw3 with links to all sources in this thread here:

http://forums.pcper.com/showthread.php?t=385265

This is based on the nightly seti source from 1 Dec 2004, patched to use fftw3. I usually compile it against the boinc nightly source from 28 Jan 2005, but most any boinc nightly source _should_ (may) work. I have the pristine berkeley unpatched seti source from 1 Dec 2004 somewhere too if you need that.

I patched the above to use the fftw3 sources provided by Eric Heien (also available on the Seti SourceForge site). So it includes his patches plus a patch from Paolo to avoid a seg fault on *nix, plus I patched the makefile.in to correctly link the fftw3 lib. It compiles cleanly (using boinc source above) without any modifications with gcc-3.4.3 provided you have the fftw3 (I use fftw v3.0.1) libs installed (fftw3 compiled cleanly from source using ./configure --enable-float && make && make install). You may want to optimize fftw3 too ;)

Once you get it to compile, I'll show you my hack for making truly static binaries, needed if you wish to distribute to other users. I keep meaning to test a hack to makefile.in to automate the process but haven't had time to do it yet :(

If trying to get it into the portage tree for gentoo, remember you'll need to make fftw3 a dependency that must be installed first. The standard fftw3 source is fine for this as my seti source links against it, but must be configured with --enable-float for single precision :)
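
For background on that flag: fftw3 builds its double-precision library by default, and --enable-float builds the single-precision one instead. As a rough illustration of what single precision means for the stored values (this only shows float rounding using Python's standard library, it does not call fftw, and the helper name `to_single` is made up for the example):

```python
import struct


def to_single(value):
    """Round a Python float (double precision) to the nearest
    single-precision value and return it as a Python float again."""
    return struct.unpack("f", struct.pack("f", value))[0]


double_value = 0.1
single_value = to_single(double_value)

# The single-precision copy keeps roughly 7 significant decimal digits,
# so it differs slightly from the double-precision original.
rounding_error = abs(single_value - double_value)

print(repr(double_value))
print(repr(single_value))
print(rounding_error)
```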

Maverick has sent me patches for IPP (to analyzeFuncs.cpp) but I haven't had a chance to try these yet. His patches are to original sources from 1st Jan 2005, so I'll need to check diffs between those and the slightly older version I use and try to incorporate his patches into my source if I want to try IPP for fft functions.

Ned





Benher
Volunteer developer
Volunteer tester
Joined: 25 Jul 99
Posts: 517
Credit: 465,152
RAC: 0
United States
Message 115137 - Posted: 26 May 2005, 15:52:00 UTC
Last modified: 26 May 2005, 16:00:19 UTC

Spacemeat,

Ned didn't modify the source...as such...beyond adding links to call FFTW3 library functions instead of Seti's Ooura functions (and patches to makefiles, etc.). He did discover optimization parameters for GCC compiler and Intel's ICC compiler to make the resulting binary much quicker. (and figure out which dates of source files to use ;)

So since Intel doesn't make a sparc compiler, and the GCC developers have spent faaar more time optimizing x86 code than Sparc code I doubt if they would improve it anywhere as radically as they did for x86. Also the Sparc SIMD component only includes integer functions. Sun did develop their own free FFT library for Sparc, but I would bet Ooura inside of Seti code is faster.

Tetsuji,

I see you included the fprintf() to stderr in your code, however for this different version (Intel FFT) you didn't change the stderr message to reflect this...just a thought.
[edit] Oops...my bad -- saw text on 2nd WU [/edit]

Ned Slider

Joined: 12 Oct 01
Posts: 668
Credit: 4,375,315
RAC: 0
United Kingdom
Message 115157 - Posted: 26 May 2005, 17:21:02 UTC - in response to Message 115137.  
Last modified: 26 May 2005, 17:21:26 UTC


Yes, that's right - all I've really done is bundle all the relevant patches into a single source tarball to make it easier for others if they'd like to compile. As I had it done anyway for my own compiles (and it wasn't easy for this non-programmer!!) and the GPL says the source should be made freely available, this just seemed like the right thing to do :)

Chris Bosshard and I have spent a lot of time experimenting with gcc optimization parameters whilst Maverick and Metod have both been experimenting with compiles using icc (the intel compiler).

Most of the gains we've found come from just 2 things - using the -ffast-math gcc compiler optimization and using the fftw3 libs for Fast Fourier Transforms. They seem to afford most benefit with other optimizations being relatively minor in comparison. A gcc optimized client without these two factors is almost identical in performance (+/- 2-3%) to the original berkeley seti client, even with all the normal gcc optimizations applied (-O3, -march, -fomit-frame-pointer etc). I think this just illustrates that the default berkeley clients are actually already fairly well optimized in terms of compiler optimizations.
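
One reason results still need validating after enabling -ffast-math: among other things, it lets gcc regroup floating-point arithmetic, and regrouping can change the last bits of a result. The sketch below only shows that rounding effect in Python; it is not the seti code, and it does not reproduce what gcc actually emits for any particular source.

```python
# Floating-point addition is not associative: the same three numbers
# summed with different grouping can round to different results.
a, b, c = 0.1, 0.2, 0.3

left_to_right = (a + b) + c   # the order written in the source
regrouped = a + (b + c)       # an order a reassociating compiler could pick

difference = abs(left_to_right - regrouped)

print(left_to_right == regrouped)
print(repr(left_to_right), repr(regrouped))
print(difference)
```

The difference is tiny for a single sum, but the same effect repeated over many operations is why optimized clients are checked against a reference result before being trusted.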

Ben (benher), unlike me, is a _proper_ developer and I know he did some of the very early work experimenting with adding SIMD code, and is a project leader on the Seti SourceForge Project. Thanks for all your help and support Ben :)

Ned

spacemeat
Joined: 4 Oct 99
Posts: 239
Credit: 8,425,288
RAC: 0
United States
Message 115200 - Posted: 26 May 2005, 20:25:52 UTC - in response to Message 115137.  

i compiled fftw3 libs for the sparc and wanted to at least test the results of a seti client. when i tried before using various sources, patches, and manual text editing, it didn't work on sparc or x86 - same errors in both cases. with fully patched and working source, i'll at least know i didn't make a typo somewhere.
i'll also have to look into the sun libs just to see what performance difference there is, if any.
StokeyBob
Joined: 31 Aug 03
Posts: 848
Credit: 2,218,691
RAC: 0
United States
Message 115649 - Posted: 28 May 2005, 4:06:58 UTC
Last modified: 28 May 2005, 4:08:33 UTC

Tetsuji Maverick Rai

I meant to comment on the description that was listed in that work unit result you posted. Pretty cool.



(Maniacally optimized with Intel C++ compiler/IPP library by Tetsuji Maverick Rai, rev.01 Thu May 26 12:02:47 2005)



Work unit result
Tetsuji Maverick Rai
Volunteer tester
Joined: 25 Apr 99
Posts: 518
Credit: 90,863
RAC: 0
Japan
Message 115682 - Posted: 28 May 2005, 8:06:14 UTC - in response to Message 115649.  
Last modified: 28 May 2005, 8:14:56 UTC

Thanks :) It's quite useful when something weird happens :) (I mean revision number and the timestamp of patched file.)
