Version 3.4 of Faster SETI cruncher for Linux

Harold Naparst
Volunteer tester

Joined: 11 May 05
Posts: 236
Credit: 91,803
RAC: 0
Sweden
Message 187607 - Posted: 10 Nov 2005, 18:42:30 UTC

I've released a new version (R3.4) of the community-effort SETI client for Linux. You may download it at http://naparst.name

The new version is statically linked using ICC and IPP.
There are three versions: SSE, SSE2, and SSE3.
The SSE3 version includes some hand-assembly code to multiply complex vectors, which is what SSE3 was designed for, and what SETI does a lot of.
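
To illustrate the operation being optimized, here is a minimal Python sketch of element-wise complex vector multiplication over interleaved real/imaginary data. It is not the hand-written assembly from this release; the function name `complex_multiply` and the interleaved layout are assumptions made for illustration only.

```python
def complex_multiply(a, b):
    """Element-wise product of two complex vectors.

    Both inputs are flat lists laid out as [re0, im0, re1, im1, ...].
    The layout and the function name are illustrative assumptions,
    not taken from the SETI client sources.
    """
    if len(a) != len(b) or len(a) % 2 != 0:
        raise ValueError("inputs must have equal, even length")

    out = [0.0] * len(a)
    for i in range(0, len(a), 2):
        ar, ai = a[i], a[i + 1]
        br, bi = b[i], b[i + 1]
        # (ar + i*ai) * (br + i*bi)
        out[i] = ar * br - ai * bi      # real part: a subtraction
        out[i + 1] = ar * bi + ai * br  # imaginary part: an addition
    return out
```

The alternating subtract/add pattern in the two output lines is the kind of work that SSE3 instructions such as ADDSUBPS can perform on several packed floats at once.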

As Tetsuji would say, I just did it on a whim. But I was trying to remove what I thought was an 8% bottleneck. It turns out that the bottleneck was mostly caused by the small size of my L2 cache (1MB) relative to the size of the vectors (2x8MB). This is causing a lot of L2 cache misses (the bad sort).

So, although the SSE3 code does shave about 30 seconds off the SSE2 time, it was kind of a disappointment for me. I really had been hoping for more. But the bottleneck was (and is) cache-related.

For many of you, the exciting news in this release is going to be the rationalization of CFLAGS. Now, you can use ACML, IPP, or FFTW, or any combination. For instance, on my AMD 64 x2 in 64-bit mode, using gcc-4.0.2, I have the following results:

-DACML               73 minutes, 17 seconds
-DUSE_FFTWF          64 minutes, 46 seconds
-DUSE_FFTWF -DIPP    57 minutes, 15 seconds
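
For a rough sense of scale, the three timings can be converted to seconds and compared against the -DACML run. This is only a small Python sketch; the helper names are made up for illustration, and the numbers are the ones quoted in this post.

```python
def to_seconds(minutes, seconds):
    """Convert a 'minutes, seconds' timing to whole seconds."""
    return minutes * 60 + seconds

# Timings quoted in the post (AMD 64 x2, 64-bit mode, gcc-4.0.2).
timings = {
    "-DACML": to_seconds(73, 17),
    "-DUSE_FFTWF": to_seconds(64, 46),
    "-DUSE_FFTWF -DIPP": to_seconds(57, 15),
}

baseline = timings["-DACML"]

def percent_faster(flags):
    """Percentage of run time saved relative to the -DACML build."""
    return 100.0 * (baseline - timings[flags]) / baseline

for flags, secs in timings.items():
    print(f"{flags:20s} {secs:5d} s  {percent_faster(flags):5.1f}% less time than -DACML")
```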

So you all should have a good weekend playing with this.

A couple more items:

1) Gentoo has released a beta version of gcc-4.1. However, the SETI code in my repository will not compile with gcc-4.1 now. gcc-4.1 is stricter than gcc-4.0.2 regarding the ANSI C++ standard, and SETI doesn't meet the standard. So this is something to work on.

2) There is a fairly obvious next speedup to make for those of you using ACML or FFTW. Both libraries provide a means of storing plans for FFTs, and thus avoiding recalculation of the plan. I didn't include this in this release. I am trying to follow the advice given in "Practical Subversion," to keep changes small and atomic.
Harold Naparst
ID: 187607

Hans Dorn
Volunteer developer
Volunteer tester
Joined: 3 Apr 99
Posts: 2262
Credit: 26,448,570
RAC: 0
Germany
Message 187613 - Posted: 10 Nov 2005, 18:57:16 UTC - in response to Message 187607.  
Last modified: 10 Nov 2005, 18:58:01 UTC

Hi Harold,
I guess we're hitting a hard limit here: the FPU is simply faster than the FSB.
But there's still hope, since Intel has announced a single-core Xeon with 8MB L2 for next year :o)

Regards Hans
ID: 187613

spacemeat
Joined: 4 Oct 99
Posts: 239
Credit: 8,425,288
RAC: 0
United States
Message 187614 - Posted: 10 Nov 2005, 18:58:09 UTC - in response to Message 187607.  

> -DUSE_FFTWF -DIPP 57 minutes, 15 seconds.


how does this work?
ID: 187614

Ned Slider
Joined: 12 Oct 01
Posts: 668
Credit: 4,375,315
RAC: 0
United Kingdom
Message 187636 - Posted: 10 Nov 2005, 20:07:41 UTC - in response to Message 187614.  

Thanks again Harold - I'll definitely have a play with this over the weekend :)

I'm assuming you can now specify -DUSE_FFTWF AND -DIPP, and it will use FFTW for all FFT functions and use the IPP routines for your improved routines in spike.cpp. Perhaps Harold could confirm this as I haven't had a chance to look at the code changes yet.

Ned

*** My Guide to Compiling Optimised BOINC and SETI Clients ***
*** Download Optimised BOINC and SETI Clients for Linux Here ***
ID: 187636

Harold Naparst
Volunteer tester
Joined: 11 May 05
Posts: 236
Credit: 91,803
RAC: 0
Sweden
Message 187637 - Posted: 10 Nov 2005, 20:11:29 UTC - in response to Message 187636.  
Last modified: 10 Nov 2005, 20:13:28 UTC

I hope you don't have to look at the code changes.
Were the instructions on my CFLAGS page not clear?
You can use the flags in any combination, and there is
a priority logic to how they are applied to each bottleneck in the
code: First FFTW, then ACML, then IPP. So, if you specify
all three flags, for instance, then FFTW will do the Fourier Transform,
but obviously FFTW won't do the trig calculations, since it doesn't
have a routine for that.
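
The priority rule described here can be sketched in a few lines of Python. This is not the client's code (which selects libraries at compile time); it only models the decision. The capability table is a hypothetical assumption for illustration: the post states only that FFTW has no trig routine, and the entries for ACML and IPP are guesses.

```python
# Priority order as described in the post: FFTW first, then ACML, then IPP.
PRIORITY = ["USE_FFTWF", "ACML", "IPP"]

# Hypothetical capability table. Only "FFTW has no trig routine" comes
# from the post; the ACML and IPP entries are assumptions.
CAPABILITIES = {
    "USE_FFTWF": {"fft"},
    "ACML": {"fft", "trig"},
    "IPP": {"fft", "trig"},
}

def pick_library(operation, flags):
    """Return the highest-priority enabled flag that can handle `operation`.

    Returns None when no enabled library covers the operation, which
    stands in for falling back to the stock code path.
    """
    for flag in PRIORITY:
        if flag in flags and operation in CAPABILITIES[flag]:
            return flag
    return None

if __name__ == "__main__":
    enabled = {"USE_FFTWF", "IPP"}
    for op in ("fft", "trig"):
        print(op, "->", pick_library(op, enabled))
```

With -DUSE_FFTWF and -DIPP both set, this model sends the Fourier Transform to FFTW and the trig work to IPP, which matches the behaviour described above.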


Harold Naparst
ID: 187637

spacemeat
Joined: 4 Oct 99
Posts: 239
Credit: 8,425,288
RAC: 0
United States
Message 187644 - Posted: 10 Nov 2005, 20:34:38 UTC - in response to Message 187637.  


sorry, i was going from the initial post and had not yet seen your new CFLAGS page. i neglected to check the link to see that it was your explanation and not a 3rd party CFLAGS reference. it makes better sense now, thanks.
ID: 187644

Harold Naparst
Volunteer tester
Joined: 11 May 05
Posts: 236
Credit: 91,803
RAC: 0
Sweden
Message 187676 - Posted: 10 Nov 2005, 22:33:58 UTC - in response to Message 187644.  


Well, please let me know if it needs to be explained better.
Since I've been immersed in the code for weeks, I'm sure I've lost
perspective on what needs to be explained to you all about what the
flags do and what the libraries do.
Harold Naparst
ID: 187676

JohnB175
Volunteer tester
Joined: 15 Oct 03
Posts: 124
Credit: 321,769
RAC: 0
United States
Message 187686 - Posted: 10 Nov 2005, 23:09:45 UTC

Great work on v3.4. On my P3 w/ SSE I got a 6% performance boost compared to your previous v3.2 when running the reference workunit. Thanks!
ID: 187686

rattelschneck
Joined: 14 Apr 01
Posts: 435
Credit: 842,179
RAC: 0
Germany
Message 187687 - Posted: 10 Nov 2005, 23:12:08 UTC


Harold,

just to let you know setiathome_SSE2-naparst-r3.4 is running without an issue on my P4 so far. Looks good :-)

Thanks and regards
rattelschneck


ID: 187687

Ned Slider
Joined: 12 Oct 01
Posts: 668
Credit: 4,375,315
RAC: 0
United Kingdom
Message 187696 - Posted: 10 Nov 2005, 23:31:09 UTC - in response to Message 187676.  


Your explanation above makes perfect sense, but the details on your cflags page don't seem to cover combined usage of more than one flag as well as you've just explained it above (at least for me). For example, when combining -DUSE_FFTWF and -DIPP the following two statements may appear to contradict each other:

"-DIPP If this flag is defined, then as much math as possible will be done using Intel's IPP library."

"-DUSE_FFTWF If this flag is defined, then the FFTW library will be used to calculate Fourier Transforms."

I know what you mean, but it may not be immediately obvious to others :)

...or perhaps I'm just being overly pedantic :D

Anyway, really big thanks for allowing combined usage - much appreciated as it will no doubt be of great benefit for AMD users :)

Ned


ID: 187696

Harold Naparst
Volunteer tester
Joined: 11 May 05
Posts: 236
Credit: 91,803
RAC: 0
Sweden
Message 187699 - Posted: 10 Nov 2005, 23:36:49 UTC - in response to Message 187696.  


Heh-heh. You stepped right into my trap, Ned. Since you are English,
how would you phrase the explanation for us?
Harold Naparst
ID: 187699

Ned Slider
Joined: 12 Oct 01
Posts: 668
Credit: 4,375,315
RAC: 0
United Kingdom
Message 187702 - Posted: 10 Nov 2005, 23:50:50 UTC - in response to Message 187699.  


LOL - I didn't realise American was so different :)

IMHO what you have is fine for single usage, but a little additional explanation of order of priority for combined usage (as you explained above) would be great.

Also, having looked closer at your downloads page, it might be better to just link to Crunch3r's site as he has clients (for AMD) for download based on your source whereas I currently don't (mine still give weak similarity and I'm not 100% sure they're totally static). Of course I don't mind if you want to link me, just that users will get faster clients for AMD from Crunch3r atm based on your source :)

Ned


ID: 187702

michael37
Joined: 23 Jul 99
Posts: 311
Credit: 6,955,447
RAC: 0
United States
Message 187753 - Posted: 11 Nov 2005, 3:09:48 UTC

@ Harold

1. Could you please specify the meaning of -DW7 and -DA6 flags compared to the icc flags -xN and -xP?

2. Could you please make a version of your client with the -DIPP -xB option? My icc experience has been painful so far, and I would appreciate a Pentium M build just to see if it's more efficient than P3-SSE and/or P4-SSE2.





ID: 187753

Harold Naparst
Volunteer tester
Joined: 11 May 05
Posts: 236
Credit: 91,803
RAC: 0
Sweden
Message 187761 - Posted: 11 Nov 2005, 3:42:43 UTC - in response to Message 187753.  

> @ Harold
>
> 1. Could you please specify the meaning of -DW7 and -DA6 flags compared to the icc flags -xN and -xP?

I've added more explanation to my CFLAGS page.

> 2. Could you please make a version of your client with the -DIPP -xB option? My icc experience has been painful so far, and I would appreciate a Pentium M build just to see if it's more efficient than P3-SSE and/or P4-SSE2.


Please run the B version of Release 1 on my web site and let me know if it significantly outperforms the N version on your Pentium M.

I am reluctant to do it until I know there is a benefit.
For instance, in this article:

http://www.digit-life.com/articles2/insidespeccpu/insidespeccpu2000-part-e.html

the author concludes:
-QxB always proves to be the best choice for the CPU, though it usually makes no great difference.


In the case of the FFT test, the flag made no difference. But please test R1.


Harold Naparst
ID: 187761

Harold Naparst
Volunteer tester
Joined: 11 May 05
Posts: 236
Credit: 91,803
RAC: 0
Sweden
Message 187768 - Posted: 11 Nov 2005, 4:02:04 UTC

Please note that I have fixed a bug in r3.4 that was causing
IPP not to be used in an important part of the code, in the case
that -DUSE_FFTWF or -DACML was specified along with -DIPP.

The new release is source-only, because it doesn't affect the posted
binaries. You should now check out the sources at:

svn co svn://hnaparst.homelinux.com/seti_boinc/tags/naparst-r3.41


Harold Naparst
ID: 187768

michael37
Joined: 23 Jul 99
Posts: 311
Credit: 6,955,447
RAC: 0
United States
Message 187850 - Posted: 11 Nov 2005, 13:16:47 UTC - in response to Message 187761.  

Many thanks.


Here you go. Time for the reference workunit.

R1-B 84min 33sec
R1-P4-N 86min 43sec



ID: 187850

spacemeat
Joined: 4 Oct 99
Posts: 239
Credit: 8,425,288
RAC: 0
United States
Message 187865 - Posted: 11 Nov 2005, 14:14:44 UTC

In file included from ./../config.h:461,
                 from <command line>:1:
/opt/intel/ipp/5.0/ia32/tools/staticlib/ipp_a6.h:7866: error: stray '\26' in program

ack! nothing can be easy!
ID: 187865

Metod, S56RKO
Volunteer tester
Joined: 27 Sep 02
Posts: 309
Credit: 113,221,277
RAC: 9
Slovenia
Message 187906 - Posted: 11 Nov 2005, 16:48:08 UTC

For those who want to see how different SETI binaries for Linux perform on the same WU: I've added timings for Harold's 3.4 cruncher to the timing section of my page. They outperform any other SETI cruncher by a large margin.
Metod ...
ID: 187906

Harold Naparst
Volunteer tester
Joined: 11 May 05
Posts: 236
Credit: 91,803
RAC: 0
Sweden
Message 187966 - Posted: 11 Nov 2005, 20:54:18 UTC
Last modified: 11 Nov 2005, 20:57:17 UTC

I've created a new source version of the cruncher (r3.5).
You can download the source at:

svn co svn://hnaparst.homelinux.com/seti_boinc/tags/naparst-r3.5

There are no binaries associated with this release.
This release saves the plans of the Fourier Transforms in a file named wisdom
so they don't have to be recalculated every workunit.

If the directory doesn't contain the file wisdom, then the first time the program is run, it will create this file, and it will use the file on the subsequent work units. Thus, the first run will be slower than the subsequent runs.
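
The load-or-create behaviour described above can be sketched as follows. This is a generic Python illustration, not the r3.5 code: `load_or_create_plans` and the `compute_plans` callback are hypothetical names, and the plan data is treated as an opaque string. In C, FFTW 3 offers `fftwf_import_wisdom_from_file` and `fftwf_export_wisdom_to_file` for the read and write steps; whether r3.5 uses exactly those calls is not stated in this post.

```python
import os

def load_or_create_plans(path, compute_plans):
    """Return (plans, created) using a cache file at `path`.

    If the file exists, its contents are reused and `compute_plans`
    is never called. Otherwise `compute_plans()` runs once (the slow
    first work unit) and its result is saved for later runs.
    Both names are hypothetical, chosen for this illustration.
    """
    if os.path.exists(path):
        with open(path, "r") as f:
            return f.read(), False

    plans = compute_plans()  # expensive planning step, done only once
    with open(path, "w") as f:
        f.write(plans)
    return plans, True
```

Pasting a ready-made wisdom file into the directory, as offered further down in this post, amounts to making the first branch succeed on the very first run.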

Of course, this only matters if you compile your sources with -DUSE_FFTWF.
The posted binaries on my web site are NOT compiled with this flag, so the wisdom file is completely irrelevant for the posted binaries.

The reason I'm not releasing binaries is that the posted binaries are still faster than anything I can generate using FFTW (for Pentium):

Posted v3.4 binary (uses IPP): 26 minutes
3.41 sources with -DUSE_FFTWF: 29 minutes, 4 seconds
3.5 sources with -DUSE_FFTWF: 28 minutes, 28 seconds

Now, because I'm such a great guy, I'm going to save you the pain and anguish of having to wait for that first workunit to finish calculating and saving the FFT plans. Just cut and paste these lines into a new file called wisdom in the same directory as the SETI cruncher that you've compiled using the 3.5 sources.
-----------CUT HERE----------
(fftw-3.0.1 fftwf_wisdom
(fftwf_dft_vrank_geq1_register 0 #xc058 #xac2c89ce #x620fc6b6 #xc7828817 #xd7bcefdc)
(fftwf_dft_buffered_register 0 #xc050 #xb9b0acb6 #x621fd39f #x70f6ab5d #x7dc1e353)
(fftwf_dft_buffered_register 0 #xc050 #x836f829d #x8acd20cd #x8c0f8e45 #x0337ecb7)
(fftwf_dft_vrank_geq1_register 0 #xc050 #x3293898c #x3b042963 #x8ebf167a #xacdc90f7)
(fftwf_codelet_n2bv_16 0 #xc050 #x8b12bda4 #xfaecd94d #x93351901 #x3d5424b7)
(fftwf_dft_vrank_geq1_register 0 #xc050 #x684a2058 #x1b9751fc #x23ac361c #x1883cb70)
(fftwf_codelet_t1fv_4 0 #xc050 #xf6ebda74 #x7d33c356 #x1ea8ab61 #x570de7c2)
(fftwf_dft_nop_register 0 #xc050 #x0fc5797a #x0b073cd6 #x19e53229 #xc4744b2c)
(fftwf_dft_buffered_register 0 #xc050 #xec43b2d8 #xad0cb700 #x10e7bd10 #x1fa78a76)
(fftwf_dft_buffered_register 0 #xc050 #xbb248454 #x7d8f7413 #x063335fe #x606a3a8a)
(fftwf_dft_nop_register 0 #xc050 #x1ff55d02 #xe994f223 #xc4503f10 #xc46261ec)
(fftwf_codelet_q1bv_4 0 #xc050 #x9857681a #x66e8e314 #x6b9e6eec #x9d232edf)
(fftwf_codelet_m1bv_32 0 #xc050 #xc962e910 #x3ecd69eb #xd29eb699 #x227cbfea)
(fftwf_codelet_m1bv_64 0 #xc058 #x0ddd812e #xba5db9b8 #x2af3a2db #xc47c1f11)
(fftwf_codelet_t1bv_8 0 #xc050 #xa90d8485 #x86788f78 #x8efcb734 #x4330e58d)
(fftwf_dft_nop_register 0 #xc050 #x8830b332 #x4feefb49 #x7a1426ad #x38ba25e1)
(fftwf_codelet_n2fv_16 0 #xc050 #xc69f9d9e #x2de35cc0 #xe27b0473 #x934bb3da)
(fftwf_dft_vrank_geq1_register 0 #xc050 #x928de41f #x237e9106 #x3f70ccee #x11b58a91)
(fftwf_dft_rank0_register 4 #xc050 #x4cab0fe5 #x60186fe5 #x3d100729 #x7cae6124)
(fftwf_dft_rank0_register 4 #xc050 #x5de84c6d #x3926b8b9 #x7764b665 #xdb7ebe21)
(fftwf_codelet_q1bv_2 0 #xc050 #x96a46a87 #x4ba0f654 #xdce5574f #x3c4d31bc)
(fftwf_dft_buffered_register 0 #xc050 #xd97944d7 #xc9805490 #x40b0d010 #x58d74707)
(fftwf_dft_indirect_register 1 #xc050 #x32694c4c #xbbad91ec #x025684af #xf3010598)
(fftwf_codelet_n1_16 0 #xc050 #xaf4ba85b #xffb0f9da #x49814f7b #x6223373a)
(fftwf_dft_rank0_register 1 #xc050 #x4f8bb900 #x3fee403a #x86dd0a93 #x61aaa326)
(fftwf_dft_vrank_geq1_register 0 #xc050 #x0b3cdd9a #xe6954c13 #x1855ad7e #x92287645)
(fftwf_codelet_q1bv_2 0 #xc050 #x7248963c #xe9582c3d #x5511361d #x54e3b388)
(fftwf_codelet_n1bv_16 0 #xc050 #x6685c831 #x6004fa69 #xb13bf03f #x3e569991)
(fftwf_codelet_t1bv_4 0 #xc050 #xac220b93 #xabb40885 #x4f3ecef1 #x4ef82d14)
(fftwf_dft_nop_register 0 #xc050 #x455a73a8 #x74cc8626 #x1d271497 #x0f969a85)
(fftwf_dft_vrank_geq1_register 0 #xc050 #x35036cc8 #xae0a80ef #xddeb44ea #xb6347595)
(fftwf_dft_vrank_geq1_register 0 #xc050 #xa669443f #xfc5cc8d2 #x92abf32c #x5d9051b0)
(fftwf_codelet_t1bv_64 0 #xc050 #xb63be7fa #xcb00cde1 #x44a752a5 #x91f840c4)
(fftwf_codelet_t1bv_32 0 #xc050 #x51a41265 #x79cbdeb4 #xf12b461b #xea601400)
(fftwf_codelet_n2bv_16 0 #xc050 #xc8193d82 #xfe020d60 #x68e981ed #x0293aedc)
(fftwf_codelet_t1bv_16 0 #xc050 #x003add33 #x751d9b65 #xc91c018a #xc86c92e7)
(fftwf_codelet_q1bv_2 0 #xc050 #x43714248 #x616510a7 #x4d2baca1 #x7b393255)
(fftwf_dft_rank0_register 4 #xc050 #x25f4bb37 #x7c81c402 #x928478c9 #x731b4176)
(fftwf_codelet_t1bv_8 0 #xc050 #x88880144 #xedae1a78 #x42c7a7d9 #x418376a5)
(fftwf_codelet_n2bv_16 0 #xc050 #x32300cc6 #x97e2779c #xf027d7e8 #xadefa2a6)
(fftwf_codelet_t1bv_16 0 #xc050 #x2ba3287b #x7f03a1b6 #x1b69a489 #x2aa95256)
(fftwf_dft_buffered_register 0 #xc050 #x2969d48e #xe7b28d8f #x82845cf7 #xa030d93e)
(fftwf_dft_vrank2_transpose_register 0 #xc058 #xba1c5168 #x85f858f2 #x21e67bb7 #xb7ed7b15)
(fftwf_dft_indirect_register 1 #xc050 #xcf8fc6d2 #xab3e56b0 #x0d8ca2d9 #x03e2f25b)
(fftwf_dft_vrank_geq1_register 0 #xc050 #xca6f0a0c #x9d9125b7 #x6f599c11 #x460d8d93)
(fftwf_dft_nop_register 0 #xc050 #xa44b17b5 #x50e82b42 #x24783b6d #x72c8ee28)
(fftwf_codelet_n1_8 0 #xc050 #xcd3a914b #x79ca538f #x0e91327e #xb8faf502)
(fftwf_dft_vrank2_transpose_register 0 #xc058 #x95c0361e #xb5690541 #x53f57c00 #x4589a038)
(fftwf_dft_vrank_geq1_register 0 #xc050 #xec972188 #xb6494e8d #xfb5f499d #x54698666)
(fftwf_dft_buffered_register 0 #xc050 #x8650e6c8 #x3577a8e5 #xcd73b7c6 #x958d3d03)
(fftwf_codelet_t1bv_32 0 #xc050 #x42799a92 #x700cef47 #x5dce2b14 #x101cfd97)
(fftwf_codelet_q1fv_4 0 #xc050 #x021f3451 #x69515a1e #x4d9dc0bd #x83ce7e76)
(fftwf_codelet_n2bv_16 0 #xc050 #xa4f5606c #xb443d82d #xb5317175 #x35acc018)
(fftwf_dft_rank0_register 4 #xc050 #x8ce5eaec #x8d75697c #xe0a5cc9b #x72e96bfb)
(fftwf_dft_buffered_register 0 #xc050 #x5d79be75 #x4913cb69 #xc196159c #xfb363fca)
(fftwf_codelet_t1bv_8 0 #xc050 #xcaec2ef5 #x49c9b04a #xef0940c0 #xcab55d30)
(fftwf_dft_nop_register 0 #xc050 #x69efd115 #xf36e75a6 #xb4ce9f3d #x5f07f215)
(fftwf_dft_vrank_geq1_register 0 #xc050 #x4c5111aa #xfd4181b9 #xf93abb9b #xd3b292ca)
(fftwf_codelet_t1bv_2 0 #xc050 #x2b2b45ba #x9fb6d44f #x0e972964 #xc56b9e2b)
(fftwf_codelet_t1bv_64 0 #xc050 #x6b7efec6 #x2566176e #x8c2d16b6 #xbe218c32)
(fftwf_codelet_n1bv_16 0 #xc058 #x297664e1 #x92fe9483 #x201e93cb #xcf0cdb10)
(fftwf_dft_rank0_register 1 #xc050 #x6cc9f9cb #xd3a3344d #xbed3c6df #x043daa71)
(fftwf_codelet_t1bv_8 0 #xc050 #xa7293d36 #x5d2c1382 #x24f8dcb8 #xe95be76f)
(fftwf_dft_rank0_register 4 #xc050 #x528ebfdf #x7cdafdbe #xb1042a1e #xe085d241)
(fftwf_codelet_m1bv_32 0 #xc058 #x2e74d85e #x59a7147f #x0c17551f #x47768c00)
(fftwf_codelet_t1bv_16 0 #xc050 #x321fa57c #x4ff4744a #xc6d8d3b9 #x94f7cc63)
(fftwf_dft_nop_register 0 #xc050 #x14b581da #xdae58599 #xf20d7f67 #x6ba5f281)
(fftwf_codelet_n1bv_8 0 #xc050 #x63f1fb77 #xc85c5669 #xa6321110 #x403787b0)
(fftwf_codelet_t1fv_4 0 #xc050 #x7a67eb0c #x9ae2efbf #xac811901 #x0dae7942)
(fftwf_codelet_n2bv_16 0 #xc050 #x228ee733 #x52bc246e #x88d3d2e1 #xfe5cc41e)
(fftwf_codelet_t1bv_32 0 #xc050 #xccf93b10 #xc0d44ec6 #xd4ba37a4 #x20549f00)
(fftwf_codelet_t1bv_2 0 #xc050 #xc7ec3bee #x701be4c7 #x8e01c889 #x3f8dc18c)
(fftwf_dft_vrank_geq1_register 0 #xc050 #x1bbb3e1b #xd4e3dbea #x6a8d86f2 #xc63aa7d3)
(fftwf_codelet_t1bv_32 0 #xc050 #x5fab85aa #xf05259ef #xe8a10a93 #xf17663a2)
(fftwf_dft_indirect_register 0 #xc050 #xcd992ab6 #xc939a630 #xe11dcc04 #x0b30ee58)
(fftwf_codelet_t1bv_32 0 #xc050 #xb1e047be #x84330fbe #x4961936b #xfc09915e)
(fftwf_codelet_t1fv_32 0 #xc050 #x6277fafa #x5fbd73bc #x131d96ec #x5cd5dc5e)
(fftwf_dft_rank0_register 4 #xc050 #xd0aa3742 #x8477a624 #xda246d69 #x21181035)
(fftwf_codelet_t1bv_16 0 #xc050 #xe9c51cc3 #x98f99349 #x9da6c283 #xf13a14b5)
(fftwf_codelet_n2bv_16 0 #xc050 #x12444343 #x32325084 #xa557541d #xb33b7921)
(fftwf_codelet_t1bv_64 0 #xc050 #xc8061d51 #xf61d5936 #xd5ccb5f1 #x0e9b6bc8)
(fftwf_dft_vrank_geq1_register 0 #xc050 #x907d8564 #xc4faf777 #x333673b9 #xfb02089a)
(fftwf_dft_vrank_geq1_register 0 #xc050 #x43dc2b09 #x504f1324 #x6521f623 #x77c6f1d7)
(fftwf_dft_vrank_geq1_register 0 #xc050 #xbd4e71df #xd9a3de04 #x3c0a0219 #x73d15dda)
(fftwf_codelet_t1bv_2 0 #xc050 #xc79e379a #x99d0c879 #xffc650a6 #x529389b1)
(fftwf_dft_vrank_geq1_register 0 #xc050 #xdac621b7 #x8ecaa0f9 #x992d5ec0 #x50ee0f6d)
(fftwf_codelet_t1bv_8 0 #xc050 #xa4328c69 #x1fbea3c3 #x015c41fe #x597c8774)
(fftwf_dft_vrank_geq1_register 0 #xc050 #x7e338f92 #x4b7c7b29 #x211e1b9e #x85885054)
(fftwf_dft_vrank_geq1_register 0 #xc050 #x5584c608 #x151f4d19 #xa5220a64 #x20844bc9)
(fftwf_dft_vrank_geq1_register 0 #xc050 #x657130a3 #x73731a12 #x284c849c #x56780146)
(fftwf_dft_nop_register 0 #xc050 #xd0780e6e #x0d04749c #xe00219d8 #x1c647df1)
(fftwf_codelet_n2bv_16 0 #xc050 #x853bb5f0 #xebe7fa5b #xaa10324e #x3cf64572)
(fftwf_dft_vrank2_transpose_register 0 #xc058 #xee614467 #x22a76f06 #x8206eb83 #xbbaa7d86)
(fftwf_codelet_t1bv_16 0 #xc050 #xe285da11 #x79f85f2b #x80ab04ed #x9f1d6d4b)
)
-----------DO NOT INCLUDE THIS LINE----------

ID: 187966

Crunch3r
Volunteer tester
Joined: 15 Apr 99
Posts: 1546
Credit: 3,438,823
RAC: 0
Germany
Message 187992 - Posted: 11 Nov 2005, 21:56:00 UTC - in response to Message 187966.  
Last modified: 11 Nov 2005, 21:58:33 UTC

Seeing this, a question comes to mind:

Is your fftw3 lib SSE-enabled? ("--enable-sse")


Join BOINC United now!
ID: 187992