Version 3.4 of Faster SETI cruncher for Linux
Author | Message |
---|---|
Harold Naparst Joined: 11 May 05 Posts: 236 Credit: 91,803 RAC: 0 |
I've released a new version (R3.4) of the community-effort SETI client for Linux. You may download it at http://naparst.name. The new version is statically linked using ICC and IPP. There are three versions: SSE, SSE2, and SSE3. The SSE3 version includes some hand-written assembly code to multiply complex vectors, which is what SSE3 was designed for, and what SETI does a lot of. As Tetsuji would say, I just did it on a whim. But I was trying to remove what I thought was an 8% bottleneck. It turns out that the bottleneck was mostly caused by the small size of my L2 cache (1MB) relative to the size of the vectors (2x8MB). This causes a lot of L2 cache misses (the bad sort). So, although the SSE3 code does shave about 30 seconds off the SSE2 time, it was kind of a disappointment for me. I really had been hoping for more. But the bottleneck was (and is) cache-related. For many of you, the exciting news in this release is going to be the rationalization of CFLAGS. Now you can use ACML, IPP, or FFTW, or any combination. For instance, on my AMD 64 x2 in 64-bit mode, using gcc-4.0.2, I have the following results: -DACML: 73 minutes, 17 seconds; -DUSE_FFTWF: 64 minutes, 46 seconds; -DUSE_FFTWF -DIPP: 57 minutes, 15 seconds. So you all should have a good weekend playing with this. A couple more items: 1) Gentoo has released a beta version of gcc-4.1. However, the SETI code in my repository will not compile with gcc-4.1 now. gcc-4.1 is stricter than gcc-4.0.2 regarding the ANSI C++ standard, and SETI doesn't meet the standard. So this is something to work on. 2) There is a fairly obvious next speedup to make for those of you using ACML or FFTW. Both libraries provide a means of storing plans for FFTs, and thus avoiding recalculation of the plan. I didn't include this in this release. I am trying to follow the advice given in "Practical Subversion" to keep changes small and atomic. Harold Naparst |
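For readers wondering what "multiplying complex vectors" involves, here is a minimal scalar sketch in C. It is not Harold's hand-written assembly; the function name `complex_multiply` and the interleaved (re, im) storage layout are assumptions made for illustration. The subtract-then-add pattern in the loop body is the one that the SSE3 `ADDSUBPS` instruction performs on alternating lanes.

```c
#include <stddef.h>

/* Multiply n complex numbers element by element: out[k] = a[k] * b[k].
 * Assumed layout (hypothetical): interleaved floats re0, im0, re1, im1, ... */
void complex_multiply(const float *a, const float *b, float *out, size_t n)
{
    for (size_t k = 0; k < n; k++) {
        float ar = a[2 * k], ai = a[2 * k + 1];
        float br = b[2 * k], bi = b[2 * k + 1];

        /* Even slot gets a subtraction, odd slot an addition. SSE3's
         * ADDSUBPS does exactly this subtract/add pair across lanes. */
        out[2 * k]     = ar * br - ai * bi;  /* real part      */
        out[2 * k + 1] = ar * bi + ai * br;  /* imaginary part */
    }
}
```

With two 8MB vectors streaming through a loop like this, each element is touched once per pass, which is consistent with the post's point that a 1MB L2 cache, not the arithmetic, ends up as the limit.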
Hans Dorn Joined: 3 Apr 99 Posts: 2262 Credit: 26,448,570 RAC: 0 |
I've released a new version (R3.4) of the community-effort SETI client for Linux. You may download it at http://naparst.name Hi Harold, I guess we're hitting a hard limit here: the FPU simply is faster than the FSB. But there's still hope, since Intel announced a single-core Xeon with 8MB L2 for next year :o) Regards Hans |
Joined: 4 Oct 99 Posts: 239 Credit: 8,425,288 RAC: 0 |
-DUSE_FFTWF -DIPP 57 minutes, 15 seconds. how does this work? |
Ned Slider Joined: 12 Oct 01 Posts: 668 Credit: 4,375,315 RAC: 0 |
Thanks again Harold - I'll definitely have a play with this over the weekend :) -DUSE_FFTWF -DIPP 57 minutes, 15 seconds. I'm assuming you can now specify -DUSE_FFTWF AND -DIPP, and it will use FFTW for all FFT functions and use the IPP routines for your improved routines in spike.cpp. Perhaps Harold could confirm this as I haven't had a chance to look at the code changes yet. Ned *** My Guide to Compiling Optimised BOINC and SETI Clients *** *** Download Optimised BOINC and SETI Clients for Linux Here *** |
Harold Naparst Joined: 11 May 05 Posts: 236 Credit: 91,803 RAC: 0 |
Thanks again Harold - I'll definitely have a play with this over the weekend :) I hope you don't have to look at the code changes. Were the instructions on my CFLAGS page not clear? You can use the flags in any combination, and there is a priority logic to how they are applied to each bottleneck in the code: first FFTW, then ACML, then IPP. So, if you specify all three flags, for instance, then FFTW will do the Fourier Transform, but obviously FFTW won't do the trig calculations, since it doesn't have a routine for that. Harold Naparst |
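The priority order described in this post (FFTW, then ACML, then IPP, skipping any library that lacks the routine) can be sketched in C as below. The flag names `USE_FFTWF`, `ACML` and `IPP` come from the thread; the enum, the function names, and the idea that ACML is tried before IPP for trigonometry are assumptions for illustration, not a copy of the actual sources.

```c
/* Hypothetical labels for the three libraries; values form a bitmask. */
enum lib { LIB_NONE = 0, LIB_FFTW = 1, LIB_ACML = 2, LIB_IPP = 4 };

/* Collect the -D flags given at compile time into a bitmask. */
unsigned compiled_libs(void)
{
    unsigned libs = LIB_NONE;
#ifdef USE_FFTWF
    libs |= LIB_FFTW;
#endif
#ifdef ACML
    libs |= LIB_ACML;
#endif
#ifdef IPP
    libs |= LIB_IPP;
#endif
    return libs;
}

/* Fourier transforms: first FFTW, then ACML, then IPP. */
enum lib fft_backend(unsigned libs)
{
    if (libs & LIB_FFTW) return LIB_FFTW;
    if (libs & LIB_ACML) return LIB_ACML;
    if (libs & LIB_IPP)  return LIB_IPP;
    return LIB_NONE;  /* fall back to the stock code */
}

/* Trig: FFTW has no such routine, so it is skipped here (assumed order). */
enum lib trig_backend(unsigned libs)
{
    if (libs & LIB_ACML) return LIB_ACML;
    if (libs & LIB_IPP)  return LIB_IPP;
    return LIB_NONE;
}
```

Under this sketch, building with `-DUSE_FFTWF -DIPP` would pick FFTW for the transforms and IPP for the trig work, which matches Ned's reading of the combined flags.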
Joined: 4 Oct 99 Posts: 239 Credit: 8,425,288 RAC: 0 |
sorry, i was going from the initial post and had not yet seen your new CFLAGS page. i neglected to check the link to see that it was your explanation and not a 3rd party CFLAGS reference. it makes better sense now, thanks. |
Harold Naparst Joined: 11 May 05 Posts: 236 Credit: 91,803 RAC: 0 |
Well, please let me know if it needs to be explained better. Since I've been immersed in the code for weeks, I'm sure I've lost perspective on what needs to be explained to you all about what the flags do and what the libraries do. Harold Naparst |
JohnB175 Joined: 15 Oct 03 Posts: 124 Credit: 321,769 RAC: 0 |
Great work on v3.4. On my P3 w/ SSE I got a 6% performance boost compared to your previous v3.2 when running the reference workunit. Thanks! |
Joined: 14 Apr 01 Posts: 435 Credit: 842,179 RAC: 0 |
Harold, just to let you know setiathome_SSE2-naparst-r3.4 is running without an issue on my P4 so far. Looks good :-) Thanks and regards rattelschneck |
Ned Slider Joined: 12 Oct 01 Posts: 668 Credit: 4,375,315 RAC: 0 |
Your explanation above makes perfect sense, but the details on your cflags page don't seem to cover combined usage of more than one flag as well as you've just explained it above (at least for me). For example, when combining -DUSE_FFTWF and -DIPP the following two statements may appear to contradict each other: "-DIPP If this flag is defined, then as much math as possible will be done using Intel's IPP library." "-DUSE_FFTWF If this flag is defined, then the FFTW library will be used to calculate Fourier Transforms." I know what you mean, but it may not be immediately obvious to others :) ...or perhaps I'm just being overly pedantic :D Anyway, really big thanks for allowing combined usage - much appreciated as it will no doubt be of great benefit for AMD users :) Ned *** My Guide to Compiling Optimised BOINC and SETI Clients *** *** Download Optimised BOINC and SETI Clients for Linux Here *** |
Harold Naparst Joined: 11 May 05 Posts: 236 Credit: 91,803 RAC: 0 |
Heh-heh. You stepped right into my trap, Ned. Since you are English, how would you phrase the explanation for us? Harold Naparst |
Ned Slider Joined: 12 Oct 01 Posts: 668 Credit: 4,375,315 RAC: 0 |
LOL - I didn't realise American was so different :) IMHO what you have is fine for single usage, but a little additional explanation of order of priority for combined usage (as you explained above) would be great. Also, having looked closer at your downloads page, it might be better to just link to Crunch3r's site as he has clients (for AMD) for download based on your source whereas I currently don't (mine still give weak similarity and I'm not 100% sure they're totally static). Of course I don't mind if you want to link me, just that users will get faster clients for AMD from Crunch3r atm based on your source :) Ned *** My Guide to Compiling Optimised BOINC and SETI Clients *** *** Download Optimised BOINC and SETI Clients for Linux Here *** |
Joined: 23 Jul 99 Posts: 311 Credit: 6,955,447 RAC: 0 |
@ Harold 1. Could you please specify the meaning of the -DW7 and -DA6 flags compared to the icc flags -xN and -xP? 2. Could you please make a version of your client with the -DIPP -xB option? My icc experience has been painful so far, and I would appreciate a Pentium M build just to see if it's more efficient than P3-SSE and/or P4-SSE2. |
Harold Naparst Joined: 11 May 05 Posts: 236 Credit: 91,803 RAC: 0 |
@ Harold I've added more explanation to my CFLAGS page
Please run the B version of Release 1 on my web site and let me know if it significantly outperforms the N version on your Pentium M. I am reluctant to do it until I know there is a benefit. For instance, in this article: http://www.digit-life.com/articles2/insidespeccpu/insidespeccpu2000-part-e.html the author concludes: -QxB always proves to be the best choice for the CPU, though it usually makes no great difference. In the case of the FFT test, the flag made no difference. But please test R1. Harold Naparst |
Harold Naparst Joined: 11 May 05 Posts: 236 Credit: 91,803 RAC: 0 |
Please note that I have fixed a bug in r3.4 that was causing IPP not to be used in an important part of the code, in the case that -DUSE_FFTWF or -DACML was specified along with -DIPP. The new release is source-only, because it doesn't affect the posted binaries. You should now check out the sources at: svn co svn://hnaparst.homelinux.com/seti_boinc/tags/naparst-r3.41 Harold Naparst |
Joined: 23 Jul 99 Posts: 311 Credit: 6,955,447 RAC: 0 |
@ Harold Many thanks.
Here you go. Times for the reference workunit: R1-B: 84 min 33 sec; R1-P4-N: 86 min 43 sec |
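To put the two reference-workunit times above into a single number, the comparison can be worked out as in this small C sketch. The helper names `to_seconds` and `percent_faster` are made up for illustration; only the two timings come from the post.

```c
/* Convert a "minutes, seconds" timing into whole seconds. */
int to_seconds(int minutes, int seconds)
{
    return minutes * 60 + seconds;
}

/* Percentage by which the faster run beats the slower one,
 * measured relative to the slower run. */
double percent_faster(int slow_seconds, int fast_seconds)
{
    return 100.0 * (slow_seconds - fast_seconds) / slow_seconds;
}
```

For R1-B (84 min 33 sec = 5073 s) against R1-P4-N (86 min 43 sec = 5203 s), the difference is 130 s, or roughly 2.5% in favour of the B build on this Pentium M.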
Joined: 4 Oct 99 Posts: 239 Credit: 8,425,288 RAC: 0 |
In file included from ./../config.h:461, from <command line>:1: /opt/intel/ipp/5.0/ia32/tools/staticlib/ipp_a6.h:7866: error: stray '\26' in program ack! nothing can be easy! |
Metod, S56RKO Joined: 27 Sep 02 Posts: 309 Credit: 113,221,277 RAC: 9 |
For those who want to see how different SETI binaries for Linux perform on the same WU: I've added timings for Harold's 3.4 cruncher to the timing section of my page. They outperform any other SETI cruncher by a large margin. Metod ... |
Harold Naparst Joined: 11 May 05 Posts: 236 Credit: 91,803 RAC: 0 |
I've created a new source version of the cruncher (r3.5). You can download the source at: svn co svn://hnaparst.homelinux.com/seti_boinc/tags/naparst-r3.5 There are no binaries associated with this release. This release saves the plans of the Fourier Transforms in a file named wisdom so they don't have to be recalculated every workunit. If the directory doesn't contain the file wisdom, then the first time the program is run, it will create this file, and it will use the file on subsequent workunits. Thus, the first run will be slower than the subsequent runs. Of course, this only matters if you compile your sources with -DUSE_FFTWF. The posted binaries on my web site are NOT compiled with this flag, so the wisdom file is completely irrelevant for the posted binaries. The reason I'm not releasing binaries is that the posted binaries are still faster than anything I can generate using FFTW (for Pentium): Posted v3.4 binary (uses IPP): 26 minutes; 3.41 sources with -DUSE_FFTWF: 29 minutes, 4 seconds; 3.5 sources with -DUSE_FFTWF: 28 minutes, 28 seconds. Now, because I'm such a great guy, I'm going to save you the pain and anguish of having to wait for that first workunit to finish calculating and saving the FFT plans. Just cut and paste these lines into a new file called wisdom in the same directory as the SETI cruncher that you've compiled using the 3.5 sources. 
-----------CUT HERE---------- (fftw-3.0.1 fftwf_wisdom (fftwf_dft_vrank_geq1_register 0 #xc058 #xac2c89ce #x620fc6b6 #xc7828817 #xd7bcefdc) (fftwf_dft_buffered_register 0 #xc050 #xb9b0acb6 #x621fd39f #x70f6ab5d #x7dc1e353) (fftwf_dft_buffered_register 0 #xc050 #x836f829d #x8acd20cd #x8c0f8e45 #x0337ecb7) (fftwf_dft_vrank_geq1_register 0 #xc050 #x3293898c #x3b042963 #x8ebf167a #xacdc90f7) (fftwf_codelet_n2bv_16 0 #xc050 #x8b12bda4 #xfaecd94d #x93351901 #x3d5424b7) (fftwf_dft_vrank_geq1_register 0 #xc050 #x684a2058 #x1b9751fc #x23ac361c #x1883cb70) (fftwf_codelet_t1fv_4 0 #xc050 #xf6ebda74 #x7d33c356 #x1ea8ab61 #x570de7c2) (fftwf_dft_nop_register 0 #xc050 #x0fc5797a #x0b073cd6 #x19e53229 #xc4744b2c) (fftwf_dft_buffered_register 0 #xc050 #xec43b2d8 #xad0cb700 #x10e7bd10 #x1fa78a76) (fftwf_dft_buffered_register 0 #xc050 #xbb248454 #x7d8f7413 #x063335fe #x606a3a8a) (fftwf_dft_nop_register 0 #xc050 #x1ff55d02 #xe994f223 #xc4503f10 #xc46261ec) (fftwf_codelet_q1bv_4 0 #xc050 #x9857681a #x66e8e314 #x6b9e6eec #x9d232edf) (fftwf_codelet_m1bv_32 0 #xc050 #xc962e910 #x3ecd69eb #xd29eb699 #x227cbfea) (fftwf_codelet_m1bv_64 0 #xc058 #x0ddd812e #xba5db9b8 #x2af3a2db #xc47c1f11) (fftwf_codelet_t1bv_8 0 #xc050 #xa90d8485 #x86788f78 #x8efcb734 #x4330e58d) (fftwf_dft_nop_register 0 #xc050 #x8830b332 #x4feefb49 #x7a1426ad #x38ba25e1) (fftwf_codelet_n2fv_16 0 #xc050 #xc69f9d9e #x2de35cc0 #xe27b0473 #x934bb3da) (fftwf_dft_vrank_geq1_register 0 #xc050 #x928de41f #x237e9106 #x3f70ccee #x11b58a91) (fftwf_dft_rank0_register 4 #xc050 #x4cab0fe5 #x60186fe5 #x3d100729 #x7cae6124) (fftwf_dft_rank0_register 4 #xc050 #x5de84c6d #x3926b8b9 #x7764b665 #xdb7ebe21) (fftwf_codelet_q1bv_2 0 #xc050 #x96a46a87 #x4ba0f654 #xdce5574f #x3c4d31bc) (fftwf_dft_buffered_register 0 #xc050 #xd97944d7 #xc9805490 #x40b0d010 #x58d74707) (fftwf_dft_indirect_register 1 #xc050 #x32694c4c #xbbad91ec #x025684af #xf3010598) (fftwf_codelet_n1_16 0 #xc050 #xaf4ba85b #xffb0f9da #x49814f7b #x6223373a) (fftwf_dft_rank0_register 
1 #xc050 #x4f8bb900 #x3fee403a #x86dd0a93 #x61aaa326) (fftwf_dft_vrank_geq1_register 0 #xc050 #x0b3cdd9a #xe6954c13 #x1855ad7e #x92287645) (fftwf_codelet_q1bv_2 0 #xc050 #x7248963c #xe9582c3d #x5511361d #x54e3b388) (fftwf_codelet_n1bv_16 0 #xc050 #x6685c831 #x6004fa69 #xb13bf03f #x3e569991) (fftwf_codelet_t1bv_4 0 #xc050 #xac220b93 #xabb40885 #x4f3ecef1 #x4ef82d14) (fftwf_dft_nop_register 0 #xc050 #x455a73a8 #x74cc8626 #x1d271497 #x0f969a85) (fftwf_dft_vrank_geq1_register 0 #xc050 #x35036cc8 #xae0a80ef #xddeb44ea #xb6347595) (fftwf_dft_vrank_geq1_register 0 #xc050 #xa669443f #xfc5cc8d2 #x92abf32c #x5d9051b0) (fftwf_codelet_t1bv_64 0 #xc050 #xb63be7fa #xcb00cde1 #x44a752a5 #x91f840c4) (fftwf_codelet_t1bv_32 0 #xc050 #x51a41265 #x79cbdeb4 #xf12b461b #xea601400) (fftwf_codelet_n2bv_16 0 #xc050 #xc8193d82 #xfe020d60 #x68e981ed #x0293aedc) (fftwf_codelet_t1bv_16 0 #xc050 #x003add33 #x751d9b65 #xc91c018a #xc86c92e7) (fftwf_codelet_q1bv_2 0 #xc050 #x43714248 #x616510a7 #x4d2baca1 #x7b393255) (fftwf_dft_rank0_register 4 #xc050 #x25f4bb37 #x7c81c402 #x928478c9 #x731b4176) (fftwf_codelet_t1bv_8 0 #xc050 #x88880144 #xedae1a78 #x42c7a7d9 #x418376a5) (fftwf_codelet_n2bv_16 0 #xc050 #x32300cc6 #x97e2779c #xf027d7e8 #xadefa2a6) (fftwf_codelet_t1bv_16 0 #xc050 #x2ba3287b #x7f03a1b6 #x1b69a489 #x2aa95256) (fftwf_dft_buffered_register 0 #xc050 #x2969d48e #xe7b28d8f #x82845cf7 #xa030d93e) (fftwf_dft_vrank2_transpose_register 0 #xc058 #xba1c5168 #x85f858f2 #x21e67bb7 #xb7ed7b15) (fftwf_dft_indirect_register 1 #xc050 #xcf8fc6d2 #xab3e56b0 #x0d8ca2d9 #x03e2f25b) (fftwf_dft_vrank_geq1_register 0 #xc050 #xca6f0a0c #x9d9125b7 #x6f599c11 #x460d8d93) (fftwf_dft_nop_register 0 #xc050 #xa44b17b5 #x50e82b42 #x24783b6d #x72c8ee28) (fftwf_codelet_n1_8 0 #xc050 #xcd3a914b #x79ca538f #x0e91327e #xb8faf502) (fftwf_dft_vrank2_transpose_register 0 #xc058 #x95c0361e #xb5690541 #x53f57c00 #x4589a038) (fftwf_dft_vrank_geq1_register 0 #xc050 #xec972188 #xb6494e8d #xfb5f499d #x54698666) 
(fftwf_dft_buffered_register 0 #xc050 #x8650e6c8 #x3577a8e5 #xcd73b7c6 #x958d3d03) (fftwf_codelet_t1bv_32 0 #xc050 #x42799a92 #x700cef47 #x5dce2b14 #x101cfd97) (fftwf_codelet_q1fv_4 0 #xc050 #x021f3451 #x69515a1e #x4d9dc0bd #x83ce7e76) (fftwf_codelet_n2bv_16 0 #xc050 #xa4f5606c #xb443d82d #xb5317175 #x35acc018) (fftwf_dft_rank0_register 4 #xc050 #x8ce5eaec #x8d75697c #xe0a5cc9b #x72e96bfb) (fftwf_dft_buffered_register 0 #xc050 #x5d79be75 #x4913cb69 #xc196159c #xfb363fca) (fftwf_codelet_t1bv_8 0 #xc050 #xcaec2ef5 #x49c9b04a #xef0940c0 #xcab55d30) (fftwf_dft_nop_register 0 #xc050 #x69efd115 #xf36e75a6 #xb4ce9f3d #x5f07f215) (fftwf_dft_vrank_geq1_register 0 #xc050 #x4c5111aa #xfd4181b9 #xf93abb9b #xd3b292ca) (fftwf_codelet_t1bv_2 0 #xc050 #x2b2b45ba #x9fb6d44f #x0e972964 #xc56b9e2b) (fftwf_codelet_t1bv_64 0 #xc050 #x6b7efec6 #x2566176e #x8c2d16b6 #xbe218c32) (fftwf_codelet_n1bv_16 0 #xc058 #x297664e1 #x92fe9483 #x201e93cb #xcf0cdb10) (fftwf_dft_rank0_register 1 #xc050 #x6cc9f9cb #xd3a3344d #xbed3c6df #x043daa71) (fftwf_codelet_t1bv_8 0 #xc050 #xa7293d36 #x5d2c1382 #x24f8dcb8 #xe95be76f) (fftwf_dft_rank0_register 4 #xc050 #x528ebfdf #x7cdafdbe #xb1042a1e #xe085d241) (fftwf_codelet_m1bv_32 0 #xc058 #x2e74d85e #x59a7147f #x0c17551f #x47768c00) (fftwf_codelet_t1bv_16 0 #xc050 #x321fa57c #x4ff4744a #xc6d8d3b9 #x94f7cc63) (fftwf_dft_nop_register 0 #xc050 #x14b581da #xdae58599 #xf20d7f67 #x6ba5f281) (fftwf_codelet_n1bv_8 0 #xc050 #x63f1fb77 #xc85c5669 #xa6321110 #x403787b0) (fftwf_codelet_t1fv_4 0 #xc050 #x7a67eb0c #x9ae2efbf #xac811901 #x0dae7942) (fftwf_codelet_n2bv_16 0 #xc050 #x228ee733 #x52bc246e #x88d3d2e1 #xfe5cc41e) (fftwf_codelet_t1bv_32 0 #xc050 #xccf93b10 #xc0d44ec6 #xd4ba37a4 #x20549f00) (fftwf_codelet_t1bv_2 0 #xc050 #xc7ec3bee #x701be4c7 #x8e01c889 #x3f8dc18c) (fftwf_dft_vrank_geq1_register 0 #xc050 #x1bbb3e1b #xd4e3dbea #x6a8d86f2 #xc63aa7d3) (fftwf_codelet_t1bv_32 0 #xc050 #x5fab85aa #xf05259ef #xe8a10a93 #xf17663a2) (fftwf_dft_indirect_register 0 #xc050 
#xcd992ab6 #xc939a630 #xe11dcc04 #x0b30ee58) (fftwf_codelet_t1bv_32 0 #xc050 #xb1e047be #x84330fbe #x4961936b #xfc09915e) (fftwf_codelet_t1fv_32 0 #xc050 #x6277fafa #x5fbd73bc #x131d96ec #x5cd5dc5e) (fftwf_dft_rank0_register 4 #xc050 #xd0aa3742 #x8477a624 #xda246d69 #x21181035) (fftwf_codelet_t1bv_16 0 #xc050 #xe9c51cc3 #x98f99349 #x9da6c283 #xf13a14b5) (fftwf_codelet_n2bv_16 0 #xc050 #x12444343 #x32325084 #xa557541d #xb33b7921) (fftwf_codelet_t1bv_64 0 #xc050 #xc8061d51 #xf61d5936 #xd5ccb5f1 #x0e9b6bc8) (fftwf_dft_vrank_geq1_register 0 #xc050 #x907d8564 #xc4faf777 #x333673b9 #xfb02089a) (fftwf_dft_vrank_geq1_register 0 #xc050 #x43dc2b09 #x504f1324 #x6521f623 #x77c6f1d7) (fftwf_dft_vrank_geq1_register 0 #xc050 #xbd4e71df #xd9a3de04 #x3c0a0219 #x73d15dda) (fftwf_codelet_t1bv_2 0 #xc050 #xc79e379a #x99d0c879 #xffc650a6 #x529389b1) (fftwf_dft_vrank_geq1_register 0 #xc050 #xdac621b7 #x8ecaa0f9 #x992d5ec0 #x50ee0f6d) (fftwf_codelet_t1bv_8 0 #xc050 #xa4328c69 #x1fbea3c3 #x015c41fe #x597c8774) (fftwf_dft_vrank_geq1_register 0 #xc050 #x7e338f92 #x4b7c7b29 #x211e1b9e #x85885054) (fftwf_dft_vrank_geq1_register 0 #xc050 #x5584c608 #x151f4d19 #xa5220a64 #x20844bc9) (fftwf_dft_vrank_geq1_register 0 #xc050 #x657130a3 #x73731a12 #x284c849c #x56780146) (fftwf_dft_nop_register 0 #xc050 #xd0780e6e #x0d04749c #xe00219d8 #x1c647df1) (fftwf_codelet_n2bv_16 0 #xc050 #x853bb5f0 #xebe7fa5b #xaa10324e #x3cf64572) (fftwf_dft_vrank2_transpose_register 0 #xc058 #xee614467 #x22a76f06 #x8206eb83 #xbbaa7d86) (fftwf_codelet_t1bv_16 0 #xc050 #xe285da11 #x79f85f2b #x80ab04ed #x9f1d6d4b) ) -----------DO NOT INCLUDE THIS LINE---------- |
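The "create the wisdom file on the first run, reuse it afterwards" behaviour described in this post can be sketched in plain C as follows. This is not the r3.5 code. In FFTW 3 the two real steps would be `fftwf_import_wisdom_from_file` and `fftwf_export_wisdom_to_file`, but so that the sketch runs without the library, a placeholder string stands in for the plan data. The names `compute_plan`, `load_or_create_plan` and `plans_computed` are hypothetical.

```c
#include <stdio.h>

/* Counts how often the (expensive) planning step actually ran. */
int plans_computed = 0;

/* Hypothetical stand-in for FFT planning; writes placeholder plan text. */
void compute_plan(char *buf, size_t size)
{
    plans_computed++;
    snprintf(buf, size, "(placeholder-plan)");
}

/* Load saved plan data from `path` if the file exists; otherwise compute
 * the plan and save it for the next run.
 * Returns 1 if the plan came from the file, 0 if it had to be computed. */
int load_or_create_plan(const char *path, char *buf, size_t size)
{
    FILE *f = fopen(path, "r");
    if (f != NULL) {
        int ok = fgets(buf, (int)size, f) != NULL;
        fclose(f);
        if (ok)
            return 1;  /* later runs: planning is skipped */
    }

    compute_plan(buf, size);  /* first run: slow path */

    f = fopen(path, "w");
    if (f != NULL) {
        fputs(buf, f);
        fclose(f);
    }
    return 0;
}
```

Pasting the wisdom text from this post into the file before the first run corresponds to taking the fast path immediately, which is why it saves waiting for the first workunit.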
Joined: 15 Apr 99 Posts: 1546 Credit: 3,438,823 RAC: 0 |
I've created a new source version of the cruncher (r3.5). Seeing this, a question comes to mind: is your fftw3 lib SSE-enabled? ("--enable-sse") Join BOINC United now! |