Optimized Linux clients for P4(and M)
Author | Message |
---|---|
Tetsuji Maverick Rai (Joined: 25 Apr 99; Posts: 518; Credit: 90,863; RAC: 0) |
Ahhhh... I did it again... (but this is my main purpose: Linux is my primary OS, and the Intel compiler and IPP library for Linux are free!) I've made optimized clients for the P4 (SSE2, and up to SSE3) using the IPP library instead of FFTW, built with PGO (profile-guided optimization) using the Intel C++ Compiler. I asked Metod to benchmark the SSE3 version, and he told me it took 4603 seconds to crunch the reference work unit, while the official 4.02 took 9245 seconds. app_info.xml is also included. Be careful: if it doesn't work, you may lose or damage your data, so I recommend making a backup copy of the whole directory. The app_info.xml included in this .tar.gz is the one I've been using for more than a month, but you may lose your workunits when you first switch client versions. Don't blame me... I hope someone will make an improved app_info.xml like the one for Windows... I hope Metod will make a complete set for every Intel processor. The latter may work for Athlon64, but I'm afraid it's slow on AMD processors, because IPP is aggressively optimized for Intel processors. As for IPP, see this page. BTW the version number is 4.07. In addition, you will have my name in the result section like this. But these versions are linked with IPP version 4.1; version 5.0beta is not redistributable (and the speed is the same). For SSE3: seti-lin-sse3-r7.tar.gz. For SSE2 (P4, Pentium M also): seti-lin-sse2-r7.tar.gz. These are freely redistributable. Put them wherever you like on the Internet. hmmmm blacklist.... Luckiest in the world. WMD = Weapon of Mass Distraction. |
Tetsuji Maverick Rai (Joined: 25 Apr 99; Posts: 518; Credit: 90,863; RAC: 0) |
PS: Again, I should write this. Disclaimer: These binaries are provided as is. No guarantees about proper operation can be given. Only you will be to blame if the binaries crash your computer, damage data or hardware, kick your dog or insult your mother. (From Metod's page.) You may want the optimized core client from his page. Luckiest in the world. WMD = Weapon of Mass Distraction. |
Tetsuji Maverick Rai (Joined: 25 Apr 99; Posts: 518; Credit: 90,863; RAC: 0) |
And for those who are interested in the source (patch) files, you can download the patches and simple readmes (just my personal memos, which may contain several minor errors :) ) from here: seti-ipp-patches-r7.tar.gz. Four files were derived from the 2005-01-01 build; I modified them and put them over the 2005-05-08 nightly builds (boinc_public and seti_boinc). BTW according to my "inaccurate" benchmark test, mine (SSE3) took 7122 seconds to crunch the reference work unit, while the official 4.02 took 16961 seconds... so mine is 2.38 times as fast as the official one. Though my benchmark is inaccurate, I'm sure this is faster than fftw: it is about 8% faster than the fftw cruncher. I tried MKL's FFT also, and found it's faster than fftw but slower than IPP. That's the reason I use IPP. And the biggest reason is that, according to this page, IPP's FFT is very fast on single-precision, complex 1D data (powers of two). I confirmed Intel is a software company as well. Luckiest in the world. WMD = Weapon of Mass Distraction. |
(Joined: 18 Mar 04; Posts: 1547; Credit: 760,577; RAC: 0) |
|
Tetsuji Maverick Rai (Joined: 25 Apr 99; Posts: 518; Credit: 90,863; RAC: 0) |
"I have the sse2 version running right now. Predicted to be under 1 hour 45 mins... which is an hour faster than Metod's I was using. I'll let you know the final time when it completes and whether it validates. Thanks." I bet it works fine ;) I've been using this (revising it almost every day) for more than a week. I also applied this to the Windows source tree and made SETI@home clients for P4 (SSE3 and SSE2), and they've been working perfectly on my computers. But... the biggest bottleneck with Window$ is... a licensing problem again!! When it comes to Window$, some problems (usually money) come up. I am using IPP 5.0beta for Windows, which is available until the end of this year (the beta testing period), but it's not redistributable. See this thread about the licensing problem with IPP 5.0beta. For Linux, IPP 4.1 is free for non-commercial use and the binary is freely redistributable. Luckiest in the world. WMD = Weapon of Mass Distraction. |
Ned Slider (Joined: 12 Oct 01; Posts: 668; Credit: 4,375,315; RAC: 0) |
Great job Maverick :) You've almost made the P4 look like a respectable processor!! :D I think we have the Athlon64 issues solved now, so we should have an AMD64 Linux client up by the end of the week too. Ned *** My Guide to Compiling Optimised BOINC and SETI Clients *** *** Download Optimised BOINC and SETI Clients for Linux Here *** |
Tetsuji Maverick Rai (Joined: 25 Apr 99; Posts: 518; Credit: 90,863; RAC: 0) |
"Great job Maverick :)" Thanks!! But it's not me but Intel that makes the P4 respectable :) Intel gives nice support even to customers with free non-commercial licenses like me ;) I have a single license for the Intel compiler for Windows (thanks to Mr.X), but I've never got support on it; I always get support for the free products :) SIMD is a very nice set of instructions. This time I've learned about it and found it very useful, and I guess it is very difficult for gcc/g++ to take full advantage of it. So Intel, as a processor company, can make a compiler that is very good at vectorization using SIMD. But in some cases assembler is better; I put hand-written assembly code in analyzeFuncs.cpp to make it faster. If AMD made a compiler, it might be a nice one that utilizes SSE/3DNow! and so on. Luckiest in the world. WMD = Weapon of Mass Distraction. |
Ned Slider (Joined: 12 Oct 01; Posts: 668; Credit: 4,375,315; RAC: 0) |
Yes - my point was that without optimized clients, the P4 performs like a dog at seti given its clock speeds :) A Pentium M or Athlon XP gave far more bang for your buck for the prospective SETI'er, but the optimized clients have made the P4 look far more respectable than it did before. You made the clients, so you get the credit!!! (you're far too modest because I actually know what you did with IPP!!) Anyhow - great work!! Ned *** My Guide to Compiling Optimised BOINC and SETI Clients *** *** Download Optimised BOINC and SETI Clients for Linux Here *** |
Tetsuji Maverick Rai (Joined: 25 Apr 99; Posts: 518; Credit: 90,863; RAC: 0) |
"Yes - my point was that without optimized clients, the P4 performs like a dog at seti given its clock speeds :)" I got what you meant. Thanks! (but again Intel did it ;) ) Now I'm downloading the VTune performance analyzer from Intel... for free for Linux again. Intel says it finds the bottleneck(s) of an application. BTW I'd never thought the Pentium M was so fast until I moved from seti-classic to BOINC in April. EDIT: I took a look at the AMD homepage just now, but I didn't find any software development tools... hmmmm, that's a weak point of AMD... inline assembly with SIMD may help... but it will be tough. Luckiest in the world. WMD = Weapon of Mass Distraction. |
(Joined: 18 Mar 04; Posts: 1547; Credit: 760,577; RAC: 0) |
|
ampoliros (Joined: 24 Sep 99; Posts: 152; Credit: 3,542,579; RAC: 5) |
I am such a complete doof... Downloaded your new binaries, tried to run them but could not: error, error, error. If I had just read your post carefully: "...these versions are linked with version 4.1." Anyway, I am going to install IPP from Intel and give it another try. I'll let you know if I've gotten any smarter. 7,049 S@H Classic Credits |
Tetsuji Maverick Rai (Joined: 25 Apr 99; Posts: 518; Credit: 90,863; RAC: 0) |
"I am such a complete doof..." You don't need IPP yourself. It's already been linked statically, though the standard glibc is linked dynamically. First, make a test directory and put in it the seti client and a workunit renamed to "work_unit.sah" (the standard reference workunit is available here: http://www.geocities.jp/badtrans666/boinc/referenceunits.zip, along with the result.sah). Then run the client with, for example, "nice -n 19 ./seti*" and see what happens. You don't have to run it until it finishes. Just watch it for 1 or 2 minutes. If you get errors, please post the error message along with the content of stderr.txt. If you don't get errors, then the client is working on your machine. Luckiest in the world. WMD = Weapon of Mass Distraction. |
Tetsuji Maverick Rai (Joined: 25 Apr 99; Posts: 518; Credit: 90,863; RAC: 0) |
|
(Joined: 18 Mar 04; Posts: 1547; Credit: 760,577; RAC: 0) |
|
Tetsuji Maverick Rai (Joined: 25 Apr 99; Posts: 518; Credit: 90,863; RAC: 0) |
"Hey Tetsuji....have you produced a PIII no sse variant yet?" here hehe.. But the sse/mmx versions are just byproducts. hehehe... I found a nice thing. This cruncher is faster than my Windows fftw version; look at this page (benchmark test, P4 3.2G in the lower part). My Windows 4.11 took 4927 seconds, while my new Linux IPP client took only 4603 seconds on the same machine, as I wrote in the first message in this thread :) And on top of that, this client is completely free!! Luckiest in the world. WMD = Weapon of Mass Distraction. |
ampoliros (Joined: 24 Sep 99; Posts: 152; Credit: 3,542,579; RAC: 5) |
I could have sworn that glibc had been installed on that computer; it's not like the seti app would be the only thing that requires it. I went back home last night and installed IPP on another computer before I read your post. Your client ran, so I set the clients to run the reference unit through cron (I estimated completion times) to test all three clients (standard, Metod, Tetsuji). It wasn't until after doing that and coming to work that I read your post, so I couldn't check whether it had glibc installed, but I would assume it does. Preliminary readings looked very good and I will post the results for that PC when I get home. As for the other computer, I haven't figured out why your client did not run, but I will try again and post any errors that I get. 7,049 S@H Classic Credits |
(Joined: 18 Mar 04; Posts: 1547; Credit: 760,577; RAC: 0) |
Hi Tetsuji. I have to say this has transformed my Linux box... from a dog to a tiger! It has knocked 50-60 minutes off the Metod client I was using. I have looked at 15 results now and it's consistent! First class! The results are being accepted too, although I only got about 3/4 in so far as the servers are a bit flaky right now. Great piece of work... and the little sig in the stderr !!!!!! hehe Regards Ian Edit - no, I got 7 valid results with a few pending. |
ampoliros (Joined: 24 Sep 99; Posts: 152; Credit: 3,542,579; RAC: 5) |
OK, I ran the standard client (setiathome_4.02_i686-pc-linux-gnu) vs Metod's client (setiathome-4.07-fftw.i686-p4-linux-gnu) vs Tetsuji's new client (setiathome-4.7.i686-pc-linux-gnu, SSE2) and here's what I got. The results from both Metod's client and Tetsuji's client were found to be valid, and Tetsuji's came out as the fastest. Test machine: Pentium 4 Willamette, 2.0GHz, 256K L2, 100MHz FSB, 768MB PC800 RAM, 2.6.8 kernel. Standard: 17,121 sec. Metod: 9,944 sec (-42%). Tetsuji: 7,341 sec (-57%). I'll set my clients to get no new work and then switch to the new client when they are empty (I would just edit the app_info.xml to add another listing, but I'm not sure if you can have two listings of the same version.) 7,049 S@H Classic Credits |
ampoliros (Joined: 24 Sep 99; Posts: 152; Credit: 3,542,579; RAC: 5) |
Oops, picked up the xml tags... Let's try again. Anyway, what I would like to do is use an app_info.xml file to transition the work units, but I don't know if I am allowed to have multiple versions. I'd like to get a second opinion on this before I or anyone else tries this thing. This one should theoretically move the standard client, all of the Metod clients and all new work units to Tetsuji's client. And now for my disclaimer: this app_info.xml is untested and more than likely won't work; I just want another opinion. `<app_info> <app> <name>setiathome</name> </app> <file_info> <name>setiathome-4.7.i686-pc-linux-gnu</name> <executable/> </file_info> <app_version> <app_name>setiathome</app_name> <version_num>407</version_num> <file_ref> <file_name>setiathome-4.7.i686-pc-linux-gnu</file_name> <main_program/> </file_ref> </app_version> <app> <name>setiathome</name> </app> <file_info> <name>setiathome-4.07-fftw.i686-p4-linux-gnu</name> <executable/> </file_info> <app_version> <app_name>setiathome</app_name> <version_num>407</version_num> <file_ref> <file_name>setiathome-4.7.i686-pc-linux-gnu</file_name> <main_program/> </file_ref> </app_version> <app> <name>setiathome</name> </app> <file_info> <name>setiathome-4.07-fftw.i686-p2-linux-gnu</name> <executable/> </file_info> <app_version> <app_name>setiathome</app_name> <version_num>407</version_num> <file_ref> <file_name>setiathome-4.7.i686-pc-linux-gnu</file_name> <main_program/> </file_ref> </app_version> <app> <name>setiathome</name> </app> <file_info> <name>setiathome-4.07-fftw.i686-pc-linux-gnu</name> <executable/> </file_info> <app_version> <app_name>setiathome</app_name> <version_num>407</version_num> <file_ref> <file_name>setiathome-4.7.i686-pc-linux-gnu</file_name> <main_program/> </file_ref> </app_version> <app> <name>setiathome</name> </app> <file_info> <name>setiathome_4.02_i686-pc-linux-gnu</name> <executable/> </file_info> <app_version> <app_name>setiathome</app_name> <version_num>402</version_num> <file_ref> <file_name>setiathome-4.7.i686-pc-linux-gnu</file_name> <main_program/> </file_ref> </app_version> </app_info>` Wow, that still looks bad. Well, if anyone wants the real one, send me an email: [my user id] at hotmail. 7,049 S@H Classic Credits |
ampoliros (Joined: 24 Sep 99; Posts: 152; Credit: 3,542,579; RAC: 5) |
OK, I decided to go ahead and plug things in without waiting for the cache to empty. I shut things down, and after a backup, I put in the new client and that mess of an app_info.xml file and everything worked. BOINC CC is happy, Tetsuji's seti app is happy and ps confirms the new app is running. Work units are flowing. 7,049 S@H Classic Credits |
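
Since the app_info.xml posted above is hard to read on one line, here is a minimal Python sketch that writes out the same kind of layout (an `app` block, a `file_info` block, and one `app_version` per version number) with indentation. The element names and the client file name are taken from the post; the helper `build_app_info` is hypothetical, and declaring the client only once in `file_info` is a simplifying assumption of this sketch, not what the posted file does (it declares the old client names too). Whether BOINC accepts this simplified form is untested here, so back up your directory before trying anything like it.

```python
import xml.etree.ElementTree as ET


def build_app_info(app_name, client_file, version_nums):
    """Hypothetical helper: map every listed version number to one client binary."""
    root = ET.Element("app_info")

    # <app><name>...</name></app>
    app = ET.SubElement(root, "app")
    ET.SubElement(app, "name").text = app_name

    # <file_info> declares the executable once (an assumption of this sketch).
    file_info = ET.SubElement(root, "file_info")
    ET.SubElement(file_info, "name").text = client_file
    ET.SubElement(file_info, "executable")

    # One <app_version> per version number, all pointing at the same binary.
    for num in version_nums:
        app_version = ET.SubElement(root, "app_version")
        ET.SubElement(app_version, "app_name").text = app_name
        ET.SubElement(app_version, "version_num").text = str(num)
        file_ref = ET.SubElement(app_version, "file_ref")
        ET.SubElement(file_ref, "file_name").text = client_file
        ET.SubElement(file_ref, "main_program")
    return root


root = build_app_info("setiathome", "setiathome-4.7.i686-pc-linux-gnu", [402, 407])
ET.indent(root)  # pretty-print in place; needs Python 3.9 or newer
xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)
```

The version numbers 402 and 407 are the ones that appear in the posted file; add or remove entries in the list to match the work units you actually have cached.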