Contributing code? Amd64 build for Windows
Author | Message |
---|---|
Paul D. Buck Send message Joined: 19 Jul 00 Posts: 3898 Credit: 1,158,042 RAC: 0 |
> Paul, > It is our team's intention to make this applicable to all platforms. I > assume (but can't speak for Ben) that Ben's SSE optimizations also > translate to Altivec instructions for Macs and he's also intending (if I > remember earlier posts correctly) other CPU support. Cool! > As for the number of significant digits, I find that 12-13 is as far as we > can really push it. Given Ben's statements about sampling and BioScience > (P@H) starting with high-def data, I think we're in good shape if we can pull > in 32 bit for now, operate in 64 or 80 bit resolution (using 128 bit to store > the 80 bit interim results), and then output 32 bit again with significantly > reduced roundoff error(s). This is where I believe the science (as a whole) > takes the greatest leap forward. Yeah, that is about what I would expect. I am hoping to move into the faster machines too ... :) > If Team MacNN has optimizations for Altivec, etc... can we somehow find a > way to get all this integrated? We now have Ben's work, our work, > Francophone's work, and MacNN's work.... and I am sure COUNTLESS others > working on the same thing. How do we pull this together for the good of the > science as a whole??? I am open to direct email (you have my address), so > please... let's make this happen. I just looked and cannot find the end of the string. I can find the download pages where they have the compiles available, but there is nothing tying them to a person. To be honest, I would have joined the team, but I cannot make heads nor tails of the web site ... > We are about to start final testing on Win32/Win64/Linux32/Linux64... cpus > Pentium, P2, P3, P4, and all the AMDs available, in all possible > configurations, because they are simply the most readily available to > us from all the volunteers of the team. If we have some Macs, then it will be > tested there, but I don't know of any as of yet. Well, I have one ... but it is my main work platform, so I am not sure if I can unequivocally jump in as a volunteer ... especially as I am ALREADY way behind on updates to my site... > Our goal when we started this was to bring everything up to 64 bit. I > personally am working on getting a couple SGIs and SUNs loaned to me as well > just to ensure that big and little endian machines all behave the same. These > will be added insurance for portability and adherence to the Cobblestone > model. > > I personally would like us (TPR) to present to SSL software that is solid > and easily integratable into the main stream (integrated with Ben's if > possible), tested to their standards, and then released per their license and > GPL requirements. It makes the most sense to let BOINC-Dev be the focal > point for final integration and release to the general public through their > standard channels. They are after all fundamentally responsible for the > project. I did suggest in the beta that this was something that should be done. Along the same lines as the "related" sites and Alpha/Beta testers, UCB would host the binaries. So, we would have the code updates being baselined, and selected volunteers would then compile the system into optimized binaries. The second alternative is to have this as a hosted feature, like what is done in the BOINC download network, though I have not visited them and don't know what they have available. I mean, I would not mind making a compile, but I do not have the time to debug the scripts. And my rumors say that it is not necessarily a slam dunk for the Macintosh code at this time. So, I don't have good answers for you... just more questions ... I was hoping the rabid (I am saying it with a smile) speed demons would have already started doing this publicly. I mean, it might be going on, and I am just not aware of it (no surprise there). > Does this answer your questions? Yes :) ... warm fuzzies all over! |
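The quoted idea above (read 32-bit data, do the arithmetic at 64 or 80 bits, write 32-bit results) can be illustrated with a small sketch. This is not taken from the SETI@home source; the helper names `to_f32`, `sum_f32` and `sum_f64_then_f32` are made up for the illustration, and Python's built-in float stands in for the 64-bit accumulator.

```python
import struct

def to_f32(x):
    """Round a Python float (64-bit double) to the nearest 32-bit float value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def sum_f32(values):
    """Accumulate in 32-bit precision: the running sum is rounded after every add."""
    acc = 0.0
    for v in values:
        acc = to_f32(acc + to_f32(v))
    return acc

def sum_f64_then_f32(values):
    """Accumulate in 64-bit precision and round to 32 bits once, at the end."""
    acc = 0.0
    for v in values:
        acc += to_f32(v)
    return to_f32(acc)

# 32-bit input data: the same sample repeated many times.
values = [0.1] * 100_000
reference = to_f32(0.1) * 100_000          # what the sum should be

err32 = abs(sum_f32(values) - reference)
err64 = abs(sum_f64_then_f32(values) - reference)
print("error with 32-bit accumulator:", err32)
print("error with 64-bit accumulator:", err64)
```

The inputs and the output are 32-bit in both cases; only the width of the intermediate sum differs, which is where the reduced roundoff comes from.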
slavko.sk Send message Joined: 27 Jun 00 Posts: 346 Credit: 417,028 RAC: 0 |
> PS: Yes, it's 5am... I had an idea on the benchmarks and it was easier to > code than write it down, and it does fix an issue I had with register > allocations on the different processor families... :) Whoaw! Chuck, if you need a tester (me) for Win64/AMD64, keep me posted, drop a message ... slavko@slavko.sk. S@h Berkeley's Staff Friends Club © member |
WildWeasel Send message Joined: 2 Jun 99 Posts: 5 Credit: 485,315 RAC: 0 |
Chuck et al, While you're doing this magnificent work, are you comparing the run times on code compiled by different compilers? I saw the Intel compiler mentioned together with the MS one... The Weasel |
Chuck Lasher Send message Joined: 21 Aug 03 Posts: 37 Credit: 3,511 RAC: 0 |
> Chuck et al, > > While you're doing this magnificent work, are you comparing the run times on > code compiled by different compilers? > > I saw the Intel compiler mentioned together with the MS one... > > The Weasel Yes, and I am even looking at the generated code and comparing as much as possible... but I question whether that is a rather moot point now or not. There have been changes submitted to Boinc-dev and implemented into 4.09 (or 4.10) now. Until our team discusses all this, we won't be making any further posts. Chuck Team Phoenix Rising. |
Benher Send message Joined: 25 Jul 99 Posts: 517 Credit: 465,152 RAC: 0 |
Ok updates. There is now an official optimization group mailing list from boinc (seti). Go to the home page for the link. If you are a programmer, I think this is where you want to be. --- Verification: Write the code - Compare the output to original routine's output. Determine margin of error. Verifying a compiled client's output. 1. Put the original seti client .exe in a directory with a given WU. 2. Rename the wu to "work_unit.sah". 3. Start the original client. 4. Let it finish. Now there is a new file in the folder called "result.sah". 5. Copy this file away and save it. 6. Put new client in same directory. 7. Erase any "result.sah", "stderr.txt" and "state.sah" files. 8. Run new client. 9. Let it finish. 10. Use a file compare or 'diff' program to compare the new output to the original. ----- 64 bit code. The 64 bits applies to integers, as in standard registers. The rest of the code around the floating point parts will go somewhat quicker. When working with floats you still have the FPU's 32 bit single, 64 bit double stored in 80 bit internal floating registers. When working with SIMD, you have 3DNow and SSE for 32 bit floats, SSE2 and SSE3 for 64 bit floats. Altivec is 32 bit float SIMD. --- The conversions I've performed in order to make the SSE code work will be of benefit to Altivec and other SIMD programmers, but I haven't written Altivec routines yet (nor do I have access to a Mac for testing). Brad Anderson (Mr. Anderson to you all ;) has written Altivec routines for the most active floating point routines in seti_boinc. He has compiled the application for 3 flavors of Mac and released them on his website. This is covered in another thread here in "crunch" started by him. |
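Step 10 and the "determine margin of error" remark can be sketched as a numeric comparison rather than a byte-for-byte diff. The sketch below makes no claim about the real layout of "result.sah": it treats both results as plain text, requires the non-numeric text to match exactly, and reports the largest relative difference between corresponding numbers. The tag names in the sample strings and the tolerance of 1e-5 are invented for the illustration.

```python
import math
import re

# Matches integers, decimals and exponent notation such as 1420.001 or 3e-5.
NUMBER = re.compile(r"[-+]?(?:\d+\.\d*|\.\d+|\d+)(?:[eE][-+]?\d+)?")

def split_tokens(text):
    """Return (numbers, skeleton): the numeric values and the text with numbers masked."""
    numbers = [float(tok) for tok in NUMBER.findall(text)]
    skeleton = NUMBER.sub("#", text)
    return numbers, skeleton

def compare_results(reference, candidate, rel_tol=1e-5):
    """Return (ok, worst_relative_error) for two result texts."""
    ref_nums, ref_skel = split_tokens(reference)
    new_nums, new_skel = split_tokens(candidate)
    if ref_skel != new_skel:
        return False, math.inf          # different signals or different structure
    worst = 0.0
    for a, b in zip(ref_nums, new_nums):
        scale = max(abs(a), abs(b))
        if scale > 0.0:
            worst = max(worst, abs(a - b) / scale)
    return worst <= rel_tol, worst

# Hypothetical excerpts standing in for the saved and the new result file.
original = "<spike>\n <power>24.5031</power>\n <freq>1420.001</freq>\n</spike>\n"
rebuilt = "<spike>\n <power>24.5032</power>\n <freq>1420.001</freq>\n</spike>\n"

ok, worst = compare_results(original, rebuilt)
print("match within tolerance:", ok, "worst relative error:", worst)
```

In practice the two strings would be read from the saved copy of the original result and from the new client's output; an acceptable tolerance is a project decision, not something this sketch can supply.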
Chuck Lasher Send message Joined: 21 Aug 03 Posts: 37 Credit: 3,511 RAC: 0 |
> Ok updates. > > There is now an official optimization group mailing list from boinc (seti). > Go to the home page for the link. > > If you are a programmer, I think this is where you want to be. > --- > Verification: Write the code - Compare the output to original routine's > output. Determine margin of error. > > Verifying a compiled client's output. > 1. Put the original seti client .exe in a directory with a given WU. > 2. Rename the wu to "work_unit.sah". > 3. Start the original client. > 4. Let it finish. Now there is a new file in the folder called "result.sah". > 5. Copy this file away and save it. > 6. Put new client in same directory. > 7. Erase any "result.sah", "stderr.txt" and "state.sah" files. > 8. Run new client. > 9. Let it finish. > 10. Use a file compare or 'diff' program to compare the new output to the > original. > > ----- > > 64 bit code. The 64 bits applies to integers, as in standard registers. > The rest of the code around the floating point parts will go somewhat > quicker. > > When working with floats you still have the FPU's 32 bit single, 64 bit double > stored in 80 bit internal floating registers. > > When working with SIMD, you have 3DNow and SSE for 32 bit floats, SSE2 and > SSE3 for 64 bit floats. > > Altivec is 32 bit float SIMD. > --- > The conversions I've performed in order to make the SSE code work will be of > benefit to Altivec and other SIMD programmers, but I haven't written Altivec > routines yet (nor do I have access to a Mac for testing). > > Brad Anderson (Mr. Anderson to you all ;) has written Altivec routines for > the most active floating point routines in seti_boinc. He has compiled the > application for 3 flavors of Mac and released them on his website. > > This is covered in another thread here in "crunch" started by him. Thank you for the input. I shall relay the information to the team, Mr Herndon. :) Sincerely, Chuck Lasher |
audiforum.nl Send message Joined: 3 Dec 03 Posts: 7 Credit: 427,391 RAC: 0 |
Any news? My 64-bit machine is begging me to get a 64-bit client of boinc ;) |
sniperbait Send message Joined: 15 Feb 04 Posts: 67 Credit: 56,828 RAC: 0 |
mine too |
Divide Overflow Send message Joined: 3 Apr 99 Posts: 365 Credit: 131,684 RAC: 0 |
I'd love to see a 64 bit Seti@Home application written that also supports the free AMD Core Math Library. http://www.amd.com/us-en/Processors/DevelopWithAMD/0,,30_2252_869_2282,00.html |
Benher Send message Joined: 25 Jul 99 Posts: 517 Credit: 465,152 RAC: 0 |
I've looked a little more into AMD 64bit since my last entry. There is one other major advantage to AMD 64bit. Register counts, Intel/Athlon XP vs. 64Bit: Integer Registers: 7 + stack vs. 15 + stack; Floating Stack regs: 8 vs. 8 shared w/MMX; SSE regs: 8 vs. 16. So with 8 more regular integer registers, a good compiler could avoid several register to memory saves/loads, and thus need fewer instructions. Here is a good overview: http://www.cpuid.com/K8/index.php |
Chuck Lasher Send message Joined: 21 Aug 03 Posts: 37 Credit: 3,511 RAC: 0 |
Ben, If I may add to that.. another advantage of the FX class (Sledgehammer and above cores, possibly more as processors are announced) is the FXSAVE (?) instruction. It is a special register save instruction for doing high speed context switches. I can see it being used really nicely. I can also see the use of the SSE regs for non-SSE math by simply writing macros which do __ASM directives.... That link is a good one.... I've pointed it out to a few people myself. Every time someone asks what the difference between the AMD64 and P4 is... I send them there. I am curious if Intel has formally announced what is going to be in the P5 and P6 architectures or if those were originally slated to be the 'now-sidelined' 32 bit cores. Any idea? I ask because I am putting together a table of processor capabilities, based on George Woltman's and Jean Penne's code for primes, that tailors runtime ops based on processor features ... specifically CMOV, SSE, SSE2.. Those make a big difference in FFTs, which I believe would help seti a great deal if done properly. The nice part is that it would be Intel/AMD independent detection and feature management.... totally transparent to the user. Just a few runtime 'if()'s in the code at critical places. I've been working on the 64-bit implementation of prime factoring. The math is nice and clean and gives a good opportunity to learn how to deal with a huge variety of FFT array sizes for all processors supported by BOINC/Seti. I hope to bring this experience back to the project when complete. I also hope to see it put to use in the proteins & cpdn work (all fortran). Both those projects would benefit hugely from 64 bit math and fast 64 bit based FFT work. The error factor cuts way down and performance comes way up. Both are things that we need to start preparing for as Intel gets closer (hopefully soon) to announcing & rolling out its 64 bit processors to the user community. I think we should be ready for it so AMD and Intel users can really push all boinc projects forward. Your thoughts? Chuck |
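The "few runtime 'if()'s" idea can be sketched as a feature table plus one selection step at start-up. This is only an illustration of the dispatch pattern: it assumes the feature flags arrive as text in the style of Linux's /proc/cpuinfo, and the three `fft_*` routines are hypothetical placeholders that only report which path was taken.

```python
def parse_flags(cpuinfo_text):
    """Collect the CPU feature flags from /proc/cpuinfo-style text."""
    flags = set()
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            flags.update(value.split())
    return flags

# Hypothetical stand-ins for real FFT routines; each only names its code path.
def fft_generic(samples):
    return ("generic", len(samples))

def fft_sse(samples):
    return ("sse", len(samples))

def fft_sse2(samples):
    return ("sse2", len(samples))

# Most capable variant first; the first supported one wins.
PREFERENCE = [("sse2", fft_sse2), ("sse", fft_sse)]

def select_fft(flags):
    """Pick the best routine once, so the hot loop carries no feature checks."""
    for feature, routine in PREFERENCE:
        if feature in flags:
            return routine
    return fft_generic

# Made-up sample text; on Linux this could come from open("/proc/cpuinfo").read().
sample_cpuinfo = "processor\t: 0\nmodel name\t: hypothetical CPU\nflags\t\t: fpu cmov mmx sse sse2\n"
chosen = select_fft(parse_flags(sample_cpuinfo))
print(chosen([0.0] * 8))
```

Because the table is keyed on feature names rather than vendor, the same selection works for Intel and AMD parts, which is the vendor-independent behaviour described above.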
Benher Send message Joined: 25 Jul 99 Posts: 517 Credit: 465,152 RAC: 0 |
Chuck, Join the boinc_opt mailing list. It sounds right up your alley. Eric Korpela (main programmer for SETI@home) is working on some things which should eventually make seti (at least) quite a bit faster for many platforms. (Not related to the 4.05 issue, although he is aware of that.) Personally I have written SSE and 3DNow versions of the Ooura FFT code and several other routines in seti, others have implemented seti with the FFTW code, and others still have written Altivec. Eric has access to all of these sources. Until then I suggest you look into FFTW and the history of emails in the boinc_opt mailing list. Look for it on this page: http://boinc.berkeley.edu/community.php Regarding ALL of the other boinc projects... Their worker source is currently not public, and I've read nothing about any intentions to make it so. I know that CPDN at least has proprietary info in their source and is very unlikely to release it. So I don't know if they use FFT or what math functions they use. They might use optimized libraries (i.e. Intel SSE, AMD 3DNow, Apple Altivec), or might not. I've seen no mentions of it. |
Chuck Lasher Send message Joined: 21 Aug 03 Posts: 37 Credit: 3,511 RAC: 0 |
Ben, You have my address, etc. on the boinc_opt list.... would you please email me privately so we can talk about this when you have time? I have some NASTY 386 dedicated assembler that will not handle caches greater than 8k (P4), nor will it handle FFTs greater than 131072. Also, being "Masm", it won't slide into GCC very well, nor use the extended FFT or FPU capabilities of the FX class machines (which AMD has and Intel is about to have). What I need is to replace all the FFT ASM (currently done LONGHAND in C or 386 ASM) with some good code like your SIMD class and finish getting automake set up (almost done there). It would make bringing the Altivecs into the projects a SNAP... and we both know what they will do! May we discuss and share ideas and code? I have a few things I am fixing for primes that Seti can use, and you are the man to implement it. I will look at the FFT code you've referenced again.. Ooura sounds very familiar. I've seen so many 'brute force FPU' FFTs recently for Linux, etc. that I am overwhelmed and don't remember actual equations from memory. Regarding CPDN... they are proprietary, which I do respect. I also sense in the tone of email text that the model is fragile and old.. and also respect that only a few touch it. Predictor is part public, part proprietary (NDA is all that's required). It will benefit greatly from FFT and 64 bit. Let's chat offline and see where we can help each other. I'll show you some great ones I've got already for primes that are about to hit suse 9.1 64 bit linux (as brute force in rev 1) and some wild little macros that cut out a great deal of FFT ASM. ALL of which I'm sure you've done or seen before. As of tonight, I am switching this FX-53+ to dual boot (but staying mostly in Suse 9.1 -64) and leaving the A-64 754 pin machine in win/32. I do have Suse 9.1 Pro (32 and 64 bit versions on flip/flop dvd). Write direct when you can (Rom can give you my email if need be... permission granted as required). Chuck |
Hans Dorn Send message Joined: 3 Apr 99 Posts: 2262 Credit: 26,448,570 RAC: 0 |
> > Personally I have written SSE and 3DNow versions of the Ooura FFT code and > several other routines in seti, others have implemented seti with the FFTW > code, and others still have written Altivec. Eric has access to all of > these sources. > > Until then I suggest you look into FFTW and the history of emails in boinc_opt > mailing list. > Hi Ben! I hope you don't mind if I jump in here. I've created a seti client that makes use of fftw3f instead of ooura, but I didn't have any luck with it. When I run it against the test WU, the found spikes look reasonable, but it detects different gaussians and fails... Does that sound reasonable, or should I look for bugs in my code? P.S: I've also played around with SSE and created an ooura version that does 4 FFTs in a row. It's reasonably faster than the original one (50%), but all the required shifting around of the data eats up the saved time, and I end up with exactly the same performance as with the standard fft... Could you give me some hints about what your SSE version looks like? Regards Hans |
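The "shifting around of the data" mentioned above is typically an interleave step: to run four transforms at once, sample i of each of the four inputs has to sit side by side so one 4-wide SIMD register can hold them. The sketch below only shows that repacking and its inverse in plain Python; it is a guess at the general layout, not Hans's or Ben's actual code, and the function names are invented.

```python
def interleave(signals):
    """Pack equal-length signals so sample i of every signal is adjacent in memory."""
    length = len(signals[0])
    if any(len(s) != length for s in signals):
        raise ValueError("all signals must have the same length")
    return [sample for group in zip(*signals) for sample in group]

def deinterleave(packed, lanes=4):
    """Undo interleave(): lane k holds every lanes-th element starting at k."""
    return [packed[k::lanes] for k in range(lanes)]

# Four short made-up inputs standing in for four FFT input buffers.
signals = [[10, 11, 12], [20, 21, 22], [30, 31, 32], [40, 41, 42]]
packed = interleave(signals)
restored = deinterleave(packed, lanes=4)
print(packed)
```

Both passes touch every sample once before and once after the transform, which is the kind of overhead that can cancel the gain from the vectorized arithmetic if the data is not kept in the packed layout between stages.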
Sir Ulli Send message Joined: 21 Oct 99 Posts: 2246 Credit: 6,136,250 RAC: 0 |
I don't know if this was posted before, but there is a site with several optimized clients for Linux; not tested yet. http://boinc.us.tt/ Greetings from Germany NRW Ulli S@h Berkeley's Staff Friends Club m7 © |
Hans Dorn Send message Joined: 3 Apr 99 Posts: 2262 Credit: 26,448,570 RAC: 0 |
> I don't know if this was posted before, Yep, several times ;-) > but there is a site with several > optimized clients for Linux; not tested yet > > http://boinc.us.tt/ > Thanks, but I'm rather looking for sources for the seti client, not the boinc client. Regards Hans |
Sir Ulli Send message Joined: 21 Oct 99 Posts: 2246 Credit: 6,136,250 RAC: 0 |
Also a good and interesting site: http://www.ssl.berkeley.edu/pipermail/boinc_opt/2004-October/date.html Original site: http://www.ssl.berkeley.edu/mailman/listinfo/boinc_opt Sorry for the double posting, but I have problems ... the site was not responding... Greetings from Germany NRW Ulli S@h Berkeley's Staff Friends Club m7 © |