Contributing code? Amd64 build for Windows
Author | Message |
---|---|
Joined: 21 Oct 99 Posts: 2246 Credit: 6,136,250 RAC: 0 |
> Hi all, > > I have compiled versions of seti and seti_boinc for Linux and Visual C++. > > There are bugs in the benchmark code. I have forwarded the changes to the devel community and they are being added, though probably not released to the public for a bit yet. > > The bugs are over-optimization by VC and other compilers. The optimization process had been omitting code in the benchmark because the results of those portions of the benchmark were not used elsewhere (the compiler says: don't use, don't bother to compute). Note: I tried Intel C++ 8.0 and it figured out that almost none of the benchmark results were used. The result was 7x higher benchmark results. > > So, if you are using a newer VC, your results might look great but the actual crunching code might remain the same speed. > > Bottom line, the benchmark wasn't doing stuff it should have been. Now that it will be doing the stuff, the numbers will be lower. > > In addition I have also profiled the code to find the most used functions. Mostly they are the Fast Fourier Transform routines. I have rewritten them in SSE and am now working on 3DNow!. My code determines CPU type and capabilities at startup and uses the appropriate SIMD subroutines. Is anyone out there good with Altivec and CPU identification on Mac/PowerPC? > > =Ben > > Hi Ben, do you have any finished files ready for download so we can test this? That would be great. If you don't have the web space for this, look at my sig: you can send the files to me by e-mail and I will host them. Greetings from Germany (NRW), Ulli S@h Berkeley's Staff Friends Club m7 © |
sniperbait Joined: 15 Feb 04 Posts: 67 Credit: 56,828 RAC: 0 |
Sorry, I know it's off topic, but @Chuck: it's the NIC drivers that don't work. Everything is fine with 32-bit Win XP, just not 64-bit. Windows just doesn't see the CNet card, and as for the 3Com card, none of the drivers I found for Win64 work. BTW I'm on an nr041 router and an SB4100 cable modem :) http://usa.duane-n-lisa.net |
Joined: 1 Jun 99 Posts: 6 Credit: 1,482,176 RAC: 0 |
> Seems similar to my 1st try. Try this manual; I did it and it started to look much better. I'm still not able to compile it, but the development environment works. In your case it looks like it doesn't work. You have a problem with the environment variables PATH, LIB and INCLUDE, I think. I hope that this article at planetamd64.com will help you. > One more hint: as it is written in the article, add the paths mentioned there to the INCLUDE env. variable. I had to do it ("PlatformSDK include followed by the ones from visual studio"). > And at the end: did you install the Platform SDK on 64-bit Win XP or 32? I'm not able to install it on the 64-bit version, so I have to develop under the 32-bit version and then reboot to 64-bit and give it a try, which is very bad. > Wish you luck. I forgot to mention that I'm doing all this under Win32. Win64 is installed on another partition. From the PDF that I read on AMD's site, that is actually the preferred development environment (ironically, if you ask me). Maybe the build tools that run under XP & Win2K3 x64 aren't as stable. It's too nice outside to work on this now though ;-) traviblog|64|seti@diesel |
Joined: 25 Jul 99 Posts: 517 Credit: 465,152 RAC: 0 |
Hey all, Another item to consider... Faster computation doesn't equal more credit as things are now. Right now the credit you claim for a WU is * (algorithm is somewhat more complex). So for science faster = better, but for credit it's a no go. Also, your claimed credit for that WU will be compared to 2 other crunchers' claimed credits... and the lowest value will be granted to all 3 users. =Ben |
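A minimal sketch of the comparison step Ben describes, assuming only what the post states: three crunchers each claim credit for the same WU, and the lowest claim is granted to all three. The function name `granted_credit` and the sample numbers are hypothetical, and the claim formula itself is not modeled because the post does not spell it out.

```python
def granted_credit(claims):
    """Return the credit granted to every cruncher of one WU.

    Assumption taken from the post: the lowest of the three
    claimed credits is granted to all three users.
    """
    if len(claims) != 3:
        raise ValueError("expected exactly three claimed credits")
    return min(claims)


# Hypothetical claims from three hosts for the same WU.
claims = [28.4, 31.9, 35.2]
print(granted_credit(claims))
```

Under this rule a host's own claim only matters when it is the lowest of the three, which is consistent with the point that faster crunching helps the science rather than the credit.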
Joined: 21 Aug 03 Posts: 37 Credit: 3,511 RAC: 0 |
> Hey all, > > Another item to consider... > > Faster computation doesn't equal more credit as things are now. > > Right now the credit you claim for a WU is * (algorithm is somewhat more complex). > > So for science faster = better, but for credit it's a no go. > > Also, your claimed credit for that WU will be compared to 2 other crunchers' claimed credits... and the lowest value will be granted to all 3 users. > > =Ben > > Ben, I do not speak for all of us, but for those that I am permitted to speak for... the purpose of going to 64 bit is to improve both the science and the performance. We don't expect more credit. We do expect the science to be of higher quality now, because we have reduced the rounding errors and given our scientists more finished / processed data to work with. I honestly expect credit to be the same... or, more correctly stated, to be properly calculated (which has been a problem since the beginning). Now we can be fair, fast, and accurate. What more can we ask for? True? Chuck |
Joined: 27 Jun 00 Posts: 346 Credit: 417,028 RAC: 0 |
I don't care very much about credit. I think that the 64-bit platform is the future, and that is more important than credit. Science comes first, that's clear. A guy from M$, when I asked whether a 64-bit compiler exists, told me that 64-bit is not a "miraculous word" and that I may not expect huge performance. So I asked him why anybody is working on the 64-bit platform then, and why we are not on 8-bit computers now. He wished me a nice weekend in his short answer :o)))) > Hey all, > > Another item to consider... > > Faster computation doesn't equal more credit as things are now. > > Right now the credit you claim for a WU is S@h Berkeley's Staff Friends Club © member |
Joined: 27 Jun 00 Posts: 346 Credit: 417,028 RAC: 0 |
OK, so the impossibility of installing the Platform SDK on Win x64 is "normal". Did you solve that "error spawning cl.exe" problem? > I forgot to mention that I'm doing all this under Win32. Win64 is installed on another partition. From the PDF that I read on AMD's site, that is actually the preferred development environment (ironically, if you ask me). Maybe the build tools that run under XP & Win2K3 x64 aren't as stable. S@h Berkeley's Staff Friends Club © member |
Joined: 19 Jul 00 Posts: 3898 Credit: 1,158,042 RAC: 0 |
Chuck, > I do not speak for all of us, but for those that I am permitted to speak for... the purpose of going to 64 bit is to improve both the science and the performance. We don't expect more credit. We do expect the science to be of higher quality now, because we have reduced the rounding errors and given our scientists more finished / processed data to work with. > > I honestly expect credit to be the same... or, more correctly stated, to be properly calculated (which has been a problem since the beginning). Now we can be fair, fast, and accurate. What more can we ask for? True? A super-optimized Macintosh version with a GUI... Well, you asked what more! |
jstelly Joined: 3 Apr 99 Posts: 5 Credit: 0 RAC: 0 |
Well, I haven't been able to get back to look at this. I'm assuming the servers have stabilized, but the client I built was working for me at least a few times after I posted this. I'm getting a "GUI RPC failed to initialize socket" message now from the boinc client that I wasn't seeing before. As was pointed out, the benchmark figures are clearly a case of the compiler optimizing out some of the computation entirely; it sounds like Ben is working on that. I don't think we'll see anything more than a 5% performance gain from just rebuilding the code as 64-bit on Windows. From looking at some profile output, the real potential for performance gains is the FFT code; at least that's where a large portion of the time is spent during execution. It needs to be optimized for SSE etc., and it's good to see Ben working on that, as that's not really my thing. |
Joined: 25 Jul 99 Posts: 517 Credit: 465,152 RAC: 0 |
Hi all, My SSE conversion of the FFT seems to be approx. 125% faster than the Ooura code. Still testing, to see whether it produces substantially identical results to the FPU code. Errors seem to be in about the 5th digit of the numbers. Almost all of the current seti calculations, including the buffer of data from the telescope, are currently in single-precision floating-point format (32 bits). We could use double precision, double the size of the buffers, and read the data from the WU file as doubles, but I don't know if that would generate more accurate results. As I recall reading, the data is gathered from the telescope in a 2-bits-per-sample format, so I don't know if the precision is there to enhance detection. Regarding AMD 64-bit: the AMD64 has some interesting features here for greater accuracy in computations. It has 16 SSE2 registers vs. 8 in the Intel P4 (each 128 bits long). It still has only 8 FPU registers (as in the original 80386, each is 80 bits long). |
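One way to put a number on "errors in about the 5th digit" when comparing two implementations is to turn the relative difference into a count of matching significant digits. This is only a sketch: `matching_digits` is a hypothetical helper, and the two sample values are invented for illustration, not output from the SSE or FPU code.

```python
import math


def matching_digits(reference, candidate):
    """Approximate number of leading significant decimal digits
    on which two results agree, based on their relative error."""
    if reference == candidate:
        return math.inf
    if reference == 0.0:
        return 0.0
    rel_err = abs(candidate - reference) / abs(reference)
    return max(0.0, -math.log10(rel_err))


# Invented values standing in for one output bin from each code path.
fpu_result = 1234.5678
sse_result = 1234.5912
print(round(matching_digits(fpu_result, sse_result), 1))
```

Running such a check over every output value and looking at the minimum would show whether the two code paths agree as well as single precision allows, or whether the difference is larger than rounding alone explains.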
Joined: 21 Aug 03 Posts: 37 Credit: 3,511 RAC: 0 |
> Hi all, > > My SSE conversion of the FFT seems to be approx. 125% faster than the Ooura code. Still testing, to see whether it produces substantially identical results to the FPU code. Errors seem to be in about the 5th digit of the numbers. > > Almost all of the current seti calculations, including the buffer of data from the telescope, are currently in single-precision floating-point format (32 bits). > > We could use double precision, double the size of the buffers, and read the data from the WU file as doubles, but I don't know if that would generate more accurate results. > > As I recall reading, the data is gathered from the telescope in a 2-bits-per-sample format, so I don't know if the precision is there to enhance detection. > > Regarding AMD 64-bit: the AMD64 has some interesting features here for greater accuracy in computations. > > It has 16 SSE2 registers vs. 8 in the Intel P4 (each 128 bits long). It still has only 8 FPU registers (as in the original 80386, each is 80 bits long). > > On your question about reading and operating in double precision: it would gain us a lot... remember, we lose bits with each calc. Single precision is good for only 6.8 digits of precision. Once you start operating on that, you start losing due to roundoff. The longer we can keep out the roundoff errors, the better. Have you seen the additional registers in the FX (Sledgehammer and above) class machines from AMD? They have more registers. Also, would you like some verification of operation on Linux? I have some BSD/SYSV runtime corrections for code that I think should be integrated, and I am interested in putting our works together. (I'm mostly working on CC and a benchmark that is balanced with integer, FP, direct & indirect addressing for a better 'cross-project' benchmark.) I think the 1988-based benchmarks have outlived their usefulness. So far, I've compiled on VS and GCC and gotten almost identical results. I am curious how it behaves on HT P4s. I've also put in a memory test (from code I wrote 3 years ago) that stresses and does a balanced intermix of byte, word, dword, and qword r/w. If you plot the results for each 'segment' (4k, 8k, 16k, ... 4M), you can see where a processor and its caches 'break', and where bandwidth flatlines at its max. How do we put this all together? Chuck |
Joined: 21 Aug 03 Posts: 37 Credit: 3,511 RAC: 0 |
> On your question about reading and operating in double precision: it would gain us a lot... remember, we lose bits with each calc. Single precision is good for only 6.8 digits of precision. Once you start operating on that, you start losing due to roundoff. The longer we can keep out the roundoff errors, the better. This is known from direct experience with orbital work I now do. |
Joined: 5 Aug 04 Posts: 6 Credit: 279,532 RAC: 0 |
Let's skip all the computer geek speak and get right down to it. When will the software be available for all 64-bit users? |
Joined: 21 Aug 03 Posts: 37 Credit: 3,511 RAC: 0 |
> Let's skip all the computer geek speak and get right down to it. When will the software be available for all 64-bit users? > > Please forgive my 'non-geek, standard answer'... "When we know it's ready... I've started testing my stuff on 32-bit and 64-bit machines. I'm sure Ben is testing his. The next step is to put it together and give it to the appropriate folks to test more, bless, and then distribute when they are happy with it." Chuck Team Phoenix Rising |
Joined: 5 Aug 04 Posts: 6 Credit: 279,532 RAC: 0 |
> > Let's skip all the computer geek speak and get right down to it. When will the software be available for all 64-bit users? > > Please forgive my 'non-geek, standard answer'... "When we know it's ready... I've started testing my stuff on 32-bit and 64-bit machines. I'm sure Ben is testing his. The next step is to put it together and give it to the appropriate folks to test more, bless, and then distribute when they are happy with it." > > Chuck > Team Phoenix Rising > > Thank you for your time and answer. I do and will look forward to the new software. |
Joined: 21 Aug 03 Posts: 37 Credit: 3,511 RAC: 0 |
> > > Let's skip all the computer geek speak and get right down to it. When will the software be available for all 64-bit users? > > Please forgive my 'non-geek, standard answer'... "When we know it's ready... I've started testing my stuff on 32-bit and 64-bit machines. I'm sure Ben is testing his. The next step is to put it together and give it to the appropriate folks to test more, bless, and then distribute when they are happy with it." > > Chuck > > Team Phoenix Rising > Thank you for your time and answer. I do and will look forward to the new software. > > We *ALL* anxiously await the formal OK to use it... a big step for science, performance, and fairness to all. I will do my best to see how quickly the appropriate folks (a few hours' drive from me if they ever need any help) can work it into their schedule. I'm sure both Ben's and my work will be welcome... It's been a nagging issue for them for a long time. I'm sure they will be happy to get this off their plate, as it will permit them to remove all those 'fudge factors' from the system (which you see as the differences between the estimated completion time and how long it actually takes you to run it). Chuck, Team Phoenix Rising |
Joined: 19 Jul 00 Posts: 3898 Credit: 1,158,042 RAC: 0 |
> Hi all, > > My SSE conversion of the FFT seems to be approx. 125% faster than the Ooura code. Still testing, to see whether it produces substantially identical results to the FPU code. Errors seem to be in about the 5th digit of the numbers. Based on my limited experience, that is in line with what I saw with iterative mathematics. If you want to get those digits firmed up, you will have to go to double precision. Double, in theory, is good to 16 digits, but you realize only about 10-12... Still, you have to ask the question, as you did, whether this is needed. |
Joined: 19 Jul 00 Posts: 3898 Credit: 1,158,042 RAC: 0 |
> > Let's skip all the computer geek speak and get right down to it. When will the software be available for all 64-bit users? > > Please forgive my 'non-geek, standard answer'... "When we know it's ready... I've started testing my stuff on 32-bit and 64-bit machines. I'm sure Ben is testing his. The next step is to put it together and give it to the appropriate folks to test more, bless, and then distribute when they are happy with it." Chuck, Is this going to be a Windows-only 64-bit version? If not, that is fine, but if it is, you should state that... I know that Team MacNN has optimized compiles for the Mac, though I have not used them yet because, where we are now, I have no way to really see whether they are working correctly... I know it is possible, but I am waiting until the GUI comes out and I can "watch"... |
Joined: 21 Aug 03 Posts: 37 Credit: 3,511 RAC: 0 |
> > > Let's skip all the computer geek speak and get right down to it. When will the software be available for all 64-bit users? > > Please forgive my 'non-geek, standard answer'... "When we know it's ready... I've started testing my stuff on 32-bit and 64-bit machines. I'm sure Ben is testing his. The next step is to put it together and give it to the appropriate folks to test more, bless, and then distribute when they are happy with it." > Chuck, > > Is this going to be a Windows-only 64-bit version? If not, that is fine, but if it is, you should state that... I know that Team MacNN has optimized compiles for the Mac, though I have not used them yet because, where we are now, I have no way to really see whether they are working correctly... I know it is possible, but I am waiting until the GUI comes out and I can "watch"... > > Paul, It is our team's intention to make this applicable to all platforms. I assume (but can't speak for Ben) that Ben's SSE optimizations also translate to Altivec instructions for Macs, and he's also intending (if I remember earlier posts correctly) other CPU support. As for the number of significant digits, I find that 12-13 is as far as we can really push it. Given Ben's statements about sampling, and BioScience (P@H) starting with high-def data, I think we're in good shape if we can pull in 32 bit for now, operate in 64 or 80 bit resolution (using 128 bit to store the 80 bit interim results), and then output 32 bit again with significantly reduced roundoff error(s). This is where I believe the science (as a whole) takes the greatest leap forward. If Team MacNN has optimizations for Altivec, etc., can we somehow find a way to get all this integrated? We now have Ben's work, our work, Francophone's work, and MacNN's work... and I am sure COUNTLESS others working on the same thing. How do we pull this together for the good of the science as a whole??? I am open to direct email (you have my address), so please... let's make this happen. We are about to start final testing on Win32/Win64/Linux32/Linux64, on Pentium, P2, P3 and P4 CPUs and all the AMDs available, in all possible configurations, because they are simply the most readily available to us from all the volunteers of the team. If we have some Macs, then it will be tested there, but I don't know of any as of yet. Our goal when we started this was to bring everything up to 64 bit. I personally am working on getting a couple of SGIs and Suns loaned to me as well, just to ensure that big- and little-endian machines all behave the same. These will be added insurance for portability and adherence to the Cobblestone model. I personally would like us (TPR) to present to SSL software that is solid and easily integrated into the mainstream (integrated with Ben's if possible), tested to their standards, and then released per their license and GPL requirements. It makes the most sense to let BOINC-Dev be the focal point for final integration and release to the general public through their standard channels. They are, after all, fundamentally responsible for the project. Does this answer your questions? Chuck Team Phoenix Rising. PS: Yes, it's 5am... I had an idea on the benchmarks and it was easier to code than to write down, and it does fix an issue I had with register allocations on the different processor families... :) |
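The scheme Chuck outlines (read 32-bit data, carry intermediate results at higher precision, round back to 32 bits only at output) can be illustrated with a small sketch. Everything here is an assumption made for illustration: the helper names `to_float32`, `sum_single` and `sum_double_then_round` are hypothetical, the data is a made-up run of identical samples rather than real WU data, and Python's `struct` module is used only to emulate 32-bit rounding. It does not model the 80-bit FPU format.

```python
import struct


def to_float32(x):
    """Round a Python float (64-bit) to the nearest 32-bit float value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def sum_single(values):
    """Accumulate with the running total rounded to 32 bits after every add."""
    total = 0.0
    for v in values:
        total = to_float32(total + to_float32(v))
    return total


def sum_double_then_round(values):
    """Accumulate 32-bit inputs in 64 bits; round to 32 bits once, at output."""
    total = 0.0
    for v in values:
        total += to_float32(v)
    return to_float32(total)


# One million samples of 0.1; the ideal decimal sum is 100000.
samples = [0.1] * 1_000_000
err_single = abs(sum_single(samples) - 100000.0)
err_double = abs(sum_double_then_round(samples) - 100000.0)
print("32-bit accumulation error:", err_single)
print("64-bit accumulation, 32-bit output error:", err_double)
```

Both variants start from the same 32-bit inputs and produce a 32-bit result, so any difference between the two errors comes purely from where the rounding happens. Whether that gain matters for 2-bit telescope samples is the open question Ben raised earlier in the thread.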