64-bit App Build Windows XP x64
Author | Message |
---|---|
Bob Delkhoon Joined: 15 May 99 Posts: 11 Credit: 201,827 RAC: 0 |
I installed Windows x64 on my main machine over the weekend, so I wanted to see how a 64-bit client would run. I used the sources from Simon Zadra (KWSN - Chicken of Angnor)'s site as a base for the build. I let it run overnight on my x64 box and all the results that have been verified are good. (The 3 errors on my results list are accidents from changing the app_info.xml.) I don't see a significant change in performance yet on my machine, but I'm hoping for some feedback on other machines. I'm assuming there's more that can be changed to get more gains from the 64-bit build. I've only had x64 installed a few days, so I'm still kind of inexperienced with the code optimizations that can be done. If anyone has any tips on 64-bit optimizations, that would be awesome. Anyone who'd like to try it, please let me know how it works for you and your CPU type. PLEASE keep an eye on your results; I only tested this for about 24 hours on 1 box. Windows XP x64 64-bit client: http://tiger.towson.edu/~bdelkh1/setiathome-5.15-DeNitro-emt64.zip Thanks guys =) -Bob Delkhoon (DeNitro) |
Joined: 9 Jul 99 Posts: 1199 Credit: 6,615,780 RAC: 0 |
Hi Bob, saw you online at my site recently :o) Good job on the 64-Bit app, I haven't gotten one to link correctly yet. I'm very interested in your build settings, including what SDK you used, what version of Visual Studio/ICC/IPP and such, as well as any necessary source edits (there were a couple, as I recall). Running some tests with your app on 64-Bit Windows (Pentium-D) as I type this message, let's see what it can do! Regards, Simon. Donate to SETI@Home via PayPal! Optimized SETI@Home apps + Information |
EricVonDaniken Joined: 17 Apr 04 Posts: 177 Credit: 67,881 RAC: 0 |
Going from IA32 to x86-64 gives you a few optimization opportunities: 1) The number of general-purpose and SSE registers doubles from 8 to 16. 2) Each general-purpose register is also 2x wider (64 bits instead of 32); the SSE registers stay 128 bits wide, so for SIMD code the gain is in having twice as many of them. 3) There are a few instructions in x86-64 that are not in IA32. The easiest way to take advantage of all this is to get Intel's compiler, icc, and their MKL libraries and build S@H using them as aggressively as possible while still getting correct results. These tools cost money. Sun's Performance Evaluator tool is also useful as it will profile the code for you, make suggestions as to useful source transformations, make suggestions as to in-memory data structure layout, etc. This tool is free. |
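
Before chasing any of these gains, it helps to confirm that a binary really was built for x86-64 and not IA32. The following is a minimal sketch, not taken from the S@H sources; the function names `pointer_bits`, `target_name`, and `print_build_target` are hypothetical, and the macro checks assume an MSVC-compatible or GCC-compatible compiler.

```c
#include <limits.h>
#include <stdio.h>

/* Pointer width in bits for this build: 64 on x86-64, 32 on IA32. */
static unsigned pointer_bits(void)
{
    return (unsigned)(sizeof(void *) * CHAR_BIT);
}

/* The same question answered at compile time from predefined macros.
   _M_X64 / _M_IX86 are set by MSVC-compatible compilers,
   __x86_64__ / __i386__ by GCC-compatible ones. */
static const char *target_name(void)
{
#if defined(_M_X64) || defined(__x86_64__)
    return "x86-64";
#elif defined(_M_IX86) || defined(__i386__)
    return "IA32";
#else
    return "other";
#endif
}

/* Prints one line describing what the binary was built for. */
static void print_build_target(void)
{
    printf("target: %s, pointer width: %u bits\n",
           target_name(), pointer_bits());
}
```

Calling `print_build_target()` once at startup (for example into stderr.txt) would make it obvious from a result page which build produced a given result.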
Joined: 9 Jul 99 Posts: 1199 Credit: 6,615,780 RAC: 0 |
Eric, that is exactly why I am keen on invading 64-Bit space ;o) I'm well aware of the architectural advantages. Also - I have posted this before - I managed to acquire commercial ICC and IPP licenses quite a while ago. For anyone interested in building their own, trial versions are available from Intel online (as noted in the Windows and Linux How-To threads stickied on this board). So really, there is nothing holding anyone back from participating - obviously, releasing apps to the public is a different thing, as the trial licenses do not really allow such usage. To work around this issue, I've tried to assemble as many capable people as possible in my test and development group on http://lunatics.at. I can only say, their pace is hard to keep up with :o) I will continue to offer apps compiled from the sources this team produces, as the licenses themselves do not run out (only updates and premium support do, next July). You posted elsewhere that you would like to stay anonymous but still like to participate in coding. As someone else noted, you could always acquire a generic email address to register with at my site and use the same username as here - and let me know about it. As things stand, you would not be the only one. That has not stopped Mr. and Ms. Anonymous from contributing, though. Bob, sorry for sort of hijacking your thread - the same goes for you. If you're interested in working together on the S@H code, you're welcome! Let me know, and I'll bump your access. Regards, Simon. |
Joined: 11 Jun 99 Posts: 42 Credit: 1,443,674 RAC: 0 |
Is this optimization for the Intel 64-bit processors or the AMD 64-bit ones? |
Joined: 9 Jul 99 Posts: 1199 Credit: 6,615,780 RAC: 0 |
Hm, it didn't run on my XP64 system at all - didn't even produce an stderr.txt file, in fact. Not sure why. Did you try it on other XP64 systems, Bob? Regards, Simon. |
Joined: 3 Aug 99 Posts: 305 Credit: 6,157,052 RAC: 0 |
Anyone try it yet? I just did, and got lots of client errors: 378761173 90781347 12 Sep 2006 9:08:03 UTC 13 Sep 2006 0:59:00 UTC Over Client error Done 0.00 0.00 ---; 378761169 90781360 12 Sep 2006 9:08:03 UTC 13 Sep 2006 0:59:00 UTC Over Client error Done 0.00 0.00 ---; 378761166 90781338 12 Sep 2006 9:08:03 UTC 13 Sep 2006 0:59:00 UTC Over Client error Done 0.00 0.00 ---; 378761161 90781346 12 Sep 2006 9:08:03 UTC 13 Sep 2006 0:59:00 UTC Over Client error Done 0.00 0.00 ---; 378761158 90781361 12 Sep 2006 9:08:03 UTC 13 Sep 2006 0:59:00 UTC Over Client error Done 0.00 0.00 ---; 378761154 90781341 12 Sep 2006 9:08:03 UTC 13 Sep 2006 0:59:00 UTC Over Client error Done 0.00 0.00 ---; 378761151 90781333 12 Sep 2006 9:08:03 UTC 13 Sep 2006 0:59:00 UTC Over Client error Done 0.00 0.00 ---. Running it on Windows Server 2003 Standard x64 Edition - Dual core 3.4. |
Bob Delkhoon Joined: 15 May 99 Posts: 11 Credit: 201,827 RAC: 0 |
Bah, getting errors? It's been working fine for me. I may take the link down till I can work with it some more. This is the machine I've been running it on. All the results since the 3 client errors are from the exact build I posted. (The 3 client errors were accidents from me messing with the app_info.xml.) http://setiathome.berkeley.edu/show_host_detail.php?hostid=2624714 The machine is a Core 2 Duo running Windows XP x64. I may have accidentally linked the IPP wrong, but I'm pretty sure it's static and correct. Build environment was Visual Studio 2005 with Intel Compiler 9.1 and IPP 5.1.1. Unfortunately this is the only 64-bit machine I have access to, so it's hard to test on a clean machine. --Bob Delkhoon (DeNitro) |
Joined: 3 Aug 99 Posts: 305 Credit: 6,157,052 RAC: 0 |
Bah getting errors? It's been working fine for me. I may take the link down till I can work with it some more. |
Bob Delkhoon Joined: 15 May 99 Posts: 11 Credit: 201,827 RAC: 0 |
Ok, I *may* have something. I was doing 32-bit client testing before 64-bit and I had /QaxT and /QaT set (Core 2 Duo flags). I read that /QaxT and similar settings (SSE/SSE2 etc.) are ignored when compiling for EMT64 as it takes precedence, BUT /QaT *may* not be, I'm not sure. I'm going to remove them both just to be safe and do another build. I'll replace the link in a few minutes (and let you know). It should not affect speed any, since SSE/SSE2/MMX don't work with EMT64 as emt64 overrides them. I may be wrong, as I'm new to 64-bit coding, but I'll give it a shot. --Bob Delkhoon (DeNitro) |
Bob Delkhoon Joined: 15 May 99 Posts: 11 Credit: 201,827 RAC: 0 |
Ok, the new client is built and up at the exact same link. The only change is that it's built without the Core 2 Duo-specific flags (/QaxT and /QaT). It may or may not work still, since I thought emt64 would override the sse4. PLEASE use this on a test work unit, or wait to hear if it works on test work units from others, BEFORE you use this on the actual BOINC app. It *may produce errors!!* Thanks again guys, --Bob Delkhoon (DeNitro) |
Joined: 3 Aug 99 Posts: 305 Credit: 6,157,052 RAC: 0 |
That seems to have fixed something, DeNitro. It's running now. I first tested it with the Chicken's kwsn-test program. Running it now on an AMD 3500, Windows XP 64-bit edition. Will post results tomorrow. Thanks a million, DeNitro. |
Bob Delkhoon Joined: 15 May 99 Posts: 11 Credit: 201,827 RAC: 0 |
Great to hear it's working, very sorry about that mix-up. Again, I'm not sure what the speed difference will be. On my machine, I don't see a significant difference over Simon's (awesome) optimized app that this was based on, but hopefully it's a start. Simon, sure, I'd like to work together. There's some more I want to change in the code base to make it totally cross-compatible with 32-bit and 64-bit builds. There are also still a few warnings I want to look into. This was just a rough edit to get valid results and check initial performance, since I'm kinda time limited with school atm. As long as it's not any slower than the 32-bit apps, I hope it's an ok start. Again, PLEASE remember, this is just a test. Make sure you do a test run first on your platform with test work units before you try it on the actual BOINC app!!! Thanks, --Bob Delkhoon (DeNitro) |
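
One common source of the warnings that show up when a 32-bit code base is first compiled for 64-bit Windows is pointer arithmetic done in `long` or `int`: on Win64 a `long` stays 32 bits wide while pointers grow to 64. The sketch below is a general illustration of the portable pattern and is not taken from the S@H sources; `roundtrip` and `byte_distance` are hypothetical helper names.

```c
#include <stddef.h>
#include <stdint.h>

/* Carry a pointer through an integer without truncation.
   uintptr_t is wide enough for a pointer on both IA32 and x86-64,
   whereas a cast to long would drop the top 32 bits on Win64. */
static void *roundtrip(void *p)
{
    uintptr_t bits = (uintptr_t)p;
    return (void *)bits;
}

/* Byte distance between two pointers into the same buffer.
   ptrdiff_t follows the pointer width, so the result does not
   overflow for large buffers in a 64-bit build. */
static ptrdiff_t byte_distance(const float *first, const float *last)
{
    return (const char *)last - (const char *)first;
}
```

Code written this way compiles unchanged for both targets, which is the kind of cross-compatibility the post describes.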
Bob Delkhoon Joined: 15 May 99 Posts: 11 Credit: 201,827 RAC: 0 |
Cool, thanks. You can probably tell from my post count I'm not a frequent poster here, but I love the project =) The file size outside the zip should be 10,616,832. If you re-downloaded after I put up the new build, your ISP may have pulled it from a proxy/cache or something. Just in case, I also put it up under a modified filename (same build as the top post link): http://tiger.towson.edu/~bdelkh1/setiathome-5.15-DeNitro-emt64_test1.zip Thanks, --Bob Delkhoon (DeNitro) |
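
The size check above can be automated so that a stale cached download is caught before the app is used. This is a minimal sketch: `file_size_bytes` and `matches_expected_size` are hypothetical helpers, and the name of the extracted executable is not given in the thread, so the caller has to supply the path.

```c
#include <stdio.h>

/* Counts the bytes in a file by reading it in chunks.
   Returns -1 if the file cannot be opened. */
static long long file_size_bytes(const char *path)
{
    unsigned char buf[65536];
    long long total = 0;
    size_t n;
    FILE *f = fopen(path, "rb");

    if (f == NULL)
        return -1;
    while ((n = fread(buf, 1, sizeof buf, f)) > 0)
        total += (long long)n;
    fclose(f);
    return total;
}

/* Returns 1 if the file exists and has exactly the expected size. */
static int matches_expected_size(const char *path, long long expected)
{
    return file_size_bytes(path) == expected;
}
```

For the build in this post, the expected value would be `10616832LL`, the size quoted for the file outside the zip.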
Bob Delkhoon Joined: 15 May 99 Posts: 11 Credit: 201,827 RAC: 0 |
Oops thanks, title changed. Even on my Intel chip, I felt like the SSE2 app worked faster than SSE3. I didn't really test sse4 since I was eager to try out emt64. When I have some time this week I might try an sse4 app too. Please note: most of the work that went into the app over the default was already in place in Simon's source. I just made the modifications to get it to do a 64-bit compile. --Bob Delkhoon (DeNitro) |
EricVonDaniken Joined: 17 Apr 04 Posts: 177 Credit: 67,881 RAC: 0 |
"Oops thanks, title changed. Even on my Intel chip, I felt like the SSE2 app worked faster than SSE3. I didn't really test sse4 since I was eager to try out emt64. When I have some time this week I might try an sse4 app too." On dual core AMD and Intel chips, one should compile for at least SSE3. Intel Core2 chips have support for SSE4 in them, and if you have icc and Intel's MKL you should compile w/ SSE4 support. If you are getting weird results like SSE2 being better than SSE3 or SSE3 being better than SSE4, something is wrong. Note that making best use of Intel's MKL may require some source changes to use functions that Intel has optimized for the Core2 architecture rather than their more generic equivalents. Benher mentioned that the code is not taking good advantage of all the registers available in the IA32 or x86-64 architecture. Fixing this might require non-trivial understanding of the source. Also, it's =E=xtended =M=emory =64=b =T=echnology. EM64T. |
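
Which instruction set a build may safely target depends on what the CPU actually reports, so a small runtime check can save a round of client errors. The sketch below assumes a GCC-compatible compiler on x86 and uses its `__builtin_cpu_supports` builtin; that is not the Visual Studio and Intel Compiler setup used in this thread, where the `__cpuid` intrinsic would play the same role. The helper names are hypothetical.

```c
#include <stdio.h>

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#define CAN_QUERY_CPU 1
#else
#define CAN_QUERY_CPU 0
#endif

/* Each helper returns 1 if the CPU reports the feature, 0 if it does
   not, and -1 if this compiler or target cannot ask. */
static int cpu_has_sse2(void)
{
#if CAN_QUERY_CPU
    __builtin_cpu_init();
    return __builtin_cpu_supports("sse2") ? 1 : 0;
#else
    return -1;
#endif
}

static int cpu_has_sse3(void)
{
#if CAN_QUERY_CPU
    __builtin_cpu_init();
    return __builtin_cpu_supports("sse3") ? 1 : 0;
#else
    return -1;
#endif
}

static int cpu_has_ssse3(void)
{
#if CAN_QUERY_CPU
    __builtin_cpu_init();
    return __builtin_cpu_supports("ssse3") ? 1 : 0;
#else
    return -1;
#endif
}

/* Highest level found, as a hint for which build of an app to try. */
static const char *best_sse_level(void)
{
    if (cpu_has_ssse3() == 1) return "SSSE3";
    if (cpu_has_sse3() == 1)  return "SSE3";
    if (cpu_has_sse2() == 1)  return "SSE2";
    return "unknown or below SSE2";
}
```

A check like this only says what the hardware supports; whether the higher level is actually faster for a given loop still has to be measured.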
Joined: 3 Aug 99 Posts: 305 Credit: 6,157,052 RAC: 0 |
With the exception of the unit that was crunched partially by the Chicken's app and finished by DeNitro's, which finished with a computation error, the next 4 finished without error. One verified and three pending as of this time. Times show little improvement over the 32-bit app, but I would need to pull a unit over for a comparison run with both apps. |
Joined: 11 Jun 99 Posts: 42 Credit: 1,443,674 RAC: 0 |
Downloaded and installed on an AMD X2 3800+. Seems to be running a few hundred Mflops faster than the SSE2 app I was running earlier, at least according to boincview. Hope this helps some. |
Joined: 3 Aug 99 Posts: 305 Credit: 6,157,052 RAC: 0 |
DeNitro, I would very much appreciate a write-up of the steps you took to get this client working. I've been trying for a couple of months with no luck. Maybe the Chicken can do a how-to with your help. Borg :-) |
Alex Kan Joined: 4 Dec 03 Posts: 127 Credit: 29,269 RAC: 0 |
"On dual core AMD and Intel chips, one should compile for at least SSE3." It seems a bit simplistic to assume that using SSE3 and SSE4 wherever possible will automatically be faster than not using them. For example, using HADDPS to sum across a register has greater latency and lower throughput than other combinations of instructions, especially on Core2 chips. As for SSE4 (which I suppose I should be calling SSSE3 now), I think it's been discussed a couple of times before: the only new instructions that could be useful for us are PSHUFB and PALIGNR, but only if we find places where using them is faster. If you can find a use for all those other SIMD integer instructions in a primarily floating-point application, all the more power to you. I imagine the Intel compiler knows much more about choosing and scheduling these instructions than I do, but it's definitely interesting that the SSE2 app has performed so well. "Benher mentioned that the code is not taking good advantage of all the registers available in the IA32 or x86-64 architecture. Fixing this might require non-trivial understanding of the source." I think when Ben originally said this, he wasn't implying that it was a problem to be fixed. :) A lot of SETI code is simple loopy code, and there are arguments against doing a ton of loop unrolling on Core2, like a 64-byte loop buffer and the increase in code size that you've taken on by moving to x86-64. So, EricVonDaniken, have you signed up on Simon's board yet? |
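
The HADDPS point can be made concrete by writing the horizontal sum both ways and timing them on the chip in question. This is a minimal sketch using compiler intrinsics, not code from the S@H sources: `hsum4_shuffle` and `hsum4_hadd` are hypothetical names, the SSE3 variant is only compiled when the compiler is told to target SSE3, and a scalar fallback is assumed for non-x86 builds.

```c
#if defined(__SSE__) || defined(_M_X64)
#include <xmmintrin.h>   /* SSE: loads, shuffles, adds */
#define HAVE_SSE 1
#endif
#if defined(__SSE3__)
#include <pmmintrin.h>   /* SSE3: _mm_hadd_ps (HADDPS) */
#endif

/* Sum of four floats using only SSE shuffles and adds. */
static float hsum4_shuffle(const float *p)
{
#if defined(HAVE_SSE)
    __m128 v  = _mm_loadu_ps(p);        /* [a, b, c, d]          */
    __m128 hi = _mm_movehl_ps(v, v);    /* [c, d, c, d]          */
    __m128 s  = _mm_add_ps(v, hi);      /* [a+c, b+d, ., .]      */
    __m128 t  = _mm_shuffle_ps(s, s, _MM_SHUFFLE(1, 1, 1, 1));
    s = _mm_add_ss(s, t);               /* lane 0 = (a+c)+(b+d)  */
    return _mm_cvtss_f32(s);
#else
    return (p[0] + p[2]) + (p[1] + p[3]);   /* scalar fallback */
#endif
}

#if defined(__SSE3__)
/* The same sum using two HADDPS instructions. */
static float hsum4_hadd(const float *p)
{
    __m128 v = _mm_loadu_ps(p);         /* [a, b, c, d]              */
    v = _mm_hadd_ps(v, v);              /* [a+b, c+d, a+b, c+d]      */
    v = _mm_hadd_ps(v, v);              /* every lane = (a+b)+(c+d)  */
    return _mm_cvtss_f32(v);
}
#endif
```

The two variants add the elements in a different order, so results can differ in the last bit for general inputs. Which one is faster depends on the microarchitecture and on how the compiler schedules the surrounding loop, so it is worth measuring, not assuming.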
©2025 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.