Contributing code? Amd64 build for Windows

Paul D. Buck
Volunteer tester
Joined: 19 Jul 00
Posts: 3898
Credit: 1,158,042
RAC: 0
United States
Message 26351 - Posted: 14 Sep 2004, 12:24:51 UTC - in response to Message 26341.  

> Paul,
> It is our team's intention to make this applicable to all platforms. I
> assume (but can't speak for Ben) that Ben's SSE optimizations also
> translate to Altivec instructions for Macs, and that he also intends (if I
> remember earlier posts correctly) to support other CPUs.

Cool!

> As for the number of significant digits, I find that 12-13 is as far as we
> can really push it. Given Ben's statements about sampling and BioScience
> (P@H) starting with high-def data, I think we're in good shape if we can pull
> in 32 bit for now, operate in 64 or 80 bit resolution (using 128 bit to store
> the 80 bit interim results), and then output 32 bit again with significantly
> reduced roundoff error(s). This is where I believe the science (as a whole)
> takes the greatest leap forward.

Yeah, that is about what I would expect. I am hoping to move into the faster machines too ... :)

> If Team MacNN has optimizations for Altivec, etc... can we somehow find a
> way to get all this integrated? We now have Ben's work, our work,
> Francophone's work, and MacNN's work.... and I am sure COUNTLESS others
> working on the same thing. How do we pull this together for the good of the
> science as a whole??? I am open to direct email (you have my address), so
> please... let's make this happen.

I just looked and cannot find the end of the string. I can find the download pages where the compiled builds are available, but nothing ties them to a person. To be honest, I would have joined the team, but I cannot make heads or tails of the web site ...

> We are about to start final testing on Win32/Win64/Linux32/Linux64... CPUs
> Pentium, P2, P3, P4, and all the AMDs, in all possible
> configurations, because they are simply the most readily available to
> us from all the volunteers of the team. If we have some Macs, then it will be
> tested there, but I don't know of any yet.

Well, I have one ... but it is my main work platform, so I am not sure I can unequivocally jump in as a volunteer ... especially as I am ALREADY way behind on updates to my site...

> Our goal when we started this was to bring everything up to 64 bit. I
> personally am working on getting a couple SGIs and SUNs loaned to me as well
> just to ensure that big and little endian machines all behave the same. These
> will be added insurance for portability and adherence to the Cobblestone
> model.
>
> I personally would like us (TPR) to present to SSL software that is solid
> and easy to integrate into the mainstream (integrated with Ben's if
> possible), tested to their standards, and then released per their license and
> GPL requirements. It makes the most sense to let BOINC-Dev be the focal
> point for final integration and release to the general public through their
> standard channels. They are, after all, fundamentally responsible for the
> project.

I did suggest in the beta that this was something that should be done. Along the same lines as the "related" sites and the Alpha/Beta testers, UCB would host the binaries. The code updates would be baselined, and selected volunteers would then compile them into optimized binaries.

A second alternative is to make this a hosted feature, like what is done in the BOINC download network. I have not visited it, though, and don't know what it has available.

I mean, I would not mind making a compile, but I do not have the time to debug the scripts. And rumor has it that the Macintosh code is not necessarily a slam dunk at this time.

So, I don't have good answers for you... just more questions ... I was hoping the rabid (I am saying it with a smile) speed demons would have already started doing this publicly. I mean, it might be going on, and I am just not aware of it (no surprise there).

> Does this answer your questions?

Yes :) ... warm fuzzies all over!



slavko.sk
Joined: 27 Jun 00
Posts: 346
Credit: 417,028
RAC: 0
Slovakia
Message 26356 - Posted: 14 Sep 2004, 12:36:01 UTC - in response to Message 26341.  
Last modified: 14 Sep 2004, 12:36:49 UTC

> PS: Yes, it's 5am... I had an idea on the benchmarks and it was easier to
> code than write it down, and it does fix an issue I had with register
> allocations on the different processor families... :)

Whoa!

Chuck, if you need a tester (me) for Win64/AMD64, keep me posted and drop a message ... slavko@slavko.sk.



S@h Berkeley's Staff Friends Club © member

WildWeasel
Joined: 2 Jun 99
Posts: 5
Credit: 485,315
RAC: 0
United Kingdom
Message 26715 - Posted: 15 Sep 2004, 9:30:21 UTC - in response to Message 26356.  

Chuck et al,

While you're doing this magnificent work, are you comparing the run times on code compiled by different compilers?

I saw the Intel compiler mentioned together with the MS one...

The Weasel




Chuck Lasher
Joined: 21 Aug 03
Posts: 37
Credit: 3,511
RAC: 0
United States
Message 26718 - Posted: 15 Sep 2004, 9:36:25 UTC - in response to Message 26715.  

> Chuck et al,
>
> While you're doing this magnificent work, are you comparing the run times on
> code compiled by different compilers?
>
> I saw the Intel compiler mentioned together with the MS one...
>
> The Weasel

Yes, and I am even looking at the generated code and comparing as much as possible... but I wonder whether that is now a moot point. Changes have been submitted to BOINC-dev and are now implemented in 4.09 (or 4.10).

Until our team discusses all this, we won't be making any further posts.


Chuck
Team Phoenix Rising.


Benher
Volunteer developer
Volunteer tester
Joined: 25 Jul 99
Posts: 517
Credit: 465,152
RAC: 0
United States
Message 27253 - Posted: 17 Sep 2004, 7:22:29 UTC
Last modified: 17 Sep 2004, 7:23:15 UTC

Ok updates.

There is now an official optimization group mailing list from boinc (seti).
Go to the home page for the link.

If you are a programmer, I think this is where you want to be.
---
Verification: write the code, compare its output to the original routine's output, and determine the margin of error.

Verifying a compiled client's output:
1. Put the original seti client .exe in a directory with a given WU.
2. Rename the WU to "work_unit.sah".
3. Start the original client.
4. Let it finish. There is now a new file in the folder called "result.sah".
5. Copy this file somewhere else and save it.
6. Put the new client in the same directory.
7. Erase any "result.sah", "stderr.txt" and "state.sah" files.
8. Run the new client.
9. Let it finish.
10. Use a file compare or 'diff' program to compare the new output to the original.

-----

64 bit code: the 64 bits apply to integers, i.e. the standard registers.
The code around the floating point parts will run somewhat quicker.

When working with floats, you still have the FPU's 32 bit single and 64 bit double, held in 80 bit internal floating point registers.

When working with SIMD, you have 3DNow and SSE for 32 bit floats, SSE2 and SSE3 for 64 bit floats.

Altivec is 32 bit float SIMD.
---
The conversions I've performed in order to make the SSE code work will benefit Altivec and other SIMD programmers, but I haven't written Altivec routines yet (nor do I have access to a Mac for testing).

Brad Anderson (Mr. Anderson to you all ;) has written Altivec routines for the most active floating point routines in seti_boinc. He has compiled the application for 3 flavors of Mac and released them on his website.

This is covered in another thread here in "crunch" started by him.

Chuck Lasher
Posts: 37
Credit: 3,511
RAC: 0
United States
Message 27265 - Posted: 17 Sep 2004, 8:09:22 UTC - in response to Message 27253.  
Last modified: 17 Sep 2004, 8:10:11 UTC


Thank you for the input, Mr. Herndon. I shall relay the information to the team. :)



Sincerely,
Chuck Lasher
audiforum.nl
Joined: 3 Dec 03
Posts: 7
Credit: 427,391
RAC: 0
Netherlands
Message 39540 - Posted: 24 Oct 2004, 0:39:49 UTC

Any news?

My 64 bit machine is begging me for a 64 bit BOINC client ;)
sniperbait
Joined: 15 Feb 04
Posts: 67
Credit: 56,828
RAC: 0
United States
Message 39544 - Posted: 24 Oct 2004, 1:32:05 UTC

mine too

Divide Overflow
Volunteer tester
Joined: 3 Apr 99
Posts: 365
Credit: 131,684
RAC: 0
United States
Message 39594 - Posted: 24 Oct 2004, 5:39:53 UTC - in response to Message 39544.  

I'd love to see a 64 bit SETI@home application that also supports the free AMD Core Math Library.

http://www.amd.com/us-en/Processors/DevelopWithAMD/0,,30_2252_869_2282,00.html

Benher
Volunteer developer
Volunteer tester
Joined: 25 Jul 99
Posts: 517
Credit: 465,152
RAC: 0
United States
Message 39605 - Posted: 24 Oct 2004, 6:25:42 UTC

I've looked a little more into AMD 64bit since my last entry.

There is one other major advantage to AMD 64bit.

[pre]
                                    Intel/Athlon XP   64-bit
Integer registers                   7 + stack         15 + stack
Floating stack regs (shared w/MMX)  8                 8
SSE regs                            8                 16
[/pre]
So with 8 more general-purpose integer registers, a good compiler could avoid several register-to-memory saves/loads, and thus execute fewer instructions.

Here is a good overview:
http://www.cpuid.com/K8/index.php

Chuck Lasher
Joined: 21 Aug 03
Posts: 37
Credit: 3,511
RAC: 0
United States
Message 39619 - Posted: 24 Oct 2004, 8:20:59 UTC


Ben,
If I may add to that... the FX class (Sledgehammer and later cores, possibly more as processors are announced) has the FXSAV (?) instruction. It is a high-speed special register save instruction for doing fast context switches. I can see it being used really nicely. I can also see using the SSE registers for non-SSE math by simply writing macros that emit __ASM directives....

That link is a good one.... I've pointed it out to a few people myself. Every time someone asks what the difference between the AMD64 and the P4 is... I send them there.

I am curious if Intel has formally announced what is going to be in the P5 and P6 architectures or if those were originally slated to be the 'now-sidelined' 32 bit cores.

Any idea? I ask because I am putting together a table of processor capabilities, based on George Woltman's and Jean Penne's code for primes that tailors runtime operations to processor features ... specifically CMOV, SSE, SSE2. Those make a big difference in FFTs, which I believe would help SETI a great deal if done properly. The nice part is that it would be Intel/AMD independent detection and feature management.... totally transparent to the user, with just a few runtime 'if()'s in the code at critical places.


I've been working on the 64-bit implementation of prime factoring. The math is nice and clean and gives a good opportunity to learn how to deal with a huge variety of FFT array sizes for all processors supported by BOINC/SETI. I hope to bring this experience back to the project when complete. I also hope to see it put to use in the protein & CPDN work (all Fortran). Both of those projects would benefit hugely from 64 bit math and fast 64 bit FFT work: the error factor cuts way down and performance comes way up. Both are things we need to start preparing for as Intel gets closer (hopefully soon) to announcing & rolling out its 64 bit processors to the user community.
I think we should be ready for it so AMD and Intel users can really push all BOINC projects forward.



Your thoughts?

Chuck

Benher
Volunteer developer
Volunteer tester
Joined: 25 Jul 99
Posts: 517
Credit: 465,152
RAC: 0
United States
Message 39814 - Posted: 24 Oct 2004, 23:36:21 UTC
Last modified: 24 Oct 2004, 23:39:53 UTC

Chuck,

Join the boinc_opt mailing list. It sounds right up your alley.

Eric Korpela (main programmer for SETI@home) is working on some things which should eventually make SETI (at least) quite a bit faster on many platforms. (Not related to the 4.05 issue, although he is aware of that.)

Personally, I have written SSE and 3DNow versions of the Ooura FFT code and several other routines in seti; others have implemented seti with the FFTW code, and others still have written Altivec versions. Eric has access to all of these sources.

Until then I suggest you look into FFTW and the history of emails in boinc_opt mailing list.

Look for it on this page:
http://boinc.berkeley.edu/community.php

Regarding ALL of the other BOINC projects... their worker source is currently not public, and I've read nothing about any intention to make it public. I know that CPDN at least has proprietary info in their source and is very unlikely to release it. So I don't know if they use FFTs or what math functions they use. They might use optimized libraries (e.g. Intel SSE, AMD 3DNow, Apple Altivec), or might not. I've seen no mention of it.

Chuck Lasher
Joined: 21 Aug 03
Posts: 37
Credit: 3,511
RAC: 0
United States
Message 39842 - Posted: 25 Oct 2004, 1:33:07 UTC
Last modified: 25 Oct 2004, 1:36:04 UTC

Ben,
You have my address, etc. on the boinc_opt list.... would you please email me privately so we can talk about this when you have time? I have some NASTY dedicated 386 assembler that will not handle caches greater than 8k (P4), nor will it handle FFTs larger than 131072. Also, being MASM, it won't slide into GCC very well, nor use the extended FFT or FPU capabilities of the FX class machines (which AMD has and Intel is about to have).

What I need is to replace all the FFT ASM (currently done LONGHAND in C or 386 ASM) with some good code like your SIMD class, and finish getting automake set up (almost done there). It would make bringing the Altivec versions into the projects a SNAP... and we both know what they will do!

May we discuss and share ideas and code? I have a few things I am fixing
for primes that SETI can use, and you are the man to implement them.

I will look at the FFT code you've referenced again.. Ooura sounds very familiar. I've seen so many 'brute force FPU' FFTs recently for Linux, etc. that I am overwhelmed and don't remember the actual equations from memory.

Regarding CPDN... they are proprietary, which I do respect. I also sense from the tone of their emails that the model is fragile and old.. and I also respect that only a few people touch it.

Predictor is part public, part proprietary (NDA is all that's required). It will benefit greatly from FFT and 64 bit.

Let's chat offline and see where we can help each other. I'll show you some great ones I've got already for primes that are about to hit Suse 9.1 64 bit Linux (as brute force in rev 1), and some wild little macros that cut out a great deal of FFT ASM. ALL of which I'm sure you've done or seen before.

As of tonight, I am switching this FX-53+ to dual boot (but staying mostly in Suse 9.1 -64) and leaving the A-64 754 pin machine in win/32. I do have Suse 9.1 Pro (32 and 64 bit versions on a flip/flop DVD).


Write direct when you can,
(Rom can give you my email if need be... permission granted as required)
Chuck
Hans Dorn
Volunteer developer
Volunteer tester
Joined: 3 Apr 99
Posts: 2262
Credit: 26,448,570
RAC: 0
Germany
Message 42037 - Posted: 1 Nov 2004, 22:38:28 UTC - in response to Message 39814.  

>
> Personally I have written SSE and 3DNow versions of the Oourda FFT code and
> several other routines in seti, others have implemented seti with the FFTW
> code, and others still have written Altivec. Eric has access to all all of
> these sources.
>
> Until then I suggest you look into FFTW and the history of emails in boinc_opt
> mailing list.
>

Hi Ben!

I hope you don't mind if I jump in here.

I've created a seti client that makes use of fftw3f instead of Ooura,
but I didn't have any luck with it.
When I run it against the test WU, the spikes it finds look reasonable,
but it detects different gaussians and fails...

Does that sound reasonable, or should I look for bugs in my code?

P.S:

I've also played around with SSE and created an Ooura version that does
4 FFTs in a row. It's about 50% faster than the original one,
but all the required shifting around of the data eats up the saved time,
and I end up with exactly the same performance as with the
standard FFT...

Could you give me some hints on what your SSE version looks like?

Regards Hans





Sir Ulli
Volunteer tester
Joined: 21 Oct 99
Posts: 2246
Credit: 6,136,250
RAC: 0
Germany
Message 42043 - Posted: 1 Nov 2004, 23:23:29 UTC

Don't know if this was posted before, but there is a site with several optimized clients for Linux (not tested yet):

http://boinc.us.tt/

Greetings from Germany NRW
Ulli S@h Berkeley's Staff Friends Club m7 ©

Sir Ulli
Volunteer tester
Joined: 21 Oct 99
Posts: 2246
Credit: 6,136,250
RAC: 0
Germany
Message 42049 - Posted: 1 Nov 2004, 23:42:26 UTC

Don't know if this was posted before, but there is a site with several optimized clients for Linux (not tested yet):

http://boinc.us.tt/

Greetings from Germany NRW
Ulli S@h Berkeley's Staff Friends Club m7 ©

Hans Dorn
Volunteer developer
Volunteer tester
Send message
Joined: 3 Apr 99
Posts: 2262
Credit: 26,448,570
RAC: 0
Germany
Message 42051 - Posted: 1 Nov 2004, 23:44:54 UTC - in response to Message 42049.  

> dont know if this was postet before,
Yep several times ;-)

> but there is a Side with Severall
> optimissed Clients for Linux, not test yet
>
> http://boinc.us.tt/
>

Thanks, but I'm rather looking for sources for the seti client,
not the boinc client.

Regards Hans

Sir Ulli
Volunteer tester
Joined: 21 Oct 99
Posts: 2246
Credit: 6,136,250
RAC: 0
Germany
Message 42052 - Posted: 1 Nov 2004, 23:55:05 UTC

Also a good and interesting site:

http://www.ssl.berkeley.edu/pipermail/boinc_opt/2004-October/date.html

Original site:

http://www.ssl.berkeley.edu/mailman/listinfo/boinc_opt

Sorry for the double posting, but I'm having problems ...

it's not responding...

Greetings from Germany NRW
Ulli S@h Berkeley's Staff Friends Club m7 ©


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.