Are you ready for the next generation CPU?





Profile Francois Piednoel
Joined: Jun 14 00
Posts: 898
Credit: 5,969,320
RAC: 0
United States
Message 397881 - Posted 15 Aug 2006 4:38:06 UTC

    demo:
    http://setiathome.berkeley.edu/top_hosts.php

    going up ... going up ...

    FrancoisP

    Profile sterling0466
    Joined: Oct 5 00
    Posts: 204
    Credit: 742,621
    RAC: 0
    United States
    Message 397889 - Posted 15 Aug 2006 5:11:47 UTC

      Yes, these new Apple machines are nice, and very fast...but that is partially due to having FOUR processors. Take out three of those four processors and put it up against my AMD Athlon 64 FX 51 and let's see what happens when one processor is tested against one processor. Not trying to brag, start anything, or spread any 'flame' postings...just simply stating the facts. One processor -vs- one processor, fair is fair.

      As I have stated in the past, Apple has had some really cool and great ideas, both hardware and software related...it is a shame that Apple and some IBM/PC Clone company cannot share ideas and see what happens...the entire computer science field would be generations ahead of where we are now.
      ____________

      Profile Francois Piednoel
      Joined: Jun 14 00
      Posts: 898
      Credit: 5,969,320
      RAC: 0
      United States
      Message 397893 - Posted 15 Aug 2006 5:18:34 UTC - in response to Message 397889.

        Last modified: 15 Aug 2006 5:21:38 UTC

        Not trying to brag, start anything, or spread any 'flame' postings...just simply stating the facts. One processor -vs- one processor, fair is fair.


        I am actually running on a 975XBX with one processor package only :)

        you have one CPU, and I have one too... I just have 4 cores.

        Francois

        ____________
        who?
        Skulltrail D5400XS

        Profile KWSN - Chicken of Angnor
        Volunteer developer
        Volunteer tester
        Joined: Jul 9 99
        Posts: 1199
        Credit: 5,756,696
        RAC: 947
        Austria
        Message 397896 - Posted 15 Aug 2006 5:21:43 UTC

          Francois,

          I saw you seem to be using my code base for your apps :o)

          Now, what's interesting me is the 4 cores part - you wouldn't perchance be running an ES quad-core, would you? To my knowledge, no Core 2 CPU has HT enabled, do they?

          Impressive performance numbers there, to be sure.

          Regards,
          Simon.
          ____________
          Donate to SETI@Home via PayPal!

          Optimized SETI@Home apps + Information

          Profile Francois Piednoel
          Joined: Jun 14 00
          Posts: 898
          Credit: 5,969,320
          RAC: 0
          United States
          Message 397900 - Posted 15 Aug 2006 5:29:02 UTC - in response to Message 397896.

            Last modified: 15 Aug 2006 5:29:22 UTC

            Yes, I took your nice code and recompiled it for the new Merom instructions, added some more hand-coded FFT work, and yes, you are seeing an ES quad-core Core 2 at work, and yes, it is insanely fast. I do this as a hobby, and I am lucky enough to have some nice hardware.

            Thanks for making the recompile easy. As soon as we are done with the tuning, I'll be happy to give you back all of the code modifications.

            I am actually looking at doing 4 FFTs in parallel using SIMD. In the case of the SETI FFT, we can probably achieve 99% SIMD efficiency.

            Do you know if anybody has tried this before?

            Francois

            Profile KWSN - Chicken of Angnor
            Volunteer developer
            Volunteer tester
            Joined: Jul 9 99
            Posts: 1199
            Credit: 5,756,696
            RAC: 947
            Austria
            Message 397903 - Posted 15 Aug 2006 5:35:52 UTC

              Last modified: 15 Aug 2006 6:14:00 UTC

              No, not with 4 in parallel.

              There have been a couple of people who have done some inline assembly for doing two operations at the same time to feed the execution units on pre-Core 2 CPU models, but so far, no X86 CPU has been able to execute as many ops in one cycle so there was no need :o)

              I'm very interested in your code changes - what I'm trying to do with my site and the apps and How-Tos is to gather together as many capable people working on optimizations as possible. Your input is most welcome.

              Regards,
              Simon.
              ____________
              Donate to SETI@Home via PayPal!

              Optimized SETI@Home apps + Information

              Profile Francois Piednoel
              Joined: Jun 14 00
              Posts: 898
              Credit: 5,969,320
              RAC: 0
              United States
              Message 397908 - Posted 15 Aug 2006 5:54:37 UTC - in response to Message 397903.


                I'm very interested in your code changes - what I'm trying to do with my site and the apps and How-Tos is to gather together as many capable people working on optimizations as possible. Your input is most welcome.



                I figured out that SmartHeap 8.0 gives a nice % gain on the SETI code: the heap allocation and stack allocation are pretty intensive, and SmartHeap gave a nice boost.
                Intel compiler 9.1 provides support for MNI, and the new versions of MKL and IPP support it too. 1 cycle per SSEx instruction is awesome: you can transform most of the algorithm and make it much more efficient. I am still exploring it, but the scaling looks WOW...

                Notice that I am using the machine that is crunching SETI for compiling, web browsing, etc... the average of the machine just passed 2800 :)
                SETI is running in the background and I don't really feel it.

                Very exciting times!
                FrancoisP



                Ned Ludd
                Volunteer tester
                Joined: Apr 3 99
                Posts: 7692
                Credit: 264,326
                RAC: 338
                United States
                Message 397912 - Posted 15 Aug 2006 6:04:22 UTC

                  http://setiathome.berkeley.edu/show_host_detail.php?hostid=2302665

                  It may be a single die, but four cores equals four cpus.
                  ____________

                  Profile Francois Piednoel
                  Joined: Jun 14 00
                  Posts: 898
                  Credit: 5,969,320
                  RAC: 0
                  United States
                  Message 397914 - Posted 15 Aug 2006 6:11:02 UTC - in response to Message 397912.

                    http://setiathome.berkeley.edu/show_host_detail.php?hostid=2302665

                    It may be a single die, but four cores equals four cpus.


                    ok :) if you want to count like this, it is ok.

                    Francois

                    Alex Kan
                    Volunteer developer
                    Joined: Dec 4 03
                    Posts: 127
                    Credit: 29,269
                    RAC: 0
                    United States
                    Message 397927 - Posted 15 Aug 2006 7:11:29 UTC - in response to Message 397889.

                      Yes, these new Apple machines are nice, and very fast...but that is partially due to having FOUR processors. Take out three of those four processors and put it up against my AMD Athlon 64 FX 51 and let's see what happens when one processor is tested against one processor. Not trying to brag, start anything, or spread any 'flame' postings...just simply stating the facts. One processor -vs- one processor, fair is fair.

                      If you're going to compare your FX-51 to the G5 Quads, at least put an optimized client on it first! Prior to anything on the Core microarchitecture, I haven't seen anything close to the Quads running v6, in terms of work unit times. (It's also worth noting that each of those four cores is running its own separate SETI process.)

                      Also, I haven't seen anyone on SETI running a Mac Pro yet. I've been looking to test my Intel Mac clients on them for quite some time, but people I've talked to seem to be waiting to get their money's worth from their current machines before they take the plunge.

                      Francois, I've noticed that your connection to Intel is more than just being on their SETI team and linking to their website, so I'm curious as to what your findings are about SETI performance. You mentioned SSE4--I was of the impression that most of the new instructions are for integer arithmetic, so which of these have actually been useful? Also, you mentioned the idea of hand-coding a replacement FFT and using SIMD to do four FFTs in parallel--is that actually faster than using SIMD for a single FFT at the lengths that SETI uses?

                      And for my final question...who is "we?" :)

                      Profile Francois Piednoel
                      Joined: Jun 14 00
                      Posts: 898
                      Credit: 5,969,320
                      RAC: 0
                      United States
                      Message 397935 - Posted 15 Aug 2006 7:52:23 UTC - in response to Message 397927.


                        Francois, I've noticed that your connection to Intel is more than just being on their SETI team and linking to their website, so I'm curious as to what your findings are about SETI performance. You mentioned SSE4--I was of the impression that most of the new instructions are for integer arithmetic, so which of these have actually been useful? Also, you mentioned the idea of hand-coding a replacement FFT and using SIMD to do four FFTs in parallel--is that actually faster than using SIMD for a single FFT at the lengths that SETI uses?

                        And for my final question...who is "we?" :)


                        As you have probably noticed, I have been playing with SETI since 2000. SETI is always an interesting distributed computing problem, and the FFT by itself is a challenge for my little brain.
                        If you look at the FFT using 4 vectors in parallel, you have to try to code your FFT in a way that minimizes the penalties: branching, memory footprint, and in the case of Core, you want to use as many 128-bit SSEx instructions as you can.
                        To use SIMD efficiently, you want to move your data from Array of Structure to Structure of Structure.

                        For example, in 3D, it is very common to store X,Y,Z,W in memory like this:
                        X,Y,Z,W,X,Y,Z,W,X,Y,Z,W,X,Y,Z,W,X,Y,Z,W... (Array of Structure)

                        The natural way to store your SIMD data is
                        XXXXXXXXXXX....
                        YYYYYYYYYYYY...
                        ZZZZZZZZ.....
                        WWWWWWWWWWW... (Structure of Array)

                        But this has the bad side effect of opening more memory streams, and most modern processors allow only 4 or 8 streams to be open at the same time.
                        One of my co-workers, AlexK, came up with this data structure in 1998, called Structure of Structure:

                        XXXX,YYYY,ZZZZ,WWWW,XXXX,YYYY,ZZZZ,WWWW,XXXX,YYYY,ZZZZ,WWWW...

                        Like this, you access memory with only one or 2 streams, your data locality is tight, and your cache lines get really efficient.
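
The layouts described above can be sketched in C. This is only a minimal illustration, not code from the SETI client: the names `Vec4`, `Vec4Block`, `aos_to_blocks`, and `block_index` are hypothetical, and a block width of 4 floats is assumed so that each XXXX group would fill one 128-bit SSE register.

```c
#include <assert.h>
#include <stddef.h>

#define LANES 4 /* assumed: floats per 128-bit SSE register */

/* Array of Structure: X,Y,Z,W,X,Y,Z,W,... */
typedef struct {
    float x, y, z, w;
} Vec4;

/* Hybrid block: XXXX,YYYY,ZZZZ,WWWW holding LANES consecutive points. */
typedef struct {
    float x[LANES];
    float y[LANES];
    float z[LANES];
    float w[LANES];
} Vec4Block;

/* Repack n points from the AoS layout into hybrid blocks.
 * n must be a multiple of LANES; out must hold n / LANES blocks. */
void aos_to_blocks(const Vec4 *in, Vec4Block *out, size_t n)
{
    assert(n % LANES == 0);
    for (size_t i = 0; i < n; ++i) {
        Vec4Block *b = &out[i / LANES]; /* which block the point lands in */
        size_t lane = i % LANES;        /* position inside each XXXX group */
        b->x[lane] = in[i].x;
        b->y[lane] = in[i].y;
        b->z[lane] = in[i].z;
        b->w[lane] = in[i].w;
    }
}

/* Position, counted in floats from the start of the block array, of
 * component c (0 = X, 1 = Y, 2 = Z, 3 = W) of point i. */
size_t block_index(size_t i, size_t c)
{
    return (i / LANES) * (4 * LANES) + c * LANES + (i % LANES);
}
```

Each block is 16 floats (64 bytes), so walking the blocks in order touches memory as a single sequential stream while still presenting four Xs, four Ys, and so on as contiguous groups.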

                        What I am doing today in the SETI code is simply trying to apply Alex's idea to the FFT.
                        I'll need a few more weeks to get it done; it is a nice mind game, but it should dramatically increase the instructions per clock on the FFT side.
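
To illustrate why running four FFTs side by side can map well onto SIMD, here is a hedged sketch of a single radix-2 butterfly applied to four independent transforms at once. It is not the SETI FFT and not the poster's code: `Complex4` and `butterfly4` are hypothetical names, and the four lanes are written as a plain C loop (which a vectorizing compiler may turn into 128-bit SSE operations) rather than as hand-written intrinsics.

```c
#include <assert.h>

#define LANES 4 /* assumed: one lane per independent FFT */

/* The same element position taken from four independent FFTs,
 * with real and imaginary parts kept in separate groups. */
typedef struct {
    float re[LANES];
    float im[LANES];
} Complex4;

/* Radix-2 butterfly on four FFTs at once, in place:
 *   a' = a + w * b
 *   b' = a - w * b
 * The twiddle factor w = (w_re, w_im) is shared by all four lanes. */
void butterfly4(Complex4 *a, Complex4 *b, float w_re, float w_im)
{
    assert(a != b); /* the two inputs must be distinct elements */
    for (int l = 0; l < LANES; ++l) {
        /* t = w * b (complex multiply) */
        float t_re = w_re * b->re[l] - w_im * b->im[l];
        float t_im = w_re * b->im[l] + w_im * b->re[l];
        float a_re = a->re[l];
        float a_im = a->im[l];
        a->re[l] = a_re + t_re;
        a->im[l] = a_im + t_im;
        b->re[l] = a_re - t_re;
        b->im[l] = a_im - t_im;
    }
}
```

Because all four transforms are at the same butterfly position, they share one twiddle factor, so every lane executes identical arithmetic with no branching and no shuffling between lanes.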

                        Let's be clear, I am doing SETI for fun. I am a very happy/lucky man: my hobby and my job are very interlaced, I rarely have the feeling of working, and Intel did not ask me to do anything on SETI. Intel gives me access to the best toys I can dream of. Performance in general is a very interesting problem, and not only with computers; I work on it with cars as well.

                        Anybody who wants to help with the SIMDization of SETI is welcome :)

                        FrancoisP



                        msattler
                        Volunteer tester
                        Joined: Jul 9 00
                        Posts: 15614
                        Credit: 44,009,733
                        RAC: 115,094
                        United States
                        Message 397939 - Posted 15 Aug 2006 8:05:53 UTC

                          Very interesting stuff, although I must admit it is far beyond my level of knowledge. Hopefully Simon can make use of some of these ideas or coding schemes in some of his upcoming releases. I very much appreciate the fact that Simon's approach is to elicit input from other programmers, and work together with them for a common cause. He has already done some great work, but who knows what working with other like minded people could come up with? Thank You to Simon and all you others who are willing to share your expertise and work along with him!!
                          ____________
                          4 kitties on a Seti mission...Meeeeeeooowwwrrrrr!!!


                          The Genuine Kittyman..........accept no substitutes.



                          Profile sterling0466
                          Joined: Oct 5 00
                          Posts: 204
                          Credit: 742,621
                          RAC: 0
                          United States
                          Message 398050 - Posted 15 Aug 2006 12:37:24 UTC - in response to Message 397912.

                            http://setiathome.berkeley.edu/show_host_detail.php?hostid=2302665

                            It may be a single die, but four cores equals four cpus.



                            Thank you, even people with an Intel or an AMD Dual Core or Core 2 need to realize this fact...you really are running multiple CPUs, they are just packaged into one piece of hardware. (I must admit, my next home unit may just have one of those Dual Core or Core 2 AMD Chips...I just have to mow a heck of a lot of yards over the summer to afford the hardware!!!)
                            ____________

                            Saimek
                            Joined: Jan 25 00
                            Posts: 121
                            Credit: 454,423
                            RAC: 0
                            Poland
                            Message 398068 - Posted 15 Aug 2006 13:25:41 UTC

                              Wow.. Kentsfield on board =) I'm impressed =] I just sold my X2 3800+ and I'm getting a 6400 + Gigabyte DS3 =] hoping to get a 3.6 GHz 24/7 stable overclock... =]
                              ____________

                              msattler
                              Volunteer tester
                              Joined: Jul 9 00
                              Posts: 15614
                              Credit: 44,009,733
                              RAC: 115,094
                              United States
                              Message 398105 - Posted 15 Aug 2006 14:48:17 UTC - in response to Message 398050.

                                Last modified: 15 Aug 2006 14:53:56 UTC

                                http://setiathome.berkeley.edu/show_host_detail.php?hostid=2302665

                                It may be a single die, but four cores equals four cpus.



                                Thank you, even people with an Intel or an AMD Dual Core or Core 2 need to realize this fact...you really are running multiple CPUs, they are just packaged into one piece of hardware. (I must admit, my next home unit may just have one of those Dual Core or Core 2 AMD Chips...I just have to mow a heck of a lot of yards over the summer to afford the hardware!!!)


                                This is not a mystery, Seti knows how many cpus you have. Just click on the computer id and look at the computer summary screen. It reports my Conroe as having 2 cpus, which it does, and Francois' cpu reports 4 cpus, which it has. It just shows up under one computer (host) id. But it is just like having multiple computers installed in one piece of hardware. Kind of like having a small Seti crunching team in one computer.
                                ____________
                                4 kitties on a Seti mission...Meeeeeeooowwwrrrrr!!!


                                The Genuine Kittyman..........accept no substitutes.



                                msattler
                                Volunteer tester
                                Joined: Jul 9 00
                                Posts: 15614
                                Credit: 44,009,733
                                RAC: 115,094
                                United States
                                Message 398109 - Posted 15 Aug 2006 14:53:23 UTC

                                  BTW, would Francois' processor be considered a core 2 quad? I didn't think they had been released yet. Heck, you can't hardly even buy an E6600 or E6700 off the shelf yet. Or do his connections to Intel get him an engineering sample or such?
                                  ____________
                                  4 kitties on a Seti mission...Meeeeeeooowwwrrrrr!!!


                                  The Genuine Kittyman..........accept no substitutes.



                                  Paydirt
                                  Joined: Sep 17 00
                                  Posts: 53
                                  Credit: 37,938
                                  RAC: 0
                                  United States
                                  Message 398124 - Posted 15 Aug 2006 15:39:30 UTC

                                    I think he works for Intel?

                                    I'm surprised by some of the responses I've seen to this thread. People are so stuck in needing to be right or having to see things one specific way that they cannot get excited about something that is new and cool in computing. So what if it is 4 CPUs? Who cares? Are we trying to prove a point? Because Francois isn't trying to prove one.

                                    It's great for SETI and awesome for the volunteer grid computing community! WE ARE ALL IN THIS TOGETHER! Apple, AMD, Intel, IBM, etc. Whatever it takes!

                                    msattler
                                    Volunteer tester
                                    Joined: Jul 9 00
                                    Posts: 15614
                                    Credit: 44,009,733
                                    RAC: 115,094
                                    United States
                                    Message 398187 - Posted 15 Aug 2006 16:50:09 UTC - in response to Message 398124.

                                      Last modified: 15 Aug 2006 16:53:09 UTC

                                      I think he works for Intel?

                                      I'm surprised by some of the responses I've seen to this thread. People are so stuck in needing to be right or having to see things one specific way that they cannot get excited about something that is new and cool in computing. So what if it is 4 CPUs? Who cares? Are we trying to prove a point? Because Francois isn't trying to prove one.

                                      It's great for SETI and awesome for the volunteer grid computing community! WE ARE ALL IN THIS TOGETHER! Apple, AMD, Intel, IBM, etc. Whatever it takes!


                                      Sure Francois is trying to prove a point! He's trying to prove that Intel finally has a butt-kicking architecture available with the new Core 2 cpus. After all, he works for Intel, and I am sure he is excited about what is new and cool in computing, 'cuz Intel is it. I'm sure excited about it, my new X6800 is doing things my AMD FX60 can't even touch!
                                      I think what Francois is doing is absolutely fantastic!! Even if the rest of us cannot afford some of the grand toys that he has access to directly from Intel, what he is doing scales down to the processors that are coming on the market in a price range most of us can afford.
                                      He has not tried to hide the fact that he works for Intel, and he has already said that Intel did not instruct him to work on Seti, I truly believe he is doing this as a very excited hobbyist. And the manner in which he is doing it is beyond reproach...being openly willing to share optimized code.
                                      As far as I am concerned, Francois can beat the Intel drum all he wants. What could be better than an Intel insider who is willing to work with Simon on his optimized apps? This is win-win for everybody!
                                      ____________
                                      4 kitties on a Seti mission...Meeeeeeooowwwrrrrr!!!


                                      The Genuine Kittyman..........accept no substitutes.



                                      Bart Barenbrug
                                      Joined: Jul 7 04
                                      Posts: 52
                                      Credit: 337,401
                                      RAC: 0
                                      Netherlands
                                      Message 398766 - Posted 15 Aug 2006 19:28:54 UTC

                                        Indeed. Parallel is the way to go (we BOINC users should know a thing or two about that), and working towards using this kind of parallelism effectively is a great step forward. One day, when we're all using dual-processor machines, with each processor being quad-core, and each of those cores hyperthreaded, we'll still be benefiting from this work (I just don't want to be the one to write the task balancing and task migration code for such a beast, with all the different penalties of migrating a task between hyperthreads on the same core, between cores on the same processor, or between processors etc. *g*).
                                        ____________

                                        Profile KWSN - Chicken of Angnor
                                        Volunteer developer
                                        Volunteer tester
                                        Joined: Jul 9 99
                                        Posts: 1199
                                        Credit: 5,756,696
                                        RAC: 947
                                        Austria
                                        Message 398788 - Posted 15 Aug 2006 19:46:10 UTC - in response to Message 397935.

                                          [...]
                                          If you look at the FFT using 4 vectors in parallel, you have to try to code your FFT in a way that minimizes the penalties: branching, memory footprint, and in the case of Core, you want to use as many 128-bit SSEx instructions as you can.
                                          To use SIMD efficiently, you want to move your data from Array of Structure to Structure of Structure.

                                          For example, in 3D, it is very common to store X,Y,Z,W in memory like this:
                                          X,Y,Z,W,X,Y,Z,W,X,Y,Z,W,X,Y,Z,W,X,Y,Z,W... (Array of Structure)

                                          The natural way to store your SIMD data is
                                          XXXXXXXXXXX....
                                          YYYYYYYYYYYY...
                                          ZZZZZZZZ.....
                                          WWWWWWWWWWW... (Structure of Array)

                                          But this has the bad side effect of opening more memory streams, and most modern processors allow only 4 or 8 streams to be open at the same time.
                                          One of my co-workers, AlexK, came up with this data structure in 1998, called Structure of Structure:

                                          XXXX,YYYY,ZZZZ,WWWW,XXXX,YYYY,ZZZZ,WWWW,XXXX,YYYY,ZZZZ,WWWW...

                                          Like this, you access memory with only one or 2 streams, your data locality is tight, and your cache lines get really efficient.
                                          [...]
                                          Anybody who wants to help with the SIMDization of SETI is welcome :)

                                          FrancoisP


                                          Salut Francois,

                                          do you believe this could also be adapted for pre-Core 2 CPUs, with 2 FFTs in parallel instead of 4? I'm pretty sure that current code does not specifically do this, as Ben Herndon pointed out to me - you may be interested in his (and Dr. Korpela's) Sourceforge project (regrettably, it's not current code, but has lots of inline assembly as well as some specific code to feed execution units in parallel with minimal penalties).

                                          As others have posted here, I'm very much in favour of getting all people working on optimizations in contact with each other (and hence, pooling resources towards a common goal). Regrettably, I still cannot code C/C++ or indeed know assembly, though those will be skills to acquire in the future.

                                          If you would like, you could head over to my Seti@Home site and register - I would be glad to give you access to the pre-release application board and have your input.

                                          There is already another Intel employee registered - Intel being quite a large company though, you probably don't know each other - his name is Greg Eckert, and he works as Instructor training manager in the Intel Software College.

                                          The more, the merrier ;o)

                                          Regards,
                                          Simon.
                                          ____________
                                          Donate to SETI@Home via PayPal!

                                          Optimized SETI@Home apps + Information

                                          OzzFan
                                          Forum moderator
                                          Joined: Apr 9 02
                                          Posts: 8301
                                          Credit: 10,421,276
                                          RAC: 7,329
                                          United States
                                          Message 398868 - Posted 15 Aug 2006 22:03:26 UTC - in response to Message 398187.

                                            And the manner in which he is doing it is beyond reproach...being openly willing to share optimized code.


                                            I don't mean to be the resident nitpicker, but as I was reading this, I started to get the wrong impression. "Beyond reproach" is a bad thing, as defined by a dictionary:

                                            Noun: reproach
                                            1. A mild rebuke or criticism; "Words of reproach"
                                            2. Disgrace or shame; "He brought reproach upon his family"

                                            Verb: reproach
                                            1. Express criticism towards; "The President reproached the General for his irresponsible behavior"


                                            That isn't what you meant, is it?
                                            ____________
                                            BOINC FAQ Service
                                            BOINC & Optimized SETI download repository

                                            Profile Geek@Play
                                            Volunteer tester
                                            Joined: Jul 31 01
                                            Posts: 1815
                                            Credit: 25,513,942
                                            RAC: 84,403
                                            United States
                                            Message 398872 - Posted 15 Aug 2006 22:09:04 UTC - in response to Message 398868.

                                              Last modified: 15 Aug 2006 22:09:56 UTC

                                              And the manner in which he is doing it is beyond reproach...being openly willing to share optimized code.


                                              I don't mean to be the resident nitpicker, but as I was reading this, I started to get the wrong impression. "Beyond reproach" is a bad thing, as defined by a dictionary:

                                              Noun: reproach
                                              1. A mild rebuke or criticism; "Words of reproach"
                                              2. Disgrace or shame; "He brought reproach upon his family"

                                              Verb: reproach
                                              1. Express criticism towards; "The President reproached the General for his irresponsible behavior"


                                              That isn't what you meant, is it?


                                              The word "beyond" modifies the meaning. Now he cannot be reproached. Now it's a compliment.



                                              ____________
                                              Boinc....Boinc....Boinc....Boinc

                                              OzzFan
                                              Forum moderator
                                              Joined: Apr 9 02
                                              Posts: 8301
                                              Credit: 10,421,276
                                              RAC: 7,329
                                              United States
                                              Message 398952 - Posted 15 Aug 2006 23:35:47 UTC - in response to Message 398872.

                                                The word "beyond" modifies the meaning. Now he cannot be reproached. Now it's a compliment.


                                                Interesting. I took "beyond" to modify it differently, such as "beyond disgrace" or "beyond criticism", like going "beyond the depths of hell". Like a criticism worse than disgrace or contempt.
                                                ____________
                                                BOINC FAQ Service
                                                BOINC & Optimized SETI download repository

                                                archae86
                                                Joined: Aug 31 99
                                                Posts: 694
                                                Credit: 860,941
                                                RAC: 501
                                                United States
                                                Message 399023 - Posted 16 Aug 2006 2:06:01 UTC - in response to Message 398952.

                                                  Last modified: 16 Aug 2006 2:06:47 UTC

                                                  The word "beyond" modifies the meaning. Now he cannot be reproached. Now it's a compliment.


                                                  Interesting. I took "beyond" to modify it differently, such as "beyond disgrace" or "beyond criticism", like going "beyond the depths of hell". Like a criticism worse than disgrace or contempt.

                                                  Your logic is good; your awareness of common usage is less good.

                                                  It is a common enough idiom to make it into the American Heritage Dictionary, thusly:

                                                  IDIOM: beyond reproach So good as to preclude any possibility of criticism.

                                                  ____________

                                                  Profile KWSN - Chicken of Angnor
                                                  Volunteer developer
                                                  Volunteer tester
                                                  Joined: Jul 9 99
                                                  Posts: 1199
                                                  Credit: 5,756,696
                                                  RAC: 947
                                                  Austria
                                                  Message 399033 - Posted 16 Aug 2006 2:15:00 UTC

                                                    Please, let's not digress ;o)

                                                    Constructive intellectual exchange is never a bad thing, though grasp of language or lack thereof kind of wasn't the original topic.

                                                    Anyway, I'd be interested to know whether I'm correct in the assumption that fundamentally, quad-core Core2 chips are feature-identical to current dual-core models.

                                                    I'm not sure whether this information is still under NDA or not, so of course I understand if you cannot answer, Francois ;o)

                                                    Regards,
                                                    Simon.
                                                    ____________
                                                    Donate to SETI@Home via PayPal!

                                                    Optimized SETI@Home apps + Information

                                                    Alex Kan
                                                    Volunteer developer
                                                    Joined: Dec 4 03
                                                    Posts: 127
                                                    Credit: 29,269
                                                    RAC: 0
                                                    United States
                                                    Message 399104 - Posted 16 Aug 2006 3:04:45 UTC - in response to Message 397935.

                                                      Last modified: 16 Aug 2006 3:10:14 UTC

                                                      If you look at the FFT using 4 vectors in parallel, you have to code your FFT in a way that minimizes the penalties: branching, memory footprint, and, in the case of Core, you want to use as many 128-bit SSEx instructions as you can.

                                                      <snip>

                                                      One of my co-workers, AlexK, came up with this data structure in 1998, called Structure of Structure:

                                                      XXXX,YYYY,ZZZZ,WWWW,XXXX,YYYY,ZZZZ,WWWW,XXXX,YYYY,ZZZZ,WWWW...

                                                      This way, you access memory with only one or two streams, your data locality is tight, and your cache lines are used really efficiently.

                                                      What I am doing today in the SETI code is simply trying to apply Alex's idea to the FFT.
                                                      I'll need a few more weeks to get it done. It is a nice mind game, and it should dramatically increase the instructions per clock on the FFT side.
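For readers following the layout discussion, here is a minimal sketch of the blocked XXXX,YYYY,ZZZZ,WWWW ordering quoted above. It only shows the index reshuffling, not any SSE code. The function name `to_blocked_layout` and the choice of four lanes per block are assumptions made for illustration; they do not come from Francois' or Alex's actual code.

```python
def to_blocked_layout(points, lanes=4):
    """Regroup [(x, y, z, w), ...] into XXXX,YYYY,ZZZZ,WWWW blocks of `lanes` points."""
    if len(points) % lanes != 0:
        raise ValueError("point count must be a multiple of the lane count")
    flat = []
    for start in range(0, len(points), lanes):
        block = points[start:start + lanes]
        # Emit all X values of the block, then all Y, then Z, then W.
        for component in range(4):
            flat.extend(point[component] for point in block)
    return flat


# Hypothetical sample data: eight points with labelled components.
points = [(f"x{i}", f"y{i}", f"z{i}", f"w{i}") for i in range(8)]
layout = to_blocked_layout(points)
```

Each run of four like components is what a single 128-bit register of four floats would hold, which is presumably why the layout keeps loads contiguous.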

                                                      It sounds like you're planning to write your own FFT implementation. Are you using some other implementation as a base, or are you starting from first principles? I'd always assumed that IPP's FFT performance was already pretty good, and keeping your data in split-complex format already eliminates most of your data shuffling, while still only using two memory streams.

                                                      Also, what effects does this have with regard to the amount of memory touched per FFT (or group of FFTs)? If you're doing four 128K complex-to-complex in-place FFTs simultaneously, you're touching 4 MB per FFT per core, which is already pushing the limits of L2 cache.
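The 4 MB figure above can be checked with a few lines of arithmetic. This sketch assumes single-precision samples (4 bytes each for the real and imaginary parts) and counts only the in-place data, not twiddle tables or scratch buffers; the helper name `fft_batch_bytes` is made up for the example.

```python
def fft_batch_bytes(points, batch, bytes_per_float=4):
    """Bytes touched by `batch` in-place complex FFTs of `points` samples each."""
    bytes_per_sample = 2 * bytes_per_float  # one real + one imaginary part
    return points * bytes_per_sample * batch


one_fft = fft_batch_bytes(128 * 1024, batch=1)
four_ffts = fft_batch_bytes(128 * 1024, batch=4)
print(one_fft // 2**20, "MiB for one FFT,", four_ffts // 2**20, "MiB for four")
```

Under those assumptions one 128K transform touches 1 MiB, and four done together touch 4 MiB.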

                                                      Randy Hancock
                                                      Joined: Aug 10 06
                                                      Posts: 169
                                                      Credit: 220,579
                                                      RAC: 0
                                                      United States
                                                      Message 399110 - Posted 16 Aug 2006 3:13:00 UTC

                                                        I run 4 CPUs myself and 16 GB of RAM. So far it's been really good: I run SETI 24/7 and none of my other programs have slowed down while SETI is running.

                                                        Profile Francois Piednoel
                                                        Joined: Jun 14 00
                                                        Posts: 898
                                                        Credit: 5,969,320
                                                        RAC: 0
                                                        United States
                                                        Message 399160 - Posted 16 Aug 2006 4:47:03 UTC - in response to Message 399110.

                                                          Last modified: 16 Aug 2006 4:47:56 UTC

                                                          I run 4 CPUs myself and 16 GB of RAM. So far it's been really good: I run SETI 24/7 and none of my other programs have slowed down while SETI is running.


                                                          Based on your rendering time and your log, you are not using optimized code. Please go to Simon's web site, get an SSE2 or SSE3 binary, and install it.
                                                          You have to drop the XML file and his faster binary into the SETI project directory.

                                                          good luck ;)
                                                          FrancoisP

                                                          msattler
                                                          Volunteer tester
                                                          Joined: Jul 9 00
                                                          Posts: 15614
                                                          Credit: 44,009,733
                                                          RAC: 115,094
                                                          United States
                                                          Message 399178 - Posted 16 Aug 2006 7:09:31 UTC - in response to Message 398868.

                                                            Last modified: 16 Aug 2006 7:17:04 UTC

                                                            And the manner in which he is doing it is beyond reproach...being openly willing to share optimized code.


                                                            I don't mean to be the resident nitpicker, but as I was reading this, I started to get the wrong impression. "Beyond reproach" is a bad thing, as defined by a dictionary:

                                                            Noun: reproach
                                                            1. A mild rebuke or criticism; "Words of reproach"
                                                            2. Disgrace or shame; "He brought reproach upon his family"

                                                            Verb: reproach
                                                            1. Express criticism towards; "The President reproached the General for his irresponsible behavior"


                                                            That isn't what you meant, is it?


                                                            Oh, lord no! I posted that shortly before I left for work today, just got home and started catching up on the forums.
                                                            What I meant to say, I think, was "above reproach", meaning that I do not believe his actions can, could, or should be criticized. I hope that the overall tone of my post conveyed the proper sentiment.
                                                            And no, I am not at all offended by you questioning my wording. I am certainly not an English major.
                                                            EDIT....and after reading the rest of the posts on the subject, maybe my usage was OK after all. I think in the context of the whole post at least, my intended meaning came through. But I would rather this post continue with the open discussions of Alex, Simon, and Francois. Or any other posts concerning the number crunching optimization they are working on.
                                                            ____________
                                                            4 kitties on a Seti mission...Meeeeeeooowwwrrrrr!!!


                                                            The Genuine Kittyman..........accept no substitutes.



                                                            Ned Ludd
                                                            Volunteer tester
                                                            Joined: Apr 3 99
                                                            Posts: 7692
                                                            Credit: 264,326
                                                            RAC: 338
                                                            United States
                                                            Message 399398 - Posted 16 Aug 2006 16:35:54 UTC - in response to Message 397914.

                                                              Last modified: 16 Aug 2006 16:36:15 UTC

                                                              http://setiathome.berkeley.edu/show_host_detail.php?hostid=2302665

                                                              It may be a single die, but four cores equals four cpus.


                                                              ok :) if you want to count like this, it is ok.

                                                              Francois

                                                              Back in the day, a single CPU was hundreds of vacuum tubes or a few thousand transistors.

                                                              Then we got integrated circuits, and a single (more capable) CPU was thousands of chips spread across dozens of circuit boards....

                                                              ... then we could squeeze all of that onto one board.

                                                              Then the 4004 came out, and we had all of that on one chip.

                                                              Then came dual-core chips, which behave exactly as if you had two chips on the same motherboard.

                                                              Quad-core chips are no different.

                                                              It seems intuitively obvious that the CPU is not packaging, and I see no valid engineering reason to call a chip with four CPUs anything other than four CPUs.
                                                              ____________

                                                              Josef W. Segur
                                                              Volunteer developer
                                                              Volunteer tester
                                                              Joined: Oct 30 99
                                                              Posts: 2244
                                                              Credit: 404,655
                                                              RAC: 450
                                                              United States
                                                              Message 399409 - Posted 16 Aug 2006 16:45:17 UTC - in response to Message 399398.

                                                                http://setiathome.berkeley.edu/show_host_detail.php?hostid=2302665

                                                                It may be a single die, but four cores equals four cpus.


                                                                ok :) if you want to count like this, it is ok.

                                                                Francois

                                                                Back in the day, a single CPU was hundreds of vacuum tubes or a few thousand transistors.

                                                                Then we got integrated circuits, and a single (more capable) CPU was thousands of chips spread across dozens of circuit boards....

                                                                ... then we could squeeze all of that onto one board.

                                                                Then the 4004 came out, and we had all of that on one chip.


                                                                Somewhat later, many boards had a 386 for integer calculations and a 387 FPU for floating point.

                                                                Then the FPU was integrated, then a second FPU was integrated.

                                                                How do we count these?

                                                                Ned Ludd
                                                                Volunteer tester
                                                                Joined: Apr 3 99
                                                                Posts: 7692
                                                                Credit: 264,326
                                                                RAC: 338
                                                                United States
                                                                Message 399443 - Posted 16 Aug 2006 17:20:59 UTC - in response to Message 399409.

                                                                  http://setiathome.berkeley.edu/show_host_detail.php?hostid=2302665

                                                                  It may be a single die, but four cores equals four cpus.


                                                                  ok :) if you want to count like this, it is ok.

                                                                  Francois

                                                                  Back in the day, a single CPU was hundreds of vacuum tubes or a few thousand transistors.

                                                                  Then we got integrated circuits, and a single (more capable) CPU was thousands of chips spread across dozens of circuit boards....

                                                                  ... then we could squeeze all of that onto one board.

                                                                  Then the 4004 came out, and we had all of that on one chip.


                                                                  Somewhat later, many boards had a 386 for integer calculations and a 387 FPU for floating point.

                                                                  Then the FPU was integrated, then a second FPU was integrated.

                                                                  How do we count these?


                                                                  As one processor. An 80387 by itself is practically useless.

                                                                  Packaging does not count (unless you are in marketing, then it's the only thing).
                                                                  ____________

                                                                  OzzFan
                                                                  Forum moderator
                                                                  Joined: Apr 9 02
                                                                  Posts: 8301
                                                                  Credit: 10,421,276
                                                                  RAC: 7,329
                                                                  United States
                                                                  Message 399550 - Posted 16 Aug 2006 20:21:07 UTC - in response to Message 399178.

                                                                    EDIT....and after reading the rest of the posts on the subject, maybe my usage was OK after all. I think in the context of the whole post at least, my intended meaning came through. But I would rather this post continue with the open discussions of Alex, Simon, and Francois. Or any other posts concerning the number crunching optimization they are working on.


                                                                    Fair enough. Apparently I was wrong, as I was not aware of this "common idiom" (though you'd think if it were so common, I would have heard of it, but I digress). My apologies for hijacking the thread on this matter; 'twas not my intention. This will be the last I speak on this matter.

                                                                    Please continue with the original topic.
                                                                    ____________
                                                                    BOINC FAQ Service
                                                                    BOINC & Optimized SETI download repository

                                                                    OzzFan
                                                                    Forum moderator
                                                                    Joined: Apr 9 02
                                                                    Posts: 8301
                                                                    Credit: 10,421,276
                                                                    RAC: 7,329
                                                                    United States
                                                                    Message 399561 - Posted 16 Aug 2006 20:26:10 UTC - in response to Message 399409.

                                                                      Somewhat later, many boards had a 386 for integer calculations and a 387 FPU for floating point.

                                                                      Then the FPU was integrated, then a second FPU was integrated.

                                                                      How do we count these?


                                                                      I'd have to agree with Ned here. An 80387 is actually a co-processor, not a central processing unit, meaning it requires a main processor to operate.

                                                                      Thus, multicore processors have multiple CPUs (each a main processor capable of independent calculation without requiring a host processor) in one package. Essentially, a dual-core processor has two CPUs in one package. You could (theoretically) disable one through software and still operate with the other CPU.
                                                                      ____________
                                                                      BOINC FAQ Service
                                                                      BOINC & Optimized SETI download repository

                                                                      Profile Benher
                                                                      Volunteer developer
                                                                      Volunteer tester
                                                                      Joined: Jul 25 99
                                                                      Posts: 517
                                                                      Credit: 465,152
                                                                      RAC: 0
                                                                      United States
                                                                      Message 399616 - Posted 16 Aug 2006 21:36:42 UTC - in response to Message 399104.

                                                                        Last modified: 16 Aug 2006 21:39:20 UTC

                                                                        It sounds like you're planning to write your own FFT implementation. Are you using some other implementation as a base, or are you starting from first principles? I'd always assumed that IPP's FFT performance was already pretty good, and keeping your data in split-complex format already eliminates most of your data shuffling, while still only using two memory streams.

                                                                        Also, what effects does this have with regard to the amount of memory touched per FFT (or group of FFTs)? If you're doing four 128K complex-to-complex in-place FFTs simultaneously, you're touching 4 MB per FFT per core, which is already pushing the limits of L2 cache.


                                                                        Hi Alex (and Francois),

                                                                        Back in the day...pre-FFTW3...I converted Ooura's FFT to SIMD (SSE only). It's on the SourceForge pages.

                                                                        Didn't get around to benchmarking it against FFTW3, but I think the Ooura SIMD was about the same speed as Intel's (at that time).

                                                                        The benchmark pages on www.fftw.org showed that FFTW beat everyone's FFTs except Intel's IPP in speed.

                                                                        The main problems with FFT at the larger sizes (32K, 64K, 128K...) were memory and cache access times...although with HyperTransport and DDR2 that may no longer be the case. Reorganizing in Francois' way avoids a lot of twiddling...and the SSE3 opcodes for sideways adds and subs should also speed things up.

                                                                        But the biggest boost, I believe, would be some method to compute passes over L1- or L2-cache-sized blocks of data. These blocks would have to include all memory used for the computation, somehow localized.
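One well-known way to get passes over cache-sized blocks is the "four-step" decomposition, which splits a length n1*n2 transform into n2 small transforms of length n1, a twiddle multiply, and n1 small transforms of length n2. The sketch below is only an illustration of that idea and is not necessarily what Benher had in mind. The small transforms are naive DFTs for clarity, and the names `dft` and `four_step_dft` are invented for the example.

```python
import cmath


def dft(x):
    """Naive O(n^2) DFT, used here as the small in-cache transform."""
    n = len(x)
    return [
        sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
        for k in range(n)
    ]


def four_step_dft(x, n1, n2):
    """DFT of length n1*n2 built from small transforms of length n1 and n2."""
    n = n1 * n2
    if len(x) != n:
        raise ValueError("input length must equal n1 * n2")
    # Step 1: n2 independent length-n1 transforms over strided sub-sequences.
    cols = [dft(x[c::n2]) for c in range(n2)]
    # Step 2: multiply by the twiddle factors exp(-2*pi*i * c * k1 / n).
    for c in range(n2):
        for k1 in range(n1):
            cols[c][k1] *= cmath.exp(-2j * cmath.pi * c * k1 / n)
    # Step 3: n1 independent length-n2 transforms, scattered to output order.
    out = [0j] * n
    for k1 in range(n1):
        row = dft([cols[c][k1] for c in range(n2)])
        for k2 in range(n2):
            out[k1 + n1 * k2] = row[k2]
    return out


# Hypothetical sample signal for checking against the direct transform.
signal = [complex(i, -0.5 * i) for i in range(12)]
blocked = four_step_dft(signal, 3, 4)
direct = dft(signal)
```

In a real implementation the sub-transform sizes would be chosen so that each small transform, plus its twiddles, fits in L1 or L2.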

                                                                        Just my 2c

                                                                        P.S.: Hey Francois...you work at Intel...obviously you are a coder...probably a coder at Intel also. Maybe you can get them to change the IPP libraries' CPU identification code to remove that check for "GenuineIntel" and just check the flags for SSE, SSE2, and SSE3 on any CPU brand. ;)
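The request in the P.S. amounts to dispatching on feature flags rather than on the vendor string. Here is a minimal sketch of that policy. The function `pick_code_path`, the flag names, and the "generic" fallback are all hypothetical, and the sketch says nothing about how IPP actually selects its code paths.

```python
def pick_code_path(vendor, flags):
    """Choose the fastest supported path from feature flags alone.

    `vendor` is accepted but deliberately ignored, so any CPU brand
    reporting a flag gets the matching path.
    """
    for name in ("sse3", "sse2", "sse"):  # best path first
        if name in flags:
            return name
    return "generic"


# Hypothetical flag sets for two CPUs of different brands.
amd_path = pick_code_path("AuthenticAMD", {"sse", "sse2", "sse3"})
intel_path = pick_code_path("GenuineIntel", {"sse", "sse2"})
```

With this policy the first CPU gets the SSE3 path and the second gets SSE2, regardless of who made them.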


                                                                        Copyright © 2009 University of California