Message boards :
Number crunching :
Are you ready for the next generation CPU?
Benher Joined: 25 Jul 99 Posts: 517 Credit: 465,152 RAC: 0
It sounds like you're planning to write your own FFT implementation. Are you using some other implementation as a base, or are you starting from first principles? I'd always assumed that IPP's FFT performance was already pretty good, and keeping your data in split-complex format already eliminates most of your data shuffling, while still only using two memory streams.

Hi Alex (and Francois),

Back in the day (pre-FFTW3), I converted Ooura's FFT to SIMD (SSE only). It's on the SourceForge pages. I didn't get around to benchmarking it against FFTW3, but I think the Ooura SIMD version was about the same speed as Intel's at that time. The benchmark pages on www.fftw.org showed that FFTW beat everyone's FFTs in speed except Intel's IPP. The main problems with FFTs at the larger sizes (32K, 64K, 128K...) were memory and cache access times, although with HyperTransport and DDR2 that may no longer be the case. Reorganizing the data Francois's way avoids a lot of twiddling, and the SSE3 opcodes for horizontal ("sideways") adds and subtracts should also speed things up. But the biggest boost, I believe, would come from some method of computing passes over L1- or L2-cache-sized blocks of data. Those blocks would have to include all memory used by the computation, localized somehow. Just my 2c.

P.S.: Hey Francois...you work at Intel, and obviously you are a coder...probably a coder at Intel too. Maybe you can get them to change the IPP libraries' CPU identification code to remove that check for "GenuineIntel" and just check the SSE, SSE2, and SSE3 flags on any CPU brand. ;)
OzzFan Joined: 9 Apr 02 Posts: 15687 Credit: 84,761,841 RAC: 62
Somewhat later, many boards had a 386 for integer calculations and a 387 FPU for floating point.

I'd have to agree with Ned here. An 80387 is actually a coprocessor, not a central processing unit, meaning it requires a main processor to operate. Multicore processors, on the other hand, have multiple CPUs (each a main processor capable of independent calculation without requiring a host processor) in one package. Essentially, a dual-core processor is two CPUs in one package. You could (theoretically) disable one through software and still be able to operate with the other CPU.
OzzFan Joined: 9 Apr 02 Posts: 15687 Credit: 84,761,841 RAC: 62
EDIT....and after reading the rest of the posts on the subject, maybe my usage was OK after all. I think in the context of the whole post at least, my intended meaning came through. But I would rather this post continue with the open discussions of Alex, Simon, and Francois. Or any other posts concerning the number crunching optimization they are working on.

Fair enough. Apparently I was wrong, as I was not aware of this "common idiom" (though you'd think if it were so common, I would have heard of it, but I digress). My apologies for hijacking the thread on this matter; 'twas not my intention. This is the last I'll say on the matter. Please continue with the original topic.
|
1mp0£173 Joined: 3 Apr 99 Posts: 8423 Credit: 356,897 RAC: 0
http://setiathome.berkeley.edu/show_host_detail.php?hostid=2302665

As one processor. An 80387 by itself is practically useless. Packaging does not count (unless you are in marketing, in which case it's the only thing).
|
Josef W. Segur Joined: 30 Oct 99 Posts: 4504 Credit: 1,414,761 RAC: 0
http://setiathome.berkeley.edu/show_host_detail.php?hostid=2302665 Somewhat later, many boards had a 386 for integer calculations and a 387 FPU for floating point. Then the FPU was integrated, then a second FPU was integrated. How do we count these?
|
1mp0£173 Joined: 3 Apr 99 Posts: 8423 Credit: 356,897 RAC: 0
http://setiathome.berkeley.edu/show_host_detail.php?hostid=2302665

Back in the day, a single CPU was hundreds of vacuum tubes or a few thousand transistors. Then we got integrated circuits, and a single (more capable) CPU was thousands of chips spread across dozens of circuit boards... then we could squeeze all of that onto one board. Then the 4004 came out, and we had all of that on one chip. Then came dual-core chips, which behave exactly as if you had two chips on the same motherboard. Quad-core chips are no different. It seems intuitively obvious that the CPU is not the packaging, and I see no valid engineering reason to call a chip with four CPUs anything other than four CPUs.
kittyman Joined: 9 Jul 00 Posts: 50494 Credit: 1,018,363,574 RAC: 2,276
And the manner in which he is doing it is beyond reproach...being openly willing to share optimized code.

Oh, lord no! I posted that shortly before I left for work today, and just got home and started catching up on the forums. What I meant to say, I think, was "above reproach", meaning that I do not believe his actions can, could, or should be criticized. I hope that the overall tone of my post conveyed the proper sentiment. And no, I am not at all offended by you questioning my wording. I am certainly not an English major. EDIT....and after reading the rest of the posts on the subject, maybe my usage was OK after all. I think in the context of the whole post at least, my intended meaning came through. But I would rather this post continue with the open discussions of Alex, Simon, and Francois. Or any other posts concerning the number crunching optimization they are working on. "Learn from yesterday. Live for today. Hope for tomorrow." Albert Einstein "With cats." kittyman
|
Francois Piednoel Joined: 14 Jun 00 Posts: 898 Credit: 5,969,361 RAC: 0
I run 4 CPUs myself and 16 GB of RAM; so far it's been really good. I run SETI 24/7, and none of my other programs have slowed down while SETI is running.

Based on your rendering time and your log, you are not using optimized code. Please go to Simon's web site, get an SSE2 or SSE3 binary, and install it: you have to drop the XML file and his faster binary into the SETI project directory. Good luck ;) FrancoisP
|
Randy Hancock Joined: 10 Aug 06 Posts: 169 Credit: 220,579 RAC: 0
I run 4 CPUs myself and 16 GB of RAM; so far it's been really good. I run SETI 24/7, and none of my other programs have slowed down while SETI is running.
|
Alex Kan Joined: 4 Dec 03 Posts: 127 Credit: 29,269 RAC: 0
If you look at the FFT using 4 vectors in parallel, you have to try to code your FFT in a way that minimizes the penalties (branching, memory footprint), and in the case of Core, you want to use as many 128-bit SSEx instructions as you can.

It sounds like you're planning to write your own FFT implementation. Are you using some other implementation as a base, or are you starting from first principles? I'd always assumed that IPP's FFT performance was already pretty good, and keeping your data in split-complex format already eliminates most of your data shuffling, while still only using two memory streams. Also, what effect does this have on the amount of memory touched per FFT (or group of FFTs)? If you're doing four 128K complex-to-complex in-place single-precision FFTs simultaneously, you're touching 4 MB per group of four per core, which is already pushing the limits of L2 cache.
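The 4 MB figure follows from simple arithmetic, assuming single-precision data: a 128K-point complex transform holds 131,072 points of two 4-byte floats each, i.e. 1 MiB, so four of them are 4 MiB. The helpers below are hypothetical and count only the sample arrays (no twiddle tables or scratch buffers); the cache size is a parameter you would fill in for your own CPU.

```c
#include <stddef.h>

/* Hypothetical helper: bytes of sample data touched by `batch` in-place
   complex-to-complex FFTs of `n` points, each complex point being two reals
   of `real_size` bytes. Split-complex and interleaved layouts hold the same
   number of bytes. Twiddle tables and scratch buffers are NOT counted. */
size_t fft_batch_bytes(size_t n, size_t batch, size_t real_size)
{
    return n * 2 * real_size * batch;
}

/* Hypothetical helper: how many such transforms fit in `cache_bytes`
   of cache, ignoring everything else that competes for that cache. */
size_t max_batch_in_cache(size_t n, size_t real_size, size_t cache_bytes)
{
    return cache_bytes / fft_batch_bytes(n, 1, real_size);
}
```

For example, `fft_batch_bytes(128 * 1024, 4, sizeof(float))` is 4,194,304 bytes (4 MiB); in double precision the same batch would be 8 MiB, which is why the batch width and the precision both matter for cache residency.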
KWSN - Chicken of Angnor Joined: 9 Jul 99 Posts: 1199 Credit: 6,615,780 RAC: 0
Please, let's not digress ;o) Constructive intellectual exchange is never a bad thing, though grasp of language, or lack thereof, wasn't really the original topic. Anyway, I'd be interested to know whether I'm correct in assuming that, fundamentally, quad-core Core 2 chips are feature-identical to current dual-core models. I'm not sure whether this information is still under NDA, so of course I understand if you cannot answer, Francois ;o) Regards, Simon. Donate to SETI@Home via PayPal! Optimized SETI@Home apps + Information
|
archae86 Joined: 31 Aug 99 Posts: 909 Credit: 1,582,816 RAC: 0
The word "beyond" modifies the meaning. Now he cannot be reproached. Now it's a compliment.

Your logic is good; your awareness of common usage is less good. It is a common enough idiom to make it into the American Heritage Dictionary, thusly: IDIOM: beyond reproach: So good as to preclude any possibility of criticism.
OzzFan Joined: 9 Apr 02 Posts: 15687 Credit: 84,761,841 RAC: 62
The word "beyond" modifies the meaning. Now he cannot be reproached. Now it's a compliment.

Interesting. I took "beyond" to modify it differently, as in "beyond disgrace" or "beyond criticism", like going "beyond the depths of hell": a criticism worse than disgrace or contempt.
Geek@Play Joined: 31 Jul 01 Posts: 2467 Credit: 86,146,931 RAC: 0
And the manner in which he is doing it is beyond reproach...being openly willing to share optimized code.

The word "beyond" modifies the meaning. Now he cannot be reproached. Now it's a compliment. Boinc....Boinc....Boinc....Boinc....
OzzFan Joined: 9 Apr 02 Posts: 15687 Credit: 84,761,841 RAC: 62
And the manner in which he is doing it is beyond reproach...being openly willing to share optimized code. I don't mean to be the resident nitpicker, but as I was reading this, I started to get the wrong impression. "Beyond reproach" is a bad thing, as defined by a dictionary: Noun: reproach That isn't what you meant, is it? |
KWSN - Chicken of Angnor Joined: 9 Jul 99 Posts: 1199 Credit: 6,615,780 RAC: 0
[...] Salut Francois, do you believe this could also be adapted for pre-Core 2 CPUs, with 2 FFTs in parallel instead of 4? I'm pretty sure that current code does not specifically do this, as Ben Herndon pointed out to me - you may be interested in his (and Dr. Korpela's) SourceForge project (regrettably, it's not current code, but it has lots of inline assembly as well as some specific code to feed execution units in parallel with minimal penalties). As others have posted here, I'm very much in favour of getting everyone working on optimizations in contact with each other (and hence pooling resources towards a common goal). Regrettably, I still cannot code in C/C++, let alone assembly, though those are skills to acquire in the future. If you like, you could head over to my SETI@Home site and register - I would be glad to give you access to the pre-release application board and have your input. There is already another Intel employee registered (Intel being quite a large company, you probably don't know each other): his name is Greg Eckert, and he works as an instructor training manager at the Intel Software College. The more, the merrier ;o) Regards, Simon. Donate to SETI@Home via PayPal! Optimized SETI@Home apps + Information
|
Bart Barenbrug Joined: 7 Jul 04 Posts: 52 Credit: 337,401 RAC: 0
Indeed. Parallel is the way to go (we BOINC users should know a thing or two about that), and working towards using this kind of parallelism effectively is a great step forward. One day, when we're all using dual-processor machines, with each processor being quad-core and each of those cores hyperthreaded, we'll still be benefiting from this work (I just don't want to be the one to write the task balancing and task migration code for such a beast, with all the different penalties of migrating a task between hyperthreads on the same core, between cores on the same processor, or between processors, etc. *g*).
kittyman Joined: 9 Jul 00 Posts: 50494 Credit: 1,018,363,574 RAC: 2,276
I think he works for Intel? Sure Francois is trying to prove a point! He's trying to prove that Intel finally has a butt-kicking architecture available with the new Core 2 cpus. After all, he works for Intel, and I am sure he is excited about what is new and cool in computing, 'cuz Intel is it. I'm sure excited about it, my new X6800 is doing things my AMD FX60 can't even touch! I think what Francois is doing is absolutely fantastic!! Even if the rest of us cannot afford some of the grand toys that he has access to directly from Intel, what he is doing scales down to the processors that are coming on the market in a price range most of us can afford. He has not tried to hide the fact that he works for Intel, and he has already said that Intel did not instruct him to work on Seti, I truly believe he is doing this as a very excited hobbyist. And the manner in which he is doing it is beyond reproach...being openly willing to share optimized code. As far as I am concerned, Francois can beat the Intel drum all he wants. What could be better than an Intel insider who is willing to work with Simon on his optimized apps? This is win-win for everybody! "Learn from yesterday. Live for today. Hope for tomorrow." Albert Einstein "With cats." kittyman
|
|
Paydirt Joined: 17 Sep 00 Posts: 53 Credit: 37,938 RAC: 0
I think he works for Intel?

I'm surprised by some of the responses I've seen to this thread. People are so stuck on needing to be right, or on having to see things one specific way, that they cannot get excited about something that is new and cool in computing. So what if it is 4 CPUs? Who cares? Are we trying to prove a point? Because Francois isn't trying to prove one. It's great for SETI and awesome for the volunteer grid computing community! WE ARE ALL IN THIS TOGETHER! Apple, AMD, Intel, IBM, etc. Whatever it takes!
kittyman Joined: 9 Jul 00 Posts: 50494 Credit: 1,018,363,574 RAC: 2,276
BTW, would Francois' processor be considered a Core 2 Quad? I didn't think those had been released yet. Heck, you can hardly even buy an E6600 or E6700 off the shelf yet. Or do his connections to Intel get him an engineering sample or such? "Learn from yesterday. Live for today. Hope for tomorrow." Albert Einstein "With cats." kittyman
|
©2020 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.