Core 2 comparison - Xeon E5320 vs E6300

OzzFan · Volunteer tester · Joined: 9 Apr 02 · Posts: 15691 · Credit: 84,761,841 · RAC: 28
Message 475247 · Posted: 7 Dec 2006, 0:24:12 UTC · in response to Message 475178 · Last modified: 7 Dec 2006, 0:26:38 UTC

Quote: I would suspect you are seeing a prime example of the bus-memory bottleneck in its full 8-thread glory. To make it worse, the higher latency of the Xeon's FB-DIMMs (5-5-5-15) compounds the problem. If I understand the bandwidth correctly, a better apples-to-apples comparison is this: your Allendale runs 2 cores on a 1066 FSB with CL3(?) memory, while the quad-core Xeons effectively share their FSB bandwidth, giving roughly the equivalent of an Allendale on a 533 FSB, and at CL5 to add insult.

The newer Socket 771 Xeon boards support a dual FSB, so each quad core does not have to share its FSB with the other socket, and gets a full 12 GB/s of throughput per socket (assuming a 1.333 GHz FSB with 72-bit accesses, including ECC bits), which is not enough to saturate the 32 GB/s throughput of the RAM. Of course, his RAM isn't running at full potential, since it is not in dual-channel interleave mode, so perhaps they are fighting for RAM accesses.

Quote: If S@H's performance is most influenced by processor clock, cache, memory speed, and FSB, in that order (is this correct?), you're likely benefiting from the extra cache on the 5320, about even on clock speed, and at a disadvantage on memory and FSB speed in a straight comparison with your E6300. More threads, yes, but running at substantially reduced efficiency, so the total benefit is a somewhat disappointing fraction of the implied potential. http://www.insight64.com/downloads/IntelligentDesign.pdf Personally, I expected to see the dual socket operate at maybe 70% of the efficiency of one socket. Your 50% results (time vs. time) are sobering. Just these last few days, my debate has been between two E5320s or one moderately overclocked QX6700. My inclination has been to wait for the 45nm chips to see if they open up the bottleneck before jumping into a dual socket. Unless you find a magic fix for your woes, your experience may be a good prompt for others to look at QX solutions with some overclocking and fast memory for optimal crunching.

According to Intel, the quad-core parts are only supposed to achieve a maximum boost of 1.5x (150%) over dual-core parts, assuming a properly programmed application that can work with Intel's advanced cache pre-fetch.
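A minimal sketch of the efficiency arithmetic in this exchange, under one interpretation (an assumption): "50% (time vs. time)" is read as each Xeon core taking about twice as long per WU as an E6300 core, and 70% is the hoped-for per-core figure. The 3600 s reference time is hypothetical, chosen only for illustration.

```python
def per_core_efficiency(t_reference_s: float, t_test_s: float) -> float:
    """Per-core speed relative to a reference core.

    Both arguments are run times for comparable work units on one core;
    0.5 means the test core takes twice as long as the reference core.
    """
    return t_reference_s / t_test_s


def reference_core_equivalents(cores: int, efficiency: float) -> float:
    """Whole-machine throughput expressed in reference-core units."""
    return cores * efficiency


# HYPOTHETICAL WU time on one reference (E6300) core, for illustration only.
T_REF = 3600.0

observed = per_core_efficiency(T_REF, 2 * T_REF)   # the "50%, time vs. time" case
hoped_for = 0.70                                    # the 70% expectation

print(reference_core_equivalents(8, observed))     # 8 Xeon cores at 50%
print(reference_core_equivalents(8, hoped_for))    # 8 Xeon cores at 70%

# Intel's quoted ceiling: 4 cores give at most 1.5x the throughput of 2,
# i.e. each core runs at 1.5 * 2 / 4 of a dual-core part's per-core speed.
intel_ceiling = 1.5 * 2 / 4
print(intel_ceiling)
```

On this reading, an 8-core box at 50% is worth about four E6300 cores (two E6300 machines), while the 70% case would be worth about 5.6; Intel's own 1.5x claim already implies 75% per core.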

Gecko · Volunteer tester · Joined: 17 Nov 99 · Posts: 454 · Credit: 6,946,910 · RAC: 47
Message 475314 · Posted: 7 Dec 2006, 1:35:38 UTC · in response to Message 475247 · Last modified: 7 Dec 2006, 1:39:00 UTC

Quote: The newer Socket 771 Xeon boards support a dual FSB, so each quad core does not have to share its FSB with the other socket, and gets a full 12 GB/s of throughput per socket (assuming a 1.333 GHz FSB with 72-bit accesses, including ECC bits), which is not enough to saturate the 32 GB/s throughput of the RAM. Of course, his RAM isn't running at full potential, since it is not in dual-channel interleave mode, so perhaps they are fighting for RAM accesses.

In the quad core, doesn't the main slowdown occur when, for example, die 1 and die 2, each running at 1066, are both competing for simultaneous bus access at 1066 to the memory controller? In essence, two separate dies (making up the 4 cores) share the same bus to the memory controller, per socket? The two cores on each die, however, move data at full speed between them on the same die, right? In an Allendale/Conroe, only 2 cores compete for the same FSB access.

Wouldn't the memory controller also contribute part of the performance hit, coordinating twice the traffic over 2 buses in a 2-socket configuration between the memory banks and the individual cores?
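The bandwidth figures and the "two dies share one bus" point can be checked with simple arithmetic. This is a sketch under stated assumptions: peak bandwidth = transfers per second × bus width, a naive even split of one bus among the cores contending for it (real contention is burstier than that), and rounded transfer rates (1333 and 1066 MT/s). Note that OzzFan's 12 GB/s counts 72 bits including ECC; only 64 of those bits carry data.

```python
def fsb_bandwidth_gbs(transfers_per_s: float, bus_width_bits: int) -> float:
    """Peak bus bandwidth in GB/s (10**9 bytes per second)."""
    return transfers_per_s * bus_width_bits / 8 / 1e9


def per_core_share(bandwidth_gbs: float, cores_on_bus: int) -> float:
    """Naive even split of one bus among the cores that contend for it."""
    return bandwidth_gbs / cores_on_bus


# OzzFan's figure: 1333 MT/s counted at 72 bits (64 data + 8 ECC).
with_ecc = fsb_bandwidth_gbs(1333e6, 72)    # about 12.0 GB/s
data_only = fsb_bandwidth_gbs(1333e6, 64)   # about 10.7 GB/s of actual data

# E5320 and E6300 both sit on a 1066 MT/s bus.
fsb_1066 = fsb_bandwidth_gbs(1066e6, 64)
allendale = per_core_share(fsb_1066, 2)     # 2 cores on one bus
clovertown = per_core_share(fsb_1066, 4)    # 2 dies x 2 cores on one bus

# Gecko's "Allendale at 533 FSB" equivalence: same share per core.
allendale_533 = per_core_share(fsb_bandwidth_gbs(533e6, 64), 2)

print(f"{with_ecc:.2f} GB/s with ECC bits, {data_only:.2f} GB/s data only")
print(f"per core: Allendale {allendale:.3f}, Clovertown {clovertown:.3f}, "
      f"Allendale@533 {allendale_533:.3f} GB/s")
```

Under the even-split model, each Clovertown core gets exactly what an Allendale core would get on a 533 bus, which is the comparison made at the start of the thread.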

OzzFan · Volunteer tester · Joined: 9 Apr 02 · Posts: 15691 · Credit: 84,761,841 · RAC: 28
Message 475368 · Posted: 7 Dec 2006, 2:39:52 UTC · in response to Message 475314 · Last modified: 7 Dec 2006, 2:47:28 UTC

Quote: In the quad core, doesn't the main slowdown occur when, for example, die 1 and die 2, each running at 1066, are both competing for simultaneous bus access at 1066 to the memory controller? In essence, two separate dies (making up the 4 cores) share the same bus to the memory controller, per socket? The two cores on each die, however, move data at full speed between them on the same die, right? In an Allendale/Conroe, only 2 cores compete for the same FSB access.

Ah, yes. I thought you were trying to say that both quad cores had to compete for the same FSB, which is not true. But yes, every core in the quad-core chip has to compete for a single FSB, whether it be 1066 MHz or 1333 MHz (1066 in the case of the 5320). Sorry, I misunderstood what you said.

Quote: Wouldn't the memory controller also contribute part of the performance hit, coordinating twice the traffic over 2 buses in a 2-socket configuration between the memory banks and the individual cores?

Not necessarily. That would depend on whether Intel increased the speed and bandwidth of the MCH to handle the extra traffic. But yes, there will be a small latency penalty, because accesses have to go through the traces on the motherboard instead of to a memory controller built right into the CPU, as with the AMDs. However, that can be alleviated with a good code/data pre-fetch unit that keeps the L2 and L1 caches filled with the appropriate code and data. Intel has claimed, and some tests have confirmed, that this pre-fetch unit is up to 99% accurate. In many of these tests, the Xeons have been shown to outperform the Opterons in almost every area (with a few exceptions). As long as you can keep the CPUs busy, it should be hard to notice a performance penalty from using an external MCH.
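The argument that prefetching hides the external MCH can be illustrated with the standard textbook model, average memory access time = hit time + miss rate × miss penalty: the extra trip over the motherboard is only paid on cache misses. All numbers below are HYPOTHETICAL, picked to show the shape of the argument, not measurements of these chips.

```python
def amat_ns(hit_ns: float, miss_rate: float, miss_penalty_ns: float) -> float:
    """Average memory access time: hit time + miss rate * miss penalty."""
    return hit_ns + miss_rate * miss_penalty_ns


# HYPOTHETICAL figures for illustration only (not measurements):
HIT_NS = 5.0
EXTERNAL_MCH_PENALTY_NS = 100.0   # miss travels over the FSB to the MCH
INTEGRATED_PENALTY_NS = 80.0      # miss handled by an on-die controller

for miss_rate in (0.02, 0.005):   # e.g. weaker vs. more effective prefetch
    ext = amat_ns(HIT_NS, miss_rate, EXTERNAL_MCH_PENALTY_NS)
    on_die = amat_ns(HIT_NS, miss_rate, INTEGRATED_PENALTY_NS)
    print(f"miss rate {miss_rate:.1%}: external {ext:.2f} ns, "
          f"integrated {on_die:.2f} ns, gap {ext - on_die:.2f} ns")
```

The gap between the two designs shrinks in proportion to the miss rate. This latency-only model says nothing about bandwidth contention, which is the separate problem raised above when 4 cores share one 1066 FSB.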

Gecko · Volunteer tester · Joined: 17 Nov 99 · Posts: 454 · Credit: 6,946,910 · RAC: 47
Message 475422 · Posted: 7 Dec 2006, 3:54:53 UTC · in response to Message 475368 · Last modified: 7 Dec 2006, 4:08:47 UTC

Quote: Sorry, I misunderstood what you said.

No worries, I just meant that the FSB inefficiencies of a quad would be multiplied in a 2-CPU setup. The E6300 comparison would be really interesting in terms of processing per core once the memory access question on the E5320 is resolved. For S@H, 2x4 Xeons may not be the great leap over a 1x4 Kentsfield that the numbers would logically suggest, especially since a QX6700 can be overclocked and use the full range of performance memory. The growing inefficiencies going from 2 to 4 to 8 CPUs mean compromised performance and diminishing returns with current board and chipset designs. Don't get me wrong, I'd gladly run a Clovertown if I had access to one, but I don't see it as a good performance/value proposition when building a cruncher that doubles as a light-duty workstation.

Quote: That would depend on whether Intel increased the speed and bandwidth of the MCH to handle the extra traffic. [...] As long as you can keep the CPUs busy, it should be hard to notice a performance penalty from using an external MCH.

Quite interesting and informative. Thanks for the explanation.

zombie67 [MM] · Volunteer tester · Joined: 22 Apr 04 · Posts: 758 · Credit: 27,771,894 · RAC: 0
Message 475495 · Posted: 7 Dec 2006, 6:14:23 UTC · Last modified: 7 Dec 2006, 6:14:38 UTC

I wonder if Crunch3r's BOINC client 5.7.5 with affinity would improve the results? I think it would, with 4 different L2 caches.

Dublin, California
Team: SETI.USA
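The idea behind an affinity client, sketched under assumptions: a dual quad-core Clovertown box has four L2 caches, each shared by the two cores on one die, so keeping each science task on a fixed core stops the OS migrating it to a core behind a different L2 (where its working set would have to be reloaded). The helper names here are hypothetical, and this is not how Crunch3r's or Trux's clients are implemented. The CPU numbering (consecutive CPUs on the same die) is an ASSUMPTION that depends on BIOS/OS enumeration. `os.sched_setaffinity` is Linux-only; the Windows equivalent is the Win32 `SetProcessAffinityMask` call.

```python
from __future__ import annotations

import os
import sys


def l2_groups(n_cpus: int, cpus_per_l2: int) -> list[list[int]]:
    """Split CPU numbers 0..n_cpus-1 into groups assumed to share one L2.

    ASSUMPTION: consecutive CPU numbers sit on the same die. Real
    numbering depends on BIOS/OS enumeration and must be checked.
    """
    return [list(range(first, first + cpus_per_l2))
            for first in range(0, n_cpus, cpus_per_l2)]


def pin_to_cpu(cpu: int) -> set[int]:
    """Pin the calling process to one CPU (Linux only); return the new set."""
    os.sched_setaffinity(0, {cpu})
    return os.sched_getaffinity(0)


if __name__ == "__main__":
    # 2 sockets x 2 dies x 2 cores = 8 CPUs behind 4 shared L2 caches.
    print(l2_groups(8, 2))
    if sys.platform.startswith("linux"):
        # Choose a CPU this process is already allowed to use, so the call succeeds.
        print(pin_to_cpu(min(os.sched_getaffinity(0))))
```

With 8 tasks on 8 cores every L2 is shared by two tasks either way, so any gain from affinity would come mainly from avoiding migrations. That may be why Richard saw little difference in the replies below.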

Richard Haselgrove · Volunteer tester · Joined: 4 Jul 99 · Posts: 14653 · Credit: 200,643,578 · RAC: 874
Message 475520 · Posted: 7 Dec 2006, 7:32:42 UTC · in response to Message 475495

Quote: I wonder if Crunch3r's BOINC client 5.7.5 with affinity would improve the results? I think it would, with 4 different L2 caches.

I wondered about that, and I'm currently running Trux's 5.3.12.tx36 affinity client. It doesn't seem to have made much difference.

zombie67 [MM] · Volunteer tester · Joined: 22 Apr 04 · Posts: 758 · Credit: 27,771,894 · RAC: 0
Message 475540 · Posted: 7 Dec 2006, 8:46:02 UTC · in response to Message 475520

Quote: I wondered about that, and I'm currently running Trux's 5.3.12.tx36 affinity client. It doesn't seem to have made much difference.

When you get a chance, I would like to see whether Crunch3r's version gives a different result.

Richard Haselgrove · Volunteer tester · Joined: 4 Jul 99 · Posts: 14653 · Credit: 200,643,578 · RAC: 874
Message 475549 · Posted: 7 Dec 2006, 9:37:57 UTC · in response to Message 475540

Quote: When you get a chance, I would like to see whether Crunch3r's version gives a different result.

OK, will do, probably over the weekend, as I've got to go and get the 6300s bedded down in their new domain now.

Richard Haselgrove · Volunteer tester · Joined: 4 Jul 99 · Posts: 14653 · Credit: 200,643,578 · RAC: 874
Message 476421 · Posted: 8 Dec 2006, 12:25:07 UTC

I've now installed Crunch3r's client:

08/12/2006 11:59:16||BOINC 5.7.5.32 - 32 bit Edition by Crunch3r
08/12/2006 11:59:16||enabled features:
08/12/2006 11:59:16||-cpu_affinity
08/12/2006 11:59:16||-return_results_immediately
08/12/2006 11:59:16||
08/12/2006 11:59:16||
08/12/2006 11:59:16||Suspending network activity - user request
08/12/2006 11:59:18||Running CPU benchmarks
08/12/2006 11:59:28||Resuming network activity
08/12/2006 12:00:17||Benchmark results:
08/12/2006 12:00:17|| Number of CPUs: 8
08/12/2006 12:00:17|| 1748 floating point MIPS (Whetstone) per CPU
08/12/2006 12:00:17|| 1581 integer MIPS (Dhrystone) per CPU

(Sorry Rom, I didn't mean to include RRI on a machine as fast as this, but it seems to come with the territory. Crunch3r's documentation is, shall we say, sparse: does anyone know if there's a switch to disable RRI (return results immediately) while keeping affinity?)

I'll start a new series on the chart and post it when I've got a reasonable number of data points. Two observations to be getting on with:

1) Crunch3r's benchmark is much slower than the one I posted in message 475123 for the Xeons, especially the Dhrystone. This may be an artefact of the way I triggered them; I'll do a comparison run later.

2) I've noticed a very big difference between the computers on VHAR WUs (AR > 1.25). The E6300s did these particularly well; the Xeon 5320 does them particularly badly. Some rough figures:
E6300: about 32 credits/hour on 'standard' 0.42 AR WUs. VHARs are scattered, but with a bit of a cluster around 37 credits/hour (i.e. faster than standard).
E5320: about 18 credits/hour on 'standard' 0.42 AR WUs. VHARs are again scattered, but in the range 8-15 credits/hour (i.e. significantly slower than standard).

This really messes up the RDCF: on standard units I can get down to about 0.35 RDCF, but I've just reported some VHARs that have taken DCF up to over 1.1! That really messes up the cache fetch calculations....

Observations, anyone?

[P.S. for new readers: all timings etc. were taken with Simon the Chicken's (and friends') app version 1.41]
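A sketch of why a few slow VHARs can wreck the cache estimates. BOINC clients of this era applied a duration correction factor that was raised quickly when a task overran its estimate and lowered only slowly when tasks finished early. The exact rule below (jump straight up when a task runs more than 10% over, otherwise move 10% of the way down) and the name `update_dcf` are my ASSUMPTION of roughly how that behaved, not a copy of the client's code. The 0.35 and 1.1 figures come from the post above.

```python
def update_dcf(dcf: float, raw_ratio: float) -> float:
    """One asymmetric duration-correction-factor update (illustrative sketch).

    raw_ratio = actual CPU time / uncorrected estimate for a finished task.
    ASSUMED rule: if the task ran more than 10% longer than the corrected
    estimate, jump straight to the new ratio; otherwise move only 10% of
    the way toward it.
    """
    if raw_ratio / dcf > 1.1:
        return raw_ratio
    return 0.9 * dcf + 0.1 * raw_ratio


dcf = 0.35                      # settled value on standard 0.42 AR units
dcf = update_dcf(dcf, 1.1)      # one slow VHAR result arrives
after_vhar = dcf

steps = 0
while dcf > 0.40:               # standard results slowly pull it back down
    dcf = update_dcf(dcf, 0.35)
    steps += 1

print(f"after one VHAR: {after_vhar}; standard results to get back under 0.40: {steps}")
```

Under this assumed rule, a single VHAR triples the DCF at once, and it then takes a couple of dozen standard results to undo. Meanwhile every cached WU looks about three times longer than it really is, which skews work fetch.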

zombie67 [MM] · Volunteer tester · Joined: 22 Apr 04 · Posts: 758 · Credit: 27,771,894 · RAC: 0
Message 476599 · Posted: 8 Dec 2006, 14:16:24 UTC · in response to Message 476421

Quote: I've now installed Crunch3r's client: [...] 2) I've noticed a very big difference between the computers on VHAR WUs (AR > 1.25). The E6300s did these particularly well; the Xeon 5320 does them particularly badly. [...] [P.S. for new readers: all timings etc. were taken with Simon the Chicken's (and friends') app version 1.41]

Perhaps the culprit could be the 1.41 app? I wonder what the comparison would look like with the stock app. Maybe it is not that the quad is so much slower, but that the 1.41 app improves the dual core to a greater degree?

Reuben Gathright · Joined: 8 Mar 01 · Posts: 213 · Credit: 14,594,579 · RAC: 0
Message 476638 · Posted: 8 Dec 2006, 15:10:00 UTC

Alright, look up my Core 2-based Xeon system and you will find that even after OVERCLOCKING MY DUAL 5120 XEON, using an optimized version of BOINC with affinity and running Chicken's app... I am still slower than most Core 2 systems. Reasons:
1) ECC memory is slower.
2) FB-DIMMs have latency issues because of their embedded control chips.
The good news is that even if I get beaten WU for WU every day by single-socket motherboards... as soon as I put in two quad chips, no one can beat the total crunching power.

Richard Haselgrove · Volunteer tester · Joined: 4 Jul 99 · Posts: 14653 · Credit: 200,643,578 · RAC: 874
Message 476716 · Posted: 8 Dec 2006, 18:02:26 UTC · in response to Message 476599 · Last modified: 8 Dec 2006, 18:03:01 UTC

Quote: Perhaps the culprit could be the 1.41 app? I wonder what the comparison would look like with the stock app. Maybe it is not that the quad is so much slower, but that the 1.41 app improves the dual core to a greater degree?

Well, that's partly why I started this thread in the first place: to see if that's a common experience, and if it is, perhaps to give the optimisers a heads-up that their work isn't finished yet.... (Sorry about that, folks; I really appreciate all the work you've done already.) Also, if we can get some more experiences recorded, it might help people choose their next rig.

jrmy · Joined: 14 Jun 04 · Posts: 10 · Credit: 21,475 · RAC: 0
Message 480348 · Posted: 11 Dec 2006, 21:47:05 UTC · in response to Message 476716 · Last modified: 11 Dec 2006, 21:47:40 UTC

Quote: Well, that's partly why I started this thread in the first place: to see if that's a common experience, and if it is, perhaps to give the optimisers a heads-up that their work isn't finished yet.... [...] Also, if we can get some more experiences recorded, it might help people choose their next rig.

I am running an E6300 (not overclocked), 1 GB of 667 RAM, and Chicken's Core 2-enhanced app. I haven't noticed much of a difference in crunching times, if any at all, between it and the stock app: each WU takes approx. 1.5-2 h. I can run any benchmarks and supply further data if needed.

Sisyfos · Joined: 22 Jul 00 · Posts: 7 · Credit: 2,796,632 · RAC: 0
Message 480392 · Posted: 11 Dec 2006, 22:28:14 UTC · in response to Message 476421

Quote: ...This really messes up the RDCF: on standard units I can get down to about 0.35 RDCF, but I've just reported some VHARs that have taken DCF up to over 1.1! That really messes up the cache fetch calculations....

A comment on the RDCF... I experienced an oddity with Crunch3r's 5.7.5. For example, I received a batch of similar WUs with, say, 1:30:00 to completion. When I finished the first WU in 1:12:15, the "To completion" of the rest rose to 1:32:12, and in general my RDCF was approximately double what it "should" be. I subsequently reverted to 5.4.11 and the RDCF quickly went back to normal. If you take a peek at my Toledo, which still runs 5.7.5, you'll see that its RDCF is around 0.6. That amounts to about 04:45:00 on the 0.42 AR WUs, but they only take around 02:20:00. Just an FYI.
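Sisyfos's Toledo figures can be turned into the DCF the client "should" show. This is a sketch ASSUMING the displayed estimate is the uncorrected estimate multiplied linearly by the DCF (estimate = raw × DCF). `hms_to_s` is a hypothetical helper written for this example.

```python
def hms_to_s(hms: str) -> int:
    """Convert an 'HH:MM:SS' string to seconds."""
    h, m, s = (int(part) for part in hms.split(":"))
    return 3600 * h + 60 * m + s


dcf_shown = 0.6
estimate_s = hms_to_s("04:45:00")    # estimate displayed at DCF 0.6
actual_s = hms_to_s("02:20:00")      # typical real run time

raw_estimate_s = estimate_s / dcf_shown       # undo the correction
implied_dcf = actual_s / raw_estimate_s       # DCF that makes estimate == actual
overestimate = estimate_s / actual_s          # how far off the shown estimate is

print(f"implied DCF {implied_dcf:.3f}, estimate/actual {overestimate:.2f}")
```

Under that assumption, the implied DCF comes out near 0.29 and the displayed estimate is about 2.04 times the real run time, which matches the "approximately double" remark.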

OzzFan · Volunteer tester · Joined: 9 Apr 02 · Posts: 15691 · Credit: 84,761,841 · RAC: 28
Message 480394 · Posted: 11 Dec 2006, 22:34:20 UTC · in response to Message 480392

Quote: I experienced an oddity with Crunch3r's 5.7.5.

BOINC v5.7.5 is a beta release direct from the developers. It is not put out by Crunch3r. The "bug" is still interesting to know about, nonetheless. Perhaps this should be brought to JM7's or Rom's attention, if they aren't already aware of it.

Jakob Creutzfeld · Volunteer tester · Joined: 13 Oct 00 · Posts: 611 · Credit: 2,025,000 · RAC: 0
Message 480396 · Posted: 11 Dec 2006, 22:41:08 UTC · in response to Message 480348

Quote: I am running an E6300 (not overclocked), 1 GB of 667 RAM, and Chicken's Core 2-enhanced app. I haven't noticed much of a difference in crunching times, if any at all, between it and the stock app: each WU takes approx. 1.5-2 h.

It's a matter of the angle range. The two stock-crunched WUs still shown on your results page (the two oldest ones) took about 8,000 seconds to complete for about 33 credits. The newer ones show about 7,500 seconds for about 62 credits! (For comparison: one of your results worth 34.25 credits took 5,256.83 seconds with Simon's app.) So I think there's an increase of about 30%...

Andy
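The figures quoted above can be put on a common credits-per-hour footing. This sketch only uses the numbers in the post. The resulting percentage depends on how the comparison is framed: the time saving for WUs of similar credit is one measure, and the ratio of credit rates is another. Credit also varies with angle range, so WUs from different ARs are not strictly comparable.

```python
def credits_per_hour(credits: float, seconds: float) -> float:
    """Credit earned per hour of run time for one work unit."""
    return credits / seconds * 3600


stock = credits_per_hour(33, 8000)              # older stock-app results
optimised_a = credits_per_hour(34.25, 5256.83)  # Simon's app, similar credit
optimised_b = credits_per_hour(62, 7500)        # Simon's app, newer results

time_saving = 1 - 5256.83 / 8000      # similar-credit WUs: elapsed-time view
rate_gain = optimised_a / stock - 1   # same pair: credit-rate view

print(f"stock {stock:.2f}, optimised {optimised_a:.2f} / {optimised_b:.2f} credits/h")
print(f"time saving {time_saving:.1%}, credit-rate gain {rate_gain:.1%}")
```

For the similar-credit pair, the run time drops by roughly a third, while the credit rate rises by roughly 58%. Either way, the optimised app is clearly ahead, contrary to the first impression that crunching times had barely changed.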

Sisyfos · Joined: 22 Jul 00 · Posts: 7 · Credit: 2,796,632 · RAC: 0
Message 480397 · Posted: 11 Dec 2006, 22:41:51 UTC · in response to Message 480394 · Last modified: 11 Dec 2006, 22:43:31 UTC

It was the "BOINC 5.7.5.32 - 32 bit Edition by Crunch3r" with affinity and RRI, to be specific. I haven't tried the unaltered 5.7.5, so I wouldn't know whether it has the same behaviour.

EDIT: In reply to OzzFan

OzzFan · Volunteer tester · Joined: 9 Apr 02 · Posts: 15691 · Credit: 84,761,841 · RAC: 28
Message 480459 · Posted: 12 Dec 2006, 0:09:17 UTC · in response to Message 480397

Quote: It was the "BOINC 5.7.5.32 - 32 bit Edition by Crunch3r" with affinity and RRI, to be specific. I haven't tried the unaltered 5.7.5, so I wouldn't know whether it has the same behaviour.

Ah, OK. That makes more sense now.

jrmy · Joined: 14 Jun 04 · Posts: 10 · Credit: 21,475 · RAC: 0
Message 480479 · Posted: 12 Dec 2006, 0:22:59 UTC · in response to Message 480396

Quote: It's a matter of the angle range. The two stock-crunched WUs still shown on your results page (the two oldest ones) took about 8,000 seconds to complete for about 33 credits. [...] So I think there's an increase of about 30%... Andy

oh

Richard Haselgrove · Volunteer tester · Joined: 4 Jul 99 · Posts: 14653 · Credit: 200,643,578 · RAC: 874
Message 480788 · Posted: 12 Dec 2006, 11:24:53 UTC

OK, the Xeons are just two weeks old, so it's time for an update.

[Chart; image not reproduced]

I've been running the latest affinity client (Crunch3r 5.7.5.32) for about four days: green dots. It seems to be slightly quicker on average, but that may be the eye of faith! Unfortunately, I didn't get many VHARs in this run, so there is not as much data as I would have liked.

[Chart; image not reproduced]

I didn't post this one last time, but I think it's interesting. Look how the E6300s (pink dots) outperform the Xeons (blue and green dots) at high AR. Remember, the same science app (Simon's 1.41) was used throughout.

Current RAC for the machine (8 cores) is about 2250, and it made number 80 in the top list overnight: not bad for a young 'un. I reckon it should reach just over 3000 RAC, and might just make the top 20 if people stop moving the goalposts!

And finally, I see that Dell are now taking orders for this chassis with dual X5355 (2.66 GHz) quad cores, at a price, and with delivery not until next year. I think I'll wait until we get the RAM questions sorted out.
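The "just over 3000 RAC" projection can be cross-checked with a simple model. BOINC's recent average credit decays with a half-life of one week. So, ASSUMING a constant daily credit C starting from zero, RAC after t days is C × (1 − 2^(−t/7)), which is 75% of C after two weeks. In practice, credit is granted only after validation, so real RAC lags the work done, and the model should be taken as a rough sketch.

```python
HALF_LIFE_DAYS = 7.0  # BOINC recent average credit: one-week half-life


def rac_after(days: float, daily_credit: float) -> float:
    """RAC after `days` of constant daily credit, starting from zero."""
    return daily_credit * (1 - 0.5 ** (days / HALF_LIFE_DAYS))


def steady_state(rac_now: float, days: float) -> float:
    """Daily credit rate implied by the current RAC (inverse of rac_after)."""
    return rac_now / (1 - 0.5 ** (days / HALF_LIFE_DAYS))


projected = steady_state(2250, 14)   # RAC 2250 after two weeks
print(projected)
```

Under these assumptions, a RAC of 2250 at two weeks implies a steady state of about 3000. Because granted credit lags, the final figure could end up a little higher, consistent with "just over 3000".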

©2024 University of California

SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.