Core 2 comparison - Xeon E5320 vs E6300
Author | Message |
---|---|
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14681 Credit: 200,643,578 RAC: 874 |
OK - here's a question for the hardware wizards out there. Why does a Core 2 Duo E6300 (2MB L2 cache, 1.86GHz, 1066 MHz FSB) crunch at twice the speed of a Xeon Quad E5320 (2 x 4MB L2 cache, 1.86GHz, 1066 MHz FSB)? This chart has about 500 readings from the Xeon, and 180+ from the Core 2 - so it's pretty consistent over standard and high ARs: just haven't caught any VLARs on the Core 2s yet. All WUs have been crunched using Simon's 1.41 app during the last week, so there's no obvious reason for the difference except the usual random nature of the WUs issued for crunching. Hardware in both cases is standard Dell kit - Xeons in a Precision 490 with 2 GB RAM, Core 2s in Dimension E520 with 1 GB RAM. OS is Windows XP Pro 32-bit on all systems. My suspicion is that it will be to do with the efficiency of main memory access on the two different motherboards. I'll play around with CPU-Z, but I'm not sure exactly what to look for: can anyone recommend any other tools I could try? |
zombie67 [MM] Send message Joined: 22 Apr 04 Posts: 758 Credit: 27,771,894 RAC: 0 |
Why does a Core 2 Duo E6300 (2MB L2 cache, 1.86GHz, 1066 MHz FSB) crunch at twice the speed of a Xeon Quad E5320 (2 x 4MB L2 cache, 1.86GHz, 1066 MHz FSB)? The obvious difference to me is that Xeons use FB-DIMMs. Also, I think you need 4 DIMMs to get the max memory bandwidth. Are you using 2x1GB or 4x512MB? Dublin, California Team: SETI.USA |
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14681 Credit: 200,643,578 RAC: 874 |
Why does a Core 2 Duo E6300 (2MB L2 cache, 1.86GHz, 1066 MHz FSB) crunch at twice the speed of a Xeon Quad E5320 (2 x 4MB L2 cache, 1.86GHz, 1066 MHz FSB)? 2 x 1GB. This is Dell's spec from their website: 2GB DDR2 667 Quad Channel FBD Memory (2x1GB) What do you reckon 'quad channel' means, and should it help? |
zombie67 [MM] Send message Joined: 22 Apr 04 Posts: 758 Credit: 27,771,894 RAC: 0 |
Why does a Core 2 Duo E6300 (2MB L2 cache, 1.86GHz, 1066 MHz FSB) crunch at twice the speed of a Xeon Quad E5320 (2 x 4MB L2 cache, 1.86GHz, 1066 MHz FSB)? Read these two pages. It is about the dual core Xeon Mac Pro, but the technology and issues are the same with regard to memory. http://www.anandtech.com/showdoc.aspx?i=2816&p=11 http://www.anandtech.com/showdoc.aspx?i=2816&p=12 Dublin, California Team: SETI.USA |
zombie67 [MM] Send message Joined: 22 Apr 04 Posts: 758 Credit: 27,771,894 RAC: 0 |
Why does a Core 2 Duo E6300 (2MB L2 cache, 1.86GHz, 1066 MHz FSB) crunch at twice the speed of a Xeon Quad E5320 (2 x 4MB L2 cache, 1.86GHz, 1066 MHz FSB)? "For the most part, there's no benefit to having all four channels populated, but in some rare cases the performance boost can be tremendous. Given that lmbench showed us an increase in memory write speed when going from dual to quad channels, we can assume that the scenarios where we do see a large performance gain are write bandwidth bound. If you're going to upgrade the memory in your Mac Pro anyways, you might as well stick to four FB-DIMMs as it will give you the best possible combination of latency and bandwidth (as good as you can get with FB-DIMMs that is)." Dublin, California Team: SETI.USA |
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14681 Credit: 200,643,578 RAC: 874 |
"For the most part, there's no benefit to having all four channels populated, but in some rare cases the performance boost can be tremendous. Given that lmbench showed us an increase in memory write speed when going from dual to quad channels, we can assume that the scenarios where we do see a large performance gain are write bandwidth bound. Yes, I'd just got as far as reading that bit. But in the chart they're referring to, "tremendous" seems to equate to about 20% improvement - I'm seeing 100% for 6300 over 5320. But I would be surprised to find that SETI is quite so memory-bound. The conventional wisdom would have it that the double-size L2 cache would give the Xeons the edge in this particular comparison. BTW, I should have said that I'm running BOINC as a service on both platforms, and both machines have separate graphics cards (not on-board graphics), so I'm hoping we can rule out graphics as a cause of the difference. |
zombie67 [MM] Send message Joined: 22 Apr 04 Posts: 758 Credit: 27,771,894 RAC: 0 |
Yes, I'd just got as far as reading that bit. But in the chart they're referring to, "tremendous" seems to equate to about 20% improvement - I'm seeing 100% for 6300 over 5320. Who? is the expert on this stuff. I'm sure he'll speak up sooner or later. Dublin, California Team: SETI.USA |
jeffusa Send message Joined: 21 Aug 02 Posts: 224 Credit: 1,809,275 RAC: 0 |
We need more information. Are both of these systems being used exclusively for Seti? Trying to see if something else is taking processor power away. They are both running Windows XP Pro with the latest updates? Use CPU-Z and take screenshots of both the CPU and Memory tabs on both systems. |
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14681 Credit: 200,643,578 RAC: 874 |
We need more information. Xeon system will become my main system, so other stuff is gradually being added - but it's only a week old, and it's felt "slow" for seti from the beginning. No other BOINC projects. 6300 systems are on 48-hour burn-in for a client - I lose them tomorrow morning. No application software at all yet. Trying to see if something else is taking processor power away. Nothing else shows up as being CPU-intensive in Task Manager. The eight SETI processes (did I say this was a dual Xeon) are all showing 12% or 13%. They are both running Windows XP Pro with the latest updates? Yes. The Xeon has IE7, the 6300s still IE6, but otherwise all updates applied. Use CPU-Z and take screenshots of both the CPU and Memory tabs on both systems. Will these CPU-Z 1.38 report files do instead? Xeon Processor(s) Number of processors 2 Number of cores 4 per processor Number of threads 4 (max 4) per processor Name Intel Xeon E5320 Code Name Clovertown Specification Intel(R) Xeon(R) CPU E5320 @ 1.86GHz Package Socket 771 LGA Family/Model/Stepping 6.F.7 Extended Family/Model 6.F Core Stepping B3 Technology 65 nm Core Speed 1862.7 MHz Multiplier x Bus speed 7.0 x 266.1 MHz Rated Bus speed 1064.4 MHz Stock frequency 1866 MHz Instruction sets MMX, SSE, SSE2, SSE3, SSSE3, EM64T L1 Data cache 4 x 32 KBytes, 8-way set associative, 64-byte line size L1 Instruction cache 4 x 32 KBytes, 8-way set associative, 64-byte line size L2 cache 2 x 4096 KBytes, 16-way set associative, 64-byte line size Chipset & Memory Northbridge Intel 5000X rev. 12 Southbridge Intel 6321ESB rev. 09 Graphic Interface PCI-Express PCI-E Link Width x16 PCI-E Max Link Width x16 Memory Type FB-DDR2 Memory Size 2046 MBytes System System Manufacturer Dell Inc. System Name Precision WorkStation 490 System S/N xxxxxxx Mainboard Vendor Dell Inc. Mainboard Model 0GU083 BIOS Vendor Dell Inc. 
BIOS Version A02 BIOS Date 10/27/2006 Memory SPD Module 1 FB-DDR2, PC2-5300 (333 MHz), 1024 MBytes, Hyundai Electronics Software Windows Version Microsoft Windows XP Professional Service Pack 2 (Build 2600) DirectX Version 9.0c Core 2 E6300 Processor(s) Number of processors 1 Number of cores 2 per processor Number of threads 2 (max 2) per processor Name Intel Core 2 Duo E6300 Code Name Conroe Specification Intel(R) Core(TM)2 CPU 6300 @ 1.86GHz Package Socket 775 LGA Family/Model/Stepping 6.F.6 Extended Family/Model 6.F Core Stepping B2 Technology 65 nm Core Speed 1861.8 MHz Multiplier x Bus speed 7.0 x 266.0 MHz Rated Bus speed 1063.9 MHz Stock frequency 1866 MHz Instruction sets MMX, SSE, SSE2, SSE3, SSSE3, EM64T L1 Data cache 2 x 32 KBytes, 8-way set associative, 64-byte line size L1 Instruction cache 2 x 32 KBytes, 8-way set associative, 64-byte line size L2 cache 2048 KBytes, 8-way set associative, 64-byte line size Chipset & Memory Northbridge Intel P965/G965 rev. C2 Southbridge Intel 82801HB (ICH8) rev. 02 Graphic Interface PCI-Express PCI-E Link Width x16 PCI-E Max Link Width x16 Memory Type DDR2 Memory Size 1024 MBytes System System Manufacturer Dell Inc. System Name Dell DM061 System S/N xxxxxxx Mainboard Vendor Dell Inc. Mainboard Model 0WG864 BIOS Vendor Dell Inc. BIOS Version 2.0.4 BIOS Date 10/25/2006 Memory SPD Module 1 DDR2, PC2-5300 (333 MHz), 512 MBytes, Samsung Module 2 DDR2, PC2-5300 (333 MHz), 512 MBytes, Samsung Software Windows Version Microsoft Windows XP Professional Service Pack 2 (Build 2600) DirectX Version 9.0c |
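As a quick sanity check on the two CPU-Z reports, the core clock should equal multiplier times bus clock, and the rated bus speed should be four times the bus clock. The sketch below only recomputes figures already quoted in the reports; the quad-pumped factor of 4 is an assumption about this FSB generation, and the dictionary layout is purely illustrative.

```python
# Figures copied from the CPU-Z 1.38 reports quoted above (all in MHz).
REPORTS = {
    "Xeon E5320": {"multiplier": 7.0, "bus": 266.1, "core": 1862.7, "rated_fsb": 1064.4},
    "Core 2 E6300": {"multiplier": 7.0, "bus": 266.0, "core": 1861.8, "rated_fsb": 1063.9},
}

# Assumption: the front-side bus is quad-pumped (4 transfers per bus clock).
FSB_TRANSFERS_PER_CLOCK = 4


def derived_clocks(report):
    """Return (core clock, rated FSB) computed from multiplier and bus clock."""
    core = report["multiplier"] * report["bus"]
    rated_fsb = FSB_TRANSFERS_PER_CLOCK * report["bus"]
    return core, rated_fsb


for name, report in REPORTS.items():
    core, rated_fsb = derived_clocks(report)
    print(f"{name}: core {core:.1f} MHz (reported {report['core']}), "
          f"FSB {rated_fsb:.1f} MHz (reported {report['rated_fsb']})")
```

Both machines agree with their reported values to within rounding, so on paper the two CPUs really are clocked identically; the difference in crunching speed has to come from somewhere other than core or bus clock.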
jeffusa Send message Joined: 21 Aug 02 Posts: 224 Credit: 1,809,275 RAC: 0 |
The above does help. I am curious: run the CPU benchmarks in BOINC on both machines and tell me what you get. I noticed there is only 1 memory module in the Xeon system. Since it is not running in dual channel mode for memory, that may hurt its performance, but it shouldn't hurt it too much. |
OzzFan Send message Joined: 9 Apr 02 Posts: 15691 Credit: 84,761,841 RAC: 28 |
What do you reckon 'quad channel' means, and should it help? It technically isn't 'quad channel' memory. It is actually dual channel, interleaved - meaning it utilizes two dual channels, alternating between the two to get best performance and RAM accesses. Using dual channel interleaved DDR2-667 RAM, it offers a theoretical 32GB/s bandwidth, much faster than even dual channel DDR2-1000 can offer (which would require RAM overclocking since most chipsets only officially support DDR2-800). It really depends on the application as to how much performance improvement there is (basically if it's memory intensive). |
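The bandwidth figures in this post can be reproduced with simple arithmetic. The sketch below is a rough model, not a statement about the actual chipset: it assumes an 8-byte data path per channel, and it assumes an FB-DIMM channel can carry a full-rate read stream plus a half-rate write stream at the same time (a factor of 1.5), which is one common way the 32GB/s headline number is reached. The function name and parameters are hypothetical.

```python
BYTES_PER_TRANSFER = 8  # assumption: 64-bit data path per memory channel

# Assumption: an FB-DIMM channel carries reads at full rate and writes at
# half rate simultaneously, so its combined peak is 1.5x the DDR2 rate.
FBDIMM_COMBINED_FACTOR = 1.5


def peak_gb_per_s(mega_transfers, channels, fbdimm=False):
    """Theoretical peak bandwidth in GB/s for a given DDR2 speed grade."""
    per_channel = mega_transfers * 1e6 * BYTES_PER_TRANSFER / 1e9
    if fbdimm:
        per_channel *= FBDIMM_COMBINED_FACTOR
    return per_channel * channels


print("4 x FB-DIMM DDR2-667:", round(peak_gb_per_s(667, 4, fbdimm=True), 1))
print("2 x FB-DIMM DDR2-667:", round(peak_gb_per_s(667, 2, fbdimm=True), 1))
print("2 x DDR2-800:        ", round(peak_gb_per_s(800, 2), 1))
print("2 x DDR2-1000:       ", round(peak_gb_per_s(1000, 2), 1))
```

These are peak numbers only. They say nothing about latency, and with two modules instead of four the FB-DIMM figure is halved.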
OzzFan Send message Joined: 9 Apr 02 Posts: 15691 Credit: 84,761,841 RAC: 28 |
But I would be surprised to find that SETI is quite so memory-bound. The conventional wisdom would have it that the double-size L2 cache would give the Xeons the edge in this particular comparison. While generally true, remember that SETI can use up to 64MB of RAM. Unless you have 64MB L2 cache, faster RAM access or speed will affect SETI overall. As long as the app (SETI) has to go back out to main memory for the rest of the code/data, it will slow down from L2 cache speed to RAM speed, so the faster it can get the data out of RAM, the faster SETI will appear to run. [Edit] Since you only have two RAM modules in use, you are not using the dual channel interleave option, so you are using simple dual channel mode. Dual channel DDR2-667 is going to react slower than DDR2-800 on the Core 2, so overall the Core 2 is going to have faster RAM access.[/Edit] |
OzzFan Send message Joined: 9 Apr 02 Posts: 15691 Credit: 84,761,841 RAC: 28 |
I noticed there is only 1 memory module in the Xeon system. Since it is not running in dual channel mode for memory, that may hurt its performance, but it shouldn't hurt it too much. There is a bug in CPU-Z. For instance, my Xeon 5130 machine has four RAM modules (all fully programmed with SPD), but CPU-Z only detects one single 1GB module (even though it detects that I have 2.75GB of RAM, which is the maximum Windows XP is recognizing on my system until I upgrade to a 64-bit OS). |
OzzFan Send message Joined: 9 Apr 02 Posts: 15691 Credit: 84,761,841 RAC: 28 |
Nothing else shows up as being CPU-intensive in Task Manager. The eight SETI processes (did I say this was a dual Xeon) are all showing 12% or 13%. [Edit] That is about right for an eight CPU system. Remember that the Processes tab must add up to 100%, so that leaves roughly 12.5% for each CPU, and Task Manager might show that as 12% for one and 13% for another, since it does not show half of a percent. Click on the Performance tab in Task Manager and check out the individual CPU graphs (you should have eight). If each graph is maxed out at the top, then each CPU is performing at its peak. This means your slowdown must be elsewhere. I'd start by checking for CPU throttling features in the BIOS. Perhaps your CPU is getting hot and it's kicking in?[/Edit] |
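The 12% / 13% readings follow directly from dividing 100% across eight logical CPUs. A minimal sketch of that arithmetic, assuming (as described above) that Task Manager only displays whole percentages; the helper names are made up for illustration:

```python
import math


def per_process_share(logical_cpus):
    """Share of total CPU time one fully busy single-threaded process gets."""
    return 100.0 / logical_cpus


def displayed_values(share):
    """Whole-percent values a display without fractions could show."""
    return math.floor(share), math.ceil(share)


share = per_process_share(8)
low, high = displayed_values(share)
print(f"true share {share}%, shown as {low}% or {high}%")
```

So a process sitting at 12% or 13% on this machine is using one core flat out, and the readings say nothing about how efficiently that core is working.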
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14681 Credit: 200,643,578 RAC: 874 |
The above does help. I am curious, run the CPU benchmarks in Boinc on both machines and tell me what you get. Xeon 5320 06/12/2006 22:10:41||Suspending network activity - running CPU benchmarks 06/12/2006 22:10:43||Running CPU benchmarks 06/12/2006 22:11:42||Benchmark results: 06/12/2006 22:11:42|| Number of CPUs: 8 06/12/2006 22:11:42|| 1826 double precision MIPS (Whetstone) per CPU 06/12/2006 22:11:42|| 7422 integer MIPS (Dhrystone) per CPU 06/12/2006 22:11:42||Finished CPU benchmarks 06/12/2006 22:11:43||Resuming computation Core 2 E6300 06/12/2006 22:10:42||Suspending computation - running CPU benchmarks 06/12/2006 22:10:44||Running CPU benchmarks 06/12/2006 22:11:43||Benchmark results: 06/12/2006 22:11:43|| Number of CPUs: 2 06/12/2006 22:11:43|| 1747 floating point MIPS (Whetstone) per CPU 06/12/2006 22:11:43|| 3648 integer MIPS (Dhrystone) per CPU 06/12/2006 22:11:43||Finished CPU benchmarks 06/12/2006 22:11:44||Resuming computation As OzzFan notes, the CPU-Z report for the Xeon modules is misleading. Note under 'Chipset & Memory' it says "Memory Size 2046 MBytes". System Properties says 2.00GB of RAM: it also says "Physical Address extension", which I've never noticed before - it says it on the Core 2 E6300 as well (but only 1.00GB there). |
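To compare the two benchmark runs side by side, the per-CPU figures can be pulled out of the log lines with a small parser. This is only a sketch: the regular expressions are written against the log excerpts quoted above, not against any documented BOINC log format, and the function name is hypothetical.

```python
import re

# Result lines copied from the benchmark output quoted above.
XEON_LOG = """\
06/12/2006 22:11:42|| Number of CPUs: 8
06/12/2006 22:11:42|| 1826 double precision MIPS (Whetstone) per CPU
06/12/2006 22:11:42|| 7422 integer MIPS (Dhrystone) per CPU
"""
CORE2_LOG = """\
06/12/2006 22:11:43|| Number of CPUs: 2
06/12/2006 22:11:43|| 1747 floating point MIPS (Whetstone) per CPU
06/12/2006 22:11:43|| 3648 integer MIPS (Dhrystone) per CPU
"""

CPU_COUNT = re.compile(r"Number of CPUs:\s*(\d+)")
SCORE = re.compile(r"\|\|\s*(\d+) .*\((Whetstone|Dhrystone)\) per CPU")


def parse_benchmark(log):
    """Extract CPU count and per-CPU scores from benchmark log text."""
    result = {}
    for line in log.splitlines():
        match = CPU_COUNT.search(line)
        if match:
            result["cpus"] = int(match.group(1))
            continue
        match = SCORE.search(line)
        if match:
            result[match.group(2).lower()] = int(match.group(1))
    return result


xeon = parse_benchmark(XEON_LOG)
core2 = parse_benchmark(CORE2_LOG)
for key in ("whetstone", "dhrystone"):
    print(f"{key}: Xeon/Core 2 per-CPU ratio = {xeon[key] / core2[key]:.2f}")
```

On these numbers the Xeon is not slower per CPU in either synthetic test, which suggests the benchmarks are not exercising whatever is holding back the real work units.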
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14681 Credit: 200,643,578 RAC: 874 |
Click on the Performance tab in Task Manager and check out the individual CPU graphs (you should have eight). If each graph is maxed out at the top, then each CPU is performing at its peak. This means your slowdown must be elsewhere. BIOS information is a bit sparse (it being a Dell and all that) - I can't even find any temperature report. It has three options under 'Performance' - Speedstep, Intel Virtualisation Technology, and Limit CPUID Value to 3. All three settings are set to 'off' (factory default). CPU usage fluctuates - as always, especially when Task Manager is running - but I can see it go up to 104% (all 8 SETI processes at 13% simultaneously!). Nothing else registers, even though I have BOINC Manager and IE7 open at the moment. All 8 processor graphs are flatlining at top of scale. It is warm in here while the 5 E6300s burn in, but nothing that triggers any extra fan action. And the other machines have only been here for the last couple of days: the extra heat hasn't changed the speed, it's been like this from the beginning. |
jeffusa Send message Joined: 21 Aug 02 Posts: 224 Credit: 1,809,275 RAC: 0 |
PAE stands for Physical Address Extension. This is needed if you want to get past the 32-bit limitation of being able to address only 4GB of RAM. Of course, you also need an OS and programs that are able to take advantage of PAE. Note that this is not an issue for 64-bit systems; PAE was designed as a workaround for the limitations of 32-bit systems. I have more to comment but I must get back to work. I will post some more later. |
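For reference, the address-space arithmetic behind PAE is short. The sketch below assumes the original PAE scheme, which widens physical addresses from 32 to 36 bits; whether a given OS actually exposes the extra range is a separate question not covered here.

```python
GIB = 2 ** 30  # bytes in one gibibyte


def addressable_gib(address_bits):
    """Physical memory reachable with the given number of address bits."""
    return 2 ** address_bits // GIB


print("32-bit physical addresses:", addressable_gib(32), "GiB")
# Assumption: original PAE extends physical addresses to 36 bits.
print("36-bit physical addresses:", addressable_gib(36), "GiB")
```

Neither machine in this thread has more than 2GB installed, so PAE being reported in System Properties is unlikely to be related to the speed difference.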
Gecko Send message Joined: 17 Nov 99 Posts: 454 Credit: 6,946,910 RAC: 47 |
OK - here's a question for the hardware wizards out there. I would suspect you are seeing a prime example of the bus-memory bottleneck in its full 8-thread glory. To make it worse, the Xeon memory latency of FB-DIMM at 5-5-5-15 compounds the problem. If I understand the bandwidth correctly, for a better apples-to-apples example: your Allendale is running 2 cores @ 1066FSB w/ CL3(?) memory latency vs. the Quad Xeons essentially sharing FSB bandwidth, for roughly the equivalent performance of an Allendale running at 533FSB, and at CL5 to add insult. If S@H's performance is most influenced by processor clock, cache, memory speed, and FSB in this order (is this correct?), you're likely benefitting w/ the extra cache w/ the 5320, at a "wash" w/ clock speed, and at a disadvantage w/ memory and FSB speed vs. a straight compare to your E6300. More threads, yes, but operating at substantially reduced efficiency, to where the total benefit scales at a somewhat disappointing fraction of the implied potential. http://www.insight64.com/downloads/IntelligentDesign.pdf Personally, I expected to see the dual socket operate at maybe about 70% of the efficiency of 1 socket. Your 50% results (time vs. time) are sobering. Just these last few days, my debate has been between 2-CPU 5320 or 1 moderately OC QX6700. My perception has been to wait for the 45nm chips to see if they open the bottleneck before jumping into a dual socket. Unless you find a magic fix for your woes, your experience may be good inspiration for others to look at QX solutions w/ some overclocking and fast memory for optimal crunching consideration. Just my 2-cents. Good luck! |
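The "Allendale at 533FSB" comparison can be made concrete by dividing each front-side bus among the cores that share it. This is a deliberately crude sketch: it assumes each Xeon socket has its own 1066 FSB shared by its four cores, that the E6300's two cores share one 1066 FSB, and that bandwidth divides evenly, which ignores caches, latency and real contention behaviour.

```python
def per_core_fsb(fsb_mt_per_s, cores_on_bus):
    """Even split of a front-side bus among the cores attached to it."""
    return fsb_mt_per_s / cores_on_bus


# Assumption: one 1066 FSB per Xeon socket, four cores per socket.
xeon_share = per_core_fsb(1066, 4)
# Assumption: the E6300's two cores share a single 1066 FSB.
core2_share = per_core_fsb(1066, 2)

print(f"Xeon E5320 per-core share:   {xeon_share:.1f} MT/s")
print(f"Core 2 E6300 per-core share: {core2_share:.1f} MT/s")
print(f"ratio: {xeon_share / core2_share:.2f}")
```

Under these assumptions each Xeon core gets half the bus share of an E6300 core, which happens to line up with the roughly 2:1 per-core speed difference reported in the thread. That is consistent with the bottleneck theory but does not prove it.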
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14681 Credit: 200,643,578 RAC: 874 |
OK - here's a question for the hardware wizards out there. Actually, the Conroes seem to be running at 5-5-5-13: 333MHz DDR2, so the raw memory timings are very similar. The optimistic way of looking at this is to say how good the E6300s are, even with the smaller L2 cache - claiming almost 29 credits/hour per core: the Xeon is claiming not much over 16 per core. I'll be keeping the Xeons long-term, so I can run any tests/tweaks people can suggest, but I'll have to wind down and disconnect this batch of C2Ds in the morning (UK time) - so this is the last call for tests I ought to run first. |
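Putting the claimed credit rates together shows both sides of the comparison: per core the Xeon is well behind, but per machine it is still ahead. The figures below are the rounded values from the post ("almost 29" and "not much over 16"), so the results are approximate.

```python
# Approximate claimed credits per hour per core, rounded from the post above.
SYSTEMS = {
    "Core 2 Duo E6300": {"cores": 2, "credits_per_hour_per_core": 29},
    "Dual Xeon E5320": {"cores": 8, "credits_per_hour_per_core": 16},
}


def machine_rate(system):
    """Total claimed credits per hour for the whole machine."""
    return system["cores"] * system["credits_per_hour_per_core"]


for name, system in SYSTEMS.items():
    print(f"{name}: {machine_rate(system)} credits/hour in total")

per_core_ratio = 16 / 29
print(f"Xeon core relative to E6300 core: {per_core_ratio:.2f}")
```

So the dual Xeon box still produces roughly twice the total output of one E6300 box, but it needs four times as many cores to do it.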
Gecko Send message Joined: 17 Nov 99 Posts: 454 Credit: 6,946,910 RAC: 47 |
Whoops! I referred to Conroe in referencing your E6300, when I meant Allendale. Allendale of course has 1066FSB and 1MB per core L2, Conroe 1333 and 2MB per core, respectively. Sorry for the faux pas and potential confusion. Yes, the E6300 is a mighty warrior, and it sure loves to overclock! |