Specs for Core2 Extreme QX9775
Author | Message |
---|---|
OzzFan Send message Joined: 9 Apr 02 Posts: 15691 Credit: 84,761,841 RAC: 28 |
...and fully utilize the chipset's dual branch, dual channel, dual channel interleaved, quad channel (call it whatever you want) memory subsystem! To be accurate, the actual name is dual channel interleaved memory, which isn't exactly quad-channel but gives similar performance. Looking back at the page that Alinator referenced: Frontside Bus Properties: Bandwidth 12800 MB/s; Memory Bus Properties: Bandwidth 20640 MB/s. Where is all the bandwidth going? This system only reads at 3.8GB/sec with the configuration this reviewer chose. Do the FB-DIMMs only allow this board to read from memory at 30% efficiency? Is that the performance penalty for using FB-DIMMs? Isn't there anything that can be done with the configuration options to tweak this thing to respectability? Remember that those are theoretical figures. For reference, PC133 SDRAM was supposed to achieve 1GB/s throughput, but most often came in around 380MB/s. Dual channel DDR400 RAM is supposed to achieve a throughput of 6.4GB/s, but in actuality it's closer to 800MB - 1GB/s. A lot of what eats up that theoretical number is bus protocol overhead: each access has to be set up, performed, then closed, all while the RAM is kept refreshed so that the data doesn't get lost. The general rule of thumb used to be that you take the theoretical number and cut it in half (now maybe to one third?) to get real-world performance numbers. Another thing that can affect how efficient a subsystem is, is how mature the technology is; how many tricks the designers have learned to get the most out of it. This is why first-revision chipsets supporting a new technology such as RAM or HDDs are often thought of as the "first try", while updated chipsets supporting the same technology can often outperform their older counterparts. FB-DIMMs are relatively new, but with as many Intel chipsets supporting them, I'd expect Intel to be getting some decent performance out of them by now. 
Unless Intel knows the technology won't mature until more cores/faster processors/doing away with the FSB technology are realized. |
OzzFan Send message Joined: 9 Apr 02 Posts: 15691 Credit: 84,761,841 RAC: 28 |
I read this, perhaps incorrectly, as megabits per second, which would make it 20640/8 = 2580MB/s (megabytes per second) ~ 2.52GB/s (gigabytes per second), or ~ 20.16Gb/s (gigabits per second). Let's assume those are theoretical maximums at stock clock (400MHz x 4 transfers per clock = 1600MT/s). To calculate the total theoretical bandwidth, take bus speed times bus width. For instance: 1600MT/s Front Side Bus speed X 64bit bus = 102,400 megabits per second, then divide that number by 8 to convert bits to bytes: 102,400 / 8 = 12,800MB/s total theoretical. The same is done for RAM speed. Take the RAM speed (the effective data rate, not the true clock rate) times bus width for megabits per second, then divide by 8 for megabytes per second. |
JDWhale Send message Joined: 6 Apr 99 Posts: 921 Credit: 21,935,817 RAC: 3 |
Did you take into account that it is "1600MHz dual independent Front Side Busses"? Let's look at Dell's just-released Precision T7400 specs, another Intel D5xxx MOBO solution running the X5482 Xeon like some of the leading Mac Pros. Memory Bandwidth: Up to 32.0 GB/s of theoretical memory bandwidth for 667MHz memory in quad channel mode (i.e. 4 DIMMs); up to 38.4 GB/s of theoretical memory bandwidth for 800MHz memory in quad channel mode (i.e. 4 DIMMs). They also claim data transfer rates of 12.8GB/s, or about 33% of theoretical max. Come on Ozz, my DDR2-800MHz gets 6-7 GB/s running in dual channel mode; this D5400XS should at least do better than that in "dual channel interleaved" mode. Shouldn't it? Again I say, "Something just ain't right about the Skulltrail!" |
OzzFan Send message Joined: 9 Apr 02 Posts: 15691 Credit: 84,761,841 RAC: 28 |
Did you take into account that it is "1600MHz dual independent Front Side Busses"? Well, that would be per FSB, not total between the two, of course. Memory Bandwidth Up to 32.0 GB/s of theoretical memory bandwidth for 667MHz memory in quad channel mode i.e. 4 DIMMs Up to 38.4 GB/s of theoretical memory bandwidth for 800MHz memory in quad channel mode i.e. 4 DIMMs Your DDR2-800MHz DIMMs get exactly 6.4GB/s theoretical. I guarantee you that you are not getting that in actual throughput. Synthetic benchmarks will only show you theoretical, not real world. Again I say, "Something just ain't right about the Skulltrail!" I'm not stating anything about the Skulltrail platform specifically. I'm just trying to help keep things grounded. It's easy to look at theoretical numbers and get caught up in it all, but the real world is always a much different picture. Skulltrail may very well be crippled. Wouldn't want it performing better than Xeons, would we? Gotta keep those market segments perfectly clear. ;) |
JDWhale Send message Joined: 6 Apr 99 Posts: 921 Credit: 21,935,817 RAC: 3 |
Should I believe that my DDR2-800 PC-6400, OC'd to 1000MHz (PC-8000), yields a theoretical bandwidth of 8.0GB/s and benchmarks at 7.55GB/sec with the SiSoftware Sandra benchmarking tool? Is 5.6% under theoretical maximum believable? I guess what I'm asking is whether you are familiar with the tool, and if so, can I believe the computed benchmarks? Go OZZY! |
OzzFan Send message Joined: 9 Apr 02 Posts: 15691 Credit: 84,761,841 RAC: 28 |
Should I believe that my DDR2-800 PC-6400, OC'd to 1000MHz (PC-8000), yields a theoretical bandwidth of 8.0GB/s and benchmarks at 7.55GB/sec with the SiSoftware Sandra benchmarking tool? I'm familiar with SiSoft Sandra; I even bought a couple of copies a few years ago, before I got sick of benchmarking theoretical numbers. SiSoft Sandra is a synthetic benchmark and does not show real-world performance; its numbers are theoretical only. Is 5.6% under theoretical maximum believable? Absolutely not. But it's perfectly within the margin of error for a maximum-theoretical benchmark. I guess what I'm asking is if you are familiar with the tool, and if so can I believe the computed benchmarks? You can believe they are 100% theoretical! ;) The results provided by SiSoft Sandra are not indicative of real-world performance whatsoever. Go OZZY! OZ-ZY! OZ-ZY! OZ-ZY! |
jason_gee Send message Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
my DDR2-800 running dual channel @ 960 gets ~ 7.5GBytes/sec too [On SiSoft Sandra Lite]. Checking the List of device bandwidths wiki, the theoretical is somewhere between: PC2-6400 DDR2[-800] SDRAM (dual channel) 102.4 Gbit/s 12.8 GB/s PC2-8000 DDR2[-1000] SDRAM (dual channel) 128.0 Gbit/s 16.0 GB/s So 95% of theoretical sounds on the high side to me ;D Jason "Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions. |
JDWhale Send message Joined: 6 Apr 99 Posts: 921 Credit: 21,935,817 RAC: 3 |
my DDR2-800 running dual channel @ 960 gets ~ 7.5GBytes/sec too [On SiSoft Sandra Lite]. Checking the List of device bandwidths wiki, the theoretical is somewhere between: Right, I was using the Oz-imposed theoretical max of 8 bytes times bus frequency times multiplier, not taking dual channel into account. So according to the "synthetic benchmark" results and the wiki theoreticals, we're at ~47% of theoretical; that's a bit more believable. That puts us in the ballpark of what OzzFan was saying about actual being ~theoretical/2.0. Thanks, John |
OzzFan Send message Joined: 9 Apr 02 Posts: 15691 Credit: 84,761,841 RAC: 28 |
my DDR2-800 running dual channel @ 960 gets ~ 7.5GBytes/sec too [On SiSoft Sandra Lite]. Checking the List of device bandwidths wiki, the theoretical is somewhere between: Ahh, yes. I made a mistake in my calculation in that I calculated for single channel operation of DDR2-800 (which is exactly half @ 6.4GB/s). You guys made me curious, so I downloaded the latest trial version and ran the memory benchmark. I'm only getting 6.2GB/s throughput on my Intel 5000X chipset using DDR2-667 FB-DIMMs in dual channel interleaved mode, whereas my total should be 20.8GB/s. I once read the problem was with the FB-DIMMs, because they have higher latency due to being fully buffered and typically having ECC too. I don't have a DDR2 non-ECC system on hand to test, but I can probably test my g/f's AMD Athlon X2 system (DDR2-800 ECC). |
JDWhale Send message Joined: 6 Apr 99 Posts: 921 Credit: 21,935,817 RAC: 3 |
I would tend to believe those numbers, OzzFan. I just had a peek at some of your results. Granted, you are running the stock application, but the variation in runtimes for your VHAR WUs is maybe 30%. I think the differences can be attributed to the varying WU mix and, with it, the varying bandwidth requirements to feed the cores. It's been my experience that you need in excess of 8GB/sec to get similar runtimes on VHARs feeding 4 cores. You might try setting your local preferences to use at most 3 processors and see if your CPU times stabilize. If they do, then it confirms a bandwidth issue. Sandra just might be your friend. |
OzzFan Send message Joined: 9 Apr 02 Posts: 15691 Credit: 84,761,841 RAC: 28 |
No, I can't attribute it to bandwidth issues entirely. Before I ran Vista on my Xeon machine, I was running XP SP2 and would consistently get a RAC of 2,400 or so. Since switching over to Vista, my RAC has dropped to what it is now (roughly 1,400-1,500). Granted, I'm also running CPDN on this machine too, but it's too large a drop to be attributed to bandwidth issues alone. Unfortunately, I like Vista more than XP, so SETI will just have to suffer the consequences. Besides, I'm not one of those looking to eke out every ounce of performance for SETI. I do what I do on my systems and I let SETI have whatever is left over. I'm not concerned about changing my processor preferences or anything else. I let my RAC fall where it may. |
JDWhale Send message Joined: 6 Apr 99 Posts: 921 Credit: 21,935,817 RAC: 3 |
OzzFan wrote: No, I can't attribute it to bandwidth issues entirely. Before I ran Vista on my Xeon machine, I was running XP SP2 and would consistently get a RAC of 2,400 or so. Since switching over to Vista, my RAC has dropped to what it is now (roughly 1,400-1,500). Granted, I'm also running CPDN on this machine too, but it's too large a drop to be attributed to bandwidth issues alone. Unfortunately, I like Vista more than XP, so SETI will just have to suffer the consequences. Sorry to hear about the possible Vista performance hit. I'm running Vista Ultimate on my Q6600 (it shows up as Vista Professional) and am very pleased with the performance. And I agree about the whole Vista experience; it pains me to have to use XP anymore. Crunch on, OzzFan. It's time for my "Wheels of Confusion" fix ;-) |
jason_gee Send message Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
Well, the access patterns in the algorithms in the SaH opt apps that I've managed to peek at so far would tend to rely initially on latency, then on cache performance/size, and then, once those are maxed out, resort to taxing main memory, but not peeling through contiguous gigabytes like you might expect from a media streaming server or something. So maybe for SETI use it might be prudent to have a look at what's going on with the latency aspect too. For example, I noticed some points in some functions were hardcoded to prefetch ahead around 512 bytes in critical loops. When the intrinsic portions were compiled with the Intel compiler, the prefetches were shifted slightly and extended to ~1024 bytes ahead, implying to me the compiler realises the memory subsystem itself will take some time to wind up. When I manually adjusted a few prefetches to larger values, around 4096 bytes ahead, some small further improvements were to be found in some cases (though I have found the Intel compiler w/intrinsics pretty hard to beat with hand assembly so far). What that implies to me is the obvious point that has been stated earlier in the thread: OC'ing the CPU alters the bandwidth/latency combination at play. So I propose, as I'm sure I've heard others suggest before in this forum, that winding back the memory clock so the latency figures can be reduced to minimal/optimal access times may be an option. As eight slots of FB-DIMM would be branch interleaved, increasing bandwidth but raising latency significantly, using only four slots might reduce bandwidth a little but improve latency a lot, especially if, for these small VHAR/VLAR tasks, that proves to provide better performance. Could this mean it may be lower latency to use only 2 slots on the SKT? Or has the extra branch been 'left off' the board entirely? We also haven't really considered possible FSB limitations yet, have we? What kind of latency do the FB-DIMMs / 5000X chipset get anyway? 
Should be higher with all the buffering & branch interleaving going on, shouldn't it? For reference, SiSoft Sandra Lite Memory Latency Test: DDR2-800 @ 960 on Bearlake G33 chipset @ slack timings = 75ns (lower is better) "Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions. |
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14653 Credit: 200,643,578 RAC: 874 |
...... I did an 'enforced test' of this when I got my octocore (Dell Precision 490 workstation, OEM version of Intel 5000 board, supplied with 2 x 1GB FB DIMMs @ 667). It felt really slow as a SETI cruncher - speeded up by about 25% when I put in an extra two DIMMs, as you can see here. Mind you, that was back in the day - I think I was using KWSN v1.41 for those tests. The prefetch pattern may be different for newer optimisations. Edit - I think I'm getting about 5.1GBytes/sec bandwidth on Sandra Lite XII SP1 with the current 4 DIMMs, but I'm finding the Sandra report quite difficult to interpret. Sandra thinks I'm FSB limited (1066 bus). |
jason_gee Send message Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
I did an 'enforced test' of this when I got my octocore (Dell Precision 490 workstation, OEM version of Intel 5000 board, supplied with 2 x 1GB FB DIMMs @ 667). It felt really slow as a SETI cruncher - speeded up by about 25% when I put in an extra two DIMMs, as you can see here. It does sound like it might have been running single channel originally, perhaps? Seems like a strange thing for a manufacturer to do, I suppose. Well, back on the topic of the thread: about one half of the reviews I've read seem to think the platform is the most potent machine they've seen, and the other half that it's somehow knobbled... Be interesting to see what pans out. Back to my dark cave to look at some more code. Jason "Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions. |
Daniel Send message Joined: 21 May 07 Posts: 562 Credit: 437,494 RAC: 0 |
|
JDWhale Send message Joined: 6 Apr 99 Posts: 921 Credit: 21,935,817 RAC: 3 |
I did an 'enforced test' of this when I got my octocore (Dell Precision 490 workstation, OEM version of Intel 5000 board, supplied with 2 x 1GB FB DIMMs @ 667). It felt really slow as a SETI cruncher - speeded up by about 25% when I put in an extra two DIMMs, as you can see here. Thanks for refreshing our memories with that thread. Now imagine feeding 8 cores like Skulltrail is doing, and I think you'd see performance stepping back to something similar to before your memory upgrade!
Looking back at the BIOS settings photos in the Tom's Hardware review, it seems that the D5400XS (Skulltrail) board with its 4 memory slots can be configured as
|
OzzFan Send message Joined: 9 Apr 02 Posts: 15691 Credit: 84,761,841 RAC: 28 |
Edit - I think I'm getting about 5.1GBytes/sec bandwidth on Sandra Lite XII SP1 with the current 4 DIMMs, but I'm finding the Sandra report quite difficult to interpret. Sandra thinks I'm FSB limited (1066 bus). SiSoft said the same thing about my system, and I went with the 1333FSB Xeons. SiSoft says I'm only getting about 59% of my total bandwidth. I remember using SiSoft on an old Pentium 4 with SDRAM, and it said it was only using about 55% of total bandwidth. SiSoft seemed to favor RDRAM, as another P4 system with RDRAM said its total bandwidth was at 85%. I just don't trust synthetic benchmarks anymore. They really don't indicate real-world performance (but I'll admit they're fun to look at!). |
JDWhale Send message Joined: 6 Apr 99 Posts: 921 Credit: 21,935,817 RAC: 3 |
I just don't trust synthetic benchmarks anymore. They really don't indicate real-world performance (but I'll admit they're fun to look at!). Maybe not, but they can tell you performance under ideal circumstances. Of course, actual (real world) performance will be different for everybody. The "synthetic benchmarks" really are moving data at the rates indicated, and they do provide a basis for comparison. Isn't that what benchmarks are supposed to do? Take, for instance, the EPA Mileage Estimates used to rate the fuel efficiency of automobiles here in the USA. Very few people, if any, truly achieve those City/Highway numbers, though I do get pretty darn close. Driving habits and conditions vary for each driver, but the numbers still provide a basis for comparison, which is their purpose. |
OzzFan Send message Joined: 9 Apr 02 Posts: 15691 Credit: 84,761,841 RAC: 28 |
Maybe not, but they can tell you performance under ideal circumstances. Of course, actual (real world) performance will be different for everybody. What good is ideal if everyone's "ideal" is different? Benchmarks are about as useful as those fuel efficiency ratings you mentioned. If nobody achieves them, then how do we know what pertains to us? It really depends on what you want to find out about the architecture. If all you want is theoreticals, then benchmarks will tell you everything you want to know. If you want to know how it will perform with a specific set of software, then the benchmarks don't tell you much. Just like most people never achieve maximum fuel efficiency, most software never achieves maximum system efficiency. IMO the benchmarks become moot, nothing more than pretty graphs to look at. It would be like looking at my gross income and imagining what I "could have" if I didn't pay any of my bills. My net income can vary depending on what bills I decide to take on to improve my life (or at least I convince myself that all of it is for my betterment). Most software's performance will vary depending on how it taxes a system, but it will never achieve the system's maximum potential. So I proffer: what good is maximum potential if it can never be achieved? |
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.