Q9550 vs. Q6600, a comparison of two hosts

Message boards : Number crunching : Q9550 vs. Q6600, a comparison of two hosts
archae86
Joined: 31 Aug 99
Posts: 909
Credit: 1,582,816
RAC: 0
United States
Message 830565 - Posted: 14 Nov 2008, 22:35:16 UTC

I have recently been operating a Q6600 at 2.79 GHz, and a new build Q9550 at stock 2.83 GHz.

In making some comparisons of these hosts to early Nehalem results, I was surprised to see how much advantage the Penryn-generation Q9550 had over the Conroe-generation Q6600. This advantage enlarged considerably when I compared work units run with a full load of 4 SETI tasks, rather than the 1 SETI paired with 3 Einsteins which made up my original data.

So here are a few graphs, comparing these two hosts, both when running 1 SETI and 3 Einsteins, and when running 4 SETI tasks.


Here is an enlargement near Angle Range 0.4.
The Q6600 suffers a much bigger penalty from running four SETI tasks than does the Q9550--and also has much more variation.

And here is a closer look at the VHAR data for Angle Range above 1.
As at mid AR, the Q6600 results are much more impaired, much more variable, and much slower.


I realize that the Q6600 has been a favorite build base for folks building machines for which BOINC performance and cost are both considerations. I think a life-cycle cost comparison, taking into account the much higher power consumption of a Q6600 host overclocked enough to come anywhere near matching the Q9550's stock performance, is enough to make the Q9550 a clear winner.

My own new-build Q9550 is not a bare-bones build, but is pretty lean--one optical drive, one hard drive, a cheap motherboard, and a cheap fanless graphics card. Two 120mm Yate Loon fans on the case and one 120mm PWM fan on the CPU keep things admirably cool with not much acoustic noise--much less than the 8-year-old Dell.

I'm undervolting it somewhat at stock frequency (1.14375 V requested, 1.064 V detected). The system ran stably at my last trial voltage (1.1125 V requested) for a week, after which I raised it 5 CPU voltage increments for long-term safety. With four Einsteins running, the total system draw at the power plug is 113 watts.
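As a quick check on the voltage figures above, the size of one CPU voltage increment can be worked out from the two requested values. This is only a sketch: it assumes the current 1.14375 setting is the result of raising the 1.1125 trial setting by exactly five equal increments, and the function name is purely illustrative.

```python
def voltage_step(trial_v: float, final_v: float, increments: int) -> float:
    """Return the size of one voltage increment, assuming equal-sized steps."""
    return (final_v - trial_v) / increments


# Requested values from the post: 1.1125 ran stably for a week,
# then the setting was raised 5 increments to 1.14375.
step = voltage_step(1.1125, 1.14375, 5)
print(f"{step * 1000:.2f} mV per increment")
```

Under those assumptions the step works out to 0.00625 V (6.25 mV).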

At 113 watts it does not meet my original hope of matching the power consumption of the eight-year-old 933 MHz Coppermine box it replaces, but I expect to save the difference and more by backing my Q6600 and E6600 boxes down from their modest 2.79 GHz overclock and corresponding overvoltage.
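For anyone wanting to put the life-cycle cost argument into numbers, here is a rough sketch of the energy arithmetic. The 113 W figure is the measured wall draw from this post; the 24/7 duty cycle and the electricity price are assumptions to replace with your own, and no wall-draw figure for the Q6600 host is given here, so that side of the comparison has to come from your own measurement.

```python
HOURS_PER_YEAR = 24 * 365  # assumes the host crunches around the clock


def annual_kwh(watts: float) -> float:
    """Energy used in one year of continuous running, in kWh."""
    return watts * HOURS_PER_YEAR / 1000.0


def annual_cost(watts: float, price_per_kwh: float) -> float:
    """Yearly electricity cost for a host drawing `watts` at the wall."""
    return annual_kwh(watts) * price_per_kwh


# 113 W is the measured draw of the Q9550 host with four tasks running.
q9550_kwh = annual_kwh(113)

# ASSUMPTION: 0.12 per kWh is a placeholder price, not a figure from the post.
q9550_cost = annual_cost(113, 0.12)

print(f"{q9550_kwh:.2f} kWh per year, {q9550_cost:.2f} per year at the assumed price")
```

Running the same functions with a measured wall draw for an overclocked Q6600 host gives the yearly difference to weigh against the purchase price gap.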

I don't know how much of the considerable SETI output advantage at same clock comes from:
1. bigger cache
2. SSE4.1 vs. SSSE3X
3. Penryn architectural improvements over Conroe
4. chipset advantage of my ASUS P5QL Pro on the Q9550 over the Gigabyte 965P-DS3 on the Q6600

I'd hazard a guess that, of these, the considerably reduced degradation in going from 1 to 4 SETI tasks is most likely from the bigger cache. So I'd be hesitant to assume these results apply to smaller-cache Penryn-generation quads. (The Q9550 was the cheapest big-cache variant on offer when I made the purchase.)



Fred J. Verster
Volunteer tester
Joined: 21 Apr 04
Posts: 3252
Credit: 31,903,643
RAC: 0
Netherlands
Message 830581 - Posted: 14 Nov 2008, 23:10:06 UTC - in response to Message 830565.  
Last modified: 14 Nov 2008, 23:17:31 UTC

Hi, I think you got it right there. The cache and the 45 nm structure make the biggest difference.
I have a 9650 @ 3 GHz compared to a 6600 @ 3 GHz; the 9650 had a 25%~30% gain on SETI MB WUs, running XP x86 Pro, BOINC 5.10.45 and the SSSE3x AK optimized app.
Both were on an ASUS P5E (X38) with the same RAM, too. Compared over a month's time.

The QX9650 now runs at 3350 MHz on XP64(!), which also gives a boost in speed. I have not really taken the time to OC it further, but in time ;) It ran at 4 GHz, but I DID fry my RAM :(
{565 MHz, 1.82 V appeared to be too much for standard 400 MHz sticks}
jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 830731 - Posted: 15 Nov 2008, 4:36:45 UTC - in response to Message 830565.  

I don't know how much of the considerable SETI output advantage at same clock comes from:
1. bigger cache
2. SSE4.1 vs. SSSE3X
3. Penryn architectural improvements over Conroe
4. chipset advantage of my ASUS P5QL Pro on the Q9550 over the Gigabyte 965P-DS3 on the Q6600


All those things, but mostly how the different setups each deal with contention on the bus, as evidenced by the wider gap when running 4 WUs. The Q6600, being OC'd, is placing more pressure on the bus. The stock-clocked 9550 likely has some headroom there, so it can benefit from the wider strides incorporated in the SSE4.1 app (breathing room). It comes down to how multiple hardware prefetcher requests are concatenated on the Core 2 architecture, and how single large versus multiple small requests react under high load. That is to say, where the bus has headroom the larger-stride SSE4.1 app will have an advantage, but it will be more prone to contention under OC conditions. The smaller-stride SSSE3x app will have a finer granularity of requests, so fewer of its requests encounter contention, giving it an advantage under pressure.

In short, if the 9550 were to be OC'd, you'd find a crossover point where the SSSE3x app would perform equally, and beyond that point better. I've little doubt that the hardware prefetch mechanisms have also undergone further tuning in the newer chips, making for the appreciable stock-speed SSE4.1 performance gap.

Jason

"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
Clyde C. Phillips, III
Joined: 2 Aug 00
Posts: 1851
Credit: 5,955,047
RAC: 0
United States
Message 830904 - Posted: 15 Nov 2008, 19:03:18 UTC

So are you saying that the green is the Penryn and the red is the Conroe?
archae86
Joined: 31 Aug 99
Posts: 909
Credit: 1,582,816
RAC: 0
United States
Message 830913 - Posted: 15 Nov 2008, 19:42:26 UTC - in response to Message 830904.  

So are you saying that the green is the Penryn and the red is the Conroe?
Oops, sorry I got sloppy on notation. Color denotes number of SETI tasks, while symbol shape denotes the host.

Red is for the _4 case--running four SETI tasks, while Green is the _1 case, running a single SETI plus three Einsteins.

The system is denoted by the shape--the open square is the Penryn-class quad (Yorkfield) while the x is the Conroe-class Quad (Kentsfield, formally).

There was an indirect clue in the titles--a Conroe could not have been running SSE4.1 code--but I did not intend to make this a brain teaser.

By the way, I was obviously running a lot of Einstein on these same hosts, and the Q9550 advantage at 2.83 GHz over the 2.79 GHz Q6600 is only about 10% on Einstein--much less than the SETI advantage. And, yes, I'm aware of the periodic cycle issue in comparing Einstein execution times.



Clyde C. Phillips, III
Joined: 2 Aug 00
Posts: 1851
Credit: 5,955,047
RAC: 0
United States
Message 831210 - Posted: 16 Nov 2008, 19:07:21 UTC

It looks, then, to me, that the Penryn is considerably slower than the Conroe, since the times for the x are quicker than the times for the square.
archae86
Joined: 31 Aug 99
Posts: 909
Credit: 1,582,816
RAC: 0
United States
Message 831232 - Posted: 16 Nov 2008, 20:39:52 UTC - in response to Message 831210.  

It looks, then, to me, that the Penryn is considerably slower than the Conroe, since the times for the x are quicker than the times for the square.
Oops, squared. I must not have had my posting cap tied on at all yesterday.

I said it wrong, and it is too late for me to edit my errors, so let me try to say it right this time:

Stoll5 is a new system, a Q9550 Penryn-class quad (Yorkfield) running at stock 2.83 GHz. Its times are shown as x's, green when running one SETI plus three Einsteins, red when running four SETIs.

Stoll4 is a system built in July 2007, a Q6600 Conroe-class quad (Kentsfield) which during these measurements was running a mild overclock of 2.79 GHz (stock would be 2.4 GHz). Its times are shown as squares, green when running one SETI plus three Einsteins, red when running four SETIs.

I still stand by the interpretative comments I made in post 830565--at these nearly identical clock rates the Q9550 running SSE4.1 build 41 has a substantial advantage over the Q6600 running SSSE3X build 41, and the advantage is largest when running 4 SETIs.

So for any new builds for which performance and power are both part of the cost consideration, the much higher SETI performance and much lower power consumption per unit SETI output of the Q9550 merit strong consideration over a Q6600 of lower initial purchase price.



