Some goodies from upcoming AMD K10

Message boards : Number crunching : Some goodies from upcoming AMD K10

-= Vyper =-
Volunteer tester
Joined: 5 Sep 99
Posts: 1652
Credit: 1,065,191,981
RAC: 2,537
Sweden
Message 628474 - Posted: 28 Aug 2007, 21:39:03 UTC
Last modified: 28 Aug 2007, 21:40:55 UTC

The INQ has posted a quick-and-dirty write-up of their latest encounter with the K10 Barcelona.

Overclocked to 3 GHz and coupled with 2 x ATI GPU boards clocked at 830 MHz, it can produce a 3DMark06 score of just over 30K.

It really does seem promising, but only time will tell. In S@H terms, I presume the SSE4 build for Penryn will still beat the K10, but in other respects it could be a close call.

Check it out

I'm really thrilled about what to expect after all.
The K7 made me go AMD in 1999 when it made its entrance. I went Intel again with the introduction of Northwood, then back to AMD's K8 platform (heat and the underperforming NetBurst architecture, with its very long pipeline, were Intel's issues back then), and I am now back to only Core 2-based systems chugging along at home. Best bang for the heat/buck is always the motto :D

Kind Regards Vyper

_________________________________________________________________________
Addicted to SETI crunching!
Founder of GPU Users Group
Sirius B Project Donor
Volunteer tester
Joined: 26 Dec 00
Posts: 24905
Credit: 3,081,182
RAC: 7
Ireland
Message 628484 - Posted: 28 Aug 2007, 21:55:08 UTC - in response to Message 628474.  

Nice one, Vyper. If they are telling the whole truth, then it will be very close indeed. I have been with AMD since summer 1991, after suffering continuous headaches with my Intel systems.

As stated in another thread, I think I will wait until September before completing my server build.

-= Vyper =-
Volunteer tester
Joined: 5 Sep 99
Posts: 1652
Credit: 1,065,191,981
RAC: 2,537
Sweden
Message 628748 - Posted: 29 Aug 2007, 8:33:19 UTC - in response to Message 628484.  

Hehe, you oughta.

But remember that those tests were of the "newer" K10 which will arrive later on: the B2 stepping.

With the ones arriving now, they have pushed that silicon revision nearly to the edge, if I'm not mistaken.

If you're gonna build a non-OCed server, then perhaps the upcoming K10 is the way to go in terms of heat/power/performance ratio, but no one can really tell until the reviewers do a full test of this CPU.

Only time will tell and that time is soon here.

Kind Regards Vyper

_________________________________________________________________________
Addicted to SETI crunching!
Founder of GPU Users Group
popandbob
Volunteer tester
Joined: 19 Mar 05
Posts: 551
Credit: 4,673,015
RAC: 0
Canada
Message 630415 - Posted: 31 Aug 2007, 22:42:02 UTC

I have found a nice comparison between CPUs...

http://www23.tomshardware.com/cpu_2007.html?modelx=33&model1=946&model2=882&chart=418

If the new AMD is really hitting 30K, then it's 9x faster than the current fastest Core 2 Quad (QX6850, 3.0 GHz), which clocked in at a 3DMark06 score of 11,765.

~BoB


Do you Good Search for Seti@Home? http://www.goodsearch.com/?charityid=888957
Or Good Shop? http://www.goodshop.com/?charityid=888957
Francois Piednoel
Joined: 14 Jun 00
Posts: 898
Credit: 5,969,361
RAC: 0
United States
Message 630735 - Posted: 1 Sep 2007, 3:08:47 UTC - in response to Message 630415.  
Last modified: 1 Sep 2007, 3:19:17 UTC

Don't forget the awesome Cinebench and SuperPI scores, they are awesome too... poooo pooooo pi doooooo

K10 getting beaten to death

poooo poooo pi dooooo

I feel much better now: it was all smoke, and the press will widely confirm it, except for a few hand-picked memory benchmarks. Over at the INQ, Theo has found a hole in the space/time continuum and gets processor scaling that beats frequency scaling, so 1+1 = 3, 4 x 5 = 23, etc... Yes, going from 2.5 GHz to 3.0 GHz is a much smaller step than the score increase % described in the article... I think he caught a big fish, the kind that is 21 cm when you catch it but becomes 50 cm when you tell your friends about it. If you study this benchmark a little, you'll figure out that the CPU has a 6 to 7% impact on the global score; good luck doing what Theo describes. I am saying this... you know, just to say...

When Theo tries to do some propaganda, he should at least get the math right.

And, since I am here: K10 = K5 x 2!

pooo pooo pi doooo. I'll believe the 3DMark scores when I see them in the 3DMark Hall of Fame; for the moment, it is more like the Hall of Shame.

who?
Again, this is my own opinion, my employer is not responsible, but God!!! I love Blue!
popandbob
Volunteer tester
Joined: 19 Mar 05
Posts: 551
Credit: 4,673,015
RAC: 0
Canada
Message 630803 - Posted: 1 Sep 2007, 5:19:18 UTC

A simple guess as to why the huge speedup... Could the limit have been clock speed rather than bandwidth? I know that on my Core 2, OC'ing from 2.13 to 2.6 GHz (~20%) gives me about a 40% speed boost.

~BoB


Do you Good Search for Seti@Home? http://www.goodsearch.com/?charityid=888957
Or Good Shop? http://www.goodshop.com/?charityid=888957
Francois Piednoel
Joined: 14 Jun 00
Posts: 898
Credit: 5,969,361
RAC: 0
United States
Message 630824 - Posted: 1 Sep 2007, 6:06:56 UTC - in response to Message 630803.  

I am sorry to tell you that it does not do that on 3DMark2006. I am curious, though: if you see this behavior, I want to see it. I have never seen this happen.
I think you are overclocking the FSB to overclock the CPU, so you get some benefit out of that, but 40%? I don't think so.
Over-scaling is ALWAYS due to measurement error.


who?
ML1
Volunteer moderator
Volunteer tester
Joined: 25 Nov 01
Posts: 20982
Credit: 7,508,002
RAC: 20
United Kingdom
Message 630935 - Posted: 1 Sep 2007, 11:05:29 UTC - in response to Message 630803.  

A simple guess as to why the huge speedup... Could the limit have been clock speed rather than bandwidth? I know that on my Core 2, OC'ing from 2.13 to 2.6 GHz (~20%) gives me about a 40% speed boost.

That is my suspicion...

System performance depends on all the parts: whichever is slowest for a given operation sets the pace.

If you read the text carefully, you'll notice that the entire system was OC-ed, including the graphics cards...
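The point above can be sketched with a toy model in Python. All of it is hypothetical: it assumes the CPU and GPU work on consecutive frames in parallel, so each frame costs as long as the slower of the two; the millisecond costs are made up; and an overclock is assumed to speed a component's work up in proportion to its clock.

```python
def frame_time_ms(cpu_ms: float, gpu_ms: float) -> float:
    """Pipelined CPU/GPU: each frame costs as much as the slower stage."""
    return max(cpu_ms, gpu_ms)


def fps(cpu_ms: float, gpu_ms: float) -> float:
    return 1000.0 / frame_time_ms(cpu_ms, gpu_ms)


# Hypothetical per-frame costs for a GPU-bound scene.
CPU_MS, GPU_MS = 10.0, 20.0
OC = 1.2  # 20% overclock, assuming work time scales inversely with clock

base = fps(CPU_MS, GPU_MS)
cpu_only = fps(CPU_MS / OC, GPU_MS)           # only the CPU overclocked
whole_system = fps(CPU_MS / OC, GPU_MS / OC)  # CPU and graphics cards overclocked

print(f"baseline:     {base:.1f} fps")
print(f"CPU OC only:  {cpu_only:.1f} fps")
print(f"whole system: {whole_system:.1f} fps")
```

In this GPU-bound toy case, overclocking only the CPU changes nothing, while overclocking everything scales the frame rate with the clock, which is why it matters that the graphics cards were overclocked too.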


At the moment, this is still all smoke and random numbers and rhetoric until more believable tests are published.

It's still looking interesting, regardless of whatever religious motivations!


Who? Keep up the good work. It will be very good to see the vast areas of silicon presently wasted on cache better utilised for computing instead. The (very old, i360 old) FSB limitation needs clearing first!

Happy crunchin',
Martin

See new freedom: Mageia Linux
Take a look for yourself: Linux Format
The Future is what We all make IT (GPLv3)
ML1
Volunteer moderator
Volunteer tester
Joined: 25 Nov 01
Posts: 20982
Credit: 7,508,002
RAC: 20
United Kingdom
Message 630937 - Posted: 1 Sep 2007, 11:10:11 UTC - in response to Message 630935.  

... It will be very good to see the vast areas of silicon presently wasted on cache better utilised for computing instead. The (very old, i360 old) FSB limitation needs clearing first!

On a related note:

We already have PCIe, AMD's HyperTransport, and others such as SATA, USB and FireWire. Looking at Intel's proposal for its own "HyperTransport", why oh why can't everyone settle on one fast serial interconnect?

Or would having all system components 'plug-n-play' and interchangeable all on one common interconnect standard cause too much competition for the vendors?...

Happy crunchin',
Martin

See new freedom: Mageia Linux
Take a look for yourself: Linux Format
The Future is what We all make IT (GPLv3)
ohiomike
Joined: 14 Mar 04
Posts: 357
Credit: 650,069
RAC: 0
United States
Message 631029 - Posted: 1 Sep 2007, 14:22:28 UTC - in response to Message 630937.  

<snip>
Or would having all system components 'plug-n-play' and interchangeable all on one common interconnect standard cause too much competition for the vendors?...

Happy crunchin',
Martin

It would be like all cars having interchangeable motors, connected to the vehicle by a V-belt.

Boinc Button Abuser In Training >My Shrubbers<
popandbob
Volunteer tester
Joined: 19 Mar 05
Posts: 551
Credit: 4,673,015
RAC: 0
Canada
Message 631164 - Posted: 1 Sep 2007, 17:08:22 UTC - in response to Message 630824.  

Upped FSB, CPU voltage, and RAM voltage.

Also note in the article that the stock 2.5 GHz pulled in 25,000 (with OC'ed vid cards). I think it's safe to say that without OC'ing anything it would be pulling around 20,000.

~BoB


Do you Good Search for Seti@Home? http://www.goodsearch.com/?charityid=888957
Or Good Shop? http://www.goodshop.com/?charityid=888957
Francois Piednoel
Joined: 14 Jun 00
Posts: 898
Credit: 5,969,361
RAC: 0
United States
Message 631221 - Posted: 1 Sep 2007, 18:07:16 UTC - in response to Message 631164.  
Last modified: 1 Sep 2007, 18:10:16 UTC

I know that many people would like to "dream" that it is true. Nevertheless, there are rules of physics and computer science, and one of them is Amdahl's law.
This is one of those laws that can't be cracked, and Theo's article just blows this law up.

Now, let me give you more details. 3DMark2006 is a program that is heavily x87- and SSE-scalar-intensive. I invite anybody to download VTune and check for themselves. Memory is a small part of the equation, which is a big difference from 3DMark2005 (that one is a memory test: 05 is just copying buffers to the GPU).
Since the VTune profile shows 3DMark06 as CPU- and GPU-limited, you want to look at where it can scale. A Kentsfield with the same GPUs as Theo's will give you around 18,000 at 2.93 GHz, using the same GPU frequency and memory speed. Now, if you cut the CPU frequency to 2.0 GHz, the score does not go much lower. That means the overall score is not very sensitive to the CPU; in fact, the CPU accounts for about 8% of the score. So, if you respect Amdahl's law, Theo's processor would have had to be about 10 times faster than a Core 2 QX6850... hummm hummm
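The Amdahl's-law argument can be checked with a few lines of Python. This is a minimal sketch under stated assumptions: it takes the ~8% CPU share quoted above as a fraction of run time, treats the score as inversely proportional to total run time (a simplification of how 3DMark06 combines its sub-scores), and assumes the CPU-bound part scales linearly with clock. The 25,000 and 30,000 figures are the ones cited earlier in this thread.

```python
def amdahl_speedup(fraction: float, factor: float) -> float:
    """Overall speedup when only `fraction` of the run time gets `factor` times faster."""
    return 1.0 / ((1.0 - fraction) + fraction / factor)


# Assumption: ~8% of 3DMark06 run time is CPU-bound (the figure quoted above).
CPU_FRACTION = 0.08

# 2.5 GHz -> 3.0 GHz, assuming the CPU-bound part scales linearly with clock.
clock_gain = 3.0 / 2.5
predicted_gain = amdahl_speedup(CPU_FRACTION, clock_gain) - 1.0

# Even an infinitely fast CPU only removes the CPU-bound 8% of the time.
ceiling_gain = 1.0 / (1.0 - CPU_FRACTION) - 1.0

# Score change cited in this thread: ~25,000 at stock 2.5 GHz vs ~30,000 at 3.0 GHz.
reported_gain = 30_000 / 25_000 - 1.0

print(f"predicted gain from the clock bump:  {predicted_gain:.1%}")
print(f"ceiling with an infinitely fast CPU: {ceiling_gain:.1%}")
print(f"reported gain:                       {reported_gain:.1%}")
```

Under those assumptions a 20% clock bump buys roughly 1.4%, and even an infinitely fast CPU tops out below 9%, well short of the reported 20%. If any assumption fails (for example, if other components were also overclocked between runs), the comparison no longer holds.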
I know it is nice to dream, but it is very unlikely.
Especially because the code is x87 and SSE scalar. Mr. Kanter has a very nice architecture tutorial on his web site: Page 6
There, you can see that the main increase in resources from K8 to K10 is the width of the execution units, which goes from 64 to 128 bits. On x87 and SSE scalar code, this will not help you AT ALL (K8 and K10 execute the same number of x87 or SSE scalar operations per cycle, according to AMD's own optimization guide).
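A deliberately simplified Python model of the width argument: it assumes a 128-bit packed SSE operation needs two passes through a 64-bit-wide unit (K8) and one through a 128-bit unit (K10), while scalar SSE operations fit in a single pass on either. Latencies, the number of units and x87's 80-bit internal format are left out, so this illustrates the claim rather than modelling either core.

```python
import math


def passes(op_bits: int, unit_bits: int) -> int:
    """Passes through one execution unit needed for an operand of op_bits width."""
    return math.ceil(op_bits / unit_bits)


# Simplified FP/SIMD unit widths, per the post above.
UNIT_BITS = {"K8": 64, "K10": 128}

# Operand widths for a few SSE operation types (x87 deliberately left out).
OPS = {
    "scalar single (SSE)": 32,
    "scalar double (SSE2)": 64,
    "packed 128-bit (SSE/SSE2)": 128,
}

for name, bits in OPS.items():
    k8 = passes(bits, UNIT_BITS["K8"])
    k10 = passes(bits, UNIT_BITS["K10"])
    print(f"{name}: K8 {k8} pass(es), K10 {k10} pass(es), gain {k8 / k10:.0f}x")
```

Only the packed row changes between the two designs in this model, which is the poster's point about scalar code.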
There is nothing in the architecture that can justify a 10X speedup on x87. Cache speed is superior on Core 2, and there is no "write combining between the two L2s of Core 2 on 3DMark06", so really NOTHING can justify breaking Amdahl's law.

Conclusion: Theo was tricked or misled by somebody, or he had too many exotic drinks.

It is just plain impossible.

who?
ML1
Volunteer moderator
Volunteer tester
Joined: 25 Nov 01
Posts: 20982
Credit: 7,508,002
RAC: 20
United Kingdom
Message 631317 - Posted: 1 Sep 2007, 20:08:43 UTC - in response to Message 631221.  
Last modified: 1 Sep 2007, 20:09:05 UTC

... Mr Kanter, on his web site has a very nice architecture teaching web page: Page 6

Thanks for the link. A long read but very interesting.

Regardless of the latest dubious ghosts 'n' smoke numbers, it looks like it's going to stay a close-run race between Intel and AMD. The architectures suggest AMD will maintain its server advantage whilst Intel still holds onto the desktop scene...

Happy crunchin',
Martin

See new freedom: Mageia Linux
Take a look for yourself: Linux Format
The Future is what We all make IT (GPLv3)
Francois Piednoel
Joined: 14 Jun 00
Posts: 898
Credit: 5,969,361
RAC: 0
United States
Message 631322 - Posted: 1 Sep 2007, 20:14:27 UTC - in response to Message 631317.  
Last modified: 1 Sep 2007, 20:22:53 UTC

Trust me, I did my homework. The race will not happen:
It will not even get close to the Core 2 QX6850. On the desktop, it is over. On servers, I don't know, I am not an expert in that area, but on the desktop there is NO way. On a single socket, it will be fun... at least for me.

And Penryn will definitely widen the gap where it matters.
Let me add this: K10 will improve nicely compared to K8 on packed MMX/SSE2, and this is expected. Wherever it matters, SSE4 will widen the gap again. Most vendors are already committed to SSE4 (you saw Windows Media, DivX), so most of the MMX world that matters is covered by SSE4. The 128-bit units of K10 will be too short. Many software vendors jumped on the SSE4 MPSADBW instruction because it makes so much sense: they requested it for years, and now that they have it, they love it.
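For readers unfamiliar with MPSADBW, here is a scalar Python model of what the SSE4.1 instruction computes, written from Intel's published description rather than taken from any vendor's code: eight sums of absolute differences (SADs) between one 4-byte block and eight overlapping 4-byte windows, which is the inner loop of block-matching motion search in video encoders. The function name, the byte-string interface and the sample data are illustrative assumptions.

```python
def mpsadbw(dst: bytes, src: bytes, imm8: int) -> list[int]:
    """Scalar model of SSE4.1 MPSADBW on two 16-byte operands.

    A 4-byte block of `src` (chosen by imm8 bits 1:0) is compared against eight
    overlapping 4-byte windows of `dst` (starting at byte 0 or 4, chosen by imm8
    bit 2), giving eight 16-bit sums of absolute differences.
    """
    if len(dst) != 16 or len(src) != 16:
        raise ValueError("operands must be 16 bytes (one XMM register each)")
    src_off = (imm8 & 0b11) * 4
    dst_off = ((imm8 >> 2) & 0b1) * 4
    block = src[src_off:src_off + 4]
    return [
        sum(abs(dst[dst_off + i + j] - block[j]) for j in range(4))
        for i in range(8)
    ]


# Hypothetical motion-search step: where does the 4-pixel block best match?
current = bytes(range(16))              # row of reference pixels (made up)
block = bytes([5, 6, 7, 8] + [0] * 12)  # block to locate, in bytes 0..3
sads = mpsadbw(current, block, 0)
best = min(range(8), key=sads.__getitem__)
print(sads, "best offset:", best)
```

The index of the smallest SAD picks the best-matching offset; one instruction produces all eight candidates at once, where plain SSE2 PSADBW yields one SAD per 8-byte group.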

who?
PS: This is my personal opinion.
popandbob
Volunteer tester
Joined: 19 Mar 05
Posts: 551
Credit: 4,673,015
RAC: 0
Canada
Message 631385 - Posted: 1 Sep 2007, 21:13:38 UTC

Who?,

Please do explain, then: if the CPU does not make much difference, why is there such a huge difference between the ones listed on the CPU chart?

Also... on that same chart, click on the CPU version of 3DMark06... If there is so little difference, then why do we see the results shown?

Also, there is a huge memory difference between the K10 and the Core 2.

But regardless of missing the 10 GB/s mark, it is still faster than any DDR3 memory on an Intel system, regardless of the clock achieved by the DDR3 memory. If you put the memory on 1066 MHz, 11GB/s bandwidth was smashed with read, write and copy tests and that was by quite some margin.
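To put the 11 GB/s figure in context, a quick theoretical-peak calculation in Python. The configuration is an assumption, since the quote does not state it: DDR2 at an effective 1066 MT/s, a 64-bit (8-byte) bus per channel, and two channels.

```python
def peak_bandwidth_gb_s(mega_transfers_s: float, bus_bytes: int = 8, channels: int = 2) -> float:
    """Theoretical peak in decimal GB/s: MT/s x bytes per transfer x channels / 1000."""
    return mega_transfers_s * bus_bytes * channels / 1000.0


peak = peak_bandwidth_gb_s(1066)  # assumed: DDR2-1066, 64-bit channels, dual channel
mark = 11.0                       # GB/s, the figure mentioned in the quote
print(f"theoretical peak: {peak:.2f} GB/s; 11 GB/s is {mark / peak:.0%} of it")
```

Under that assumption, the 11 GB/s mark is roughly two-thirds of the theoretical peak; measured read/write/copy results always fall somewhat short of peak.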


~BoB


Do you Good Search for Seti@Home? http://www.goodsearch.com/?charityid=888957
Or Good Shop? http://www.goodshop.com/?charityid=888957
Francois Piednoel
Joined: 14 Jun 00
Posts: 898
Credit: 5,969,361
RAC: 0
United States
Message 631463 - Posted: 1 Sep 2007, 21:51:15 UTC - in response to Message 631385.  
Last modified: 1 Sep 2007, 21:59:57 UTC

The CPU makes a difference in the CPU score, and accounts for 8% of the global score. It cannot increase the global score by more than the frequency increase; that would go against computer science principle number 1: Amdahl's law.

After all, you can believe whatever you want, but as a performance expert who has been doing this for 7 years, I am telling you that it is not possible. If the 30K record exists on 3DMark2006, it is because of the GPUs, not because of the processor.

Just get prepared for a hard landing, because I can guarantee you that Theo has the wrong information. Again, I am trying to put some science into this, and I gave the math. If you don't believe it, well, there is nothing I can do to help you.

Scaling beyond frequency is a very well-known "no-no" in the processor community; I suggest that you email the AMD guys and ask them.

You can keep posting this information again and again, but you are only adding to the disappointment when the hardware goes public.

If I am wrong, I'll apologize, but for the moment, even Theo is not that loud about it anymore... humm hummmm

who?
Francois Piednoel
Joined: 14 Jun 00
Posts: 898
Credit: 5,969,361
RAC: 0
United States
Message 631489 - Posted: 1 Sep 2007, 22:10:15 UTC - in response to Message 631385.  
Last modified: 1 Sep 2007, 22:28:23 UTC

I forgot: today's K8 memory subsystem is already faster than Core 2's, if you design your memory test to defeat the Core 2 memory prefetchers. There is a famous post on the old Ace's Hardware about Repriest "fixing" the ScienceMark memory test to avoid the prefetchers.
In reality, most applications work perfectly with the prefetchers, and this comes from two different ways of thinking about computers:
1) You can make the latency of a memory access very small by putting the memory controller on the die (AMD).
2) You can choose to avoid waiting on memory at all by detecting the memory access pattern and issuing the accesses early enough, while you are doing something else. This is what Core 2 does: it has a very complex prefetcher that works perfectly on games, multimedia and many other applications.
A smart prefetcher requires a large cache to avoid cache pollution, and it also requires more access lanes to the cache, to avoid congestion.
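A toy illustration of approach 2, and of why a memory test can be written to defeat it: a hypothetical stride detector in Python that, after seeing the same stride twice in a row, "prefetches" the next cache line. It is nothing like the actual Core 2 prefetchers, only the general idea, and the 64-byte line size is an assumption.

```python
import random

LINE = 64  # assumed cache-line size in bytes


def stride_prefetch_hits(addresses: list[int]) -> int:
    """Count accesses whose cache line was already fetched by a toy stride detector."""
    prefetched: set[int] = set()
    last_addr = None
    last_stride = None
    hits = 0
    for addr in addresses:
        if addr // LINE in prefetched:
            hits += 1
        if last_addr is not None:
            stride = addr - last_addr
            if stride == last_stride:
                # Same stride twice in a row: fetch the next predicted line early.
                prefetched.add((addr + stride) // LINE)
            last_stride = stride
        last_addr = addr
    return hits


sequential = [i * LINE for i in range(100)]  # streaming access, like most media code
shuffled = sequential[:]
random.Random(1).shuffle(shuffled)           # same lines, unpredictable order

print("sequential:", stride_prefetch_hits(sequential), "hits of", len(sequential))
print("shuffled:  ", stride_prefetch_hits(shuffled), "hits of", len(shuffled))
```

Once the detector warms up, the sequential stream is almost entirely covered, while the shuffled stream, touching exactly the same lines, gets next to nothing: the same idea as a memory test written to sidestep prefetchers.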

At the end of the day, today's prefetchers have proven more efficient than an on-die memory controller; just look at the benchmarks (K8 vs Core 2), it is pretty clear. The memory test is nice to have, but it does not have any significant impact on desktop benchmarks. Servers are a different story; I am not an expert in that.

Today, in the SiSoft Sandra memory test, Core 2 gets killed by a K8, but in memory-intensive applications like video encoding, Core 2 wins by a large margin, because its cache-miss-to-clock ratio is outstanding. I hope this helps you understand.

Good reading on this: Here


who?
ML1
Volunteer moderator
Volunteer tester
Joined: 25 Nov 01
Posts: 20982
Credit: 7,508,002
RAC: 20
United Kingdom
Message 631511 - Posted: 1 Sep 2007, 22:34:36 UTC - in response to Message 631489.  
Last modified: 1 Sep 2007, 22:35:24 UTC

[...]
In reality, most applications work perfectly with the prefetchers, and this comes from two different ways of thinking about computers:
1) You can make the latency of a memory access very small by putting the memory controller on the die (AMD).
2) You can choose to avoid waiting on memory at all by detecting the memory access pattern and issuing the accesses early enough, while you are doing something else. This is what Core 2 does: it has a very complex prefetcher that works perfectly on games, multimedia and many other applications.
A smart prefetcher requires a large cache to avoid cache pollution, and it also requires more access lanes to the cache, to avoid congestion...

Which all adds up to subtly different design approaches, and means that the real benchmark is how well the entire system performs for each specific application.

Thanks for a good description.

This is all a very good contest of design and silicon and compilers/programming...

The question remains how close the race will be among the devices actually available to put in our PCs. I guess we keep watching the top 20 in the S@H stats!

Happy crunchin',
Martin

See new freedom: Mageia Linux
Take a look for yourself: Linux Format
The Future is what We all make IT (GPLv3)
Francois Piednoel
Joined: 14 Jun 00
Posts: 898
Credit: 5,969,361
RAC: 0
United States
Message 631679 - Posted: 2 Sep 2007, 1:26:22 UTC - in response to Message 631511.  

Even better than a memory controller on the die is memory on the die: TeraScale
(using multi-chip packaging techniques already used in Core 2)... terribly efficient!
This is working, up and running. Tera is the next step; we should all focus on this big leap.


who?
Philadelphia
Volunteer tester
Joined: 12 Feb 07
Posts: 1590
Credit: 399,688
RAC: 0
United States
Message 631695 - Posted: 2 Sep 2007, 1:47:22 UTC - in response to Message 631679.  

Who?, do you think you could get me one of those Teras? :-D

©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.