CES 2017 -- AMD RYZEN CPU

Profile Keith Myers Special Project $250 donor
Volunteer tester
Avatar

Send message
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1852983 - Posted: 4 Mar 2017, 21:03:31 UTC

A problem with Windows and Ryzen was posted today explaining what is going wrong with SMT and single-thread performance. The good news is that Linux with the latest kernel has Ryzen working correctly.

smt_configuration_error_in_windows_found_to_be
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1852983 · Report as offensive
Grant (SSSF)
Volunteer tester

Send message
Joined: 19 Aug 99
Posts: 13736
Credit: 208,696,464
RAC: 304
Australia
Message 1852995 - Posted: 4 Mar 2017, 21:20:43 UTC - in response to Message 1852983.  
Last modified: 4 Mar 2017, 21:22:00 UTC

A problem with Windows and Ryzen was posted today explaining what is going wrong with SMT and single-thread performance. The good news is that Linux with the latest kernel has Ryzen working correctly.

smt_configuration_error_in_windows_found_to_be

It will be good if they can get that issue sorted early on.
I doubt single-threaded performance will improve much (if any; in Linux tests its single-threaded performance was still nowhere near Kaby Lake, though it is much, much improved over anything AMD has done before and does bring it up to around the previous Intel generation). However, it should give a good boost to multi-core performance, particularly with games. At present Ryzen needs SMT off to perform well in games that can make use of multiple cores (which is pretty much all of them these days, although some make better use of extra CPU cores than others).
I suspect that once software developers start optimizing for Ryzen we should see some significant boosts to single-threaded performance.




Even with its relatively poor single-threaded performance, if you have an application that can take advantage of the Ryzen architecture for good per-thread performance, and it also supports multiple threads, you do get excellent performance.

Grant
Darwin NT
ID: 1852995 · Report as offensive
Profile Keith Myers Special Project $250 donor
Volunteer tester
Avatar

Send message
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1853072 - Posted: 5 Mar 2017, 0:16:47 UTC - in response to Message 1852527.  

I'm waiting delivery of a 1700, Asus x370 pro motherboard and 8GB RAM.

As to the question why an extra 80 for the 1700X when no cooler included - 'cos they make more money that way.....
or Pricing sponsored by <cooler manufacturer's name>...

Replying to myself, basically, with my thoughts on why the 1700X and 1800X cost so much more than the 1700. An article over at PCPerspective shot down my argument that the X processors have XFR and the 1700 does not. Evidently not true: the 1700 has XFR as well. The only difference is that it can boost only two 50 MHz steps, compared to two 100 MHz steps for the 1700X and 1800X. It is looking more and more like the smart selection among the 8C/16T chips is the 1700. It overclocks to within 95% of the X chips at $70-170 less cost.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1853072 · Report as offensive
Stephen "Heretic" Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Avatar

Send message
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1853076 - Posted: 5 Mar 2017, 0:31:01 UTC - in response to Message 1848281.  

From the PCper architectural details, given on their last podcast, the half clocked AVX256 might be something to consider for many. For our purposes mostly feeding GPUs, I think/hope that'll be much of a muchness. Had to be a compromise. Pretty sure I could live with half speed AVX256 at the price and power advantages over the equivalent Intels. We'll see.


. . That is my way of thinking. From what I have heard there is no performance loss in their SSE3 implementation, so maybe use that for crunching. I intend to run some comparisons in that hope. While their 128-bit AVX can only be half as fast as 256-bit AVX would have been, their SSE3 might bridge the gap and make them very useful even as CPU crunchers. And they would allow me to squeeze that little extra out of my 2 x 1060s.

Stephen

:)
ID: 1853076 · Report as offensive
Cosmic_Ocean
Avatar

Send message
Joined: 23 Dec 00
Posts: 3027
Credit: 13,516,867
RAC: 13
United States
Message 1853655 - Posted: 7 Mar 2017, 2:56:12 UTC

I've done a little bit of looking around and research and found Cinebench results for the plain 1700 (and the X and 1800X, for that matter) that don't look entirely promising for SMT/HT, if I'm honest.

For example, the 1700 got 149 single, 1410 multi (~9.5x speedup), which I assume is on all 16 threads. The 1700X seems to be about 9.75x and the 1800X is pretty much 10.0x.

What I haven't seen though, is anywhere that has done Cinebench with SMT/HT turned off. That's the one that I'm really interested in seeing, and I would expect it to be above 7.00x. But what the speed-up seems to suggest to me with 16 threads versus one single thread... SMT/HT looks to be practically useless. Sure, it's better than 8x, but it seems to suffer pretty hard and doesn't result in much of a gain.

Based on the scores for the plain 1700 that I'm seeing, 16 threads only having the effectiveness of 9.5 threads is basically worse than the Bulldozer's loss from using all the available cores (my FX-6100 ends up at 4.1x when using all 6 cores, but gives me 2.95x when I specify that I only want it to use 3).

So it almost seems like it would be a detriment to run with SMT/HT turned on, even if overall, it yields more throughput--any single-threaded task will end up suffering when more than 8 threads are being used, just like how Bulldozer suffers when there are more threads in progress than there are pairs of cores.
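For anyone who wants to redo that arithmetic, here is a minimal sketch; the Cinebench scores are the ones quoted above, and the "effective threads" figure is just their ratio, nothing Cinebench reports itself.

```python
# Back-of-the-envelope SMT scaling from the Cinebench R15 figures quoted above.
# 149 (single) and 1410 (multi) are the scores cited in this post; everything
# else is plain division.

def effective_threads(single_score: float, multi_score: float) -> float:
    """How many 'single-thread equivalents' the multi-threaded run achieved."""
    return multi_score / single_score

speedup = effective_threads(149, 1410)   # ~9.46x on 16 threads
per_thread = speedup / 16                # ~0.59 of a full thread each
smt_gain = speedup / 8                   # ~1.18x over an assumed perfect 8-core scaling

print(f"Ryzen 7 1700: {speedup:.2f}x speedup, "
      f"{per_thread:.2f} per thread, "
      f"{smt_gain:.2f}x over perfect 8-core scaling")
```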
Linux laptop:
record uptime: 1511d 20h 19m (ended due to the power brick giving-up)
ID: 1853655 · Report as offensive
Profile Keith Myers Special Project $250 donor
Volunteer tester
Avatar

Send message
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1853664 - Posted: 7 Mar 2017, 4:05:12 UTC - in response to Message 1853655.  

I've done a little bit of looking around and research and found Cinebench results for the plain 1700 (and the X and 1800X, for that matter) that don't look entirely promising for SMT/HT, if I'm honest.

For example, the 1700 got 149 single, 1410 multi (~9.5x speedup), which I assume is on all 16 threads. The 1700X seems to be about 9.75x and the 1800X is pretty much 10.0x.

What I haven't seen though, is anywhere that has done Cinebench with SMT/HT turned off. That's the one that I'm really interested in seeing, and I would expect it to be above 7.00x. But what the speed-up seems to suggest to me with 16 threads versus one single thread... SMT/HT looks to be practically useless. Sure, it's better than 8x, but it seems to suffer pretty hard and doesn't result in much of a gain.

Based on the scores for the plain 1700 that I'm seeing, 16 threads only having the effectiveness of 9.5 threads is basically worse than the Bulldozer's loss from using all the available cores (my FX-6100 ends up at 4.1x when using all 6 cores, but gives me 2.95x when I specify that I only want it to use 3).

So it almost seems like it would be a detriment to run with SMT/HT turned on, even if overall, it yields more throughput--any single-threaded task will end up suffering when more than 8 threads are being used, just like how Bulldozer suffers when there are more threads in progress than there are pairs of cores.

I'll have to spend some time looking, but with the latest discoveries regarding thread scheduling, how apps handle the L3 cache and such, I'm almost positive I've run across Cinebench benchmarks done with SMT off for both S15 and R15 tests in the last day or so. Now I just have to find you the links again.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1853664 · Report as offensive
Grant (SSSF)
Volunteer tester

Send message
Joined: 19 Aug 99
Posts: 13736
Credit: 208,696,464
RAC: 304
Australia
Message 1853674 - Posted: 7 Mar 2017, 4:56:50 UTC - in response to Message 1853664.  

So it almost seems like it would be a detriment to run with SMT/HT turned on, even if overall, it yields more throughput--any single-threaded task will end up suffering when more than 8 threads are being used, just like how Bulldozer suffers when there are more threads in progress than there are pairs of cores.

It has always been the case with HyperThreading that the performance of a single thread is less than with HyperThreading off. But as with Seti crunching on the CPU (and multiple WUs on the GPU, depending on the application), the longer run time per thread is offset by more threads being done. The end result is more work done, which is what's important.
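As a minimal sketch of that trade-off (the 90-minute baseline and the 60% per-task slowdown are made-up illustrative numbers, not measurements from any Ryzen system):

```python
# Toy throughput model: SMT makes each task slower but runs more of them at
# once. Assumed numbers only -- 90 min/task with SMT off, 60% longer with it on.

baseline_minutes = 90          # per CPU task, SMT off, 8 tasks at a time
smt_slowdown = 1.6             # assumption: each task takes 60% longer with SMT on

tasks_per_day_off = 8 * (24 * 60 / baseline_minutes)
tasks_per_day_on = 16 * (24 * 60 / (baseline_minutes * smt_slowdown))

print(f"SMT off: {tasks_per_day_off:.0f} tasks/day")
print(f"SMT on : {tasks_per_day_on:.0f} tasks/day")  # comes out ahead while slowdown < 2x
```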




Grant
Darwin NT
ID: 1853674 · Report as offensive
Stephen "Heretic" Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Avatar

Send message
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1855278 - Posted: 13 Mar 2017, 14:08:44 UTC - in response to Message 1853674.  

So it almost seems like it would be a detriment to run with SMT/HT turned on, even if overall, it yields more throughput--any single-threaded task will end up suffering when more than 8 threads are being used, just like how Bulldozer suffers when there are more threads in progress than there are pairs of cores.

It has always been the case with HyperThreading that the performance of a single thread is less than with HyperThreading off. But as with Seti crunching on the CPU (and multiple WUs on the GPU, depending on the application), the longer run time per thread is offset by more threads being done. The end result is more work done, which is what's important.


. . The captions do not specify, but I get the impression those graphs are both from tests with HyperThreading turned ON. I would like to see the graphs of the same tests with HT off. Granted, with only a single task running on one thread it should in theory be much the same as running without HT, but I would like to see it put to the test anyway. It's all well and good to say the output will be higher with it on, but that presumes that runtimes with HT on and multiple threads running will be significantly LESS than double the runtimes with HT off, so I would like to see those figures. That would tell the story.
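That comparison is easy to express as a quick check once the measured averages exist; a minimal sketch, with the runtimes below being placeholders rather than real figures:

```python
# The test being asked for: with twice the threads, SMT on is a net win exactly
# when the average runtime with SMT on is less than double the SMT-off runtime.

def smt_verdict(runtime_off_s: float, runtime_on_s: float,
                threads_off: int = 8, threads_on: int = 16) -> str:
    """Compare tasks-per-second throughput with SMT off vs on."""
    gain = (threads_on / runtime_on_s) / (threads_off / runtime_off_s)
    return f"SMT {'wins' if gain > 1 else 'loses'}: {gain:.2f}x throughput"

# Placeholder runtimes in seconds -- substitute measured averages.
print(smt_verdict(runtime_off_s=5400, runtime_on_s=7200))
```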

Stephen

?
ID: 1855278 · Report as offensive
Paul

Send message
Joined: 17 May 99
Posts: 72
Credit: 42,977,964
RAC: 43
United States
Message 1859526 - Posted: 5 Apr 2017, 5:47:19 UTC

Hi All,

Got my Ryzen system built and now I'm playing with S@H as that is my most important benchmark.

First things first, I see some people reporting a BOINC benchmark of 4500 MIPS for the Ryzen 1800, but I got 5000 right out of the box, so I just want to put that on the record.

Anyway, here's my question: what is the optimal thread count for Ryzen? There must be a justifiable theoretical answer. 8 or 16?

(Here's why I'm confused. The FX-line had one FPU per core pair, so it only made sense that simultaneous thread efficiency would peak around 1/2 the number of cores. But, what I read about Ryzen is that it has *two* FP units per core. Now, this is very confusing. Let's put aside the practical issues of feeding cache from main memory. Does this mean we expect to see throughput increasing beyond 1/2 num cpus? Up to roughly two simultaneous threads per core, even?

I'm confused because the Ryzen BOINC benchmark is twice that of the FX, and if I ran twice as many threads per core, *and* twice as many cores, that makes 8 times the throughput, which is crazy. Over just three days, I'm pretty sure it's not even going to reach 4 times, judging by the new curve in RAC; I've watched it recover many times after outages or problems in the past and it has a characteristic shape.

I still have BOINC set to 50% of CPUs as I'm very suspicious. I would just try running more, but it's really hard to understand BOINC performance. I've been running BOINC for a long time and I've never seen a better measure of performance than RAC, but that takes about 30 days to stabilize. Credit/sec fluctuates wildly between tasks, and, besides, running 16 threads for a few hours would make a lot of data to gather just to compute the average. It's straightforward, I'm sure; I could do that.)
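One way to do that averaging without waiting a month for RAC to settle is sketched below; the CSV file names and column names are assumptions, standing in for however the per-task run times and granted credits get collected.

```python
# Compare credit throughput between an 8-thread run and a 16-thread run from
# per-task (run_time_s, credit) records. File names and columns are assumed,
# not any BOINC export format.

import csv

def credit_per_task_hour(path: str) -> float:
    total_credit = 0.0
    total_hours = 0.0
    with open(path, newline="") as f:
        for row in csv.DictReader(f):          # expects columns: run_time_s, credit
            total_credit += float(row["credit"])
            total_hours += float(row["run_time_s"]) / 3600
    return total_credit / total_hours

# Host throughput estimate = per-task rate x concurrent tasks.
print("8 threads :", credit_per_task_hour("tasks_8_threads.csv") * 8, "credit/hour")
print("16 threads:", credit_per_task_hour("tasks_16_threads.csv") * 16, "credit/hour")
```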
ID: 1859526 · Report as offensive
Profile Keith Myers Special Project $250 donor
Volunteer tester
Avatar

Send message
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1859535 - Posted: 5 Apr 2017, 6:09:13 UTC - in response to Message 1859526.  

Hi Paul, and welcome to the thread. All interesting points you've raised. I too was questioning just how many "real" cores to put into play, since the Ryzen architecture is not FPU-hamstrung like its predecessor, the FX. For the moment, since I put my Ryzen system online I have limited the number of SETI tasks to 12, with typically 4 GPU tasks running. That means the CPU is mainly running at 75% utilization. I typically have 3 to 4 GPU tasks running since I also crunch on the GPU for Einstein and MilkyWay, which means I can have 8-9 CPU tasks running at the same time. I do limit the CPU tasks to physical cores through affinity in ProcessLasso. That also means I am not running all of my CPU tasks on physical cores at all times, since there are only 8 physical cores. I am still feeling the system out and trying to take baby steps before I let it fully loose on BOINC.

I have already seen a large impact on RAC since it went online. Most evident in the large decrease in processing time for CPU tasks over my FX processors. And also interesting is the 1700X's preference for BLC CPU tasks which run a half hour quicker than any normal range Arecibo task. I have no idea why except to guess the AVX processing speed of the 1700X is multiple times faster than on my FX processors.

I haven't yet tried to track a specific CPU task running on one of the virtual cores through to completion to compare it with one run on a physical core. But I really haven't seen any visible outlier in running time for CPU tasks in the same AR range that I could pin down to being run on a virtual core. My suspicion is that with the core leveling going on in the chip, it probably is irrelevant whether the task runs on a "real" or "virtual" core with regard to task completion times. I am going to stay at 12 tasks run concurrently for a while longer before I begin to load the chip to 100%.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1859535 · Report as offensive
Cosmic_Ocean
Avatar

Send message
Joined: 23 Dec 00
Posts: 3027
Credit: 13,516,867
RAC: 13
United States
Message 1866789 - Posted: 11 May 2017, 20:27:23 UTC

*blows dust off thread*

I wasn't entirely sure where to put this--make a new thread, or add to this one, or put it in the Panic Mode thread? I chose to put it here in this one instead.

16-core Ryzen "Whitehaven" details leaked

AMD’s upcoming 16 core enthusiast Ryzen “Whitehaven” CPUs have been spotted. The new processors will come in variations of up to 16 cores and 32 threads and will support quad-channel DDR4 memory.

AMD Aiming For Intel’s Jugular – Rolling Out 16 Core Behemoths For The Enthusiast CPU Market In Mid 2017

The upcoming family of enthusiast Ryzen CPUs are considerably larger than the current Ryzen lineup and will not be compatible with AM4 due to this fact. It’s still unclear what the new socket – internally code named “S3” — will officially be named but we do know it’s going to be LGA, rather than PGA.

Base Clock: 3.1 GHz
Boost Clock: 3.6 GHz
L3 Cache: 64 MB
DDR channels: quad

AMD’s new enthusiast desktop CPU platform will reportedly be showcased at Computex (May 30th – June 2nd). According to sources in the upstream supply chain, the S3 platform will launch alongside AMD’s 32 core Naples processors in the middle of the year.

Linux laptop:
record uptime: 1511d 20h 19m (ended due to the power brick giving-up)
ID: 1866789 · Report as offensive
Profile Keith Myers Special Project $250 donor
Volunteer tester
Avatar

Send message
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1866844 - Posted: 12 May 2017, 4:05:11 UTC - in response to Message 1866789.  

*blows dust off thread*

I wasn't entirely sure where to put this--make a new thread, or add to this one, or put it in the Panic Mode thread? I chose to put it here in this one instead.

16-core Ryzen "Whitehaven" details leaked

AMD’s upcoming 16 core enthusiast Ryzen “Whitehaven” CPUs have been spotted. The new processors will come in variations of up to 16 cores and 32 threads and will support quad-channel DDR4 memory.

AMD Aiming For Intel’s Jugular – Rolling Out 16 Core Behemoths For The Enthusiast CPU Market In Mid 2017

The upcoming family of enthusiast Ryzen CPUs are considerably larger than the current Ryzen lineup and will not be compatible with AM4 due to this fact. It’s still unclear what the new socket – internally code named “S3” — will officially be named but we do know it’s going to be LGA, rather than PGA.

Base Clock: 3.1 GHz
Boost Clock: 3.6 GHz
L3 Cache: 64 MB
DDR channels: quad

AMD’s new enthusiast desktop CPU platform will reportedly be showcased at Computex (May 30th – June 2nd). According to sources in the upstream supply chain, the S3 platform will launch alongside AMD’s 32 core Naples processors in the middle of the year.

Thanks for the link. I have been offline for a day and had missed the news. I was aware of the server-chip Naples architecture but had heard nothing about the "Whitehaven" chip. Interesting that it will be an LGA socket. It should give Intel a good run for the money.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1866844 · Report as offensive
Keith White
Avatar

Send message
Joined: 29 May 99
Posts: 392
Credit: 13,035,233
RAC: 22
United States
Message 1869487 - Posted: 26 May 2017, 7:02:43 UTC - in response to Message 1859526.  

Hi All,

Got my Ryzen system built and now I'm playing with S@H as that is my most important benchmark.

First things first, I see some people reporting a BOINC benchmark of 4500 MIPS for the Ryzen 1800, but I got 5000 right out of the box, so I just want to put that on the record.

Anyway, here's my question: what is the optimal thread count for Ryzen? There must be a justifiable theoretical answer. 8 or 16?

(Here's why I'm confused. The FX-line had one FPU per core pair, so it only made sense that simultaneous thread efficiency would peak around 1/2 the number of cores. But, what I read about Ryzen is that it has *two* FP units per core. Now, this is very confusing. Let's put aside the practical issues of feeding cache from main memory. Does this mean we expect to see throughput increasing beyond 1/2 num cpus? Up to roughly two simultaneous threads per core, even?

I'm confused because the Ryzen BOINC benchmark is twice that of the FX, and if I ran twice as many threads per core, *and* twice as many cores, that makes 8 times the throughput, which is crazy. Over just three days, I'm pretty sure it's not even going to reach 4 times, judging by the new curve in RAC; I've watched it recover many times after outages or problems in the past and it has a characteristic shape.

I still have BOINC set to 50% of CPUs as I'm very suspicious. I would just try running more, but it's really hard to understand BOINC performance. I've been running BOINC for a long time and I've never seen a better measure of performance than RAC, but that takes about 30 days to stabilize. Credit/sec fluctuates wildly between tasks, and, besides, running 16 threads for a few hours would make a lot of data to gather just to compute the average. It's straightforward, I'm sure; I could do that.)


Okay, the confusion comes from Bulldozer suffering from exuberant marketing about its internal design: AMD decided to count the maximum number of simultaneous threads and market that as the core count, rather than the number of Bulldozer "modules", a "module" being more analogous in functionality to Intel's Hyper-Threading-enabled core or today's Ryzen Simultaneous Multithreading core.

There is no functional difference between them. Each can execute two threads at the same time, and the two threads share use of a single FPU. The physical difference is that Bulldozer partitioned the ALU (Arithmetic Logic Unit) into two fixed "integer cores", one for each thread that could run, whereas Intel and Ryzen use a single, much larger ALU that can achieve higher overall throughput when two unrelated threads are scheduled at the same time, through dynamic sharing of resources.

The i7-7700K, the Ryzen 1500X and the FX-8350 have four FPUs and each can run 8 threads.
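If you want to see how your own OS counts these, here is a minimal sketch using the third-party psutil package; the numbers in the comments are just the expected values for the parts named above, and FX chips may be reported either way depending on how the OS treats a module.

```python
# Show how the OS counts "cores" versus hardware threads on this machine.
# Requires the third-party psutil package (pip install psutil).

import psutil

physical = psutil.cpu_count(logical=False)   # cores
logical = psutil.cpu_count(logical=True)     # hardware threads

print(f"physical cores  : {physical}")   # e.g. 8 on a Ryzen 7, 4 on an i7-7700K
print(f"hardware threads: {logical}")    # e.g. 16 on a Ryzen 7, 8 on an i7-7700K
```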
"Life is just nature's way of keeping meat fresh." - The Doctor
ID: 1869487 · Report as offensive
Filipe

Send message
Joined: 12 Aug 00
Posts: 218
Credit: 21,281,677
RAC: 20
Portugal
Message 1869526 - Posted: 26 May 2017, 13:57:10 UTC
Last modified: 26 May 2017, 13:57:47 UTC

AMD Ryzen R7 RAC seems impressively good
ID: 1869526 · Report as offensive
Paul

Send message
Joined: 17 May 99
Posts: 72
Credit: 42,977,964
RAC: 43
United States
Message 1869590 - Posted: 26 May 2017, 17:21:39 UTC - in response to Message 1869487.  
Last modified: 26 May 2017, 17:22:33 UTC


Okay, the confusion comes from Bulldozer suffering from exuberant marketing about its internal design: AMD decided to count the maximum number of simultaneous threads and market that as the core count, rather than the number of Bulldozer "modules", a "module" being more analogous in functionality to Intel's Hyper-Threading-enabled core or today's Ryzen Simultaneous Multithreading core.

There is no functional difference between them. Each can execute two threads at the same time, and the two threads share use of a single FPU. The physical difference is that Bulldozer partitioned the ALU (Arithmetic Logic Unit) into two fixed "integer cores", one for each thread that could run, whereas Intel and Ryzen use a single, much larger ALU that can achieve higher overall throughput when two unrelated threads are scheduled at the same time, through dynamic sharing of resources.

The i7-7700K, the Ryzen 1500X and the FX-8350 have four FPUs and each can run 8 threads.


Thanks for your help. I think I understand what you are saying, but still have questions.

I understand the distinction between threads and cores. As you say, HT has been around for a long time and we learned the difference very well when Intel did it.

Wikipedia explicitly states that there are "two floating-point units per core". Now, I checked the cited reference and I think I see why they say that. There is *one* FP rename unit, but two sets of compute units behind it. The 1500 has 8 cores, but the FX-8350 had 4.

1) Are you saying that they started counting cores differently between FX and Ryzen?

2) I think my question stands: Can one core tackle two FP threads, simultaneously?
ID: 1869590 · Report as offensive
Profile Keith Myers Special Project $250 donor
Volunteer tester
Avatar

Send message
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1869607 - Posted: 26 May 2017, 19:17:25 UTC - in response to Message 1869590.  


Okay, the confusion about Bulldozer suffered from exuberant marketing due to it's internal design when they decided to count maximum number of simultaneous threads and market it as cores rather than the number of Bulldozer "Modules", the "Module" being more analogous to Intel's Hyper-Threading enabled core or today's Ryzen Simultaneous MultiThreading core in terms of functionality.

There is no functional difference between them. Each can execute two threads at the same time and each thread shares use of a single FPU. The physical difference is that Bulldozer decided to fix partition the ALU (Arithmetic Logic Unit) as two "Integer Cores", one for each thread that could run, where Intel and Ryzen consist of a much larger ALU that could achieve higher overall throughput when two unrelated threads are scheduled at the same time through dynamic sharing of resources.

The i7-7700K, the Ryzen 1500X and the FX-8350 have four FPUs and each can run 8 threads.


Thanks for your help. I think I understand what you are saying, but still have questions.

I understand the distinction between threads and cores. As you say, HT has been around for a long time and we learned the difference very well when Intel did it.

Wikipedia explicitly states that there are "two floating-point units per core". Now, I checked the cited reference and I think I see why they say that. There is *one* FP rename unit, but two sets of compute units behind it. The 1500 has 8 cores, but the FX-8350 had 4.

1) Are you saying that they started counting cores differently between FX and Ryzen?

2) I think my question stands: Can one core tackle two FP threads, simultaneously?


1) Yes, for Ryzen the cores are counted the same way as Intel's. The "core" count for FX was really a module count, in a 4 x 2 configuration. I believe most of the monitoring programs now report FX as 4 cores with HT. I know that the author of SIV decided to change to that definition in past versions after consulting with me and running tests.
Ryzen now follows the true modern definition of core count, each core being capable of HT/SMT thread scheduling. The IOMMU and NUMA nodes both identify it as having 8 cores.

2) I've always liked the "deep dives" that Anandtech does of CPU architectures. Each core has its own FPU capable of two FPU threads, one listed as "schedulable" and the other as "non-schedulable". So, yes, to answer your question: each core can process two FPU threads, with each taking a turn in clock scheduling.

But, just as with any HT core, regardless of Intel or AMD architecture, in practice the "real" or physical core always gets prioritized over a virtual core.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1869607 · Report as offensive
Profile jason_gee
Volunteer developer
Volunteer tester
Avatar

Send message
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1869701 - Posted: 27 May 2017, 2:50:47 UTC - in response to Message 1869607.  

Following up with some reading of Agner Fog's latest optimisation guides, it looks as though IPC is higher than on any Intel processor, so the memory latency and frequency issues will potentially dominate for some time. Having watched through portions of an AMD livestream while looking for info on the IOMMU updates due in the AGESA 1.0.0.6 update to fix groupings, I noted they mentioned they've covered 'standard' JEDEC compatibility and are moving on to the custom XMP2-style support, with Samsung B-die memory having been the easy case.

Most likely FFTW tweaks will end up being incorporated at some point; then, as things settle, MT apps will need to be produced to make better use of these, in addition to figuring out some additional optimisations. Apparently the AVX2 implementation, despite being effectively half-clocked, is faster than separate faster-clocked SSEx, because it preserves entries/fetches/decodes in the instruction pipeline.
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 1869701 · Report as offensive
Profile Keith Myers Special Project $250 donor
Volunteer tester
Avatar

Send message
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1869721 - Posted: 27 May 2017, 4:45:41 UTC

All I know is that my Ryzen 1700X simply burns through the BLC CPU tasks using the r3330 AVX version. I don't know why that app finds the BLC tasks so much easier to process than Arecibo CPU tasks. I guess something about the tasks' data structure is especially amenable to the AVX pathway. I know I don't get anywhere near the same processing time using that app on my old FX processors. Ryzen really likes AVX code, it seems. Interesting to hear that it may run especially well on AVX2 code paths also.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1869721 · Report as offensive
Keith White
Avatar

Send message
Joined: 29 May 99
Posts: 392
Credit: 13,035,233
RAC: 22
United States
Message 1869734 - Posted: 27 May 2017, 6:44:57 UTC - in response to Message 1869607.  


But, just as with any HT core, regardless of Intel or AMD architecture, in practice the "real" or physical core always gets prioritized over a virtual core.

To be picky, the OS only sees "logical cores"; all it knows to do is assign one thread to each even/odd pair (0,1; 2,3; etc.) before assigning a second thread to a pair. It doesn't matter which member of the pair gets it, the even- or the odd-numbered one.

Ryzen adds a complication for the scheduler in that the current R5 and R7 versions are really two core complexes (CCXs), each with 4 cores and 8MB of L3 cache (six-core versions are 3+3 and quad cores are 2+2). If threads need to communicate with each other or access data in the other CCX's cache over the chip's "Infinity Fabric" (I hate AMD marketing), it takes 2 1/2 times longer than if it's on the same CCX. This is why benchmarks that test cache performance are giving odd results on Ryzen compared to Intel. The Windows scheduler does appear to assign single threads to all the cores on one CCX before it starts assigning them to the other, and then doubles up threads per core as usual.

Now, since the CPU version of the S@H cruncher (at least) essentially runs on a single thread, it shouldn't really be impacted. However, threads can move from one core to another when the OS suspends one thread to allow others to run, and if a thread moves from one CCX to the other there is going to be a performance penalty if data it needs is residing in the other CCX's L3 cache, though it's probably not much. Plus it can more or less be fixed in the cruncher's software. Also, at the OS level you can tell an application to run only on the core it starts on (or on a group of cores), which is how SIV does it.
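As a minimal sketch of that kind of pinning, assuming the common enumeration where logical CPUs 0-7 map to the first CCX (check your own topology with lscpu or Coreinfo before relying on it), using the third-party psutil package:

```python
# Keep a process on one CCX so its working set stays in that CCX's L3 cache.
# ASSUMPTION: logical CPUs 0-7 belong to CCX0 (4 cores x 2 SMT threads) --
# verify against your own topology. Requires psutil (pip install psutil).

import psutil

CCX0 = list(range(0, 8))

def pin_to_ccx0(pid: int) -> None:
    proc = psutil.Process(pid)
    proc.cpu_affinity(CCX0)          # works on both Windows and Linux
    print(f"{proc.name()} (pid {pid}) pinned to logical CPUs {CCX0}")

# Example: pin this script's own process.
pin_to_ccx0(psutil.Process().pid)
```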
"Life is just nature's way of keeping meat fresh." - The Doctor
ID: 1869734 · Report as offensive
Profile Keith Myers Special Project $250 donor
Volunteer tester
Avatar

Send message
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1869808 - Posted: 27 May 2017, 17:19:34 UTC - in response to Message 1869734.  

Yes, the system will use whatever it wants, unless you provide guidance on exactly how you want it to work. I still crunch CPU tasks only on cores #0,2,4,6,8,10,12,14. I haven't gotten around to trying the odd-numbered cores as an experiment yet. I set the CPU app affinity to the even cores in ProcessLasso; the odd-numbered cores get to do support duty for the GPU tasks and running the desktop. I have tried to minimize the time required to traverse the Data Fabric by running my memory at 3200 MHz. I have hopes that later next month, when my motherboard gets the AGESA 1.0.0.6 firmware update in a new BIOS, I might be able to increase the memory clock a bit more and further reduce the CCX communication penalty.
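For anyone without ProcessLasso, roughly the same even-core rule can be applied with a short script; this is a sketch only, and the process-name filter is a hypothetical example to adjust for whatever your CPU apps are actually called (psutil required).

```python
# Give running CPU science apps an affinity mask of the even-numbered logical
# CPUs (one SMT thread per physical core). The "setiathome" name match is a
# hypothetical placeholder -- edit it for your actual app executables.

import psutil

EVEN_CPUS = list(range(0, psutil.cpu_count(logical=True), 2))  # 0,2,4,...,14 on a 1700X

for proc in psutil.process_iter(["name"]):
    name = (proc.info["name"] or "").lower()
    if "setiathome" in name:                   # hypothetical filter
        try:
            proc.cpu_affinity(EVEN_CPUS)
            print(f"pinned {name} (pid {proc.pid}) to {EVEN_CPUS}")
        except psutil.AccessDenied:
            pass                               # may need admin/root rights
```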
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1869808 · Report as offensive