Ryzen 1700x Build



Profile LooneyTunes
Joined: 12 Dec 08
Posts: 51
Credit: 16,807,684
RAC: 0
Australia
Message 1864180 - Posted: 28 Apr 2017, 4:27:04 UTC - in response to Message 1864179.  

I figure the 960 and the 1050Ti are pretty close in compute units, so a single tuning parameter will likely work for both. I'm not sure if they have equal amounts of memory but I would go with this simple app_config.xml command line:

<cmdline>-sbs 512 -period_iterations_num 2 -tt 1500</cmdline>

If the system gets too laggy, drop the -tt value down to 300 or so. If it is still too laggy, increase -period_iterations_num to 5 or 10. If the first value works well without system lag, try -period_iterations_num 1. If you are running only a single task and the cards have at least 4 GB of VRAM, try -sbs 1024. The best way to tune the -sbs buffer size is to look at the tuning values Raistmer prints into your SoG task's stderr.txt output: look for the FftLength entry with the lowest total Sum time, a Delta of 1 and the highest N value. That is the -sbs buffer size you should be using. Don't exceed the actual total amount of VRAM on each card.
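
For anyone wondering where that line actually goes, here is a minimal sketch that writes a bare-bones app_config.xml carrying this cmdline. The app name and plan class in it are assumptions for illustration only; confirm the exact strings from your own client_state.xml or BOINC event log.

    # Minimal sketch: write an app_config.xml that carries the suggested SoG cmdline.
    # "setiathome_v8" and "opencl_nvidia_SoG" are assumed values for illustration;
    # check your own client_state.xml for the real app name and plan class.
    import xml.etree.ElementTree as ET

    def write_app_config(path="app_config.xml",
                         cmdline="-sbs 512 -period_iterations_num 2 -tt 1500"):
        root = ET.Element("app_config")
        ver = ET.SubElement(root, "app_version")
        ET.SubElement(ver, "app_name").text = "setiathome_v8"
        ET.SubElement(ver, "plan_class").text = "opencl_nvidia_SoG"
        ET.SubElement(ver, "cmdline").text = cmdline
        ET.ElementTree(root).write(path, encoding="utf-8", xml_declaration=True)

    write_app_config()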


Thanks for that Keith. I will give it a go.
"I know I am insignificant...Just look how many stars their are...!"

ID: 1864180
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1864348 - Posted: 28 Apr 2017, 17:39:04 UTC

And if you find the time to give Raistmer's explanation a read, it will give you a better understanding of what all the tuning parameters are for, how to implement them, and the consequences of improper tuning.

Some considerations regarding OpenCL MultiBeam app tuning from algorithm view
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1864348
Profile Mike Special Project $75 donor
Volunteer tester
Joined: 17 Feb 01
Posts: 34258
Credit: 79,922,639
RAC: 80
Germany
Message 1864391 - Posted: 28 Apr 2017, 21:15:00 UTC - in response to Message 1864348.  

And if you find the time to give Raistmer's explanation a read, it will give you a better understanding of what all the tuning parameters are for, how to implement them, and the consequences of improper tuning.

Some considerations regarding OpenCL MultiBeam app tuning from algorithm view


It's a very good starting point indeed, but one also has to learn how to read the given values from stderr correctly.
In most cases much more optimisation is possible on newer cards, but it's certainly better than no app tuning at all.


With each crime and every kindness we birth our future.
ID: 1864391
Un4given Project Donor
Joined: 1 May 04
Posts: 19
Credit: 7,983,035
RAC: 13
United States
Message 1864634 - Posted: 29 Apr 2017, 23:51:35 UTC

Nice system Looney. I too just built a 1700X and am loving it. While I definitely work the GPU when work units are available, I hate letting the CPU go to waste. My system is running fully stock CPU speeds (3.4 GHz @ 3.5 GHz boost) and this thing is running circles around my older Intel 6c/12t at 4.1 GHz. I won't even mention how badly this is stomping the old FX CPU it replaced, because the difference is just embarrassing.

But just to give you an idea, my Intel system is running 9 CPU units at once, and the 1700X is running 12 units, and even at a 600 MHz deficit is beating my Intel box by 10-15 minutes per work unit.
ID: 1864634
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13736
Credit: 208,696,464
RAC: 304
Australia
Message 1864645 - Posted: 30 Apr 2017, 1:06:35 UTC - in response to Message 1864634.  

But just to give you an idea, my Intel system is running 9 CPU units at once, and the 1700X is running 12 units, and even at a 600 MHz deficit is beating my Intel box by 10-15 minutes per work unit.

The stats for the computers themselves report that the Intel has faster run times per core. The Ryzen has more cores, hence more output per hour.
Grant
Darwin NT
ID: 1864645
Stephen "Heretic" Crowdfunding Project Donor * Special Project $75 donor * Special Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1864652 - Posted: 30 Apr 2017, 1:34:15 UTC - in response to Message 1864179.  

I figure the 960 and the 1050Ti are pretty close in compute units, so a single tuning parameter will likely work for both. I'm not sure if they have equal amounts of memory but I would go with this simple app_config.xml command line:

<cmdline>-sbs 512 -period_iterations_num 2 -tt 1500</cmdline>

If the system gets too laggy, drop the -tt value down to 300 or so. If it is still too laggy, increase -period_iterations_num to 5 or 10. If the first value works well without system lag, try -period_iterations_num 1. If you are running only a single task and the cards have at least 4 GB of VRAM, try -sbs 1024. The best way to tune the -sbs buffer size is to look at the tuning values Raistmer prints into your SoG task's stderr.txt output: look for the FftLength entry with the lowest total Sum time, a Delta of 1 and the highest N value. That is the -sbs buffer size you should be using. Don't exceed the actual total amount of VRAM on each card.


. . Somehow that seems wrong to me. In the stderr for my GTX950 which is a 2GB card, the -sbs value you describe would be 2048 but I am sure that will overcommit the VRAM. What does the last comment on each line signify? I think maybe you should only consider the values where the final comment is high_perf. For my 950 that is -sbs 512, which, when I am running doubles, makes much more sense to me. A value of -sbs 256 has a 1 sec shorter sum but the average time is doubled and the N value is halved.

. . Also I would try a value of iterations=10 and tt=400 to see how smoothly that runs ....

Stephen

:)
ID: 1864652
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1864653 - Posted: 30 Apr 2017, 1:46:19 UTC - in response to Message 1864634.  

Nice system Looney. I too just built a 1700X and am loving it. While I definitely work the GPU when work units are available, I hate letting the CPU go to waste. My system is running fully stock CPU speeds (3.4 GHz @ 3.5 GHz boost) and this thing is running circles around my older Intel 6c/12t at 4.1 GHz. I won't even mention how badly this is stomping the old FX CPU it replaced, because the difference is just embarrassing.

But just to give you an idea, my Intel system is running 9 CPU units at once, and the 1700X is running 12 units, and even at a 600 MHz deficit is beating my Intel box by 10-15 minutes per work unit.

Have you given any thought to overclocking your 1700X? Or are you just comfortable letting it run stock with the power savings of the new platform? I have my 1700X running at 3.85 GHz with just the core multiplier. The chip is running at the default 1.35 V VID voltage in the BIOS; under load the actual voltage mostly sits between 1.306 and 1.312 V. I have my RAM running at 3200 MHz, which puts the Data Fabric that connects the CCXs at 1600 MHz and helps the computations. I find that Ryzen does really well with the AVX CPU app, especially the BLC tasks. And you are correct, the R7 stomps my old FX systems.
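
For anyone following along, the RAM-to-fabric arithmetic is simply this, assuming the 1:1 fabric-to-memory-clock behaviour of first-generation Ryzen:

    # Back-of-envelope: DDR4-3200 -> 1600 MHz memory clock -> 1600 MHz Data Fabric.
    # Assumes the 1:1 fabric-to-memory-clock ratio of first-generation Ryzen.
    ddr4_transfer_rate = 3200               # mega-transfers per second
    mem_clock_mhz = ddr4_transfer_rate / 2  # DDR = two transfers per clock
    fabric_clock_mhz = mem_clock_mhz
    print(f"Memory clock: {mem_clock_mhz:.0f} MHz, Data Fabric: {fabric_clock_mhz:.0f} MHz")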
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1864653
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1864658 - Posted: 30 Apr 2017, 1:58:51 UTC - in response to Message 1864652.  

I figure the 960 and the 1050Ti are pretty close in compute units, so a single tuning parameter will likely work for both. I'm not sure if they have equal amounts of memory but I would go with this simple app_config.xml command line:

<cmdline>-sbs 512 -period_iterations_num 2 -tt 1500</cmdline>

If the system gets too laggy, drop the -tt value down to 300 or so. If it is still too laggy, increase -period_iterations_num to 5 or 10. If the first value works well without system lag, try -period_iterations_num 1. If you are running only a single task and the cards have at least 4 GB of VRAM, try -sbs 1024. The best way to tune the -sbs buffer size is to look at the tuning values Raistmer prints into your SoG task's stderr.txt output: look for the FftLength entry with the lowest total Sum time, a Delta of 1 and the highest N value. That is the -sbs buffer size you should be using. Don't exceed the actual total amount of VRAM on each card.


. . Somehow that seems wrong to me. In the stderr for my GTX950 which is a 2GB card, the -sbs value you describe would be 2048 but I am sure that will overcommit the VRAM. What does the last comment on each line signify? I think maybe you should only consider the values where the final comment is high_perf. For my 950 that is -sbs 512, which, when I am running doubles, makes much more sense to me. A value of -sbs 256 has a 1 sec shorter sum but the average time is doubled and the N value is halved.

. . Also I would try a value of iterations=10 and tt=400 to see how smoothly that runs ....

Stephen

:)

I looked his cards up after my post. The 960 only has 2 GB, the 1050Ti has 4 GB. Go read Raistmer's post again. The high_perf runs are just for determining which subsequent tunings should be tried; it doesn't mean they perform the fastest. I guess you didn't comprehend my last sentence, "Don't exceed the actual total amount of VRAM on each card." I meant that you shouldn't try to use a tuning that exceeds the available RAM on the card, taking into account whether you are running singles or multiples. Even if a tuning run shows that an 8192 MB -sbs buffer would produce the lowest sum time, you obviously can't use that value with a card that only has 2 GB of RAM on board.
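
As a rough rule of thumb only (the per-task overhead figure in this sketch is a guess, not a measured number), you can sanity-check a candidate -sbs value against a card's VRAM and the number of tasks you run on it:

    # Rough sanity check: assume each running task needs at least its -sbs buffer
    # in VRAM plus some working-set overhead (the 256 MB figure is a guess).
    def max_safe_sbs(vram_mb, tasks_per_gpu, overhead_mb_per_task=256):
        budget = vram_mb / tasks_per_gpu - overhead_mb_per_task
        sbs = 1
        while sbs * 2 <= budget:   # round down to a power of two
            sbs *= 2
        return sbs

    print(max_safe_sbs(2048, 2))   # 2 GB card running doubles -> 512
    print(max_safe_sbs(4096, 1))   # 4 GB card, single task    -> 2048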
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1864658
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1864661 - Posted: 30 Apr 2017, 2:10:16 UTC

You really should give the Lunatics installer a try to get access to the better CPU apps over the stock apps. You are not running very optimized on either platform, Intel or AMD. Take a look at my systems for CPU comparison times.

Pipqueek - FX-8350 @ 4.4 GHz

Numbskull - R7 1700X @ 3.85 GHz
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1864661
Un4given Project Donor
Joined: 1 May 04
Posts: 19
Credit: 7,983,035
RAC: 13
United States
Message 1864669 - Posted: 30 Apr 2017, 3:08:34 UTC - in response to Message 1864645.  
Last modified: 30 Apr 2017, 3:13:19 UTC

@Grant

The stats for the computers themselves report that the Intel has faster run times per core. The Ryzen has more cores, hence more output per hour.

My Intel unit is a Sandy Bridge-E, so it isn't brand new, but all evidence has pointed to Intel not working on major performance improvements, so much as power usage improvements. Regardless, say what you will, I'm watching these systems side-by-side, and this Ryzen is kicking the shit out of this Intel unit, despite a 600 MHz deficit. At this performance and price point, I'll take Ryzen all day long, every damn day.
ID: 1864669
Un4given Project Donor
Joined: 1 May 04
Posts: 19
Credit: 7,983,035
RAC: 13
United States
Message 1864670 - Posted: 30 Apr 2017, 3:11:13 UTC - in response to Message 1864653.  
Last modified: 30 Apr 2017, 3:14:22 UTC

@Keith

"Have you given any thought to overclocking your 1700X? Or are you just comfortable letting it run stock with the power savings of the new platform? I have my 1700X running at 3.85 Ghz with just the core multiplier. The chip is running at the default 1.35V VID voltage in the BIOS. Under load the actual voltage mostly runs between 1.306 and 1.312V. I have my RAM running at 3200 Mhz so that has the Data Fabric which connects the CCX's running at 1600 Mhz which helps the computations. I find that Ryzen does really well with the AVX CPU app, especially the BLC tasks. And you are correct, the R7 stomps my old FX systems."

Waiting for the availability of mounting hardware for my Deepcool Lucifer V2 HSF. Right now I'm limited to using an old AMD OEM HSF from previous generation A10/A12 CPUs. :/
ID: 1864670
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1864674 - Posted: 30 Apr 2017, 3:31:02 UTC - in response to Message 1864670.  


Waiting for the availability of mounting hardware for my Deepcool Lucifer V2 HSF. Right now I'm limited to using an old AMD OEM HSF from previous generation A10/A12 CPUs. :/

Ahh, makes sense now. That cooler is well equipped to handle the 95-120 W dissipation of the overclocked 1700X. You should have no issues getting to at least 3.8 GHz on stock voltages.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1864674
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13736
Credit: 208,696,464
RAC: 304
Australia
Message 1864676 - Posted: 30 Apr 2017, 3:44:03 UTC - in response to Message 1864669.  
Last modified: 30 Apr 2017, 3:44:19 UTC

My Intel unit is a Sandy Bridge-E, so it isn't brand new, but all evidence has pointed to Intel not working on major performance improvements, so much as power usage improvements.

?
Kaby Lake outperforms the previous CPUs, as they outperformed the previous series.
The jumps between series aren't as great as they were, but they are still there, even though they have limited their power usage.
Intel still leads AMD for single threaded performance.


But since they've closed the gap (significantly), their 16 threads will result in more work being done than with 12 threads.


Regardless, say what you will, I'm watching these systems side-by-side, and this Ryzen is kicking the shit out of this Intel unit, despite a 600 MHz deficit.

More cores = more work done per hour, even if they are somewhat slower to process each WU than the lesser cored machine. Only a huge difference in single threaded performance would result in a system with many threads being slower than one with not nearly as many threads.
Your i7 system
Average processing rate 18.98 GFLOPS

Your Ryzen system
Average processing rate 17.61 GFLOPS

The difference is slight, but there.
So the Intel system takes less time to crunch a given WU. The Ryzen system is able to process more at the same time, so it does more work per hour even though it is (slightly) slower with each WU.
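
As a rough back-of-envelope (treating APR as a per-task rate is a simplification that ignores the WU mix), multiplying each machine's APR by the number of CPU tasks it runs at once (9 and 12, per your earlier post) shows the effect:

    # Rough aggregate throughput: per-task APR (GFLOPS) x concurrent CPU tasks.
    systems = {
        "i7 (9 tasks)":     {"apr_gflops": 18.98, "tasks": 9},
        "1700X (12 tasks)": {"apr_gflops": 17.61, "tasks": 12},
    }
    for name, s in systems.items():
        print(f"{name}: ~{s['apr_gflops'] * s['tasks']:.0f} GFLOPS aggregate")
    # ~171 vs ~211: slower per task, but more work per hour overall.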
As Keith suggested, running an optimized application would result in better output from both systems.

At this performance and price point, I'll take Ryzen all day long, every damn day.

I'm not arguing about that, just your per-core performance claims that aren't supported by your system's own current processing numbers.
Grant
Darwin NT
ID: 1864676
Un4given Project Donor
Joined: 1 May 04
Posts: 19
Credit: 7,983,035
RAC: 13
United States
Message 1864678 - Posted: 30 Apr 2017, 3:50:33 UTC - in response to Message 1864674.  


Waiting for the availability of mounting hardware for my Deepcool Lucifer V2 HSF. Right now I'm limited to using an old AMD OEM HSF from previous generation A10/A12 CPUs. :/

Ahh, makes sense now. That cooler is well equipped to handle the 95-120 W dissipation of the overclocked 1700X. You should have no issues getting to at least 3.8 GHz on stock voltages.


Definitely hoping to hit that 3.8+ mark. Their web site says that mounting hardware isn't due until early May.
ID: 1864678
Un4given Project Donor
Joined: 1 May 04
Posts: 19
Credit: 7,983,035
RAC: 13
United States
Message 1864681 - Posted: 30 Apr 2017, 3:53:22 UTC - in response to Message 1864676.  

My Intel unit is a Sandy Bridge-E, so it isn't brand new, but all evidence has pointed to Intel not working on major performance improvements, so much as power usage improvements.

?
Kaby Lake outperforms the previous CPUs, as they outperformed the previous series.
The jumps between series aren't as great as they were, but they are still there, even though they have limited their power usage.
Intel still leads AMD for single threaded performance.


But since they've closed the gap (significantly), their 16 threads will result in more work being done than with 12 threads.


Regardless, say what you will, I'm watching these systems side-by-side, and this Ryzen is kicking the shit out of this Intel unit, despite a 600 MHz deficit.

More cores = more work done per hour, even if they are somewhat slower to process each WU than the lesser cored machine. Only a huge difference in single threaded performance would result in a system with many threads being slower than one with not nearly as many threads.
Your i7 system
Average processing rate 18.98 GFLOPS

Your Ryzen system
Average processing rate 17.61 GFLOPS

The difference is slight, but there.
So the Intel system takes less time to crunch a given WU. The Ryzen system is able to process more at the same time, so it does more work per hour even though it is (slightly) slower with each WU.
As Keith suggested, running an optimized application would result in better output from both systems.

At this performance and price point, I'll take Ryzen all day long, every damn day.

I'm not arguing about that, just your per-core performance claims that aren't supported by your system's own current processing numbers.


My variances may be the result of running more threads than physical cores, and possibly due to OS differences; I'm on W10 on the Ryzen and still on W7 on the Intel system. I'm just going by the numbers the BOINC program reports.
ID: 1864681
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1864683 - Posted: 30 Apr 2017, 4:00:08 UTC - in response to Message 1864676.  


I'm not arguing about that, just your per-core performance claims that aren't supported by your system's own current processing numbers.

And I am not arguing about your summation. No contest between current Ryzen IPC and current Intel IPC. Intel still rules the roost with regard to single-thread performance, partly due to efficiencies in thread scheduling and micro-op performance and mainly due to much higher average core clock speeds.

If and when Ryzen 2 can use a better manufacturing process that allows higher clocks than the current fabs, they might be able to match Intel. But Intel continues to march onward with new processes and designs too. Always a moving target.

What I appreciate is the price point that R7 Ryzen hits against commensurate Intel high core count products. Just a good bang for the buck deal in my estimation.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1864683
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13736
Credit: 208,696,464
RAC: 304
Australia
Message 1864685 - Posted: 30 Apr 2017, 4:08:55 UTC - in response to Message 1864681.  
Last modified: 30 Apr 2017, 4:10:40 UTC

My variances may be the result of running more threads than physical cores, and possibly due to OS differences; I'm on W10 on the Ryzen and still on W7 on the Intel system. I'm just going by the numbers the BOINC program reports.

RAC is not a good way to judge performance.
It takes a long time to settle down, about 2 months. And that's without server outages.
The time it takes to process a given WU is the best measure, however there are 2 main types of WU - Arecibo and Green Bank Telescope (GBT). Within Arecibo there are 3 different types of WU - shorties, which get processed very quickly; mid-range, which take a medium amount of time; and VLARs, which take a long time to process compared to the other 2.
With GBT, pretty much all the WUs are VLAR, however there can still be considerable differences in processing times between WUs, and they all differ from the Arecibo WU crunching times.
So the only way to compare performance accurately is to compare run times for WUs of the same type; it's not possible to compare different types of WUs to get an idea of how a system is performing.
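
Purely as an illustration of that like-for-like idea (the task list below is invented, not real data), you would group run times by WU type before comparing the hosts:

    # Invented sample data: (host, wu_type, run_time_seconds).
    from collections import defaultdict
    from statistics import mean

    tasks = [
        ("i7",    "arecibo_vlar",   7200), ("ryzen", "arecibo_vlar",   7900),
        ("i7",    "gbt_vlar",       8100), ("ryzen", "gbt_vlar",       8600),
        ("i7",    "arecibo_shorty",  900), ("ryzen", "arecibo_shorty", 1000),
    ]

    by_type = defaultdict(lambda: defaultdict(list))
    for host, wu_type, secs in tasks:
        by_type[wu_type][host].append(secs)

    # Only compare hosts within the same WU type.
    for wu_type, hosts in by_type.items():
        summary = ", ".join(f"{h}: {mean(v) / 60:.0f} min" for h, v in hosts.items())
        print(f"{wu_type}: {summary}")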

A quicker method to see how things are going is to check the APR (Average Processing Rate) in the Details, Application details- Show, for each system.

Your i7 system
Average processing rate 18.98 GFLOPS

Your Ryzen system
Average processing rate 17.61 GFLOPS
The higher the number, the faster the system is crunching each WU (where GPUs are involved and more than 1 WU at a time is being processed, the APR is no longer a good indicator).
Grant
Darwin NT
ID: 1864685
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1864690 - Posted: 30 Apr 2017, 4:32:52 UTC - in response to Message 1864685.  


A quicker method to see how things are going is to check the APR (Average Processing Rate) in the Details, Application details- Show, for each system.

Grant, how often do you think that APR gets calculated? I think it is only about once a day. Or do you think it is a real-time running tally?
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1864690
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13736
Credit: 208,696,464
RAC: 304
Australia
Message 1864691 - Posted: 30 Apr 2017, 4:34:48 UTC - in response to Message 1864690.  

Grant, how often do you think that APR gets calculated? I think it is only about once a day. Or do you think it is a real-time running tally?

I'm pretty sure it gets updated fairly regularly as I've seen it vary over a few hours when the mix of GPU work has changed significantly.
Grant
Darwin NT
ID: 1864691
Un4given Project Donor
Joined: 1 May 04
Posts: 19
Credit: 7,983,035
RAC: 13
United States
Message 1864699 - Posted: 30 Apr 2017, 5:41:52 UTC - in response to Message 1864685.  

My variances may be the result of running more threads than physical cores, and possibly due to OS differences; I'm on W10 on the Ryzen and still on W7 on the Intel system. I'm just going by the numbers the BOINC program reports.

RAC is not a good way to judge performance.
It takes a long time to settle down, about 2 months. And that's without server outages.
The time it takes to process a given WU is the best measure, however there are 2 main types of WU - Arecibo and Green Bank Telescope (GBT). Within Arecibo there are 3 different types of WU - shorties, which get processed very quickly; mid-range, which take a medium amount of time; and VLARs, which take a long time to process compared to the other 2.
With GBT, pretty much all the WUs are VLAR, however there can still be considerable differences in processing times between WUs, and they all differ from the Arecibo WU crunching times.
So the only way to compare performance accurately is to compare run times for WUs of the same type; it's not possible to compare different types of WUs to get an idea of how a system is performing.

A quicker method to see how things are going is to check the APR (Average Processing Rate) in the Details, Application details- Show, for each system.

Your i7 system
Average processing rate 18.98 GFLOPS

Your Ryzen system
Average processing rate 17.61 GFLOPS
The higher the number, the faster the system is crunching each WU (where GPUs are involved and more than 1 WU at a time is being processed, the APR is no longer a good indicator).


Yes, I've seen the different units. The variation on my Intel system between them is almost exactly 1 hour to the minute. Yes, I know my Ryzen system is still new and my Intel is several years old. However, when looking at similar units between the two, they are still separated by several minutes.

Now, let's treat this like a math equation. I know SMT and HT are not equally effective, but treating them as equivalent, you have a 6-core unit running 9 SETI units. That is a 50% increase over the number of physical cores. My Ryzen system is running 12 SETI units, which is a 50% increase over the number of physical cores. When it finishes several minutes ahead of my Intel system across both file types, what conclusion do you draw?
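
A quick sketch of that loading-ratio arithmetic:

    # Tasks per physical core on each box -- the "same 50% oversubscription" point.
    boxes = {"i7 (6 cores)": (6, 9), "1700X (8 cores)": (8, 12)}
    for name, (cores, tasks) in boxes.items():
        print(f"{name}: {tasks / cores:.2f} tasks per physical core")
    # Both come out at 1.50, so the per-core loading is identical.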
ID: 1864699