My 2990WX

Message boards : Number crunching : My 2990WX

Tom M
Volunteer tester
Joined: 28 Nov 02
Posts: 5124
Credit: 276,046,078
RAC: 462
Message 1970763 - Posted: 17 Dec 2018, 6:09:57 UTC - in response to Message 1970758.  

With memory interleaving off:
Auto GHz. Mem "Try It" 3200/CL14
tom@EJS-GIFT:~/Downloads/ml_test/Linux$ sudo ./mlc
[sudo] password for tom: 
Intel(R) Memory Latency Checker - v3.6
Measuring idle latencies (in ns)...
		Numa node
Numa node	     0	     1	     2	     3	
       0	  64.3	-	 103.2	-	
       1	 104.5	-	 102.4	-	
       2	 103.3	-	  64.5	-	
       3	 102.9	-	 106.0	-	

With memory interleaving on auto:
tom@EJS-GIFT:~/Downloads/ml_test/Linux$ sudo ./mlc
[sudo] password for tom: 
Intel(R) Memory Latency Checker - v3.6
Measuring idle latencies (in ns)...
		Numa node
Numa node	     0	     1	     2	     3	
       0	  64.0	-	 103.3	-	
       1	 106.0	-	 103.1	-	
       2	 103.4	-	  64.1	-	
       3	 103.1	-	 105.9	-	

With socket:
tom@EJS-GIFT:~/Downloads/ml_test/Linux$ sudo ./mlc
[sudo] password for tom: 
Intel(R) Memory Latency Checker - v3.6
Measuring idle latencies (in ns)...
		Numa node
Numa node	     0	     1	     2	     3	
       0	  63.1	-	 101.9	-	
       1	 106.0	-	 102.9	-	
       2	 103.1	-	  64.1	-	
       3	 103.1	-	 106.0	-	

With die:
tom@EJS-GIFT:~/Downloads/ml_test/Linux$ sudo ./mlc
[sudo] password for tom: 
Intel(R) Memory Latency Checker - v3.6
Measuring idle latencies (in ns)...
		Numa node
Numa node	     0	     1	     2	     3	
       0	  63.9	-	 102.9	-	
       1	 105.7	-	 102.9	-	
       2	 103.0	-	  64.0	-	
       3	 102.8	-	 105.7	-	

With channel:
Intel(R) Memory Latency Checker - v3.6
Measuring idle latencies (in ns)...
		Numa node
Numa node	     0	     1	     2	     3	
       0	  64.0	-	 103.3	-	
       1	 105.9	-	 102.9	-	
       2	 103.2	-	  62.7	-	
       3	 102.8	-	 105.9	-	



I don't believe I have any BIOS setting labeled "UMA". I suppose it may be that you can't get UMA on this chip because of the bi-memory model (in the tables above, nodes 1 and 3 have no memory of their own).
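For comparing these runs without eyeballing them, here is a minimal Python sketch that parses an idle-latency table like the ones above and reports the average local and remote latency plus the nodes that show no memory. The names `parse_mlc_matrix` and `summarize` are made up for illustration, and reading "-" as "no memory on that node" is an assumption based on how the tables look, not something taken from the mlc documentation.

```python
from statistics import mean

# First table from this post (memory interleaving off).
SAMPLE = """\
Measuring idle latencies (in ns)...
        Numa node
Numa node        0       1       2       3
       0      64.3  -    103.2  -
       1     104.5  -    102.4  -
       2     103.3  -     64.5  -
       3     102.9  -    106.0  -
"""


def parse_mlc_matrix(text):
    """Return {cpu_node: {mem_node: latency_ns or None}} from mlc output.

    None stands for a "-" cell (assumed: no memory attached to that node).
    """
    matrix = {}
    columns = None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[:2] == ["Numa", "node"]:
            # The header row carries the memory node ids; the title row does not.
            if len(fields) > 2:
                columns = [int(f) for f in fields[2:]]
            continue
        if columns is None or not fields[0].isdigit():
            continue
        values = fields[1:]
        if len(values) != len(columns):
            continue
        matrix[int(fields[0])] = {
            col: (None if val == "-" else float(val))
            for col, val in zip(columns, values)
        }
    return matrix


def summarize(matrix):
    """Average local and remote latency, plus nodes with no readings at all."""
    local, remote = [], []
    for cpu_node, row in matrix.items():
        for mem_node, ns in row.items():
            if ns is None:
                continue
            (local if cpu_node == mem_node else remote).append(ns)
    mem_nodes = {m for row in matrix.values() for m in row}
    memoryless = sorted(
        m for m in mem_nodes
        if all(row.get(m) is None for row in matrix.values())
    )
    return {
        "local_ns": mean(local) if local else None,
        "remote_ns": mean(remote) if remote else None,
        "memoryless_nodes": memoryless,
    }


if __name__ == "__main__":
    print(summarize(parse_mlc_matrix(SAMPLE)))
```

Feeding each of the five tables through the same function makes it easy to see that the interleaving setting barely moves either number on this board.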

Tom
A proud member of the OFA (Old Farts Association).
ID: 1970763
Keith Myers
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1970808 - Posted: 17 Dec 2018, 18:17:34 UTC

I think you are correct, given that all your test variables match the ones in my BIOS. With your quad-die architecture, only the NUMA model is available. When I try all the memory interleaving options in my BIOS, I get a change in each parameter test.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1970808
Tom M
Volunteer tester
Message 1970813 - Posted: 17 Dec 2018, 19:08:08 UTC - in response to Message 1970808.  

I think you are correct, given that all your test variables match the ones in my BIOS. With your quad-die architecture, only the NUMA model is available. When I try all the memory interleaving options in my BIOS, I get a change in each parameter test.


Well. As someone someplace has said "It has been a learning experience." :)

To revisit a question raised earlier about the best choice for the fastest CPU crunching under SETI: I believe we can say that the 2920X/2950X are the top consumer choices by far (best bang for the buck).

The 2920X/2950X give all of their cores full-speed access to memory, and memory access is the issue for CPU crunching with SETI.

Right now it looks as if 26 cores of CPU processing plus however many GPUs I have, without -nobs, is the most production I can manage.

With a 2950X I should be able to run CPU crunching on a full 90% of the cores, plus x number of GPUs running -nobs. That looks to be 27-28 CPU cores. Yes, I can see that it is only 1-2 more cores than I am running now. I might be able to run 100%, which would give 32 cores plus the GPU threads.

But for someone who is upgrading to this level rather than downgrading, it would save in the neighborhood of $800, which could be devoted to another high-end Nvidia GPU.

If the E5-2670 v1 CPUs from data center upgrades had not hit the market at those incredible $100 prices, I don't think we would ever have got up to core counts this high until AMD introduced the Ryzen 7/Threadripper products.

I think I will still try retests at 30 cores and 40 cores because I want to see if I can beat my last, best production with the slower RAM (67 minutes at 40 cores).

With the current mix of slower tasks I am running about 52 minutes, which is up from the near 47 minutes I was running.

I was trying out a CPU multiplier at 3.5GHz earlier and it was "unstable" (it ran, but the CPU frequency was not staying near 3.5). I still have the ambition to return to the 3.7GHz that was stable at one time. :)

Tom
A proud member of the OFA (Old Farts Association).
ID: 1970813
ML1
Volunteer moderator
Volunteer tester
Joined: 25 Nov 01
Posts: 20289
Credit: 7,508,002
RAC: 20
United Kingdom
Message 1970821 - Posted: 17 Dec 2018, 20:08:23 UTC - in response to Message 1970813.  
Last modified: 17 Dec 2018, 20:08:36 UTC

Thanks for a good summary.


If you're into tweaking a Linux kernel, then you could try these two kernel configs:

  │ CONFIG_NUMA:
  │
  │ Enable NUMA (Non Uniform Memory Access) support. 
  │
  │ The kernel will try to allocate memory used by a CPU on the
  │ local memory controller of the CPU and add some more
  │ NUMA awareness to the kernel.
  │
  │ For 64-bit this is recommended if the system is Intel Core i7
  │ (or later), AMD Opteron, or EM64T NUMA.

  │ Symbol: NUMA [=n]
  │ Type  : bool
  │ Prompt: Numa Memory Allocation and Scheduler Support
  │   Location:
  │     -> Processor type and features
  │   Defined at arch/x86/Kconfig:1515
  │   Depends on: SMP [=y] && (X86_64 [=y] || X86_32 [=n] && HIGHMEM64G [=n] && X86_BIGSMP [=n])


And I don't know whether s@h (or the kernel, opportunistically on an app's behalf) would take advantage of this one:

  │ CONFIG_KSM:
  │
  │ Enable Kernel Samepage Merging: KSM periodically scans those areas
  │ of an application's address space that an app has advised may be
  │ mergeable.  When it finds pages of identical content, it replaces
  │ the many instances by a single page with that content, so
  │ saving memory until one or another app needs to modify the content. 
  │ Recommended for use with KVM, or with other duplicative applications.
  │ See Documentation/vm/ksm.rst for more information: KSM is inactive
  │ until a program has madvised that an area is MADV_MERGEABLE, and
  │ root has set /sys/kernel/mm/ksm/run to 1 (if CONFIG_SYSFS is set).
  │
  │ Symbol: KSM [=n]
  │ Type  : bool
  │ Prompt: Enable KSM for page merging
  │   Location:
  │     -> Memory Management options
  │   Defined at mm/Kconfig:297
  │   Depends on: MMU [=y]



It would be very cool if that could save you a few duplicated MBytes of CPU cache when the same s@h app is running many times over, and so leave more memory bandwidth for the compute data...
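To see whether KSM is even in play on a given box, a rough Python sketch along these lines might help. It reads the `run` and `pages_sharing` files under `/sys/kernel/mm/ksm` and shows what "madvised as MADV_MERGEABLE" looks like from a program's side. The helper names are invented for illustration, it needs Python 3.8 or later for `mmap.madvise`, and whether the stock s@h apps ever make such a call is not something this thread establishes.

```python
import mmap
from pathlib import Path

KSM_DIR = Path("/sys/kernel/mm/ksm")


def read_ksm_counter(name, ksm_dir=KSM_DIR):
    """Return the integer stored in a KSM sysfs file, or None if unreadable."""
    try:
        return int((Path(ksm_dir) / name).read_text().strip())
    except (OSError, ValueError):
        return None


def ksm_summary(ksm_dir=KSM_DIR):
    """Report whether KSM exists, whether it is running, and pages shared."""
    run = read_ksm_counter("run", ksm_dir)
    return {
        "available": run is not None,   # directory missing => CONFIG_KSM off
        "running": run == 1,            # root must write 1 to 'run' first
        "pages_sharing": read_ksm_counter("pages_sharing", ksm_dir),
    }


def advise_mergeable(length=16 * mmap.PAGESIZE):
    """Map anonymous memory and try to mark it MADV_MERGEABLE.

    Returns (region, True) on success, (region, False) when the platform
    or the running kernel does not support the request.
    """
    region = mmap.mmap(-1, length)
    flag = getattr(mmap, "MADV_MERGEABLE", None)
    if flag is None or not hasattr(region, "madvise"):
        return region, False
    try:
        region.madvise(flag)
    except OSError:
        # Typically EINVAL when the kernel was built without CONFIG_KSM.
        return region, False
    return region, True


if __name__ == "__main__":
    print(ksm_summary())
    _, accepted = advise_mergeable()
    print("MADV_MERGEABLE accepted:", accepted)
```

If `pages_sharing` stays at 0 while the crunchers are running, nothing is being merged, whatever the kernel config says.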


Happy fast crunchin',
Martin
See new freedom: Mageia Linux
Take a look for yourself: Linux Format
The Future is what We all make IT (GPLv3)
ID: 1970821
ML1
Volunteer moderator
Volunteer tester
Message 1970860 - Posted: 18 Dec 2018, 1:44:39 UTC

Not sure I should post this here... :-P


Building a Compact Monster PC: Threadripper Meets Micro-ATX and Custom Liquid Cooling

... we set out to build a powerful and quiet desktop beast that lives in a relatively svelte form factor while still packing ... high-end performance...


Does the "w" in "threadripper-2990wx" mean "water"?...

Strictly for comparison :-)


Enjoy,

Happy cool fast crunchin'!
Martin
See new freedom: Mageia Linux
Take a look for yourself: Linux Format
The Future is what We all make IT (GPLv3)
ID: 1970860
Keith Myers
Volunteer tester
Message 1970861 - Posted: 18 Dec 2018, 1:49:55 UTC - in response to Message 1970860.  

Not sure I should post this here... :-P


Building a Compact Monster PC: Threadripper Meets Micro-ATX and Custom Liquid Cooling

... we set out to build a powerful and quiet desktop beast that lives in a relatively svelte form factor while still packing ... high-end performance...


Does the "w" in "threadripper-2990wx" mean "water"?...

Strictly for comparison :-)


Enjoy,

Happy cool fast crunchin'!
Martin

No, the WX suffix for TR means "workstation", i.e. the professional high-core-count CPUs.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1970861
Tom M
Volunteer tester
Message 1970873 - Posted: 18 Dec 2018, 4:58:43 UTC - in response to Message 1970860.  

Not sure I should post this here... :-P

--edit------------
Does the "w" in "threadripper-2990wx" mean "water"?...


Some article/review said it meant "way extreme" :)
A proud member of the OFA (Old Farts Association).
ID: 1970873
Tom M
Volunteer tester
Message 1970874 - Posted: 18 Dec 2018, 5:06:07 UTC - in response to Message 1970821.  

Thanks for a good summary.


If you're into tweaking a Linux kernel, then you could try these two kernel configs:


I would say "it's above my pay grade" except, wait, I'm not getting paid for this, so I can't even claim that ;/

I think if it were possible to shoehorn the data and working space into the CPU cache we would see amazing performance. But this is so far outside of what I have studied in SETI that I can't even hazard a guess, except a negative one.

I suppose I should start saving for a 64-core, high-clock EPYC :(

Tom
A proud member of the OFA (Old Farts Association).
ID: 1970874
Keith Myers
Volunteer tester
Message 1970876 - Posted: 18 Dec 2018, 5:22:19 UTC

I'll post here as well as in the GPUUG forum. Is anyone using the latest LTS release, the 4.19 kernel? I am interested in getting the correct temperatures reported for my 2920X CPU. It seems you need at least a 4.18 kernel to get the updated k10temp driver, which correctly reports the Threadripper 2 CPUs, because the fixed k10temp driver isn't going to be backported to the stock 4.15 kernel in Ubuntu 18.04.

Anyone tried it yet with the Nvidia 410 series drivers?
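A quick way to check a host against that 4.18 threshold is to compare the kernel release string, as in the Python sketch below. The function names are invented for illustration, and the `(4, 18)` minimum is simply the figure quoted in this post. A distribution kernel with backported patches could behave differently from what its version number suggests, so treat the result as a hint only.

```python
import platform
import re


def kernel_version(release):
    """Turn a release string such as '4.15.0-42-generic' into (4, 15)."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def has_updated_k10temp(release, minimum=(4, 18)):
    """True if the release is at or above the assumed minimum version."""
    return kernel_version(release) >= minimum


if __name__ == "__main__":
    release = platform.release()
    print(release, "meets 4.18 threshold:", has_updated_k10temp(release))
```

On a stock Ubuntu 18.04 install this would report False for the 4.15 kernel.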
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1970876
Tom M
Volunteer tester
Message 1970882 - Posted: 18 Dec 2018, 6:02:39 UTC
Last modified: 18 Dec 2018, 6:04:26 UTC

I have a C-state setting among my CPU parameters.

On the Intel CPUs, when I turn the C-states down/off they run faster. Does anyone have any experience one way or the other on the AMD CPUs we are running?

Tom

Yes, I have re-used a 3.7GHz OC button profile I have and so far, so good.
A proud member of the OFA (Old Farts Association).
ID: 1970882
Tom M
Volunteer tester
Message 1970883 - Posted: 18 Dec 2018, 6:08:12 UTC - in response to Message 1970876.  

I'll post here as well as in the GPUUG forum. Is anyone using the latest LTS release, the 4.19 kernel? I am interested in getting the correct temperatures reported for my 2920X CPU. It seems you need at least a 4.18 kernel to get the updated k10temp driver, which correctly reports the Threadripper 2 CPUs, because the fixed k10temp driver isn't going to be backported to the stock 4.15 kernel in Ubuntu 18.04.

Anyone tried it yet with the Nvidia 410 series drivers?


Would another good place to ask be over in the Windows to Linux thread? I am not sure how many Linux people are reading here, but there should certainly be even more over there. I know there are issues with cross-posting; I'm just hoping we won't get moderated here and on Stephen's thread.

Tom
A proud member of the OFA (Old Farts Association).
ID: 1970883
Keith Myers
Volunteer tester
Message 1970888 - Posted: 18 Dec 2018, 6:47:50 UTC - in response to Message 1970882.  

I have a C-state setting among my CPU parameters.

On the Intel CPUs, when I turn the C-states down/off they run faster. Does anyone have any experience one way or the other on the AMD CPUs we are running?

Tom

Yes, I have re-used a 3.7GHz OC button profile I have and so far, so good.

If you want to keep your CPU clocks up at full load all the time, don't use C-states, as that gives the OS the ability to downclock opportunistically when it thinks the CPU thread is under light loading. So when a task finishes up and before the next one occupies the thread, the OS can knock the clocks down. But the OS can't transition back to full clocks that fast and has some rather large latencies to get back up to speed. So for many seconds, the next task starts computing at reduced clocks before it moves back into high gear. Better to keep the C-state always at 0 for compute loads.
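One way to check whether threads really are sitting at reduced clocks under load is to sample the "cpu MHz" lines in `/proc/cpuinfo`, as in the Python sketch below. The function names, the 3350 MHz target (taken from the 3.35GHz figure elsewhere in this thread) and the 5% tolerance are illustrative assumptions, and the value the kernel reports there is only a snapshot.

```python
def parse_cpu_mhz(cpuinfo_text):
    """Collect the 'cpu MHz' value of every logical CPU, in file order."""
    speeds = []
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "cpu MHz":
            speeds.append(float(value))
    return speeds


def threads_below(speeds, target_mhz, tolerance=0.05):
    """Indexes of logical CPUs running more than `tolerance` under target."""
    floor = target_mhz * (1 - tolerance)
    return [i for i, mhz in enumerate(speeds) if mhz < floor]


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as handle:
            speeds = parse_cpu_mhz(handle.read())
    except OSError:
        speeds = []
    # 3350 MHz is an assumed target; substitute your own all-core clock.
    print("threads under target:", threads_below(speeds, 3350.0))
```

Running it a few times while tasks are switching over would show whether the clocks dip between work units with the C-state option on versus off.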
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1970888
Keith Myers
Volunteer tester
Message 1970889 - Posted: 18 Dec 2018, 6:52:34 UTC - in response to Message 1970883.  

I'll post here as well as in the GPUUG forum. Is anyone using the latest LTS release, the 4.19 kernel? I am interested in getting the correct temperatures reported for my 2920X CPU. It seems you need at least a 4.18 kernel to get the updated k10temp driver, which correctly reports the Threadripper 2 CPUs, because the fixed k10temp driver isn't going to be backported to the stock 4.15 kernel in Ubuntu 18.04.

Anyone tried it yet with the Nvidia 410 series drivers?


Would another good place to ask be over in the Windows to Linux thread? I am not sure how many Linux people are reading here, but there should certainly be even more over there. I know there are issues with cross-posting; I'm just hoping we won't get moderated here and on Stephen's thread.

Tom

I didn't want to spam multiple threads, and I thought this one was targeted at high-performance TR. I already looked through the Top 100 hosts list and didn't see anyone running anything newer than the 4.15 kernels. But there could be a lightly used TR way down the list that is on a newer kernel. I didn't page through a hundred pages trying to find the needle in the haystack.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1970889
Brent Norman
Volunteer tester
Joined: 1 Dec 99
Posts: 2786
Credit: 685,657,289
RAC: 835
Canada
Message 1970905 - Posted: 18 Dec 2018, 9:17:36 UTC - in response to Message 1970889.  

But there could be a lightly used TR way down the lists that may be on the newer kernels.
You could try the data export files for off site stats. That might be the easiest way to search.
ID: 1970905
Tom M
Volunteer tester
Message 1970931 - Posted: 18 Dec 2018, 20:12:27 UTC - in response to Message 1970888.  

I have a C-state setting among my CPU parameters.

On the Intel CPUs, when I turn the C-states down/off they run faster. Does anyone have any experience one way or the other on the AMD CPUs we are running?

Tom

Yes, I have re-used a 3.7GHz OC button profile I have and so far, so good.

If you want to keep your CPU clocks up at full load all the time, don't use C-states, as that gives the OS the ability to downclock opportunistically when it thinks the CPU thread is under light loading. So when a task finishes up and before the next one occupies the thread, the OS can knock the clocks down. But the OS can't transition back to full clocks that fast and has some rather large latencies to get back up to speed. So for many seconds, the next task starts computing at reduced clocks before it moves back into high gear. Better to keep the C-state always at 0 for compute loads.


Thank you. I have disabled the C-state parameter for now.

I have managed to get 4 GPUs racked onto it, but the CPU was running hot and its frequency became unstable at 3.7GHz. So it's back down to 3.35GHz.

I guess I will go back and get the case(s) you recommended onto my wish list. It is pretty clear that I am going to need the original recommendation for cooling that you made. And to do that, I am going to need a bigger case.

Either that or downgrade to a 2950x and buy another faster gpu after I sell the 2990WX :)

Tom
A proud member of the OFA (Old Farts Association).
ID: 1970931
Tom M
Volunteer tester
Message 1970937 - Posted: 18 Dec 2018, 20:52:40 UTC - in response to Message 1970197.  

This article puts some numbers on the reason for the massive ballooning in processing time when I run 56-60 cores of Seti processing.

https://www.pcworld.com/article/3298859/components-processors/how-memory-bandwidth-is-killing-amds-32-core-threadripper-performance.html

The available memory bandwidth goes from about 5 GB for 8/16 cores to around 2 GB for 32 cores.

This makes it sound like the "sweet spot" is going to be 26 threads with SMT running. ...

That's a beautiful article, thanks for that!

Those charts very clearly show the effect of hitting the maximum bandwidth for the RAM, and the downward tail-off shows the worsening effect of 'poisoning' the cache with too many cache misses due to working through too much different data too quickly.


If you have enough physical cores, you should see better results by simply turning the SMT feature off!

Otherwise... Do the SETI tasks become staggered to balance the RAM IO capacity? Or does everything just badly pile up in a log-jam just like a congested motorway?


Happy cool fast crunchin'!
Martin


That is a very good question. If I turn off SMT but don't change the number of threads, will it get faster, slower or stay the same?

I will bet it gets slower, even though the SMT penalty goes away, because more physical cores that don't have direct memory access would have to be engaged.

Tom
A proud member of the OFA (Old Farts Association).
ID: 1970937
Keith Myers
Volunteer tester
Message 1970952 - Posted: 18 Dec 2018, 21:42:35 UTC - in response to Message 1970937.  


That is a very good question. If I turn off SMT but don't change the number of threads, will it get faster, slower or stay the same?

I will bet it gets slower, even though the SMT penalty goes away, because more physical cores that don't have direct memory access would have to be engaged.

Tom

No, that isn't the way it works. The physical cores are still the same . . . how could they not be? You still have four dies with 8 cores each, and the dies with direct memory access are still connected to memory in exactly the same way.

You will just have one hardware thread running on each core instead of two. That means the task occupying the core has the core's registers and execution units to itself, without having to share them with another thread. By turning off SMT, the task should speed up because it doesn't have to share resources.

Try it and see. I would be very surprised if it stayed the same or got worse.
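Before and after flipping the BIOS switch, the Linux topology files can confirm how logical CPUs pair up on physical cores. The Python sketch below reads each CPU's `thread_siblings_list` from sysfs and groups the siblings. The helper names are invented for illustration, and this only shows the pairing; it says nothing about which dies have memory attached.

```python
from pathlib import Path


def parse_cpu_list(text):
    """Expand a sysfs CPU list such as '0,32' or '0-3,8' into a set of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        low, sep, high = part.partition("-")
        cpus.update(range(int(low), int(high if sep else low) + 1))
    return cpus


def physical_cores(sibling_lists):
    """Group logical CPUs into physical cores from their sibling lists."""
    return {frozenset(parse_cpu_list(item)) for item in sibling_lists}


def smt_active(cores):
    """True if any physical core exposes more than one logical CPU."""
    return any(len(core) > 1 for core in cores)


def read_sibling_lists(root="/sys/devices/system/cpu"):
    """Read thread_siblings_list for every CPU directory under `root`."""
    paths = Path(root).glob("cpu[0-9]*/topology/thread_siblings_list")
    return [path.read_text() for path in paths]


if __name__ == "__main__":
    cores = physical_cores(read_sibling_lists())
    print(len(cores), "physical cores, SMT active:", smt_active(cores))
```

With SMT off, every sibling list should contain a single CPU number and the core count should equal the logical CPU count.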
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1970952
Tom M
Volunteer tester
Message 1971019 - Posted: 19 Dec 2018, 6:14:48 UTC - in response to Message 1970931.  


Thank you. I have disabled the C state parameter for now.


I was off reading, came back, and the system had crashed. I turned the C-state thingy back on (it is on by default). Will try it again later.

Tom
A proud member of the OFA (Old Farts Association).
ID: 1971019
Keith Myers
Volunteer tester
Message 1971026 - Posted: 19 Dec 2018, 7:35:46 UTC - in response to Message 1971019.  

You know with all the things you have tried on your host, I come away with the feeling that the motherboard just isn't up to the task. MSI of some flavor, correct? I think you would likely have much better luck with another brand of board a little higher up the price point.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1971026
Tom M
Volunteer tester
Message 1971054 - Posted: 19 Dec 2018, 16:29:06 UTC - in response to Message 1971026.  

You know with all the things you have tried on your host, I come away with the feeling that the motherboard just isn't up to the task. MSI of some flavor, correct? I think you would likely have much better luck with another brand of board a little higher up the price point.


I think I agree, but the jump is a $200 higher cost for the apparent best choice.

And I have good evidence that I may be able to run at least 3.7GHz (but not 4.0) if I can get the CPU temperature to stay down out of the "outer limits", so it seems like a new case and better cooling are the first step.

Then, if my seasonal job lasts long enough, I will probably go for that top-end MB.

Tom
A proud member of the OFA (Old Farts Association).
ID: 1971054
 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.