Optimisations for NUMA

DextersLab2013

Joined: 25 May 16
Posts: 2
Credit: 177,129
RAC: 0
United Kingdom
Message 1790487 - Posted: 26 May 2016, 8:02:03 UTC

Hello!

I have been doing some benchmarking on an old Sun server. I was about to strip it for parts but thought I would try SAH on it first, and it made me wonder whether there are any optimisations in BOINC that can improve performance on machines with a NUMA architecture.

It's an 8-CPU machine using dual-core Opterons (16 cores in total), with each CPU having 8 GB of RAM, all interlinked with HyperTransport.

Is there any mechanism in BOINC to ensure the working data for each thread is kept local to the CPU it's running on?
ID: 1790487
tullio
Volunteer tester

Joined: 9 Apr 04
Posts: 8797
Credit: 2,930,782
RAC: 1
Italy
Message 1790492 - Posted: 26 May 2016, 8:21:43 UTC - in response to Message 1790487.  

I too have a Sun workstation, with a 2-core Opteron 1210 at 1.8 GHz. It has been working 24/7 since January 2008 with no problems. I recently installed an AMD/ATI HD 7770 graphics card in it, and it is crunching both SETI@home and Einstein@home GPU tasks. The OS is SuSE Leap 42.1. SuSE recently upgraded its kernel and I had to recompile the kernel modules for VirtualBox.
Tullio
ID: 1790492
Cosmic_Ocean
Joined: 23 Dec 00
Posts: 3027
Credit: 13,516,867
RAC: 13
United States
Message 1790505 - Posted: 26 May 2016, 9:25:06 UTC
Last modified: 26 May 2016, 9:28:22 UTC

NUMA is just built into the OS kernel, and basically all it does is allow the RAM pool for the second CPU socket to be accessed by the first CPU and vice versa.

NUMA stands for Non-Uniform Memory Access. You can read more about it here. For Windows, it was introduced in XP SP2 (I know that for sure, because I had a game that would not install until the installer detected NUMA support, and SP1 did not satisfy that, but SP2 did) and has been present ever since. I'm sure Linux has had it for a long time.

There's really nothing you can do to make the memory for each core stay on the bank of RAM that belongs to that core's physical CPU. The OS kernel's memory and resource management is where the biggest difference will come from. Along those lines, I don't know of any tools for an OS that will restrict memory for a specific process to stay in a certain location, aside from hard-coding memory addresses into the source code for a program.
Linux laptop:
record uptime: 1511d 20h 19m (ended due to the power brick giving-up)
ID: 1790505
HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 1790564 - Posted: 26 May 2016, 14:36:25 UTC

I had a server with dual Xeon E5645 CPUs. I could enable or disable NUMA in the BIOS, and doing so had no apparent effect on BOINC or the project applications I was running.
SETI@home classic workunits: 93,865 CPU time: 863,447 hours
Join the [url=http://tinyurl.com/8y46zvu]BP6/VP6 User Group[/url]
ID: 1790564
DextersLab2013

Joined: 25 May 16
Posts: 2
Credit: 177,129
RAC: 0
United Kingdom
Message 1790598 - Posted: 26 May 2016, 16:21:50 UTC - in response to Message 1790564.  

On a system with 2 CPUs it probably makes little difference. Presumably some systems can operate with two FSBs?

On the system I have here (a Sun Fire X4600), in the worst case the data is stored on CPU board 0 and a process on CPU board 7 wants to access it. The request takes three hops over the HyperTransport links to get there, either through CPUs 6 and 1 or through CPUs 5 and 2, and of course three more for the reply to come back.

Just thought it was an interesting thing to think about.
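
To put numbers on that worst case, here is a small Python sketch that counts hops with a breadth-first search. It is only an illustration: LINKS contains just the HyperTransport links named in this post, not the full X4600 wiring, and the function name hops is made up for the example.

```python
from collections import deque

# Only the links mentioned in the post (7-6-1-0 and 7-5-2-0).
# The rest of the X4600 topology is not described here and is left out.
LINKS = [(7, 6), (6, 1), (1, 0), (7, 5), (5, 2), (2, 0)]


def hops(links, src, dst):
    """Fewest link traversals from src to dst, or None if unreachable."""
    neighbours = {}
    for a, b in links:
        # Links are bidirectional, so record both directions.
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    seen = {src}
    queue = deque([(src, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == dst:
            return dist
        for nxt in neighbours.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


one_way = hops(LINKS, 7, 0)
print(one_way, 2 * one_way)  # prints "3 6": three hops out, six for the round trip
```

Both listed routes come out at the same length, so on this partial picture it makes no difference which one the request takes.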
ID: 1790598
Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1790606 - Posted: 26 May 2016, 16:40:44 UTC - in response to Message 1790505.  

I don't know of any tools for an OS that will restrict memory for a specific process to stay in a certain location, aside from hard-coding memory addresses into the source code for a program.

The whole idea behind NUMA optimization is to keep data local to the CPU for as long as possible. That means always allocating on the same node (the OS runtime library does this if it is NUMA-aware) and keeping the process on its initial CPU for as long as possible (again, a NUMA-aware OS does this). But one can deliberately help the OS here by setting CPU affinity, which stops the OS from moving the process to another CPU even when it wants to.
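
As a rough illustration of the affinity approach, here is a minimal Python sketch. It assumes a Linux host that lists each NUMA node's CPUs under /sys/devices/system/node. The helper names (parse_cpulist, cpus_of_node, pin_to_node) are made up for this example and are not part of BOINC; os.sched_setaffinity is the standard-library call that does the actual pinning.

```python
import os
from pathlib import Path


def parse_cpulist(text):
    """Expand a Linux cpulist string such as "0-3,8" into [0, 1, 2, 3, 8]."""
    cpus = []
    for part in text.strip().split(","):
        if not part:
            continue  # empty string, e.g. a node with no CPUs
        if "-" in part:
            lo, hi = part.split("-")
            cpus.extend(range(int(lo), int(hi) + 1))
        else:
            cpus.append(int(part))
    return cpus


def cpus_of_node(node):
    """Return the CPU ids that Linux sysfs reports for one NUMA node."""
    # Assumption: the kernel exposes this file; it is absent on non-NUMA
    # or non-Linux systems, in which case read_text() raises an error.
    path = Path(f"/sys/devices/system/node/node{node}/cpulist")
    return parse_cpulist(path.read_text())


def pin_to_node(pid, node):
    """Restrict a process (pid 0 means the caller) to one node's CPUs."""
    cpus = cpus_of_node(node)
    os.sched_setaffinity(pid, cpus)  # Linux-only call
    return cpus


# Hypothetical usage, to be run before the real work starts:
#     pin_to_node(0, 0)
```

With Linux's default local allocation policy, memory that a pinned process touches afterwards normally comes from its own node, which is the locality described above. The sketch only controls CPU placement and does not bind memory explicitly.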
ID: 1790606



 