Message boards :
Number crunching :
Optimisations for NUMA
DextersLab2013 Send message Joined: 25 May 16 Posts: 2 Credit: 177,129 RAC: 0
Hello! I have been doing some benchmarking on an old Sun server. I was about to strip it for parts, but thought I would try SETI@home on it first, and it made me wonder whether BOINC has any optimisations that can improve performance on machines with a NUMA architecture. It's an 8-CPU machine using dual-core Opterons (16 cores in total), with each CPU having 8 GB of RAM, all interlinked with HyperTransport. Is there any mechanism in BOINC to ensure the working data for each thread is kept local to the CPU it's running on?
tullio Send message Joined: 9 Apr 04 Posts: 8797 Credit: 2,930,782 RAC: 1
I too have a Sun workstation, with a 2-core Opteron 1210 at 1.8 GHz. It has been working 24/7 since January 2008 with no problems. I recently installed an AMD/ATI HD 7770 graphics board in it, and it is crunching both SETI@home and Einstein@home GPU tasks. The OS is SuSE Leap 42.1. SuSE recently upgraded its kernel and I had to recompile the kernel modules for VirtualBox. Tullio
Cosmic_Ocean Send message Joined: 23 Dec 00 Posts: 3027 Credit: 13,516,867 RAC: 13
NUMA support is just built into the OS kernel, and basically all it does is allow the RAM pool attached to the second CPU socket to be accessed by the first CPU, and vice versa. NUMA stands for Non-Uniform Memory Access. You can read more about it here. For Windows, it was introduced in XP SP2 (I know that for sure, because I had a game that would not install until the installer detected NUMA support; SP1 did not satisfy that, but SP2 did) and it has been present ever since. I'm sure Linux has had it for a long time. There's really nothing you can do to limit the memory for each core to stay on the respective bank of RAM for that core and physical CPU. The OS kernel's memory and resource management is where the biggest difference will be, and along those lines, I don't know of any tools for an OS that will restrict the memory for a specific process to a certain location, aside from hard-coding memory addresses into the program's source code. Linux laptop: record uptime: 1511d 20h 19m (ended due to the power brick giving up)
HAL9000 Send message Joined: 11 Sep 99 Posts: 6534 Credit: 196,805,888 RAC: 57
I had a server with dual Xeon E5645 CPUs. I could enable or disable NUMA in the BIOS, and doing so had no apparent effect on BOINC or the project applications I was running. SETI@home classic workunits: 93,865 CPU time: 863,447 hours Join the [url=http://tinyurl.com/8y46zvu]BP6/VP6 User Group[/url]
DextersLab2013 Send message Joined: 25 May 16 Posts: 2 Credit: 177,129 RAC: 0
On a system with 2 CPUs it probably makes little difference; presumably some systems can operate with two FSBs? On the system I have here (a Sun Fire X4600), in the worst case, if the data is stored on CPU board 0 and a process on CPU board 7 wants access to it, the request takes three hops over the HyperTransport links (through CPUs 6 and 1, or CPUs 5 and 2, to reach CPU 0), and of course three more hops for the reply to come back. Just thought it was an interesting thing to think about.
Raistmer Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121
"I don't know of any tools for an OS that will restrict memory for a specific process to stay in a certain location, aside from hard-coding memory addresses into the source code for a program."
The whole idea behind NUMA optimization is to keep data local to the CPU for as long as possible. That is, to always allocate on the same node (the OS runtime library does this if it is NUMA-aware) and to keep a process on its initial CPU as long as possible (again, a NUMA-aware OS does this). But one can deliberately help the OS here by setting CPU affinity, which restricts the OS from moving the process to another CPU even when it wants to.
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.