[PATCH v5 0/6] Memory Hierarchy: Enable target node lookups for reserved memory

From: Dan Williams
Date: Sun Feb 16 2020 - 15:17:07 EST

Changes since v4 [1]:
- Rename __initdata_numa to __initdata_or_meminfo (Thomas)
- Capitalize NUMA throughout (Ingo)
- Replace explicit memcpy with implicit structure copy to address an
  80-column violation, and fix up a function definition line-wrap (Ingo)
- Rename numa_move_memblk() to numa_move_tail_memblk(), and remove the
  stale kernel-doc that implied @dst was optional (Thomas)
- Comment that phys_to_target_node() is an optional arch implementation
  detail that consumers must gate with "depends on $ARCH"
- Apply Ingo's conditional reviewed-by

[1]: http://lore.kernel.org/r/157966227494.2508551.7206194169374588977.stgit@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx


Merge notes: I believe this addresses all outstanding comments. Barring
additional feedback, I will push to libnvdimm-for-next.



Arrange for platform NUMA info to be preserved for determining
'target_node' data, where a 'target_node' is the node that a reserved
memory range will belong to when it is onlined.

This new infrastructure is expected to be more valuable over time for
Memory Tiers / Hierarchy management as more platforms (via the ACPI HMAT
and EFI Specific Purpose Memory) publish reserved or "soft-reserved"
ranges to Linux. Linux system administrators will expect to be able to
interact with those ranges with a unique NUMA node number when/if that
memory is onlined via the dax_kmem driver [2].

One configuration that currently fails to properly convey the target
node for the resulting memory hotplug operation is persistent memory
defined by the memmap=nn!ss parameter. For example, today if node1 is a
memory-only node, and all the memory from node1 is specified to
memmap=nn!ss and subsequently onlined, it will end up being onlined as
node0 memory. As it stands, memory_add_physaddr_to_nid() can only
identify online nodes, and since node1 in this example has no online
CPUs or memory, the target node is initialized to node0.
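
As a rough illustration of the failure mode described above, here is a toy
userspace model. It is not the kernel's memory_add_physaddr_to_nid()
implementation: the struct, the function name, and the fallback to node 0
are assumptions made for this sketch. The point is only that once node1's
range is absent from the table of online ranges, a lookup of an address in
that range can no longer report node1.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one online memory range and its node. */
struct toy_online_range {
	uint64_t start;
	uint64_t end;	/* exclusive */
	int nid;
};

/*
 * Toy lookup that only knows about online ranges. When no online range
 * covers the address it falls back to node 0 (assumed default).
 */
static int toy_addr_to_online_nid(const struct toy_online_range *tbl,
				  size_t nr, uint64_t addr)
{
	for (size_t i = 0; i < nr; i++) {
		assert(tbl[i].start < tbl[i].end);
		if (addr >= tbl[i].start && addr < tbl[i].end)
			return tbl[i].nid;
	}
	return 0;
}
```

With a table that holds only node0's range, an address that the platform
described as node1 memory still resolves to node 0 in this model.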

The fix is to preserve rather than discard the numa_meminfo entries that
are relevant for reserved memory ranges, and to uplevel the node
distance helper for determining the "local" (closest) node relative to
an initiator node.
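
The two halves of the fix can be sketched in userspace C as follows. This
is a simplified model under stated assumptions, not the code in this
series: every toy_* name, the fixed four-node topology, and the distance
values used with it are hypothetical. The first helper looks up a physical
address in a table that still contains reserved ranges; the second passes
through "no node" and already-online nodes, and otherwise picks the online
node with the smallest distance.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_NO_NODE	(-1)
#define TOY_MAX_NODES	4

/* Hypothetical stand-in for a preserved meminfo entry. */
struct toy_memblk {
	uint64_t start;
	uint64_t end;	/* exclusive */
	int nid;
};

/* Hypothetical topology: which nodes are online, and node distances. */
struct toy_topology {
	int online[TOY_MAX_NODES];
	int dist[TOY_MAX_NODES][TOY_MAX_NODES];
};

/* Range-to-target_node lookup over preserved entries, online or not. */
static int toy_phys_to_target_node(const struct toy_memblk *blk, size_t nr,
				   uint64_t addr)
{
	for (size_t i = 0; i < nr; i++) {
		assert(blk[i].start < blk[i].end);
		if (addr >= blk[i].start && addr < blk[i].end)
			return blk[i].nid;
	}
	return TOY_NO_NODE;
}

/* Map a node to the closest online node by distance. */
static int toy_map_to_online_node(const struct toy_topology *t, int node)
{
	int best = TOY_NO_NODE;
	int best_dist = INT_MAX;

	if (node == TOY_NO_NODE)
		return node;
	assert(node >= 0 && node < TOY_MAX_NODES);
	if (t->online[node])
		return node;

	for (int n = 0; n < TOY_MAX_NODES; n++) {
		if (t->online[n] && t->dist[node][n] < best_dist) {
			best_dist = t->dist[node][n];
			best = n;
		}
	}
	return best;
}
```

In this model a caller would first resolve the range to its target node,
and separately ask for the closest online node when it needs a node that
is usable right now.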

[2]: https://pmem.io/ndctl/daxctl-reconfigure-device.html


Dan Williams (6):
  ACPI: NUMA: Up-level "map to online node" functionality
  mm/numa: Skip NUMA_NO_NODE and online nodes in numa_map_to_online_node()
  powerpc/papr_scm: Switch to numa_map_to_online_node()
  x86/NUMA: Provide a range-to-target_node lookup facility
  libnvdimm/e820: Retrieve and populate correct 'target_node' info

 arch/powerpc/platforms/pseries/papr_scm.c | 21 ---------
 arch/x86/Kconfig                          |  1
 arch/x86/mm/numa.c                        | 67 +++++++++++++++++++++++------
 drivers/acpi/numa/srat.c                  | 41 ------------------
 drivers/nvdimm/e820.c                     | 18 ++------
 include/linux/acpi.h                      | 23 ++++++++++
 include/linux/numa.h                      | 30 +++++++++++++
 mm/Kconfig                                |  5 ++
 mm/mempolicy.c                            | 26 +++++++++++
 9 files changed, 140 insertions(+), 92 deletions(-)

base-commit: bb6d3fb354c5ee8d6bde2d576eb7220ea09862b9