On Fri, Mar 22, 2019 at 9:45 PM Yang Shi <yang.shi@xxxxxxxxxxxxxxxxx> wrote:
> When running applications on the machine with NVDIMM as NUMA node, the
> memory allocation may end up on NVDIMM node. This may result in silent
> performance degradation and regression due to the difference of hardware
> property.
>
> DRAM first should be obeyed to prevent from surprising regression. Any
> non-DRAM nodes should be excluded from default allocation. Use nodemask
> to control the memory placement. Introduce def_alloc_nodemask which has
> DRAM nodes set only. Any non-DRAM allocation should be specified by
> NUMA policy explicitly.
>
> In the future we may be able to extract the memory characteristics from
> HMAT or other source to build up the default allocation nodemask.
> However, just distinguish DRAM and PMEM (non-DRAM) nodes by SRAT flag
> for the time being.
>
> Signed-off-by: Yang Shi <yang.shi@xxxxxxxxxxxxxxxxx>
> ---
>  arch/x86/mm/numa.c     |  1 +
>  drivers/acpi/numa.c    |  8 ++++++++
>  include/linux/mmzone.h |  3 +++
>  mm/page_alloc.c        | 18 ++++++++++++++++--
>  4 files changed, 28 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/mm/numa.c b/arch/x86/mm/numa.c
> index dfb6c4d..d9e0ca4 100644
> --- a/arch/x86/mm/numa.c
> +++ b/arch/x86/mm/numa.c
> @@ -626,6 +626,7 @@ static int __init numa_init(int (*init_func)(void))
>  	nodes_clear(numa_nodes_parsed);
>  	nodes_clear(node_possible_map);
>  	nodes_clear(node_online_map);
> +	nodes_clear(def_alloc_nodemask);
>  	memset(&numa_meminfo, 0, sizeof(numa_meminfo));
>  	WARN_ON(memblock_set_node(0, ULLONG_MAX, &memblock.memory,
>  				  MAX_NUMNODES));
> diff --git a/drivers/acpi/numa.c b/drivers/acpi/numa.c
> index 867f6e3..79dfedf 100644
> --- a/drivers/acpi/numa.c
> +++ b/drivers/acpi/numa.c
> @@ -296,6 +296,14 @@ void __init acpi_numa_slit_init(struct acpi_table_slit *slit)
>  		goto out_err_bad_srat;
>  	}
> +	/*
> +	 * Non volatile memory is excluded from zonelist by default.
> +	 * Only regular DRAM nodes are set in default allocation node
> +	 * mask.
> +	 */
> +	if (!(ma->flags & ACPI_SRAT_MEM_NON_VOLATILE))
> +		node_set(node, def_alloc_nodemask);

Hmm, no, I don't think we should do this. Especially considering
current generation NVDIMMs are energy backed DRAM there is no
performance difference that should be assumed by the non-volatile
flag.

Why isn't default SLIT distance sufficient for ensuring a DRAM-first
default policy?
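
To make the comparison concrete, here is a rough userspace sketch of the
two approaches. It is not code from the patch or from the kernel:
MAX_NODES, MEM_NON_VOLATILE, build_def_alloc_mask() and
order_by_distance() are hypothetical stand-ins, and the distance tables
used with it are made-up values, not measurements. The first helper
filters nodes by a non-volatile flag, as the quoted hunk does. The
second orders fallback candidates purely by SLIT-style distance.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_NODES        8
#define MEM_NON_VOLATILE (1u << 2)  /* stand-in for ACPI_SRAT_MEM_NON_VOLATILE */

/*
 * Flag-based approach: return a bitmask with one bit per node, set only
 * for nodes whose (hypothetical) SRAT flags lack the non-volatile bit.
 */
uint32_t build_def_alloc_mask(const uint32_t *flags, int n)
{
	uint32_t mask = 0;

	assert(n > 0 && n <= MAX_NODES);
	for (int node = 0; node < n; node++)
		if (!(flags[node] & MEM_NON_VOLATILE))
			mask |= 1u << node;
	return mask;
}

/*
 * Distance-based approach: fill order[] with node ids sorted by their
 * distance from 'local', nearest first. Ties keep the lower node id
 * first (stable insertion sort).
 */
void order_by_distance(int local, const uint8_t dist[][MAX_NODES],
		       int n, int *order)
{
	assert(n > 0 && n <= MAX_NODES && local >= 0 && local < n);
	for (int node = 0; node < n; node++) {
		int j = node;

		/* Shift farther nodes one slot right to make room. */
		while (j > 0 && dist[local][order[j - 1]] > dist[local][node]) {
			order[j] = order[j - 1];
			j--;
		}
		order[j] = node;
	}
}
```

With nodes 0 and 1 as DRAM and nodes 2 and 3 flagged non-volatile, the
flag-based mask contains only nodes 0 and 1. The distance-based order
puts DRAM first only when the table reports every PMEM node as farther
than every DRAM node. If a local PMEM node is reported closer than
remote DRAM, it sorts ahead of that DRAM node.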