Re: [RFC PATCH v4 4/7] mm/demotion/dax/kmem: Set node's memory tier to MEMORY_TIER_PMEM

From: Aneesh Kumar K V
Date: Mon Jun 06 2022 - 09:01:33 EST


On 6/6/22 5:39 PM, Bharata B Rao wrote:
On 6/6/2022 5:24 PM, Aneesh Kumar K.V wrote:
Aneesh Kumar K V <aneesh.kumar@xxxxxxxxxxxxx> writes:

Can you try this change?

diff --git a/drivers/dax/kmem.c b/drivers/dax/kmem.c
index 7a11c387fbbc..905609260dda 100644
--- a/drivers/dax/kmem.c
+++ b/drivers/dax/kmem.c
@@ -94,6 +94,17 @@ static int dev_dax_kmem_probe(struct dev_dax *dev_dax)
goto err_reg_mgid;
data->mgid = rc;
+ /*
+ * This gets called before the node is brought online. That
+ * is because, depending on the value of mhp_default_online_type,
+ * the kernel may online the memory as part of the hotplug
+ * operation. Add the new memory tier before we try to bring
+ * memory blocks online. Otherwise the new node will get added to
+ * the default memory tier via hotplug callbacks.
+ */
+#ifdef CONFIG_TIERED_MEMORY
+ node_set_memory_tier(numa_node, MEMORY_TIER_PMEM);
+#endif
for (i = 0; i < dev_dax->nr_range; i++) {
struct resource *res;
struct range range;
@@ -148,9 +159,6 @@ static int dev_dax_kmem_probe(struct dev_dax *dev_dax)
dev_set_drvdata(dev, data);
-#ifdef CONFIG_TIERED_MEMORY
- node_set_memory_tier(numa_node, MEMORY_TIER_PMEM);
-#endif
return 0;
err_request_mem:

Yes, this fixes the issue for me. Thanks.
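To make the ordering problem concrete, here is a small userspace model in C. It is a sketch under stated assumptions, not kernel code. The names `model_set_tier()`, `model_online_node()`, `simulate_probe()` and the `MODEL_TIER_*` values are hypothetical. The model assumes, as the comments in these patches suggest, that the hotplug online callback puts a node with no tier into the default tier, and that a plain "set" leaves a node that already has a tier where it is.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model, not kernel code. Tier ids are arbitrary. */
#define MODEL_MAX_NODES    4
#define MODEL_TIER_NONE    0
#define MODEL_TIER_DEFAULT 1
#define MODEL_TIER_PMEM    2

static int node_tier[MODEL_MAX_NODES]; /* zero-initialized: no tier yet */

/* Assumed "set" semantics: a node that already has a tier is left alone. */
static void model_set_tier(int node, int tier)
{
	assert(node >= 0 && node < MODEL_MAX_NODES);
	if (node_tier[node] == MODEL_TIER_NONE)
		node_tier[node] = tier;
}

/* Online callback: a node without a tier joins the default tier. */
static void model_online_node(int node)
{
	model_set_tier(node, MODEL_TIER_DEFAULT);
}

/*
 * One probe of a dax kmem device.
 * auto_online:    memory is onlined during the hotplug step
 *                 (models mhp_default_online_type choosing to online).
 * set_tier_first: tier is set before the hotplug loop (the change above)
 *                 rather than at the end of probe (the original code).
 * Returns the tier the node ends up in once its memory is online.
 */
static int simulate_probe(int node, bool auto_online, bool set_tier_first)
{
	if (set_tier_first)
		model_set_tier(node, MODEL_TIER_PMEM);
	if (auto_online)
		model_online_node(node);
	if (!set_tier_first)
		model_set_tier(node, MODEL_TIER_PMEM);	/* no-op if already onlined */
	if (!auto_online)
		model_online_node(node);	/* onlined later, e.g. from userspace */
	return node_tier[node];
}
```

In this model, `simulate_probe(0, true, false)` leaves the node in `MODEL_TIER_DEFAULT`, while setting the tier first, or leaving onlining to userspace, keeps it in `MODEL_TIER_PMEM`.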


I might use the change below instead of the one above. In the end, I think it is better to add a NUMA node to a memory tier after the node is brought online rather than before, even though with the current code it shouldn't matter much.

modified drivers/dax/kmem.c
@@ -147,9 +147,15 @@ static int dev_dax_kmem_probe(struct dev_dax *dev_dax)
}

dev_set_drvdata(dev, data);
-
+ /*
+ * node_reset_memory_tier is used here to force an update of
+ * the NUMA node's memory tier. Depending on the value of
+ * mhp_default_online_type, the kernel may online the memory
+ * blocks as part of the hotplug operation above. That can result
+ * in the dax kmem NUMA node getting added to the default memory tier.
+ */
#ifdef CONFIG_TIERED_MEMORY
- node_set_memory_tier(numa_node, MEMORY_TIER_PMEM);
+ node_reset_memory_tier(numa_node, MEMORY_TIER_PMEM);
#endif
return 0;
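The alternative keeps the tier update at the end of probe but makes it unconditional. Below is a companion sketch of that ordering using a similar hypothetical userspace model (names and tier values are illustrative, not the kernel API). Here `model_reset_tier()` stands in for `node_reset_memory_tier()` and is assumed, based on the patch comment, to overwrite any tier the online callback has already assigned.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model, not kernel code. Tier ids are arbitrary. */
#define MODEL_MAX_NODES    4
#define MODEL_TIER_NONE    0
#define MODEL_TIER_DEFAULT 1
#define MODEL_TIER_PMEM    2

static int node_tier[MODEL_MAX_NODES]; /* zero-initialized: no tier yet */

/* Online callback: a node without a tier joins the default tier. */
static void model_online_node(int node)
{
	assert(node >= 0 && node < MODEL_MAX_NODES);
	if (node_tier[node] == MODEL_TIER_NONE)
		node_tier[node] = MODEL_TIER_DEFAULT;
}

/* Assumed reset semantics: overwrite whatever tier the node has. */
static void model_reset_tier(int node, int tier)
{
	assert(node >= 0 && node < MODEL_MAX_NODES);
	node_tier[node] = tier;
}

/* Probe with the forced tier update at the end, after the hotplug loop. */
static int simulate_probe_reset(int node, bool auto_online)
{
	if (auto_online)
		model_online_node(node);	/* may land in the default tier */
	model_reset_tier(node, MODEL_TIER_PMEM);	/* forced to PMEM */
	if (!auto_online)
		model_online_node(node);	/* onlined later; tier already set */
	return node_tier[node];
}
```

With or without auto-onlining, the node ends up in `MODEL_TIER_PMEM`. The cost of this ordering, in the model at least, is that an auto-onlined node sits briefly in the default tier before being moved.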