Re: [PATCH v5 3/3] mm/mempolicy: Support memory hotplug in weighted interleave

From: Oscar Salvador
Date: Wed Apr 02 2025 - 05:51:37 EST


On Wed, Apr 02, 2025 at 10:49:04AM +0900, Rakie Kim wrote:
> The weighted interleave policy distributes page allocations across multiple
> NUMA nodes based on their performance weight, thereby improving memory
> bandwidth utilization. The weight values for each node are configured
> through sysfs.
>
> Previously, sysfs entries for configuring weighted interleave were created
> for all possible nodes (N_POSSIBLE) at initialization, including nodes that
> might not have memory. However, not all nodes in N_POSSIBLE are usable at
> runtime, as some may remain memoryless or offline.
> This led to sysfs entries being created for unusable nodes, causing
> potential misconfiguration issues.
>
> To address this issue, this patch modifies the sysfs creation logic to:
> 1) Limit sysfs entries to nodes that are online and have memory, avoiding
> the creation of sysfs entries for nodes that cannot be used.
> 2) Support memory hotplug by dynamically adding and removing sysfs entries
> based on whether a node transitions into or out of the N_MEMORY state.
>
> Additionally, the patch ensures that sysfs attributes are properly managed
> when nodes go offline, preventing stale or redundant entries from persisting
> in the system.
>
> By making these changes, the weighted interleave policy now manages its
> sysfs entries more efficiently, ensuring that only relevant nodes are
> considered for interleaving, and dynamically adapting to memory hotplug
> events.
>
> Signed-off-by: Rakie Kim <rakie.kim@xxxxxx>
> Signed-off-by: Honggyu Kim <honggyu.kim@xxxxxx>
> Signed-off-by: Yunjeong Mun <yunjeong.mun@xxxxxx>

Sorry to jump at v5, and maybe this was already discussed but:

> +static int wi_node_notifier(struct notifier_block *nb,
> + unsigned long action, void *data)
> +{
> + int err;
> + struct memory_notify *arg = data;
> + int nid = arg->status_change_nid;
> +
> + if (nid < 0)
> + goto notifier_end;
> +
> + switch(action) {
> + case MEM_ONLINE:
> + err = sysfs_wi_node_add(nid);
> + if (err) {
> + pr_err("failed to add sysfs [node%d]\n", nid);
> + return NOTIFY_BAD;
> + }
> + break;
> + case MEM_OFFLINE:
> + if (!node_state(nid, N_MEMORY))
> + sysfs_wi_node_release(nid);

This check is not needed.

If status_change_nid is different than NUMA_NO_NODE, it means that the memory
state of the numa node was effectively changed.
So doing:

case MEM_OFFLINE:
sysfs_wi_node_release(nid)

is enough.

--
Oscar Salvador
SUSE Labs