Re: [PATCH v2] cgroup/cpuset: rebind mm mempolicy to effective_mems, not mems_allowed
From: David Hildenbrand (Arm)
Date: Mon Jun 15 2026 - 04:13:27 EST
On 6/14/26 15:25, Farhad Alemi wrote:
Hi, thanks for your patch!
For the future, please don't submit new revisions as a reply to previous submissions.
> Creating a child cpuset where cpuset.mems is never set leads to a div/0
> when a VMA mempolicy with MPOL_F_RELATIVE_NODES rebinds in response to a
> CPU hotplug event.
>
> Reproduction steps:
> 1) Create a cgroup w/ cpuset controls (do not set cpuset.mems)
> 2) Move the task into the child cpuset
> 3) Create a VMA mempolicy for that task with MPOL_F_RELATIVE_NODES
> 4) unplug and hotplug a cpu
> echo 0 > /sys/devices/system/cpu/cpu1/online
> echo 1 > /sys/devices/system/cpu/cpu1/online
> 5) mempolicy rebind does a div/0 in mpol_relative_nodemask on the
> call to __nodes_fold()
>
> The cpuset code passes (cs->mems_allowed) which is not guaranteed to have
> nodes to the rebind routine. Use cs->effective_mems instead, which is
> guaranteed to have a non-empty nodemask.
Probably worth mentioning here that this makes the linked reproducer happy.
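For readers following along, the div/0 can be modelled in userspace roughly as below. This is only a sketch under assumptions: it assumes the kernel's fold maps each set bit to "bit % sz", with sz being the weight of the relative mask, and it uses a plain 64-bit word instead of nodemask_t. The toy_* names are made up for illustration and are not kernel APIs.

```c
#include <stdint.h>

/* Hypothetical helper: number of set bits in a 64-bit toy nodemask. */
static int toy_weight(uint64_t mask)
{
	int n = 0;

	while (mask) {
		mask &= mask - 1;	/* clear the lowest set bit */
		n++;
	}
	return n;
}

/*
 * Hypothetical model of a fold: every set bit of orig lands on bit % sz.
 * With sz == 0 (an empty relative mask) the modulo would divide by zero,
 * so this toy version refuses and leaves *dst untouched.
 * Returns 0 on success, -1 if sz is 0.
 */
static int toy_fold(uint64_t *dst, uint64_t orig, unsigned int sz)
{
	uint64_t out = 0;

	if (sz == 0)
		return -1;

	for (unsigned int bit = 0; bit < 64; bit++)
		if (orig & (UINT64_C(1) << bit))
			out |= UINT64_C(1) << (bit % sz);

	*dst = out;
	return 0;
}
```

In this model, calling toy_fold() with toy_weight(0) as sz is the case an empty cs->mems_allowed would correspond to. The guard here exists only to make the failure visible in the toy; the patch avoids the situation by passing a non-empty mask instead.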
>
> Link: https://lore.kernel.org/linux-mm/CA+0ovCgxbZkXa+OU8w3s84R3KNPNxxRfmsNR-udh+afQBbGNmw@xxxxxxxxxxxxxx/
This should be a

Closes: https://lore.kernel.org/linux-mm/CA+0ovCgxbZkXa+OU8w3s84R3KNPNxxRfmsNR-udh+afQBbGNmw@xxxxxxxxxxxxxx/
> Link: https://lore.kernel.org/all/CA+0ovCiEz6SP_sn3kN4Tb+_oC=eHMXy_Ffj=usV3wREdQrUtww@xxxxxxxxxxxxxx/
> Fixes: ae1c802382f7 ("cpuset: apply cs->effective_{cpus,mems}")
> Suggested-by: Gregory Price <gourry@xxxxxxxxxx>
> Suggested-by: Waiman Long <longman@xxxxxxxxxx>
> Signed-off-by: Farhad Alemi <farhad.alemi@xxxxxxxxxxxx>
> Cc: stable@xxxxxxxxxxxxxxx
> ---
> v2: rebind to cs->effective_mems instead of newmems (Waiman Long);
> condense the changelog.
>
> kernel/cgroup/cpuset.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
> --- a/kernel/cgroup/cpuset.c
> +++ b/kernel/cgroup/cpuset.c
> @@ -2649,7 +2649,7 @@ void cpuset_update_tasks_nodemask(struct cpuset *cs)
>
> migrate = is_memory_migrate(cs);
>
> - mpol_rebind_mm(mm, &cs->mems_allowed);
> + mpol_rebind_mm(mm, &cs->effective_mems);
God, this is confusing.

So, we obtain newmems from guarantee_online_mems(), which guarantees that
newmems is non-empty.

In cpuset_change_task_nodemask(), we set tsk->mems_allowed to newmems, and call
mpol_rebind_task(tsk, newmems).

So at least tsk->mems_allowed should be non-empty.

Then we call mpol_rebind_mm(mm, &cs->mems_allowed);

Naturally I wonder: Why are we not using "task->mems_allowed" (maybe cs vs. tsk
was the original bug?), which is effectively just newmems?

guarantee_online_mems() computes newmems as "cs->effective_mems &
node_states[N_MEMORY]", but walks up to the parent if it would be empty.
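
The walk-up described above can be modelled in userspace like this. It is a sketch, not the kernel function: struct toy_cpuset and toy_guarantee_online_mems() are hypothetical names, nodemasks are plain 64-bit words, and it assumes the top of the hierarchy always intersects the online-memory mask.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct cpuset, reduced to what the walk needs. */
struct toy_cpuset {
	uint64_t effective_mems;	/* toy nodemask, one bit per node */
	struct toy_cpuset *parent;	/* NULL for the top cpuset */
};

/*
 * Return effective_mems & online for the nearest cpuset (starting at cs
 * and walking up) where that intersection is non-empty.
 */
static uint64_t toy_guarantee_online_mems(const struct toy_cpuset *cs,
					  uint64_t online)
{
	while ((cs->effective_mems & online) == 0) {
		/* Assumption: the top cpuset always has online memory. */
		assert(cs->parent != NULL);
		cs = cs->parent;
	}
	return cs->effective_mems & online;
}
```

With a child whose effective_mems has no online node, the result is the parent's intersection, so the returned mask is never empty as long as the assumption about the top cpuset holds.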
--
Cheers,
David