Re: [PATCH v2] cgroup/cpuset: rebind mm mempolicy to effective_mems, not mems_allowed

From: David Hildenbrand (Arm)

Date: Mon Jun 15 2026 - 04:13:27 EST


On 6/14/26 15:25, Farhad Alemi wrote:

Hi, thanks for your patch!

For the future, please don't submit new revisions as replies to previous submissions.

> Creating a child cpuset where cpuset.mems is never set leads to a div/0
> when a VMA mempolicy with MPOL_F_RELATIVE_NODES rebinds in response to a
> CPU hotplug event.
>
> Reproduction steps:
> 1) Create a cgroup w/ cpuset controls (do not set cpuset.mems)
> 2) Move the task into the child cpuset
> 3) Create a VMA mempolicy for that task with MPOL_F_RELATIVE_NODES
> 4) unplug and hotplug a cpu
> echo 0 > /sys/devices/system/cpu/cpu1/online
> echo 1 > /sys/devices/system/cpu/cpu1/online
> 5) mempolicy rebind does a div/0 in mpol_relative_nodemask on the
> call to __nodes_fold()
>
> The cpuset code passes cs->mems_allowed, which is not guaranteed to
> contain any nodes, to the rebind routine. Use cs->effective_mems
> instead, which is guaranteed to have a non-empty nodemask.

Probably worth mentioning here that this makes the linked reproducer happy.

>
> Link: https://lore.kernel.org/linux-mm/CA+0ovCgxbZkXa+OU8w3s84R3KNPNxxRfmsNR-udh+afQBbGNmw@xxxxxxxxxxxxxx/

This should be a

Closes: https://lore.kernel.org/linux-mm/CA+0ovCgxbZkXa+OU8w3s84R3KNPNxxRfmsNR-udh+afQBbGNmw@xxxxxxxxxxxxxx/

> Link: https://lore.kernel.org/all/CA+0ovCiEz6SP_sn3kN4Tb+_oC=eHMXy_Ffj=usV3wREdQrUtww@xxxxxxxxxxxxxx/
> Fixes: ae1c802382f7 ("cpuset: apply cs->effective_{cpus,mems}")
> Suggested-by: Gregory Price <gourry@xxxxxxxxxx>
> Suggested-by: Waiman Long <longman@xxxxxxxxxx>
> Signed-off-by: Farhad Alemi <farhad.alemi@xxxxxxxxxxxx>
> Cc: stable@xxxxxxxxxxxxxxx
> ---
> v2: rebind to cs->effective_mems instead of newmems (Waiman Long);
> condense the changelog.
>
> kernel/cgroup/cpuset.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
> --- a/kernel/cgroup/cpuset.c
> +++ b/kernel/cgroup/cpuset.c
> @@ -2649,7 +2649,7 @@ void cpuset_update_tasks_nodemask(struct cpuset *cs)
>
> migrate = is_memory_migrate(cs);
>
> - mpol_rebind_mm(mm, &cs->mems_allowed);
> + mpol_rebind_mm(mm, &cs->effective_mems);

God this is confusing.

So, we obtain newmems from guarantee_online_mems(), which guarantees that
newmems is non-empty.

In cpuset_change_task_nodemask(), we set tsk->mems_allowed to newmems, and call
mpol_rebind_task(tsk, newmems).

So at least tsk->mems_allowed should be non-empty.

Then we call mpol_rebind_mm(mm, &cs->mems_allowed);


Naturally, I wonder: why are we not using "task->mems_allowed", which is
effectively just newmems? (Maybe cs vs. tsk was the original bug?)

guarantee_online_mems() computes newmems as "cs->effective_mems &
node_states[N_MEMORY]", but keeps walking up to the parent while that
intersection would be empty.

--
Cheers,

David