Re: [PATCH v2] cgroup/cpuset: rebind mm mempolicy to effective_mems, not mems_allowed

From: David Hildenbrand (Arm)

Date: Mon Jun 22 2026 - 03:18:01 EST


On 6/21/26 05:24, Waiman Long wrote:
> On 6/18/26 4:41 AM, David Hildenbrand (Arm) wrote:
>> On 6/16/26 17:23, Waiman Long wrote:
>>> Yes, taking newmems is a reasonable choice and there are pros and cons with each
>>> option. My focus is more on not changing how v1 cpuset behaves as it is well
>>> defined in the v1 cpusets.rst file:
>>>
>>>      Requests by a task, using the sched_setaffinity(2) system call to
>>>      include CPUs in its CPU affinity mask, and using the mbind(2) and
>>>      set_mempolicy(2) system calls to include Memory Nodes in its memory
>>>      policy, are both filtered through that task's cpuset, filtering out any
>>>      CPUs or Memory Nodes not in that cpuset.  The scheduler will not
>>>      schedule a task on a CPU that is not allowed in its cpus_allowed
>>>      vector, and the kernel page allocator will not allocate a page on a
>>>      node that is not allowed in the requesting task's mems_allowed vector.
>>>
>>> v2, OTOH, is vaguer about what setting cpuset.mems means. We generally
>>> follow what v1 is doing, but we have more leeway in what we can do.
>>>
>>> Using newmems will make the above text not totally correct. At the very least,
>>> offline memory nodes will be filtered out, so the task will not use them once
>>> they come back online. That is why I am saying that we will have to correct the
>>> documentation if we want to make this change.
>> So IIUC:
>>
>> diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
>> index 1335e437098e..cdfc615f35a5 100644
>> --- a/kernel/cgroup/cpuset.c
>> +++ b/kernel/cgroup/cpuset.c
>> @@ -2645,7 +2645,13 @@ void cpuset_update_tasks_nodemask(struct cpuset *cs)
>>                    migrate = is_memory_migrate(cs);
>>   -               mpol_rebind_mm(mm, &cs->mems_allowed);
>> +               /*
>> +                * For v1 we can have empty effective_mems, but we cannot
>> +                * attach any tasks (see cpuset_can_attach_check()). For v2,
>> +                * it's guaranteed to not be empty.
>> +                */
>> +               VM_WARN_ON_ONCE(nodes_empty(cs->effective_mems));
>> +               mpol_rebind_mm(mm, &cs->effective_mems);
>>                  if (migrate)
>>                          cpuset_migrate_mm(mm, &cs->old_mems_allowed, &newmems);
>>                  else
>
> That is true, but I don't think we need a VM_WARN_ON_ONCE() here.

I'd prefer that we catch this kind of problem early, rather than running into a
divide-by-zero much later. Maybe we should check in mpol_rebind_mm() instead.
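
To illustrate the concern, here is a minimal userspace sketch, not kernel code.
It assumes the late divide-by-zero would come from a modulo over the number of
nodes in the policy's nodemask, as in interleave-style node selection. All
names (nodemask_model_t, model_nodes_weight(), model_interleave_node(),
model_rebind()) are hypothetical and exist only for this example; the real
mpol_rebind_mm() has a different signature and no return value.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace model: one bit per memory node, up to 64 nodes. */
typedef uint64_t nodemask_model_t;

/* Count the nodes set in the mask. */
static unsigned int model_nodes_weight(nodemask_model_t mask)
{
	unsigned int n = 0;

	for (; mask; mask &= mask - 1)
		n++;
	return n;
}

/*
 * Interleave-style selection: pick the (counter % weight)-th set node.
 * With an empty mask the modulo would divide by zero, so this model
 * returns -1 instead. This is the "late" failure point.
 */
static int model_interleave_node(nodemask_model_t mask, unsigned int counter)
{
	unsigned int weight = model_nodes_weight(mask);
	unsigned int target, node;

	if (weight == 0)
		return -1;

	target = counter % weight;
	for (node = 0; node < 64; node++) {
		if (!(mask & (UINT64_C(1) << node)))
			continue;
		if (target-- == 0)
			return (int)node;
	}
	return -1;	/* not reached for a non-empty mask */
}

/*
 * Rebind-time check: refuse an empty mask and keep the old one, so the
 * problem is reported where it is introduced. Returns true if applied.
 */
static bool model_rebind(nodemask_model_t *policy_nodes,
			 nodemask_model_t newmask)
{
	if (newmask == 0)
		return false;
	*policy_nodes = newmask;
	return true;
}
```

In this model, model_rebind() rejects an empty mask immediately, whereas
without that check the problem would only surface later, on the allocation
path in model_interleave_node(), far from the code that caused it.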

--
Cheers,

David