Re: [PATCH v2] cgroup/cpuset: rebind mm mempolicy to effective_mems, not mems_allowed

From: Waiman Long

Date: Sat Jun 20 2026 - 23:24:45 EST


On 6/18/26 4:41 AM, David Hildenbrand (Arm) wrote:
On 6/16/26 17:23, Waiman Long wrote:
On 6/16/26 2:59 AM, David Hildenbrand (Arm) wrote:
On 6/16/26 05:43, Waiman Long wrote:
BTW, I still prefer the v2 patch. If it is decided that we should use the
guarantee_online_mems() value instead, it will have to be a separate patch with
changes to the relevant documentation, such as
Documentation/admin-guide/cgroup-v1/cpusets.rst.
newmems is "obviously" correct, so I really don't see why we should add
something that needs half a page of text to explain why it is fine -- if newmems
just does the trick?

Please enlighten me.
Yes, taking newmems is a reasonable choice, and there are pros and cons to each
option. My focus is more on not changing how v1 cpuset behaves, as it is well
defined in the v1 cpusets.rst file:

    Requests by a task, using the sched_setaffinity(2) system call to
    include CPUs in its CPU affinity mask, and using the mbind(2) and
    set_mempolicy(2) system calls to include Memory Nodes in its memory
    policy, are both filtered through that task's cpuset, filtering out any
    CPUs or Memory Nodes not in that cpuset.  The scheduler will not
    schedule a task on a CPU that is not allowed in its cpus_allowed
    vector, and the kernel page allocator will not allocate a page on a
    node that is not allowed in the requesting task's mems_allowed vector.

v2, OTOH, is more vague as to what setting cpuset.mems will mean. We generally
follow what v1 is doing, but we have more leeway in what we can do.

Using newmems will make the above text not entirely correct. At the very least,
offline memory nodes will be filtered out, so the task will not use such a node
when it later comes online. That is why I am saying that we will have to
correct the documentation if we want to make this change.
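
To make the concern concrete, here is a minimal user-space sketch. It is not
kernel code: the nodemask_model_t type and the helpers filter_policy() and
online_only() are hypothetical names, each node is modeled as one bit, and
online_only() only loosely stands in for the online-node filtering that
produces newmems.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: bit N set means memory node N is in the mask. */
typedef uint64_t nodemask_model_t;

/*
 * Models the v1 documentation text: a requested memory policy is
 * filtered through the cpuset's mask, dropping nodes outside it.
 */
static nodemask_model_t filter_policy(nodemask_model_t requested,
				      nodemask_model_t cpuset_mask)
{
	return requested & cpuset_mask;
}

/*
 * Loose stand-in for filtering a cpuset mask by the nodes that are
 * currently online. The caller must pass a non-empty result domain,
 * which this sketch checks with a plain assert().
 */
static nodemask_model_t online_only(nodemask_model_t cpuset_mask,
				    nodemask_model_t online)
{
	nodemask_model_t result = cpuset_mask & online;

	assert(result != 0);
	return result;
}
```

With a cpuset mask of nodes 0-3 (0xF), node 3 offline (online mask 0x7), and a
task policy requesting nodes 2-3 (0xC), filter_policy(0xC, 0xF) keeps both
nodes, while filter_policy(0xC, online_only(0xF, 0x7)) keeps only node 2. In
the second case node 3 is already gone from the policy by the time it comes
online.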
So IIUC:

diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index 1335e437098e..cdfc615f35a5 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -2645,7 +2645,13 @@ void cpuset_update_tasks_nodemask(struct cpuset *cs)
 		migrate = is_memory_migrate(cs);
-		mpol_rebind_mm(mm, &cs->mems_allowed);
+		/*
+		 * For v1 we can have empty effective_mems, but we cannot
+		 * attach any tasks (see cpuset_can_attach_check()). For v2,
+		 * it's guaranteed to not be empty.
+		 */
+		VM_WARN_ON_ONCE(nodes_empty(cs->effective_mems));
+		mpol_rebind_mm(mm, &cs->effective_mems);
 		if (migrate)
 			cpuset_migrate_mm(mm, &cs->old_mems_allowed, &newmems);
 		else

That is true, but I don't think we need a VM_WARN_ON_ONCE() here.

Cheers,
Longman