Re: [PATCH v2] cgroup/cpuset: rebind mm mempolicy to effective_mems, not mems_allowed
From: Waiman Long
Date: Tue Jun 16 2026 - 11:30:58 EST
On 6/16/26 2:59 AM, David Hildenbrand (Arm) wrote:
> On 6/16/26 05:43, Waiman Long wrote:
>> On 6/15/26 10:26 PM, Waiman Long wrote:
>>> On 6/15/26 5:38 AM, Gregory Price wrote:
>>>> All interactions between mempolicy and cpuset are horrible and
>>>> confusing. Much like Lorenzo's anon_vma work, I have to keep
>>>> notes on how this whole thing doesn't just spew SIGBUS constantly.
>>>> The short answer is: mempolicy is advisory and cpuset is strictly
>>>> followed - in a dispute cpuset wins... except for file backed memory,
>>>> then everyone loses and nothing is consistent.
>>> That is, I believe, why mpol_rebind_mm() behaves a bit differently from the
>>> others, and it has historically been done this way since long before cgroup v2.
>>> For cgroup v1, mems_allowed can't be empty or you can't put any task into the
>>> cpuset. Also effective_mems is the same as mems_allowed. cgroup v2 is quite
>>> different in how it handles memory nodes and CPUs. Users aren't forced to
>>> set mems_allowed and cpus_allowed, as effective_mems and effective_cpus will
>>> inherit the parent's values if mems_allowed and cpus_allowed are not set. IOW,
>>> effective_mems will never be empty. Yes, it is a bug from the introduction of
>>> cpuset v2: we should have replaced mems_allowed by effective_mems at that
>>> time. With v2, effective_mems should contain only online nodes. The only
>>> exception is the short transition period when a memory node hotunplug
>>> operation is in progress while a write to cpuset.mems is happening at the same
>>> time. With v1, it is theoretically possible that none of the nodes in
>>> mems_allowed is online.
>>> The reason why I am suggesting cs->effective_mems is to keep the old
>>> cgroup v1 behavior. If the consensus is to use the output of
>>> guarantee_online_mems() for mpol_rebind_mm(), I will not be against that, but
>>> it will be a slight change in user-visible behavior.
>> BTW, I still prefer the v2 patch. If it is decided we should use the
>> guarantee_online_mems() value instead, it will have to be a separate patch with
>> changes in the relevant documentation like Documentation/admin-guide/cgroup-v1/
>> cpuset.rst.
> newmems is "obviously" correct, so I really don't see why we should add
> something that needs half a page of text to explain why it is fine -- if newmems
> just does the trick?
> Please enlighten me.
Yes, taking newmems is a reasonable choice, and there are pros and cons with each option. My focus is more on not changing how v1 cpuset behaves, as it is well defined in the v1 cpusets.rst file:
Requests by a task, using the sched_setaffinity(2) system call to
include CPUs in its CPU affinity mask, and using the mbind(2) and
set_mempolicy(2) system calls to include Memory Nodes in its memory
policy, are both filtered through that task's cpuset, filtering out any
CPUs or Memory Nodes not in that cpuset. The scheduler will not
schedule a task on a CPU that is not allowed in its cpus_allowed
vector, and the kernel page allocator will not allocate a page on a
node that is not allowed in the requesting task's mems_allowed vector.
v2, OTOH, is vaguer about what setting cpuset.mems means. We generally follow what v1 does, but we have more leeway in what we can do.
Using newmems would make the above text not entirely correct. At the very least, offline memory nodes would be filtered out, so the task would not use such a node when it later comes online. That is why I am saying that we will have to correct the documentation if we want to make this change.
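To make the difference concrete, here is a toy user-space model, not kernel code. Everything in it is hypothetical: toy_nodemask, struct toy_cpuset, toy_online_mems() and toy_rebind() are names made up for illustration. toy_online_mems() only loosely mimics the idea behind guarantee_online_mems(), and the rebind is reduced to a plain intersection, which ignores the mempolicy flags and remapping that the real code has to handle.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy type: bit n set means memory node n is in the mask. */
typedef uint64_t toy_nodemask;

/* Hypothetical stand-in for a cpuset; only the fields this sketch needs. */
struct toy_cpuset {
	const struct toy_cpuset *parent;	/* NULL for the root */
	toy_nodemask effective_mems;
};

/*
 * Loose model of an "online-filtered" mask: walk up the hierarchy until
 * effective_mems intersects the online nodes, then return the intersection.
 * Assumes the root normally has at least one online node.
 */
static toy_nodemask toy_online_mems(const struct toy_cpuset *cs,
				    toy_nodemask online)
{
	while (cs->parent && !(cs->effective_mems & online))
		cs = cs->parent;
	return cs->effective_mems & online;
}

/*
 * Simplified rebind: keep only the policy nodes present in the new mask.
 * This is an assumption made for the sketch, not the kernel's algorithm.
 */
static toy_nodemask toy_rebind(toy_nodemask policy_nodes,
			       toy_nodemask new_mask)
{
	return policy_nodes & new_mask;
}
```

With effective_mems = {1,2}, node 2 offline and a task policy asking for {1,2}, toy_rebind() against effective_mems keeps {1,2}, while toy_rebind() against toy_online_mems() keeps only {1}. In this model, node 2 coming online later changes nothing for the second task until something rebinds it again, which is the behavior difference the documentation would need to describe.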
Cheers,
Longman