Re: [PATCH v2] cgroup/cpuset: rebind mm mempolicy to effective_mems, not mems_allowed
From: Waiman Long
Date: Mon Jun 15 2026 - 23:43:51 EST
On 6/15/26 10:26 PM, Waiman Long wrote:
On 6/15/26 5:38 AM, Gregory Price wrote:
On Mon, Jun 15, 2026 at 10:08:51AM +0200, David Hildenbrand (Arm) wrote:
On 6/14/26 15:25, Farhad Alemi wrote:
diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -2649,7 +2649,7 @@ void cpuset_update_tasks_nodemask(struct cpuset *cs)
migrate = is_memory_migrate(cs);
- mpol_rebind_mm(mm, &cs->mems_allowed);
+ mpol_rebind_mm(mm, &cs->effective_mems);
God this is confusing.
All interactions between mempolicy and cpuset are horrible and
confusing. Much like Lorenzo's anon_vma work, I have to keep
notes on how this whole thing doesn't just spew SIGBUS constantly.
The short answer is: mempolicy is advisory and cpuset is strictly
followed - in a dispute cpuset wins... except for file backed memory,
then everyone loses and nothing is consistent.
I believe that is why mpol_rebind_mm() is treated a bit differently from the others; it has been done this way since long before cgroup v2.
For cgroup v1, mems_allowed can't be empty or you can't put any task into the cpuset. Also, effective_mems is the same as mems_allowed. cgroup v2 is quite different in how it handles memory nodes and CPUs. Users aren't forced to set mems_allowed and cpus_allowed, as effective_mems and effective_cpus will inherit the parent's values if mems_allowed and cpus_allowed are not set. IOW, effective_mems will never be empty. Yes, it is a bug dating from the introduction of cpuset v2: we should have replaced mems_allowed with effective_mems at that time. With v2, effective_mems should contain only online nodes. The only exception is the short transition period when a memory node hot-unplug operation is in progress while a write to cpuset.mems is happening at the same time. With v1, it is theoretically possible that none of the nodes in mems_allowed is online.
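To make the v1/v2 difference concrete, here is a toy userspace model of the inheritance rule described above. It is only a sketch, not kernel code: struct toy_cpuset, toy_update_effective_mems() and the plain unsigned long node mask are all made up for illustration, and hotplug and locking are ignored entirely.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: one bit per memory node, not the kernel's nodemask_t. */
typedef unsigned long toy_nodemask;

struct toy_cpuset {
	struct toy_cpuset *parent;   /* NULL for the root cpuset */
	toy_nodemask mems_allowed;   /* what the user wrote to cpuset.mems */
	toy_nodemask effective_mems; /* what tasks in the cpuset actually get */
};

/*
 * Hypothetical helper, not a kernel function.
 * v1: effective_mems simply mirrors mems_allowed.
 * v2: effective_mems is mems_allowed restricted to the parent's
 *     effective_mems, and falls back to the parent's value when that
 *     intersection is empty (e.g. cpuset.mems was never set).
 */
static void toy_update_effective_mems(struct toy_cpuset *cs,
				      toy_nodemask online, int is_v2)
{
	toy_nodemask parent_eff, eff;

	assert(online != 0); /* assumption: at least one node is online */

	if (!is_v2) {
		cs->effective_mems = cs->mems_allowed;
		return;
	}

	/* The root has no parent; bound it by the online nodes instead. */
	parent_eff = cs->parent ? cs->parent->effective_mems : online;
	eff = cs->mems_allowed & parent_eff;
	if (!eff)
		eff = parent_eff; /* inherit, so v2 never ends up empty */
	cs->effective_mems = eff;
}
```

In this model a v2 child with an unset mems_allowed still ends up with the parent's nodes in effective_mems, which is the case where rebinding the mempolicy to mems_allowed would hand it an empty mask.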
The reason I am suggesting cs->effective_mems is to keep the old cgroup v1 behavior. If the consensus is to use the output of guarantee_online_mems() for mpol_rebind_mm(), I will not be against that, but it will be a slight change in user-visible behavior.
BTW, I still prefer the v2 patch. If it is decided we should use the guarantee_online_mems() value instead, it will have to be a separate patch with changes in the relevant documentation like Documentation/admin-guide/cgroup-v1/cpuset.rst.
Cheers,
Longman