Re: [PATCH v2] cgroup/cpuset: rebind mm mempolicy to effective_mems, not mems_allowed
From: Waiman Long
Date: Mon Jun 15 2026 - 22:26:25 EST
On 6/15/26 5:38 AM, Gregory Price wrote:
> On Mon, Jun 15, 2026 at 10:08:51AM +0200, David Hildenbrand (Arm) wrote:
>> On 6/14/26 15:25, Farhad Alemi wrote:
>>> diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
>>> --- a/kernel/cgroup/cpuset.c
>>> +++ b/kernel/cgroup/cpuset.c
>>> @@ -2649,7 +2649,7 @@ void cpuset_update_tasks_nodemask(struct cpuset *cs)
>>>  migrate = is_memory_migrate(cs);
>>> - mpol_rebind_mm(mm, &cs->mems_allowed);
>>> + mpol_rebind_mm(mm, &cs->effective_mems);
>>
>> God this is confusing.
>
> All interactions between mempolicy and cpuset are horrible and
> confusing. Much like Lorenzo's anon_vma work, I have to keep
> notes on how this whole thing doesn't just spew SIGBUS constantly.
>
> The short answer is: mempolicy is advisory and cpuset is strictly
> followed - in a dispute cpuset wins... except for file backed memory,
> then everyone loses and nothing is consistent.

I believe that is why mpol_rebind_mm() is handled a bit differently from
the others. It has been done this way historically, since long before
cgroup v2.

For cgroup v1, mems_allowed can't be empty or you can't put any task into
the cpuset. Also, effective_mems is the same as mems_allowed.

cgroup v2 is quite different in how it handles memory nodes and CPUs.
Users aren't forced to set mems_allowed and cpus_allowed, as
effective_mems and effective_cpus will inherit the parent's values if
mems_allowed and cpus_allowed are not set. IOW, effective_mems will never
be empty. Yes, it is a bug dating from the introduction of cpuset v2: we
should have replaced mems_allowed with effective_mems at that time.

With v2, effective_mems should contain only online nodes. The only
exception is the short transition period when a memory node hot-unplug
operation is in progress while a write to cpuset.mems is happening at the
same time. With v1, it is theoretically possible that none of the nodes
in mems_allowed is online.

The reason I am suggesting cs->effective_mems is to keep the old
cgroup v1 behavior. If the consensus is to use the output of
guarantee_online_mems() for mpol_rebind_mm(), I will not be against that,
but it will be a slight change in user-visible behavior.
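
As a rough illustration of the v1/v2 difference described above, here is
a userspace toy model, not kernel code. The toy_* names, the one-word
bitmask, and the recursive walk are all assumptions made for this sketch;
the last helper is only loosely modeled on the idea of falling back to an
ancestor that still has online nodes, and is not a copy of
guarantee_online_mems().

```c
#include <stddef.h>
#include <stdint.h>

/* Toy model only: one bit per memory node, not the kernel's nodemask_t. */
typedef uint64_t toy_nodemask;

/* Hypothetical stand-in for struct cpuset, reduced to what the sketch needs. */
struct toy_cpuset {
	const struct toy_cpuset *parent;	/* NULL for the root */
	toy_nodemask mems_allowed;		/* user-configured; may be 0 in v2 */
};

/* v1 flavour: effective_mems is simply mems_allowed. */
static toy_nodemask toy_effective_v1(const struct toy_cpuset *cs)
{
	return cs->mems_allowed;
}

/*
 * v2 flavour: the root sees every online node; a child is restricted by
 * its parent's effective mask and inherits that mask when nothing is
 * left (for example when mems_allowed was never set).
 */
static toy_nodemask toy_effective_v2(const struct toy_cpuset *cs,
				     toy_nodemask online)
{
	toy_nodemask parent_eff, eff;

	if (!cs->parent)
		return online;

	parent_eff = toy_effective_v2(cs->parent, online);
	eff = cs->mems_allowed & parent_eff;
	return eff ? eff : parent_eff;
}

/*
 * v1 flavour fallback: walk up until some ancestor still has an online
 * node, then return that ancestor's nodes restricted to the online set.
 */
static toy_nodemask toy_guarantee_online_v1(const struct toy_cpuset *cs,
					    toy_nodemask online)
{
	while (cs->parent && !(toy_effective_v1(cs) & online))
		cs = cs->parent;
	return toy_effective_v1(cs) & online;
}
```

In this toy, a v2 child with an empty mems_allowed ends up with its
parent's effective mask, while a v1 child whose only node has gone
offline keeps that node in toy_effective_v1() and only gets usable nodes
from the fallback helper. That gap is the "slight change in user-visible
behavior" between the two options.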

Cheers,
Longman

>> Naturally I wonder: Why are we not using "task->mems_allowed" (maybe cs vs. tsk
>> was the original bug?), which is effectively just newmems?
>
> Short answer: task->mems_allowed is protected by the task lock and we
> don't hold the task lock for a foreign task (not-current) over mm
> operations.
>
> Long answer: Reasons and "Stop looking at the spaghetti, it's going to
> break"
>
> ~Gregory