Re: [PATCH v2] cgroup/cpuset: rebind mm mempolicy to effective_mems, not mems_allowed

From: Waiman Long

Date: Mon Jun 15 2026 - 23:43:51 EST


On 6/15/26 10:26 PM, Waiman Long wrote:

On 6/15/26 5:38 AM, Gregory Price wrote:
On Mon, Jun 15, 2026 at 10:08:51AM +0200, David Hildenbrand (Arm) wrote:
On 6/14/26 15:25, Farhad Alemi wrote:
diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -2649,7 +2649,7 @@ void cpuset_update_tasks_nodemask(struct cpuset *cs)

          migrate = is_memory_migrate(cs);

-        mpol_rebind_mm(mm, &cs->mems_allowed);
+        mpol_rebind_mm(mm, &cs->effective_mems);
God this is confusing.

All interactions between mempolicy and cpuset are horrible and
confusing.  Much like Lorenzo's anon_vma work, I have to keep
notes on how this whole thing doesn't just spew SIGBUS constantly.

The short answer is: mempolicy is advisory and cpuset is strictly
followed - in a dispute cpuset wins... except for file backed memory,
then everyone loses and nothing is consistent.

I believe that is why mpol_rebind_mm() is handled a bit differently from the others; it was done this way a long time ago, before cgroup v2.

For cgroup v1, mems_allowed can't be empty or you can't put any task into the cpuset. Also, effective_mems is the same as mems_allowed. cgroup v2 is quite different in how it handles memory nodes and CPUs. Users aren't forced to set mems_allowed and cpus_allowed, as effective_mems and effective_cpus will inherit the parent's values if mems_allowed and cpus_allowed are not set. IOW, effective_mems will never be empty. Yes, it is a bug dating from the introduction of cpuset v2: we should have replaced mems_allowed with effective_mems at that time. With v2, effective_mems should contain only online nodes. The only exception is the short transition period when a memory node hot-unplug operation is in progress while a write to cpuset.mems is happening at the same time. With v1, it is theoretically possible that none of the nodes in mems_allowed is online.
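
As a rough illustration of the v1/v2 difference described above, here is a toy userspace model. It is only a sketch, not the kernel code: the toy_nodemask_t type and both toy_* functions are hypothetical names, node masks are reduced to plain bit masks, and the parent's effective mask is assumed to be non-empty.

```c
#include <stdint.h>

/* Toy node mask: bit n set means memory node n is in the mask. */
typedef uint64_t toy_nodemask_t;

/*
 * Hypothetical model of how a v2 cpuset ends up with its effective_mems.
 * mems_allowed:     what the user wrote to cpuset.mems (0 if never set)
 * parent_effective: the parent's effective_mems, assumed non-empty
 */
toy_nodemask_t toy_effective_mems_v2(toy_nodemask_t mems_allowed,
                                     toy_nodemask_t parent_effective)
{
	toy_nodemask_t eff = mems_allowed & parent_effective;

	/* Nothing usable configured: inherit the parent's effective mask. */
	if (eff == 0)
		eff = parent_effective;
	return eff;
}

/* In v1 the two masks are the same, so there is no inheritance step. */
toy_nodemask_t toy_effective_mems_v1(toy_nodemask_t mems_allowed)
{
	return mems_allowed;
}
```

In this model a v2 cpuset with nothing written to cpuset.mems still gets a non-empty effective mask from its parent, while its mems_allowed stays empty, which is why passing mems_allowed to mpol_rebind_mm() goes wrong on v2.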

The reason I am suggesting cs->effective_mems is to keep the old cgroup v1 behavior. If the consensus is to use the output of guarantee_online_mems() for mpol_rebind_mm(), I will not be against that, but it will be a slight change in user-visible behavior.
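
To make the behavior difference concrete, here is a simplified userspace sketch of what an online-filtered mask would look like next to the raw effective_mems. It is a toy model under stated assumptions, not the kernel implementation: struct toy_cpuset and toy_guarantee_online_mems() are hypothetical names, the online mask is passed in explicitly, and the root is assumed to always contain at least one online node.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy node mask: bit n set means memory node n is in the mask. */
typedef uint64_t toy_nodemask_t;

/* Hypothetical, stripped-down stand-in for a cpuset. */
struct toy_cpuset {
	struct toy_cpuset *parent;      /* NULL for the root */
	toy_nodemask_t effective_mems;
};

/*
 * Walk up the hierarchy until a cpuset's effective_mems contains at
 * least one online node, then return only the online part of that mask.
 * Assumes the root's effective_mems intersects the online mask.
 */
toy_nodemask_t toy_guarantee_online_mems(const struct toy_cpuset *cs,
					 toy_nodemask_t online)
{
	while (cs->parent && !(cs->effective_mems & online))
		cs = cs->parent;
	return cs->effective_mems & online;
}
```

In this model, a cpuset whose effective mask is nodes 1-2 with only nodes 0-1 online would rebind to node 1 alone instead of nodes 1-2, and a cpuset with no online node at all would rebind to its ancestor's online nodes.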

BTW, I still prefer the v2 patch. If it is decided that we should use the guarantee_online_mems() value instead, it will have to be a separate patch with changes to the relevant documentation, such as Documentation/admin-guide/cgroup-v1/cpuset.rst.

Cheers,
Longman