Re: [cgroup/for-6.19 PATCH 2/3] cgroup/cpuset: Fail if isolated and nohz_full don't leave any housekeeping

From: Waiman Long
Date: Sun Nov 02 2025 - 22:13:42 EST


On 11/2/25 9:55 PM, Chen Ridong wrote:

On 2025/11/3 9:34, Waiman Long wrote:
From: Gabriele Monaco <gmonaco@xxxxxxxxxx>

Currently the user can set up isolated cpus via cpuset and nohz_full in
such a way that leaves no housekeeping CPU (i.e. no CPU that is neither
domain isolated nor nohz full). This can be a problem for other
subsystems (e.g. the timer wheel imgration).

Prevent this configuration by blocking any assignation that would cause
the union of domain isolated cpus and nohz_full to covers all CPUs.

Acked-by: Frederic Weisbecker <frederic@xxxxxxxxxx>
Reviewed-by: Waiman Long <longman@xxxxxxxxxx>
Signed-off-by: Gabriele Monaco <gmonaco@xxxxxxxxxx>
Signed-off-by: Waiman Long <longman@xxxxxxxxxx>
---
kernel/cgroup/cpuset.c | 67 +++++++++++++++++++++++++++++++++++++++++-
1 file changed, 66 insertions(+), 1 deletion(-)

diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index da770dac955e..d6d459c95d82 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -1329,6 +1329,19 @@ static void isolated_cpus_update(int old_prs, int new_prs, struct cpumask *xcpus
cpumask_andnot(isolated_cpus, isolated_cpus, xcpus);
}
+/*
+ * isolated_cpus_should_update - Returns if the isolated_cpus mask needs update
+ * @prs: new or old partition_root_state
+ * @parent: parent cpuset
+ * Return: true if isolated_cpus needs modification, false otherwise
+ */
+static bool isolated_cpus_should_update(int prs, struct cpuset *parent)
+{
+ if (!parent)
+ parent = &top_cpuset;
+ return prs != parent->partition_root_state;
+}
+
Hi Longman,

I am confused about this function.

Why do we need to compare the partition_root_state (prs) with the parent's partition_root_state?

For example, when a local partition is assigned to a member, I don't think the isolated cpumasks
should be updated in this case.

In my understanding, the isolated CPUs should only be updated when an isolated partition is being
disabled or enabled. I was thinking of something like this:

bool isolated_cpus_should_update(int new_prs, int old_prs)
{
if (new_prs == old_prs)
return false;
if (old_prs == 2 || new_prs == 2)
return true;
return false;
}

I would really appreciate it if you could provide some further explanation on this.

This function should only be called when both the current and the parent (top_cpuset for remote) are valid partition roots. For both local and remote partition, the child cpuset takes CPUs from the parent. The list of isolated CPUs will only change if parent and child cpusets have different partition root types. If parent is an isolated partition, taking CPUs from parent to form another isolated partition will not change isolated_cpus. The same is true if both parent and child are root.

You are right that this check may not be obvious for a casual observer. I can add some more comments to clarify that.

Cheers,
Longman