Re: [cgroup/for-6.20 PATCH v2 2/4] cgroup/cpuset: Consistently compute effective_xcpus in update_cpumasks_hier()
From: Waiman Long
Date: Fri Jan 09 2026 - 15:16:03 EST
On 1/5/26 1:29 AM, Chen Ridong wrote:
On 2026/1/5 12:06, Waiman Long wrote:
On 1/4/26 10:58 PM, Chen Ridong wrote:Sorry, I should have been clearer. If we change the condition, the code would essentially be:
On 2026/1/5 11:50, Waiman Long wrote:Changing the condition to (new_prs <= 0) won't affect the result except for a bit of wasted cpu
On 1/4/26 8:15 PM, Chen Ridong wrote:If we change the condition to (new_prs <= 0), it will reset the partition data even when we call
On 2026/1/5 5:25, Waiman Long wrote:effective_xcpus should be set when exclusive_cpus is not empty or when the cpuset is a valid
On 1/3/26 9:48 PM, Chen Ridong wrote:Thank you for your patience.
On 2026/1/2 3:15, Waiman Long wrote:update_cpumasks_hier() is called when changes in a cpuset or hotplug affects other cpusets in the
Since commit f62a5d39368e ("cgroup/cpuset: Remove remote_partition_check()The code resets partition data only for new_prs < 0. My understanding is that a partition is
& make update_cpumasks_hier() handle remote partition"), the
compute_effective_exclusive_cpumask() helper was extended to
strip exclusive CPUs from siblings when computing effective_xcpus
(cpuset.cpus.exclusive.effective). This helper was later renamed to
compute_excpus() in commit 86bbbd1f33ab ("cpuset: Refactor exclusive
CPU mask computation logic").
This helper is supposed to be used consistently to compute
effective_xcpus. However, there is an exception within the callback
critical section in update_cpumasks_hier() when exclusive_cpus of a
valid partition root is empty. This can cause effective_xcpus value to
differ depending on where exactly it is last computed. Fix this by using
compute_excpus() in this case to give a consistent result.
Signed-off-by: Waiman Long <longman@xxxxxxxxxx>
---
kernel/cgroup/cpuset.c | 14 +++++---------
1 file changed, 5 insertions(+), 9 deletions(-)
diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index da2b3b51630e..37d118a9ad4d 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -2168,17 +2168,13 @@ static void update_cpumasks_hier(struct cpuset *cs, struct tmpmasks
*tmp,
spin_lock_irq(&callback_lock);
cpumask_copy(cp->effective_cpus, tmp->new_cpus);
cp->partition_root_state = new_prs;
- if (!cpumask_empty(cp->exclusive_cpus) && (cp != cs))
- compute_excpus(cp, cp->effective_xcpus);
-
/*
- * Make sure effective_xcpus is properly set for a valid
- * partition root.
+ * Need to compute effective_xcpus if either exclusive_cpus
+ * is non-empty or it is a valid partition root.
*/
- if ((new_prs > 0) && cpumask_empty(cp->exclusive_cpus))
- cpumask_and(cp->effective_xcpus,
- cp->cpus_allowed, parent->effective_xcpus);
- else if (new_prs < 0)
+ if ((new_prs > 0) || !cpumask_empty(cp->exclusive_cpus))
+ compute_excpus(cp, cp->effective_xcpus);
+ if (new_prs < 0)
reset_partition_data(cp);
spin_unlock_irq(&callback_lock);
invalid
when new_prs <= 0. Shouldn't reset_partition_data() also be called when new_prs = 0? Is there a
specific reason to skip the reset in that case?
hierarchy. With respect to changes in partition state, it is either from valid to invalid or vice
versa. It will not change from a valid partition to member. The only way new_prs = 0 is when
old_prs
= 0. Even if the affected cpuset is processed again in update_cpumask_hier(), any state change
from
valid partition to member (update_prstate()), reset_partition_data() should have been called
there.
That is why we only care about when new_prs != 0.
The code isn't wrong here. However I can change the condition to (new_prs <= 0) if it makes itI agree there's nothing wrong with the current logic. However, for clarity, I suggest changing the
easier to understand.
condition to (new_prs <= 0). This allows the function's logic to be fully self-consistent and
focused on a single responsibility. This approach would allow us to simplify the code to:
if (new_prs > 0)
compute_excpus(cp, cp->effective_xcpus);
else
reset_partition_data(cp);
Since reset_partition_data() already handles cases whether cp->exclusive_cpus is empty or not, this
implementation would be more concise while correctly covering all scenarios.
partition root. So just checking new_prs for compute_excpus() is not enough.
compute_excpus (for !cpumask_empty(cp->exclusive_cpus)), so we should still get the same result,
right?
cycles. That is why I am planning to make the change in the next version to make it easier to
understand.
if ((new_prs > 0) || !cpumask_empty(cp->exclusive_cpus))
compute_excpus(cp, cp->effective_xcpus);
if (new_prs <= 0)
reset_partition_data(cp);
For cases where new_prs <= 0 && !cpumask_empty(cp->exclusive_cpus), both compute_excpus() and
reset_partition_data() would be called.
Is this functionally equivalent to:
if (new_prs > 0)
compute_excpus(cp, cp->effective_xcpus);
else (new_prs <= 0)
reset_partition_data(cp);
They are not equivalent because reset_partition_data() won't do a compute_excpus(). In fact, one of the tests in test_cpuset_prs.sh will fail if we make this change.
Cheers,
Longman