On 9/15/23 11:45, Pierre Gondois wrote:
When an isolated parition is created, the sched domains (sds) are
rebuilt and the sds of the isolated CPUs are detached. This only
happens at the creation of the isolated parition. Updating
the cpuset of the partition doesn't rebuild the sds:
To reproduce:
# ls /sys/kernel/debug/sched/domains/cpu0/
domain0
# ls /sys/kernel/debug/sched/domains/cpu1/
domain0
#
# mkdir cgroup
# mount -t cgroup2 none cgroup/
# mkdir cgroup/A1/
# echo "+cpuset" > cgroup/cgroup.subtree_control
# echo 0-3 > cgroup/A1/cpuset.cpus
# echo isolated > cgroup/A1/cpuset.cpus.partition
#
# ls /sys/kernel/debug/sched/domains/cpu0/
# ls /sys/kernel/debug/sched/domains/cpu1/
#
# echo 0 > cgroup/A1/cpuset.cpus
# ls /sys/kernel/debug/sched/domains/cpu0/
# ls /sys/kernel/debug/sched/domains/cpu1/
#
Here CPU1 should have a sched domain re-attached.
Signed-off-by: Pierre Gondois <pierre.gondois@xxxxxxx>
---
kernel/cgroup/cpuset.c | 12 ++++++++----
1 file changed, 8 insertions(+), 4 deletions(-)
diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index 58e6f18f01c1..e3eb27ff9b68 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -1680,11 +1680,15 @@ static void update_cpumasks_hier(struct cpuset *cs, struct tmpmasks *tmp,
* empty cpuset is changed, we need to rebuild sched domains.
* On default hierarchy, the cpuset needs to be a partition
* root as well.
+ * Also rebuild sched domains if the cpuset of an isolated
+ * partition changed.
*/
- if (!cpumask_empty(cp->cpus_allowed) &&
- is_sched_load_balance(cp) &&
- (!cgroup_subsys_on_dfl(cpuset_cgrp_subsys) ||
- is_partition_valid(cp)))
+ if ((!cpumask_empty(cp->cpus_allowed) &&
+ is_sched_load_balance(cp) &&
+ (!cgroup_subsys_on_dfl(cpuset_cgrp_subsys) ||
+ is_partition_valid(cp))) ||
+ (cp->partition_root_state == PRS_ISOLATED ||
+ cp->partition_root_state == PRS_INVALID_ISOLATED))
need_rebuild_sched_domains = true;
rcu_read_lock();
Thanks for spotting the problem and sending out a patch to fix it.
However, it should be done in the update_cpumask(). The sched_domain
rebuild in update_cpumasks_hier() is supposed to be used for impacted
descendant cpusets lower down in the hierarchy.
Anyway, I believe this problem should have been fixed in commit
a86ce68078b2 ("cgroup/cpuset: Extract out CS_CPU_EXCLUSIVE &
CS_SCHED_LOAD_BALANCE handling") recently merged into the v6.6 kernel.
Would you mind testing the latest upstream kernel to see if this problem
is gone or not?
Thanks,
Longman