Re: [PATCH RESEND] sched/fair: Only update stats for allowed CPUs when looking for dst group
From: Chen, Yu C
Date: Tue Oct 14 2025 - 08:07:30 EST
On 10/14/2025 6:51 PM, Adam Li wrote:
Hi Chenyu,
Thanks for your comments.
On 10/12/2025 1:42 AM, Chen, Yu C wrote:
On 10/11/2025 2:43 PM, Adam Li wrote:
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index bc0b7ce8a65d..d5ec15050ebc 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -10671,7 +10671,7 @@ static inline void update_sg_wakeup_stats(struct sched_domain *sd,
if (sd->flags & SD_ASYM_CPUCAPACITY)
sgs->group_misfit_task_load = 1;
- for_each_cpu(i, sched_group_span(group)) {
+ for_each_cpu_and(i, sched_group_span(group), p->cpus_ptr) {
Looks good to me. One minor question, would pre-calculating the mask be better?
I do agree that pre-calculating the cpumask can save CPU cycles by
avoiding the mask AND on each loop iteration.
Copied from select_idle_cpu():

cpus = this_cpu_cpumask_var_ptr(select_rq_mask);
cpumask_and(cpus, sched_group_span(sd), p->cpus_ptr);
for_each_cpu(i, cpus) {

But I am not sure if it is safe to use the percpu 'select_rq_mask'
in update_sg_wakeup_stats(). Otherwise we would have to allocate a
'struct cpumask'.
Allocating dynamically would be costly. Using the percpu select_rq_mask
is safe in this scenario: the waker's CPU has already disabled local IRQs
via raw_spin_lock_irqsave(&p->pi_lock), so I suppose nothing can modify
the mask concurrently. Moreover, if the fast wakeup path
select_idle_sibling() can use it, the slow path
sched_balance_find_dst_cpu() should be able to do so as well, IMO.
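For reference, here is a minimal sketch of that locking context, loosely
abridged from try_to_wake_up() in kernel/sched/core.c (the names and
exact signatures are simplified and may differ between kernel versions;
this is an illustration, not the real implementation):

static int try_to_wake_up(struct task_struct *p, unsigned int state,
			  int wake_flags)
{
	unsigned long flags;
	int cpu;

	/*
	 * Local IRQs are disabled from here on, so nothing on this
	 * CPU can re-enter the wakeup path and clobber the percpu
	 * select_rq_mask while select_task_rq() is using it.
	 */
	raw_spin_lock_irqsave(&p->pi_lock, flags);

	/* ... state checks elided ... */

	/*
	 * Both the fast path (select_idle_sibling()) and the slow
	 * path (sched_balance_find_dst_cpu(), which calls
	 * update_sg_wakeup_stats()) run under this lock.
	 */
	cpu = select_task_rq(p, p->wake_cpu, &wake_flags);

	/* ... enqueue and wakeup elided ... */

	raw_spin_unlock_irqrestore(&p->pi_lock, flags);
	return 1;
}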
I tested the below patch. It works and fixes the bug.
If it is safe to use 'select_rq_mask', I can submit a V2 patch.
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -10664,6 +10664,7 @@ static inline void update_sg_wakeup_stats(struct sched_domain *sd,
struct task_struct *p)
{
int i, nr_running;
+ struct cpumask *cpus = this_cpu_cpumask_var_ptr(select_rq_mask);
memset(sgs, 0, sizeof(*sgs));
@@ -10671,7 +10672,8 @@ static inline void update_sg_wakeup_stats(struct sched_domain *sd,
if (sd->flags & SD_ASYM_CPUCAPACITY)
sgs->group_misfit_task_load = 1;
- for_each_cpu(i, sched_group_span(group)) {
nice-to-have:
maybe add a comment here noting that cpus cannot be empty, because of
the cpumask_intersects() check in sched_balance_find_dst_group()
(just in case sgs->group_type incorrectly remains 0, i.e. group_has_spare,
if cpus were empty). A sketch of such a comment follows the diff below.
+ cpumask_and(cpus, sched_group_span(group), p->cpus_ptr);
+ for_each_cpu(i, cpus) {
struct rq *rq = cpu_rq(i);
unsigned int local;
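For illustration only, the suggested comment could read something like
this (the wording is just a sketch, not a requirement):

	/*
	 * sched_balance_find_dst_group() has already checked that
	 * sched_group_span(group) intersects p->cpus_ptr, so cpus
	 * cannot be empty here and sgs->group_type cannot be left
	 * at group_has_spare (0) by accident.
	 */
	cpumask_and(cpus, sched_group_span(group), p->cpus_ptr);
	for_each_cpu(i, cpus) {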
And from my understanding, for this percpu version:
Reviewed-by: Chen Yu <yu.c.chen@xxxxxxxxx>
thanks,
Chenyu