Re: [PATCH] sched/fair: Don't trigger active lb if src_rq->curr is CFS and not on_rq
From: Xin Zhao
Date: Sat Jun 13 2026 - 02:07:14 EST
On Fri, 12 Jun 2026 16:50:00 +0200 Valentin Schneider <vschneid@xxxxxxxxxx> wrote:
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index b5819c489..cba6dc6da 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -13423,7 +13423,9 @@ static int sched_balance_rq(int this_cpu, struct rq *this_rq,
> > * if the curr task on busiest CPU can't be
> > * moved to this_cpu:
> > */
> > - if (!cpumask_test_cpu(this_cpu, busiest->curr->cpus_ptr)) {
> > + if (!cpumask_test_cpu(this_cpu, busiest->curr->cpus_ptr) ||
> > + (busiest->curr->sched_class == &fair_sched_class &&
> > + !busiest->curr->on_rq)) {
>
> AFAICT the standard pattern for a blocking task is:
>
>   __schedule()
>     rq_lock(rq)
>     try_to_block_task(rq, prev)
>       prev->on_rq=0;
>
>     rq->curr = next;
>
>     context_switch()
>       raw_spin_rq_unlock_irq(rq);
>
> proxy & blocked-on tasks shouldn't get near p->on_rq.
>
> I'm struggling to see how can load balance grab a rq lock and observe
> rq->curr->on_rq=0? Obviously I'm missing something but it's Friday...
It took me quite a while to work out how rq->curr->on_rq can be seen as
zero while the rq lock is held. After some experiments I found the
reason, and I will add it to the commit log in PATCH v2.

In __schedule(), after try_to_block_task() but before rq->curr is set to
next, sched_balance_rq() may be called from within pick_next_task(). That
path unlocks and then re-locks the rq, which opens a "hole" during which
another CPU can take the rq lock and see rq->curr->on_rq == 0. This
happens quite frequently because:

1. Periodic load balancing on different CPUs often runs in close
   succession, so those CPUs contend for the rq lock while
   sched_balance_rq() is executing.

2. try_to_block_task() has already set curr->on_rq to 0, and during the
   rq lock "hole" in pick_next_task() rq->curr has not yet been switched
   to next, so curr->on_rq is seen as 0.
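
To make the ordering above concrete, here is a minimal single-threaded
user-space sketch of the window. It is not kernel code: every toy_* name
is hypothetical, the "lock" is just a flag, and the remote CPU is
modelled as a function call placed inside the hole. It only illustrates
the sequence block prev, drop the lock in pick, observe, re-lock, switch
curr.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: these are NOT the kernel's task_struct / rq. */
struct toy_task {
	int on_rq;
};

struct toy_rq {
	bool locked;
	struct toy_task *curr;
};

static void toy_rq_lock(struct toy_rq *rq)
{
	assert(!rq->locked);	/* nobody else may hold the lock */
	rq->locked = true;
}

static void toy_rq_unlock(struct toy_rq *rq)
{
	assert(rq->locked);
	rq->locked = false;
}

/* Stand-in for a remote CPU's load balancer: lock, read, unlock. */
static int toy_remote_observe(struct toy_rq *rq)
{
	int seen;

	toy_rq_lock(rq);
	seen = rq->curr->on_rq;
	toy_rq_unlock(rq);
	return seen;
}

/*
 * Stand-in for pick_next_task(). When drop_lock is true it releases and
 * re-takes the rq lock, as the balance path does, and the remote
 * observer runs inside that hole.
 */
static struct toy_task *toy_pick_next(struct toy_rq *rq,
				      struct toy_task *next,
				      bool drop_lock, int *seen)
{
	if (drop_lock) {
		toy_rq_unlock(rq);
		*seen = toy_remote_observe(rq);
		toy_rq_lock(rq);
	}
	return next;
}

/*
 * Stand-in for __schedule() with a blocking prev. Returns the on_rq
 * value the remote observer saw, or -1 if no hole was opened.
 */
int toy_schedule(bool drop_lock)
{
	struct toy_task prev = { .on_rq = 1 };
	struct toy_task next = { .on_rq = 1 };
	struct toy_rq rq = { .locked = false, .curr = &prev };
	int seen = -1;

	toy_rq_lock(&rq);
	prev.on_rq = 0;		/* what try_to_block_task() does */
	rq.curr = toy_pick_next(&rq, &next, drop_lock, &seen);
	toy_rq_unlock(&rq);

	return seen;
}
```

In this model toy_schedule(true) returns 0, i.e. the observer holds the
lock and still sees curr->on_rq == 0, while toy_schedule(false) returns
-1 because the lock is never dropped between blocking prev and switching
curr.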
Thanks
Xin Zhao