Re: [PATCH v4 1/2] sched/fair: Don't trigger active lb if src_rq->curr is CFS and not on_rq

From: Valentin Schneider

Date: Tue Jun 16 2026 - 09:38:56 EST


On 16/06/26 15:18, Xin Zhao wrote:
> In __schedule(), before setting curr to next, during the execution of
> pick_next_task(), sched_balance_rq() is called. It will unlock and then
> re-lock the rq, creating "holes" during which other CPUs may see zero
> rq->curr->on_rq. try_to_block_task() sets curr->on_rq to 0, and during the
> rq lock "hole" in pick_next_task(), rq->curr has not yet been assigned to
> next, resulting in curr->on_rq being seen as 0.
>
> We do not need to perform active balancing when src_rq->curr is a CFS task
> but its on_rq is 0, as the other CFS tasks have already been checked just
> before. For cases where src_rq->curr is a non-CFS task, we retain the
> affinity check against dst_rq to trigger active balancing, because such a
> task is likely to wake up, or be woken by, a src_rq CFS task that has
> similar affinity characteristics and could be migrated.
>

> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index b5819c489..4391b6e5b 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -13436,12 +13436,22 @@ static int sched_balance_rq(int this_cpu, struct rq *this_rq,
>  			 * ->active_balance_work.  Once set, it's cleared
>  			 * only after active load balance is finished.
>  			 */
> -			if (!busiest->active_balance) {
> -				busiest->active_balance = 1;
> -				busiest->push_cpu = this_cpu;
> -				active_balance = 1;
> -			}
> +			if (busiest->active_balance)
> +				goto no_active_balance;
>
> +			/*
> +			 * @busiest dropped its rq_lock in the middle of
> +			 * scheduling out its ->curr task (->on_rq := 0), no
> +			 * need to forcefully punt it away with active balance.
> +			 */
> +			if ((busiest->curr->sched_class == &fair_sched_class) &&
> +			    !busiest->curr->on_rq)
> +				goto no_active_balance;

I hadn't thought much about that 'busiest->curr == CFS' condition until
now; I thought we already had something to prevent active load balance from
accidentally poking at RT/DL tasks, but I must have been thinking of [1],
which never went anywhere.

Either way, we could probably drop the sched_class test and reduce this to
just a '!busiest->curr->on_rq' check.

[1]: https://lore.kernel.org/lkml/20190815145107.5318-5-valentin.schneider@xxxxxxx/
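
To make the difference between the two forms of the check concrete, here is a
minimal userspace sketch. It is not kernel code: the structs are hypothetical
stand-ins that model only the fields the hunk touches, rt_sched_class stands
for any non-CFS class, and the helper names (skip_active_balance_v4(),
skip_active_balance_simplified(), try_arm_active_balance()) are made up for
illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel types; only the fields used by
 * the check in this hunk are modelled. */
struct sched_class { const char *name; };

static const struct sched_class fair_sched_class = { "fair" };
static const struct sched_class rt_sched_class = { "rt" };

struct task_struct {
	const struct sched_class *sched_class;
	int on_rq;
};

struct rq {
	struct task_struct *curr;
	int active_balance;
	int push_cpu;
};

/* The check as written in v4: skip only when ->curr is a CFS task that
 * is in the middle of being scheduled out (->on_rq == 0). */
static bool skip_active_balance_v4(const struct rq *busiest)
{
	if (busiest->active_balance)
		return true;

	return busiest->curr->sched_class == &fair_sched_class &&
	       !busiest->curr->on_rq;
}

/* The simplified form suggested above: skip whenever ->curr is being
 * scheduled out, whatever its class. */
static bool skip_active_balance_simplified(const struct rq *busiest)
{
	if (busiest->active_balance)
		return true;

	return !busiest->curr->on_rq;
}

/* Mirrors the tail of the hunk: arm active balance unless the chosen
 * check says to skip. Returns the value active_balance would take. */
static int try_arm_active_balance(struct rq *busiest, int this_cpu,
				  bool simplified)
{
	bool skip = simplified ? skip_active_balance_simplified(busiest)
			       : skip_active_balance_v4(busiest);

	if (skip)
		return 0;

	busiest->active_balance = 1;
	busiest->push_cpu = this_cpu;
	return 1;
}
```

In this model the two forms only disagree when ->curr is a non-CFS task with
->on_rq == 0: v4 would still arm active balance there, the simplified check
would not.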

> +
> +			busiest->active_balance = 1;
> +			busiest->push_cpu = this_cpu;
> +			active_balance = 1;
> +no_active_balance:
>  			preempt_disable();
>  			raw_spin_rq_unlock_irqrestore(busiest, flags);
>  			if (active_balance) {
> --
> 2.34.1