Re: [PATCH v2] sched/fair: Don't trigger active lb if src_rq->curr is CFS and not on_rq

From: Aiqun(Maria) Yu

Date: Sun Jun 14 2026 - 00:18:16 EST

On 6/13/2026 3:32 PM, Xin Zhao wrote:
> Active balancing needs help from migration threads, which interrupt the
> task running on src_rq. This has a certain impact on overall performance.
> Active balancing often fails. There is a check to determine whether the
> current task (call it 'curr') on src_rq can run on dst_rq. We have
> observed that even with this check, if curr is a CFS task and on_rq is 0,
> the failure rate of active balancing is very high. Below are the test data
> from a certain fillback task scenario executed on a platform with 18 CPUs
> over 300 seconds:
>
> total: the total count of cases that match
> cpumask_test_cpu(this_cpu, busiest->curr->cpus_ptr) &&
> busiest->curr->sched_class == &fair_sched_class &&
> !busiest->curr->on_rq
> succ/fail: the active balance success/fail cases that match
> cpumask_......->on_rq
>
>                      total  succ  fail
> cpu0  domain0 00003      0     0     0
> cpu0  domain1 3ffff     32     0    32
> cpu1  domain0 00003      0     0     0
> cpu1  domain1 3ffff     40     0    40
> cpu2  domain0 0003c      3     0     3
> cpu2  domain1 3ffff      6     0     6
> cpu3  domain0 0003c      3     1     2
> cpu3  domain1 3ffff      3     0     3
> cpu4  domain0 0003c      3     0     3
> cpu4  domain1 3ffff      4     0     4
> cpu5  domain0 0003c      1     0     1
> cpu5  domain1 3ffff      6     0     6
> cpu6  domain0 003c0     39     0    39
> cpu6  domain1 3ffff     36     0    36
> cpu7  domain0 003c0    213     4   209
> cpu7  domain1 3ffff     24     2    22
> cpu8  domain0 003c0    242    16   226
> cpu8  domain1 3ffff     16     0    16
> cpu9  domain0 003c0      0     0     0
> cpu9  domain1 3ffff      6     1     5
> cpu10 domain0 03c00     58     1    57
> cpu10 domain1 3ffff      0     0     0
> cpu11 domain0 03c00     54     4    50
> cpu11 domain1 3ffff      1     0     1
> cpu12 domain0 03c00     66     1    65
> cpu12 domain1 3ffff      0     0     0
> cpu13 domain0 03c00     66     1    65
> cpu13 domain1 3ffff      0     0     0
> cpu14 domain0 3c000      0     0     0
> cpu14 domain1 3ffff     57     5    52
> cpu15 domain0 3c000     15     0    15
> cpu15 domain1 3ffff     35     0    35
> cpu16 domain0 3c000    148     3   145
> cpu16 domain1 3ffff    109     1   108
> cpu17 domain0 3c000    182     2   180
> cpu17 domain1 3ffff     78     1    77

What's the probability that curr->on_rq is 1 throughout the entire check?
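
For comparison, the quoted table only covers the cases where curr->on_rq
was seen as 0. The failure share of a single row can be worked out as in
the rough sketch below. struct lb_stat and fail_percent() are names made
up for this mail, not kernel code, and the result is rounded down to an
integer percentage.

```c
/* Hypothetical holder for one row of the quoted statistics. */
struct lb_stat {
	unsigned int total;	/* cases matching the quoted condition */
	unsigned int succ;	/* active balance succeeded */
	unsigned int fail;	/* active balance failed */
};

/*
 * Integer failure percentage, rounded down.
 * Returns -1 when the row has no matching cases at all.
 */
static int fail_percent(const struct lb_stat *s)
{
	if (s->total == 0)
		return -1;
	return (int)(s->fail * 100u / s->total);
}
```

Having the same counters for the curr->on_rq == 1 case would make the two
failure rates directly comparable.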

>
> In __schedule(), before setting curr to next, during the execution of
> pick_next_task(), sched_balance_rq() is called. It will unlock and then
> re-lock the rq, creating "holes" during which other CPUs may see zero
> rq->curr->on_rq. This situation occurs quite frequently because:
> 1. Periodic load balancing across CPUs often happens in close succession,
> leading to collisions in the rq lock during sched_balance_rq().
> 2. try_to_block_task() sets curr->on_rq to 0, and during the rq lock
> "hole" in pick_next_task(), rq->curr has not yet been assigned to next,
> resulting in curr->on_rq being seen as 0.

It is possible for curr->on_rq to be seen as 0, and in that case active
balance is not needed.
My concern is that the overhead of checking curr->on_rq every time should
be weighed against how likely curr->on_rq is to actually be 0.
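
To check that I read the "hole" correctly, here is a simplified
single-threaded user-space model of the sequence described above. All
mock_* names are hypothetical stand-ins, not the real struct rq or
struct task_struct, and the lock is just a flag.

```c
/* Hypothetical stand-in for struct task_struct. */
struct mock_task {
	int on_rq;		/* 1 while queued, 0 once blocked */
};

/* Hypothetical stand-in for struct rq. */
struct mock_rq {
	struct mock_task *curr;	/* task still recorded as running */
	int locked;		/* 1 while the rq lock is held */
};

/* Models try_to_block_task(): curr is dequeued, rq->curr is unchanged. */
static void mock_block_curr(struct mock_rq *rq)
{
	rq->curr->on_rq = 0;
}

/* Models the unlock and re-lock done while balancing in pick_next_task(). */
static void mock_unlock(struct mock_rq *rq) { rq->locked = 0; }
static void mock_lock(struct mock_rq *rq)   { rq->locked = 1; }

/*
 * Models a remote CPU that takes the rq lock and samples curr->on_rq.
 * Returns -1 if the lock is currently held by the local CPU.
 */
static int mock_remote_sample(struct mock_rq *rq)
{
	int seen;

	if (rq->locked)
		return -1;
	rq->locked = 1;
	seen = rq->curr->on_rq;
	rq->locked = 0;
	return seen;
}

/* Models __schedule() finally assigning rq->curr = next. */
static void mock_switch_to(struct mock_rq *rq, struct mock_task *next)
{
	rq->curr = next;
}
```

In this model a remote sample taken between mock_block_curr() plus
mock_unlock() and the later mock_switch_to() returns 0, which matches the
window described in the commit log.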

>
> We do not need to perform active balancing when src_rq->curr is a CFS
> task but on_rq is 0, as the other CFS tasks have already been checked just
> before. For cases where src_rq->curr is a non-CFS task, we retain the
> affinity check for dst_rq to trigger active balancing, because such a task
> is likely to wake up, or be woken by, a CFS task on src_rq that has similar
> affinity characteristics and could be migrated.
>
> Additionally, in sched_balance_rq(), we unconditionally reset the
> balance_interval to min_interval. The difference is that original logic
> does not reset the balance_interval when dst_cpu softirq handler is
> preempted while src_cpu successfully run the just-dispatched active
> balancing, during the gaps between two need_active_balance() checks. It
> seems that we haven't observed any substantial benefits from reducing the
> opportunities for balance under such fluctuating conditions. So simplify

This is not that clear to me. Could you please provide more data and an
example with more details?
Maybe this could be split out into a separate patch?

> the need_active_balance() checks logic.
>
> Signed-off-by: Xin Zhao <jackzxcui1989@xxxxxxx>
> ---
>
> Change in v2:
> - Add reason in the commit log why we can see zero rq->curr->on_rq when we
> hold rq lock,
> as suggested by Valentin Schneider.
>
> v1:
> - Link to v1: https://lore.kernel.org/all/20260603125938.1938115-1-jackzxcui1989@xxxxxxx/
> ---
> kernel/sched/fair.c | 10 +++++-----
> 1 file changed, 5 insertions(+), 5 deletions(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index b5819c489..cba6dc6da 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -13423,7 +13423,9 @@ static int sched_balance_rq(int this_cpu, struct rq *this_rq,
> * if the curr task on busiest CPU can't be
> * moved to this_cpu:
> */
> - if (!cpumask_test_cpu(this_cpu, busiest->curr->cpus_ptr)) {
> + if (!cpumask_test_cpu(this_cpu, busiest->curr->cpus_ptr) ||
> + (busiest->curr->sched_class == &fair_sched_class &&
> + !busiest->curr->on_rq)) {

If busiest->curr->sched_class != &fair_sched_class, do we really need to
do active balance here?

Also, maybe use unlikely(!busiest->curr->on_rq) instead.
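
A rough user-space sketch of what I mean is below. struct demo_curr and
skip_active_balance() are hypothetical and only mirror the shape of the
quoted condition. The unlikely() macro is defined here the same way the
kernel defines it, on top of the GCC/Clang __builtin_expect() builtin. It
is only a branch hint and does not change the result.

```c
#include <stdbool.h>

/* Same shape as the kernel macro: a branch-prediction hint only. */
#define unlikely(x)	__builtin_expect(!!(x), 0)

/* Hypothetical stand-in for the fields read from busiest->curr. */
struct demo_curr {
	bool allowed_on_dst;	/* models cpumask_test_cpu(this_cpu, curr->cpus_ptr) */
	bool is_fair;		/* models curr->sched_class == &fair_sched_class */
	int on_rq;		/* models curr->on_rq */
};

/* Returns true when active balance should be skipped (goto out_one_pinned). */
static bool skip_active_balance(const struct demo_curr *curr)
{
	if (!curr->allowed_on_dst)
		return true;
	/* Hinted as the rare case: fair task already dequeued. */
	if (curr->is_fair && unlikely(!curr->on_rq))
		return true;
	return false;
}
```

Whether the hint is appropriate depends on how often on_rq is really 0 on
this path, which is why the numbers asked for above matter.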

> raw_spin_rq_unlock_irqrestore(busiest, flags);
> goto out_one_pinned;
> }
> @@ -13455,10 +13457,8 @@ static int sched_balance_rq(int this_cpu, struct rq *this_rq,
> sd->nr_balance_failed = 0;
> }
>
> - if (likely(!active_balance) || need_active_balance(&env)) {

If active balance has already been triggered, why do we still need to
reset the balancing interval?

> - /* We were unbalanced, so reset the balancing interval */
> - sd->balance_interval = sd->min_interval;
> - }
> + /* We were unbalanced, so reset the balancing interval */
> + sd->balance_interval = sd->min_interval;
>
> goto out;
>
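
To make the question about the interval reset concrete, here is a small
sketch comparing the removed condition with the new unconditional reset.
The function names are made up for illustration, and the two bool
parameters stand in for active_balance and the result of
need_active_balance(&env).

```c
#include <stdbool.h>

/* Removed logic: reset unless active balance was kicked and is no longer needed. */
static bool old_resets_interval(bool active_balance, bool still_needed)
{
	return !active_balance || still_needed;
}

/* Logic after the quoted patch: the interval is always reset. */
static bool new_resets_interval(bool active_balance, bool still_needed)
{
	(void)active_balance;
	(void)still_needed;
	return true;
}

/* True for the input combinations where the two versions disagree. */
static bool interval_reset_differs(bool active_balance, bool still_needed)
{
	return old_resets_interval(active_balance, still_needed) !=
	       new_resets_interval(active_balance, still_needed);
}
```

The two versions only disagree when active balance was triggered and
need_active_balance() no longer holds, so data for exactly that case would
help justify the change.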

--
Thx and BRs,
Aiqun(Maria) Yu