Re: [PATCH] sched/fair: Don't trigger active lb if src_rq->curr is CFS and not on_rq

From: Valentin Schneider

Date: Fri Jun 12 2026 - 10:50:17 EST


On 03/06/26 20:59, Xin Zhao wrote:
> Active balancing relies on the migration threads, which interrupt the
> task running on src_rq, so it has some impact on overall performance.
> Active balancing often fails, which is why there is a check to determine
> whether the current task (call it 'curr') on src_rq can run on dst_rq.
> We have observed that even with that check, if curr is a CFS task and
> its on_rq is 0, the failure rate of active balancing is very high. Below
> is test data from a fillback task scenario executed on a platform with
> 18 CPUs over 300 seconds:
>
> total: the total count of cases that match
> cpumask_test_cpu(this_cpu, busiest->curr->cpus_ptr) &&
> busiest->curr->sched_class == &fair_sched_class &&
> !busiest->curr->on_rq
> succ: the active balance success cases that match
> cpumask_......->on_rq
> fail: the active balance fail cases that match
> cpumask_......->on_rq
>
> cpu   domain  mask   total  succ  fail
> cpu0  domain0 00003      0     0     0
> cpu0  domain1 3ffff     32     0    32
> cpu1  domain0 00003      0     0     0
> cpu1  domain1 3ffff     40     0    40
> cpu2  domain0 0003c      3     0     3
> cpu2  domain1 3ffff      6     0     6
> cpu3  domain0 0003c      3     1     2
> cpu3  domain1 3ffff      3     0     3
> cpu4  domain0 0003c      3     0     3
> cpu4  domain1 3ffff      4     0     4
> cpu5  domain0 0003c      1     0     1
> cpu5  domain1 3ffff      6     0     6
> cpu6  domain0 003c0     39     0    39
> cpu6  domain1 3ffff     36     0    36
> cpu7  domain0 003c0    213     4   209
> cpu7  domain1 3ffff     24     2    22
> cpu8  domain0 003c0    242    16   226
> cpu8  domain1 3ffff     16     0    16
> cpu9  domain0 003c0      0     0     0
> cpu9  domain1 3ffff      6     1     5
> cpu10 domain0 03c00     58     1    57
> cpu10 domain1 3ffff      0     0     0
> cpu11 domain0 03c00     54     4    50
> cpu11 domain1 3ffff      1     0     1
> cpu12 domain0 03c00     66     1    65
> cpu12 domain1 3ffff      0     0     0
> cpu13 domain0 03c00     66     1    65
> cpu13 domain1 3ffff      0     0     0
> cpu14 domain0 3c000      0     0     0
> cpu14 domain1 3ffff     57     5    52
> cpu15 domain0 3c000     15     0    15
> cpu15 domain1 3ffff     35     0    35
> cpu16 domain0 3c000    148     3   145
> cpu16 domain1 3ffff    109     1   108
> cpu17 domain0 3c000    182     2   180
> cpu17 domain1 3ffff     78     1    77
>
> Add this situation to the cases where active balancing is not performed.
> When src_rq->curr is a non-CFS task, we keep the existing affinity check
> against dst_rq, because such a task is likely to wake up, or be woken by,
> a CFS task on src_rq that has similar affinity characteristics and is a
> candidate for migration. We also tried calling list_move_tail() on the
> curr CFS task on src_rq before unlocking src_rq, but testing showed no
> improvement in the number of cfs_tasks entries traversed in
> detach_one_task().
>
> Additionally, sched_balance_rq() now unconditionally resets
> balance_interval to min_interval. The difference is that the original
> logic does not reset balance_interval when the dst_cpu softirq handler is
> preempted, between its two need_active_balance() checks, while src_cpu
> successfully runs the just-dispatched active balance. We have not
> observed any substantial benefit from reducing the opportunities to
> balance under such fluctuating conditions.
>
> Signed-off-by: Xin Zhao <jackzxcui1989@xxxxxxx>
> ---
> kernel/sched/fair.c | 10 +++++-----
> 1 file changed, 5 insertions(+), 5 deletions(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index b5819c489..cba6dc6da 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -13423,7 +13423,9 @@ static int sched_balance_rq(int this_cpu, struct rq *this_rq,
> * if the curr task on busiest CPU can't be
> * moved to this_cpu:
> */
> - if (!cpumask_test_cpu(this_cpu, busiest->curr->cpus_ptr)) {
> + if (!cpumask_test_cpu(this_cpu, busiest->curr->cpus_ptr) ||
> + (busiest->curr->sched_class == &fair_sched_class &&
> + !busiest->curr->on_rq)) {
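
As a rough illustration of what the hunk above changes, here is a minimal
userspace sketch of the old and new conditions. The toy_* types and names
are hypothetical stand-ins, not kernel code: the affinity mask is modelled
as a plain 32-bit bitmask and the scheduling class as an enum. A true
result means active balancing is not attempted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the fields the condition reads. */
enum toy_class { TOY_FAIR, TOY_OTHER };

struct toy_curr {
	uint32_t cpus_mask;		/* bit n set: task may run on CPU n */
	enum toy_class sched_class;
	int on_rq;			/* 0 once the task has been dequeued */
};

/* Pre-patch: true only if curr cannot run on this_cpu. */
static bool toy_old_cond(int this_cpu, const struct toy_curr *curr)
{
	return !(curr->cpus_mask & (UINT32_C(1) << this_cpu));
}

/* Post-patch: also true if curr is a fair task with on_rq == 0. */
static bool toy_new_cond(int this_cpu, const struct toy_curr *curr)
{
	return toy_old_cond(this_cpu, curr) ||
	       (curr->sched_class == TOY_FAIR && !curr->on_rq);
}

/* Usage: the two conditions only differ for a dequeued fair curr. */
static void toy_cond_example(void)
{
	struct toy_curr fair_blocked  = { 0x3ffff, TOY_FAIR, 0 };
	struct toy_curr fair_queued   = { 0x3ffff, TOY_FAIR, 1 };
	struct toy_curr other_blocked = { 0x3ffff, TOY_OTHER, 0 };

	assert(!toy_old_cond(3, &fair_blocked) && toy_new_cond(3, &fair_blocked));
	assert(!toy_old_cond(3, &fair_queued) && !toy_new_cond(3, &fair_queued));
	assert(!toy_old_cond(3, &other_blocked) && !toy_new_cond(3, &other_blocked));
}
```

So the only new case is a fair curr that is allowed on this_cpu but has
on_rq == 0.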

AFAICT the standard pattern for a blocking task is:

  __schedule()
    rq_lock(rq)
    try_to_block_task(rq, prev)
      prev->on_rq=0;

    rq->curr = next;

    context_switch()
      raw_spin_rq_unlock_irq(rq);

proxy & blocked-on tasks shouldn't get near p->on_rq.

I'm struggling to see how load balance can grab a rq lock and observe
rq->curr->on_rq=0. Obviously I'm missing something but it's Friday...
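
To make the sequence above concrete, here is a toy userspace model of it,
under the assumption that the whole blocking path runs inside a single
rq-lock critical section. Everything named toy_* is made up for
illustration; the lock is a plain flag and nothing runs concurrently, so
this only restates the argument and says nothing about paths not listed
above.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-ins for the kernel structures; only the fields discussed here. */
struct toy_task {
	int on_rq;		/* 1 while queued, 0 once blocked */
};

struct toy_rq {
	bool locked;		/* models the rq lock */
	struct toy_task *curr;
};

/* Models a load balancer that samples curr->on_rq under the rq lock. */
static int toy_observe_curr_on_rq(struct toy_rq *rq)
{
	int seen;

	assert(!rq->locked);	/* a real observer would wait for the holder */
	rq->locked = true;
	seen = rq->curr->on_rq;
	rq->locked = false;
	return seen;
}

/* Models the blocking path: everything happens inside one critical section. */
static void toy_schedule_block(struct toy_rq *rq, struct toy_task *next)
{
	struct toy_task *prev = rq->curr;

	assert(!rq->locked);
	rq->locked = true;	/* rq_lock(rq) */
	prev->on_rq = 0;	/* try_to_block_task(rq, prev) */
	rq->curr = next;	/* rq->curr = next */
	rq->locked = false;	/* unlock at the end of context_switch() */
}

/* Returns 1 if no lock-holding observer saw curr->on_rq == 0 around a block. */
static int toy_demo(void)
{
	struct toy_task a = { .on_rq = 1 }, b = { .on_rq = 1 };
	struct toy_rq rq = { .locked = false, .curr = &a };
	int before, after;

	before = toy_observe_curr_on_rq(&rq);
	toy_schedule_block(&rq, &b);
	after = toy_observe_curr_on_rq(&rq);

	return before == 1 && after == 1 && a.on_rq == 0;
}
```

In this model prev->on_rq drops to 0 and rq->curr moves on before the lock
is released, so an observer that takes the lock only ever sees a curr with
on_rq == 1.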