Re: [PATCH v5 1/2] sched/fair: Don't trigger active lb if src_rq->curr is not on_rq

From: Valentin Schneider

Date: Wed Jun 17 2026 - 05:34:52 EST


On 17/06/26 15:21, Xin Zhao wrote:
> Active balancing needs help from the migration threads, which interrupt
> the task running on src_rq. This has some impact on overall performance.
> Active balancing often fails, even though there is a check that the
> current task (call it 'curr') on src_rq can run on dst_rq. We have
> observed that, even with that check, if curr is a CFS task and its on_rq
> is 0, the failure rate of active balancing is very high. Below is test
> data from a certain fillback task scenario executed on a platform with
> 18 CPUs over 300 seconds:
>
> fair: busiest->curr->sched_class == &fair_sched_class
> on_rq: busiest->curr->on_rq
> total: number of active balances triggered for the corresponding type
> fail: active balances that failed to migrate a task in
>       active_load_balance_cpu_stop()
>
>                fair && !on_rq   !fair && !on_rq
>        domain    total   fail      total   fail
> cpu0   0x00003       0      0          0      0
> cpu0   0x3ffff      33     33          1      1
> cpu1   0x00003       0      0          0      0
> cpu1   0x3ffff      42     42          0      0
> cpu2   0x0003c       4      4          0      0
> cpu2   0x3ffff      12     12          0      0
> cpu3   0x0003c       3      3          0      0
> cpu3   0x3ffff       8      7          0      0
> cpu4   0x0003c       2      2          0      0
> cpu4   0x3ffff       5      4          0      0
> cpu5   0x0003c       4      4          0      0
> cpu5   0x3ffff       8      8          0      0
> cpu6   0x003c0      60     60          0      0
> cpu6   0x3ffff      28     27          0      0
> cpu7   0x003c0     194    184          0      0
> cpu7   0x3ffff      35     35          1      1
> cpu8   0x003c0     240    228          0      0
> cpu8   0x3ffff      28     28          0      0
> cpu9   0x003c0       0      0          0      0
> cpu9   0x3ffff      10     10          0      0
> cpu10  0x03c00      52     50          0      0
> cpu10  0x3ffff       0      0          0      0
> cpu11  0x03c00      70     68          0      0
> cpu11  0x3ffff       1      1          0      0
> cpu12  0x03c00      73     72          0      0
> cpu12  0x3ffff       0      0          0      0
> cpu13  0x03c00      79     76          0      0
> cpu13  0x3ffff       0      0          0      0
> cpu14  0x3c000       0      0          0      0
> cpu14  0x3ffff      57     55          1      0
> cpu15  0x3c000      53     52          1      0
> cpu15  0x3ffff      30     29          0      0
> cpu16  0x3c000     344    341         10      6
> cpu16  0x3ffff     103    100          2      1
> cpu17  0x3c000     183    179          2      2
> cpu17  0x3ffff      78     77          0      0
> sum               1839   1791         18     11
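
For reference, the sums above work out to a failure rate of roughly
97.4% (1791/1839) for fair && !on_rq and 61.1% (11/18) for
!fair && !on_rq. A minimal sketch of that arithmetic; fail_pct_x100() is
a hypothetical helper made up for illustration, not kernel code:

```c
/*
 * Hypothetical helper (not kernel code): failure percentage scaled by
 * 100 and rounded to nearest, using integer math only.
 * Assumes fail * 10000 fits in an unsigned int.
 */
unsigned int fail_pct_x100(unsigned int fail, unsigned int total)
{
	if (total == 0)
		return 0;	/* nothing triggered, nothing failed */

	return (fail * 10000u + total / 2) / total;
}
```

With the sums from the table, fail_pct_x100(1791, 1839) gives 9739
(97.39%) and fail_pct_x100(11, 18) gives 6111 (61.11%).
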
>
> In __schedule(), before curr is set to next, pick_next_task() calls
> sched_balance_rq(), which unlocks and then re-locks the rq. This creates
> "holes" during which other CPUs may see rq->curr->on_rq as 0:
> try_to_block_task() has already set curr->on_rq to 0, but rq->curr has
> not yet been assigned to next when the rq lock is dropped in
> pick_next_task().
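
The ordering described here can be modelled in a few lines of
single-threaded C. This is only a toy sketch: struct toy_task,
struct toy_rq and toy_schedule() are made-up names, the lock is a plain
flag, and the "remote CPU" is simply a read done while the flag is
clear.

```c
#include <stdbool.h>

/* Toy stand-ins for task_struct and rq; names are hypothetical. */
struct toy_task {
	int on_rq;
};

struct toy_rq {
	struct toy_task *curr;
	bool locked;
};

/*
 * Models the sequence from the commit message and returns the
 * curr->on_rq value a remote CPU would read inside the lock "hole".
 */
int toy_schedule(struct toy_rq *rq, struct toy_task *next)
{
	int seen_in_hole;

	rq->locked = true;
	rq->curr->on_rq = 0;		/* models try_to_block_task() */

	rq->locked = false;		/* models sched_balance_rq() unlocking */
	seen_in_hole = rq->curr->on_rq;	/* remote CPU looks at rq->curr here */
	rq->locked = true;		/* models sched_balance_rq() re-locking */

	rq->curr = next;		/* only now does curr become next */
	rq->locked = false;

	return seen_in_hole;
}
```

Starting with a runnable prev (on_rq = 1) as rq->curr and a runnable
next, toy_schedule() returns 0, and rq->curr->on_rq reads 1 again once
it has returned.
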
>
> We do not need to perform active balancing when src_rq->curr is a CFS
> task whose on_rq is 0, as the other CFS tasks have probably been checked
> just before. For cases where src_rq->curr is a non-CFS task, we retain
> the affinity check against dst_rq to trigger active balancing, because
> such a task is likely to wake up, or be woken by, a CFS task on src_rq
> that has similar affinity characteristics and can be migrated. Also, the
> rq lock is released after detach_tasks() runs, so tasks woken on the rq
> during detach_tasks() may preempt the previous CFS task. In my tests
> (not shown above), the success rate of active balancing under the
> !fair && on_rq condition is 98.4%. That scenario does not require stop
> work, but would need another path to detach and attach task(s). Adding
> it does not seem worthwhile; Valentin and Vincent have already discussed
> this, see [1].
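
As I read it, the resulting decision can be sketched roughly as below.
This is a simplified userspace model, not the patch itself: the toy_*
names are made up, the affinity mask is a plain unsigned long rather
than a cpumask, and everything else that feeds into the active balance
decision is left out.

```c
#include <limits.h>
#include <stdbool.h>

/* Toy stand-ins; these are hypothetical and not the kernel's types. */
enum toy_class { TOY_FAIR, TOY_NOT_FAIR };

struct toy_task {
	enum toy_class sched_class;
	int on_rq;
	unsigned long cpus_mask;	/* bit n set: task may run on CPU n */
};

/* Should active balance be kicked for src_rq->curr towards dst_cpu? */
bool toy_want_active_balance(const struct toy_task *curr, int dst_cpu)
{
	/* Guard the shift; the toy mask only covers one unsigned long. */
	if (dst_cpu < 0 || dst_cpu >= (int)(sizeof(unsigned long) * CHAR_BIT))
		return false;

	/* CFS curr that is already off the rq: skip active balance. */
	if (curr->sched_class == TOY_FAIR && !curr->on_rq)
		return false;

	/* Otherwise keep the existing check that curr can run on dst_cpu. */
	return (curr->cpus_mask >> dst_cpu) & 1UL;
}
```

For a non-CFS curr with on_rq == 0 and mask 0x3ffff this still returns
true for any of the 18 CPUs, while a CFS curr with on_rq == 0 is always
rejected.
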
>
> Additionally, the sched_class field is a bit far from on_cpu in
> task_struct. The preceding traversal of cfs_tasks already checks on_cpu
> in can_migrate_task(), so the additional on_rq check will not cost many
> CPU cycles, thanks to cache locality.
>
> Two reasons for not checking sched_class and on_rq of busiest->curr
> together with the cpumask_test_cpu() check:
> 1. So that the patch does not introduce new cases that skip the logic
>    for resetting balance_interval to min_interval.
> 2. The check of whether active balance has just been triggered on the
>    busiest CPU filters out a few more cases than the sched_class and
>    on_rq check.
>
> [1]: https://lore.kernel.org/lkml/20190815145107.5318-5-valentin.schneider@xxxxxxx/
>
> Signed-off-by: Xin Zhao <jackzxcui1989@xxxxxxx>

Reviewed-by: Valentin Schneider <vschneid@xxxxxxxxxx>