Re: [PATCH v4] sched/fair: Skip sched_balance_running cmpxchg when balance is not due
From: Peter Zijlstra
Date: Wed Nov 12 2025 - 08:44:47 EST
On Wed, Nov 12, 2025 at 02:39:37PM +0100, Peter Zijlstra wrote:
> On Wed, Nov 12, 2025 at 04:55:48PM +0530, Srikar Dronamraju wrote:
>
> > If the CPU doing the balance was not the first CPU of the domain span
> > (it was balancing only because the first CPU was busy), and the first
> > CPU happens to be idle by the time we redo, the scheduler would have
> > chosen the first CPU to do the balance. However, it will now choose
> > the CPU that already holds the atomic..
> >
> > I think this is better because:
> > - The first CPU may have tried (and failed) just before this CPU dropped
> > the atomic, so we could miss a balance opportunity.
> > - The first CPU and this CPU may not share a cache, so re-acquiring the
> > atomic there would incur a cache miss, which keeping it here avoids.
>
> I'm not sure I understand what you're arguing for. Are you saying it
> would be better to retain the lock where possible?
FWIW, this was also the behaviour before this patch: the lock was taken
in the caller, so it covered the whole redo case as well.
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -11717,23 +11717,22 @@ static int sched_balance_rq(int this_cpu
> .fbq_type = all,
> .tasks = LIST_HEAD_INIT(env.tasks),
> };
> - bool need_unlock;
> + bool need_unlock = false;
>
> cpumask_and(cpus, sched_domain_span(sd), cpu_active_mask);
>
> schedstat_inc(sd->lb_count[idle]);
>
> redo:
> - need_unlock = false;
> if (!should_we_balance(&env)) {
> *continue_balancing = 0;
> goto out_balanced;
> }
>
> - if (sd->flags & SD_SERIALIZE) {
> - if (atomic_cmpxchg_acquire(&sched_balance_running, 0, 1)) {
> + if (!need_unlock && sd->flags & SD_SERIALIZE) {
> + if (!atomic_try_cmpxchg_acquire(&sched_balance_running, 0, 1))
> goto out_balanced;
> - }
> +
> need_unlock = true;
> }
>
> @@ -11861,9 +11860,6 @@ static int sched_balance_rq(int this_cpu
> if (!cpumask_subset(cpus, env.dst_grpmask)) {
> env.loop = 0;
> env.loop_break = SCHED_NR_MIGRATE_BREAK;
> - if (need_unlock)
> - atomic_set_release(&sched_balance_running, 0);
> -
> goto redo;
> }
> goto out_all_pinned;
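The effect of the hunks above can be modelled in a few lines: because
need_unlock is initialized once before the redo label (rather than reset
at each redo) and the release before "goto redo" is gone, a redo retains
the serialization slot instead of dropping and re-acquiring it. A
self-contained C11 sketch of that control flow (helper names and the
redo counter are hypothetical; the real code redoes on pinned tasks):

```c
/* Post-patch flow: acquire the serialization atomic at most once per
 * invocation and hold it across any number of redo iterations. */
#include <stdatomic.h>
#include <stdbool.h>

static atomic_int balance_running = 0;
static int acquire_count = 0;	/* counts 0 -> 1 transitions */

static bool serialized_balance(int redos)
{
	bool need_unlock = false;	/* set once, survives redo */

redo:
	if (!need_unlock /* && (sd->flags & SD_SERIALIZE) elided */) {
		int zero = 0;
		if (!atomic_compare_exchange_strong_explicit(&balance_running,
				&zero, 1,
				memory_order_acquire, memory_order_relaxed))
			return false;	/* someone else is balancing */
		need_unlock = true;
		acquire_count++;
	}

	/* ... load-balancing work elided ... */
	if (redos-- > 0)
		goto redo;	/* atomic retained: no release/reacquire */

	if (need_unlock)
		atomic_store_explicit(&balance_running, 0,
				      memory_order_release);
	return true;
}
```

Three redos still cost only a single acquire, which is the cache-line
bounce the thread is arguing about avoiding.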