Re: [PATCH] sched: Skip useless sched_balance_running acquisition if load balance is not due

From: Mel Gorman
Date: Mon Oct 27 2025 - 14:09:01 EST


On Fri, Jun 06, 2025 at 03:51:34PM +0200, Vincent Guittot wrote:
> On Wed, 16 Apr 2025 at 05:51, Tim Chen <tim.c.chen@xxxxxxxxxxxxxxx> wrote:
> >
> > At load balance time, balance of last level cache domains and
> > above needs to be serialized. The scheduler checks the atomic var
> > sched_balance_running first and then see if time is due for a load
> > balance. This is an expensive operation as multiple CPUs can attempt
> > sched_balance_running acquisition at the same time.
> >
> > On a 2 socket Granite Rapid systems enabling sub-numa cluster and
> > running OLTP workloads, 7.6% of cpu cycles are spent on cmpxchg of
> > sched_balance_running. Most of the time, a balance attempt is aborted
> > immediately after acquiring sched_balance_running as load balance time
> > is not due.
> >
> > Instead, check balance due time first before acquiring
> > sched_balance_running. This skips many useless acquisitions
> > of sched_balance_running and knocks the 7.6% CPU overhead on
> > sched_balance_domain() down to 0.05%. Throughput of the OLTP workload
> > improved by 11%.
> >
> > Signed-off-by: Tim Chen <tim.c.chen@xxxxxxxxxxxxxxx>
> > Reported-by: Mohini Narkhede <mohini.narkhede@xxxxxxxxx>
> > Tested-by: Mohini Narkhede <mohini.narkhede@xxxxxxxxx>
>
> Reviewed-by: Vincent Guittot <vincent.guittot@xxxxxxxxxx>
>

Reviewed-by: Mel Gorman <mgorman@xxxxxxxxxxxxxxxxxxx>

I've been away for a while and am still on a reduced workload, so I'm
only looking at this patch now. It was never merged, but why? It looks
like a no-brainer to avoid an atomic operation with minimal effort, even
if it only applies to balancing across NUMA domains.
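
For anyone skimming the thread, the reordering can be sketched in user
space roughly as below. This is not the kernel code: balance_running,
cmpxchg_attempts and both balance_*() helpers are hypothetical names
standing in for sched_balance_running and the due-time check, and the
counter exists only to make the difference in atomic traffic visible.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the kernel's sched_balance_running flag. */
static atomic_int balance_running;

/* Illustration only: counts how many cmpxchg attempts were issued. */
static unsigned long cmpxchg_attempts;

static bool try_acquire(void)
{
	int expected = 0;

	cmpxchg_attempts++;
	return atomic_compare_exchange_strong_explicit(&balance_running,
			&expected, 1,
			memory_order_acquire, memory_order_relaxed);
}

static void release(void)
{
	/* Only the holder may release the flag. */
	assert(atomic_load(&balance_running) == 1);
	atomic_store_explicit(&balance_running, 0, memory_order_release);
}

/* Wraparound-safe "now is at or past next_balance" comparison. */
static bool balance_due(unsigned long now, unsigned long next_balance)
{
	return (long)(now - next_balance) >= 0;
}

/* Old ordering: pay for the cmpxchg, then find out nothing is due. */
static bool balance_acquire_first(unsigned long now, unsigned long next_balance)
{
	bool balanced = false;

	if (!try_acquire())
		return false;
	if (balance_due(now, next_balance))
		balanced = true;	/* load balance would run here */
	release();
	return balanced;
}

/* New ordering: cheap time check first, cmpxchg only when due. */
static bool balance_check_first(unsigned long now, unsigned long next_balance)
{
	if (!balance_due(now, next_balance))
		return false;
	if (!try_acquire())
		return false;
	/* load balance would run here */
	release();
	return true;
}
```

With a balance that is not yet due, balance_acquire_first() still issues
one cmpxchg while balance_check_first() issues none, which is the whole
point of the patch as I read it.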

Performance looks better for a small number of workloads on multi-socket
machines, including some Zen variants. Most results were neutral, which is
not very surprising given the path affected. I made no effort to determine
how hot this particular path is for any of the tested workloads, but nothing
obviously superseded this patch or made it irrelevant.

--
Mel Gorman
SUSE Labs