Re: [PATCH v4] sched/fair: Skip sched_balance_running cmpxchg when balance is not due
From: Shrikanth Hegde
Date: Tue Nov 11 2025 - 01:26:00 EST
Hi Tim,
On 11/11/25 12:17 AM, Tim Chen wrote:
The NUMA sched domain sets the SD_SERIALIZE flag by default, allowing
only one NUMA load balancing operation to run system-wide at a time.
Currently, each sched group leader directly under NUMA domain attempts
to acquire the global sched_balance_running flag via cmpxchg() before
checking whether load balancing is due or whether it is the designated
load balancer for that NUMA domain. On systems with a large number
of cores, this causes significant cache contention on the shared
sched_balance_running flag.
This patch reduces unnecessary cmpxchg() operations by first checking
that the balancer is the designated leader for a NUMA domain from
should_we_balance(), and the balance interval has expired before
trying to acquire sched_balance_running to load balance a NUMA
domain.
On a 2-socket Granite Rapids system with sub-NUMA clustering enabled,
running an OLTP workload, 7.8% of total CPU cycles were previously spent
in sched_balance_domain() contending on sched_balance_running before
this change.
Looks good to me. Thanks for getting this into current shape.
I see hackbench improving slightly across its variations. So,
Tested-by: Shrikanth Hegde <sshegde@xxxxxxxxxxxxx>
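For anyone following the thread, the ordering described in the changelog can be illustrated with a small user-space sketch in C11. This is not the patch itself: struct domain_sketch, balance_domain_sketch() and the cmpxchg_attempts counter are hypothetical names made up for illustration, and the time comparison ignores the jiffies wraparound handling the kernel uses. It only shows the idea that the cheap checks (designated leader, interval expired) run before the shared flag is touched.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the global sched_balance_running flag. */
static atomic_int balance_running;

/* Counts cmpxchg attempts; plain counter, for single-threaded demo only. */
static unsigned long cmpxchg_attempts;

/* Hypothetical per-domain state, not the kernel's struct sched_domain. */
struct domain_sketch {
	bool serialize;         /* models SD_SERIALIZE being set */
	bool is_leader;         /* models the should_we_balance() result */
	unsigned long next_due; /* models last_balance + interval */
};

/* Try to take the global flag; every call costs one cmpxchg. */
static bool try_lock_balance(void)
{
	int expected = 0;

	cmpxchg_attempts++;
	return atomic_compare_exchange_strong_explicit(&balance_running,
						       &expected, 1,
						       memory_order_acquire,
						       memory_order_relaxed);
}

static void unlock_balance(void)
{
	atomic_store_explicit(&balance_running, 0, memory_order_release);
}

/* Returns true if a (simulated) balance pass ran for this domain. */
static bool balance_domain_sketch(struct domain_sketch *sd,
				  unsigned long now, unsigned long interval)
{
	/* Cheap, local checks first: no shared cache line is written. */
	if (!sd->is_leader)
		return false;
	if (now < sd->next_due) /* assumption: no wraparound handling */
		return false;

	/* Only now contend on the shared flag. */
	if (sd->serialize && !try_lock_balance())
		return false;

	/* The actual load balancing would happen here. */
	sd->next_due = now + interval;

	if (sd->serialize)
		unlock_balance();
	return true;
}
```

In this sketch, a caller that is not the leader or is not yet due returns without incrementing cmpxchg_attempts, which is the contention the changelog says is being avoided.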