Re: [RFC PATCH] sched/fair: scale wake_wide() threshold by SMT width
From: Shrikanth Hegde
Date: Tue Apr 07 2026 - 13:59:24 EST
Hi.
On 4/7/26 12:09 PM, Zhang Qiao wrote:
wake_wide() uses sd_llc_size as the spreading threshold to detect wide
waker/wakee relationships and to disable wake_affine() for those cases.
On SMT systems, sd_llc_size counts logical CPUs rather than physical
cores. This inflates the wake_wide() threshold, allowing wake_affine()
to pack more tasks into one LLC domain than the actual compute capacity
of its physical cores can sustain. The resulting SMT interference may
cost more than the cache-locality benefit wake_affine() intends to gain.
Isn't load balancing supposed to move tasks out in that case? What does the workload do?
Scale the factor by the SMT width of the current CPU so that it
approximates the number of independent physical cores in the LLC domain,
making wake_wide() more likely to kick in before SMT interference
becomes significant. On non-SMT systems the SMT width is 1 and behaviour
is unchanged.
There are systems where LLC_SIZE == SMT_SIZE, i.e. there is only one core
in the LLC. On such systems the scaled factor becomes 1, which would
effectively disable the wake_affine feature. Power10 is a major example.
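
To make the concern concrete, here is a rough user-space model of the
check. It is only a sketch, not kernel code: scaled_factor() and
model_wake_wide() are hypothetical names, the flip counts are passed in
as plain arguments instead of being read from task_struct, and the
return values assume the usual "return 0 / return 1" tail of
wake_wide().

```c
#include <assert.h>
#include <stdbool.h>

/* User-space stand-in for the kernel's DIV_ROUND_UP() macro. */
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/*
 * Hypothetical helper: the factor the patch would compute from the
 * LLC size (in logical CPUs) and the SMT width of the current core.
 */
static unsigned int scaled_factor(unsigned int llc_size,
				  unsigned int smt_width)
{
	assert(smt_width > 0);
	return DIV_ROUND_UP(llc_size, smt_width);
}

/*
 * Model of the wake_wide() test. Returns true when the wakeup is
 * treated as "wide", i.e. when wake_affine() would be skipped.
 */
static bool model_wake_wide(unsigned int master, unsigned int slave,
			    unsigned int factor)
{
	if (master < slave) {
		unsigned int tmp = master;

		master = slave;
		slave = tmp;
	}
	if (slave < factor || master < slave * factor)
		return false;	/* affine wakeup still considered */
	return true;		/* wide: affine wakeup skipped */
}
```

With illustrative numbers, scaled_factor(8, 8) is 1, and then
model_wake_wide(1, 1, 1) is already true: after the swap master >= slave
always holds, so any wakeup where both flip counts are non-zero is
treated as wide. With the unscaled factor of 8, model_wake_wide(1, 1, 8)
is false and the affine path is still considered.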
Signed-off-by: Zhang Qiao <zhangqiao22@xxxxxxxxxx>
---
kernel/sched/fair.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index f07df8987a5ef..4896582c6e904 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -7334,6 +7334,11 @@ static int wake_wide(struct task_struct *p)
 	unsigned int slave = p->wakee_flips;
 	int factor = __this_cpu_read(sd_llc_size);
+	/* Scale factor to physical-core count to account for SMT interference. */
+	if (sched_smt_active())
+		factor = DIV_ROUND_UP(factor,
+				cpumask_weight(cpu_smt_mask(smp_processor_id())));
+
 	if (master < slave)
 		swap(master, slave);
 	if (slave < factor || master < slave * factor)