Re: [PATCH 0/4] sched/fair: SMT-aware asymmetric CPU capacity
From: Shrikanth Hegde
Date: Fri Mar 27 2026 - 12:41:29 EST
Hi Andrea.
On 3/26/26 8:32 PM, Andrea Righi wrote:
This series attempts to improve SD_ASYM_CPUCAPACITY scheduling by
introducing SMT awareness.
= Problem =
Nominal per-logical-CPU capacity can overstate usable compute when an SMT
sibling is busy, because the physical core doesn't deliver its full nominal
capacity. So, several SD_ASYM_CPUCAPACITY paths may pick high capacity CPUs
that are not actually good destinations.
How does the energy model define the OPPs for SMT?
SMT systems have multiple different functional blocks: a few ALUs (arithmetic),
LSUs (load/store units), etc. If the same or a similar workload runs on the sibling,
it would affect performance; but if the sibling is using different functional blocks,
it would not.
So the actual underlying CPU capacity of each thread depends on what each sibling is running.
I don't understand how the firmware/energy model defines this.
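A toy model of the point above, purely for illustration (this is not how any firmware or energy model computes capacity): each thread is described by a bitmask of the functional-unit classes it keeps busy. The unit classes (FU_ALU, FU_LSU, FU_FPU), the function thread_capacity(), and the rule "a unit class shared with the sibling delivers half its throughput" are all hypothetical assumptions. The only point is that the same nominal capacity yields different effective capacity depending on what the sibling runs.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical functional-unit classes an SMT thread may keep busy. */
#define FU_ALU (1u << 0)
#define FU_LSU (1u << 1)
#define FU_FPU (1u << 2)

/* Portable population count (number of set bits). */
static unsigned int count_units(uint32_t mask)
{
	unsigned int n = 0;

	while (mask) {
		mask &= mask - 1; /* clear the lowest set bit */
		n++;
	}
	return n;
}

/*
 * Toy estimate of the capacity one SMT thread actually gets.
 * ASSUMPTION (illustration only): each unit class used by both siblings
 * delivers half its throughput to this thread; unit classes used only by
 * this thread deliver full throughput.
 */
unsigned int thread_capacity(unsigned int nominal, uint32_t self_units,
			     uint32_t sibling_units)
{
	unsigned int used = count_units(self_units);
	unsigned int shared = count_units(self_units & sibling_units);

	assert(nominal <= 1024);
	if (used == 0)
		return nominal; /* thread uses no units: no contention to model */

	/* nominal * (used - shared / 2) / used, in integer arithmetic */
	return nominal * (2 * used - shared) / (2 * used);
}
```

With a nominal capacity of 1024, an ALU-bound thread keeps 1024 next to an LSU-bound sibling but drops to 512 next to another ALU-bound sibling, which is why a single per-thread capacity value cannot capture both cases.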
= Proposed Solution =
This patch set aligns those paths with a simple rule already used
elsewhere: when SMT is active, prefer fully idle cores and avoid treating
partially idle SMT siblings as full-capacity targets where that would
mislead load balancing.
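A minimal user-space sketch of the rule described above, assuming a flat array of CPUs where consecutive CPUs are SMT siblings. It is not the actual select_idle_capacity()/asym_fits_cpu() change: struct cpu_state, SMT_WIDTH, pick_idle_naive() and pick_idle_smt_aware() are hypothetical names, and utilization fitting is ignored so that only the "fully idle core first, then any idle CPU" ordering is shown.

```c
#include <assert.h>
#include <stdbool.h>

#define SMT_WIDTH 2 /* hypothetical: two hardware threads per core */

/* Hypothetical per-CPU view used by this sketch (not kernel structures). */
struct cpu_state {
	unsigned int capacity; /* nominal capacity, 0..1024 */
	bool idle;
};

/* True if every SMT sibling of @cpu (including itself) is idle. */
static bool core_fully_idle(const struct cpu_state *cpus, int cpu)
{
	int first = (cpu / SMT_WIDTH) * SMT_WIDTH;

	for (int i = first; i < first + SMT_WIDTH; i++)
		if (!cpus[i].idle)
			return false;
	return true;
}

/* Highest-capacity idle CPU, optionally restricted to fully idle cores. */
static int pick_best(const struct cpu_state *cpus, int nr_cpus,
		     bool need_idle_core)
{
	int best = -1;

	assert(nr_cpus % SMT_WIDTH == 0);
	for (int cpu = 0; cpu < nr_cpus; cpu++) {
		if (!cpus[cpu].idle)
			continue;
		if (need_idle_core && !core_fully_idle(cpus, cpu))
			continue;
		if (best < 0 || cpus[cpu].capacity > cpus[best].capacity)
			best = cpu;
	}
	return best; /* -1 if no candidate */
}

/* Capacity-only policy: ignores whether the sibling is busy. */
int pick_idle_naive(const struct cpu_state *cpus, int nr_cpus)
{
	return pick_best(cpus, nr_cpus, false);
}

/* SMT-aware policy: prefer a fully idle core, fall back to any idle CPU. */
int pick_idle_smt_aware(const struct cpu_state *cpus, int nr_cpus)
{
	int cpu = pick_best(cpus, nr_cpus, true);

	return cpu >= 0 ? cpu : pick_best(cpus, nr_cpus, false);
}
```

For example, with CPUs {1024 busy, 1024 idle, 980 idle, 980 idle}, the naive policy picks CPU 1 (whose sibling is busy), while the SMT-aware policy picks CPU 2 on the fully idle core, and it still falls back to CPU 1 if no core is fully idle.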
Patch set summary:
- [PATCH 1/4] sched/fair: Prefer fully-idle SMT cores in asym-capacity idle selection
Prefer fully-idle SMT cores in asym-capacity idle selection. In the
wakeup fast path, extend select_idle_capacity() / asym_fits_cpu() so
idle selection can prefer CPUs on fully idle cores, with a safe fallback.
- [PATCH 2/4] sched/fair: Reject misfit pulls onto busy SMT siblings on asym-capacity
Reject misfit pulls onto busy SMT siblings on SD_ASYM_CPUCAPACITY.
Provided for consistency with PATCH 1/4.
- [PATCH 3/4] sched/fair: Enable EAS with SMT on SD_ASYM_CPUCAPACITY systems
Enable EAS with SD_ASYM_CPUCAPACITY and SMT. Also provided for
consistency with PATCH 1/4. I've also tested with/without
/proc/sys/kernel/sched_energy_aware enabled (same platform) and haven't
noticed any regression.
- [PATCH 4/4] sched/fair: Prefer fully-idle SMT core for NOHZ idle load balancer
When choosing the housekeeping CPU that runs the idle load balancer,
prefer an idle CPU on a fully idle core so migrated work lands where
effective capacity is available.
The change is still consistent with the same "avoid CPUs with busy
sibling" logic and it shows some benefits on Vera, but it could have a
negative impact on other systems. I'm including it for completeness
(feedback is appreciated).
This patch set has been tested on the new NVIDIA Vera Rubin platform, where
SMT is enabled and the firmware exposes small frequency variations (+/-~5%)
as differences in CPU capacity, resulting in SD_ASYM_CPUCAPACITY being set.
I assume the CPU capacity values are fixed?
Does the first sibling have the max, while the other has less?
Without these patches, performance can drop by up to ~2x with CPU-intensive
workloads, because the SD_ASYM_CPUCAPACITY idle selection policy does not
account for busy SMT siblings.
How is the performance measured here? Which benchmark?
By any chance, are you running number_running_task <= (nr_cpus / smt_threads_per_core),
so that everything fits nicely?
If you increase the number of tasks, how do the performance numbers compare?
Also, what is the system like? What SMT level?
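The question above is simple pigeonhole arithmetic. The sketch below assumes every core exposes the same number of SMT threads; nr_cores(), fits_one_per_core() and min_shared_cores() are hypothetical helpers, and the CPU counts in the example are illustrative, not the Vera topology. Once the number of running tasks exceeds the number of physical cores, some cores must run with busy siblings no matter how well idle selection behaves.

```c
#include <assert.h>
#include <stdbool.h>

/* Physical cores, assuming every core exposes smt_threads logical CPUs. */
static unsigned int nr_cores(unsigned int nr_cpus, unsigned int smt_threads)
{
	assert(smt_threads > 0 && nr_cpus % smt_threads == 0);
	return nr_cpus / smt_threads;
}

/* True if each running task can have a whole physical core to itself. */
bool fits_one_per_core(unsigned int nr_running, unsigned int nr_cpus,
		       unsigned int smt_threads)
{
	return nr_running <= nr_cores(nr_cpus, smt_threads);
}

/*
 * Minimum number of cores that must run two or more busy threads: each
 * shared core can absorb at most smt_threads - 1 tasks beyond its first.
 */
unsigned int min_shared_cores(unsigned int nr_running, unsigned int nr_cpus,
			      unsigned int smt_threads)
{
	unsigned int cores = nr_cores(nr_cpus, smt_threads);
	unsigned int extra;

	assert(nr_running <= nr_cpus);
	if (nr_running <= cores)
		return 0; /* also covers smt_threads == 1 */
	extra = nr_running - cores;
	/* ceil(extra / (smt_threads - 1)) */
	return (extra + smt_threads - 2) / (smt_threads - 1);
}
```

On a hypothetical 8-CPU SMT2 system, 4 tasks fit one per core, while 6 tasks force at least 2 cores to run both siblings busy; comparing results on both sides of that threshold would show whether the gain depends on the workload fitting nicely.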
Alternative approaches have been evaluated, such as equalizing CPU
capacities, either by exposing uniform values via firmware (ACPI/CPPC) or
normalizing them in the kernel by grouping CPUs within a small capacity
window (±5%) [1][2], or enabling asym packing [3].
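A rough sketch of the capacity-equalization idea, not the code from [1] or [2]. It assumes capacity scales linearly with a per-CPU performance value (rounded, with the highest value mapped to 1024), and it uses a simplified all-or-nothing rule: if every capacity is within window_pct percent of the maximum, all of them are flattened to the maximum. perf_to_capacity(), equalize_capacities() and CAPACITY_SCALE are hypothetical names; the referenced patches may group CPUs differently.

```c
#include <assert.h>
#include <stddef.h>

#define CAPACITY_SCALE 1024u /* same scale as the kernel's SCHED_CAPACITY_SCALE */

/* Capacity from a per-CPU performance value, rounded, with max_perf -> 1024. */
unsigned int perf_to_capacity(unsigned int perf, unsigned int max_perf)
{
	assert(max_perf > 0 && perf <= max_perf);
	return (unsigned int)(((unsigned long long)perf * CAPACITY_SCALE +
			       max_perf / 2) / max_perf);
}

/*
 * Simplified all-or-nothing variant: if every capacity is within window_pct
 * percent of the largest one, set all of them to the largest value so the
 * domain no longer looks asymmetric. Returns 1 if equalized, 0 if untouched.
 */
int equalize_capacities(unsigned int *cap, size_t n, unsigned int window_pct)
{
	unsigned int max = 0;

	assert(window_pct <= 100);
	for (size_t i = 0; i < n; i++)
		if (cap[i] > max)
			max = cap[i];

	/* require cap[i] >= max * (100 - window_pct) / 100, without division */
	for (size_t i = 0; i < n; i++)
		if ((unsigned long long)cap[i] * 100 <
		    (unsigned long long)max * (100 - window_pct))
			return 0;

	for (size_t i = 0; i < n; i++)
		cap[i] = max;
	return 1;
}
```

Under these assumptions, a CPU running at 95% of the highest performance gets capacity 973 instead of 1024, which alone makes the capacities look asymmetric; a 5% window maps it back to 1024, whereas a CPU at 900 stays distinct.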
However, adding SMT awareness to SD_ASYM_CPUCAPACITY has shown better
results so far. Improving this policy also seems worthwhile in general, as
other platforms in the future may enable SMT with asymmetric CPU
topologies.
[1] https://lore.kernel.org/lkml/20260324005509.1134981-1-arighi@xxxxxxxxxx
[2] https://lore.kernel.org/lkml/20260318092214.130908-1-arighi@xxxxxxxxxx
[3] https://lore.kernel.org/all/20260325181314.3875909-1-christian.loehle@xxxxxxx/
Andrea Righi (4):
sched/fair: Prefer fully-idle SMT cores in asym-capacity idle selection
sched/fair: Reject misfit pulls onto busy SMT siblings on asym-capacity
sched/fair: Enable EAS with SMT on SD_ASYM_CPUCAPACITY systems
sched/fair: Prefer fully-idle SMT core for NOHZ idle load balancer
kernel/sched/fair.c | 163 +++++++++++++++++++++++++++++++++++++++++++-----
kernel/sched/topology.c | 9 ---
2 files changed, 147 insertions(+), 25 deletions(-)