Re: [PATCH 0/4] sched/fair: SMT-aware asymmetric CPU capacity
From: Balbir Singh
Date: Sat Mar 28 2026 - 09:03:54 EST

On 3/27/26 02:02, Andrea Righi wrote:
> This series attempts to improve SD_ASYM_CPUCAPACITY scheduling by
> introducing SMT awareness.
>
> = Problem =
>
> Nominal per-logical-CPU capacity can overstate usable compute when an SMT
> sibling is busy, because the physical core doesn't deliver its full nominal
> capacity. So, several SD_ASYM_CPUCAPACITY paths may pick high capacity CPUs
> that are not actually good destinations.
>
> = Proposed Solution =
>
> This patch set aligns those paths with a simple rule already used
> elsewhere: when SMT is active, prefer fully idle cores and avoid treating
> partially idle SMT siblings as full-capacity targets where that would
> mislead load balance.

In kernel/sched/topology.c:

	/* Don't attempt to spread across CPUs of different capacities. */
	if ((sd->flags & SD_ASYM_CPUCAPACITY) && sd->child)
		sd->child->flags &= ~SD_PREFER_SIBLING;

should handle the selection, but I guess this does not work for SMT-level
sched domains?

>
> Patch set summary:
>
> - [PATCH 1/4] sched/fair: Prefer fully-idle SMT cores in asym-capacity idle selection
>
> Prefer fully-idle SMT cores in asym-capacity idle selection. In the
> wakeup fast path, extend select_idle_capacity() / asym_fits_cpu() so
> idle selection can prefer CPUs on fully idle cores, with a safe fallback.
>
> - [PATCH 2/4] sched/fair: Reject misfit pulls onto busy SMT siblings on asym-capacity
>
> Reject misfit pulls onto busy SMT siblings on SD_ASYM_CPUCAPACITY.
> Provided for consistency with PATCH 1/4.
>
> - [PATCH 3/4] sched/fair: Enable EAS with SMT on SD_ASYM_CPUCAPACITY systems
>
> Enable EAS with SD_ASYM_CPUCAPACITY and SMT. Also provided for
> consistency with PATCH 1/4. I've also tested with/without
> /proc/sys/kernel/sched_energy_aware enabled (same platform) and haven't
> noticed any regression.
>
> - [PATCH 4/4] sched/fair: Prefer fully-idle SMT core for NOHZ idle load balancer
>
> When choosing the housekeeping CPU that runs the idle load balancer,
> prefer an idle CPU on a fully idle core so migrated work lands where
> effective capacity is available.
>
> The change is still consistent with the same "avoid CPUs with busy
> sibling" logic and it shows some benefits on Vera, but it could have a
> negative impact on other systems. I'm including it for completeness
> (feedback is appreciated).
>
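
For anyone following along, the "prefer fully idle cores, with a safe
fallback" rule described for PATCH 1/4 can be sketched in userspace C. This
is not code from the series: struct cpu_model, core_fully_idle() and
pick_idle_cpu() are hypothetical names, and the real select_idle_capacity()
path works on sched domains and cpumasks rather than a flat array.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these names exist in the kernel. */
struct cpu_model {
	int core;               /* physical core this logical CPU belongs to */
	bool idle;              /* true if this SMT thread is idle */
	unsigned long capacity; /* nominal capacity of the logical CPU */
};

/* A core is fully idle only if every SMT thread on it is idle. */
static bool core_fully_idle(const struct cpu_model *cpus, size_t n, int core)
{
	for (size_t i = 0; i < n; i++)
		if (cpus[i].core == core && !cpus[i].idle)
			return false;
	return true;
}

/*
 * Return the index of an idle CPU whose nominal capacity fits @util.
 * The first candidate on a fully idle core wins; otherwise fall back to
 * the first fitting idle CPU with a busy sibling. Returns -1 if no idle
 * CPU fits.
 */
static int pick_idle_cpu(const struct cpu_model *cpus, size_t n,
			 unsigned long util)
{
	int fallback = -1;

	assert(cpus != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		if (!cpus[i].idle || cpus[i].capacity < util)
			continue;
		if (core_fully_idle(cpus, n, cpus[i].core))
			return (int)i;
		if (fallback < 0)
			fallback = (int)i;
	}
	return fallback;
}
```

With two cores of two threads each, where one thread of the higher-capacity
core is busy, this sketch picks a thread on the fully idle core even though
its nominal capacity is slightly lower, and only falls back to the busy
core's idle sibling when no fully idle core is left.
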
> This patch set has been tested on the new NVIDIA Vera Rubin platform, where
> SMT is enabled and the firmware exposes small frequency variations (+/-~5%)
> as differences in CPU capacity, resulting in SD_ASYM_CPUCAPACITY being set.
>

Are you referring to nominal_freq?

> Without these patches, performance can drop up to ~2x with CPU-intensive
> workloads, because the SD_ASYM_CPUCAPACITY idle selection policy does not
> account for busy SMT siblings.
>
> Alternative approaches have been evaluated, such as equalizing CPU
> capacities, either by exposing uniform values via firmware (ACPI/CPPC) or
> normalizing them in the kernel by grouping CPUs within a small capacity
> window (+-5%) [1][2], or enabling asympacking [3].
>
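
To make the "small capacity window" alternative concrete, here is a minimal
userspace sketch of the idea as I read it from the description above: CPUs
whose capacities differ by no more than a given percentage are treated as
equal, so the system would not count as asymmetric. It is an assumption about
the approach, not the code in [1] or [2]; within_window() and
has_asym_capacity() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * True if the smaller capacity is within @pct percent of the larger one.
 * Compares (hi - lo) / hi <= pct / 100 without using division.
 */
static bool within_window(unsigned long a, unsigned long b, unsigned int pct)
{
	unsigned long hi = a > b ? a : b;
	unsigned long lo = a > b ? b : a;

	assert(pct <= 100);
	return (hi - lo) * 100 <= hi * pct;
}

/*
 * Report asymmetry only if the spread between the smallest and largest
 * capacity exceeds the window. An empty set is never asymmetric.
 */
static bool has_asym_capacity(const unsigned long *caps, size_t n,
			      unsigned int pct)
{
	unsigned long lo, hi;

	if (n == 0)
		return false;

	lo = hi = caps[0];
	for (size_t i = 1; i < n; i++) {
		if (caps[i] < lo)
			lo = caps[i];
		if (caps[i] > hi)
			hi = caps[i];
	}
	return !within_window(lo, hi, pct);
}
```

Under this sketch, capacities such as 1024, 1000 and 990 fall inside a 5%
window and would be handled as symmetric, while a 1024/512 split would still
be reported as asymmetric.
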
> However, adding SMT awareness to SD_ASYM_CPUCAPACITY has shown better
> results so far. Improving this policy also seems worthwhile in general, as
> other platforms in the future may enable SMT with asymmetric CPU
> topologies.
>
> [1] https://lore.kernel.org/lkml/20260324005509.1134981-1-arighi@xxxxxxxxxx
> [2] https://lore.kernel.org/lkml/20260318092214.130908-1-arighi@xxxxxxxxxx
> [3] https://lore.kernel.org/all/20260325181314.3875909-1-christian.loehle@xxxxxxx/
>
> Andrea Righi (4):
> sched/fair: Prefer fully-idle SMT cores in asym-capacity idle selection
> sched/fair: Reject misfit pulls onto busy SMT siblings on asym-capacity
> sched/fair: Enable EAS with SMT on SD_ASYM_CPUCAPACITY systems
> sched/fair: Prefer fully-idle SMT core for NOHZ idle load balancer
>
> kernel/sched/fair.c | 163 +++++++++++++++++++++++++++++++++++++++++++-----
> kernel/sched/topology.c | 9 ---
> 2 files changed, 147 insertions(+), 25 deletions(-)

Thanks,
Balbir