[PATCH v2 0/8] x86, sched: Dynamic ITMT core ranking support and some yak shaving
From: K Prateek Nayak
Date: Sun Dec 22 2024 - 23:34:50 EST
The ITMT infrastructure currently assumes that ITMT rankings are
static and set correctly before ITMT support is enabled, which allows
the CPU with the highest core ranking to be cached as the
"asym_prefer_cpu" in the sched_group struct. However, with the
introduction of Preferred Core support in amd-pstate, these rankings
can change at runtime.
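For reference, the caching happens when the sched domain hierarchy is
built. Below is a simplified sketch of that logic, paraphrased from
sched_asym_prefer() in kernel/sched/sched.h and
init_sched_groups_capacity() in kernel/sched/topology.c (not the
literal code):

/* kernel/sched/sched.h: compare two CPUs' ITMT rankings */
static inline bool sched_asym_prefer(int a, int b)
{
        return arch_asym_cpu_priority(a) > arch_asym_cpu_priority(b);
}

/* Paraphrased from init_sched_groups_capacity() */
int cpu, max_cpu = -1;

if (sd->flags & SD_ASYM_PACKING) {
        for_each_cpu(cpu, sched_group_span(sg)) {
                if (max_cpu < 0 || sched_asym_prefer(cpu, max_cpu))
                        max_cpu = cpu;
        }
        sg->asym_prefer_cpu = max_cpu;
}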
v1: https://lore.kernel.org/lkml/20241211185552.4553-1-kprateek.nayak@xxxxxxx/
Tim confirmed that the ITMT changes will not alter the behavior of
Intel systems that contain multiple MC groups in a PKG domain and
support ITMT - both current and future ones.
Patch 8 uncaches the asym_prefer_cpu from the sched_group struct and
finds it during load balancing in update_sg_lb_stats() before it is
used to make any scheduling decisions. This is the simplest approach;
an alternate approach would be to move the asym_prefer_cpu to
sched_domain_shared and allow the first load balancing instance after
a priority change to update the cached asym_prefer_cpu. On systems
with static priorities, this would retain the benefits of caching,
while on systems with dynamic priorities it would reduce the overhead
of finding the "asym_prefer_cpu" each time update_sg_lb_stats() is
called. However, those benefits come with added code complexity, which
is why Patch 8 is marked as an RFC. Srikanth confirmed it works as
expected on a PowerPC VM; however, there are no comments yet on the
performance impact, which is expected to be minimal, if any, since
update_sg_lb_stats() is in the load balancing slow-path.
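To make the approach concrete, here is an illustrative helper (the
name is hypothetical and this is not the literal diff) showing what
Patch 8 effectively does from update_sg_lb_stats():

/*
 * Hypothetical helper: find the CPU with the highest ITMT ranking in
 * a group's span at load balancing time instead of relying on a
 * cached sg->asym_prefer_cpu that may have gone stale.
 */
static int find_asym_prefer_cpu(struct sched_group *group,
                                const struct cpumask *lb_cpus)
{
        int cpu, asym_prefer_cpu = -1;

        for_each_cpu_and(cpu, sched_group_span(group), lb_cpus) {
                if (asym_prefer_cpu < 0 ||
                    sched_asym_prefer(cpu, asym_prefer_cpu))
                        asym_prefer_cpu = cpu;
        }

        return asym_prefer_cpu;
}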
One notable comment that has not been addressed since v1 is moving
the overutilized status update below the idle CPU check. On an idle
CPU, since there are no UCLAMP constraints, cpu_overutilized() boils
down to:
!fits_capacity(cpu_util_cfs(cpu), capacity_of(cpu))
But the utilization averages can capture blocked averages, and
capacity_of(cpu) can depend on arch_scale_cpu_capacity(). I couldn't
say with 100% confidence that an idle CPU cannot appear overutilized
as a result of blocked averages. find_energy_efficient_cpu() does not
look at idle_cpu() and performs its search purely based on utilization
and capacity - an idle CPU that appears overutilized will still be
skipped in this search path. If there are no concerns, that update can
also be moved below the idle_cpu() check in Patch 6, as suggested by
Srikanth.
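For context, the fitness check leaves a ~20% headroom margin; the
macro below is from kernel/sched/fair.c:

/*
 * Utilization "fits" only while it stays below max scaled by
 * 1024/1280, i.e. ~80% of the CPU's capacity.
 */
#define fits_capacity(cap, max) ((cap) * 1280 < (max) * 1024)

Since cpu_util_cfs() includes blocked utilization that decays only
gradually, a CPU that has just gone idle after a long busy period
could, at least transiently, sit above that threshold.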
This series is based on
git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git sched/core
at commit af98d8a36a96 ("sched/fair: Fix CPU bandwidth limit bypass
during CPU hotplug") and is a spiritual successor to a previous
attempt by Mario at fixing x86_die_flags() on Preferred Core enabled
systems, which can be found at
https://lore.kernel.org/lkml/20241203201129.31957-1-mario.limonciello@xxxxxxx/
---
v1..v2:
- Collected tags from Tim, Vincent, and Srikanth.
- Modified the layout of struct sg_lb_stats to keep all fields
  concerning SD_ASYM_PACKING together. (Srikanth)
- Modified the commit message of the debugfs move to highlight that
  "N"/"0" can be used to disable the feature and "Y"/"1" to enable it
  back; see the sketch after this list. (Tim, Peter)
---
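For reference, below is a minimal sketch of the kind of debugfs
boolean knob Patch 3 introduces (names and details simplified, not
the literal patch); the "Y"/"1" and "N"/"0" semantics come from
kstrtobool():

/*
 * Simplified sketch, not the literal patch: a debugfs write handler
 * for "sched_itmt_enabled". kstrtobool_from_user() accepts
 * "Y"/"y"/"1" to enable and "N"/"n"/"0" to disable.
 */
static ssize_t sched_itmt_enabled_write(struct file *filp,
                                        const char __user *ubuf,
                                        size_t count, loff_t *ppos)
{
        bool enable;
        int ret;

        ret = kstrtobool_from_user(ubuf, count, &enable);
        if (ret)
                return ret;

        /* Patch 2 converts this lock to the guard() helper */
        guard(mutex)(&itmt_update_mutex);

        if (enable != sysctl_sched_itmt_enabled) {
                sysctl_sched_itmt_enabled = enable;
                /* Rebuild sched domains with the new setting */
                x86_topology_update = true;
                rebuild_sched_domains();
        }

        return count;
}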
K Prateek Nayak (8):
x86/itmt: Convert "sysctl_sched_itmt_enabled" to boolean
x86/itmt: Use guard() for itmt_update_mutex
x86/itmt: Move the "sched_itmt_enabled" sysctl to debugfs
x86/topology: Remove x86_smt_flags and use cpu_smt_flags directly
x86/topology: Use x86_sched_itmt_flags for PKG domain unconditionally
sched/fair: Do not compute NUMA Balancing stats unnecessarily during
lb
sched/fair: Do not compute overloaded status unnecessarily during lb
sched/fair: Uncache asym_prefer_cpu and find it during
update_sd_lb_stats()
arch/x86/include/asm/topology.h | 4 +-
arch/x86/kernel/itmt.c | 81 ++++++++++++++-------------------
arch/x86/kernel/smpboot.c | 19 +-------
kernel/sched/fair.c | 42 +++++++++++++----
kernel/sched/sched.h | 1 -
kernel/sched/topology.c | 15 +-----
6 files changed, 70 insertions(+), 92 deletions(-)
base-commit: af98d8a36a963e758e84266d152b92c7b51d4ecb
--
2.43.0