[PATCH 0/8] x86, sched: Dynamic ITMT core ranking support and some yak shaving

From: K Prateek Nayak
Date: Wed Dec 11 2024 - 14:14:21 EST


The ITMT infrastructure currently assumes that ITMT rankings are static
and set correctly prior to enabling ITMT support, which allows the CPU
with the highest core ranking to be cached as the "asym_prefer_cpu" in
the sched_group struct. However, with the introduction of Preferred Core
support in amd-pstate, these rankings can change at runtime.

This series adds support for dynamic rankings in the generic scheduler
layer without the need to rebuild the sched domain hierarchy, and fixes
an issue with x86_die_flags() on AMD systems that support Preferred Core
ranking, with some yak shaving done along the way.

Patches 1 to 4 are independent cleanups around the ITMT infrastructure,
removal of the x86_smt_flags wrapper, and moving of the
"sched_itmt_enabled" sysctl to debugfs.

Patch 5 adds the SD_ASYM_PACKING flag to the PKG domain on all ITMT
enabled systems. The rationale behind the addition is elaborated in the
patch itself. One open question remains for Intel processors with
multiple tiles in a PKG that advertise themselves as multiple LLCs in a
PKG and support ITMT - is it okay to set SD_ASYM_PACKING for the PKG
domain on these processors?

Patches 6 and 7 are independent possible micro-optimizations discovered
while auditing update_sg_lb_stats().

Patch 8 uncaches the asym_prefer_cpu from the sched_group struct and
finds it during load balancing in update_sg_lb_stats() before it is used
to make any scheduling decisions. This is the simplest approach; an
alternate approach would be to move the asym_prefer_cpu to
sched_domain_shared and allow the first load balancing instance after a
priority change to update the cached asym_prefer_cpu. On systems with
static priorities, this would retain the benefits of caching, while on
systems with dynamic priorities it would reduce the overhead of finding
the "asym_prefer_cpu" each time update_sg_lb_stats() is called. However,
these benefits come with added code complexity, which is why Patch 8 is
marked as an RFC.

This series is based on

git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git sched/core

at commit 2a77e4be12cb ("sched/fair: Untangle NEXT_BUDDY and
pick_next_task()") and is a spiritual successor to a previous attempt
at fixing x86_die_flags() on Preferred Core enabled systems by Mario,
which can be found at
https://lore.kernel.org/lkml/20241203201129.31957-1-mario.limonciello@xxxxxxx/

---
K Prateek Nayak (8):
x86/itmt: Convert "sysctl_sched_itmt_enabled" to boolean
x86/itmt: Use guard() for itmt_update_mutex
x86/itmt: Move the "sched_itmt_enabled" sysctl to debugfs
x86/topology: Remove x86_smt_flags and use cpu_smt_flags directly
x86/topology: Use x86_sched_itmt_flags for PKG domain unconditionally
sched/fair: Do not compute NUMA Balancing stats unnecessarily during
lb
sched/fair: Do not compute overloaded status unnecessarily during lb
sched/fair: Uncache asym_prefer_cpu and find it during
update_sd_lb_stats()

arch/x86/include/asm/topology.h | 4 +-
arch/x86/kernel/itmt.c | 81 ++++++++++++++-------------------
arch/x86/kernel/smpboot.c | 19 +-------
kernel/sched/fair.c | 41 +++++++++++++----
kernel/sched/sched.h | 1 -
kernel/sched/topology.c | 15 +-----
6 files changed, 69 insertions(+), 92 deletions(-)


base-commit: 2a77e4be12cb58bbf774e7c717c8bb80e128b7a4
--
2.34.1