[PATCH v6 0/4] x86/cacheinfo: Set the number of leaves per CPU
From: Ricardo Neri
Date: Thu Sep 05 2024 - 01:54:57 EST
Hi,
The interface /sys/devices/system/cpu/cpuX/cache is broken (not populated)
if CPUs have different numbers of subleaves in CPUID 4. This is the case
on Intel Meteor Lake, which is now out in the world. Tools that rely on
sysfs (e.g., lstopo) fail.
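To make the asymmetry concrete, here is a minimal userspace sketch (not the kernel code) of how the number of CPUID 4 subleaves can be counted: subleaves are enumerated in order until EAX[4:0], the cache type field, reads 0. The EAX values and the names count_cache_leaves(), cpu_a_eax, and cpu_b_eax are hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* CPUID.4:EAX[4:0] is the cache type; 0 means "no more caches". */
#define CACHE_TYPE_MASK 0x1fu

/*
 * Count the valid subleaves in a list of EAX values, one value per
 * subleaf, in enumeration order. Counting stops at the first subleaf
 * whose cache type is 0.
 */
static unsigned int count_cache_leaves(const uint32_t *eax, size_t n)
{
	unsigned int leaves = 0;

	assert(eax != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if ((eax[i] & CACHE_TYPE_MASK) == 0)
			break;
		leaves++;
	}
	return leaves;
}

/* Hypothetical values: 1 = data, 2 = instruction, 3 = unified cache. */
static const uint32_t cpu_a_eax[] = { 0x1, 0x2, 0x3, 0x3, 0x0 };
static const uint32_t cpu_b_eax[] = { 0x1, 0x2, 0x3, 0x0, 0x0 };
```

With these inputs the two modeled CPUs report 4 and 3 leaves, so a single system-wide count taken from one CPU would be wrong for the other.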
Patches 3 and 4 fix the described issue on Meteor Lake. Patches 1 and 2
are prework in the cacheinfo base driver and fix issues uncovered while
updating cacheinfo for x86.
This is v6 of a patchset to fix the cache sysfs interface by setting the
number of cache leaves independently for each CPU. It includes a cosmetic
change and Reviewed-by tags from Andreas and Nikolay as well as Tested-by
tags from Andreas.
Previous versions can be found in [1], [2], [3], [4], and [5].
All the tests described in detail in [6] and [7] passed. This is the
summary:
* /sys/devices/system/cpu/cpuX/cache is populated in Meteor Lake.
* No inconsistencies are found in /sys/devices/system/cpu/cpuX/cache
and the tools x86info, lstopo, and lscpu.
* No splat is observed with and without CONFIG_PREEMPT_RT.
* No new warnings/errors are seen in the kernel log.
* Tests done on assorted Intel and AMD client and server parts.
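For readers who want a quick check of the first item, the sketch below counts the indexN entries under a cache directory such as /sys/devices/system/cpu/cpu0/cache. It is a simplified illustration, not the test procedure from [6] or [7]; the helper name count_cache_indexes() is hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <dirent.h>
#include <string.h>

/*
 * Return the number of entries named "index<digit>..." in dir,
 * or -1 if the directory cannot be opened.
 */
static int count_cache_indexes(const char *dir)
{
	struct dirent *e;
	int count = 0;
	DIR *d;

	assert(dir != NULL);
	d = opendir(dir);
	if (!d)
		return -1;

	while ((e = readdir(d)) != NULL) {
		const char *name = e->d_name;

		/* Skip anything that is not "index" followed by a digit. */
		if (strncmp(name, "index", 5) != 0 ||
		    !isdigit((unsigned char)name[5]))
			continue;
		count++;
	}
	closedir(d);
	return count;
}
```

A result of 0 or -1 for a CPU's cache directory would correspond to the "not populated" symptom described at the top of this letter.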
Changes since v5:
* Reordered the arguments of set_num_cache_leaves().
* Fixed wording on the subject of patch 2.
* Added Reviewed-by tags from Andreas and Nikolay. Thanks!
* Added Tested-by tags from Andreas. Thanks!
Changes since v4:
* Combined two condition checks into one line. (Sudeep)
* Added one more Reviewed-by tag from Sudeep. Thanks!
Changes since v3:
* Fixed another NULL-pointer dereference when checking the validity of
the last-level cache info.
* Added the Reviewed-by tags from Radu and Sudeep. Thanks!
* Rebased on v6.7-rc5.
Changes since v2:
* This version uncovered a NULL-pointer dereference in recent changes to
cacheinfo [8]. The dereference is observed when the system neither
configures cacheinfo early during boot nor makes corrections later
during CPU hotplug, as is the case on x86. Patch 1 fixes this issue.
Changes since v1:
* Dave Hansen suggested using the existing per-CPU ci_cpu_cacheinfo
variable. This made the global variable num_cache_leaves unnecessary.
* While here, I noticed that init_cache_level() also became useless:
x86 does not need ci_cpu_cacheinfo::num_levels.
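The idea behind moving from one global count to per-CPU storage can be modeled in userspace as follows. This is only a sketch: struct model_cpu_cacheinfo, model_ci, MODEL_NR_CPUS, and the two accessors are hypothetical stand-ins and do not reproduce the kernel's data structures or the signatures used in the patches.

```c
#include <assert.h>

#define MODEL_NR_CPUS 4

/* Hypothetical stand-in for per-CPU cacheinfo bookkeeping. */
struct model_cpu_cacheinfo {
	unsigned int num_leaves;
};

/* One slot per CPU instead of a single global leaf count. */
static struct model_cpu_cacheinfo model_ci[MODEL_NR_CPUS];

/* Record the number of cache leaves found on one CPU. */
static void model_set_num_leaves(unsigned int cpu, unsigned int nr)
{
	assert(cpu < MODEL_NR_CPUS);
	model_ci[cpu].num_leaves = nr;
}

/* Read back the number of cache leaves for one CPU. */
static unsigned int model_get_num_leaves(unsigned int cpu)
{
	assert(cpu < MODEL_NR_CPUS);
	return model_ci[cpu].num_leaves;
}
```

Because each CPU has its own slot, writing the count for one CPU does not overwrite the value already stored for another.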
Thanks and BR,
Ricardo
[1]. https://lore.kernel.org/lkml/20230314231658.30169-1-ricardo.neri-calderon@xxxxxxxxxxxxxxx/
[2]. https://lore.kernel.org/all/20230424001956.21434-1-ricardo.neri-calderon@xxxxxxxxxxxxxxx/
[3]. https://lore.kernel.org/lkml/20230805012421.7002-1-ricardo.neri-calderon@xxxxxxxxxxxxxxx/
[4]. https://lore.kernel.org/all/20231212222519.12834-1-ricardo.neri-calderon@xxxxxxxxxxxxxxx/
[5]. https://lore.kernel.org/all/20240827051635.9114-1-ricardo.neri-calderon@xxxxxxxxxxxxxxx/
[6]. https://lore.kernel.org/lkml/20230912032350.GA17008@xxxxxxxxxxxxxxxxxxxxxxxxx/
[7]. https://lore.kernel.org/all/20240902074140.GA4179@alberich/
[8]. https://lore.kernel.org/all/20230412185759.755408-1-rrendec@xxxxxxxxxx/
Ricardo Neri (4):
  cacheinfo: Check for null last-level cache info
  cacheinfo: Allocate memory during CPU hotplug if not done from the
    primary CPU
  x86/cacheinfo: Delete global num_cache_leaves
  x86/cacheinfo: Clean out init_cache_level()

 arch/x86/kernel/cpu/cacheinfo.c | 49 +++++++++++++++++----------------
 drivers/base/cacheinfo.c        |  8 ++++--
 2 files changed, 32 insertions(+), 25 deletions(-)
--
2.34.1