[PATCH -next] arch_topology: Fix cache attributes detection in the CPU hotplug path

From: Sudeep Holla
Date: Wed Jul 13 2022 - 09:33:57 EST


init_cpu_topology() is called only once at boot, and the cache
attributes are detected early for all possible CPUs. However, when
CPUs are hotplugged out, their cacheinfo gets removed. The attributes
are added back when the CPUs are hotplugged back in, as part of the
CPU hotplug state machine, but that happens quite late, after
update_siblings_masks() has already been called in
secondary_start_kernel(), resulting in wrong llc_sibling_masks.

Move the call to detect_cache_attributes() inside update_siblings_masks()
to ensure the cacheinfo is updated before the LLC sibling masks are
updated. This will fix the incorrect LLC sibling masks generated when
the CPUs are hotplugged out and hotplugged back in again.

Reported-by: Ionela Voinescu <ionela.voinescu@xxxxxxx>
Signed-off-by: Sudeep Holla <sudeep.holla@xxxxxxx>
---
drivers/base/arch_topology.c | 16 ++++++----------
1 file changed, 6 insertions(+), 10 deletions(-)

Hi Conor,

Ionela reported an issue with CPU hotplug, and as a fix I need to move
the call to detect_cache_attributes() into update_siblings_masks(), which
is where I had originally thought of keeping it before moving it to
init_cpu_topology() for no good reason.

I wonder if this also fixes the -ENOMEM on RISC-V: with this patch,
detect_cache_attributes() is called on each CPU in the secondary CPU init
path, whereas init_cpu_topology() executed it for all possible CPUs much
earlier. I think this might help, as the percpu memory might be
initialised by then.

Anyway, please give this a try. Also test CPU hotplug and check that
nothing is broken on RISC-V. We noticed this bug only on one platform while

Regards,
Sudeep

diff --git a/drivers/base/arch_topology.c b/drivers/base/arch_topology.c
index 441e14ac33a4..0424b59b695e 100644
--- a/drivers/base/arch_topology.c
+++ b/drivers/base/arch_topology.c
@@ -732,7 +732,11 @@ const struct cpumask *cpu_clustergroup_mask(int cpu)
 void update_siblings_masks(unsigned int cpuid)
 {
 	struct cpu_topology *cpu_topo, *cpuid_topo = &cpu_topology[cpuid];
-	int cpu;
+	int cpu, ret;
+
+	ret = detect_cache_attributes(cpuid);
+	if (ret)
+		pr_info("Early cacheinfo failed, ret = %d\n", ret);
 
 	/* update core and thread sibling masks */
 	for_each_online_cpu(cpu) {
@@ -821,7 +825,7 @@ __weak int __init parse_acpi_topology(void)
 #if defined(CONFIG_ARM64) || defined(CONFIG_RISCV)
 void __init init_cpu_topology(void)
 {
-	int ret, cpu;
+	int ret;
 
 	reset_cpu_topology();
 	ret = parse_acpi_topology();
@@ -836,13 +840,5 @@ void __init init_cpu_topology(void)
 		reset_cpu_topology();
 		return;
 	}
-
-	for_each_possible_cpu(cpu) {
-		ret = detect_cache_attributes(cpu);
-		if (ret) {
-			pr_info("Early cacheinfo failed, ret = %d\n", ret);
-			break;
-		}
-	}
 }
 #endif
-- 
2.37.1