Re: Kernel 6.9 regression: X86: Bogus messages from topology detection

From: Thomas Gleixner
Date: Thu May 30 2024 - 04:30:24 EST


Peter!

On Mon, May 27 2024 at 23:15, Peter Schneider wrote:

Thanks for providing all the information!

> I want to add one thing: there is a log entry in the dmesg output of a "bad" kernel, which
> I initially overlooked, because it is way up, and I noticed this just now. I guess this
> might be relevant:
>
> [ 1.683564] [Firmware Bug]: CPU0: Topology domain 0 shift 1 != 5

Yes. That's absolutely related. I can see what goes wrong, but I have
absolutely no idea how that happens.

Can you please apply the debug patch below and provide the full dmesg
after boot?

Thanks,

tglx
---
--- a/arch/x86/kernel/cpu/topology_common.c
+++ b/arch/x86/kernel/cpu/topology_common.c
@@ -65,6 +65,7 @@ static void parse_legacy(struct topo_sca
 		cores <<= smt_shift;
 	}

+	pr_info("Legacy: %u %u %u\n", c->cpuid_level, smt_shift, core_shift);
 	topology_set_dom(tscan, TOPO_SMT_DOMAIN, smt_shift, 1U << smt_shift);
 	topology_set_dom(tscan, TOPO_CORE_DOMAIN, core_shift, cores);
 }
--- a/arch/x86/kernel/cpu/topology_ext.c
+++ b/arch/x86/kernel/cpu/topology_ext.c
@@ -72,6 +72,9 @@ static inline bool topo_subleaf(struct t

 	cpuid_subleaf(leaf, subleaf, &sl);

+	pr_info("L:%0x %0x %0x S:%u N:%u T:%u\n", leaf, subleaf, sl.level, sl.x2apic_shift,
+		sl.num_processors, sl.type);
+
 	if (!sl.num_processors || sl.type == INVALID_TYPE)
 		return false;

@@ -97,6 +100,7 @@ static inline bool topo_subleaf(struct t
 			     leaf, subleaf, tscan->c->topo.initial_apicid, sl.x2apic_id);
 	}

+	pr_info("D: %u\n", dom);
 	topology_set_dom(tscan, dom, sl.x2apic_shift, sl.num_processors);
 	return true;
 }
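
For anyone reading the resulting dmesg: the "L:... S:... N:... T:..." lines
printed by the patch above carry the fields of one CPUID topology subleaf.
The following is only a user-space sketch of how those fields could be
pulled out of the raw registers, assuming the documented CPUID leaf 0xb/0x1f
layout (EAX[4:0] shift, EBX[15:0] logical processor count, ECX[7:0] level,
ECX[15:8] type, EDX x2APIC ID). The struct and function names are
hypothetical and are not taken from the kernel source.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical mirror of the fields the debug pr_info() prints. */
struct topo_subleaf_fields {
	uint32_t x2apic_shift;   /* EAX[4:0]:  APIC ID shift to the next level */
	uint32_t num_processors; /* EBX[15:0]: logical processors at this level */
	uint32_t level;          /* ECX[7:0]:  echoes the subleaf number */
	uint32_t type;           /* ECX[15:8]: domain type, 0 means invalid */
	uint32_t x2apic_id;      /* EDX[31:0]: x2APIC ID of the current CPU */
};

/* Split the four raw CPUID output registers into the individual fields. */
static struct topo_subleaf_fields decode_topo_subleaf(uint32_t eax, uint32_t ebx,
						      uint32_t ecx, uint32_t edx)
{
	struct topo_subleaf_fields f;

	f.x2apic_shift   = eax & 0x1f;
	f.num_processors = ebx & 0xffff;
	f.level          = ecx & 0xff;
	f.type           = (ecx >> 8) & 0xff;
	f.x2apic_id      = edx;
	return f;
}

/* Render the fields in a form resembling the debug line of the patch. */
static int format_topo_subleaf(char *buf, size_t len, uint32_t leaf, uint32_t subleaf,
			       const struct topo_subleaf_fields *f)
{
	return snprintf(buf, len, "L:%x %x %x S:%u N:%u T:%u", leaf, subleaf,
			f->level, f->x2apic_shift, f->num_processors, f->type);
}
```

With made-up register values eax=0x1, ebx=0x2, ecx=0x100, edx=0x0 for leaf
0xb subleaf 0, the decoded fields are shift 1, two logical processors,
level 0 and type 1. These values are illustrative only and do not come from
the affected machine.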