RE: [patch v2 21/38] x86/cpu: Provide cpu_init/parse_topology()

From: Michael Kelley (LINUX)
Date: Wed Aug 02 2023 - 10:43:33 EST


From: Thomas Gleixner <tglx@xxxxxxxxxxxxx> Sent: Tuesday, August 1, 2023 3:25 PM
>
> Michael!
>
> On Tue, Aug 01 2023 at 00:12, Thomas Gleixner wrote:
> > On Mon, Jul 31 2023 at 21:27, Michael Kelley wrote:
> > Clearly the hyper-v BIOS people put a lot of thoughts into this:
> >
> >> x2APIC features / processor topology (0xb):
> >> extended APIC ID = 0
> >> --- level 0 ---
> >> level number = 0x0 (0)
> >> level type = thread (1)
> >> bit width of level = 0x1 (1)
> >> number of logical processors at level = 0x2 (2)
> >> --- level 1 ---
> >> level number = 0x1 (1)
> >> level type = core (2)
> >> bit width of level = 0x6 (6)
> >> number of logical processors at level = 0x40 (64)
> >
> > FAIL: ^^^^^
> >
> > While that field is not meant for topology evaluation it is at least
> > expected to tell the actual number of logical processors at that level
> > which are actually available.
> >
> > The CPUID APIC ID aka initial_apicid clearly tells that the topology has
> > 36 logical CPUs in package 0 and 36 in package 1 according to your
> > table.
> >
> > On real hardware this looks like this:
> >
> > --- level 1 ---
> > level number = 0x1 (1)
> > level type = core (2)
> > bit width of level = 0x6 (6)
> > number of logical processors at level = 0x38 (56)
> >
> > Which corresponds to reality and is consistent. But sure, consistency is
> > overrated.
>
> So I looked really hard to find some hint how to detect this situation
> on the boot CPU, which allows us to mitigate it, but there is none at
> all.
>
> So we are caught between a rock and a hard place, which provides us two
> mutually exclusive options to chose from:
>
> 1) Have a sane topology evaluation mechanism which solves the known
> problems of hybrid systems, wrong sizing estimates and other
> unpleasantries.
>
> 2) Support the Hyper-V BIOS trainwreck forever.
>
> Unsurprisingly #2 is not really an option as #1 is a crucial issue for
> the kernel and we need it resolved urgently as of yesterday.
>
> So while I'm definitely a strong supporter of no-regression policy, I
> have to make an argument here why this particular issue is _not_
> covered:
>
> 1) Hyper-V BIOS/firmware violates the firmware specification and
> requirements which are clearly spelled out in the SDM.
>
> 2) This violatation is reported on every boot with one promiment
> message per brought up AP where the initial APIC ID as provided by
> CPUID leaf 0xB deviates from the APIC ID read from "hardware", which is
> also provided by MADT starting with CPU 36 in the provided example:
>
> "[FIRMWARE BUG] CPU36: APIC id mismatch. Firmware: 40 APIC: 24"
>
> repeating itself up to CPU71 with the relevant diverging APIC IDs.
>
> At least that's what the upstream kernel produces according to
> validate_apic_and_package_id() in such an situation.
>
> 3) This is known for years and the Hyper-V Linux team tried to get this
> resolved, but obviously their arguments fell on deaf ears.
>
> IOW, the firmware BUG message has been ignored willfully for years
> due to "works for me, why should I care?" attitude.
>
> Seriously, kernel development cannot be held hostage forever by the
> wilful ignorance of a BIOS team, which refuses to adhere to
> specifications and defines their own world order.
>
> The x86 maintainer team is chosing the lesser of two evils and lets
> those who created the problem and refused to resolve it deal with the
> outcome.

Fair enough. I don't have any basis to argue otherwise. I'm in
discussions with the Hyper-V team about getting it fully fixed in
Hyper-V, and it looks like there's some movement to make it happen.

>
> Just to clarify. This is not preventing affected guests from booting.
> The worst consequence is a slight performance regression because the
> firmware provided topology information is not matching reality and
> therefore the scheduler placement vs. L3 affinity sucks. That's clearly
> not a kernel problem.

Yes, if Linux will still boots and runs, that helps. Then it really is up the
(virtual) firmware in Hyper-V to provide the correct topology information
so performance is as expected.

>
> I'm happy to aid accelerating this thought process by elevating the
> existing pr_err(FW_BUG....) to a solid WARN_ON_ONCE(). See below.

Your choice. In this particular case, it won't make a difference either
way.

Michael

>
> Thanks,
>
> tglx
> ---
> --- a/arch/x86/kernel/cpu/common.c
> +++ b/arch/x86/kernel/cpu/common.c
> @@ -1688,7 +1688,7 @@ static void validate_apic_and_package_id
>
> apicid = apic->cpu_present_to_apicid(cpu);
>
> - if (apicid != c->topo.apicid) {
> + if (WARN_ON_ONCE(apicid != c->topo.apicid)) {
> pr_err(FW_BUG "CPU%u: APIC id mismatch. Firmware: %x APIC: %x\n",
> cpu, apicid, c->topo.initial_apicid);
> }