Re: [PATCH] x86/cpuid: Deal with broken firmware once more

From: Charles (Chas) Williams
Date: Wed Nov 09 2016 - 13:16:39 EST


On 11/09/2016 10:35 AM, Thomas Gleixner wrote:
> Both ACPI and MP specifications require that the APIC id in the respective
> tables must be the same as the APIC id in CPUID.
>
> The kernel retrieves the physical package id from the APIC id during the
> ACPI/MP table scan and builds the physical to logical package map.
>
> There exist Virtualbox and Xen implementations which violate the spec. As a
> result the physical to logical package map, which relies on the ACPI/MP
> tables does not work on those systems, because the CPUID initialized
> physical package id does not match the firmware id. This causes system
> crashes and malfunction due to invalid package mappings.
>
> The only way to cure this is to sanitize the physical package id after the
> CPUID enumeration and yell when the APIC ids are different. If the physical
> package IDs differ use the package information from the ACPI/MP tables so
> the existing logical package map just works.
>
> Reported-by: "Charles (Chas) Williams" <ciwillia@xxxxxxxxxxx>,
> Reported-by: M. Vefa Bicakci <m.v.b@xxxxxxxxxx>
> Signed-off-by: Thomas Gleixner <tglx@xxxxxxxxxxxxx>


For a VM with 4 virtual sockets and 1 core per socket, the fixup triggers:

[ 0.235459] .... node #0, CPUs: #1
[ 0.238579] Disabled fast string operations
[ 0.238620] mce: CPU supports 0 MCE banks
[ 0.238864] [Firmware Bug]: CPU1: APIC id mismatch. Firmware: 1 CPUID: 2
[ 0.238878] [Firmware Bug]: CPU1: Using firmware package id 1 instead of 2
[ 0.239502] #2
[ 0.241298] Disabled fast string operations
[ 0.241356] mce: CPU supports 0 MCE banks
[ 0.241429] [Firmware Bug]: CPU2: APIC id mismatch. Firmware: 2 CPUID: 4
[ 0.241431] [Firmware Bug]: CPU2: Using firmware package id 2 instead of 4
[ 0.241631] #3
[ 0.244075] Disabled fast string operations
[ 0.244112] mce: CPU supports 0 MCE banks
[ 0.244284] [Firmware Bug]: CPU3: APIC id mismatch. Firmware: 3 CPUID: 6
[ 0.244293] [Firmware Bug]: CPU3: Using firmware package id 3 instead of 6
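
To make the log above easier to read, here is a rough user-space sketch of
the kind of check that produces these two messages. It is not the code from
the patch: struct cpu_ids, sanitize_package_id() and the core_bits parameter
are names made up for illustration, and deriving the package id as
"APIC id >> core_bits" is an assumption about the layout.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for the per-CPU topology data filled in from CPUID. */
struct cpu_ids {
	unsigned int apicid;      /* APIC id as reported by CPUID */
	unsigned int phys_pkg_id; /* physical package id derived from it */
};

/*
 * Compare the firmware (ACPI/MP table) APIC id with the CPUID one and, if
 * the derived package ids differ, adopt the firmware package id.
 * core_bits is the assumed number of APIC id bits below the package field.
 * Returns 1 if the package id was overridden, 0 otherwise.
 */
static int sanitize_package_id(unsigned int cpu, unsigned int fw_apicid,
			       unsigned int core_bits, struct cpu_ids *c)
{
	unsigned int fw_pkg;

	assert(core_bits < 32);
	fw_pkg = fw_apicid >> core_bits;

	/* Yell when the two sources disagree about the APIC id. */
	if (fw_apicid != c->apicid)
		printf("[Firmware Bug]: CPU%u: APIC id mismatch. Firmware: %u CPUID: %u\n",
		       cpu, fw_apicid, c->apicid);

	if (fw_pkg == c->phys_pkg_id)
		return 0;

	/* Trust the firmware tables so the existing package map keeps working. */
	printf("[Firmware Bug]: CPU%u: Using firmware package id %u instead of %u\n",
	       cpu, fw_pkg, c->phys_pkg_id);
	c->phys_pkg_id = fw_pkg;
	return 1;
}
```

Called with the CPU1 values from the log (firmware APIC id 1, CPUID APIC id
and package id 2, core_bits 0), the sketch prints the same pair of messages
and leaves the package id at 1.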

For a VM with 2 virtual sockets and 2 cores per socket, VMware seems to get
its APIC table right, as this fixup code isn't triggered. The mapping looks
like:

[ 0.028911] topology_update_package_map: apicid 0 pkg 0 cpu 0
[ 0.029068] smpboot: APIC(0) Converting physical 0 to logical package 0, cpu 0
[ 0.029072] topology_update_package_map: apicid 1 pkg 0 cpu 1
[ 0.029220] topology_update_package_map: apicid 2 pkg 1 cpu 2
[ 0.029376] smpboot: APIC(2) Converting physical 1 to logical package 1, cpu 2
[ 0.029381] topology_update_package_map: apicid 3 pkg 1 cpu 3
[ 0.029525] smpboot: Max logical packages: 2
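
The physical to logical package conversion shown in this log can be sketched
in user space roughly as follows. This is only an illustration, not the
kernel's implementation: package_map_init(), package_map_update(), the
MAX_PHYS_PKGS limit and the "APIC id >> core_bits" derivation are all
assumptions made for the example.

```c
#include <assert.h>
#include <stdio.h>

#define MAX_PHYS_PKGS 16 /* arbitrary limit for this sketch */

static int phys_to_logical[MAX_PHYS_PKGS];
static unsigned int logical_packages;

/* Mark every physical package as not yet seen. */
static void package_map_init(void)
{
	int i;

	for (i = 0; i < MAX_PHYS_PKGS; i++)
		phys_to_logical[i] = -1;
	logical_packages = 0;
}

/*
 * Derive the physical package from the APIC id and return its logical
 * package number, handing out the next free number on first sight.
 */
static int package_map_update(unsigned int apicid, unsigned int core_bits,
			      unsigned int cpu)
{
	unsigned int pkg;

	assert(core_bits < 32);
	pkg = apicid >> core_bits;
	assert(pkg < MAX_PHYS_PKGS);

	if (phys_to_logical[pkg] < 0) {
		phys_to_logical[pkg] = (int)logical_packages++;
		printf("APIC(%u) Converting physical %u to logical package %d, cpu %u\n",
		       apicid, pkg, phys_to_logical[pkg], cpu);
	}
	return phys_to_logical[pkg];
}
```

Feeding APIC ids 0 through 3 with core_bits set to 1 reproduces the layout
in the log: CPUs 0 and 1 land in logical package 0, CPUs 2 and 3 in logical
package 1, for two logical packages in total.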

For a VM with 1 virtual socket and 4 cores, the behavior is again correct:

[ 0.016198] topology_update_package_map: apicid 0 pkg 0 cpu 0
[ 0.016271] smpboot: APIC(0) Converting physical 0 to logical package 0, cpu 0 (ffff88023fc0a040)
[ 0.016273] topology_update_package_map: apicid 1 pkg 0 cpu 1
[ 0.016336] topology_update_package_map: apicid 2 pkg 0 cpu 2
[ 0.016397] topology_update_package_map: apicid 3 pkg 0 cpu 3

It looks like VMware might have some assumption about the minimum number
of cores on a virtual socket: in the 1-core-per-socket case above, each
CPUID APIC id is exactly twice the firmware one. Regardless, the fix
solves my problem!

Thanks!