Re: [RFC PATCH] x86/acpi: Ignore invalid x2APIC entries

From: Jim Mattson
Date: Mon Oct 14 2024 - 14:00:41 EST


On Mon, Oct 14, 2024 at 6:05 AM Zhang, Rui <rui.zhang@xxxxxxxxx> wrote:
>
> > > >
> > > > TBH, I'm not sure that there is actually anything wrong with the
> > > > new
> > > > numbering scheme.
> > > > The topology is reported correctly (e.g. in
> > > > /sys/devices/system/cpu/cpu0/topology/thread_siblings_list). Yet,
> > > > the
> > > > new enumeration does seem to contradict user expectations.
> > > >
> > >
> > > Well, we can say this is a violation of the ACPI spec.
> > > "OSPM should initialize processors in the order that they appear in
> > > the
> > > MADT." even for interleaved LAPIC and X2APIC entries.
> >
> > Ah. Thanks. I didn't know that.
> >
> > > Maybe we need two steps for LAPIC/X2APIC parsing.
> > > 1. check if there is valid LAPIC entry by going through all LAPIC
> > > entries first
> > > 2. parse LAPIC/X2APIC strictly following the order in MADT. (like
> > > we do
> > > before)
> >
> > That makes sense to me.
> >
> > Thanks,
> >
> > --jim
>
> Hi, Jim,
>
> Please check if below patch restores the CPU IDs or not.
>
> thanks,
> rui
>
> From ec786dfe693cad2810b54b0d8afbfc7e4c4b3f8a Mon Sep 17 00:00:00 2001
> From: Zhang Rui <rui.zhang@xxxxxxxxx>
> Date: Mon, 14 Oct 2024 13:26:55 +0800
> Subject: [PATCH] x86/acpi: Fix LAPIC/x2APIC parsing order
>
> On some systems, the same CPU (with same APIC ID) is assigned with a
> different logical CPU id after commit ec9aedb2aa1a ("x86/acpi: Ignore
> invalid x2APIC entries").
>
> This means Linux enumerates the CPUs in a different order and it is a
> violation of https://uefi.org/specs/ACPI/6.5/05_ACPI_Software_Programming_Model.html#madt-processor-local-apic-sapic-structure-entry-order,
>
> "OSPM should initialize processors in the order that they appear in
> the MADT"
>
> The offending commit wants to ignore x2APIC entries with APIC ID < 255
> when valid LAPIC entries exist, so it parses all LAPIC entries before
> parsing any x2APIC entries. This breaks the CPU enumeration order for
> systems that have x2APIC entries listed before LAPIC entries in MADT.
>
> Fix the problem by checking the valid LAPIC entries separately, before
> parsing any LAPIC/x2APIC entries.
>
> Cc: stable@xxxxxxxxxxxxxxx
> Reported-by: Jim Mattson <jmattson@xxxxxxxxxx>
> Closes: https://lore.kernel.org/all/20241010213136.668672-1-jmattson@xxxxxxxxxx/
> Fixes: ec9aedb2aa1a ("x86/acpi: Ignore invalid x2APIC entries")
> Signed-off-by: Zhang Rui <rui.zhang@xxxxxxxxx>
> ---
> arch/x86/kernel/acpi/boot.c | 50 +++++++++++++++++++++++++++++++++----
> 1 file changed, 45 insertions(+), 5 deletions(-)
>
> diff --git a/arch/x86/kernel/acpi/boot.c b/arch/x86/kernel/acpi/boot.c
> index 4efecac49863..c70b86f1f295 100644
> --- a/arch/x86/kernel/acpi/boot.c
> +++ b/arch/x86/kernel/acpi/boot.c
> @@ -226,6 +226,28 @@ acpi_parse_x2apic(union acpi_subtable_headers *header, const unsigned long end)
> return 0;
> }
>
> +static int __init
> +acpi_check_lapic(union acpi_subtable_headers *header, const unsigned long end)
> +{
> + struct acpi_madt_local_apic *processor = NULL;
> +
> + processor = (struct acpi_madt_local_apic *)header;
> +
> + if (BAD_MADT_ENTRY(processor, end))
> + return -EINVAL;
> +
> + /* Ignore invalid ID */
> + if (processor->id == 0xff)
> + return 0;
> +
> + /* Ignore processors that can not be onlined */
> + if (!acpi_is_processor_usable(processor->lapic_flags))
> + return 0;
> +
> + has_lapic_cpus = true;
> + return 0;
> +}
> +
> static int __init
> acpi_parse_lapic(union acpi_subtable_headers * header, const unsigned long end)
> {
> @@ -257,7 +279,6 @@ acpi_parse_lapic(union acpi_subtable_headers * header, const unsigned long end)
> processor->processor_id, /* ACPI ID */
> processor->lapic_flags & ACPI_MADT_ENABLED);
>
> - has_lapic_cpus = true;
> return 0;
> }
>
> @@ -1029,6 +1050,8 @@ static int __init early_acpi_parse_madt_lapic_addr_ovr(void)
> static int __init acpi_parse_madt_lapic_entries(void)
> {
> int count, x2count = 0;
> + struct acpi_subtable_proc madt_proc[2];
> + int ret;
>
> if (!boot_cpu_has(X86_FEATURE_APIC))
> return -ENODEV;
> @@ -1037,10 +1060,27 @@ static int __init acpi_parse_madt_lapic_entries(void)
> acpi_parse_sapic, MAX_LOCAL_APIC);
>
> if (!count) {
> - count = acpi_table_parse_madt(ACPI_MADT_TYPE_LOCAL_APIC,
> - acpi_parse_lapic, MAX_LOCAL_APIC);
> - x2count = acpi_table_parse_madt(ACPI_MADT_TYPE_LOCAL_X2APIC,
> - acpi_parse_x2apic, MAX_LOCAL_APIC);

The point is moot now, but I don't think the previous code did the
right thing when acpi_table_parse_madt() returned a negative value
(for errors).

> + /* Check if there are valid LAPIC entries */
> + acpi_table_parse_madt(ACPI_MADT_TYPE_LOCAL_APIC, acpi_check_lapic, MAX_LOCAL_APIC);

Two comments:

1) Should we check for a return value < 0 here, or just wait for one
of the later walks to error out?
2) It seems unfortunate to walk the entire table when the first entry
may give you the answer, but perhaps modern systems have only X2APIC
entries, so we will typically have to walk the entire table anyway.

Reviewed-and-tested-by: Jim Mattson <jmattson@xxxxxxxxxx>