Re: [tip: x86/urgent] x86/acpi: Ignore invalid x2APIC entries

From: John Sperbeck
Date: Fri Dec 01 2023 - 18:32:49 EST


On Fri, Dec 1, 2023 at 12:32 AM Zhang, Rui <rui.zhang@xxxxxxxxx> wrote:
>
> On Thu, 2023-11-23 at 20:50 +0800, Exchange Online wrote:
> > Hi, John,
> >
> > Thanks for catching this issue.
> >
> > On Wed, 2023-11-22 at 22:19 +0000, John Sperbeck wrote:
> > > I have a platform with both LOCAL_APIC and LOCAL_X2APIC entries for
> > > each CPU. However, the ids for the LOCAL_APIC entries are all
> > > invalid ids of 255, so they have always been skipped in
> > > acpi_parse_lapic()
> > > by this code from f3bf1dbe64b6 ("x86/acpi: Prevent LAPIC id 0xff
> > > from
> > > being
> > > accounted"):
> > >
> > > /* Ignore invalid ID */
> > > if (processor->id == 0xff)
> > > return 0;
> > >
> > > With the change in this thread, the return value of 0 means that
> > > the
> > > 'count' variable in acpi_parse_entries_array() is incremented. The
> > > positive return value means that 'has_lapic_cpus' is set, even
> > > though
> > > no entries were actually matched.
> >
> > So in acpi_parse_madt_lapic_entries, without this patch,
> > madt_proc[0].count is a positive value on this platform, right?
> >
> > This sounds like a potential issue because the following checks to
> > fall
> > back to MPS mode can also break. (If all LOCAL_APIC entries have
> > apic_id 0xff and all LOCAL_X2APIC entries have apic_id 0xffffffff)
> >
> > > Then, when the MADT is iterated
> > > with acpi_parse_x2apic(), the x2apic entries with ids less than 255
> > > are skipped and most of my CPUs aren't recognized.
> > >
> > > I think the original version of this change was okay for this case
> > > in
> > > https://lore.kernel.org/lkml/87pm4bp54z.ffs@tglx/T/
> >
> > Yeah.
> >
> > But if we want to fix the potential issue above, we need to do
> > something more.
> >
> > Say we can still use acpi_table_parse_entries_array() and convert
> > acpi_parse_lapic()/acpi_parse_x2apic() to
> > acpi_subtable_proc.handler_arg and save the real valid entries via
> > the
> > parameter.
> >
> > or can we just use num_processors & disabled_cpus to check if there
> > is
> > any CPU probed when parsing LOCAL_APIC/LOCAL_X2APIC entires?
> >
>
> Hi, John,
>
> As a quick fix, I'm not going to fix the "potential issue" describes
> above because we have not seen a real problem caused by this yet.
>
> Can you please try the below patch to confirm if the problem is gone on
> your system?
> This patch falls back to the previous way as sent at
> https://lore.kernel.org/lkml/87pm4bp54z.ffs@tglx/T/
>
> thanks,
> rui
>
> From bdb45e241b4fea8a12b958e490979e96b064e43d Mon Sep 17 00:00:00 2001
> From: Zhang Rui <rui.zhang@xxxxxxxxx>
> Date: Fri, 1 Dec 2023 15:06:34 +0800
> Subject: [PATCH] x86/acpi: Do strict X2APIC ID check only when an enabled CPU
> is enumerated via LAPIC
>
> Commit 8e9c42d776d6 ("x86/acpi: Ignore invalid x2APIC entries") does
> strict X2APIC ID check if LAPIC contains valid CPUs by checking the
> acpi_table_parse_madt() return value.
>
> This is wrong because acpi_table_parse_madt() return value only
> represents the number of legal entries parsed. For example, LAPIC entry
> with LAPIC ID 0xff is counted as a legal entry, but it doesn't describe
> a valid CPU.
>
> This causes issues on a system which has 0xff LAPIC ID in all LAPIC
> entries. Because the code does strict X2APIC IDs check and ignores most
> of the CPUs in X2APIC entries.
>
> Fix the problem by doing strict X2APIC ID check less aggressively, say
> only when an enabled CPU is enumerated via LAPIC.
>
> Fixes: 8e9c42d776d6 ("x86/acpi: Ignore invalid x2APIC entries")
> Link: https://lore.kernel.org/all/20231122221947.781812-1-jsperbeck@xxxxxxxxxx/
> Reported-by: John Sperbeck <jsperbeck@xxxxxxxxxx>
> Signed-off-by: Zhang Rui <rui.zhang@xxxxxxxxx>
> ---
> arch/x86/kernel/acpi/boot.c | 12 ++++++++----
> 1 file changed, 8 insertions(+), 4 deletions(-)
>
> diff --git a/arch/x86/kernel/acpi/boot.c b/arch/x86/kernel/acpi/boot.c
> index 1a0dd80d81ac..8cc566ce486a 100644
> --- a/arch/x86/kernel/acpi/boot.c
> +++ b/arch/x86/kernel/acpi/boot.c
> @@ -266,6 +266,7 @@ static int __init
> acpi_parse_lapic(union acpi_subtable_headers * header, const unsigned long end)
> {
> struct acpi_madt_local_apic *processor = NULL;
> + int cpu;
>
> processor = (struct acpi_madt_local_apic *)header;
>
> @@ -289,9 +290,13 @@ acpi_parse_lapic(union acpi_subtable_headers * header, const unsigned long end)
> * to not preallocating memory for all NR_CPUS
> * when we use CPU hotplug.
> */
> - acpi_register_lapic(processor->id, /* APIC ID */
> - processor->processor_id, /* ACPI ID */
> - processor->lapic_flags & ACPI_MADT_ENABLED);
> + cpu = acpi_register_lapic(processor->id, /* APIC ID */
> + processor->processor_id, /* ACPI ID */
> + processor->lapic_flags & ACPI_MADT_ENABLED);
> +
> + /* Do strict X2APIC ID check only when an enabled CPU is enumerated via LAPIC */
> + if (cpu >= 0 )
> + has_lapic_cpus = true;
>
> return 0;
> }
> @@ -1134,7 +1139,6 @@ static int __init acpi_parse_madt_lapic_entries(void)
> if (!count) {
> count = acpi_table_parse_madt(ACPI_MADT_TYPE_LOCAL_APIC,
> acpi_parse_lapic, MAX_LOCAL_APIC);
> - has_lapic_cpus = count > 0;
> x2count = acpi_table_parse_madt(ACPI_MADT_TYPE_LOCAL_X2APIC,
> acpi_parse_x2apic, MAX_LOCAL_APIC);
> }
> --
> 2.34.1
>

Yes, with that patch, the problem is gone on my system. All of the
CPUs are recognized and active.

Before the patch (only one CPU active):

# cat /proc/cpuinfo | grep '^processor' | wc -l
1

With the patch (all CPUs active):

oxco8:~# cat /proc/cpuinfo | grep '^processor' | wc -l
112