Re: [tip: x86/urgent] x86/acpi: Ignore invalid x2APIC entries
From: Thomas Gleixner
Date: Tue Dec 12 2023 - 12:34:57 EST
On Thu, Nov 23 2023 at 12:50, Rui Zhang wrote:
> On Wed, 2023-11-22 at 22:19 +0000, John Sperbeck wrote:
>> I have a platform with both LOCAL_APIC and LOCAL_X2APIC entries for
>> each CPU. However, the ids for the LOCAL_APIC entries are all
>> invalid ids of 255, so they have always been skipped in
>> acpi_parse_lapic() by this code from f3bf1dbe64b6 ("x86/acpi: Prevent
>> LAPIC id 0xff from being accounted"):
>>
>>	/* Ignore invalid ID */
>>	if (processor->id == 0xff)
>>		return 0;
>>
>> With the change in this thread, the return value of 0 means that the
>> 'count' variable in acpi_parse_entries_array() is incremented. The
>> positive return value means that 'has_lapic_cpus' is set, even though
>> no entries were actually matched.
>
> So in acpi_parse_madt_lapic_entries, without this patch,
> madt_proc[0].count is a positive value on this platform, right?
>
> This sounds like a potential issue because the following checks to fall
> back to MPS mode can also break. (If all LOCAL_APIC entries have
> apic_id 0xff and all LOCAL_X2APIC entries have apic_id 0xffffffff)
>
>> Then, when the MADT is iterated
>> with acpi_parse_x2apic(), the x2apic entries with ids less than 255
>> are skipped and most of my CPUs aren't recognized.
>>
>> I think the original version of this change was okay for this case in
>> https://lore.kernel.org/lkml/87pm4bp54z.ffs@tglx/T/
>
> Yeah.
>
> But if we want to fix the potential issue above, we need to do
> something more.
>
> Say we can still use acpi_table_parse_entries_array() and convert
> acpi_parse_lapic()/acpi_parse_x2apic() to
> acpi_subtable_proc.handler_arg and save the real valid entries via the
> parameter.

Nah.

> or can we just use num_processors & disabled_cpus to check if there is
> any CPU probed when parsing LOCAL_APIC/LOCAL_X2APIC entries?

No, we are not going to do that because that's just a proliferation of
boundary violations.

Let ACPI deal with its own problems and not depend on something which
is subject to change.

The simple change below should do the trick.

Thanks,

        tglx
---
diff --git a/arch/x86/kernel/acpi/boot.c b/arch/x86/kernel/acpi/boot.c
index 1a0dd80d81ac..85a3ce2a3666 100644
--- a/arch/x86/kernel/acpi/boot.c
+++ b/arch/x86/kernel/acpi/boot.c
@@ -293,6 +293,7 @@ acpi_parse_lapic(union acpi_subtable_headers * header, const unsigned long end)
 			    processor->processor_id, /* ACPI ID */
 			    processor->lapic_flags & ACPI_MADT_ENABLED);
 
+	has_lapic_cpus = true;
 	return 0;
 }
 
@@ -1134,7 +1135,6 @@ static int __init acpi_parse_madt_lapic_entries(void)
 	if (!count) {
 		count = acpi_table_parse_madt(ACPI_MADT_TYPE_LOCAL_APIC,
 					acpi_parse_lapic, MAX_LOCAL_APIC);
-		has_lapic_cpus = count > 0;
 		x2count = acpi_table_parse_madt(ACPI_MADT_TYPE_LOCAL_X2APIC,
 					acpi_parse_x2apic, MAX_LOCAL_APIC);
 	}
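
For illustration only, here is a stand-alone userspace model of why
deriving has_lapic_cpus from the entry count goes wrong on the reported
platform. It is a sketch, not kernel code: struct model_lapic,
model_parse_lapic(), model_parse_table() and model_has_lapic_cpus are
made-up names, and the table walker is reduced to the one property that
matters here, namely that every handler return of 0 bumps the count,
whether or not the entry was actually registered.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a MADT LOCAL_APIC entry. */
struct model_lapic {
	uint8_t id;
	bool enabled;
};

/* Models has_lapic_cpus with the change applied: set by the handler. */
static bool model_has_lapic_cpus;

/* Models acpi_parse_lapic(): returns 0 for skipped and registered entries. */
static int model_parse_lapic(const struct model_lapic *e)
{
	/* Ignore invalid ID: nothing is registered, but 0 is still returned */
	if (e->id == 0xff)
		return 0;

	/* A real CPU would be registered here; only then is the flag set */
	model_has_lapic_cpus = true;
	return 0;
}

/* Models the table walk: the count grows on every handler return of 0. */
static int model_parse_table(const struct model_lapic *tbl, int n)
{
	int i, count = 0;

	model_has_lapic_cpus = false;
	for (i = 0; i < n; i++) {
		if (model_parse_lapic(&tbl[i]) == 0)
			count++;
	}
	return count;
}
```

With a table in which every entry has id 0xff, model_parse_table()
returns a positive count, so the old "count > 0" test would claim that
LAPIC CPUs exist, while model_has_lapic_cpus stays false.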