Re: AMD topology broken on various 754/AM2+/AM3/AM3+ systems causes NB/EDAC/GART regression since 6.14

From: Yazen Ghannam
Date: Fri Oct 24 2025 - 17:32:56 EST


On Fri, Oct 24, 2025 at 08:46:58PM +0200, Michal Pecio wrote:
> Hi,
>
> This report is related to discussion here:
> https://lore.kernel.org/all/20251022011610.60d0ba6e.michal.pecio@xxxxxxxxx/
>
> Commit bc7b2e629e0c ("x86/amd_nb: Use topology info to get AMD node
> count") bails out if it can't find the NB of each node reported by
> topology. Then NB features like EDAC or GART IOMMU aren't available.
>
> Which was maybe not a bad idea, nobody expects those things to work
> on selected nodes only. (I think?) But it relies on the optimistic
> assumption that topology knows the true number of nodes.
>
> Today I tested 5 older AMD64 systems with socket 754/AM2+/AM3/AM3+
> on MSI/ASUS motherboards. *All* of them report more than one node if
> the CPU has fewer cores than supported by the BIOS.
>
> (I also have one AM4 system which is OK, but can't speak for others).
>
> This is due to a peculiarity of their MADT tables - they report as many
> LAPICs as the BIOS can support and excess LAPICs are simply disabled.
> FWIW, it's also a pattern that disabled APIC IDs have 0x80 bit set.
>
> The kernel counts this as "hotpluggable CPUs", since supposedly it's
> indistinguishable from actual multi-socket systems before ACPI 6.3,
> where the "online capable" flag was added to disambiguate hotplug and
> nonexistent but theoretically supported CPUs.
>
> Or at least that's what commit fed8d8773b8e ("x86/acpi/boot: Correct
> acpi_is_processor_usable() check") seems to imply.
>
> On pre-ACPI 6.3 systems those disabled LAPICs inflate topology size
> and result in breakage on recent kernels. A few examples below give
> an idea what those MADTs look like and how the kernel reads them.
>
> Regards,
> Michal
>
>
> Athlon 3000+ on S754:
>
> [02Fh 0047 001h] Local Apic ID : 00
> [030h 0048 004h] Flags (decoded below) : 00000001 # enabled
> --
> [037h 0055 001h] Local Apic ID : 81
> [038h 0056 004h] Flags (decoded below) : 00000000
>
> [ 0.027690] CPU topo: Max. logical packages: 2
> [ 0.027691] CPU topo: Max. logical dies: 2
> [ 0.027692] CPU topo: Max. dies per package: 1
> [ 0.027703] CPU topo: Max. threads per core: 1
> [ 0.027704] CPU topo: Num. cores per package: 1
> [ 0.027705] CPU topo: Num. threads per package: 1
> [ 0.027706] CPU topo: Allowing 1 present CPUs plus 1 hotplug CPUs
>
> Athlon II X2 250 on AM3+:
>
> [02Fh 0047 001h] Local Apic ID : 00
> [030h 0048 004h] Flags (decoded below) : 00000001 # enabled
> --
> [037h 0055 001h] Local Apic ID : 01
> [038h 0056 004h] Flags (decoded below) : 00000001 # enabled
> --
> [03Fh 0063 001h] Local Apic ID : 82
> [040h 0064 004h] Flags (decoded below) : 00000000
> --
> [047h 0071 001h] Local Apic ID : 83
> [048h 0072 004h] Flags (decoded below) : 00000000
> --
> [04Fh 0079 001h] Local Apic ID : 84
> [050h 0080 004h] Flags (decoded below) : 00000000
> --
> [057h 0087 001h] Local Apic ID : 85
> [058h 0088 004h] Flags (decoded below) : 00000000
> --
> [05Fh 0095 001h] Local Apic ID : 86
> [060h 0096 004h] Flags (decoded below) : 00000000
> --
> [067h 0103 001h] Local Apic ID : 87
> [068h 0104 004h] Flags (decoded below) : 00000000
>
> [ 0.147372] CPU topo: Max. logical packages: 3 # not sure why not 4
> [ 0.147372] CPU topo: Max. logical dies: 3
> [ 0.147373] CPU topo: Max. dies per package: 1
> [ 0.147379] CPU topo: Max. threads per core: 1
> [ 0.147379] CPU topo: Num. cores per package: 2
> [ 0.147380] CPU topo: Num. threads per package: 2
> [ 0.147381] CPU topo: Allowing 2 present CPUs plus 6 hotplug CPUs
>
> Phenom II X4 965 on AM3:
>
> [02Fh 0047 1] Local Apic ID : 00
> [030h 0048 4] Flags (decoded below) : 00000001 # enabled
> --
> [037h 0055 1] Local Apic ID : 01
> [038h 0056 4] Flags (decoded below) : 00000001 # enabled
> --
> [03Fh 0063 1] Local Apic ID : 02
> [040h 0064 4] Flags (decoded below) : 00000001 # enabled
> --
> [047h 0071 1] Local Apic ID : 03
> [048h 0072 4] Flags (decoded below) : 00000001 # enabled
> --
> [04Fh 0079 1] Local Apic ID : 84
> [050h 0080 4] Flags (decoded below) : 00000000
> --
> [057h 0087 1] Local Apic ID : 85
> [058h 0088 4] Flags (decoded below) : 00000000
>
> [ 0.072112] CPU topo: Max. logical packages: 2
> [ 0.072112] CPU topo: Max. logical dies: 2
> [ 0.072113] CPU topo: Max. dies per package: 1
> [ 0.072118] CPU topo: Max. threads per core: 1
> [ 0.072118] CPU topo: Num. cores per package: 4
> [ 0.072119] CPU topo: Num. threads per package: 4
> [ 0.072120] CPU topo: Allowing 4 present CPUs plus 2 hotplug CPUs
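
As a rough cross-check of the dumps above, here is a minimal Python
sketch (not part of any patch) that tallies enabled and disabled LAPIC
entries from an iasl-style MADT dump. It assumes the line layout shown
in the quoted examples; the names parse_lapics, summarize and SAMPLE are
made up for illustration. Bit 0 of the LAPIC flags is the "enabled" bit.

```python
import re

# Field patterns as they appear in the quoted iasl-style dumps (assumed layout).
APIC_ID_RE = re.compile(r"Local Apic ID\s*:\s*([0-9A-Fa-f]+)")
FLAGS_RE = re.compile(r"Flags \(decoded below\)\s*:\s*([0-9A-Fa-f]+)")

# Hypothetical sample: the Phenom II X4 965 entries from the report.
SAMPLE = """\
[02Fh 0047 1] Local Apic ID : 00
[030h 0048 4] Flags (decoded below) : 00000001
[037h 0055 1] Local Apic ID : 01
[038h 0056 4] Flags (decoded below) : 00000001
[03Fh 0063 1] Local Apic ID : 02
[040h 0064 4] Flags (decoded below) : 00000001
[047h 0071 1] Local Apic ID : 03
[048h 0072 4] Flags (decoded below) : 00000001
[04Fh 0079 1] Local Apic ID : 84
[050h 0080 4] Flags (decoded below) : 00000000
[057h 0087 1] Local Apic ID : 85
[058h 0088 4] Flags (decoded below) : 00000000
"""


def parse_lapics(dump):
    """Return a list of (apic_id, enabled) tuples found in the dump text."""
    entries = []
    pending_id = None
    for line in dump.splitlines():
        match = APIC_ID_RE.search(line)
        if match:
            pending_id = int(match.group(1), 16)
            continue
        match = FLAGS_RE.search(line)
        # Only pair a flags line with a preceding Local Apic ID line, so
        # flags fields of other MADT subtables are ignored.
        if match and pending_id is not None:
            flags = int(match.group(1), 16)
            entries.append((pending_id, bool(flags & 0x1)))
            pending_id = None
    return entries


def summarize(entries):
    """Count enabled vs. disabled LAPICs and check the 0x80 ID pattern."""
    disabled_ids = [apic_id for apic_id, enabled in entries if not enabled]
    return {
        "present": len(entries) - len(disabled_ids),
        "hotplug": len(disabled_ids),
        # True when every disabled entry has bit 7 (0x80) set in its APIC ID;
        # also True when there are no disabled entries at all.
        "disabled_have_bit7": all(apic_id & 0x80 for apic_id in disabled_ids),
    }


if __name__ == "__main__":
    print(summarize(parse_lapics(SAMPLE)))
```

For the Phenom II X4 965 sample this gives 4 present and 2 disabled
entries, which lines up with the "Allowing 4 present CPUs plus 2 hotplug
CPUs" line in the quoted log, and both disabled IDs have 0x80 set.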

So far, I think the way to go is to add an explicit quirk for known issues.

Please see the patch below.

Thanks,
Yazen