Re: [PATCH v3 06/12] x86/amd_nb: Use topology info to get AMD node count

From: Yazen Ghannam
Date: Fri Oct 24 2025 - 09:45:25 EST


On Fri, Oct 24, 2025 at 10:48:51AM +0200, Michal Pecio wrote:
> On Thu, 23 Oct 2025 15:09:01 -0400, Yazen Ghannam wrote:

[...]

>
> > Sorry for the rapid emails. Here's another interesting commit:
> > f0551af02130 ("x86/topology: Ignore non-present APIC IDs in a present package")
>
> I have this commit on 6.12 but it doesn't help.
>
> As I understand, APIC ID is a bitfield of the form:
>
> [package ID] ... [core ID] [thread ID]
>
> In my case, per debugfs:
>
> domain: Thread shift: 0 dom_size: 1 max_threads: 1
> domain: Core shift: 2 dom_size: 4 max_threads: 4
> domain: Module shift: 2 dom_size: 1 max_threads: 4
> domain: Tile shift: 2 dom_size: 1 max_threads: 4
> domain: Die shift: 2 dom_size: 1 max_threads: 4
> domain: DieGrp shift: 2 dom_size: 1 max_threads: 4
> domain: Package shift: 2 dom_size: 1 max_threads: 4
>
> So my phantom APICs simply look like another package with a weird
> non-sequential ID. (Probably not an ACPI spec violation yet?)
>
> f0551af02130 only rejects disabled APICs in the same packages as
> enabled ones. An earlier proposal in that thread was to reject all
> disabled APICs on bare metal unless explicitly "online capable":
>
> https://lore.kernel.org/all/87sf15ugsz.ffs@tglx/
>
> This clearly goes against fed8d8773b8e and it seems to go against
> what you wrote about AMD BIOSes potentially marking CPUs as disabled
> in MADT and presumably allowing the OS to wake them up with ACPI?

Yes, that's right. It's not clear how this should be handled. :/
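
For illustration only, here is a rough Python sketch of the
[package ID] ... [core ID] [thread ID] split described above, using the
shift values from the quoted debugfs output. It assumes the simplified
model that everything above the Core shift is the package ID; it is not
the kernel's actual decoding code. The function name and the 0x10
"phantom" ID are made up for the example, not taken from the report.

```python
# Shift values copied from the quoted debugfs output.
THREAD_SHIFT = 0    # "Thread shift: 0": no SMT bits on this system
CORE_SHIFT = 2      # "Core shift: 2": bits below this hold core + thread


def split_apic_id(apic_id):
    """Split an APIC ID into (package, core, thread).

    Simplified [package][core][thread] model (an assumption for this
    sketch), not the kernel's implementation.
    """
    thread = apic_id & ((1 << THREAD_SHIFT) - 1)
    core = (apic_id & ((1 << CORE_SHIFT) - 1)) >> THREAD_SHIFT
    package = apic_id >> CORE_SHIFT
    return package, core, thread


# 0x0 and 0x3 model two of the four real cores; 0x10 is a hypothetical
# phantom APIC ID chosen only to show the effect.
for apic_id in (0x0, 0x3, 0x10):
    print(hex(apic_id), split_apic_id(apic_id))
```

Under that model, any APIC ID of 4 or higher lands in a non-zero
package, which would match the "another package" observation.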

>
> You asked elsewhere what happens if I online CPU5/6. I don't have
> directories for them in /sys/, so not sure if I need any extra steps
> to make them appear, or the kernel considers those CPUs bogus for
> some reason and amd_nb could do the same?
>
> Bitmaps from /sys/:
> /sys/devices/system/cpu/enabled:0-3
> /sys/devices/system/cpu/kernel_max:5
> /sys/devices/system/cpu/offline:4-5
> /sys/devices/system/cpu/online:0-3
> /sys/devices/system/cpu/possible:0-5
> /sys/devices/system/cpu/present:0-3

Right, good question. Why bother marking some CPUs as "possible" if we
can't bring them online?
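
As a quick way to see which CPUs are in that state, here is a small
Python sketch that parses the cpulist strings from the quoted sysfs
output and computes "possible but not present". The helper name
parse_cpu_list is made up for this example, and the values are
hard-coded from the report rather than read from /sys.

```python
def parse_cpu_list(text):
    """Parse a sysfs cpulist string such as "0-3" or "0,2-5" into a set."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue  # an empty file means an empty set
        lo, _, hi = part.partition("-")
        cpus.update(range(int(lo), int(hi or lo) + 1))
    return cpus


# Values taken from the quoted /sys/devices/system/cpu/ files.
possible = parse_cpu_list("0-5")
present = parse_cpu_list("0-3")

# CPUs the kernel reserved IDs for but that never showed up as present.
print(sorted(possible - present))  # [4, 5]
```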

>
> I tried 6.18-rc2 and it's the same thing, except EDAC and GART don't work.
> On both kernels, possible_cpus=4 fixes it:
>
> [ 0.072066] CPU topo: Limiting to 4 possible CPUs
> [ 0.072074] CPU topo: CPU limit of 4 reached. Ignoring further CPUs
> [ 0.072082] IOAPIC[0]: apic_id 4, version 33, address 0xfec00000, GSI 0-23
> [ 0.072084] ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
> [ 0.072086] ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 low level)
> [ 0.072089] ACPI: Using ACPI (MADT) for SMP configuration information
> [ 0.072090] ACPI: HPET id: 0x8300 base: 0xfed00000
> [ 0.072097] CPU topo: Max. logical packages: 1
> [ 0.072097] CPU topo: Max. logical dies: 1
> [ 0.072098] CPU topo: Max. dies per package: 1
> [ 0.072103] CPU topo: Max. threads per core: 1
> [ 0.072105] CPU topo: Num. cores per package: 4
> [ 0.072105] CPU topo: Num. threads per package: 4
> [ 0.072106] CPU topo: Allowing 4 present CPUs plus 0 hotplug CPUs
> [ 0.072107] CPU topo: Rejected CPUs 2

Thanks for checking this.

By the way, have you looked through your BIOS settings to see if there's
something relevant? Maybe there's an option to remove the
bogus/placeholder APIC entries?

Here's the K10 BKDG for reference:
https://www.amd.com/content/dam/amd/en/documents/archived-tech-docs/programmer-references/31116.pdf

The "CPU Cores and Downcoring" section has some explicit restrictions on
what is possible. So maybe something there can be used to filter out
bogus CPU entries.

Thanks,
Yazen