Re: [tip: x86/urgent] x86/acpi: Ignore invalid x2APIC entries

From: Zhang, Rui
Date: Wed Dec 06 2023 - 21:41:42 EST


Hi, Andres,

On Tue, 2023-12-05 at 22:58 -0800, Andres Freund wrote:
> Hi,
>
> On 2023-12-01 08:31:48 +0000, Zhang, Rui wrote:
> > As a quick fix, I'm not going to fix the "potential issue" described
> > above because we have not seen a real problem caused by this yet.
> >
> > Can you please try the patch below to confirm whether the problem is
> > gone on your system?
> > This patch falls back to the previous approach, as sent at
> > https://lore.kernel.org/lkml/87pm4bp54z.ffs@tglx/T/
>
>
> I've just spent a couple of hours bisecting why upgrading to 6.7-rc4
> left me with just a single CPU core on my dual-socket workstation.
>
>
> before:
> [    0.000000] Linux version 6.6.0-andres-00003-g31255e072b2e ...
> ...
> [    0.022960] ACPI: Using ACPI (MADT) for SMP configuration information
> ...
> [    0.022968] smpboot: Allowing 40 CPUs, 0 hotplug CPUs
> ...
> [    0.345921] smpboot: CPU0: Intel(R) Xeon(R) Gold 5215 CPU @ 2.50GHz (family: 0x6, model: 0x55, stepping: 0x7)
> ...
> [    0.347229] .... node  #0, CPUs:        #1  #2  #3  #4  #5  #6  #7  #8  #9
> [    0.349082] .... node  #1, CPUs:   #10 #11 #12 #13 #14 #15 #16 #17 #18 #19
> [    0.003190] smpboot: CPU 10 Converting physical 0 to logical die 1
>
> [    0.361053] .... node  #0, CPUs:   #20 #21 #22 #23 #24 #25 #26 #27 #28 #29
> [    0.363990] .... node  #1, CPUs:   #30 #31 #32 #33 #34 #35 #36 #37 #38 #39
> ...
> [    0.370886] smp: Brought up 2 nodes, 40 CPUs
> [    0.370891] smpboot: Max logical packages: 2
> [    0.370896] smpboot: Total of 40 processors activated (200000.00 BogoMIPS)
> [    0.403905] node 0 deferred pages initialised in 32ms
> [    0.408865] node 1 deferred pages initialised in 37ms
>
>
> after:
> [    0.000000] Linux version 6.6.0-andres-00004-gec9aedb2aa1a ...
> ...
> [    0.022935] ACPI: Using ACPI (MADT) for SMP configuration information
> ...
> [    0.022942] smpboot: Allowing 1 CPUs, 0 hotplug CPUs
> ...
> [    0.356424] smpboot: CPU0: Intel(R) Xeon(R) Gold 5215 CPU @ 2.50GHz (family: 0x6, model: 0x55, stepping: 0x7)
> ...
> [    0.357098] smp: Bringing up secondary CPUs ...
> [    0.357107] smp: Brought up 2 nodes, 1 CPU
> [    0.357108] smpboot: Max logical packages: 1
> [    0.357110] smpboot: Total of 1 processors activated (5000.00 BogoMIPS)
> [    0.726283] node 0 deferred pages initialised in 368ms
> [    0.774704] node 1 deferred pages initialised in 418ms
>
>
> There does seem to be something off with the ACPI data, when booting
> without the patch,

Which patch are you referring to? The original patch in this thread?

Does the second patch fix the problem? I mean the patch at
https://lore.kernel.org/all/904ce2b870b8a7f34114f93adc7c8170420869d1.camel@xxxxxxxxx/
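
To compare a boot with and without that patch quickly, something along
these lines may help. It is only a minimal sketch: the function names
(cpu_summary, looks_regressed) are made up for illustration, and the
patterns assume the exact smpboot/smp message wording quoted in this
thread, which may differ on other kernel versions.

```python
import re
import sys

# Assumption: these patterns follow the message wording quoted in this
# thread; other kernel versions may print these lines differently.
ALLOWING_RE = re.compile(r"smpboot: Allowing (\d+) CPUs, (\d+) hotplug CPUs")
BROUGHT_UP_RE = re.compile(r"smp: Brought up (\d+) nodes?, (\d+) CPUs?")


def cpu_summary(dmesg_text):
    """Extract allowed/online CPU counts from dmesg text (hypothetical helper).

    Fields stay None when the corresponding message is not found.
    """
    summary = {"allowed": None, "nodes": None, "online": None}
    match = ALLOWING_RE.search(dmesg_text)
    if match:
        summary["allowed"] = int(match.group(1))
    match = BROUGHT_UP_RE.search(dmesg_text)
    if match:
        summary["nodes"] = int(match.group(1))
        summary["online"] = int(match.group(2))
    return summary


def looks_regressed(before, after):
    """Return True if the 'after' boot brought up fewer CPUs than 'before'."""
    if before["online"] is None or after["online"] is None:
        return False  # not enough information to decide
    return after["online"] < before["online"]


if __name__ == "__main__":
    # Usage: python3 cpu_summary.py dmesg-before.txt dmesg-after.txt
    with open(sys.argv[1]) as f:
        before = cpu_summary(f.read())
    with open(sys.argv[2]) as f:
        after = cpu_summary(f.read())
    print("before:", before)
    print("after: ", after)
    print("regressed:", looks_regressed(before, after))
```

The two file arguments are saved dmesg output from each boot; the file
names in the usage comment are placeholders.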

thanks,
rui


> I do see messages like:
> [    0.715228] APIC: NR_CPUS/possible_cpus limit of 40 reached. Processor 40/0x7f00 ignored.
> [    0.715231] ACPI: Unable to map lapic to logical cpu number
>
> But other than that, the system has worked for a couple years.
>
>
> It's obviously not good to regress from 2x10/20 cores/threads to a
> single core. I guess it's at least somewhat funny to imagine a
> 2 socket system with a single core...
>
>
> It seems particularly worrying that this patch has apparently been
> selected for -stable:
> https://lore.kernel.org/all/20231122153212.852040-2-sashal@xxxxxxxxxx/
>
> Even if it didn't have these unintended consequences, it seems like a
> commit like this is hardly -stable material?
>
>
> I've attached .config, dmesg of a boot with gec9aedb2aa1a and one with
> gec9aedb2aa1a^.
>
> Greetings,
>
> Andres Freund