Re: [PATCH v3] x86/acpi: fix panic while AP online later with kernel parameter maxcpus=1

From: Huang, Kai
Date: Tue Jul 02 2024 - 19:55:48 EST


On Tue, 2024-07-02 at 14:45 +0200, Borislav Petkov wrote:
> On Tue, Jul 02, 2024 at 12:05:38PM +0000, Huang, Kai wrote:
> > On Tue, 2024-07-02 at 08:58 +0800, Zhiquan Li wrote:
> > > The issue was found on the platform that using "Multiprocessor Wakeup
> > > Structure"[1] to startup secondary CPU, which is usually used by
> > > encrypted guest. When restrict boot time CPU to 1 with the kernel
> > > parameter "maxcpus=1" and bring other CPUs online later, there will be
> > > a kernel panic.
> > >
> > > The variable acpi_mp_wake_mailbox, which holds the virtual address of
> > > the MP Wakeup Structure mailbox, will be set as read-only after init.
> > > If the first AP gets online later, after init, the attempt to update
> > > the variable results in panic.
> > >
> > > The memremap() call that initializes the variable cannot be moved into
> > > acpi_parse_mp_wake() because memremap() is not functional at that point
> > > in the boot process.
> > >
> > > [1] Details about the MP Wakeup structure can be found in ACPI v6.4, in
> > > the "Multiprocessor Wakeup Structure" section.
> > >
> > > Signed-off-by: Zhiquan Li <zhiquan1.li@xxxxxxxxx>
> > > Reviewed-by: Kirill A. Shutemov <kirill.shutemov@xxxxxxxxxxxxxxx>
> >
> > Seems this changelog only mentions the problem, but doesn't say how to fix:
> >
> > Remove the __ro_after_init annotation of acpi_mp_wake_mailbox to fix.
>
> Do not talk about *what* the patch is doing in the commit message - that
> should be obvious from the diff itself. Rather, concentrate on the *why*
> it needs to be done.
>
> Imagine one fine day you're doing git archeology, you find the place in
> the code about which you want to find out why it was changed the way it
> is now.
>
> You do git annotate <filename> ... find the line, see the commit id and
> you do:
>
> git show <commit id>
>
> You read the commit message and there's just gibberish and nothing's
> explaining *why* that change was done. And you start scratching your
> head, trying to figure out why. Because the damn commit message is worth
> sh*t.

Yeah fully agree. Thanks for saying this again.

>
> This happens to us maintainers at least once a week. Well, I don't want
> that to happen in my tree anymore.
>
> So none of this text above still doesn't explain to me *why* this is
> happening.
>
> Why do APs need to update acpi_mp_wake_mailbox?

They don't need to if acpi_mp_wake_mailbox can be setup before smp_init()
once for all.

But currently the setup of acpi_mp_wake_mailbox is done when the first AP is
brought up because memremap() doesn't work in acpi_parse_mp_wake(), as
mentioned in the changelog of this patch.

I also feel it's not ideal to setup acpi_mp_wake_mailbox when bringing up
the first AP, so I provided my diff. IIUC, if memremap() works for
acpi_mp_wake_mailbox when bringing up the first AP, then it should also work
in
the early_initcall().

>
> Which patch is this fixing?

It fiexes below commit AFAICT:

24dd05da8c79 ("x86/apic: Mark acpi_mp_wake_* variables as
__ro_after_init")

Which didn't consider 'maxvcpus=xx' case.


But I will leave to Kirill to confirm.