Re: [PATCHv14 5/9] efi: Add unaccepted memory support

From: Kirill A. Shutemov
Date: Fri Oct 13 2023 - 08:34:14 EST


On Tue, Oct 10, 2023 at 04:05:18PM -0500, Michael Roth wrote:
> On Tue, Jun 06, 2023 at 05:26:33PM +0300, Kirill A. Shutemov wrote:
> > efi_config_parse_tables() reserves the memory that holds the unaccepted
> > memory configuration table so it won't be reused by the page allocator.
> >
> > Core-mm requires a few helpers to support unaccepted memory:
> >
> >  - accept_memory() checks the range of addresses against the bitmap and
> >    accepts the memory if needed.
> >
> >  - range_contains_unaccepted_memory() checks if anything within the
> >    range requires acceptance.
> >
> > Architectural code has to provide efi_get_unaccepted_table() that
> > returns a pointer to the unaccepted memory configuration table.
> >
> > arch_accept_memory() handles the arch-specific part of memory acceptance.
> >
> > Signed-off-by: Kirill A. Shutemov <kirill.shutemov@xxxxxxxxxxxxxxx>
> > Reviewed-by: Ard Biesheuvel <ardb@xxxxxxxxxx>
> > Reviewed-by: Tom Lendacky <thomas.lendacky@xxxxxxx>
> > ---
> >  arch/x86/platform/efi/efi.c              |   3 +
> >  drivers/firmware/efi/Makefile            |   1 +
> >  drivers/firmware/efi/efi.c               |  25 +++++
> >  drivers/firmware/efi/unaccepted_memory.c | 112 +++++++++++++++++++++++
> >  include/linux/efi.h                      |   1 +
> >  5 files changed, 142 insertions(+)
> >  create mode 100644 drivers/firmware/efi/unaccepted_memory.c
> >
> > diff --git a/drivers/firmware/efi/unaccepted_memory.c b/drivers/firmware/efi/unaccepted_memory.c
> > new file mode 100644
> > index 000000000000..08a9a843550a
> > --- /dev/null
> > +++ b/drivers/firmware/efi/unaccepted_memory.c
> > @@ -0,0 +1,112 @@
> > +// SPDX-License-Identifier: GPL-2.0-only
> > +
> > +#include <linux/efi.h>
> > +#include <linux/memblock.h>
> > +#include <linux/spinlock.h>
> > +#include <asm/unaccepted_memory.h>
> > +
> > +/* Protects unaccepted memory bitmap */
> > +static DEFINE_SPINLOCK(unaccepted_memory_lock);
> > +
> > +/*
> > + * accept_memory() -- Consult bitmap and accept the memory if needed.
> > + *
> > + * Only memory that is explicitly marked as unaccepted in the bitmap requires
> > + * an action. All the remaining memory is implicitly accepted and doesn't need
> > + * acceptance.
> > + *
> > + * No need to accept:
> > + * - anything if the system has no unaccepted table;
> > + * - memory that is below phys_base;
> > + * - memory that is above the memory that is addressable by the bitmap;
> > + */
> > +void accept_memory(phys_addr_t start, phys_addr_t end)
> > +{
> > +	struct efi_unaccepted_memory *unaccepted;
> > +	unsigned long range_start, range_end;
> > +	unsigned long flags;
> > +	u64 unit_size;
> > +
> > +	unaccepted = efi_get_unaccepted_table();
> > +	if (!unaccepted)
> > +		return;
> > +
> > +	unit_size = unaccepted->unit_size;
> > +
> > +	/*
> > +	 * Only care for the part of the range that is represented
> > +	 * in the bitmap.
> > +	 */
> > +	if (start < unaccepted->phys_base)
> > +		start = unaccepted->phys_base;
> > +	if (end < unaccepted->phys_base)
> > +		return;
> > +
> > +	/* Translate to offsets from the beginning of the bitmap */
> > +	start -= unaccepted->phys_base;
> > +	end -= unaccepted->phys_base;
> > +
> > +	/* Make sure not to overrun the bitmap */
> > +	if (end > unaccepted->size * unit_size * BITS_PER_BYTE)
> > +		end = unaccepted->size * unit_size * BITS_PER_BYTE;
> > +
> > +	range_start = start / unit_size;
> > +
> > +	spin_lock_irqsave(&unaccepted_memory_lock, flags);
> > +	for_each_set_bitrange_from(range_start, range_end, unaccepted->bitmap,
> > +				   DIV_ROUND_UP(end, unit_size)) {
> > +		unsigned long phys_start, phys_end;
> > +		unsigned long len = range_end - range_start;
> > +
> > +		phys_start = range_start * unit_size + unaccepted->phys_base;
> > +		phys_end = range_end * unit_size + unaccepted->phys_base;
> > +
> > +		arch_accept_memory(phys_start, phys_end);
> > +		bitmap_clear(unaccepted->bitmap, range_start, len);
> > +	}
> > +	spin_unlock_irqrestore(&unaccepted_memory_lock, flags);
> > +}
>
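
For reference, the clamping and translation done before the lock is taken
can be modelled in plain userspace C. This is only a sketch of the
arithmetic in the quoted function: struct unaccepted_table and
phys_range_to_bits() are hypothetical names made up for the illustration,
and the struct only mirrors the three fields the quoted code reads
(phys_base, unit_size, size).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the table fields used by accept_memory(). */
struct unaccepted_table {
	uint64_t phys_base;	/* physical address represented by bit 0 */
	uint64_t unit_size;	/* bytes of memory represented by one bit */
	uint64_t size;		/* size of the bitmap in bytes */
};

/*
 * Translate the physical range [start, end) into the bit range
 * [*first_bit, *end_bit) of the bitmap, clamped to what the bitmap covers.
 * Returns false if no part of the range is represented in the bitmap.
 */
static bool phys_range_to_bits(const struct unaccepted_table *t,
			       uint64_t start, uint64_t end,
			       uint64_t *first_bit, uint64_t *end_bit)
{
	uint64_t covered;

	assert(t->unit_size != 0);
	covered = t->size * t->unit_size * 8;	/* bytes covered by the bitmap */

	/* Nothing below phys_base is tracked */
	if (end < t->phys_base)
		return false;
	if (start < t->phys_base)
		start = t->phys_base;

	/* Offsets from the beginning of the bitmap */
	start -= t->phys_base;
	end -= t->phys_base;

	/* Do not run past the end of the bitmap */
	if (end > covered)
		end = covered;

	*first_bit = start / t->unit_size;
	*end_bit = (end + t->unit_size - 1) / t->unit_size;	/* round up */
	return *first_bit < *end_bit;
}
```

With a 2M unit size, a range from phys_base + 3M to phys_base + 5M maps to
bits 1 and 2: the start is rounded down and the end is rounded up, so any
partially covered unit is included.
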
> While testing SNP guests running today's tip/master (ef19bc9dddc3) I ran
> into what seems to be fairly significant lock contention due to the
> unaccepted_memory_lock spinlock above, which results in a constant stream
> of soft-lockups until the workload gets all its memory accepted/faulted
> in if the guest has around 16+ vCPUs.
>
> I've included the guest dmesg traces I was seeing below.
>
> In this case I was running a 32 vCPU guest with 200GB of memory on
> a 256 thread EPYC (Milan) system, and can trigger the above situation fairly
> reliably by running the following workload in a freshly booted guest:
>
> stress --vm 32 --vm-bytes 5G --vm-keep
>
> Scaling up the number of stress threads and vCPUs should make it easier
> to reproduce.
>
> Other than unresponsiveness/lockup messages until the memory is accepted,
> the guest seems to continue running fine, but for large guests where
> unaccepted memory is more likely to be useful, it seems like it could be
> an issue, especially when considering 100+ vCPU guests.

Okay, sorry for the delay. It took time to reproduce it with TDX.

I will look into what can be done.
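
The contention comes from holding the lock across the slow accept
operation. One direction that might be worth exploring is to record which
ranges are currently being accepted, so the lock only has to cover the
bookkeeping. The following is a rough userspace sketch of that bookkeeping,
not a patch and not tested against the kernel. Every name in it
(inflight_claim(), inflight_release(), MAX_INFLIGHT) is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_INFLIGHT 8	/* arbitrary bound chosen for the sketch */

struct inflight_range {
	uint64_t start, end;	/* [start, end) in bitmap units */
	bool used;
};

static struct inflight_range inflight[MAX_INFLIGHT];

/*
 * Intended to be called with the bitmap lock held.
 * Returns a slot index if the range was recorded, or -1 if it overlaps a
 * range another caller is already accepting (or if the table is full).
 * On -1 the caller would drop the lock and retry later.
 */
static int inflight_claim(uint64_t start, uint64_t end)
{
	int free_slot = -1;

	assert(start < end);
	for (int i = 0; i < MAX_INFLIGHT; i++) {
		if (!inflight[i].used) {
			if (free_slot < 0)
				free_slot = i;
			continue;
		}
		/* Half-open ranges overlap if each starts before the other ends */
		if (start < inflight[i].end && inflight[i].start < end)
			return -1;
	}
	if (free_slot >= 0)
		inflight[free_slot] = (struct inflight_range){ start, end, true };
	return free_slot;
}

/* Intended to be called with the bitmap lock held, after the bits are cleared. */
static void inflight_release(int slot)
{
	assert(slot >= 0 && slot < MAX_INFLIGHT && inflight[slot].used);
	inflight[slot].used = false;
}
```

The intended sequence would be: take the lock, claim the range, drop the
lock, do the slow accept, retake the lock, clear the bits and release the
slot. The bits must stay set until the accept has finished, otherwise a
second caller could treat memory as accepted while it is still in flight.
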

--
Kiryl Shutsemau / Kirill A. Shutemov