Re: [PATCHv11 0/9] mm, x86/cc, efi: Implement support for unaccepted memory

From: Kirill A. Shutemov
Date: Tue May 16 2023 - 19:22:17 EST


On Tue, May 16, 2023 at 05:41:55PM -0500, Tom Lendacky wrote:
> On 5/13/23 17:04, Kirill A. Shutemov wrote:
> > UEFI Specification version 2.9 introduces the concept of memory
> > acceptance: some Virtual Machine platforms, such as Intel TDX or AMD
> > SEV-SNP, require memory to be accepted before it can be used by the
> > guest. Acceptance happens via a protocol specific to the Virtual
> > Machine platform.
> >
> > Accepting memory is costly, and it makes the VMM allocate memory for the
> > accepted guest physical address range. It's better to postpone memory
> > acceptance until the memory is needed. This lowers boot time and reduces
> > memory overhead.
> >
> > The kernel needs to know what memory has been accepted. Firmware
> > communicates this information via the memory map: a new memory type --
> > EFI_UNACCEPTED_MEMORY -- indicates such memory.
> >
> > Range-based tracking works fine for firmware, but it gets bulky for
> > the kernel: e820 has to be modified on every page acceptance. That leads
> > to table fragmentation, and the e820 table has a limited number of
> > entries.
> >
> > Another option is to mark such memory as usable in e820 and track whether
> > the range has been accepted in a bitmap. One bit in the bitmap represents
> > 2MiB of the address space: one 4k page is enough to track 64GiB of
> > physical address space.
> >
> > In the worst-case scenario -- a huge hole in the middle of the
> > address space -- it needs 256MiB to handle 4PiB of address
> > space.
> >
> > Any unaccepted memory that is not aligned to 2M gets accepted upfront.
> >
> > The approach lowers boot time substantially. Boot to shell is ~2.5x
> > faster for a 4G TDX VM and ~4x faster for a 64G one.
> >
> > TDX-specific code is isolated from the core of unaccepted memory support.
> > This is supposed to make it easier to plug in other implementations of
> > unaccepted memory, such as SEV-SNP.
> >
> > -- Fragmentation study --
> >
> > Vlastimil and Mel were concerned about the effect of unaccepted memory on
> > the fragmentation prevention measures in the page allocator. I tried to
> > evaluate it, but it is tricky. As suggested, I ran multiple parallel kernel
> > builds and tracked how often kmem:mm_page_alloc_extfrag gets hit.
> >
> > See the results in v9 of the patchset [1][2].
> >
> > [1] https://lore.kernel.org/all/20230330114956.20342-1-kirill.shutemov@xxxxxxxxxxxxxxx
> > [2] https://lore.kernel.org/all/20230416191940.ex7ao43pmrjhru2p@xxxxxxxxxxxxxxxxx
> >
> > --
> >
> > The tree can be found here:
> >
> > https://github.com/intel/tdx.git guest-unaccepted-memory
>
> I get some failures when building without TDX support selected in my
> kernel config after adding unaccepted memory support for SNP:
>
> In file included from arch/x86/boot/compressed/../../coco/tdx/tdx-shared.c:1,
> from arch/x86/boot/compressed/tdx-shared.c:2:
> ./arch/x86/include/asm/tdx.h: In function ‘tdx_kvm_hypercall’:
> ./arch/x86/include/asm/tdx.h:72:17: error: ‘ENODEV’ undeclared (first use in this function)
> 72 | return -ENODEV;
> | ^~~~~~
> ./arch/x86/include/asm/tdx.h:72:17: note: each undeclared identifier is reported only once for each function it appears in
>
> Adding an include for linux/errno.h gets past that error, but then
> I get the following:
>
> ld: arch/x86/boot/compressed/tdx-shared.o: in function `tdx_enc_status_changed_phys':
> tdx-shared.c:(.text+0x42): undefined reference to `__tdx_hypercall'
> ld: tdx-shared.c:(.text+0x7f): undefined reference to `__tdx_module_call'
> ld: tdx-shared.c:(.text+0xce): undefined reference to `__tdx_module_call'
> ld: tdx-shared.c:(.text+0x13b): undefined reference to `__tdx_module_call'
> ld: tdx-shared.c:(.text+0x153): undefined reference to `cc_mkdec'
> ld: tdx-shared.c:(.text+0x15d): undefined reference to `cc_mkdec'
> ld: tdx-shared.c:(.text+0x18e): undefined reference to `__tdx_hypercall'
> ld: arch/x86/boot/compressed/vmlinux: hidden symbol `__tdx_hypercall' isn't defined
> ld: final link failed: bad value
>
> So it looks like arch/x86/boot/compressed/tdx-shared.c is being
> built, while arch/x86/boot/compressed/tdx.c isn't.

Right. I think this should help:

diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
index 78f67e0a2666..b13a58021086 100644
--- a/arch/x86/boot/compressed/Makefile
+++ b/arch/x86/boot/compressed/Makefile
@@ -106,8 +106,8 @@ ifdef CONFIG_X86_64
endif

vmlinux-objs-$(CONFIG_ACPI) += $(obj)/acpi.o
-vmlinux-objs-$(CONFIG_INTEL_TDX_GUEST) += $(obj)/tdx.o $(obj)/tdcall.o
-vmlinux-objs-$(CONFIG_UNACCEPTED_MEMORY) += $(obj)/mem.o $(obj)/tdx-shared.o
+vmlinux-objs-$(CONFIG_INTEL_TDX_GUEST) += $(obj)/tdx.o $(obj)/tdcall.o $(obj)/tdx-shared.o
+vmlinux-objs-$(CONFIG_UNACCEPTED_MEMORY) += $(obj)/mem.o

vmlinux-objs-$(CONFIG_EFI) += $(obj)/efi.o
vmlinux-objs-$(CONFIG_EFI_MIXED) += $(obj)/efi_mixed.o

> After setting TDX in the kernel config, I can build successfully, but
> I'm running into an error when trying to accept memory during
> decompression.
>
> In drivers/firmware/efi/libstub/unaccepted_memory.c, I can see that the
> unaccepted_table is allocated, but when accept_memory() is invoked the
> table address is now zero. I thought it might have to do with .bss, but
> even putting it in the .data section didn't help. I'll keep digging, but if
> you have any ideas, that would be great.

Nothing comes to mind right away, but seeing your side of the enabling
might help.

--
Kiryl Shutsemau / Kirill A. Shutemov