Re: [PATCH 19/21] KVM: TDX: Add an ioctl to create initial guest memory

From: Paolo Bonzini
Date: Tue Sep 10 2024 - 06:22:50 EST


On 9/4/24 05:07, Rick Edgecombe wrote:
From: Isaku Yamahata <isaku.yamahata@xxxxxxxxx>

Add a new ioctl for the user space VMM to initialize guest memory with the
specified memory contents.

Because TDX protects the guest's memory, the creation of the initial guest
memory requires a dedicated TDX module API, TDH.MEM.PAGE.ADD(), instead of
directly copying the memory contents into the guest's memory in the case of
the default VM type.

Define a new subcommand, KVM_TDX_INIT_MEM_REGION, of vCPU-scoped
KVM_MEMORY_ENCRYPT_OP. Check if the GFN is already pre-allocated, assign
the guest page in Secure-EPT, copy the initial memory contents into the
guest memory, and encrypt the guest memory. Optionally, extend the memory
measurement of the TDX guest.

Discussion history:

While useful for the reviewers, in the end this is the simplest possible userspace API (the one that we started with) and the objections just went away because it reuses the infrastructure that was introduced for pre-faulting memory.

So I'd replace everything with:

---
The ioctl uses the vCPU file descriptor because of the TDX module's requirement that the memory is added to the S-EPT (via TDH.MEM.SEPT.ADD) prior to initialization (TDH.MEM.PAGE.ADD). Accessing the MMU in turn requires a vCPU file descriptor, just like for KVM_PRE_FAULT_MEMORY. In fact, the post-populate callback is able to reuse the same logic used by KVM_PRE_FAULT_MEMORY, so that userspace can do everything with a single ioctl.

Note that this is the only way to invoke TDH.MEM.SEPT.ADD before the TD in finalized, as userspace cannot use KVM_PRE_FAULT_MEMORY at that point. This ensures that there cannot be pages in the S-EPT awaiting TDH.MEM.PAGE.ADD, which would be treated incorrectly as spurious by tdp_mmu_map_handle_target_level() (KVM would see the SPTE as PRESENT, but the corresponding S-EPT entry will be !PRESENT).
---

Part of the second paragraph comes from your link [4], https://lore.kernel.org/kvm/Ze-TJh0BBOWm9spT@xxxxxxxxxx/, but updated for recent changes to KVM_PRE_FAULT_MEMORY.

This drops the historical information that is not particularly relevant for the future, it updates what's relevant to mention changes done for SEV-SNP, and also preserves most of the other information:

* why the vCPU file descriptor

* the desirability of a single ioctl for userspace

* the relationship between KVM_TDX_INIT_MEM_REGION and KVM_PRE_FAULT_MEMORY

Paolo