Re: [PATCH 09/11] KVM: guest_memfd: Add interface for populating gmem pages with user data

From: Isaku Yamahata
Date: Wed Apr 24 2024 - 21:13:01 EST


On Wed, Apr 24, 2024 at 03:24:58PM -0700,
Sean Christopherson <seanjc@xxxxxxxxxx> wrote:

> On Tue, Apr 23, 2024, Isaku Yamahata wrote:
> > On Thu, Apr 04, 2024 at 02:50:31PM -0400,
> > Paolo Bonzini <pbonzini@xxxxxxxxxx> wrote:
> >
> > > During guest run-time, kvm_arch_gmem_prepare() is issued as needed to
> > > prepare newly-allocated gmem pages prior to mapping them into the guest.
> > > In the case of SEV-SNP, this mainly involves setting the pages to
> > > private in the RMP table.
> > >
> > > However, for the GPA ranges comprising the initial guest payload, which
> > > are encrypted/measured prior to starting the guest, the gmem pages need
> > > to be accessed prior to setting them to private in the RMP table so they
> > > can be initialized with the userspace-provided data. Additionally, an
> > > SNP firmware call is needed afterward to encrypt them in-place and
> > > measure the contents into the guest's launch digest.
> > >
> > > While it is possible to bypass the kvm_arch_gmem_prepare() hooks so that
> > > this handling can be done in an open-coded/vendor-specific manner, this
> > > may expose more gmem-internal state/dependencies to external callers
> > > than necessary. Try to avoid this by implementing an interface that
> > > tries to handle as much of the common functionality inside gmem as
> > > possible, while also making it generic enough to potentially be
> > > usable/extensible for TDX as well.
> >
> > I explored how TDX will use this hook. However, it resulted in not using this
> > hook, and instead used kvm_tdp_mmu_get_walk() with a twist. The patch is below.
> >
> > Because SEV-SNP manages the RMP that is not tied to NPT directly, SEV-SNP can
> > ignore TDP MMU page tables when updating RMP.
> > On the other hand, TDX essentially updates Secure-EPT when it adds a page to
> > the guest by TDH.MEM.PAGE.ADD(). It needs to protect KVM TDP MMU page tables
> > with mmu_lock, not guest memfd file mapping with invalidate_lock. The hook
> > doesn't apply to TDX well. The resulted KVM_TDX_INIT_MEM_REGION logic is as
> > follows.
> >
> > get_user_pages_fast(source addr)
> > read_lock(mmu_lock)
> > kvm_tdp_mmu_get_walk_private_pfn(vcpu, gpa, &pfn);
> > if the page table doesn't map gpa, error.
> > TDH.MEM.PAGE.ADD()
> > TDH.MR.EXTEND()
> > read_unlock(mmu_lock)
> > put_page()
>
> Hmm, KVM doesn't _need_ to use invalidate_lock to protect against guest_memfd
> invalidation, but I also don't see why it would cause problems. I.e. why not
> take mmu_lock() in TDX's post_populate() implementation?

We can take the lock. Because we have already populated the GFN of guest_memfd,
we need to make kvm_gmem_populate() not pass FGP_CREAT_ONLY. Otherwise we'll
get -EEXIST.


> That would allow having
> a sanity check that the PFN that guest_memfd() has is indeed the PFN that KVM's
> S-EPT mirror has, i.e. the PFN that KVM is going to PAGE.ADD.

Because we have PFN from the mirrored EPT, I thought it's duplicate to get PFN
again via guest memfd. We can check if two PFN matches.
--
Isaku Yamahata <isaku.yamahata@xxxxxxxxx>