Re: [PATCH 09/11] KVM: guest_memfd: Add interface for populating gmem pages with user data

From: Sean Christopherson
Date: Wed Apr 24 2024 - 18:25:10 EST


On Tue, Apr 23, 2024, Isaku Yamahata wrote:
> On Thu, Apr 04, 2024 at 02:50:31PM -0400,
> Paolo Bonzini <pbonzini@xxxxxxxxxx> wrote:
>
> > During guest run-time, kvm_arch_gmem_prepare() is issued as needed to
> > prepare newly-allocated gmem pages prior to mapping them into the guest.
> > In the case of SEV-SNP, this mainly involves setting the pages to
> > private in the RMP table.
> >
> > However, for the GPA ranges comprising the initial guest payload, which
> > are encrypted/measured prior to starting the guest, the gmem pages need
> > to be accessed prior to setting them to private in the RMP table so they
> > can be initialized with the userspace-provided data. Additionally, an
> > SNP firmware call is needed afterward to encrypt them in-place and
> > measure the contents into the guest's launch digest.
> >
> > While it is possible to bypass the kvm_arch_gmem_prepare() hooks so that
> > this handling can be done in an open-coded/vendor-specific manner, this
> > may expose more gmem-internal state/dependencies to external callers
> > than necessary. Try to avoid this by implementing an interface that
> > tries to handle as much of the common functionality inside gmem as
> > possible, while also making it generic enough to potentially be
> > usable/extensible for TDX as well.
>
> I explored how TDX will use this hook. However, it resulted in not using this
> hook, and instead used kvm_tdp_mmu_get_walk() with a twist. The patch is below.
>
> Because SEV-SNP manages the RMP that is not tied to NPT directly, SEV-SNP can
> ignore TDP MMU page tables when updating RMP.
> On the other hand, TDX essentially updates Secure-EPT when it adds a page to
> the guest by TDH.MEM.PAGE.ADD(). It needs to protect KVM TDP MMU page tables
> with mmu_lock, not guest memfd file mapping with invalidate_lock. The hook
> doesn't apply to TDX well. The resulted KVM_TDX_INIT_MEM_REGION logic is as
> follows.
>
> get_user_pages_fast(source addr)
> read_lock(mmu_lock)
> kvm_tdp_mmu_get_walk_private_pfn(vcpu, gpa, &pfn);
> if the page table doesn't map gpa, error.
> TDH.MEM.PAGE.ADD()
> TDH.MR.EXTEND()
> read_unlock(mmu_lock)
> put_page()

Hmm, KVM doesn't _need_ to use invalidate_lock to protect against guest_memfd
invalidation, but I also don't see why it would cause problems. I.e. why not
take mmu_lock() in TDX's post_populate() implementation? That would allow having
a sanity check that the PFN that guest_memfd() has is indeed the PFN that KVM's
S-EPT mirror has, i.e. the PFN that KVM is going to PAGE.ADD.

> >From 7d4024049b51969a2431805c2117992fc7ec0981 Mon Sep 17 00:00:00 2001
> Message-ID: <7d4024049b51969a2431805c2117992fc7ec0981.1713913379.git.isaku.yamahata@xxxxxxxxx>
> In-Reply-To: <cover.1713913379.git.isaku.yamahata@xxxxxxxxx>
> References: <cover.1713913379.git.isaku.yamahata@xxxxxxxxx>
> From: Isaku Yamahata <isaku.yamahata@xxxxxxxxx>
> Date: Tue, 23 Apr 2024 11:33:44 -0700
> Subject: [PATCH] KVM: x86/tdp_mmu: Add a helper function to walk down the TDP MMU
>
> KVM_TDX_INIT_MEM_REGION needs to check if the given GFN is already
> populated. Add wrapping logic to kvm_tdp_mmu_get_walk() to export it.

> +int kvm_tdp_mmu_get_walk_private_pfn(struct kvm_vcpu *vcpu, u64 gpa,
> + kvm_pfn_t *pfn)
> +{
> + u64 sptes[PT64_ROOT_MAX_LEVEL + 1], spte;
> + int leaf;
> +
> + lockdep_assert_held(&vcpu->kvm->mmu_lock);
> +
> + kvm_tdp_mmu_walk_lockless_begin();

This is obviously not a lockless walk.

> + leaf = __kvm_tdp_mmu_get_walk(vcpu, gpa, sptes, true);
> + kvm_tdp_mmu_walk_lockless_end();
> + if (leaf < 0)
> + return -ENOENT;
> +
> + spte = sptes[leaf];
> + if (is_shadow_present_pte(spte) && is_last_spte(spte, leaf)) {
> + *pfn = spte_to_pfn(spte);
> + return leaf;
> }
>
> - return leaf;
> + return -ENOENT;
> }
> +EXPORT_SYMBOL_GPL(kvm_tdp_mmu_get_walk_private_pfn);
>
> /*
> * Returns the last level spte pointer of the shadow page walk for the given
> --
> 2.43.2
>
> --
> Isaku Yamahata <isaku.yamahata@xxxxxxxxx>