Re: [PATCH 09/11] KVM: guest_memfd: Add interface for populating gmem pages with user data

From: Xu Yilun
Date: Mon Apr 22 2024 - 10:50:13 EST


On Thu, Apr 04, 2024 at 02:50:31PM -0400, Paolo Bonzini wrote:
> During guest run-time, kvm_arch_gmem_prepare() is issued as needed to
> prepare newly-allocated gmem pages prior to mapping them into the guest.
> In the case of SEV-SNP, this mainly involves setting the pages to
> private in the RMP table.
>
> However, for the GPA ranges comprising the initial guest payload, which
> are encrypted/measured prior to starting the guest, the gmem pages need
> to be accessed prior to setting them to private in the RMP table so they
> can be initialized with the userspace-provided data. Additionally, an
> SNP firmware call is needed afterward to encrypt them in-place and
> measure the contents into the guest's launch digest.
>
> While it is possible to bypass the kvm_arch_gmem_prepare() hooks so that
> this handling can be done in an open-coded/vendor-specific manner, this
> may expose more gmem-internal state/dependencies to external callers
> than necessary. Try to avoid this by implementing an interface that
> tries to handle as much of the common functionality inside gmem as
> possible, while also making it generic enough to potentially be
> usable/extensible for TDX as well.
>
> Suggested-by: Sean Christopherson <seanjc@xxxxxxxxxx>
> Signed-off-by: Michael Roth <michael.roth@xxxxxxx>
> Co-developed-by: Michael Roth <michael.roth@xxxxxxx>
> Signed-off-by: Paolo Bonzini <pbonzini@xxxxxxxxxx>
> ---
> include/linux/kvm_host.h | 26 ++++++++++++++
> virt/kvm/guest_memfd.c | 78 ++++++++++++++++++++++++++++++++++++++++
> 2 files changed, 104 insertions(+)
>
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index 33ed3b884a6b..97d57ec59789 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -2450,4 +2450,30 @@ int kvm_arch_gmem_prepare(struct kvm *kvm, gfn_t gfn, kvm_pfn_t pfn, int max_ord
> bool kvm_arch_gmem_prepare_needed(struct kvm *kvm);
> #endif
>
> +/**
> + * kvm_gmem_populate() - Populate/prepare a GPA range with guest data
> + *
> + * @kvm: KVM instance
> + * @gfn: starting GFN to be populated
> + * @src: userspace-provided buffer containing data to copy into GFN range
> + * (passed to @post_populate, and incremented on each iteration
> + * if not NULL)
> + * @npages: number of pages to copy from userspace-buffer
> + * @post_populate: callback to issue for each gmem page that backs the GPA
> + * range
> + * @opaque: opaque data to pass to @post_populate callback
> + *
> + * This is primarily intended for cases where a gmem-backed GPA range needs
> + * to be initialized with userspace-provided data prior to being mapped into
> + * the guest as a private page. This should be called with the slots->lock
> + * held so that caller-enforced invariants regarding the expected memory
> + * attributes of the GPA range do not race with KVM_SET_MEMORY_ATTRIBUTES.
> + *
> + * Returns the number of pages that were populated.
> + */
> +long kvm_gmem_populate(struct kvm *kvm, gfn_t gfn, void __user *src, long npages,
> + int (*post_populate)(struct kvm *kvm, gfn_t gfn, kvm_pfn_t pfn,
> + void __user *src, int order, void *opaque),
> + void *opaque);
> +
> #endif
> diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
> index 51c99667690a..e7de97382a67 100644
> --- a/virt/kvm/guest_memfd.c
> +++ b/virt/kvm/guest_memfd.c
> @@ -602,3 +602,81 @@ int kvm_gmem_get_pfn(struct kvm *kvm, struct kvm_memory_slot *slot,
> return r;
> }
> EXPORT_SYMBOL_GPL(kvm_gmem_get_pfn);
> +
> +static int kvm_gmem_undo_get_pfn(struct file *file, struct kvm_memory_slot *slot,
> + gfn_t gfn, int order)
> +{
> + pgoff_t index = gfn - slot->base_gfn + slot->gmem.pgoff;
> + struct kvm_gmem *gmem = file->private_data;
> +
> + /*
> + * Races with kvm_gmem_unbind() must have been detected by
> + * __kvm_gmem_get_gfn(), because the invalidate_lock is
> + * taken between __kvm_gmem_get_gfn() and kvm_gmem_undo_get_pfn().
> + */
> + if (WARN_ON_ONCE(xa_load(&gmem->bindings, index) != slot))
> + return -EIO;
> +
> + return __kvm_gmem_punch_hole(file_inode(file), index << PAGE_SHIFT, PAGE_SIZE << order);
> +}
> +
> +long kvm_gmem_populate(struct kvm *kvm, gfn_t gfn, void __user *src, long npages,
> + int (*post_populate)(struct kvm *kvm, gfn_t gfn, kvm_pfn_t pfn,
> + void __user *src, int order, void *opaque),
> + void *opaque)
> +{
> + struct file *file;
> + struct kvm_memory_slot *slot;
> +
> + int ret = 0, max_order;
> + long i;
> +
> + lockdep_assert_held(&kvm->slots_lock);
> + if (npages < 0)
> + return -EINVAL;
> +
> + slot = gfn_to_memslot(kvm, gfn);
> + if (!kvm_slot_can_be_private(slot))
> + return -EINVAL;
> +
> + file = kvm_gmem_get_file(slot);
> + if (!file)
> + return -EFAULT;
> +
> + filemap_invalidate_lock(file->f_mapping);
> +
> + npages = min_t(ulong, slot->npages - (gfn - slot->base_gfn), npages);
> + for (i = 0; i < npages; i += (1 << max_order)) {
> + gfn_t this_gfn = gfn + i;
> + kvm_pfn_t pfn;
> +
> + ret = __kvm_gmem_get_pfn(file, slot, this_gfn, &pfn, &max_order, false);
> + if (ret)
> + break;
> +
> + if (!IS_ALIGNED(this_gfn, (1 << max_order)) ||
> + (npages - i) < (1 << max_order))
> + max_order = 0;
> +
> + if (post_populate) {
> + void __user *p = src ? src + i * PAGE_SIZE : NULL;
> + ret = post_populate(kvm, this_gfn, pfn, p, max_order, opaque);

I don't see the main difference between gmem_prepare() and post_populate()
from gmem's point of view. They are all vendor callbacks invoked after
__filemap_get_folio(). Is it possible gmem choose to call gmem_prepare()
or post_populate() outside __kvm_gmem_get_pfn()? Or even pass all
parameters to a single gmem_prepare() and let vendor code decide what to
do.

> + }
> +
> + put_page(pfn_to_page(pfn));
> + if (ret) {
> + /*
> + * Punch a hole so that FGP_CREAT_ONLY can succeed
> + * again.
> + */
> + kvm_gmem_undo_get_pfn(file, slot, this_gfn, max_order);
> + break;
> + }
> + }
> +
> + filemap_invalidate_unlock(file->f_mapping);
> +
> + fput(file);
> + return ret && !i ? ret : i;
> +}
> +EXPORT_SYMBOL_GPL(kvm_gmem_populate);
> --
> 2.43.0
>
>
>