Re: [PATCH RFC v8 04/56] KVM: Add HVA range operator

From: Michael Roth
Date: Tue Mar 28 2023 - 00:37:11 EST


On Mon, Feb 20, 2023 at 11:37:09PM +0200, Zhi Wang wrote:
> On Mon, 20 Feb 2023 12:37:55 -0600
> Michael Roth <michael.roth@xxxxxxx> wrote:
>
> > From: Vishal Annapurve <vannapurve@xxxxxxxxxx>
> >
> > Introduce HVA range operator so that other KVM subsystems
> > can operate on HVA range.
> >
> > Signed-off-by: Vishal Annapurve <vannapurve@xxxxxxxxxx>
> > [mdr: minor checkpatch alignment fixups]
> > Signed-off-by: Michael Roth <michael.roth@xxxxxxx>
> > ---
> > include/linux/kvm_host.h | 6 +++++
> > virt/kvm/kvm_main.c | 48 ++++++++++++++++++++++++++++++++++++++++
> > 2 files changed, 54 insertions(+)
> >
> > diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> > index 4d542060cd93..c615650ed256 100644
> > --- a/include/linux/kvm_host.h
> > +++ b/include/linux/kvm_host.h
> > @@ -1402,6 +1402,12 @@ void kvm_mmu_invalidate_begin(struct kvm *kvm);
> > void kvm_mmu_invalidate_range_add(struct kvm *kvm, gfn_t start, gfn_t end);
> > void kvm_mmu_invalidate_end(struct kvm *kvm);
> >
> > +typedef int (*kvm_hva_range_op_t)(struct kvm *kvm,
> > + struct kvm_gfn_range *range, void *data);
> > +
> > +int kvm_vm_do_hva_range_op(struct kvm *kvm, unsigned long hva_start,
> > + unsigned long hva_end, kvm_hva_range_op_t handler, void *data);
> > +
> > long kvm_arch_dev_ioctl(struct file *filp,
> > unsigned int ioctl, unsigned long arg);
> > long kvm_arch_vcpu_ioctl(struct file *filp,
> > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> > index f7e00593cc5d..4ccd655dd5af 100644
> > --- a/virt/kvm/kvm_main.c
> > +++ b/virt/kvm/kvm_main.c
> > @@ -642,6 +642,54 @@ static __always_inline int __kvm_handle_hva_range(struct kvm *kvm,
> > return (int)ret;
> > }
> >
>
> Below function seems a reduced duplicate of __kvm_handle_hva_range()
> in virt/kvm/kvm_main.c. It would be nice to factor __kvm_handle_hva_range().

A few differences make it difficult to refactor this clearly:

- This handler is mainly used for loading initial contents into guest
image before booting and doesn't rely on the MMU lock being held. It
also *can't* be called with MMU lock held because it suffers from the
same issue with mem_attr_update() hook where it needs to take a
mutex as part of unmapping from directmap when transitioning page to
private state in RMP table
- This handler wants to return an error code, as opposed to existing
handlers which return a true/false values which are passed along to
MMU notifier call-site and handled differently.
- This handler wants to terminate iterating through memslots as soon
as it encounters the first failure, whereas the existing handlers
expect to be called for each slot regardless of return value.

So it's a pretty different use-case that adds enough complexity to
__kvm_handle_hva_range() that it might need be worth refactoring it,
since it complicates some bits that are closely tied to dealing with
invalidations where the extra complexity probably needs to be
well-warranted.

I took a stab at it here for reference, but even with what seems to be
the minimal set of changes it doesn't save on any code and ultimately I
think it makes it harder to make sense of what going on:

https://github.com/mdroth/linux/commit/976c5fb708f7babe899fd80e27e19f8ba3f6818d

Is there a better approach?

Thanks,

-Mike

>
> > +int kvm_vm_do_hva_range_op(struct kvm *kvm, unsigned long hva_start,
> > + unsigned long hva_end, kvm_hva_range_op_t handler, void *data)
> > +{
> > + int ret = 0;
> > + struct kvm_gfn_range gfn_range;
> > + struct kvm_memory_slot *slot;
> > + struct kvm_memslots *slots;
> > + int i, idx;
> > +
> > + if (WARN_ON_ONCE(hva_end <= hva_start))
> > + return -EINVAL;
> > +
> > + idx = srcu_read_lock(&kvm->srcu);
> > +
> > + for (i = 0; i < kvm_arch_nr_memslot_as_ids(kvm); i++) {
> > + struct interval_tree_node *node;
> > +
> > + slots = __kvm_memslots(kvm, i);
> > + kvm_for_each_memslot_in_hva_range(node, slots,
> > + hva_start, hva_end - 1) {
> > + unsigned long start, end;
> > +
> > + slot = container_of(node, struct kvm_memory_slot,
> > + hva_node[slots->node_idx]);
> > + start = max(hva_start, slot->userspace_addr);
> > + end = min(hva_end, slot->userspace_addr +
> > + (slot->npages << PAGE_SHIFT));
> > +
> > + /*
> > + * {gfn(page) | page intersects with [hva_start, hva_end)} =
> > + * {gfn_start, gfn_start+1, ..., gfn_end-1}.
> > + */
> > + gfn_range.start = hva_to_gfn_memslot(start, slot);
> > + gfn_range.end = hva_to_gfn_memslot(end + PAGE_SIZE - 1, slot);
> > + gfn_range.slot = slot;
> > +
> > + ret = handler(kvm, &gfn_range, data);
> > + if (ret)
> > + goto e_ret;
> > + }
> > + }
> > +
> > +e_ret:
> > + srcu_read_unlock(&kvm->srcu, idx);
> > +
> > + return ret;
> > +}
> > +
> > static __always_inline int kvm_handle_hva_range(struct mmu_notifier *mn,
> > unsigned long start,
> > unsigned long end,
>