Re: [PATCH] KVM: Use precise range-based flush in mmu_notifier hooks when possible

From: Will Deacon
Date: Tue Aug 20 2024 - 11:42:03 EST


Hi Sean,

On Fri, Aug 02, 2024 at 12:16:17PM -0700, Sean Christopherson wrote:
> Do arch-specific range-based TLB flushes (if they're supported) when
> flushing in response to mmu_notifier events, as a single range-based flush
> is almost always more performant. This is especially true in the case of
> mmu_notifier events, as the majority of events that hit a running VM
> operate on a relatively small range of memory.
>
> Cc: Marc Zyngier <maz@xxxxxxxxxx>
> Cc: Will Deacon <will@xxxxxxxxxx>
> Signed-off-by: Sean Christopherson <seanjc@xxxxxxxxxx>
> ---
>
> This is *very* lightly tested, a thumbs up from the ARM world would be much
> appreciated.
>
> virt/kvm/kvm_main.c | 15 ++++++++++++++-
> 1 file changed, 14 insertions(+), 1 deletion(-)
>
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index d0788d0a72cc..46bb95d58d53 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -599,6 +599,7 @@ static __always_inline kvm_mn_ret_t __kvm_handle_hva_range(struct kvm *kvm,
> struct kvm_gfn_range gfn_range;
> struct kvm_memory_slot *slot;
> struct kvm_memslots *slots;
> + bool need_flush = false;
> int i, idx;
>
> if (WARN_ON_ONCE(range->end <= range->start))
> @@ -651,10 +652,22 @@ static __always_inline kvm_mn_ret_t __kvm_handle_hva_range(struct kvm *kvm,
> goto mmu_unlock;
> }
> r.ret |= range->handler(kvm, &gfn_range);
> +
> + /*
> + * Use a precise gfn-based TLB flush when possible, as
> + * most mmu_notifier events affect a small-ish range.
> + * Fall back to a full TLB flush if the gfn-based flush
> + * fails, and don't bother trying the gfn-based flush
> + * if a full flush is already pending.
> + */
> + if (range->flush_on_ret && !need_flush && r.ret &&
> + kvm_arch_flush_remote_tlbs_range(kvm, gfn_range.start,
> + gfn_range.end - gfn_range.start))
> + need_flush = true;

Thanks for having a crack at this.

We could still do better in the ->clear_flush_young() case if the
handler could do the invalidation as part of its page-table walk (for
example, it could use information about the page-table structure such
as the level of the leaves to optimise the invalidation further), but
this does at least avoid zapping the whole VMID on CPUs with range
support.

My only slight concern is that, should clear_flush_young() be extended
to operate on more than a single page-at-a-time in future, this will
silently end up invalidating the entire VMID for each memslot unless we
teach kvm_arch_flush_remote_tlbs_range() to return !0 in that case.

Will