Re: [PATCH v4 3/6] KVM: arm64: Implement kvm_arch_flush_remote_tlbs_range()

From: Marc Zyngier
Date: Wed May 31 2023 - 04:47:12 EST


On Tue, 30 May 2023 22:22:23 +0100,
Raghavendra Rao Ananta <rananta@xxxxxxxxxx> wrote:
>
> On Mon, May 29, 2023 at 7:00 AM Marc Zyngier <maz@xxxxxxxxxx> wrote:
> >
> > On Fri, 19 May 2023 01:52:28 +0100,
> > Raghavendra Rao Ananta <rananta@xxxxxxxxxx> wrote:
> > >
> > > Implement kvm_arch_flush_remote_tlbs_range() for arm64
> > > to invalidate the given range in the TLB.
> > >
> > > Signed-off-by: Raghavendra Rao Ananta <rananta@xxxxxxxxxx>
> > > ---
> > > arch/arm64/include/asm/kvm_host.h | 3 +++
> > > arch/arm64/kvm/hyp/nvhe/tlb.c | 4 +---
> > > arch/arm64/kvm/mmu.c | 11 +++++++++++
> > > 3 files changed, 15 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> > > index 81ab41b84f436..343fb530eea9c 100644
> > > --- a/arch/arm64/include/asm/kvm_host.h
> > > +++ b/arch/arm64/include/asm/kvm_host.h
> > > @@ -1081,6 +1081,9 @@ struct kvm *kvm_arch_alloc_vm(void);
> > > #define __KVM_HAVE_ARCH_FLUSH_REMOTE_TLBS
> > > int kvm_arch_flush_remote_tlbs(struct kvm *kvm);
> > >
> > > +#define __KVM_HAVE_ARCH_FLUSH_REMOTE_TLBS_RANGE
> > > +int kvm_arch_flush_remote_tlbs_range(struct kvm *kvm, gfn_t start_gfn, u64 pages);
> > > +
> > > static inline bool kvm_vm_is_protected(struct kvm *kvm)
> > > {
> > > return false;
> > > diff --git a/arch/arm64/kvm/hyp/nvhe/tlb.c b/arch/arm64/kvm/hyp/nvhe/tlb.c
> > > index d4ea549c4b5c4..d2c7c1bc6d441 100644
> > > --- a/arch/arm64/kvm/hyp/nvhe/tlb.c
> > > +++ b/arch/arm64/kvm/hyp/nvhe/tlb.c
> > > @@ -150,10 +150,8 @@ void __kvm_tlb_flush_vmid_range(struct kvm_s2_mmu *mmu,
> > > return;
> > > }
> > >
> > > - dsb(ishst);
> > > -
> > > /* Switch to requested VMID */
> > > - __tlb_switch_to_guest(mmu, &cxt);
> > > + __tlb_switch_to_guest(mmu, &cxt, false);
> >
> > This hunk is in the wrong patch, isn't it?
> >
> Ah, you are right. It should be part of the previous patch. I think I
> introduced it accidentally when I rebased the series. I'll remove it
> in the next spin.
>
>
> > >
> > > __flush_tlb_range_op(ipas2e1is, start, pages, stride, 0, 0, false);
> > >
> > > diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> > > index d0a0d3dca9316..e3673b4c10292 100644
> > > --- a/arch/arm64/kvm/mmu.c
> > > +++ b/arch/arm64/kvm/mmu.c
> > > @@ -92,6 +92,17 @@ int kvm_arch_flush_remote_tlbs(struct kvm *kvm)
> > > return 0;
> > > }
> > >
> > > +int kvm_arch_flush_remote_tlbs_range(struct kvm *kvm, gfn_t start_gfn, u64 pages)
> > > +{
> > > + phys_addr_t start, end;
> > > +
> > > + start = start_gfn << PAGE_SHIFT;
> > > + end = (start_gfn + pages) << PAGE_SHIFT;
> > > +
> > > + kvm_call_hyp(__kvm_tlb_flush_vmid_range, &kvm->arch.mmu, start, end);
> >
> > So that's the point that I think is not right. It is the MMU code that
> > should drive the invalidation method, and not the HYP code. The HYP
> > code should be as dumb as possible, and the logic should be kept in
> > the MMU code.
> >
> > So when a range invalidation is forwarded to HYP, it's a *valid* range
> > invalidation, not something that can fall back to VMID-wide invalidation.
> >
> I'm guessing that you are referring to patch 2. Do you recommend
> moving the 'pages >= MAX_TLBI_RANGE_PAGES' logic here and simply
> returning an error? What about the other check,
> system_supports_tlb_range()?
> The idea was for __kvm_tlb_flush_vmid_range() to also implement a
> fallback mechanism in case the system doesn't support the range-based
> instructions. But if we end up calling __kvm_tlb_flush_vmid_range()
> from multiple call sites, we'd end up duplicating the checks. WDYT?

My take is that there should be a single helper that decides whether
to issue a number of range-based TLBIs (depending on start/end) or a
single VMID-based TLBI. Having multiple call sites is not a problem,
and even if that code gets duplicated, big deal.
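A minimal sketch of such a helper, modelled in plain user-space C so it compiles standalone. Everything in it is hypothetical: mmu_flush_s2_range(), hyp_flush_vmid(), hyp_flush_vmid_range() and the tlb_range_supported flag stand in for an MMU-side helper, the two hypercalls and system_supports_tlb_range(); 4 KiB pages and the value of MAX_TLBI_RANGE_PAGES are assumptions. The decision logic lives entirely in the caller, and the HYP stubs only do what they are told:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 4 KiB pages. */
#define PAGE_SHIFT 12
/* Assumed to mirror arm64's __TLBI_RANGE_PAGES(31, 3) = (31 + 1) << (5 * 3 + 1). */
#define MAX_TLBI_RANGE_PAGES (32ULL << 16)

/* Hypothetical counters standing in for the two hypercalls. */
static unsigned int nr_range_calls, nr_vmid_calls;
static uint64_t range_pages_total;

/* Hypothetical stand-in for system_supports_tlb_range(). */
static bool tlb_range_supported = true;

/* Models the HYP VMID-wide invalidation: no decisions, just do it. */
static void hyp_flush_vmid(void)
{
	nr_vmid_calls++;
}

/* Models the HYP range invalidation: it trusts the caller's range. */
static void hyp_flush_vmid_range(uint64_t addr, uint64_t pages)
{
	(void)addr;
	/* Contract: only architecturally valid ranges are ever forwarded. */
	assert(pages > 0 && pages < MAX_TLBI_RANGE_PAGES);
	nr_range_calls++;
	range_pages_total += pages;
}

/*
 * Hypothetical MMU-side helper: the only place that decides between
 * range-based and VMID-wide invalidation.
 */
static void mmu_flush_s2_range(uint64_t start_gfn, uint64_t pages)
{
	uint64_t addr = start_gfn << PAGE_SHIFT;

	if (!tlb_range_supported) {
		hyp_flush_vmid();
		return;
	}

	while (pages > 0) {
		/* Split so each request stays strictly below the limit. */
		uint64_t chunk = pages < MAX_TLBI_RANGE_PAGES ?
				 pages : MAX_TLBI_RANGE_PAGES - 1;

		hyp_flush_vmid_range(addr, chunk);
		addr += chunk << PAGE_SHIFT;
		pages -= chunk;
	}
}
```

A caller could equally choose a single VMID-wide flush for very large ranges instead of splitting them; either way the choice is made here, and the assert in the HYP stub documents that HYP never has to make it.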

But a hypercall that falls back to global invalidation based on a
range evaluation error (a range of MAX_TLBI_RANGE_PAGES pages or more)
is papering over a latent bug.
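For scale, the arithmetic below assumes MAX_TLBI_RANGE_PAGES is defined as in arch/arm64/include/asm/tlbflush.h at the time, i.e. __TLBI_RANGE_PAGES(31, 3) = (31 + 1) << (5 * 3 + 1) = 2^21 pages; the helper names are hypothetical. With 4 KiB pages that is 8 GiB, so any caller that can see ranges that large (a big memslot, for instance) has to split them or pick a VMID-wide flush itself, rather than relying on HYP to notice:

```c
#include <assert.h>
#include <stdint.h>

/* Assumed to mirror arm64's __TLBI_RANGE_PAGES(num, scale) macro. */
static uint64_t tlbi_range_pages(unsigned int num, unsigned int scale)
{
	/* NUM is a 5-bit field and SCALE a 2-bit field in the range operand. */
	assert(num < 32 && scale < 4);
	return (uint64_t)(num + 1) << (5 * scale + 1);
}

/* Assumed to mirror MAX_TLBI_RANGE_PAGES == __TLBI_RANGE_PAGES(31, 3). */
static uint64_t max_tlbi_range_pages(void)
{
	return tlbi_range_pages(31, 3);
}

/* Size in bytes covered by MAX_TLBI_RANGE_PAGES for a given page size. */
static uint64_t max_tlbi_range_bytes(unsigned int page_shift)
{
	return max_tlbi_range_pages() << page_shift;
}
```

With 16 KiB and 64 KiB pages the same bound works out to 32 GiB and 128 GiB respectively.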

There should be no logic whatsoever in either of the two tlb.c files
(hyp/nvhe/tlb.c and hyp/vhe/tlb.c). Only a switch to the correct
context, and the requested invalidation, which *must* be
architecturally correct.
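A sketch of what that leaves on the HYP side, again as a standalone user-space model with hypothetical names: switch_to_guest(), tlbi_ipa_range() and switch_to_host() stand in for __tlb_switch_to_guest(), the range TLBI loop and the switch back, and barriers are elided. The handler contains no checks and no fallback:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical operation log used to observe what the handler does. */
enum hyp_op { OP_SWITCH_TO_GUEST, OP_TLBI_RANGE, OP_SWITCH_TO_HOST };

#define MAX_TRACE 16
static enum hyp_op trace[MAX_TRACE];
static int trace_len;

static void record(enum hyp_op op)
{
	assert(trace_len < MAX_TRACE);
	trace[trace_len++] = op;
}

/* Hypothetical stand-ins; the real code also issues DSB/ISB barriers. */
static void switch_to_guest(unsigned int vmid)
{
	(void)vmid;
	record(OP_SWITCH_TO_GUEST);
}

static void tlbi_ipa_range(uint64_t ipa, uint64_t pages)
{
	(void)ipa;
	(void)pages;
	record(OP_TLBI_RANGE);
}

static void switch_to_host(void)
{
	record(OP_SWITCH_TO_HOST);
}

/*
 * HYP-side handler: no range checks, no fallback. It switches context,
 * performs exactly the requested invalidation and switches back; the
 * caller guarantees the range is architecturally valid.
 */
static void hyp_tlb_flush_vmid_range(unsigned int vmid, uint64_t ipa,
				     uint64_t pages)
{
	switch_to_guest(vmid);
	tlbi_ipa_range(ipa, pages);
	switch_to_host();
}
```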

Thanks,

M.

--
Without deviation from the norm, progress is not possible.