Re: [PATCH] KVM: SVM: Disable TDP MMU when running on Hyper-V

From: Sean Christopherson
Date: Wed Mar 08 2023 - 14:20:40 EST


On Wed, Mar 08, 2023, Jeremi Piotrowski wrote:
> On 08/03/2023 16:55, Jeremi Piotrowski wrote:
> >
> >
> > On 08/03/2023 01:39, Sean Christopherson wrote:
> >> On Wed, Mar 08, 2023, Paolo Bonzini wrote:
> >>> On Tue, Mar 7, 2023 at 6:36 PM Sean Christopherson <seanjc@xxxxxxxxxx> wrote:
> >>>> Thinking about this more, I would rather revert commit 1e0c7d40758b ("KVM: SVM:
> >>>> hyper-v: Remote TLB flush for SVM") or fix the thing properly straightaway. KVM
> >>>> doesn't magically handle the flushes correctly for the shadow/legacy MMU, KVM just
> >>>> happens to get lucky and not run afoul of the underlying bugs.
> >>>
> >>> I don't think it's about luck---the legacy MMU's zapping/invalidation
> >>> seems to invoke the flush hypercall correctly:
> >>
> >> ...for the paths that Jeremi has exercised, and for which a stale TLB entry is
> >> fatal to L2. E.g. kvm_unmap_gfn_range() does not have a range-based TLB flush
> >> in its path and fully relies on the buggy kvm_flush_remote_tlbs().
> >>
> >
> > Why do you say "buggy kvm_flush_remote_tlbs"? kvm_flush_remote_tlbs calls the hypercall
> > that is needed, I don't see how this might be an issue of a missing "range-based TLB flush".
> >
> > kvm_unmap_gfn_range is called from kvm_mmu_notifier_invalidate_range_start and 'flush_on_ret=true'
> > is set, so it is followed by kvm_flush_remote_tlbs which calls hv_remote_flush_tlb.
> >
> >> In other words, KVM is getting lucky :-)
> >>
> >>> Jeremi, did you ever track the call stack where
> >>> hyperv_nested_flush_guest_mapping is triggered?
> >>
> >> I don't think it matters. As above, it only takes one path where KVM is fully
> >> relying on kvm_flush_remote_tlbs() for the whole thing to fall apart.
>
> Slowly I'm starting to understand what we've been talking about, thank you :)
>
> Paolo/Sean, what do you think about something like the following, except I would make
> it SVM only, and I'd need to think about what to do with the return.
> I believe this accurately reflects what the optimization is about. hv_track_root_tdp
> is called from kvm_mmu_load_pgd, which covers both kvm_mmu_load and kvm_mmu_new_pgd
> (which requests KVM_REQ_LOAD_MMU_PGD).

It's close, but KVM doesn't *always* need to flush when loading a root. KVM needs
to flush when loading a brand spanking new root, which is the kvm_mmu_load() path.
But when KVM loads a root via KVM_REQ_LOAD_MMU_PGD/kvm_mmu_new_pgd(), a flush may
or may not be necessary, e.g. if KVM reuses an old, but still valid, root (each
vCPU has a 3-entry root cache) and a TLB flush isn't architecturally required, then
there is no need to flush.
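
To make the distinction concrete, here is a rough userspace sketch. It is a toy
model, not KVM code: struct toy_vcpu, toy_vcpu_init(), toy_switch_root() and
INVALID_ROOT are all made-up names, and the only thing borrowed from the above
is the 3-entry per-vCPU cache of previous roots. A root that misses the cache
is treated as brand new and always flushes. A root that hits the cache flushes
only if the caller says a flush is architecturally required.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NUM_PREV_ROOTS 3		/* size of the per-vCPU root cache */
#define INVALID_ROOT   UINT64_MAX	/* hypothetical "no root" marker */

/* Hypothetical stand-in for the per-vCPU MMU state. */
struct toy_vcpu {
	uint64_t cur_root;
	uint64_t prev_roots[NUM_PREV_ROOTS];
	unsigned int flushes;		/* number of flushes performed */
};

static void toy_vcpu_init(struct toy_vcpu *v)
{
	int i;

	v->cur_root = INVALID_ROOT;
	for (i = 0; i < NUM_PREV_ROOTS; i++)
		v->prev_roots[i] = INVALID_ROOT;
	v->flushes = 0;
}

/*
 * Switch to new_root and return true if a TLB flush was performed.
 * arch_flush is the caller's statement that a flush is architecturally
 * required regardless of whether the root is reused.
 */
static bool toy_switch_root(struct toy_vcpu *v, uint64_t new_root,
			    bool arch_flush)
{
	uint64_t old_root = v->cur_root;
	bool flush = arch_flush;
	bool hit = false;
	int i;

	assert(new_root != INVALID_ROOT);

	if (new_root != old_root) {
		for (i = 0; i < NUM_PREV_ROOTS; i++) {
			if (v->prev_roots[i] == new_root) {
				/* Reuse a cached root: swap it with the current one. */
				v->prev_roots[i] = old_root;
				hit = true;
				break;
			}
		}

		if (!hit) {
			/* Brand new root: evict the oldest entry, always flush. */
			for (i = NUM_PREV_ROOTS - 1; i > 0; i--)
				v->prev_roots[i] = v->prev_roots[i - 1];
			v->prev_roots[0] = old_root;
			flush = true;
		}

		v->cur_root = new_root;
	}

	if (flush)
		v->flushes++;
	return flush;
}
```

In this model, loading root A and then root B flushes twice. Switching back to
A with arch_flush=false hits the cache and does not flush, which is exactly the
case where an unconditional flush on every root load would be wasted work.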

And as mentioned in the other tendril of this thread, I'd really like to fix
svm_flush_tlb_current() since it's technically broken, even though it's highly
unlikely (maybe even impossible?) to cause issues in practice.

> diff --git a/arch/x86/kvm/kvm_onhyperv.c b/arch/x86/kvm/kvm_onhyperv.c
> index 482d6639ef88..6a5bd3cbace8 100644
> --- a/arch/x86/kvm/kvm_onhyperv.c
> +++ b/arch/x86/kvm/kvm_onhyperv.c
> @@ -29,6 +29,18 @@ static inline int hv_remote_flush_root_tdp(hpa_t root_tdp,
> return hyperv_flush_guest_mapping(root_tdp);
> }
>
> +static int hv_vcpu_flush_tlb_current(struct kvm_vcpu *vcpu)
> +{
> +	struct kvm_arch *kvm_arch = &vcpu->kvm->arch;
> +	hpa_t root_tdp = vcpu->arch.hv_root_tdp;
> +	int ret;
> +
> +	ret = hyperv_flush_guest_mapping(root_tdp);
> +	if (!ret)
> +		kvm_arch->hv_root_tdp = root_tdp;
> +	return ret;
> +}
> +
> int hv_remote_flush_tlb_with_range(struct kvm *kvm,
> struct kvm_tlb_range *range)
> {
> @@ -101,8 +113,10 @@ void hv_track_root_tdp(struct kvm_vcpu *vcpu, hpa_t root_tdp)
>  	if (kvm_x86_ops.tlb_remote_flush == hv_remote_flush_tlb) {
>  		spin_lock(&kvm_arch->hv_root_tdp_lock);
>  		vcpu->arch.hv_root_tdp = root_tdp;
> -		if (root_tdp != kvm_arch->hv_root_tdp)
> +		if (root_tdp != kvm_arch->hv_root_tdp) {
>  			kvm_arch->hv_root_tdp = INVALID_PAGE;
> +			hv_vcpu_flush_tlb_current(vcpu);
> +		}
>  		spin_unlock(&kvm_arch->hv_root_tdp_lock);
>  	}
> }
>