Re: [PATCH] KVM: SVM: Disable TDP MMU when running on Hyper-V

From: Jeremi Piotrowski
Date: Thu Mar 09 2023 - 12:58:58 EST


@Alex,
do you know the answer to Sean's question below about ASID handling in Hyper-V?
Any other comments about how we're intending to fix things are also welcome.

Jeremi

On 08/03/2023 20:11, Sean Christopherson wrote:
> On Wed, Mar 08, 2023, Jeremi Piotrowski wrote:
>> On 08/03/2023 01:39, Sean Christopherson wrote:
>>> On Wed, Mar 08, 2023, Paolo Bonzini wrote:
>>>> On Tue, Mar 7, 2023 at 6:36 PM Sean Christopherson <seanjc@xxxxxxxxxx> wrote:
>>>>> Thinking about this more, I would rather revert commit 1e0c7d40758b ("KVM: SVM:
>>>>> hyper-v: Remote TLB flush for SVM") or fix the thing properly straitaway. KVM
>>>>> doesn't magically handle the flushes correctly for the shadow/legacy MMU, KVM just
>>>>> happens to get lucky and not run afoul of the underlying bugs.
>>>>
>>>> I don't think it's about luck---the legacy MMU's zapping/invalidation
>>>> seems to invoke the flush hypercall correctly:
>>>
>>> ...for the paths that Jeremi has exercised, and for which a stale TLB entry is
>>> fatal to L2. E.g. kvm_unmap_gfn_range() does not have a range-based TLB flush
>>> in its path and fully relies on the buggy kvm_flush_remote_tlbs().
>>>
>>
>> Why do you say "buggy kvm_flush_remote_tlbs"? kvm_flush_remote_tlbs calls the
>> hypercall that is needed, I don't see how this might be an issue of a missing
>> "range-based TLB flush".
>
> Doh, I forgot that the arch hook in kvm_flush_remote_tlbs() leads to the Hyper-V
> hook.
>
> svm_flush_tlb_current is very much broken, but in practice it doesn't matter outside
> of the direct call from kvm_mmu_load(), because in all other paths KVM will flow
> through a Hyper-V flush if KVM actually modifies its MMU in any ways. E.g. the
> request from kvm_mmu_new_pgd() when force_flush_and_sync_on_reuse=true is neutered,
> but that exists only as a safeguard against MMU bugs. And for things like
> kvm_invalidate_pcid() and kvm_post_set_cr4(), my understanding is that Hyper-V
> will still flush the bare metal TLB, it's only Hyper-V's shadow page tables that
> are stale.
>
> Depending on how Hyper-V handles ASIDs, pre_svm_run() may also be broken. If
> Hyper-V tracks and rebuilds only the current ASID, then bumping the ASID won't
> rebuild potentially stale page tables. But I'm guessing Hyper-V ignores the ASID
> since the hypercall takes only the root PA.
>
> The truly problematic case is kvm_mmu_load(), where KVM relies on the flush to
> force Hyper-V to rebuild shadow page tables for an old, possibly stale nCR3. This
> affects only the TDP MMU because of an explicit optimization in the TDP MMU. So
> in practice we could squeak by with something like this:
>
> if (kvm_x86_ops.tlb_remote_flush == hv_remote_flush_tlb)
> hyperv_flush_guest_mapping(vcpu->arch.mmu->root.hpa);
> else
> static_call(kvm_x86_flush_tlb_current)(vcpu);
>
> but I'm not convinced that avoiding a hypercall in svm_flush_tlb_current() just
> to avoid overhead when running an L3 (nested VM from L1 KVM's perspective) is
> worthwhile. The real problem there is that KVM nested SVM TLB/ASID support is an
> unoptimized mess, and I can't imagine someone running an L3 is going to be super
> concerned with performance.
>
> I also think we should have a sanity check in the flush_tlb_all() path, i.e. WARN
> if kvm_flush_remote_tlbs() falls back.
>
> Something like this (probably doesn't compile, likely needs #ifdefs or helpers):
>
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index 7ef4f9e3b35a..38afc5cac1c4 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -3770,7 +3770,7 @@ static void svm_enable_nmi_window(struct kvm_vcpu *vcpu)
> svm->vmcb->save.rflags |= (X86_EFLAGS_TF | X86_EFLAGS_RF);
> }
>
> -static void svm_flush_tlb_current(struct kvm_vcpu *vcpu)
> +static void svm_flush_tlb_asid(struct kvm_vcpu *vcpu)
> {
> struct vcpu_svm *svm = to_svm(vcpu);
>
> @@ -3794,6 +3794,23 @@ static void svm_flush_tlb_current(struct kvm_vcpu *vcpu)
> svm->current_vmcb->asid_generation--;
> }
>
> +static void svm_flush_tlb_current(struct kvm_vcpu *vcpu)
> +{
> + if (kvm_x86_ops.tlb_remote_flush == hv_remote_flush_tlb &&
> + VALID_PAGE(vcpu->arch.mmu->root.hpa))
> + hyperv_flush_guest_mapping(vcpu->arch.mmu->root.hpa);
> +
> + svm_flush_tlb_asid(vcpu);
> +}
> +
> +static void svm_flush_tlb_all(struct kvm_vcpu *vcpu)
> +{
> + if (WARN_ON_ONCE(kvm_x86_ops.tlb_remote_flush == hv_remote_flush_tlb))
> + hv_remote_flush_tlb(vcpu->kvm);
> +
> + svm_flush_tlb_asid(vcpu);
> +}
> +
> static void svm_flush_tlb_gva(struct kvm_vcpu *vcpu, gva_t gva)
> {
> struct vcpu_svm *svm = to_svm(vcpu);
> @@ -4786,10 +4803,10 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {
> .set_rflags = svm_set_rflags,
> .get_if_flag = svm_get_if_flag,
>
> - .flush_tlb_all = svm_flush_tlb_current,
> + .flush_tlb_all = svm_flush_tlb_all,
> .flush_tlb_current = svm_flush_tlb_current,
> .flush_tlb_gva = svm_flush_tlb_gva,
> - .flush_tlb_guest = svm_flush_tlb_current,
> + .flush_tlb_guest = svm_flush_tlb_asid,
>
> .vcpu_pre_run = svm_vcpu_pre_run,
> .vcpu_run = svm_vcpu_run,