Re: [PATCH v3 04/37] KVM: nVMX: Invalidate all roots when emulating INVVPID without EPT

From: Sean Christopherson
Date: Mon Mar 23 2020 - 12:04:35 EST


On Mon, Mar 23, 2020 at 04:34:17PM +0100, Vitaly Kuznetsov wrote:
> Sean Christopherson <sean.j.christopherson@xxxxxxxxx> writes:
>
> > From: Junaid Shahid <junaids@xxxxxxxxxx>
> >
> > Free all roots when emulating INVVPID for L1 and EPT is disabled, as
> > outstanding changes to the page tables managed by L1 need to be
> > recognized. Because L1 and L2 share an MMU when EPT is disabled, and
> > because VPID is not tracked by the MMU role, all roots in the current
> > MMU (root_mmu) need to be freed, otherwise a future nested VM-Enter or
> > VM-Exit could do a fast CR3 switch (without a flush/sync) and consume
> > stale SPTEs.
> >
> > Fixes: 5c614b3583e7b ("KVM: nVMX: nested VPID emulation")
> > Signed-off-by: Junaid Shahid <junaids@xxxxxxxxxx>
> > [sean: ported to upstream KVM, reworded the comment and changelog]
> > Signed-off-by: Sean Christopherson <sean.j.christopherson@xxxxxxxxx>
> > ---
> > arch/x86/kvm/vmx/nested.c | 14 ++++++++++++++
> > 1 file changed, 14 insertions(+)
> >
> > diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
> > index 9624cea4ed9f..bc74fbbf33c6 100644
> > --- a/arch/x86/kvm/vmx/nested.c
> > +++ b/arch/x86/kvm/vmx/nested.c
> > @@ -5250,6 +5250,20 @@ static int handle_invvpid(struct kvm_vcpu *vcpu)
> >  		return kvm_skip_emulated_instruction(vcpu);
> >  	}
> >
> > +	/*
> > +	 * Sync the shadow page tables if EPT is disabled, L1 is invalidating
> > +	 * linear mappings for L2 (tagged with L2's VPID). Free all roots as
> > +	 * VPIDs are not tracked in the MMU role.
> > +	 *
> > +	 * Note, this operates on root_mmu, not guest_mmu, as L1 and L2 share
> > +	 * an MMU when EPT is disabled.
> > +	 *
> > +	 * TODO: sync only the affected SPTEs for INVDIVIDUAL_ADDR.
> > +	 */
> > +	if (!enable_ept)
> > +		kvm_mmu_free_roots(vcpu, &vcpu->arch.root_mmu,
> > +				   KVM_MMU_ROOTS_ALL);
> > +
>
> This is related to my remark on the previous patch; the comment above
> makes me think I'm missing something obvious, so please enlighten me.
>
> My understanding is that L1 and L2 share arch.root_mmu not only when
> EPT is globally disabled: we seem to switch between root_mmu and
> guest_mmu only when nested_cpu_has_ept(vmcs12), and different L2
> guests may differ on this. Do we need to handle this somehow?

guest_mmu is used iff nested EPT is enabled, which requires enable_ept=1.
enable_ept is global and cannot be changed without reloading kvm_intel.

This most definitely over-invalidates, e.g. it blasts away L1's page
tables. But, fixing that requires tracking VPID in mmu_role and/or adding
support for using guest_mmu when L1 isn't using TDP, i.e. nested EPT is
disabled. Assuming the vast majority of nested deployments enable EPT in
L0, the cost of either option likely outweighs the benefits.
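
To make the relationship concrete, here is a minimal user-space sketch, not
kernel code. Every sketch_* name and the mmu_kind enum are hypothetical
stand-ins invented for illustration; only enable_ept, root_mmu, guest_mmu,
nested_cpu_has_ept() and kvm_mmu_free_roots() come from the patch and the
discussion above. It models two things: L2 runs on guest_mmu only when
vmcs12 uses EPT (which presupposes enable_ept=1), and the emulated INVVPID
drops the cached roots only when EPT is disabled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for KVM state; not the real kernel structures. */
enum mmu_kind { ROOT_MMU, GUEST_MMU };

struct sketch_vcpu {
	bool enable_ept;	/* models the global kvm_intel module param */
	bool vmcs12_has_ept;	/* models nested_cpu_has_ept(vmcs12) */
	bool roots_valid;	/* models root_mmu still holding cached roots */
};

/* Which MMU would be used to run L2 in this simplified model. */
static enum mmu_kind sketch_l2_mmu(const struct sketch_vcpu *v)
{
	/* Nested EPT requires enable_ept=1, so this combination is invalid. */
	assert(!v->vmcs12_has_ept || v->enable_ept);

	return v->vmcs12_has_ept ? GUEST_MMU : ROOT_MMU;
}

/* Models only the check this patch adds to handle_invvpid(). */
static void sketch_emulate_invvpid(struct sketch_vcpu *v)
{
	/*
	 * Stands in for kvm_mmu_free_roots(vcpu, &vcpu->arch.root_mmu,
	 * KVM_MMU_ROOTS_ALL): without EPT, L1 and L2 share root_mmu, so
	 * every cached root is dropped.
	 */
	if (!v->enable_ept)
		v->roots_valid = false;
}
```

In this model the over-invalidation is visible directly: with enable_ept=0
the single roots_valid flag covers L1's roots as well as L2's.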

> >  	return nested_vmx_succeed(vcpu);
> >  }
>
> --
> Vitaly
>