Re: [RFC PATCH v2 16/69] KVM: x86/mmu: Zap only leaf SPTEs for deleted/moved memslot by default
From: Sean Christopherson
Date: Tue Jul 13 2021 - 16:19:46 EST
On Tue, Jul 06, 2021, Paolo Bonzini wrote:
> On 03/07/21 00:04, isaku.yamahata@xxxxxxxxx wrote:
> > From: Sean Christopherson <sean.j.christopherson@xxxxxxxxx>
> >
> > Zap only leaf SPTEs when deleting/moving a memslot by default, and add a
> > module param to allow reverting to the old behavior of zapping all SPTEs
> > at all levels and memslots when any memslot is updated.
> >
> > Signed-off-by: Sean Christopherson <sean.j.christopherson@xxxxxxxxx>
> > Signed-off-by: Isaku Yamahata <isaku.yamahata@xxxxxxxxx>
> > ---
> > arch/x86/kvm/mmu/mmu.c | 21 ++++++++++++++++++++-
> > 1 file changed, 20 insertions(+), 1 deletion(-)
> >
> > diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> > index 8d5876dfc6b7..5b8a640f8042 100644
> > --- a/arch/x86/kvm/mmu/mmu.c
> > +++ b/arch/x86/kvm/mmu/mmu.c
> > @@ -85,6 +85,9 @@ __MODULE_PARM_TYPE(nx_huge_pages_recovery_ratio, "uint");
> > static bool __read_mostly force_flush_and_sync_on_reuse;
> > module_param_named(flush_on_reuse, force_flush_and_sync_on_reuse, bool, 0644);
> > +static bool __read_mostly memslot_update_zap_all;
> > +module_param(memslot_update_zap_all, bool, 0444);
> > +
> > /*
> > * When setting this variable to true it enables Two-Dimensional-Paging
> > * where the hardware walks 2 page tables:
> > @@ -5480,11 +5483,27 @@ static bool kvm_has_zapped_obsolete_pages(struct kvm *kvm)
> > return unlikely(!list_empty_careful(&kvm->arch.zapped_obsolete_pages));
> > }
> > +static void kvm_mmu_zap_memslot(struct kvm *kvm, struct kvm_memory_slot *slot)
> > +{
> > + /*
> > + * Zapping non-leaf SPTEs, a.k.a. not-last SPTEs, isn't required, worst
> > + * case scenario we'll have unused shadow pages lying around until they
> > + * are recycled due to age or when the VM is destroyed.
> > + */
> > + write_lock(&kvm->mmu_lock);
> > + slot_handle_level(kvm, slot, kvm_zap_rmapp, PG_LEVEL_4K,
> > + KVM_MAX_HUGEPAGE_LEVEL, true);
> > + write_unlock(&kvm->mmu_lock);
> > +}
> > +
> > static void kvm_mmu_invalidate_zap_pages_in_memslot(struct kvm *kvm,
> > struct kvm_memory_slot *slot,
> > struct kvm_page_track_notifier_node *node)
> > {
> > - kvm_mmu_zap_all_fast(kvm);
> > + if (memslot_update_zap_all)
> > + kvm_mmu_zap_all_fast(kvm);
> > + else
> > + kvm_mmu_zap_memslot(kvm, slot);
> > }
> > void kvm_mmu_init_vm(struct kvm *kvm)
> >
>
> This is the old patch that broke VFIO for some unknown reason.
Yes, my white whale :-/
> The commit message should at least say why memslot_update_zap_all is not true
> by default. Also, IIUC the bug is still there with NX hugepage splits disabled,
I strongly suspect the bug is also there with hugepage splits enabled; it's just
masked and/or harder to hit.
> but what if the TDP MMU is enabled?
This should not be a module param. IIRC, the original code I wrote had it as a
per-VM flag that wasn't even exposed to the user, i.e., TDX guests always do the
partial flush and non-TDX guests always do the full flush. I think that's the
least awful approach if we can't figure out the underlying bug before TDX is
ready for inclusion.