[PATCH 00/11] KVM: x86/mmu: Restore fast invalidate/zap flow
From: Sean Christopherson
Date: Thu Sep 12 2019 - 22:47:08 EST
Restore the fast invalidate flow for zapping shadow pages and use it
whenever vCPUs can be active in the VM. This fixes (in theory, not yet
confirmed) a regression reported by James Harvey where KVM can livelock
in kvm_mmu_zap_all() when it's invoked in response to a memslot update.
The fast invalidate flow was removed as it was deemed to be unnecessary
after its primary user, memslot flushing, was reworked to zap only the
memslot in question instead of all shadow pages. Unfortunately, zapping
only the memslot being (re)moved during a memslot update introduced a
regression for VMs with assigned devices. Because we could not discern
why zapping only the relevant memslot broke device assignment, or if the
regression extended beyond device assignment, we reverted to zapping all
shadow pages when a memslot is (re)moved.
The revert to "zap all" failed to account for subsequent changes that
have been made to kvm_mmu_zap_all() between then and now. Specifically,
kvm_mmu_zap_all() now conditionally reschedules and drops mmu_lock
if a reschedule is needed or if the lock is contended. Dropping the lock
allows other vCPUs to add shadow pages, and, with enough vCPUs, can cause
kvm_mmu_zap_all() to get stuck in an infinite loop as it can never zap all
pages before observing lock contention or the need to reschedule.
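
As a rough illustration of the failure mode described above, here is a
userspace toy model (not KVM code; the function name and all parameters
are hypothetical). Each round zaps a batch of pages and then "drops the
lock", at which point other vCPUs are assumed to add new pages. If they
add pages at least as fast as the zapper removes them, the loop never
reaches zero.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Toy model only. Zap up to 'batch' pages per round, then simulate
 * dropping mmu_lock, during which other vCPUs add 'added_per_drop'
 * new shadow pages.
 *
 * Returns the number of rounds needed to zap every page, or -1 if
 * pages still remain after 'max_rounds' rounds.
 */
static int zap_all_with_lock_drop(size_t nr_pages, size_t batch,
                                  size_t added_per_drop, int max_rounds)
{
    assert(batch > 0);

    for (int round = 1; round <= max_rounds; round++) {
        size_t zapped = nr_pages < batch ? nr_pages : batch;

        nr_pages -= zapped;
        if (nr_pages == 0)
            return round;

        /* Lock dropped: other vCPUs repopulate the page list. */
        nr_pages += added_per_drop;
    }

    return -1;
}
```

With no concurrent additions, 100 pages zapped 10 at a time finish in 10
rounds. With 10 pages added per lock drop, the model makes no forward
progress no matter how large max_rounds is.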
The reasoning behind having kvm_mmu_zap_all() conditionally reschedule was
that it would only be used when the VM is inaccessible, e.g. when its
mm_struct is dying or when the VM itself is being destroyed. In that case,
playing nice with the rest of the kernel instead of hogging cycles to free
unused shadow pages made sense.
Since it's unlikely we'll root cause the device assignment regression any
time soon, and since simply removing the conditional rescheduling isn't
guaranteed to return us to a known good state, restore the fast invalidate
flow for zapping on memslot updates, including mmio generation wraparound.
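
For readers unfamiliar with the idea, the following is a minimal sketch of
generation-based invalidation, written as a userspace toy model. It is not
the code in this series: the toy_* names, the fixed-size array, and the
"one page added per zap" behavior are assumptions made purely for
illustration. The point it tries to show is that pages created after the
generation bump are never considered obsolete, so the set of pages left to
zap can only shrink even when other vCPUs keep adding pages.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_PAGES 64

struct toy_page {
    unsigned long gen;  /* generation the page was created in */
    bool live;
};

struct toy_mmu {
    unsigned long valid_gen;  /* current valid generation */
    struct toy_page pages[TOY_MAX_PAGES];
};

/* Allocate a page tagged with the current generation. */
static bool toy_add_page(struct toy_mmu *mmu)
{
    assert(mmu != NULL);

    for (size_t i = 0; i < TOY_MAX_PAGES; i++) {
        if (!mmu->pages[i].live) {
            mmu->pages[i].live = true;
            mmu->pages[i].gen = mmu->valid_gen;
            return true;
        }
    }
    return false;
}

/* A live page is obsolete if it predates the current generation. */
static bool toy_is_obsolete(const struct toy_mmu *mmu,
                            const struct toy_page *page)
{
    return page->live && page->gen != mmu->valid_gen;
}

/* Count live pages, optionally restricted to obsolete ones. */
static size_t toy_count(const struct toy_mmu *mmu, bool obsolete_only)
{
    size_t n = 0;

    for (size_t i = 0; i < TOY_MAX_PAGES; i++) {
        const struct toy_page *page = &mmu->pages[i];

        if (page->live && (!obsolete_only || toy_is_obsolete(mmu, page)))
            n++;
    }
    return n;
}

/*
 * Bump the generation, then zap only obsolete pages. After each zap,
 * simulate a lock drop in which another vCPU adds a fresh page.
 * Returns the number of pages zapped.
 */
static size_t toy_fast_invalidate(struct toy_mmu *mmu)
{
    size_t zapped = 0;

    mmu->valid_gen++;

    for (size_t i = 0; i < TOY_MAX_PAGES; i++) {
        if (!toy_is_obsolete(mmu, &mmu->pages[i]))
            continue;

        mmu->pages[i].live = false;
        zapped++;

        /* New page gets the new generation, so it is not obsolete. */
        toy_add_page(mmu);
    }
    return zapped;
}
```

Starting from eight pages, one call zaps exactly those eight, and the
eight pages added concurrently survive because they carry the new
generation.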
Opportunistically tack on a bug fix and a couple of enhancements.
Alex and James, it probably goes without saying... please test, especially
patch 01/11 as a standalone patch as that'll likely need to be applied to
stable branches, assuming it works. Thanks!
Sean Christopherson (11):
  KVM: x86/mmu: Reintroduce fast invalidate/zap for flushing memslot
  KVM: x86/mmu: Treat invalid shadow pages as obsolete
  KVM: x86/mmu: Use fast invalidate mechanism to zap MMIO sptes
  KVM: x86/mmu: Revert "Revert "KVM: MMU: show mmu_valid_gen in shadow
    page related tracepoints""
  KVM: x86/mmu: Revert "Revert "KVM: MMU: add tracepoint for
    kvm_mmu_invalidate_all_pages""
  KVM: x86/mmu: Revert "Revert "KVM: MMU: zap pages in batch""
  KVM: x86/mmu: Revert "Revert "KVM: MMU: collapse TLB flushes when zap
    all pages""
  KVM: x86/mmu: Revert "Revert "KVM: MMU: reclaim the zapped-obsolete
    page first""
  KVM: x86/mmu: Revert "KVM: x86/mmu: Remove is_obsolete() call"
  KVM: x86/mmu: Explicitly track only a single invalid mmu generation
  KVM: x86/mmu: Skip invalid pages during zapping iff root_count is zero

 arch/x86/include/asm/kvm_host.h |   4 +-
 arch/x86/kvm/mmu.c              | 154 ++++++++++++++++++++++++++++----
 arch/x86/kvm/mmutrace.h         |  42 +++++++--
 arch/x86/kvm/x86.c              |   1 +
 4 files changed, 173 insertions(+), 28 deletions(-)
--
2.22.0