[PATCH v9 00/11] KVM: x86/mmu: Age sptes locklessly

From: James Houghton
Date: Mon Feb 03 2025 - 19:41:04 EST


By aging sptes locklessly with both the TDP MMU and the shadow MMU, neither
vCPUs nor reclaim (mmu_notifier_invalidate_range*) get stuck waiting for
aging. This reduced contention improves guest performance, saves a
significant amount of Google Cloud's CPU usage, and brings valuable
improvements to ChromeOS, as Yu has mentioned previously[1].
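As a rough illustration of what aging an spte without mmu_lock involves,
here is a userspace C11 sketch. Testing age is a plain atomic load, and
clearing age is an atomic AND, which cannot lose concurrent updates to
other bits of the spte. The DEMO_SPTE_ACCESSED mask (bit 5, the Accessed
bit in legacy x86 paging entries) and the demo_* function names are
hypothetical; KVM's real helpers and masks differ.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical spte layout for illustration only: bit 5 is the Accessed
 * bit in legacy x86 paging entries. KVM itself uses a runtime mask.
 */
#define DEMO_SPTE_ACCESSED (UINT64_C(1) << 5)

/* Test-only aging: a single atomic load, no lock and no write. */
static bool demo_spte_test_young(_Atomic uint64_t *sptep)
{
	return atomic_load(sptep) & DEMO_SPTE_ACCESSED;
}

/*
 * Clear aging: atomically clear the Accessed bit and report whether it
 * was set. fetch_and cannot fail, so bits changed concurrently by other
 * CPUs (e.g. a Dirty bit) are preserved without taking any lock.
 */
static bool demo_spte_age(_Atomic uint64_t *sptep)
{
	uint64_t old = atomic_fetch_and(sptep, ~DEMO_SPTE_ACCESSED);

	return old & DEMO_SPTE_ACCESSED;
}
```

An atomic read-modify-write is used here rather than a load followed by
a plain store, because a plain store could overwrite a bit that another
CPU set in between.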

Please see v8[8] for some performance results using
access_tracking_perf_test patched to use MGLRU.

Neither access_tracking_perf_test nor mmu_stress_test triggers any
splats (with CONFIG_LOCKDEP=y) with either the TDP MMU or the shadow MMU.

=== Previous Versions ===

Since v8[8]:
- Re-added the kvm_handle_hva_range helpers and applied Sean's
kvm_{handle -> age}_hva_range rename.
- Renamed spte_has_volatile_bits() to spte_needs_atomic_write() and
removed its Accessed bit check. Undid change to
tdp_mmu_spte_need_atomic_write().
- Renamed KVM_MMU_NOTIFIER_{YOUNG -> AGING}_LOCKLESS.
- cpu_relax(), lockdep, preempt_disable(), and locking fixups for
per-rmap lock (thanks Lai and Sean).
- Renamed kvm_{has -> may_have}_shadow_mmu_sptes().
- Rebased onto latest kvm/next, including changing
for_each_tdp_mmu_root_rcu to use `types`.
- Dropped MGLRU changes from access_tracking_perf_test.
- Picked up Acked-bys from Yu. (thank you!)
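The per-rmap lock mentioned above can be pictured as a spinlock folded
into a spare low bit of the rmap head word, acquired with a
compare-and-swap and spun on with a cpu_relax()-style hint. The
following userspace C11 sketch assumes that design; DEMO_RMAP_LOCKED,
struct demo_rmap_head, and the demo_* helpers are hypothetical and are
not the series' actual code. A kernel version would also have to deal
with preemption and lockdep, which this sketch leaves out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/*
 * Hypothetical: bit 1 of the head word is assumed free because the
 * pointers stored there are at least 4-byte aligned.
 */
#define DEMO_RMAP_LOCKED ((uintptr_t)1 << 1)

struct demo_rmap_head {
	_Atomic uintptr_t val;	/* pointer/flags word plus the lock bit */
};

/* Userspace stand-in for cpu_relax(): a PAUSE hint on x86. */
static inline void demo_cpu_relax(void)
{
#if defined(__x86_64__) || defined(__i386__)
	__builtin_ia32_pause();
#endif
}

/*
 * Spin until the lock bit is clear, then set it with a CAS. Returns the
 * head value as it was just before locking (lock bit clear).
 */
static uintptr_t demo_rmap_lock(struct demo_rmap_head *head)
{
	uintptr_t old = atomic_load_explicit(&head->val, memory_order_relaxed);

	for (;;) {
		if (old & DEMO_RMAP_LOCKED) {
			demo_cpu_relax();
			old = atomic_load_explicit(&head->val,
						   memory_order_relaxed);
			continue;
		}
		/* On failure, old is refreshed with the current value. */
		if (atomic_compare_exchange_weak_explicit(&head->val, &old,
							  old | DEMO_RMAP_LOCKED,
							  memory_order_acquire,
							  memory_order_relaxed))
			return old;
	}
}

/* Publish the (possibly updated) head value and drop the lock. */
static void demo_rmap_unlock(struct demo_rmap_head *head, uintptr_t new_val)
{
	assert(!(new_val & DEMO_RMAP_LOCKED));
	atomic_store_explicit(&head->val, new_val, memory_order_release);
}
```

Keeping the lock bit inside the head word means no extra memory per
rmap, at the cost of every reader and writer having to mask the bit out.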

Since v7[7]:
- Dropped MGLRU changes.
- Dropped DAMON cleanup.
- Dropped MMU notifier changes completely.
- Made shadow MMU aging *always* lockless, not just lockless when the
now-removed "fast_only" clear notifier was used.
- Given that the MGLRU changes no longer introduce a new MGLRU
capability, dropped the new capability check from the selftest.
- Rebased on top of latest kvm-x86/next, including the x86 mmu changes
for marking pages as dirty.

Since v6[6]:
- Rebased on top of kvm-x86/next and Sean's lockless rmap walking
changes.
- Removed HAVE_KVM_MMU_NOTIFIER_YOUNG_FAST_ONLY (thanks DavidM).
- Split up kvm_age_gfn() / kvm_test_age_gfn() optimizations (thanks
DavidM and Sean).
- Improved new MMU notifier documentation (thanks DavidH).
- Dropped arm64 locking change.
- No longer retry for CAS failure in TDP MMU non-A/D case (thanks
Sean).
- Added some R-bys and A-bys.
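Without hardware A/D bits, aging cannot just clear an Accessed bit; it
has to mark the spte for access tracking by stripping its permission
bits. The "no retry on CAS failure" behavior can be sketched as a single
compare-and-swap attempt whose failure is simply ignored, on the
assumption that losing a race with another CPU is acceptable for an
aging hint. DEMO_SPTE_ACC_MASK and the demo_* names are hypothetical;
KVM's real access-tracking code also saves the stripped bits so they can
be restored, which this sketch omits.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical layout: bits 0-2 stand in for the permission bits that
 * access tracking strips when hardware A/D bits are unavailable.
 */
#define DEMO_SPTE_ACC_MASK UINT64_C(0x7)

static bool demo_spte_is_accessed(uint64_t spte)
{
	return (spte & DEMO_SPTE_ACC_MASK) != 0;
}

/*
 * Returns true if the spte was young. Makes exactly one CAS attempt: if
 * another CPU modified the spte in the meantime, the update is dropped
 * rather than retried.
 */
static bool demo_age_spte_no_ad(_Atomic uint64_t *sptep)
{
	uint64_t old = atomic_load(sptep);

	if (!demo_spte_is_accessed(old))
		return false;

	/* Result deliberately ignored: a lost race is tolerated here. */
	(void)atomic_compare_exchange_strong(sptep, &old,
					     old & ~DEMO_SPTE_ACC_MASK);
	return true;
}
```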

Since v5[5]:
- Reworked test_clear_young_fast_only() into a new parameter for the
existing notifiers (thanks Sean).
- Added mmu_notifier.has_fast_aging to tell mm whether fast-only
notifiers should be called.
- Added mm_has_fast_young_notifiers() to inform users if calling
fast-only notifier helpers is worthwhile (for look-around to use).
- Changed MGLRU to invoke a single notifier instead of two when
aging and doing look-around (thanks Yu).
- For KVM/x86, check indirect_shadow_pages > 0 instead of
kvm_memslots_have_rmaps() when collecting age information
(thanks Sean).
- For KVM/arm, some fixes from Oliver.
- Small fixes to access_tracking_perf_test.
- Added missing !MMU_NOTIFIER version of mmu_notifier_clear_young().
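Two of the shortcuts described in this series (skipping the shadow MMU
test_young when the TDP MMU already reports the page as young, and only
checking the shadow MMU when indirect_shadow_pages > 0) amount to cheap
early exits before the expensive rmap walk. The sketch below models them
in userspace C11; struct demo_vm and the demo_* walkers are hypothetical
stand-ins, and the claim that an unlocked read of the counter is good
enough is an assumption made for this sketch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins for per-VM state and the two MMU walkers. */
struct demo_vm {
	atomic_uint indirect_shadow_pages;	/* shadow-MMU page count */
	bool tdp_young;				/* TDP MMU walk result */
	bool shadow_young;			/* rmap walk result */
	unsigned int shadow_walks;		/* rmap walks performed */
};

static bool demo_tdp_mmu_test_age(struct demo_vm *vm)
{
	return vm->tdp_young;
}

static bool demo_shadow_mmu_test_age(struct demo_vm *vm)
{
	vm->shadow_walks++;
	return vm->shadow_young;
}

static bool demo_test_age_gfn(struct demo_vm *vm)
{
	/* If the TDP MMU already says "young", the answer is known. */
	if (demo_tdp_mmu_test_age(vm))
		return true;

	/*
	 * No indirect shadow pages means no rmaps worth walking. The read
	 * is unlocked, so a racing update may be missed; that is assumed
	 * to be acceptable for an aging hint in this sketch.
	 */
	if (atomic_load_explicit(&vm->indirect_shadow_pages,
				 memory_order_relaxed) == 0)
		return false;

	return demo_shadow_mmu_test_age(vm);
}
```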

Since v4[4]:
- Removed Kconfig that controlled when aging was enabled. Aging will
be done whenever the architecture supports it (thanks Yu).
- Added a new MMU notifier, test_clear_young_fast_only(), specifically
for MGLRU to use.
- Added kvm_fast_{test_,}age_gfn, implemented by x86.
- Fixed locking for clear_flush_young().
- Added KVM_MMU_NOTIFIER_YOUNG_LOCKLESS to clean up locking changes
(thanks Sean).
- Fixed WARN_ON and other cleanup for the arm64 locking changes
(thanks Oliver).

Since v3[3]:
- Vastly simplified the series (thanks David). Removed mmu notifier
batching logic entirely.
- Cleaned up how locking is done for mmu_notifier_test/clear_young
(thanks David).
- Look-around is now only done when there are no secondary MMUs
subscribed to MMU notifiers.
- CONFIG_LRU_GEN_WALKS_SECONDARY_MMU has been added.
- Fixed the lockless implementation of kvm_{test,}age_gfn for x86
(thanks David).
- Added MGLRU functional and performance tests to
access_tracking_perf_test (thanks Axel).
- In v3, an mm would be completely ignored (for aging) if there was a
secondary MMU but support for secondary MMU walking was missing. Now,
missing secondary MMU walking support simply skips the notifier
calls (except for eviction).
- Added a sanity check that range->lockless and range->on_lock are
never both provided for the memslot walk.
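The sanity check above follows from what the two fields mean: on_lock is
a hook that runs once the lock has been taken, so it has no meaning for a
walk that never takes the lock. A simplified userspace C11 sketch is
shown below. struct demo_range, demo_walk_memslots(), and the example
callbacks are hypothetical, an atomic_flag spinlock stands in for
mmu_lock, and unlike a real memslot walk the lock is taken
unconditionally rather than only when a memslot overlaps the range.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical, simplified range descriptor. */
struct demo_range {
	bool lockless;			/* walk without the lock */
	void (*on_lock)(void *ctx);	/* hook run once the lock is held */
	bool (*handler)(void *ctx);	/* per-walk callback */
	void *ctx;
};

/* Stand-in for mmu_lock. */
static atomic_flag demo_mmu_lock = ATOMIC_FLAG_INIT;

static void demo_lock(void)
{
	while (atomic_flag_test_and_set_explicit(&demo_mmu_lock,
						 memory_order_acquire))
		;	/* spin */
}

static void demo_unlock(void)
{
	atomic_flag_clear_explicit(&demo_mmu_lock, memory_order_release);
}

static bool demo_walk_memslots(const struct demo_range *range)
{
	bool ret;

	/* A lockless walk never takes the lock, so on_lock would never run. */
	assert(!(range->lockless && range->on_lock));

	if (range->lockless)
		return range->handler(range->ctx);

	demo_lock();
	if (range->on_lock)
		range->on_lock(range->ctx);
	ret = range->handler(range->ctx);
	demo_unlock();
	return ret;
}

/* Example callbacks: update an int counter passed through ctx. */
static bool demo_count_handler(void *ctx)
{
	(*(int *)ctx)++;
	return true;
}

static void demo_count_on_lock(void *ctx)
{
	*(int *)ctx += 10;
}
```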

For the changes since v2[2], see v3.

Based on latest kvm/next.

[1]: https://lore.kernel.org/kvm/CAOUHufYS0XyLEf_V+q5SCW54Zy2aW5nL8CnSWreM8d1rX5NKYg@xxxxxxxxxxxxxx/
[2]: https://lore.kernel.org/kvmarm/20230526234435.662652-1-yuzhao@xxxxxxxxxx/
[3]: https://lore.kernel.org/linux-mm/20240401232946.1837665-1-jthoughton@xxxxxxxxxx/
[4]: https://lore.kernel.org/linux-mm/20240529180510.2295118-1-jthoughton@xxxxxxxxxx/
[5]: https://lore.kernel.org/linux-mm/20240611002145.2078921-1-jthoughton@xxxxxxxxxx/
[6]: https://lore.kernel.org/linux-mm/20240724011037.3671523-1-jthoughton@xxxxxxxxxx/
[7]: https://lore.kernel.org/kvm/20240926013506.860253-1-jthoughton@xxxxxxxxxx/
[8]: https://lore.kernel.org/kvm/20241105184333.2305744-1-jthoughton@xxxxxxxxxx/

James Houghton (7):
KVM: Rename kvm_handle_hva_range()
KVM: Add lockless memslot walk to KVM
KVM: x86/mmu: Factor out spte atomic bit clearing routine
KVM: x86/mmu: Relax locking for kvm_test_age_gfn() and kvm_age_gfn()
KVM: x86/mmu: Rename spte_has_volatile_bits() to
spte_needs_atomic_write()
KVM: x86/mmu: Skip shadow MMU test_young if TDP MMU reports page as
young
KVM: x86/mmu: Only check gfn age in shadow MMU if
indirect_shadow_pages > 0

Sean Christopherson (4):
KVM: x86/mmu: Refactor low level rmap helpers to prep for walking w/o
mmu_lock
KVM: x86/mmu: Add infrastructure to allow walking rmaps outside of
mmu_lock
KVM: x86/mmu: Add support for lockless walks of rmap SPTEs
KVM: x86/mmu: Support rmap walks without holding mmu_lock when aging
gfns

Documentation/virt/kvm/locking.rst | 4 +-
arch/x86/include/asm/kvm_host.h | 4 +-
arch/x86/kvm/Kconfig | 1 +
arch/x86/kvm/mmu/mmu.c | 364 +++++++++++++++++++++--------
arch/x86/kvm/mmu/spte.c | 19 +-
arch/x86/kvm/mmu/spte.h | 2 +-
arch/x86/kvm/mmu/tdp_iter.h | 26 ++-
arch/x86/kvm/mmu/tdp_mmu.c | 36 ++-
include/linux/kvm_host.h | 1 +
virt/kvm/Kconfig | 2 +
virt/kvm/kvm_main.c | 56 +++--
11 files changed, 364 insertions(+), 151 deletions(-)


base-commit: f7bafceba76e9ab475b413578c1757ee18c3e44b
--
2.48.1.362.g079036d154-goog