Re: [PATCH v3 17/17] KVM: x86/tdp_mmu: Take root types for kvm_tdp_mmu_invalidate_all_roots()

From: Yan Zhao
Date: Fri Jun 21 2024 - 03:12:11 EST


On Wed, Jun 19, 2024 at 03:36:14PM -0700, Rick Edgecombe wrote:
> diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
> index 630e6b6d4bf2..a1ab67a4f41f 100644
> --- a/arch/x86/kvm/mmu/tdp_mmu.c
> +++ b/arch/x86/kvm/mmu/tdp_mmu.c
> @@ -37,7 +37,7 @@ void kvm_mmu_uninit_tdp_mmu(struct kvm *kvm)
> * for zapping and thus puts the TDP MMU's reference to each root, i.e.
> * ultimately frees all roots.
> */
> - kvm_tdp_mmu_invalidate_all_roots(kvm);
> + kvm_tdp_mmu_invalidate_roots(kvm, KVM_VALID_ROOTS);
All roots (mirror + direct) are invalidated here.

> kvm_tdp_mmu_zap_invalidated_roots(kvm);
kvm_tdp_mmu_zap_invalidated_roots() will zap the invalidated mirror root with
mmu_lock held for read. That should trigger the KVM_BUG_ON() in
__tdp_mmu_set_spte_atomic(), which assumes "atomic zapping don't operate on
mirror roots".

So far, though, the KVM_BUG_ON() has not triggered, because
kvm_mmu_notifier_release() is called before kvm_destroy_vm() (see the call
trace below), and kvm_arch_flush_shadow_all() in kvm_mmu_notifier_release() has
already zapped all mirror SPTEs by the time kvm_destroy_vm() calls
kvm_mmu_uninit_vm().


kvm_mmu_notifier_release
  kvm_flush_shadow_all
    kvm_arch_flush_shadow_all
      static_call_cond(kvm_x86_flush_shadow_all_private)(kvm);
      kvm_mmu_zap_all            ==> hold mmu_lock for write
        kvm_tdp_mmu_zap_all      ==> zap KVM_ALL_ROOTS with mmu_lock held for write

kvm_destroy_vm
  kvm_arch_destroy_vm
    kvm_mmu_uninit_vm
      kvm_mmu_uninit_tdp_mmu
        kvm_tdp_mmu_invalidate_roots       ==> invalidate all KVM_VALID_ROOTS
        kvm_tdp_mmu_zap_invalidated_roots  ==> zap all roots with mmu_lock held for read
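
To make the ordering argument concrete, here is a toy user-space model of the
two paths above. It is not kernel code: every toy_* name is made up for
illustration, and the "bugged" flag merely stands in for the KVM_BUG_ON()
firing. The only rule it encodes is the assumption described above, i.e. that
changing a present mirror SPTE with the lock held for read is a bug.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only; none of these names exist in KVM. */
enum toy_lock_mode { TOY_LOCK_READ, TOY_LOCK_WRITE };

struct toy_root {
	bool is_mirror;     /* mirror root vs. direct root */
	bool invalid;       /* set by the invalidate step */
	int present_sptes;  /* SPTEs still present under this root */
};

struct toy_vm {
	struct toy_root roots[2];  /* [0] = mirror, [1] = direct */
	bool bugged;               /* stands in for KVM_BUG_ON() firing */
};

static void toy_vm_init(struct toy_vm *vm, int mirror_sptes, int direct_sptes)
{
	assert(vm && mirror_sptes >= 0 && direct_sptes >= 0);
	vm->roots[0] = (struct toy_root){ .is_mirror = true,
					  .present_sptes = mirror_sptes };
	vm->roots[1] = (struct toy_root){ .is_mirror = false,
					  .present_sptes = direct_sptes };
	vm->bugged = false;
}

/*
 * Zap one root. Flag a bug if a present mirror SPTE would be changed
 * while the (modelled) lock is only held for read.
 */
static void toy_zap_root(struct toy_vm *vm, struct toy_root *root,
			 enum toy_lock_mode mode)
{
	if (root->is_mirror && root->present_sptes > 0 && mode == TOY_LOCK_READ)
		vm->bugged = true;
	root->present_sptes = 0;
}

/* Models the release path: zap every root with the lock held for write. */
static void toy_notifier_release(struct toy_vm *vm)
{
	for (int i = 0; i < 2; i++)
		toy_zap_root(vm, &vm->roots[i], TOY_LOCK_WRITE);
}

/* Models the uninit path: invalidate all roots, then zap them under read. */
static void toy_uninit(struct toy_vm *vm)
{
	for (int i = 0; i < 2; i++)
		vm->roots[i].invalid = true;
	for (int i = 0; i < 2; i++)
		if (vm->roots[i].invalid)
			toy_zap_root(vm, &vm->roots[i], TOY_LOCK_READ);
}
```

In this model, calling toy_notifier_release() before toy_uninit() leaves
"bugged" clear, because the mirror root is already empty when the read-lock zap
reaches it. Calling toy_uninit() alone on a VM that still has mirror SPTEs sets
"bugged", which is the latent problem.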


One question: why does kvm_mmu_notifier_release(), a callback of the primary
MMU notifier, zap the mirrored TDP when all the other callbacks use
KVM_FILTER_SHARED?

Could we just zap all KVM_DIRECT_ROOTS (valid | invalid) in
kvm_mmu_notifier_release() and move the mirrored TDP related work from
kvm_arch_flush_shadow_all() to kvm_mmu_uninit_tdp_mmu(), ensuring mmu_lock is
held for write?
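
A rough sketch of the split I have in mind, again as a toy user-space model and
not a patch. The toy_* names and the flag values are hypothetical, and the root
matching rule (an invalid root is selected only if the mask includes the
invalid bit) is my assumption for the sketch. The point is only that the release
path never touches mirror roots, and that mirror roots are torn down in the
uninit path with the lock held for write before anything runs under read.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag values, for illustration only. */
enum toy_root_types {
	TOY_INVALID_ROOTS = 1 << 0,
	TOY_DIRECT_ROOTS  = 1 << 1,
	TOY_MIRROR_ROOTS  = 1 << 2,
};

enum toy_lock_mode { TOY_LOCK_READ, TOY_LOCK_WRITE };

struct toy_root {
	bool is_mirror;
	bool invalid;
	int present_sptes;
};

/* True if the root is selected by the given type mask. */
static bool toy_root_match(const struct toy_root *root, unsigned int types)
{
	if (root->invalid && !(types & TOY_INVALID_ROOTS))
		return false;
	if (root->is_mirror)
		return (types & TOY_MIRROR_ROOTS) != 0;
	return (types & TOY_DIRECT_ROOTS) != 0;
}

/*
 * Zap every matching root. Returns false if a present mirror SPTE would
 * have been zapped with the (modelled) lock held only for read.
 */
static bool toy_zap_roots(struct toy_root *roots, int n, unsigned int types,
			  enum toy_lock_mode mode)
{
	bool ok = true;

	assert(roots && n >= 0);
	for (int i = 0; i < n; i++) {
		if (!toy_root_match(&roots[i], types))
			continue;
		if (roots[i].is_mirror && roots[i].present_sptes > 0 &&
		    mode == TOY_LOCK_READ)
			ok = false;
		roots[i].present_sptes = 0;
	}
	return ok;
}

/* Proposed release path: direct roots only (valid | invalid). */
static bool toy_release_proposed(struct toy_root *roots, int n)
{
	return toy_zap_roots(roots, n, TOY_DIRECT_ROOTS | TOY_INVALID_ROOTS,
			     TOY_LOCK_WRITE);
}

/* Proposed uninit path: mirror roots under write, then the rest under read. */
static bool toy_uninit_proposed(struct toy_root *roots, int n)
{
	unsigned int all = TOY_DIRECT_ROOTS | TOY_MIRROR_ROOTS | TOY_INVALID_ROOTS;
	bool ok;

	ok = toy_zap_roots(roots, n, TOY_MIRROR_ROOTS | TOY_INVALID_ROOTS,
			   TOY_LOCK_WRITE);
	for (int i = 0; i < n; i++)
		roots[i].invalid = true;
	return toy_zap_roots(roots, n, all, TOY_LOCK_READ) && ok;
}
```

With this split the read-lock zap in the uninit path only ever sees empty
mirror roots, whether or not the release path ran first.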

>
> WARN_ON(atomic64_read(&kvm->arch.tdp_mmu_pages));