Re: [PATCH] KVM: x86: Zap all TDP leaf entries according noncoherent DMA count

From: Chao Gao
Date: Mon May 08 2023 - 03:20:34 EST

On Mon, May 08, 2023 at 11:47:00AM +0800, Yan Zhao wrote:
>Zap all TDP leaf entries when noncoherent DMA count goes from 0 to !0, or
>from !0 to 0.
>When there's no noncoherent DMA device, EPT memory type is
>When there are noncoherent DMA devices, EPT memory type needs to honor
>guest CR0_CD and MTRR settings.
>So, if noncoherent DMA count changes between 0 and !0, EPT leaf entries
>need to be zapped to clear stale memory type.
>This issue might be hidden when VFIO adds/removes MMIO regions of the
>noncoherent DMA devices on device attach/detach, because usually the
>MMIO regions are disabled/enabled several times during guest PCI
>probing. And in KVM, TDP entries are all zapped on memslot
>However, this issue may appear when kvm_mmu_zap_all_fast() is not called
>before KVM slot removal, e.g. for TDX, where only leaf entries for the
>memslot being removed are zapped.
>static void kvm_mmu_invalidate_zap_pages_in_memslot(struct kvm *kvm,
>			struct kvm_memory_slot *slot,
>			struct kvm_page_track_notifier_node *node)
>{
>	if (kvm_gfn_shared_mask(kvm))
>		/*
>		 * Secure-EPT requires to release PTs from the leaf. The
>		 * optimization to zap root PT first with child PT doesn't
>		 * work.
>		 */
>		kvm_mmu_zap_memslot(kvm, slot);
>	else
>		kvm_mmu_zap_all_fast(kvm);
>}

TDX code isn't merged. So, I think you'd better not use TDX as an argument.

>Even without the TDX case, in some extreme conditions MMIO regions may
>not be disabled during device attach, e.g. if the guest does not cause
>QEMU to disable the MMIO region.
>Then the TDP zap will not be called and a wrong EPT memory type might be
>So, zap all TDP leaf entries when the present/non-present state of
>noncoherent DMA devices changes, to ensure stale entries are cleaned away.
>And as this is not a frequent operation, the extra zap should be fine.
>Signed-off-by: Yan Zhao <yan.y.zhao@xxxxxxxxx>
> arch/x86/kvm/x86.c | 6 ++++--
> 1 file changed, 4 insertions(+), 2 deletions(-)
>diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>index e7f78fe79b32..99a825722d95 100644
>--- a/arch/x86/kvm/x86.c
>+++ b/arch/x86/kvm/x86.c
>@@ -13145,13 +13145,15 @@ EXPORT_SYMBOL_GPL(kvm_arch_has_assigned_device);
> void kvm_arch_register_noncoherent_dma(struct kvm *kvm)
> {
>-	atomic_inc(&kvm->arch.noncoherent_dma_count);
>+	if (atomic_inc_return(&kvm->arch.noncoherent_dma_count) == 1)
>+		kvm_zap_gfn_range(kvm, gpa_to_gfn(0), gpa_to_gfn(~0ULL));

The issue is specific to EPT. Shouldn't this be conditional on tdp_enabled, like

Likewise, shouldn't we skip the kvm_zap_gfn_range() call in kvm_post_set_cr0()
if tdp_enabled is false?
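
To make the question concrete, here is a rough userspace sketch of the
behaviour I have in mind: zap only on a 0 <-> !0 transition of the count, and
only when TDP is in use. This is not kernel code. The mock_* names, the
zap_calls counter and the plain global tdp_enabled are hypothetical stand-ins
for illustration, and C11 atomics are used in place of the kernel's
atomic_inc_return()/atomic_dec_return().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the kernel's tdp_enabled; not the real symbol. */
bool tdp_enabled = true;

/* Hypothetical mock of the relevant part of struct kvm. */
struct mock_kvm {
	atomic_int noncoherent_dma_count;
	int zap_calls;	/* how many times the mock "zap all" ran */
};

void mock_kvm_init(struct mock_kvm *kvm)
{
	atomic_init(&kvm->noncoherent_dma_count, 0);
	kvm->zap_calls = 0;
}

/* Stand-in for zapping all leaf entries; it only counts invocations. */
void mock_zap_all(struct mock_kvm *kvm)
{
	kvm->zap_calls++;
}

void mock_register_noncoherent_dma(struct mock_kvm *kvm)
{
	/* atomic_fetch_add() returns the old value: old == 0 means 0 -> 1. */
	int old = atomic_fetch_add(&kvm->noncoherent_dma_count, 1);

	if (old == 0 && tdp_enabled)
		mock_zap_all(kvm);
}

void mock_unregister_noncoherent_dma(struct mock_kvm *kvm)
{
	/* old == 1 means the count just dropped from 1 to 0. */
	int old = atomic_fetch_sub(&kvm->noncoherent_dma_count, 1);

	assert(old > 0);	/* unbalanced unregister would be a caller bug */
	if (old == 1 && tdp_enabled)
		mock_zap_all(kvm);
}
```

With tdp_enabled set, registering two devices and then unregistering both
zaps twice in this sketch (once on the first register, once on the last
unregister). With tdp_enabled cleared, neither transition zaps.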

> }
> EXPORT_SYMBOL_GPL(kvm_arch_register_noncoherent_dma);
> void kvm_arch_unregister_noncoherent_dma(struct kvm *kvm)
> {
>-	atomic_dec(&kvm->arch.noncoherent_dma_count);
>+	if (!atomic_dec_return(&kvm->arch.noncoherent_dma_count))
>+		kvm_zap_gfn_range(kvm, gpa_to_gfn(0), gpa_to_gfn(~0ULL));
> }
> EXPORT_SYMBOL_GPL(kvm_arch_unregister_noncoherent_dma);