Re: [PATCH 07/17] KVM: x86/tdp_mmu: Centralize updates to present external PTEs

From: Yan Zhao

Date: Wed Apr 01 2026 - 22:39:55 EST


On Thu, Apr 02, 2026 at 07:45:54AM +0800, Edgecombe, Rick P wrote:
> On Mon, 2026-03-30 at 14:14 +0800, Yan Zhao wrote:
> > > + KVM_MMU_WARN_ON(is_frozen_spte(new_spte));
> > > +
> > > + /*
> > > + * Temporarily freeze the SPTE until the external PTE operation has
> > > + * completed (unless the new SPTE itself will be frozen), e.g. so
> > > that
> > > + * concurrent faults don't attempt to install a child PTE in the
> > > + * external page table before the parent PTE has been written, or
> > > try
> > > + * to re-install a page table before the old one was removed.
> > > + */
> > > + if (is_mirror_sptep(iter->sptep))
> > > + ret = __tdp_mmu_set_spte_atomic(kvm, iter, FROZEN_SPTE);
> > > + else
> > > + ret = __tdp_mmu_set_spte_atomic(kvm, iter, new_spte);
> > >    if (ret)
> > >    return ret;
> > >  
> > > - handle_changed_spte(kvm, iter->as_id, iter->gfn, iter->old_spte,
> > > -     new_spte, iter->level, true);
> > > + ret = __handle_changed_spte(kvm, sp, iter->gfn, iter->old_spte,
> > > +     new_spte, iter->level, true);
> >
> > What about adding a comment for the tricky part for the mirror page table:
> > while new_spte is set to FROZEN_SPTE in the above __tdp_mmu_set_spte_atomic()
>
> You meant it sets iter->sptep I think.
>
> > for freezing the mirror page table, the original new_spte from the caller of
> > tdp_mmu_set_spte_atomic() is passed to __handle_changed_spte() in order to
> > properly update statistics and propagate to the external page table.
>
> new_spte was already passed in. What changed? You mean that
> __tdp_mmu_set_spte_atomic() sets iter->sptep and doesn't update new_spte? If so
> I'm not sure if it threshold TDP MMU.

For mirror page table, a successful path in tdp_mmu_set_spte_atomic() looks
like this:
tdp_mmu_set_spte_atomic() {
__tdp_mmu_set_spte_atomic(kvm, iter, FROZEN_SPTE); ==>sets mirror to frozen
__handle_changed_spte(kvm, sp, iter->gfn, iter->old_spte,
new_spte, iter->level, true);==>sets S-EPT to new_spte
__kvm_tdp_mmu_write_spte(iter->sptep, new_spte); ==>sets mirror to new_spte
}

So, the tricky part is that S-EPT is updated to new_spte ahead of mirror EPT.


> > > - return 0;
> > > + /*
> > > + * Unfreeze the mirror SPTE.  If updating the external SPTE failed,
> > > + * restore the old SPTE so that the SPTE isn't frozen in
> > > perpetuity,
> > > + * otherwise set the mirror SPTE to the new desired value.
> > > + */
> > > + if (is_mirror_sptep(iter->sptep)) {
> > > + if (ret)
> > > + __kvm_tdp_mmu_write_spte(iter->sptep, iter-
> > > >old_spte);
> > > + else
> > > + __kvm_tdp_mmu_write_spte(iter->sptep, new_spte);
> > > + } else {
> > > + /*
> > > + * Bug the VM if handling the change failed, as failure is
> > > only
> > > + * allowed if KVM couldn't update the external SPTE.
> > > + */
> > > + KVM_BUG_ON(ret, kvm);
> > > + }
> > > + return ret;
> > >   }
> > >  
> > >   /*
> > > @@ -738,6 +759,8 @@ static inline int __must_check
> > > tdp_mmu_set_spte_atomic(struct kvm *kvm,
> > >   static u64 tdp_mmu_set_spte(struct kvm *kvm, int as_id, tdp_ptep_t sptep,
> > >        u64 old_spte, u64 new_spte, gfn_t gfn, int
> > > level)
> > >   {
> > > + struct kvm_mmu_page *sp = sptep_to_sp(rcu_dereference(sptep));
> > > +
> > >    lockdep_assert_held_write(&kvm->mmu_lock);
> > >  
> > >    /*
> > > @@ -751,7 +774,7 @@ static u64 tdp_mmu_set_spte(struct kvm *kvm, int as_id,
> > > tdp_ptep_t sptep,
> > >  
> > >    old_spte = kvm_tdp_mmu_write_spte(sptep, old_spte, new_spte,
> > > level);
> > >  
> > > - handle_changed_spte(kvm, as_id, gfn, old_spte, new_spte, level,
> > > false);
> > > + handle_changed_spte(kvm, sp, gfn, old_spte, new_spte, level,
> > > false);
> > >  
> > >    /*
> > >    * Users that do non-atomic setting of PTEs don't operate on mirror
> > > @@ -1373,6 +1396,9 @@ static void kvm_tdp_mmu_age_spte(struct kvm *kvm,
> > > struct tdp_iter *iter)
> > >   {
> > >    u64 new_spte;
> > >  
> > > + if (WARN_ON_ONCE(is_mirror_sptep(iter->sptep)))
> > > + return;
> > > +
> > Add a comment for why mirror page table is not expected here?
>
> Ehh, maybe. Thinking about what to put... The warning is kind of cheating a
> little bit on the idea of the patch: to forward all changes through limited ops
> in a central place, such that we don't have TDX specifics encoded in core MMU.
> Trying to forward this through properly would result in more burden to the TDP
> MMU, so that's not the right answer either.
>
> "Mirror TDP doesn't support PTE aging" is a pretty obvious comment. I'm fine
> just leaving it without comment, but I can add something like that. Or do you
> have another suggestion?
What about "External TDP doesn't support clearing PTE A/D bit"?
I'm ok with leaving without comment if you think it's too obvious.

> > And do we need a similar WARN_ON_ONCE() in kvm_tdp_mmu_clear_dirty_pt_masked()
> > or clear_dirty_pt_masked()?
>
> Nothing changes for those in this patch though? For the kvm_tdp_mmu_age_spte()
> case, warning coverage is removed in this patch.

In kvm_tdp_mmu_age_spte() and clear_dirty_pt_masked() SPTEs are updated via
__tdp_mmu_set_spte_atomic() or tdp_mmu_clear_spte_bits(), which don't propagate
changes to the external page table.
So, I think it's better to warn if a mirror root is accidentally passed in to
those functions.