Re: [PATCH 07/17] KVM: x86/tdp_mmu: Centralize updates to present external PTEs

From: Edgecombe, Rick P

Date: Thu Apr 02 2026 - 19:11:01 EST


On Thu, 2026-04-02 at 09:59 +0800, Yan Zhao wrote:
> On Thu, Apr 02, 2026 at 07:45:54AM +0800, Edgecombe, Rick P wrote:
> > On Mon, 2026-03-30 at 14:14 +0800, Yan Zhao wrote:
> > > > + KVM_MMU_WARN_ON(is_frozen_spte(new_spte));
> > > > +
> > > > + /*
> > > > + * Temporarily freeze the SPTE until the external PTE operation has
> > > > + * completed (unless the new SPTE itself will be frozen), e.g. so
> > > > that
> > > > + * concurrent faults don't attempt to install a child PTE in the
> > > > + * external page table before the parent PTE has been written, or
> > > > try
> > > > + * to re-install a page table before the old one was removed.
> > > > + */
> > > > + if (is_mirror_sptep(iter->sptep))
> > > > + ret = __tdp_mmu_set_spte_atomic(kvm, iter, FROZEN_SPTE);
> > > > + else
> > > > + ret = __tdp_mmu_set_spte_atomic(kvm, iter, new_spte);
> > > >    if (ret)
> > > >    return ret;
> > > >  
> > > > - handle_changed_spte(kvm, iter->as_id, iter->gfn, iter->old_spte,
> > > > -     new_spte, iter->level, true);
> > > > + ret = __handle_changed_spte(kvm, sp, iter->gfn, iter->old_spte,
> > > > +     new_spte, iter->level, true);
> > >
> > > What about adding a comment for the tricky part for the mirror page table:
> > > while new_spte is set to FROZEN_SPTE in the above __tdp_mmu_set_spte_atomic()
> >
> > You meant it sets iter->sptep I think.
> >
> > > for freezing the mirror page table, the original new_spte from the caller of
> > > tdp_mmu_set_spte_atomic() is passed to __handle_changed_spte() in order to
> > > properly update statistics and propagate to the external page table.
> >
> > new_spte was already passed in. What changed? You mean that
> > __tdp_mmu_set_spte_atomic() sets iter->sptep and doesn't update new_spte? If so
> > I'm not sure if it threshold TDP MMU.
>
> For mirror page table, a successful path in tdp_mmu_set_spte_atomic() looks
> like this:
> tdp_mmu_set_spte_atomic() {
> __tdp_mmu_set_spte_atomic(kvm, iter, FROZEN_SPTE); ==>sets mirror to frozen
> __handle_changed_spte(kvm, sp, iter->gfn, iter->old_spte,
> new_spte, iter->level, true);==>sets S-EPT to new_spte
> __kvm_tdp_mmu_write_spte(iter->sptep, new_spte); ==>sets mirror to new_spte
> }
>
> So, the tricky part is that S-EPT is updated to new_spte ahead of mirror EPT.

I still don't see the point. That ordering is not new, and this patch actually
adds a bunch of comments around the operations above and below the
__handle_changed_spte() call. If you think something is still missing maybe you
can suggest something.

>
>
> > > > - return 0;
> > > > + /*
> > > > + * Unfreeze the mirror SPTE.  If updating the external SPTE failed,
> > > > + * restore the old SPTE so that the SPTE isn't frozen in
> > > > perpetuity,
> > > > + * otherwise set the mirror SPTE to the new desired value.
> > > > + */
> > > > + if (is_mirror_sptep(iter->sptep)) {
> > > > + if (ret)
> > > > + __kvm_tdp_mmu_write_spte(iter->sptep, iter-
> > > > > old_spte);
> > > > + else
> > > > + __kvm_tdp_mmu_write_spte(iter->sptep, new_spte);
> > > > + } else {
> > > > + /*
> > > > + * Bug the VM if handling the change failed, as failure is
> > > > only
> > > > + * allowed if KVM couldn't update the external SPTE.
> > > > + */
> > > > + KVM_BUG_ON(ret, kvm);
> > > > + }
> > > > + return ret;
> > > >   }
> > > >  
> > > >   /*
> > > > @@ -738,6 +759,8 @@ static inline int __must_check
> > > > tdp_mmu_set_spte_atomic(struct kvm *kvm,
> > > >   static u64 tdp_mmu_set_spte(struct kvm *kvm, int as_id, tdp_ptep_t sptep,
> > > >        u64 old_spte, u64 new_spte, gfn_t gfn, int
> > > > level)
> > > >   {
> > > > + struct kvm_mmu_page *sp = sptep_to_sp(rcu_dereference(sptep));
> > > > +
> > > >    lockdep_assert_held_write(&kvm->mmu_lock);
> > > >  
> > > >    /*
> > > > @@ -751,7 +774,7 @@ static u64 tdp_mmu_set_spte(struct kvm *kvm, int as_id,
> > > > tdp_ptep_t sptep,
> > > >  
> > > >    old_spte = kvm_tdp_mmu_write_spte(sptep, old_spte, new_spte,
> > > > level);
> > > >  
> > > > - handle_changed_spte(kvm, as_id, gfn, old_spte, new_spte, level,
> > > > false);
> > > > + handle_changed_spte(kvm, sp, gfn, old_spte, new_spte, level,
> > > > false);
> > > >  
> > > >    /*
> > > >    * Users that do non-atomic setting of PTEs don't operate on mirror
> > > > @@ -1373,6 +1396,9 @@ static void kvm_tdp_mmu_age_spte(struct kvm *kvm,
> > > > struct tdp_iter *iter)
> > > >   {
> > > >    u64 new_spte;
> > > >  
> > > > + if (WARN_ON_ONCE(is_mirror_sptep(iter->sptep)))
> > > > + return;
> > > > +
> > > Add a comment for why mirror page table is not expected here?
> >
> > Ehh, maybe. Thinking about what to put... The warning is kind of cheating a
> > little bit on the idea of the patch: to forward all changes through limited ops
> > in a central place, such that we don't have TDX specifics encoded in core MMU.
> > Trying to forward this through properly would result in more burden to the TDP
> > MMU, so that's not the right answer either.
> >
> > "Mirror TDP doesn't support PTE aging" is a pretty obvious comment. I'm fine
> > just leaving it without comment, but I can add something like that. Or do you
> > have another suggestion?
> What about "External TDP doesn't support clearing PTE A/D bit"?

It sounds too close to "TDX doesn't support..." to me. I think I'd prefer to not
add a comment unless you strongly object.


> I'm ok with leaving without comment if you think it's too obvious.
>
> > > And do we need a similar WARN_ON_ONCE() in kvm_tdp_mmu_clear_dirty_pt_masked()
> > > or clear_dirty_pt_masked()?
> >
> > Nothing changes for those in this patch though? For the kvm_tdp_mmu_age_spte()
> > case, warning coverage is removed in this patch.
>
> In kvm_tdp_mmu_age_spte() and clear_dirty_pt_masked() SPTEs are updated via
> __tdp_mmu_set_spte_atomic() or tdp_mmu_clear_spte_bits(), which don't propagate
> changes to the external page table.
> So, I think it's better to warn if a mirror root is accidentally passed in to
> those functions.
>
>
>

I don't think we should add warnings for new paths in this patch. Meaning, the
patch shouldn't increase coverage, just maintain. But it seems reasonable in
general. How about we add it to the cleanup list.