Re: [PATCH 07/17] KVM: x86/tdp_mmu: Centralize updates to present external PTEs

From: Yan Zhao

Date: Fri Apr 03 2026 - 05:47:35 EST


On Thu, Apr 02, 2026 at 04:28:33PM -0700, Sean Christopherson wrote:
> On Thu, Apr 02, 2026, Rick P Edgecombe wrote:
> > On Thu, 2026-04-02 at 09:59 +0800, Yan Zhao wrote:
> > > On Thu, Apr 02, 2026 at 07:45:54AM +0800, Edgecombe, Rick P wrote:
> > > > On Mon, 2026-03-30 at 14:14 +0800, Yan Zhao wrote:
> > > > > > + KVM_MMU_WARN_ON(is_frozen_spte(new_spte));
> > > > > > +
> > > > > > + /*
> > > > > > + * Temporarily freeze the SPTE until the external PTE operation has
> > > > > > > + * completed (unless the new SPTE itself will be frozen), e.g. so that
> > > > > > + * concurrent faults don't attempt to install a child PTE in the
> > > > > > > + * external page table before the parent PTE has been written, or try
> > > > > > + * to re-install a page table before the old one was removed.
> > > > > > + */
> > > > > > + if (is_mirror_sptep(iter->sptep))
> > > > > > + ret = __tdp_mmu_set_spte_atomic(kvm, iter, FROZEN_SPTE);
> > > > > > + else
> > > > > > + ret = __tdp_mmu_set_spte_atomic(kvm, iter, new_spte);
> > > > > >    if (ret)
> > > > > >    return ret;
> > > > > >  
> > > > > > - handle_changed_spte(kvm, iter->as_id, iter->gfn, iter->old_spte,
> > > > > > -     new_spte, iter->level, true);
> > > > > > + ret = __handle_changed_spte(kvm, sp, iter->gfn, iter->old_spte,
> > > > > > +     new_spte, iter->level, true);
> > > > >
> > > > > What about adding a comment for the tricky part for the mirror page table:
> > > > > while new_spte is set to FROZEN_SPTE in the above __tdp_mmu_set_spte_atomic()
> > > >
> > > > You meant it sets iter->sptep I think.
> > > >
> > > > > for freezing the mirror page table, the original new_spte from the caller of
> > > > > tdp_mmu_set_spte_atomic() is passed to __handle_changed_spte() in order to
> > > > > properly update statistics and propagate to the external page table.
> > > >
> > > > new_spte was already passed in. What changed? You mean that
> > > > __tdp_mmu_set_spte_atomic() sets iter->sptep and doesn't update new_spte? If so
> > > > I'm not sure if it threshold TDP MMU.
> > >
> > > For mirror page table, a successful path in tdp_mmu_set_spte_atomic() looks
> > > like this:
> > > tdp_mmu_set_spte_atomic() {
> > > __tdp_mmu_set_spte_atomic(kvm, iter, FROZEN_SPTE); ==>sets mirror to frozen
> > > __handle_changed_spte(kvm, sp, iter->gfn, iter->old_spte,
> > > new_spte, iter->level, true);==>sets S-EPT to new_spte
> > > __kvm_tdp_mmu_write_spte(iter->sptep, new_spte); ==>sets mirror to new_spte
> > > }
> > >
> > > So, the tricky part is that S-EPT is updated to new_spte ahead of mirror EPT.
> >
> > I still don't see the point. That ordering is not new, and this patch actually
> > adds a bunch of comments around the operations above and below the
> > __handle_changed_spte() call. If you think something is still missing maybe you
> > can suggest something.
Hmm, sorry for the confusion. I didn't express it clearly.

The ordering inside tdp_mmu_set_spte_atomic() for mirror root is:

Before this patch:
1. set mirror SPTE to frozen
2. invoke TDX op to update external PTE
3. set mirror SPTE to new_spte or restore old_spte
4. if step 2 succeeds, invoke handle_changed_spte() to propagate changes to
   child mirror SPTEs and child external PTEs

After this patch:
1. set mirror SPTE to frozen
2. invoke __handle_changed_spte(), which propagates changes to
   2.1 child mirror SPTEs and child external PTEs
   2.2 the external PTE
3. set mirror SPTE to new_spte or restore old_spte

So, the step that propagates changes to child mirror SPTEs and child external
PTEs now occurs before the steps that update the external PTE and the mirror
SPTE.
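
To make the reordering concrete, here is a toy userspace model in C of the two
sequences listed above. It is only a sketch: the step labels, the trace buffer,
and the helpers set_spte_atomic_before() / set_spte_atomic_after() are made up
for illustration and are not KVM identifiers; the real logic is in the patch.

```c
#include <assert.h>

/* Hypothetical step labels; none of these are KVM identifiers. */
enum step {
	FREEZE_MIRROR,       /* mirror SPTE := FROZEN_SPTE */
	UPDATE_EXTERNAL,     /* TDX op on the external PTE */
	WRITE_MIRROR,        /* mirror SPTE := new_spte, or restore old_spte */
	PROPAGATE_CHILDREN,  /* child mirror SPTEs + child external PTEs */
};

#define MAX_STEPS 8

struct trace {
	enum step steps[MAX_STEPS];
	int nr;
};

static void record(struct trace *t, enum step s)
{
	assert(t->nr < MAX_STEPS);
	t->steps[t->nr++] = s;
}

/* Ordering before the patch; ext_ok models whether the TDX op succeeds. */
static void set_spte_atomic_before(struct trace *t, int ext_ok)
{
	record(t, FREEZE_MIRROR);
	record(t, UPDATE_EXTERNAL);
	record(t, WRITE_MIRROR);
	if (ext_ok)
		record(t, PROPAGATE_CHILDREN);
}

/* Ordering after the patch: children are handled inside step 2. */
static void set_spte_atomic_after(struct trace *t)
{
	record(t, FREEZE_MIRROR);
	record(t, PROPAGATE_CHILDREN);  /* step 2.1 */
	record(t, UPDATE_EXTERNAL);     /* step 2.2 */
	record(t, WRITE_MIRROR);
}

/* Position of step s in the trace, or -1 if it never ran. */
static int index_of(const struct trace *t, enum step s)
{
	for (int i = 0; i < t->nr; i++)
		if (t->steps[i] == s)
			return i;
	return -1;
}
```

Comparing index_of() for PROPAGATE_CHILDREN and UPDATE_EXTERNAL in the two
traces shows the swap: the child propagation moves from last to second.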

> Ya, I'm a bit confused too. For me, the "tricky" part is understanding the need
> to set the mirror SPTE to FROZEN_SPTE while updating the external SPTE. Once that
> is understood, I don't find passing in @new_spte to be surprising in any way.
I still find it tricky because it seems strange to me to invoke a function named
handle_changed_spte() before the change actually occurs on the SPTE (i.e., to
me, the SPTE has only changed from xxx to FROZEN_SPTE, but handle_changed_spte()
handles the change from xxx to new_spte).

Besides, there is another tricky point (currently benign for TDX).

Before this patch, tdp_mmu_set_spte_atomic() cannot be used to atomically zap
non-leaf mirror SPTEs, since TDX requires child PTEs to be zapped before the
parent PTE.

After this patch, atomically zapping non-leaf mirror SPTEs seems to be allowed
in the TDP MMU, since step 2.1 above now occurs before step 2.2. However, if
step 2.2 fails after step 2.1 succeeds, step 3 cannot easily restore the real
old state.

So, if we allow atomic zaps on the mirror root in the future, it looks like we
need to ensure that atomically zapping the S-EPT cannot fail.
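
A toy illustration of the rollback concern, as a userspace C sketch. The
struct toy_pt state and toy_atomic_zap_parent() are hypothetical and only model
the ordering described above (children first, then the external parent PTE);
they do not correspond to real KVM or TDX code.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy state: one non-leaf parent entry with a single child. Not KVM types. */
struct toy_pt {
	bool parent_present;      /* mirror parent SPTE */
	bool child_present;       /* child mirror SPTE / child external PTE */
	bool ext_parent_present;  /* external (S-EPT) parent PTE */
};

/*
 * Hypothetical atomic zap of the parent using the post-patch ordering.
 * ext_zap_ok models whether the external parent zap (step 2.2) succeeds.
 * Returns 0 on success, -1 on failure.
 */
static int toy_atomic_zap_parent(struct toy_pt *pt, bool ext_zap_ok)
{
	bool old_parent = pt->parent_present;

	/* Step 1: freeze the mirror SPTE (not modelled as separate state). */

	/* Step 2.1: children are torn down first. */
	pt->child_present = false;

	/* Step 2.2: zap the external parent PTE; this may fail. */
	if (!ext_zap_ok) {
		/* Step 3 (failure): only the parent SPTE value is restored. */
		pt->parent_present = old_parent;
		return -1;
	}
	pt->ext_parent_present = false;

	/* Step 3 (success): install the zapped value. */
	pt->parent_present = false;
	return 0;
}
```

In the failure case the parent looks untouched, but the child is already gone,
which is the "cannot easily restore the real old state" problem.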

> > > > > > @@ -1373,6 +1396,9 @@ static void kvm_tdp_mmu_age_spte(struct kvm *kvm,
> > > > > > struct tdp_iter *iter)
> > > > > >   {
> > > > > >    u64 new_spte;
> > > > > >  
> > > > > > + if (WARN_ON_ONCE(is_mirror_sptep(iter->sptep)))
> > > > > > + return;
> > > > > > +
> > > > > Add a comment for why mirror page table is not expected here?
> > > >
> > > > Ehh, maybe. Thinking about what to put... The warning is kind of cheating a
> > > > little bit on the idea of the patch: to forward all changes through limited ops
> > > > in a central place, such that we don't have TDX specifics encoded in core MMU.
> > > > Trying to forward this through properly would result in more burden to the TDP
> > > > MMU, so that's not the right answer either.
> > > >
> > > > "Mirror TDP doesn't support PTE aging" is a pretty obvious comment. I'm fine
> > > > just leaving it without comment, but I can add something like that. Or do you
> > > > have another suggestion?
> > > What about "External TDP doesn't support clearing PTE A/D bit"?
> >
> > It sounds too close to "TDX doesn't support..." to me. I think I'd prefer to not
> > add a comment unless you strongly object.
>
> How about something like this?
>
> /* TODO: Add support for aging external SPTEs, if necessary. */
>
> That makes it clear that this path is supposed to be unreachable because KVM doesn't
> yet support aging external SPTEs, while not trying to say anything about *why*
> KVM doesn't support aging external SPTEs.
LGTM :)