Re: [PATCH 07/17] KVM: x86/tdp_mmu: Centralize updates to present external PTEs
From: Yan Zhao
Date: Tue Apr 07 2026 - 05:14:42 EST
On Sat, Apr 04, 2026 at 08:15:16AM +0800, Edgecombe, Rick P wrote:
> On Fri, 2026-04-03 at 17:05 +0800, Yan Zhao wrote:
> > Hmm, sorry for the confusion. I didn't express it clearly.
> >
> > The ordering inside tdp_mmu_set_spte_atomic() for mirror root is:
> >
> > Before this patch,
> > 1. set mirror SPTE to frozen
> > 2. invoke TDX op to update external PTE
> > 3. set mirror SPTE to new_spte or restore old_spte
> > 4. if 2 succeeds, invoke handle_changed_spte() to propagate changes to
> > child mirror SPTEs and child external PTEs
> >
> > After this patch,
> > 1. set mirror SPTE to frozen
> > 2. invoke __handle_changed_spte(), which propagates changes to
> > (1) child mirror SPTEs and child external PTEs
> > (2) external PTE
> > 3. set mirror SPTE to new_spte or restore old_spte
> >
> > So, the step to propagate changes to child mirror SPTEs and child external PTEs
> > now occurs before the steps to update the external PTE and the mirror SPTE.
>
> How about I add this info to the log. I think you are right, it's an important
Ok.
> change to call out.
>
> Now it's making me think... Can you (if you haven't already) scrutinize for
> races/reasons that may trigger the KVM_BUG_ON() in handle_changed_spte() due to
> BUSY or other? Like in the handle_removed_pt() path. I guess the write lock
> saves us?
>
> Hmm... zero step?
Below is the list of all handle_changed_spte()-related scenarios.
Legends:
NP: !shadow-present SPTE
P: shadow-present SPTE
X: Yes or No.
shared: hold shared mmu_lock or not
valid: valid scenario in TDP MMU
mirror root allowed: could this scenario occur in the mirror root (after basic
TDX huge page support)
KVM_BUG_ON() hittable: is KVM_BUG_ON() in handle_changed_spte() hittable.
Scenarios 1-5 are for mapping,
Scenarios 6-10 are for shadow-present to shadow-present transitions.
Scenarios 11-14 are for zapping.
|mirror root| KVM_BUG_ON()
# | old_spte | new_spte | shared | valid | allowed | hittable
---|-----------|-----------|--------|-------|-----------|-----------------------
1 | NP | leaf P | Y | Y | Y | N
2 | NP | leaf P | N | Y | N | Y (a1)
3 | NP | nonleaf P | Y | Y | Y | N
4 | NP | nonleaf P | N | Y | N | Y (a2)
5 | NP | NP | X | N | N | warn !mmio_spte && !frozen_spte
---|-----------|-----------|--------|-------|-----------|-----------------------
6 | leaf P | leaf P | X | N | X | N
7 | leaf P | nonleaf P | Y | Y | N | N
8 | leaf P | nonleaf P | N | Y | Y | N
9 | nonleaf P | leaf P | X | Y | N | Y (b)
10 | nonleaf P | nonleaf P | X | N | X | warn pfn_changed
---|-----------|-----------|--------|-------|-----------|-----------------------
11 | leaf P | NP | Y | Y | N | N (c)
12 | leaf P | NP | N | Y | Y | N
13 | nonleaf P | NP | Y | Y | N | Y (d)
14 | nonleaf P | NP | N | Y | Y | N
Currently, only 4 scenarios (a1),(a2), (b), (d) may trigger KVM_BUG_ON() in
handle_changed_spte(), but none of them are currently reachable by mirror root.
(a1)(a2) May hit KVM_BUG_ON() in handle_changed_spte() if
tdx_sept_set_private_spte() fails due to contentions. e.g.,
tdh_mem_sept_add(), tdx_mem_page_aug(), or tdx_mem_page_add() may
contend with tdh_vp_enter() due to zero-step mitigation or may
potentially contend with TDCALLs.
(b) Promotion case. Currently unreachable in mirror root.
Need more complex changes in TDP MMU if we want to support it in the future.
(c) Will not hit KVM_BUG_ON() in TDP MMU, but will trigger warnings in
tdx_sept_remove_private_spte() due to lockdep_assert_held_write() or
TDX_BUG_ON() caused by concurrent BLOCK, TRACK, REMOVE.
(d) May hit the KVM_BUG_ON() in handle_changed_spte() due to failure to remove
child S-EPT entries and will trigger warnings in
tdx_sept_remove_private_spte() due to lockdep_assert_held_write() or
TDX_BUG_ON() caused by concurrent BLOCK, TRACK, REMOVE.
May also trigger TDX_BUG_ON() in tdx_sept_reclaim_private_spt().
> > > Ya, I'm a bit confused too. For me, the "tricky" part is understanding the need
> > > to set the mirror SPTE to FROZE_SPTE while updating the external SPTE. Once that
> > > is understood, I don't find passing in @new_spte to be surprising in any way.
> > I still find it tricky because it seems strange to me to invoke a function named
> > handle_changed_spte() before the change actually occurs on the SPTE (i.e.,to me,
> > the SPTE has only changed from xxx to FROZEN_SPTE, but handle_changed_spte()
> > handles changes from xxx to new_spte).
> >
> > Besides, another tricky point (currently benign to TDX) is that:
> > before this patch, tdp_mmu_set_spte_atomic() cannot be used to atomically zap
> > non-leaf mirror SPTEs, since TDX requires child PTEs to be zapped before the
> > parent PTE;
> > after this patch, performing atomic zapping of non-leaf mirror SPTEs seems to be
> > allowed in TDP MMU since the above step 2.1 now occurs before step 2.2. However,
> > if step 2.2 fails after step 2.1 succeeds, step 3 cannot easily restore the real
> > old state.
> > So, if we allow atomic zap on the mirror root in the future, it looks like we
> > need to ensure atomic zapping of S-EPT cannot fail.
>
> I think we shouldn't comment these kind of TDX specifics. It won't confuse non-
> TDX developers working in the TDP MMU I think.
Ok.