Re: [RFC PATCH v5 08/45] KVM: x86/mmu: Propagate mirror SPTE removal to S-EPT in handle_changed_spte()

From: Sean Christopherson

Date: Fri Feb 13 2026 - 19:37:19 EST

On Wed, Feb 11, 2026, Yan Zhao wrote:
> On Tue, Feb 10, 2026 at 11:52:09AM -0800, Sean Christopherson wrote:
> > > > +static void handle_changed_spte(struct kvm *kvm, int as_id, tdp_ptep_t sptep,
> > > > + gfn_t gfn, u64 old_spte, u64 new_spte,
> > > > + int level, bool shared)
> > > > +{
> > > Do we need "WARN_ON_ONCE(is_mirror_sptep(sptep) && shared)" here ?
> >
> > No, because I want to call this code for all paths, including the fault path.
> Hmm. IIUC, handle_changed_spte() can't be invoked for mirror root under read
> mmu_lock.
> For read mmu_lock + mirror scenarios, they need to invoke
> tdp_mmu_set_spte_atomic() --> __handle_changed_spte().

Oh, sorry, I misread that. Now I see what you're saying. I think I'd still prefer
to omit the WARN? Because there's nothing inherently wrong with using
handle_changed_spte(). E.g. if the caller can somehow guarantee success, then
using handle_changed_spte() is a-ok.

> Besides, __handle_changed_spte() contains code like
> "kvm_update_page_stats(kvm, level, is_leaf ? 1 : -1);", which may have
> incorrectly updated the stats even if kvm_x86_call(set_external_spte)() fails
> later and the new_spte is never written to iter->sptep.

Oof, now _that_ is an actual problem. This is the least-ugly fix I can come up
with. Note, this will mean the trace order is "wrong" when removing a non-mirror
page table, as KVM will zap the page table before its children. I doubt that'll
be a problem in practice, so I'm inclined to take the simpler code.

diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index d395da35d5e4..4ba789f2824d 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -493,6 +493,7 @@ static int __handle_changed_spte(struct kvm *kvm, struct kvm_mmu_page *sp,
bool is_leaf = is_present && is_last_spte(new_spte, level);
bool pfn_changed = spte_to_pfn(old_spte) != spte_to_pfn(new_spte);
int as_id = kvm_mmu_page_as_id(sp);
+ int r;

WARN_ON_ONCE(level > PT64_ROOT_MAX_LEVEL);
WARN_ON_ONCE(level < PG_LEVEL_4K);
@@ -524,8 +525,6 @@ static int __handle_changed_spte(struct kvm *kvm, struct kvm_mmu_page *sp,
if (old_spte == new_spte)
return 0;

- trace_kvm_tdp_mmu_spte_changed(as_id, gfn, level, old_spte, new_spte);
-
if (is_leaf)
check_spte_writable_invariants(new_spte);

@@ -554,9 +553,6 @@ static int __handle_changed_spte(struct kvm *kvm, struct kvm_mmu_page *sp,
return 0;
}

- if (is_leaf != was_leaf)
- kvm_update_page_stats(kvm, level, is_leaf ? 1 : -1);
-
/*
* Recursively handle child PTs if the change removed a subtree from
* the paging structure. Note the WARN on the PFN changing without the
@@ -567,11 +563,19 @@ static int __handle_changed_spte(struct kvm *kvm, struct kvm_mmu_page *sp,
* changes to the external SPTE.
*/
if (was_present && !was_leaf &&
- (is_leaf || !is_present || WARN_ON_ONCE(pfn_changed)))
+ (is_leaf || !is_present || WARN_ON_ONCE(pfn_changed))) {
handle_removed_pt(kvm, spte_to_child_pt(old_spte, level), shared);
- else if (is_mirror_sp(sp))
- return kvm_x86_call(set_external_spte)(kvm, gfn, old_spte,
- new_spte, level);
+ } else if (is_mirror_sp(sp)) {
+ r = kvm_x86_call(set_external_spte)(kvm, gfn, old_spte,
+ new_spte, level);
+ if (r)
+ return r;
+ }
+
+ trace_kvm_tdp_mmu_spte_changed(as_id, gfn, level, old_spte, new_spte);
+
+ if (is_leaf != was_leaf)
+ kvm_update_page_stats(kvm, level, is_leaf ? 1 : -1);

return 0;
}

> > > 3. set *iter->sptep to new_spte
> > >
> > > what if __handle_changed_spte() reads *iter->sptep in step 2?
> >
> > For the most part, "don't do that". There are an infinite number of "what ifs".
> > I agree that re-reading iter->sptep is slightly more likely than other "what ifs",
> > but then if we convert to a boolean it creates the "what if we swap the order of
> > @as_id and @is_mirror_sp"? Given that @old_spte is provided, IMO re-reading the
> > SPTE from memory will stand out.
> As my above concern, re-reading SPTE in __handle_changed_spte() will just get
> value FROZEN_SPTE instead of the value of new_spte.
>
> > That said, I think we can have the best of both worlds. Rather than pass @as_id
> > and @sptep, pass the @sp, i.e. the owning kvm_mmu_page. That would address your
> > concern about re-reading the sptep, without needing another boolean.
> Hmm, my intention of passing boolean is to avoid re-reading sptep, because
> in step 2, we pass new_spte instead of the real value in sptep (which is
> FROZEN_SPTE for mirror sp) to __handle_changed_spte().
> So, passing @sp may not help?

It won't prevent someone that's bound and determined to introduce a bug from
re-reading the sptep, but it most definitely helps. To get at the sptep, someone
would have to compute its index based off @gfn and then look it up in @sp->spt.
At that point, they've earned the bug :-)