Re: [PATCH 02/17] KVM: x86/mmu: Update iter->old_spte if cmpxchg64 on mirror SPTE "fails"

From: Yan Zhao

Date: Tue Mar 31 2026 - 06:00:00 EST


On Tue, Mar 31, 2026 at 05:47:29PM +0800, Huang, Kai wrote:
> On Fri, 2026-03-27 at 13:14 -0700, Rick Edgecombe wrote:
> > From: Sean Christopherson <seanjc@xxxxxxxxxx>
> >
> > Pass a pointer to iter->old_spte, not simply its value, when setting an
> > external SPTE in __tdp_mmu_set_spte_atomic(), so that the iterator's value
> > will be updated if the cmpxchg64 to freeze the mirror SPTE fails. The bug
> > is currently benign as TDX is mutually exclusive with all paths that do
> > "local retry", e.g. clear_dirty_gfn_range() and wrprot_gfn_range().
> >
> > Fixes: 77ac7079e66d ("KVM: x86/tdp_mmu: Propagate building mirror page tables")
> > Signed-off-by: Sean Christopherson <seanjc@xxxxxxxxxx>
> > Signed-off-by: Rick Edgecombe <rick.p.edgecombe@xxxxxxxxx>
> > ---
> > arch/x86/kvm/mmu/tdp_mmu.c | 10 +++++-----
> > 1 file changed, 5 insertions(+), 5 deletions(-)
> >
> > diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
> > index 7b1102d26f9c..dbaeb80f2b64 100644
> > --- a/arch/x86/kvm/mmu/tdp_mmu.c
> > +++ b/arch/x86/kvm/mmu/tdp_mmu.c
> > @@ -509,10 +509,10 @@ static void *get_external_spt(gfn_t gfn, u64 new_spte, int level)
> > }
> >
> > static int __must_check set_external_spte_present(struct kvm *kvm, tdp_ptep_t sptep,
> > - gfn_t gfn, u64 old_spte,
> > + gfn_t gfn, u64 *old_spte,
> > u64 new_spte, int level)
> > {
> > - bool was_present = is_shadow_present_pte(old_spte);
> > + bool was_present = is_shadow_present_pte(*old_spte);
> > bool is_present = is_shadow_present_pte(new_spte);
> > bool is_leaf = is_present && is_last_spte(new_spte, level);
> > int ret = 0;
> > @@ -525,7 +525,7 @@ static int __must_check set_external_spte_present(struct kvm *kvm, tdp_ptep_t sp
> > * page table has been modified. Use FROZEN_SPTE similar to
> > * the zapping case.
> > */
> > - if (!try_cmpxchg64(rcu_dereference(sptep), &old_spte, FROZEN_SPTE))
> > + if (!try_cmpxchg64(rcu_dereference(sptep), old_spte, FROZEN_SPTE))
> > return -EBUSY;
> >
> > /*
> > @@ -541,7 +541,7 @@ static int __must_check set_external_spte_present(struct kvm *kvm, tdp_ptep_t sp
> > ret = kvm_x86_call(link_external_spt)(kvm, gfn, level, external_spt);
> > }
> > if (ret)
> > - __kvm_tdp_mmu_write_spte(sptep, old_spte);
> > + __kvm_tdp_mmu_write_spte(sptep, *old_spte);
> > else
> > __kvm_tdp_mmu_write_spte(sptep, new_spte);
> > return ret;
> > @@ -670,7 +670,7 @@ static inline int __must_check __tdp_mmu_set_spte_atomic(struct kvm *kvm,
> > return -EBUSY;
> >
> > ret = set_external_spte_present(kvm, iter->sptep, iter->gfn,
> > - iter->old_spte, new_spte, iter->level);
> > + &iter->old_spte, new_spte, iter->level);
> > if (ret)
> > return ret;
> > } else {
>
> The __tdp_mmu_set_spte_atomic() has a WARN() at the beginning to check the
> iter->old_spte isn't a frozen SPTE:
>
> WARN_ON_ONCE(iter->yielded || is_frozen_spte(iter->old_spte));
>
> Thinking more, I _think_ this patch could potentially trigger this WARNING,
> because set_external_spte_present() will now set iter->old_spte to
> FROZEN_SPTE when the try_cmpxchg64() fails.
>
> Consider there are 3 vCPUs trying to accept the same GFN, and they all reach
> __tdp_mmu_set_spte_atomic() simultaneously. Assuming vCPU1 does the 
>
> if (!try_cmpxchg64(rcu_dereference(sptep), old_spte, FROZEN_SPTE))
> return -EBUSY;
>
> .. successfully in set_external_spte_present(), then vCPU2 will fail on the
> try_cmpxchg64(), but this will cause iter->old_spte to be updated to
> FROZEN_SPTE.
>
> Then when vCPU3 enters __tdp_mmu_set_spte_atomic(), AFAICT the WARNING will
> be triggered because is_frozen_spte(iter->old_spte) will now return true.

The failed caller needs to check "if (is_frozen_spte(iter.old_spte))" before
retrying, as kvm_tdp_mmu_map() does?

> Or did I miss anything?
>
> Also, AFAICT this issue doesn't exist in the non-TDX case, because in that
> case tdp_mmu_set_spte_atomic() is never called with new_spte set to
> FROZEN_SPTE.