Re: [PATCH 17/17] KVM: TDX: Move external page table freeing to TDX code
From: Yan Zhao
Date: Mon Mar 30 2026 - 04:36:28 EST
On Fri, Mar 27, 2026 at 01:14:21PM -0700, Rick Edgecombe wrote:
> From: Sean Christopherson <seanjc@xxxxxxxxxx>
>
> Move the freeing of external page tables into the reclaim operation that
> lives in TDX code.
>
> The TDP MMU supports traversing the TDP without holding locks. Page
> tables needs to be freed via RCU to prevent walking one that gets
> freed.
>
> While none of these lockless walk operations actually happen for the
> mirror EPT, the TDP MMU none-the-less frees the mirror EPT page tables in
> the same way, and because it’s a handy place to plug it in, the external
> page tables as well.
>
> However, the external page tables definitely can’t be walked once they are
> reclaimed from the TDX module. The TDX module releases the page for the
> host VMM to use, so this RCU-time free is unnecessary for external page
> tables.
>
> So move the free_page() call to TDX code. Create an
> tdp_mmu_free_unused_sp() to allow for freeing external page tables that
> have never left the TDP MMU code (i.e. don’t need freed in a special way.
>
> Link: https://lore.kernel.org/kvm/aYpjNrtGmogNzqwT@xxxxxxxxxx/
> Not-yet-Signed-off-by: Sean Christopherson <seanjc@xxxxxxxxxx>
> [Based on a diff by Sean, added log]
> Signed-off-by: Rick Edgecombe <rick.p.edgecombe@xxxxxxxxx>
> ---
> arch/x86/kvm/mmu/tdp_mmu.c | 16 +++++++++++-----
> arch/x86/kvm/vmx/tdx.c | 11 ++++++++++-
> 2 files changed, 21 insertions(+), 6 deletions(-)
>
> diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
> index 575033cc7fe4..18e11c1c7631 100644
> --- a/arch/x86/kvm/mmu/tdp_mmu.c
> +++ b/arch/x86/kvm/mmu/tdp_mmu.c
> @@ -53,13 +53,18 @@ void kvm_mmu_uninit_tdp_mmu(struct kvm *kvm)
> rcu_barrier();
> }
>
> -static void tdp_mmu_free_sp(struct kvm_mmu_page *sp)
> +static void __tdp_mmu_free_sp(struct kvm_mmu_page *sp)
> {
> - free_page((unsigned long)sp->external_spt);
> free_page((unsigned long)sp->spt);
> kmem_cache_free(mmu_page_header_cache, sp);
> }
>
> +static void tdp_mmu_free_unused_sp(struct kvm_mmu_page *sp)
> +{
> + free_page((unsigned long)sp->external_spt);
> + __tdp_mmu_free_sp(sp);
> +}
> +
> /*
> * This is called through call_rcu in order to free TDP page table memory
> * safely with respect to other kernel threads that may be operating on
> @@ -73,7 +78,8 @@ static void tdp_mmu_free_sp_rcu_callback(struct rcu_head *head)
> struct kvm_mmu_page *sp = container_of(head, struct kvm_mmu_page,
> rcu_head);
>
> - tdp_mmu_free_sp(sp);
> + WARN_ON_ONCE(sp->external_spt);
> + __tdp_mmu_free_sp(sp);
> }
>
> void kvm_tdp_mmu_put_root(struct kvm *kvm, struct kvm_mmu_page *root)
> @@ -1261,7 +1267,7 @@ int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
> * failed, e.g. because a different task modified the SPTE.
> */
> if (r) {
> - tdp_mmu_free_sp(sp);
> + tdp_mmu_free_unused_sp(sp);
> goto retry;
> }
>
> @@ -1571,7 +1577,7 @@ static int tdp_mmu_split_huge_pages_root(struct kvm *kvm,
> * installs its own sp in place of the last sp we tried to split.
> */
> if (sp)
> - tdp_mmu_free_sp(sp);
> + tdp_mmu_free_unused_sp(sp);
>
> return 0;
> }
> diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
> index d064b40a6b31..1346e891ca94 100644
> --- a/arch/x86/kvm/vmx/tdx.c
> +++ b/arch/x86/kvm/vmx/tdx.c
> @@ -1782,7 +1782,16 @@ static void tdx_sept_free_private_spt(struct kvm *kvm, gfn_t gfn,
> */
> if (KVM_BUG_ON(is_hkid_assigned(to_kvm_tdx(kvm)), kvm) ||
> tdx_reclaim_page(virt_to_page(sp->external_spt)))
> - sp->external_spt = NULL;
> + goto out;
> +
> + /*
> + * Immediately free the S-EPT page as the TDX subsystem doesn't support
> + * freeing pages from RCU callbacks, and more importantly because
> + * TDH.PHYMEM.PAGE.RECLAIM ensures there are no outstanding readers.
> + */
> + free_page((unsigned long)sp->external_spt);
Since sp->external_spt is allocated in kvm_mmu_alloc_external_spt() in a similar
way as sp->spt in tdp_mmu_alloc_sp(), why bother making freeing of
sp->external_spt special? i.e., what's the downside of freeing it together with
sp->spt in tdp_mmu_free_sp() ?
> +out:
> + sp->external_spt = NULL;
> }
>
> static int tdx_sept_remove_private_spte(struct kvm *kvm, gfn_t gfn,
> --
> 2.53.0
>