Re: [RFC PATCH v5 20/45] KVM: x86/mmu: Allocate/free S-EPT pages using tdx_{alloc,free}_control_page()

From: Huang, Kai

Date: Tue Feb 03 2026 - 06:18:14 EST


On Wed, 2026-01-28 at 17:14 -0800, Sean Christopherson wrote:
> Now that kvm_mmu_memory_cache supports custom page allocators, wire up the
> S-EPT cache to use tdx_{alloc,free}_control_page() (arguably S-EPT pages
> aren't "control" pages, but they're not guest pages either). Using the
> TDX APIs will make S-EPT pages naturally play nice with Dynamic PAMT, by
> virtue of adding/removing PAMT entries when S-EPT pages are allocated and
> freed, as opposed to when they are added/removed from the S-EPT tree.
>
> Inserting into the PAMT entries on allocation does mean KVM will create
> unnecessary PAMT entries, e.g. once a vCPU stops faulting in memory, the
> remaining pages in the MMU cache will go unused. But in practice, odds
> are very good the containing 2MiB page will have other in-use S-EPT pages,
> i.e. will create PAMT entries anyways. And _if_ creating PAMT entries on
> allocation is problematic for memory consumption, that can be resolved by
> tweaking KVM's cache size.
>
> Suggested-by: Kai Huang <kai.huang@xxxxxxxxx>
> Signed-off-by: Sean Christopherson <seanjc@xxxxxxxxxx>
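
For readers following the changelog, here is a toy userspace model of the trade-off it describes. It is only a sketch, not the kernel code: every toy_* name, the page-index "allocator", and the per-2MiB refcount standing in for PAMT bookkeeping are hypothetical, and the top-up logic is only loosely modeled on how a KVM MMU memory cache gets filled. It shows that pages sitting unused in the cache hold a tracking entry for their 2MiB range, and that the entry stays alive anyway as long as one page in that range is in use.

```c
#include <assert.h>

#define PAGES_PER_REGION 512   /* 4KiB pages per 2MiB range */
#define NR_REGIONS       2     /* toy address space: 4MiB */
#define CACHE_CAP        8     /* toy cache capacity */

static unsigned char page_used[NR_REGIONS * PAGES_PER_REGION];
static unsigned int region_refs[NR_REGIONS];
static unsigned int pamt_entries;   /* 2MiB ranges with a live entry */

/* Hypothetical allocator: the first page in a range creates its entry. */
static int toy_alloc_control_page(void)
{
	for (int i = 0; i < NR_REGIONS * PAGES_PER_REGION; i++) {
		if (page_used[i])
			continue;
		page_used[i] = 1;
		if (region_refs[i / PAGES_PER_REGION]++ == 0)
			pamt_entries++;
		return i;
	}
	return -1;
}

/* Hypothetical free: the last page in a range drops its entry. */
static void toy_free_control_page(int page)
{
	assert(page >= 0 && page_used[page]);
	page_used[page] = 0;
	if (--region_refs[page / PAGES_PER_REGION] == 0)
		pamt_entries--;
}

/* Cache with pluggable allocator hooks. */
struct toy_cache {
	int (*alloc_page)(void);
	void (*free_page)(int page);
	int nobjs;
	int objs[CACHE_CAP];
};

/* If fewer than @min objects are cached, fill the cache to capacity. */
static int toy_cache_topup(struct toy_cache *c, int min)
{
	if (c->nobjs >= min)
		return 0;
	while (c->nobjs < CACHE_CAP) {
		int page = c->alloc_page();

		if (page < 0)
			return c->nobjs >= min ? 0 : -1;
		c->objs[c->nobjs++] = page;
	}
	return 0;
}

static int toy_cache_pop(struct toy_cache *c)
{
	assert(c->nobjs > 0);
	return c->objs[--c->nobjs];
}

static void toy_cache_drain(struct toy_cache *c)
{
	while (c->nobjs > 0)
		c->free_page(c->objs[--c->nobjs]);
}

/* Returns the entry count after draining while one page is still in use. */
static unsigned int toy_demo(void)
{
	struct toy_cache c = {
		.alloc_page = toy_alloc_control_page,
		.free_page  = toy_free_control_page,
	};
	unsigned int live;
	int used;

	if (toy_cache_topup(&c, 1))
		return 0;
	used = toy_cache_pop(&c);   /* one page goes "into the S-EPT" */
	toy_cache_drain(&c);        /* unused cached pages are freed */
	live = pamt_entries;        /* range still has an in-use page */
	toy_free_control_page(used);
	return live;
}
```

In this model, shrinking CACHE_CAP reduces how many pages (and so potentially how many 2MiB ranges) are pinned by idle cache contents, which is the knob the changelog mentions.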

Reviewed-by: Kai Huang <kai.huang@xxxxxxxxx>

Some nits below.


[...]

> int (*set_external_spte)(struct kvm *kvm, gfn_t gfn, enum pg_level level,
> u64 mirror_spte);
> -
> - /* Update external page tables for page table about to be freed. */
> void (*reclaim_external_sp)(struct kvm *kvm, gfn_t gfn,
> struct kvm_mmu_page *sp);
> -
> - /* Update external page table from spte getting removed, and flush TLB. */

The above two comments are still useful to me.

Not sure why you want to remove them, especially in _this_ patch?

> void (*remove_external_spte)(struct kvm *kvm, gfn_t gfn, enum pg_level level,
> u64 mirror_spte);
>
> +

Unintentional change?

> bool (*has_wbinvd_exit)(void);
>
> u64 (*get_l2_tsc_offset)(struct kvm_vcpu *vcpu);
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index 3911ac9bddfd..9b5a6861e2a4 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -6690,11 +6690,13 @@ int kvm_mmu_create(struct kvm_vcpu *vcpu)
> vcpu->arch.mmu_page_header_cache.kmem_cache = mmu_page_header_cache;
> vcpu->arch.mmu_page_header_cache.gfp_zero = __GFP_ZERO;
>
> - vcpu->arch.mmu_shadow_page_cache.init_value =
> - SHADOW_NONPRESENT_VALUE;
> + vcpu->arch.mmu_shadow_page_cache.init_value = SHADOW_NONPRESENT_VALUE;
> if (!vcpu->arch.mmu_shadow_page_cache.init_value)
> vcpu->arch.mmu_shadow_page_cache.gfp_zero = __GFP_ZERO;

Ditto. Is this adjustment intentional?