Re: [RFC PATCH v2 10/23] KVM: TDX: Enable huge page splitting under write kvm->mmu_lock
From: Huang, Kai
Date: Tue Nov 11 2025 - 05:36:52 EST
On Thu, 2025-08-07 at 17:43 +0800, Yan Zhao wrote:
> Implement the split_external_spt hook to enable huge page splitting for
> TDX when kvm->mmu_lock is held for writing.
>
> Invoke tdh_mem_range_block(), tdh_mem_track(), kicking off vCPUs,
> tdh_mem_page_demote() in sequence. All operations are performed under
> kvm->mmu_lock held for writing, similar to those in page removal.
>
> Even with kvm->mmu_lock held for writing, tdh_mem_page_demote() may still
> contend with tdh_vp_enter() and potentially with the guest's S-EPT entry
> operations. Therefore, kick off other vCPUs and prevent tdh_vp_enter()
> from being called on them to ensure success on the second attempt. Use
> KVM_BUG_ON() for any other unexpected errors.
I thought we would also need to do an UNBLOCK after the DEMOTE, but it turns
out we don't. Maybe this is worth calling out explicitly.
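
For anyone following along, the retry-on-busy flow described in the changelog
above can be sketched as a small user-space mock. This is only an illustration
under stated assumptions: every mock_* name, the struct mock_td type and the
status values are hypothetical stand-ins, not the real TDX SEAMCALL wrappers or
error encoding. The only thing modeled is "try DEMOTE once; if the operand is
busy, keep vCPUs from entering, retry once, then let them enter again".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical status codes; the real TDX error encoding is different. */
#define MOCK_SUCCESS		0ULL
#define MOCK_OPERAND_BUSY	1ULL

/* Hypothetical stand-in for the per-TD state used by the sketch. */
struct mock_td {
	int busy_calls_left;	/* demote calls that still hit contention */
	bool vcpus_blocked;	/* true while vCPU entry is prevented */
	int demote_calls;	/* number of demote attempts so far */
};

static bool mock_operand_busy(uint64_t err)
{
	return err == MOCK_OPERAND_BUSY;
}

static void mock_no_vcpus_enter_start(struct mock_td *td)
{
	td->vcpus_blocked = true;
}

static void mock_no_vcpus_enter_stop(struct mock_td *td)
{
	td->vcpus_blocked = false;
}

/* Mock demote: contention is only possible while vCPUs may still enter. */
static uint64_t mock_demote(struct mock_td *td)
{
	td->demote_calls++;
	if (!td->vcpus_blocked && td->busy_calls_left > 0) {
		td->busy_calls_left--;
		return MOCK_OPERAND_BUSY;
	}
	return MOCK_SUCCESS;
}

/* Try once; on BUSY, block vCPU entry, retry once, then unblock. */
int mock_demote_with_retry(struct mock_td *td)
{
	uint64_t err = mock_demote(td);

	if (mock_operand_busy(err)) {
		mock_no_vcpus_enter_start(td);
		err = mock_demote(td);
		mock_no_vcpus_enter_stop(td);
	}

	/* Any remaining error is unexpected; -5 mirrors -EIO in the patch. */
	return err ? -5 : 0;
}
```

In the mock, a contended first attempt leads to exactly one retry with vCPU
entry blocked, and the blocked state is always dropped again before returning.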
[...]
>
> +static int tdx_spte_demote_private_spte(struct kvm *kvm, gfn_t gfn,
> + enum pg_level level, struct page *page)
> +{
> + int tdx_level = pg_level_to_tdx_sept_level(level);
> + struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm);
> + gpa_t gpa = gfn_to_gpa(gfn);
> + u64 err, entry, level_state;
> +
> + err = tdh_mem_page_demote(&kvm_tdx->td, gpa, tdx_level, page,
> + &entry, &level_state);
> +
> + if (unlikely(tdx_operand_busy(err))) {
> + tdx_no_vcpus_enter_start(kvm);
> + err = tdh_mem_page_demote(&kvm_tdx->td, gpa, tdx_level, page,
> + &entry, &level_state);
> + tdx_no_vcpus_enter_stop(kvm);
> + }
> +
> + if (KVM_BUG_ON(err, kvm)) {
> + pr_tdx_error_2(TDH_MEM_PAGE_DEMOTE, err, entry, level_state);
> + return -EIO;
> + }
> + return 0;
> +}
> +
> +static int tdx_sept_split_private_spt(struct kvm *kvm, gfn_t gfn, enum pg_level level,
> + void *private_spt)
> +{
> + struct page *page = virt_to_page(private_spt);
> + int ret;
> +
> + if (KVM_BUG_ON(to_kvm_tdx(kvm)->state != TD_STATE_RUNNABLE ||
> + level != PG_LEVEL_2M, kvm))
> + return -EINVAL;
> +
> + ret = tdx_sept_zap_private_spte(kvm, gfn, level, page);
I don't quite follow why you pass 'private_spt' to
tdx_sept_zap_private_spte(), but it doesn't matter anymore since it's gone
in Sean's latest tree.