Re: [RFC PATCH v2 10/23] KVM: TDX: Enable huge page splitting under write kvm->mmu_lock
From: Binbin Wu
Date: Mon Nov 17 2025 - 04:23:11 EST
On 11/13/2025 1:53 PM, Yan Zhao wrote:
On Tue, Nov 11, 2025 at 06:20:40PM +0800, Huang, Kai wrote:
On Thu, 2025-08-07 at 17:43 +0800, Yan Zhao wrote:
Implement the split_external_spt hook to enable huge page splitting for
TDX when kvm->mmu_lock is held for writing.

Nit:
split_external_spt(), similar to what Kai mentioned in patch 9.

Invoke tdh_mem_range_block(), tdh_mem_track(), kicking off vCPUs,
tdh_mem_page_demote() in sequence. All operations are performed under
kvm->mmu_lock held for writing, similar to those in page removal.

Even with kvm->mmu_lock held for writing, tdh_mem_page_demote() may still
contend with tdh_vp_enter() and potentially with the guest's S-EPT entry
operations. Therefore, kick off other vCPUs and prevent tdh_vp_enter()
from being called on them to ensure success on the second attempt. Use
KVM_BUG_ON() for any other unexpected errors.

I thought we also need to do UNBLOCK after DEMOTE, but it turns out we don't
need to.

Yes, the BLOCK operates on PG_LEVEL_2M, and a successful DEMOTE updates the SEPT
non-leaf 2MB entry to point to the newly added page table page with RWX
permission, so there's no need to do UNBLOCK on success.
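To make the no-UNBLOCK point concrete, here is a toy state model of a single 2MB S-EPT entry. It is a sketch under stated assumptions, not the TDX module's logic: the entry_* helpers and state names are hypothetical, it assumes DEMOTE is only accepted on an entry that BLOCK has already blocked, and the 4KB mappings inside the new page table are not modeled. Once DEMOTE succeeds, the entry is a non-leaf RWX entry pointing at the new table, so the guest's walk can pass through it again without a separate unblock step.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical states of one 2MB S-EPT entry during a split (toy model). */
enum entry_state {
	ENTRY_LEAF_MAPPED,  /* 2MB leaf mapping, guest can access it */
	ENTRY_LEAF_BLOCKED, /* after BLOCK: leaf kept, but not usable by the guest */
	ENTRY_NONLEAF_RWX,  /* after DEMOTE: points to a 4KB-level page table, RWX */
};

struct sept_entry {
	enum entry_state state;
	bool has_child_table;
};

/* BLOCK: assumed to be accepted only on a mapped 2MB leaf. */
bool entry_block(struct sept_entry *e)
{
	if (e->state != ENTRY_LEAF_MAPPED)
		return false;
	e->state = ENTRY_LEAF_BLOCKED;
	return true;
}

/* DEMOTE: assumed to require a blocked leaf; installs the child table as RWX. */
bool entry_demote(struct sept_entry *e)
{
	if (e->state != ENTRY_LEAF_BLOCKED)
		return false;
	e->state = ENTRY_NONLEAF_RWX;
	e->has_child_table = true;
	return true;
}

/* Can the guest's page walk get past this entry? */
bool entry_walkable(const struct sept_entry *e)
{
	switch (e->state) {
	case ENTRY_LEAF_MAPPED:
		return true;
	case ENTRY_NONLEAF_RWX:
		/* A non-leaf entry must reference a child table. */
		assert(e->has_child_table);
		return true;
	default:
		return false;
	}
}
```

In this model a split is entry_block() followed by entry_demote(); entry_walkable() is true again afterwards with no third step.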

The purpose of BLOCK + TRACK + kick off vCPUs is to ensure all vCPUs find
that the old huge guest page is no longer mapped in the SEPT.
Maybe we can call this out.

Will do.
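Why BLOCK + TRACK + kicking vCPUs is needed can be illustrated with a toy epoch model. Assumptions: TRACK is modeled as bumping a TD-wide epoch, each vCPU remembers the epoch it entered the guest with, and the td_* names and NR_VCPUS are hypothetical, not KVM or TDX module interfaces. DEMOTE is treated as safe only when the entry is blocked and no vCPU still in the guest entered before the last TRACK, i.e. none could still hold a translation of the old 2MB mapping.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_VCPUS 4 /* hypothetical vCPU count for the model */

struct td_model {
	unsigned long epoch;                /* bumped by the "TRACK" step */
	unsigned long vcpu_epoch[NR_VCPUS]; /* epoch each vCPU last entered with */
	bool in_guest[NR_VCPUS];            /* vCPU currently running guest code */
	bool blocked;                       /* 2MB entry has been blocked */
};

/* A vCPU entering the guest picks up the current epoch. */
void td_vcpu_enter(struct td_model *td, int i)
{
	assert(i >= 0 && i < NR_VCPUS);
	td->in_guest[i] = true;
	td->vcpu_epoch[i] = td->epoch;
}

void td_block(struct td_model *td) { td->blocked = true; }

void td_track(struct td_model *td) { td->epoch++; }

/* Kicking forces every vCPU out of the guest. */
void td_kick_all(struct td_model *td)
{
	for (int i = 0; i < NR_VCPUS; i++)
		td->in_guest[i] = false;
}

/*
 * DEMOTE is treated as safe only if the entry is blocked and no vCPU still
 * in the guest entered before the last TRACK.
 */
bool td_demote_safe(const struct td_model *td)
{
	if (!td->blocked)
		return false;
	for (int i = 0; i < NR_VCPUS; i++)
		if (td->in_guest[i] && td->vcpu_epoch[i] < td->epoch)
			return false;
	return true;
}
```

In this model, BLOCK + TRACK alone is not enough while old-epoch vCPUs keep running; only after td_kick_all() does td_demote_safe() hold, and vCPUs that re-enter afterwards carry the new epoch.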

+static int tdx_spte_demote_private_spte(struct kvm *kvm, gfn_t gfn,
+					enum pg_level level, struct page *page)
+{
+	int tdx_level = pg_level_to_tdx_sept_level(level);
+	struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm);
+	gpa_t gpa = gfn_to_gpa(gfn);
+	u64 err, entry, level_state;
+
+	err = tdh_mem_page_demote(&kvm_tdx->td, gpa, tdx_level, page,
+				  &entry, &level_state);
+
+	if (unlikely(tdx_operand_busy(err))) {
+		tdx_no_vcpus_enter_start(kvm);
+		err = tdh_mem_page_demote(&kvm_tdx->td, gpa, tdx_level, page,
+					  &entry, &level_state);
+		tdx_no_vcpus_enter_stop(kvm);
+	}
+
+	if (KVM_BUG_ON(err, kvm)) {
+		pr_tdx_error_2(TDH_MEM_PAGE_DEMOTE, err, entry, level_state);
+		return -EIO;
+	}
+	return 0;
+}
+
+static int tdx_sept_split_private_spt(struct kvm *kvm, gfn_t gfn, enum pg_level level,
+				      void *private_spt)
+{
+	struct page *page = virt_to_page(private_spt);
+	int ret;
+
+	if (KVM_BUG_ON(to_kvm_tdx(kvm)->state != TD_STATE_RUNNABLE ||
+		       level != PG_LEVEL_2M, kvm))
+		return -EINVAL;
+
+	ret = tdx_sept_zap_private_spte(kvm, gfn, level, page);

I don't quite follow why you pass 'private_spt' to
tdx_sept_zap_private_spte(),

Simply because tdx_sept_zap_private_spte() requires a "page", which is actually
not used by tdx_sept_zap_private_spte() in the split path.

but it doesn't matter anymore since it's gone
in Sean's latest tree.

Right.
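The retry policy in tdx_spte_demote_private_spte() (one attempt, one retry with vCPU entry blocked on an operand-busy error, any other error fatal) can be exercised outside the kernel with mocks. Everything below is hypothetical: the MODEL_* status codes, struct td_ctx and the mock_* functions stand in for tdh_mem_page_demote(), tdx_operand_busy() and tdx_no_vcpus_enter_start()/stop(), and the busy condition is reduced to "some vCPU is entering".

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical status codes standing in for SEAMCALL results. */
#define MODEL_SUCCESS      0ULL
#define MODEL_OPERAND_BUSY 1ULL
#define MODEL_OTHER_ERROR  2ULL

struct td_ctx {
	bool vcpu_entering;  /* some vCPU is on its way into the guest */
	bool no_vcpus_enter; /* vCPU entry is refused while this is set */
	bool force_error;    /* make the mock fail with a non-busy error */
	int demote_calls;    /* how many times the mock DEMOTE ran */
};

/* Mock DEMOTE: contends (BUSY) with a vCPU that is entering. */
uint64_t mock_demote(struct td_ctx *td)
{
	td->demote_calls++;
	if (td->force_error)
		return MODEL_OTHER_ERROR;
	return td->vcpu_entering ? MODEL_OPERAND_BUSY : MODEL_SUCCESS;
}

/* Mock vCPU entry: refused while the no-enter window is open. */
bool mock_vcpu_try_enter(struct td_ctx *td)
{
	if (td->no_vcpus_enter)
		return false;
	td->vcpu_entering = true;
	return true;
}

/* Open the window: kicked vCPUs leave and cannot re-enter. */
void mock_no_vcpus_enter_start(struct td_ctx *td)
{
	td->no_vcpus_enter = true;
	td->vcpu_entering = false;
}

void mock_no_vcpus_enter_stop(struct td_ctx *td)
{
	td->no_vcpus_enter = false;
}

/*
 * Same control flow as the quoted patch: try once, retry once on BUSY with
 * vCPUs kept out, treat anything else as fatal.
 */
int demote_with_retry(struct td_ctx *td)
{
	uint64_t err = mock_demote(td);

	if (err == MODEL_OPERAND_BUSY) {
		mock_no_vcpus_enter_start(td);
		err = mock_demote(td);
		mock_no_vcpus_enter_stop(td);
	}
	if (err != MODEL_SUCCESS)
		return -EIO; /* the patch also triggers KVM_BUG_ON() here */
	return 0;
}
```

Keeping the no-enter window limited to the retry means the common, uncontended case never pays for kicking vCPUs, which matches the shape of the quoted code.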