Re: [PATCH v3 11/24] KVM: x86/mmu: Introduce kvm_split_cross_boundary_leafs()

From: Vishal Annapurve

Date: Tue Jan 20 2026 - 12:57:46 EST


On Fri, Jan 16, 2026 at 3:39 PM Sean Christopherson <seanjc@xxxxxxxxxx> wrote:
>
> On Thu, Jan 15, 2026, Kai Huang wrote:
> > static int __kvm_tdp_mmu_split_huge_pages(struct kvm *kvm,
> >                                           struct kvm_gfn_range *range,
> >                                           int target_level,
> >                                           bool shared,
> >                                           bool cross_boundary_only)
> > {
> >         ...
> > }
> >
> > And by using this helper, I found the names of the two wrapper functions
> > are not ideal:
> >
> > kvm_tdp_mmu_try_split_huge_pages() is only for log dirty, and it should
> > not be reachable for TD (VM with mirrored PT). But currently it uses
> > KVM_VALID_ROOTS for root filter thus mirrored PT is also included. I
> > think it's better to rename it, e.g., at least with "log_dirty" in the
> > name so it's more clear this function is only for dealing with log dirty (at
> > least currently). We can also add a WARN() if it's called for VM with
> > mirrored PT but it's a different topic.
> >
> > kvm_tdp_mmu_gfn_range_split_cross_boundary_leafs() doesn't have
> > "huge_pages", which isn't consistent with the other. And it is a bit
> > long. If we don't have "gfn_range" in __kvm_tdp_mmu_split_huge_pages(),
> > then I think we can remove "gfn_range" from
> > kvm_tdp_mmu_gfn_range_split_cross_boundary_leafs() too to make it shorter.
> >
> > So how about:
> >
> > Rename kvm_tdp_mmu_try_split_huge_pages() to
> > kvm_tdp_mmu_split_huge_pages_log_dirty(), and rename
> > kvm_tdp_mmu_gfn_range_split_cross_boundary_leafs() to
> > kvm_tdp_mmu_split_huge_pages_cross_boundary()
> >
> > ?
>
> I find the "cross_boundary" terminology extremely confusing. I also dislike
> the concept itself, in the sense that it shoves a weird, specific concept into
> the guts of the TDP MMU.
>
> The other wart is that it's inefficient when punching a large hole. E.g. say
> there's a 16TiB guest_memfd instance (no idea if that's even possible), and then
> userspace punches a 12TiB hole. Walking all ~12TiB just to _maybe_ split the head
> and tail pages is asinine.
>
> And once kvm_arch_pre_set_memory_attributes() is dropped, I'm pretty sure the
> _only_ usage is for guest_memfd PUNCH_HOLE, because unless I'm misreading the
> code, the usage in tdx_honor_guest_accept_level() is superfluous and confusing.
>
> For the EPT violation case, the guest is accepting a page. Just split to the
> guest's accepted level, I don't see any reason to make things more complicated
> than that.
>
> And then for the PUNCH_HOLE case, do the math to determine which, if any, head
> and tail pages need to be split, and use the existing APIs to make that happen.

Just a note: Through guest_memfd upstream syncs, we agreed that
guest_memfd will only allow the punch_hole operation for huge page
size-aligned ranges for hugetlb and THP backing, i.e. the PUNCH_HOLE
operation doesn't need to split any EPT mappings for the foreseeable
future.
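
For illustration only, the "do the math" step for PUNCH_HOLE could look
roughly like the standalone sketch below. It is not KVM code: struct
split_plan and hole_split_plan() are hypothetical names, gfn_t is
redefined locally, and the range is assumed to be [start, end) in 4KiB
page frames. It only decides which boundary huge pages straddle the
hole, without walking the range in between, and it shows that a huge
page size-aligned hole needs no split at all.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local stand-in for KVM's gfn_t so the sketch compiles on its own. */
typedef uint64_t gfn_t;

/* Hypothetical result type: which boundary huge pages would need a split. */
struct split_plan {
	bool split_head;	/* huge page containing 'start' straddles the hole */
	bool split_tail;	/* huge page containing 'end' straddles the hole */
	gfn_t head_gfn;		/* base gfn of the huge page containing 'start' */
	gfn_t tail_gfn;		/* base gfn of the huge page containing 'end' */
};

/*
 * Decide which huge pages, if any, straddle the edges of the hole
 * [start, end). 'pages_per_huge' is the number of 4KiB pages per huge
 * page (e.g. 512 for 2MiB) and must be a power of two.
 */
static struct split_plan hole_split_plan(gfn_t start, gfn_t end,
					 uint64_t pages_per_huge)
{
	struct split_plan plan = { 0 };
	gfn_t mask;

	assert(pages_per_huge && !(pages_per_huge & (pages_per_huge - 1)));
	assert(start < end);

	mask = pages_per_huge - 1;
	plan.head_gfn = start & ~mask;
	plan.tail_gfn = end & ~mask;

	/* An unaligned edge means the huge page around it is only partly punched. */
	plan.split_head = (start & mask) != 0;
	plan.split_tail = (end & mask) != 0;

	/* Both edges inside the same huge page: one split covers both. */
	if (plan.split_head && plan.split_tail &&
	    plan.head_gfn == plan.tail_gfn)
		plan.split_tail = false;

	return plan;
}
```

With 2MiB pages (512 gfns), hole_split_plan(512, 1024, 512) reports no
split, which is the aligned-only case described above, while
hole_split_plan(100, 1500, 512) reports a head split at gfn 0 and a tail
split at gfn 1024. Whether a huge mapping is actually present at those
gfns is a separate check that this sketch does not attempt.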