Re: [PATCH v3 00/24] KVM: TDX huge page support for private memory

From: Vishal Annapurve

Date: Fri Jan 09 2026 - 12:17:02 EST


On Fri, Jan 9, 2026 at 8:12 AM Vishal Annapurve <vannapurve@xxxxxxxxxx> wrote:
>
> > > >
> > >
> > > I think the central question I have among all the above is what TDX
> > > needs to actually care about (putting aside KVM's folio size/memory
> > > contiguity vs mapping level rule for a while).
> > >
> > > I think TDX code can check what it cares about (if required to aid
> > > debugging, as Dave suggested). Does TDX actually care about folio sizes,
> > > or does it actually care about memory contiguity and alignment?
> > TDX cares about memory contiguity. A single folio ensures memory contiguity.
>
> In this slightly unusual case, I think the guarantee needed here is
> that as long as a range is mapped into SEPT entries, guest_memfd
> ensures that the complete range stays private.
>
> i.e. I think it should be safe to rely on guest_memfd here,
> irrespective of the folio sizes:
> 1) KVM TDX stack should be able to reclaim the complete range when unmapping.
> 2) KVM TDX stack can assume that as long as memory is mapped in SEPT
> entries, guest_memfd will not let host userspace mappings access
> guest private memory.
>
> >
> > Allowing one S-EPT mapping to cover multiple folios may also mean it's no longer
> > reasonable to pass "struct page" to tdh_phymem_page_wbinvd_hkid() for a
> > contiguous range larger than the page's folio range.
>
> What's the issue with passing the (struct page*, unsigned long nr_pages) pair?
>
> >
> > Additionally, we don't split private mappings in kvm_gmem_error_folio().
> > If smaller folios are allowed, splitting private mappings is required there.
>
> Yes, I believe splitting private mappings will be invoked to ensure
> that the whole huge folio is not unmapped from KVM due to an error on
> just a 4K page. Is that a problem?
>
> If splitting fails, the implementation can fall back to completely
> zapping the folio range.

I forgot to mention that this is a future improvement that will
introduce hugetlb memory-failure handling; it is not covered by
Ackerley's current set of patches.

>
> > (e.g., after splitting a 1GB folio to 4KB folios with 2MB mappings. Also, is it
> > possible for splitting a huge folio to fail partially, without merging the huge
> > folio back or further zapping?).
>
> Yes, splitting can fail partially, but guest_memfd will not make the
> ranges available to host userspace and derivatives until:
> 1) The complete range to be converted is split to 4K granularity.
> 2) The complete range to be converted is zapped from KVM EPT mappings.
>
> > Not sure if there're other edge cases we're still missing.