Re: [PATCH v3 00/24] KVM: TDX huge page support for private memory

From: Vishal Annapurve

Date: Tue Jan 13 2026 - 11:40:26 EST


On Mon, Jan 12, 2026 at 10:13 PM Yan Zhao <yan.y.zhao@xxxxxxxxx> wrote:
>
> > >> > >>
> > >> > >> Additionally, we don't split private mappings in kvm_gmem_error_folio().
> > >> > >> If smaller folios are allowed, splitting the private mappings is required there.
> > >> >
> > >> > It was discussed before that for memory failure handling, we will want
> > >> > to split huge pages, and we will get to it! The trouble is that guest_memfd
> > >> > took the page from HugeTLB (unlike buddy or HugeTLB, which manage memory
> > >> > from the ground up), so we still need to figure out whether it's okay to
> > >> > let HugeTLB deal with the page when it is freed. When I last looked,
> > >> > HugeTLB doesn't actually handle poisoned folios on freeing, so there's
> > >> > more work to do on the HugeTLB side.
> > >> >
> > >> > This is a good point, although IIUC it is a separate issue. The need to
> > >> > split private mappings on memory failure is not for confidentiality in
> > >> > the TDX sense but to ensure that the guest doesn't use the failed
> > >> > memory. In that case, contiguity is broken by the failed memory. The
> > >> > folio is split, the private EPTs are split. The folio size should still
> > >> > not be checked in TDX code. guest_memfd knows contiguity got broken, so
> > >> > guest_memfd calls TDX code to split the EPTs.
> > >>
> > >> Hmm, maybe the key is that we need to split the S-EPT before allowing
> > >> guest_memfd to split the backing folio. If the S-EPT split fails, don't
> > >> split the folio.
> > >>
> > >> This is better than splitting the folio while it's still mapped huge in
> > >> the S-EPT, since in the latter case kvm_gmem_error_folio() needs to try
> > >> to split the S-EPT itself. If that split fails, falling back to zapping
> > >> the huge mapping in kvm_gmem_error_folio() would still trigger the
> > >> over-zapping issue.
> > >>
> >
> > Let's put memory failure handling aside, since for now it zaps the
> > entire huge page, so the ordering between the S-EPT split and the
> > folio split has no impact there.
> Relying on guest_memfd's specific implementation is not a good thing.
>
> For example, suppose there's a version of guest_memfd that allocates folios
> from the buddy allocator:
> 1. KVM maps a 2MB folio with a 2MB mapping.
> 2. guest_memfd tries to split the 2MB folio into 4KB folios, but fails and
> leaves the 2MB folio partially split.
> 3. A memory failure occurs on one of the split folios.
> 4. If splitting the S-EPT then fails, the over-zapping issue is still there.
>

Why is over-zapping an issue?

Memory failure is supposed to be a rare occurrence, and if there is no
memory available to handle the splitting, I don't see any other choice
than over-zapping. IIUC, splitting the huge page range (in the 1G -> 4K
scenario) requires even more memory than just splitting the
cross-boundary leaves, and so has a higher chance of failing.

i.e., whether the folio or the S-EPT is split first, there is always a
chance of failure leading to over-zapping. I don't see value in
optimizing for rare failures within the even rarer memory-failure
handling codepaths, which are supposed to make best-effort decisions
anyway.