Re: [v3 05/24] mm: thp: handle split failure in zap_pmd_range()

From: Kiryl Shutsemau

Date: Mon Mar 30 2026 - 10:18:35 EST


On Thu, Mar 26, 2026 at 07:08:47PM -0700, Usama Arif wrote:
> zap_pmd_range() splits a huge PMD when the zap range doesn't cover the
> full PMD (partial unmap). If the split fails, the PMD stays huge.
> Falling through to zap_pte_range() would dereference the huge PMD entry
> as a PTE page table pointer.
>
> Skip the range covered by the PMD on split failure instead.

Ughh... This is hacky as hell.

> The skip is safe across all call paths into zap_pmd_range():
>
> - exit_mmap() and OOM reaper: the zap range covers entire VMAs, so
> every PMD is fully covered (next - addr == HPAGE_PMD_SIZE). The
> zap_huge_pmd() branch handles these without splitting. The split
> failure path is unreachable.
>
> - munmap / mmap overlay: vma_adjust_trans_huge() (called from
> __split_vma) splits any PMD straddling the VMA boundary before the
> VMA is split. If that PMD split fails, __split_vma() returns
> -ENOMEM and the munmap is aborted before reaching zap_pmd_range().
> The split failure path is unreachable.
>
> - MADV_DONTNEED: advisory hint, the kernel is allowed to ignore it.
> The pages remain valid and accessible. A subsequent access returns
> existing data without faulting.

Em, no. MADV_DONTNEED users expect memory to be zeroed after the
"advise" is complete. At the very least you need to zero the skipped range.
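
To make that expectation concrete, here is a rough userspace sketch
(not part of the patch) for private anonymous memory. The helper name
partial_dontneed_check() and the chosen sizes are made up for
illustration, and MADV_HUGEPAGE is only a hint, so whether a THP
actually backs the range depends on system configuration and alignment.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper, for illustration only.
 *
 * Dirties a private anonymous mapping, drops a sub-range that does not
 * cover a whole PMD with MADV_DONTNEED, then checks that the sub-range
 * reads back as zero and the rest of the mapping is untouched.
 *
 * Returns 1 if both hold, 0 if not, -1 on setup error.
 */
static int partial_dontneed_check(void)
{
	const size_t len = 4UL << 20;	/* 4 MiB, larger than one 2 MiB PMD */
	const size_t page = (size_t)sysconf(_SC_PAGESIZE);
	const size_t off = page;	/* page-aligned, as madvise() requires */
	const size_t sub = 3 * page;	/* partial range, far smaller than a PMD */
	unsigned char *p;
	size_t i;
	int ret = 1;

	p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED)
		return -1;

#ifdef MADV_HUGEPAGE
	/* Best effort: ask for THP; failure here is not an error. */
	(void)madvise(p, len, MADV_HUGEPAGE);
#endif

	memset(p, 0xaa, len);		/* dirty every page */

	if (madvise(p + off, sub, MADV_DONTNEED)) {
		munmap(p, len);
		return -1;
	}

	for (i = 0; i < len; i++) {
		/* Zero inside the advised range, old data everywhere else. */
		unsigned char want = (i >= off && i < off + sub) ? 0 : 0xaa;

		if (p[i] != want) {
			ret = 0;
			break;
		}
	}

	munmap(p, len);
	return ret;
}
```

If the zap silently skipped the huge PMD, the bytes inside the advised
range would still read 0xaa and the check would return 0.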

And are you sure that the list of users is complete?

I am also worried about a possible new user that is not aware of these
skip-on-split-failure semantics.

I think it has to be opt-in. Maybe a ZAP_FLAG_WHATEVER?
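
Something along these lines, as an untested userspace toy model rather
than kernel code. ZAP_FLAG_WHATEVER is a placeholder name, not an
existing flag, and the enum and on_partial_pmd() are invented here for
illustration. What a caller that did not opt in should do on failure is
an open question; the sketch just assumes the failure gets reported.

```c
#include <stdbool.h>

/* Placeholder name, not an existing kernel flag. */
#define ZAP_FLAG_WHATEVER	(1u << 0)

/* Hypothetical outcomes for a partially covered huge PMD. */
enum zap_action {
	ZAP_CONTINUE,	/* split succeeded: go on to the PTE level */
	ZAP_SKIP_PMD,	/* split failed, caller opted in: skip this PMD */
	ZAP_FAIL,	/* split failed, no opt-in: report it (assumption) */
};

/*
 * Decide what to do with a huge PMD that the zap range only partially
 * covers. Skipping is only allowed when the caller asked for it.
 */
static enum zap_action on_partial_pmd(unsigned int flags, bool split_ok)
{
	if (split_ok)
		return ZAP_CONTINUE;
	if (flags & ZAP_FLAG_WHATEVER)
		return ZAP_SKIP_PMD;
	return ZAP_FAIL;
}
```

That way a new caller gets the skip only by asking for it explicitly.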

--
Kiryl Shutsemau / Kirill A. Shutemov