Re: [v3 05/24] mm: thp: handle split failure in zap_pmd_range()
From: David Hildenbrand (Arm)
Date: Mon Mar 30 2026 - 11:12:40 EST

On 3/30/26 16:13, Kiryl Shutsemau wrote:
> On Thu, Mar 26, 2026 at 07:08:47PM -0700, Usama Arif wrote:
>> zap_pmd_range() splits a huge PMD when the zap range doesn't cover the
>> full PMD (partial unmap). If the split fails, the PMD stays huge.
>> Falling through to zap_pte_range() would dereference the huge PMD entry
>> as a PTE page table pointer.
>>
>> Skip the range covered by the PMD on split failure instead.
>
> Ughh... This is hacky as hell.
>
>> The skip is safe across all call paths into zap_pmd_range():
>>
>> - exit_mmap() and OOM reaper: the zap range covers entire VMAs, so
>> every PMD is fully covered (next - addr == HPAGE_PMD_SIZE). The
>> zap_huge_pmd() branch handles these without splitting. The split
>> failure path is unreachable.
>>
>> - munmap / mmap overlay: vma_adjust_trans_huge() (called from
>> __split_vma) splits any PMD straddling the VMA boundary before the
>> VMA is split. If that PMD split fails, __split_vma() returns
>> -ENOMEM and the munmap is aborted before reaching zap_pmd_range().
>> The split failure path is unreachable.
>>
>> - MADV_DONTNEED: advisory hint, the kernel is allowed to ignore it.
>> The pages remain valid and accessible. A subsequent access returns
>> existing data without faulting.
>
> Em, no. MADV_DONTNEED users expect memory to be zeroed after the
> "advise" is complete. At the very least you need to zero the skipped range.

Fully agreed. This definitely needs more thought :)
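
For reference, the userspace expectation Kiryl describes can be checked
with a small test along these lines. This is only a sketch, assuming
Linux and a private anonymous mapping; the helper name
dontneed_reads_zero() is made up for illustration and is not part of the
series.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the patch): returns 1 if every byte of a
 * private anonymous mapping reads back as zero after MADV_DONTNEED,
 * 0 if stale data is still visible, and -1 if setup fails.
 */
int dontneed_reads_zero(size_t len)
{
	unsigned char *p;
	size_t i;
	int ret = 1;

	assert(len > 0);

	p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED)
		return -1;

	/* Dirty the whole range so stale data would be detectable. */
	memset(p, 0xaa, len);

	if (madvise(p, len, MADV_DONTNEED) != 0) {
		munmap(p, len);
		return -1;
	}

	/* Anonymous private memory is expected to be zero-filled on re-access. */
	for (i = 0; i < len; i++) {
		if (p[i] != 0) {
			ret = 0;
			break;
		}
	}

	munmap(p, len);
	return ret;
}
```

Calling it with a single page, e.g.
dontneed_reads_zero((size_t)sysconf(_SC_PAGESIZE)), should return 1 on a
kernel that honours the zeroing expectation. To exercise the partial-PMD
case discussed here, the same check would have to target a sub-range of a
THP-backed mapping, which this sketch does not attempt.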
--
Cheers,
David