Re: [v3 07/24] mm: thp: retry on split failure in change_pmd_range()

From: Kiryl Shutsemau

Date: Mon Mar 30 2026 - 10:34:54 EST


On Thu, Mar 26, 2026 at 07:08:49PM -0700, Usama Arif wrote:
> change_pmd_range() splits a huge PMD when mprotect() targets a sub-PMD
> range or when VMA flags require per-PTE protection bits that can't be
> represented at PMD granularity.
>
> If pte_alloc_one() fails inside __split_huge_pmd(), the huge PMD remains
> intact. Without this change, change_pte_range() would return -EAGAIN
> because pte_offset_map_lock() returns NULL for a huge PMD, sending the
> code back to the 'again' label to retry the split, without ever calling
> cond_resched().
>
> Now that __split_huge_pmd() returns an error code, handle it explicitly:
> yield the CPU with cond_resched() and retry via goto again, giving other
> tasks a chance to free memory.
>
> Trying to return an error all the way to change_protection_range()
> would not work: it would leave part of the memory range with the new
> protections and the rest unchanged, with no easy way to roll back the
> already modified entries (and previous splits). __split_huge_pmd()
> only requires an order-0 allocation and is extremely unlikely to fail.
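For illustration, here is a minimal userspace model of the retry loop described above. Everything in it is hypothetical: struct huge_entry, try_split() and alloc_failures_left stand in for the huge PMD, __split_huge_pmd() and a failing pte_alloc_one(), and sched_yield() stands in for cond_resched(). It only shows the control flow: a failed split leaves the entry huge, and the code yields and jumps back to 'again' until a split succeeds. It does not model page tables, locking, or the real change_pmd_range().

```c
#include <assert.h>
#include <errno.h>
#include <sched.h>
#include <stdbool.h>

/* Hypothetical model of a huge PMD: one entry, one protection value. */
struct huge_entry {
	bool huge;
	int prot;
};

/* Hypothetical stand-in for pte_alloc_one(): fail this many more times. */
static int alloc_failures_left;

/* Hypothetical stand-in for __split_huge_pmd(): on allocation failure
 * the entry stays huge and an error is returned. */
static int try_split(struct huge_entry *e)
{
	if (alloc_failures_left > 0) {
		alloc_failures_left--;
		return -ENOMEM;
	}
	e->huge = false;
	return 0;
}

/* Retry-on-failure flow as described in the quoted patch: yield, then
 * go back to 'again'. Loops for as long as the split keeps failing. */
static void change_prot_retry(struct huge_entry *e, int newprot, int *retries)
{
	*retries = 0;
again:
	if (e->huge && try_split(e)) {
		sched_yield();	/* userspace stand-in for cond_resched() */
		(*retries)++;
		goto again;
	}
	assert(!e->huge);	/* the split must have succeeded by now */
	e->prot = newprot;
}
```

With alloc_failures_left set to 3, change_prot_retry() yields three times, then splits the entry and applies the new protection.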

I think this is the wrong approach. We need to split page tables
upfront, before descending into change_protection() and making
irreversible changes.

Conceptually, it should be similar to what vma_adjust_trans_huge()
does in the VMA split/merge paths.
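A minimal userspace model of the split-upfront ordering, assuming an mprotect()-style range [start, end) measured in pages. struct pmd_slot, PAGES_PER_PMD, NR_PMDS, alloc_fails, split_slot(), split_boundaries(), apply_prot() and change_prot_model() are hypothetical names, not kernel APIs, and only the sub-PMD range case is modelled (not splits forced by VMA flags). Phase 1 performs every split that needs an allocation and can return -ENOMEM before any protection has changed; phase 2 has no failure path. In this model a split left behind by a later failure in phase 1 needs no rollback, because splitting changes only the representation, not the effective per-page protections.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define PAGES_PER_PMD 4	/* hypothetical; e.g. 512 with 4 KiB pages, 2 MiB PMDs */
#define NR_PMDS 4

/* Hypothetical model of one PMD slot. */
struct pmd_slot {
	bool huge;			/* one entry maps all PAGES_PER_PMD pages */
	int prot[PAGES_PER_PMD];	/* per-page protection, uniform while huge */
};

/* Hypothetical stand-in for pte_alloc_one() failing. */
static bool alloc_fails;

/* Split one slot. Only the representation changes; prot[] keeps the
 * effective per-page protections, so a leftover split needs no undo. */
static int split_slot(struct pmd_slot *slot)
{
	if (!slot->huge)
		return 0;
	if (alloc_fails)
		return -ENOMEM;
	slot->huge = false;
	return 0;
}

/* Phase 1: split huge slots that [start, end) covers only partially.
 * This is the only step that allocates, so the only one that can fail. */
static int split_boundaries(struct pmd_slot *pmds, size_t start, size_t end)
{
	int err;

	if (start % PAGES_PER_PMD) {
		err = split_slot(&pmds[start / PAGES_PER_PMD]);
		if (err)
			return err;
	}
	if (end % PAGES_PER_PMD) {
		err = split_slot(&pmds[end / PAGES_PER_PMD]);
		if (err)
			return err;
	}
	return 0;
}

/* Phase 2: apply the new protection. No allocation, no failure path. */
static void apply_prot(struct pmd_slot *pmds, size_t start, size_t end, int prot)
{
	for (size_t pg = start; pg < end; pg++) {
		struct pmd_slot *slot = &pmds[pg / PAGES_PER_PMD];
		size_t base = pg - pg % PAGES_PER_PMD;

		/* Phase 1 guarantees any huge slot here is fully covered. */
		assert(!slot->huge || (base >= start && base + PAGES_PER_PMD <= end));
		slot->prot[pg % PAGES_PER_PMD] = prot;
	}
}

/* Split everything first; change protections only once that succeeded. */
static int change_prot_model(struct pmd_slot *pmds, size_t start, size_t end, int prot)
{
	int err = split_boundaries(pmds, start, end);

	if (err)
		return err;	/* no protection has been changed yet */
	apply_prot(pmds, start, end, prot);
	return 0;
}
```

Starting from four huge slots, change_prot_model(pmds, 2, 10, 7) returns -ENOMEM with alloc_fails set and leaves every protection untouched; with it cleared, only slots 0 and 2 get split and slot 1 is updated as a whole.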

--
Kiryl Shutsemau / Kirill A. Shutemov