Re: [PATCH 2/3] mm/pagewalk: let folio_walk_start() run under the per-VMA lock

From: Rik van Riel

Date: Thu Jun 25 2026 - 07:20:46 EST


On Thu, 2026-06-25 at 08:34 +0100, Lorenzo Stoakes wrote:
> Rik, it really would have helped if you'd replied to review :)
>
> On Wed, Jun 24, 2026 at 09:50:52PM -0400, Rik van Riel wrote:
> > folio_walk_start() asserts the mmap lock is held.  For callers that only
> > need to read a single, already-present page, the mmap lock is a heavy and
> > often badly contended hammer.  Such a caller can instead hold the per-VMA
> > lock, which keeps the VMA itself stable.
>
> <newline>
>
> > The per-VMA lock does not, however, keep the page tables walked below that
> > VMA from being freed.  A concurrent munmap() or THP collapse of an
> > adjacent region in the same mm can free a shared upper-level table, and
>
> Yeah I need to update the documentation on this at
> https://docs.kernel.org/mm/process_addrs.html it's more subtle than written
> there.
>
> Firstly you're wrong about munmap() - it acquires the VMA lock of the VMAs
> freed in the range and will only remove an upper level table if the entire
> range is spanned.
>
> And that's the only way higher level tables can be removed.
>
> PTE page tables can be removed via MADV_DONTNEED, but that a. acquires the
> VMA lock and b. frees the PTE page table under RCU.
>
> A THP collapse can happen concurrently, but PTEs are freed under RCU so you
> don't need to do this GUP fast imitating stuff.
>
> > THP collapse (collapse_huge_page() -> retract_page_tables()) frees page
> > tables of VMAs whose lock it does not hold.  Page table freeing
>
> retract_page_tables() -> pte_free_defer() -> RCU
> try_collapse_pte_mapped_thp() -> pte_free_defer() -> RCU

One issue here is that while we can safely read
the old page table under the RCU read lock, there
is no guarantee in the middle of a THP collapse
that the old page table still points at the
process's current memory.

Khugepaged could fix this in one of two ways:
- zap all readers with an IPI, and use that as
  synchronization
- make sure the old page table's PTEs point at
  the individual pages inside the new PMD

Right now khugepaged does the first.

Relying only on the RCU read lock to read the
page table could result in us seeing old page
table contents that no longer point at the
process's current memory.
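
To make the concern concrete, here is a toy user-space
model of it. This is only a sketch, not kernel code:
every name in it (toy_pmd, toy_collapse(), and so on)
is made up for illustration, the table has 4 entries
instead of 512, and deferred (RCU) freeing is modelled
simply by never freeing the old table. The "repoint"
flag models the second option above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_PTRS_PER_TABLE 4	/* toy size, not the real table size */

/* A "pfn" here is just a number identifying a physical page. */
struct toy_pte_table {
	unsigned long pfn[TOY_PTRS_PER_TABLE];
};

struct toy_pmd {
	struct toy_pte_table *table;	/* non-NULL while PTE-mapped */
	unsigned long huge_pfn;		/* base pfn once collapsed */
};

/*
 * Collapse the range onto a new huge page starting at new_base and
 * detach the old PTE table. The old table stays readable afterwards,
 * which is what deferred freeing gives a reader. If repoint is true,
 * the old PTEs are first rewritten to the new subpages.
 */
static struct toy_pte_table *toy_collapse(struct toy_pmd *pmd,
					  unsigned long new_base,
					  bool repoint)
{
	struct toy_pte_table *old = pmd->table;

	assert(old != NULL);
	if (repoint)
		for (size_t i = 0; i < TOY_PTRS_PER_TABLE; i++)
			old->pfn[i] = new_base + i;
	pmd->huge_pfn = new_base;
	pmd->table = NULL;
	return old;
}

/* What a fresh walk would find for entry idx right now. */
static unsigned long toy_current_pfn(const struct toy_pmd *pmd, size_t idx)
{
	return pmd->table ? pmd->table->pfn[idx] : pmd->huge_pfn + idx;
}

/*
 * A reader samples the table pointer, a collapse runs, and only then
 * does the reader load its PTE. Returns true if the pfn it loads
 * differs from the current mapping.
 */
static bool toy_reader_sees_stale(bool repoint)
{
	struct toy_pte_table t = { .pfn = { 100, 101, 102, 103 } };
	struct toy_pmd pmd = { .table = &t, .huge_pfn = 0 };
	struct toy_pte_table *seen = pmd.table;	/* reader's snapshot */

	toy_collapse(&pmd, 2000, repoint);
	return seen->pfn[1] != toy_current_pfn(&pmd, 1);
}
```

In this model the read through the old table is always
memory-safe, but without the repoint step it returns a
pfn that is no longer what the process has mapped.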

Unless I'm missing something...

--
All Rights Reversed.