Re: [PATCH 2/3] mm/pagewalk: let folio_walk_start() run under the per-VMA lock
From: Lorenzo Stoakes
Date: Fri Jun 19 2026 - 08:35:26 EST
On Tue, Jun 16, 2026 at 03:02:59PM -0400, Rik van Riel wrote:
> folio_walk_start() asserts that the mmap lock is held. For callers that
> only need to read a single, already-present page, the mmap lock is a
> heavy and often badly contended hammer: the VMA can instead be
> stabilized with the per-VMA lock, and the page table pages that are
> walked are kept alive by RCU page-table freeing
> (CONFIG_MMU_GATHER_RCU_TABLE_FREE).
See below, I don't think this is correct?
>
> Add an FW_VMA_LOCKED flag. When passed, folio_walk_start() asserts the
> per-VMA lock instead of the mmap lock, requires RCU-freed page tables,
> and refuses hugetlb VMAs (PMD sharing cannot be walked safely this way).
This is mostly superfluous. You can just say you added the flag to use a VMA
flag. You put in parens the key thing about hugetlb, I think you should break
that out.
> Everything else folio_walk_start() relies on -- the page table locks,
> pmdp_get_lockless() and pte_offset_map_lock() -- is already safe without
is -> are.
> the mmap lock, mirroring the per-VMA lock page fault path.
I'm not sure I understand why you have to have RCU freed page tables but then
say that you didn't need it here? Strange to reference arbitrary functions from
folio_walk_start() too.
>
> No existing caller passes FW_VMA_LOCKED, so behaviour is unchanged.
>
> Assisted-by: Claude:claude-opus-4-8
> Signed-off-by: Rik van Riel <riel@xxxxxxxxxxx>
> ---
> include/linux/pagewalk.h | 5 +++++
> mm/pagewalk.c | 18 ++++++++++++++++--
> 2 files changed, 21 insertions(+), 2 deletions(-)
>
> diff --git a/include/linux/pagewalk.h b/include/linux/pagewalk.h
> index b41d7265c01b..84dd0d68f747 100644
> --- a/include/linux/pagewalk.h
> +++ b/include/linux/pagewalk.h
> @@ -150,6 +150,11 @@ typedef int __bitwise folio_walk_flags_t;
>
> /* Walk shared zeropages (small + huge) as well. */
> #define FW_ZEROPAGE ((__force folio_walk_flags_t)BIT(0))
> +/*
> + * The caller holds the per-VMA lock instead of the mmap lock. Only valid with
> + * RCU-freed page tables (CONFIG_MMU_GATHER_RCU_TABLE_FREE) and not for hugetlb.
> + */
Hang on how could we be freeing higher level page tables of a VMA that's still
locked?
A VMA lock stabilises page tables for traversal, so why do you require
CONFIG_MMU_GATHER_RCU_TABLE_FREE here?
What will free the higher-level page tables?
Ref: https://origin.kernel.org/doc/html/latest/mm/process_addrs.html#page-table
> +#define FW_VMA_LOCKED ((__force folio_walk_flags_t)BIT(1))
>
> enum folio_walk_level {
> FW_LEVEL_PTE,
> diff --git a/mm/pagewalk.c b/mm/pagewalk.c
> index 3ae2586ff45b..c85364b73e12 100644
> --- a/mm/pagewalk.c
> +++ b/mm/pagewalk.c
> @@ -890,7 +890,9 @@ int walk_page_mapping(struct address_space *mapping, pgoff_t first_index,
> * huge_ptep_set_*, ...). Note that the page table entry stored in @fw might
> * not correspond to the first physical entry of a logical hugetlb entry.
> *
> - * The mmap lock must be held in read mode.
> + * The mmap lock must be held in read mode. Alternatively, if @FW_VMA_LOCKED is
> + * passed, the VMA's per-VMA lock must be held (only supported with RCU-freed
> + * page tables, i.e. CONFIG_MMU_GATHER_RCU_TABLE_FREE, and not for hugetlb).
> *
> * Return: folio pointer on success, otherwise NULL.
> */
> @@ -908,7 +910,19 @@ struct folio *folio_walk_start(struct folio_walk *fw,
> pgd_t *pgdp;
> p4d_t *p4dp;
>
> - mmap_assert_locked(vma->vm_mm);
> + if (flags & FW_VMA_LOCKED) {
> + /*
> + * Lockless walk: the per-VMA lock keeps the VMA stable, and
> + * RCU-freed page tables keep the walked page table pages alive
> + * across the lockless upper-level walk and pte_offset_map_lock().
Err, but we take locks as normal on the walk?
> + * Hugetlb (PMD sharing) is not supported on this path.
I don't get the explanation above and then you just write a line that says what
your assert is doing with zero explanation here.
You should explain why hugetlb isn't supported.
> + */
> + VM_WARN_ON_ONCE(!IS_ENABLED(CONFIG_MMU_GATHER_RCU_TABLE_FREE));
> + VM_WARN_ON_ONCE(is_vm_hugetlb_page(vma));
> + vma_assert_locked(vma);
> + } else {
> + mmap_assert_locked(vma->vm_mm);
> + }
> vma_pgtable_walk_begin(vma);
>
> if (WARN_ON_ONCE(addr < vma->vm_start || addr >= vma->vm_end))
> --
> 2.53.0-Meta
>
Thanks, Lorenzo