[PATCH 2/3] mm/pagewalk: let folio_walk_start() run under the per-VMA lock

From: Rik van Riel

Date: Tue Jun 16 2026 - 15:04:07 EST


folio_walk_start() asserts that the mmap lock is held. For callers that
only need to read a single, already-present page, the mmap lock is a
heavy and often badly contended hammer: the VMA can instead be
stabilized with the per-VMA lock, and the page table pages that are
walked are kept alive by RCU page-table freeing
(CONFIG_MMU_GATHER_RCU_TABLE_FREE).

Add an FW_VMA_LOCKED flag. When passed, folio_walk_start() asserts the
per-VMA lock instead of the mmap lock, and warns once (VM_WARN_ON_ONCE,
so only on CONFIG_DEBUG_VM builds) if page tables are not RCU-freed or
if the VMA is hugetlb (PMD sharing cannot be walked safely this way).
Everything else folio_walk_start() relies on -- the page table locks,
pmdp_get_lockless() and pte_offset_map_lock() -- is already safe without
the mmap lock, mirroring the per-VMA lock page fault path.
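
As an illustration only, the precondition selection described above can
be modelled in userspace C. Every name below (the model_ prefix, the
struct fields, the rcu_table_free parameter) is hypothetical and stands
in for the kernel helpers named in the comments. Unlike the patch, which
only warns, this sketch reports a violated precondition as a return
value so that it can be checked with plain assertions.

```c
#include <stdbool.h>

/* Userspace model only; these are not the kernel's definitions. */
typedef unsigned int model_fw_flags_t;
#define MODEL_FW_ZEROPAGE	(1u << 0)	/* mirrors FW_ZEROPAGE, BIT(0) */
#define MODEL_FW_VMA_LOCKED	(1u << 1)	/* mirrors FW_VMA_LOCKED, BIT(1) */

struct model_vma {
	bool mmap_lock_held;	/* stands in for mmap_assert_locked() */
	bool vma_lock_held;	/* stands in for vma_assert_locked() */
	bool is_hugetlb;	/* stands in for is_vm_hugetlb_page() */
};

/*
 * Return true when the locking preconditions for a walk are met.
 * rcu_table_free stands in for
 * IS_ENABLED(CONFIG_MMU_GATHER_RCU_TABLE_FREE).
 */
static bool model_walk_preconditions_ok(const struct model_vma *vma,
					model_fw_flags_t flags,
					bool rcu_table_free)
{
	if (flags & MODEL_FW_VMA_LOCKED) {
		/* Page table pages must be kept alive by RCU freeing. */
		if (!rcu_table_free)
			return false;
		/* Hugetlb (PMD sharing) is not supported on this path. */
		if (vma->is_hugetlb)
			return false;
		/* The per-VMA lock replaces the mmap lock. */
		return vma->vma_lock_held;
	}
	/* Default path: the mmap lock must be held. */
	return vma->mmap_lock_held;
}
```

In this model, a VMA with only the per-VMA lock held passes when
MODEL_FW_VMA_LOCKED is set and fails when it is not, which matches the
"behaviour is unchanged for existing callers" property of the patch.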

No existing caller passes FW_VMA_LOCKED, so behaviour is unchanged.

Assisted-by: Claude:claude-opus-4-8
Signed-off-by: Rik van Riel <riel@xxxxxxxxxxx>
---
include/linux/pagewalk.h | 5 +++++
mm/pagewalk.c | 18 ++++++++++++++++--
2 files changed, 21 insertions(+), 2 deletions(-)

diff --git a/include/linux/pagewalk.h b/include/linux/pagewalk.h
index b41d7265c01b..84dd0d68f747 100644
--- a/include/linux/pagewalk.h
+++ b/include/linux/pagewalk.h
@@ -150,6 +150,11 @@ typedef int __bitwise folio_walk_flags_t;

 /* Walk shared zeropages (small + huge) as well. */
 #define FW_ZEROPAGE ((__force folio_walk_flags_t)BIT(0))
+/*
+ * The caller holds the per-VMA lock instead of the mmap lock. Only valid with
+ * RCU-freed page tables (CONFIG_MMU_GATHER_RCU_TABLE_FREE) and not for hugetlb.
+ */
+#define FW_VMA_LOCKED ((__force folio_walk_flags_t)BIT(1))

 enum folio_walk_level {
 	FW_LEVEL_PTE,
diff --git a/mm/pagewalk.c b/mm/pagewalk.c
index 3ae2586ff45b..c85364b73e12 100644
--- a/mm/pagewalk.c
+++ b/mm/pagewalk.c
@@ -890,7 +890,9 @@ int walk_page_mapping(struct address_space *mapping, pgoff_t first_index,
  * huge_ptep_set_*, ...). Note that the page table entry stored in @fw might
  * not correspond to the first physical entry of a logical hugetlb entry.
  *
- * The mmap lock must be held in read mode.
+ * The mmap lock must be held in read mode. Alternatively, if @FW_VMA_LOCKED is
+ * passed, the VMA's per-VMA lock must be held (only supported with RCU-freed
+ * page tables, i.e. CONFIG_MMU_GATHER_RCU_TABLE_FREE, and not for hugetlb).
  *
  * Return: folio pointer on success, otherwise NULL.
  */
@@ -908,7 +910,19 @@ struct folio *folio_walk_start(struct folio_walk *fw,
 	pgd_t *pgdp;
 	p4d_t *p4dp;

-	mmap_assert_locked(vma->vm_mm);
+	if (flags & FW_VMA_LOCKED) {
+		/*
+		 * Lockless walk: the per-VMA lock keeps the VMA stable, and
+		 * RCU-freed page tables keep the walked page table pages alive
+		 * across the lockless upper-level walk and pte_offset_map_lock().
+		 * Hugetlb (PMD sharing) is not supported on this path.
+		 */
+		VM_WARN_ON_ONCE(!IS_ENABLED(CONFIG_MMU_GATHER_RCU_TABLE_FREE));
+		VM_WARN_ON_ONCE(is_vm_hugetlb_page(vma));
+		vma_assert_locked(vma);
+	} else {
+		mmap_assert_locked(vma->vm_mm);
+	}
 	vma_pgtable_walk_begin(vma);

 	if (WARN_ON_ONCE(addr < vma->vm_start || addr >= vma->vm_end))
--
2.53.0-Meta