[PATCH v2 3/5] mm: Add RCU-based VMA lookup helper that waits for writers
From: Dave Hansen
Date: Wed Jun 10 2026 - 19:06:58 EST
From: Dave Hansen <dave.hansen@xxxxxxxxxxxxxxx>
== Background ==
There are basically two parallel ways to look up a VMA: the
traditional way, which is protected by mmap_lock, and the RCU-based
per-VMA lock way which is based on RCU and refcounts.
== Problem ==
The mmap_lock one is more straightforward to use but it has a big
disadvantage in that it can not be mixed with page faults since those
can take mmap_lock for read, which can deadlock when mixed with page
faults. For example:
mmap_read_lock(mm);
// Another thread does mmap_write_lock().
// New mmap_lock readers are blocked.
vma = vma_lookup(mm, address);
// This deadlocks on mmap_read_lock() if it faults:
copy_from_user(address);
mmap_read_unlock(mm);
The RCU one can be mixed with faults, but it is not available in all
configs, so all RCU users need to be able to fall back to the
traditional way.
== Solution ==
Add a variant of the RCU-based lookup that waits for writers. This is
basically the same as the existing RCU-based lookup, but it also takes
mmap_lock for read and waits for writers to finish before returning
the VMA. This has some advantages:
1. Callers do not need to have a fallback path for when they
collide with writers.
2. It can be used in contexts where page faults can happen because
it can take the mmap_lock for read but never *holds* it.
3. Its fast path does not require taking mmap_lock for read.
Basically, when applied correctly, this approach results in faster
*and* simpler code.
Signed-off-by: Dave Hansen <dave.hansen@xxxxxxxxxxxxxxx>
Cc: Suren Baghdasaryan <surenb@xxxxxxxxxx>
Cc: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
Cc: "Liam R. Howlett" <Liam.Howlett@xxxxxxxxxx>
Cc: Lorenzo Stoakes <ljs@xxxxxxxxxx>
Cc: Vlastimil Babka <vbabka@xxxxxxxxxx>
Cc: Shakeel Butt <shakeel.butt@xxxxxxxxx>
Cc: linux-mm@xxxxxxxxx
Cc: Greg Kroah-Hartman <gregkh@xxxxxxxxxxxxxxxxxxx>
Cc: Arve Hjønnevåg <arve@xxxxxxxxxxx>
Cc: Todd Kjos <tkjos@xxxxxxxxxxx>
Cc: Christian Brauner <christian@xxxxxxxxxx>
Cc: Carlos Llamas <cmllamas@xxxxxxxxxx>
Cc: Alice Ryhl <aliceryhl@xxxxxxxxxx>
Cc: "David S. Miller" <davem@xxxxxxxxxxxxx>
Cc: David Ahern <dsahern@xxxxxxxxxx>
Cc: netdev@xxxxxxxxxxxxxxx
--
Changes from v1:
* Add a comment explaining that this can not be mixed with other
per-VMA lock or mmap_lock users. It is prone to deadlocks if so.
* Add a FIXME about making the mmap_read_lock() killable
* Add more chaneglog bits about the possibility for an infinite goto
loop.
* Adopt vma_start_read_unlocked() implementation from Lorenzo
---
b/include/linux/mmap_lock.h | 3 +++
b/mm/mmap_lock.c | 27 +++++++++++++++++++++++++++
2 files changed, 30 insertions(+)
diff -puN include/linux/mmap_lock.h~lock-vma-under-rcu-wait include/linux/mmap_lock.h
--- a/include/linux/mmap_lock.h~lock-vma-under-rcu-wait 2026-06-10 15:57:55.828431712 -0700
+++ b/include/linux/mmap_lock.h 2026-06-10 15:57:55.834431925 -0700
@@ -257,6 +257,9 @@ static inline bool vma_start_read_locked
return vma_start_read_locked_nested(vma, 0);
}
+struct vm_area_struct *vma_start_read_unlocked(struct mm_struct *mm,
+ unsigned long address);
+
static inline void vma_end_read(struct vm_area_struct *vma)
{
vma_refcount_put(vma);
diff -puN mm/mmap_lock.c~lock-vma-under-rcu-wait mm/mmap_lock.c
--- a/mm/mmap_lock.c~lock-vma-under-rcu-wait 2026-06-10 15:57:55.831431819 -0700
+++ b/mm/mmap_lock.c 2026-06-10 16:02:50.723860779 -0700
@@ -338,6 +338,33 @@ inval:
return NULL;
}
+/*
+ * Find the VMA covering 'address' and lock it for reading. Waits for writers to
+ * finish if the VMA is being modified. Returns NULL if there is no VMA covering
+ * 'address'.
+ *
+ * Use only in code paths where no mmap_lock and no VMA lock is held.
+ *
+ * The fast path does not take mmap_lock.
+ */
+struct vm_area_struct *vma_start_read_unlocked(struct mm_struct *mm,
+ unsigned long address)
+{
+ struct vm_area_struct *vma;
+
+ /* Fast path: return stable VMA covering 'address': */
+ vma = lock_vma_under_rcu(mm, address);
+ if (vma)
+ return vma;
+
+ /* Slow path: preclude VMA writers by getting mmap read lock. */
+ guard(rwsem_read)(&mm->mmap_lock);
+ if (!vma_start_read_locked(vma))
+ return NULL;
+
+ return vma;
+}
+
static struct vm_area_struct *lock_next_vma_under_mmap_lock(struct mm_struct *mm,
struct vma_iterator *vmi,
unsigned long from_addr)
_