Re: [PATCH v2 3/5] mm: Add RCU-based VMA lookup helper that waits for writers
From: Suren Baghdasaryan
Date: Thu Jun 11 2026 - 16:37:41 EST
On Wed, Jun 10, 2026 at 4:04 PM Dave Hansen <dave.hansen@xxxxxxxxxxxxxxx> wrote:
>
>
> From: Dave Hansen <dave.hansen@xxxxxxxxxxxxxxx>
>
> == Background ==
>
> There are basically two parallel ways to look up a VMA: the
> traditional way, which is protected by mmap_lock, and the RCU-based
"which is protected by mmap_lock" better be more specific "which is
protected by mmap_read_lock".
> per-VMA lock way which is based on RCU and refcounts.
>
> == Problem ==
>
> The mmap_lock one is more straightforward to use but it has a big
> disadvantage in that it can not be mixed with page faults since those
> can take mmap_lock for read, which can deadlock when mixed with page
> faults.
"when mixed with page faults" or "when mixed with address space
modifiers that take mmap_write_lock"? I think that's what your example
shows.
> For example:
>
> mmap_read_lock(mm);
> // Another thread does mmap_write_lock().
> // New mmap_lock readers are blocked.
> vma = vma_lookup(mm, address);
> // This deadlocks on mmap_read_lock() if it faults:
> copy_from_user(address);
> mmap_read_unlock(mm);
Ultimately the problem here is calling something that might require
mmap_read_lock(mm) (in this case copy_from_user()) while already
holding mmap_read_lock(mm). Normally that works up until you throw an
mmap_write_lock(mm) in the mix.
>
> The RCU one can be mixed with faults, but it is not available in all
> configs, so all RCU users need to be able to fall back to the
> traditional way.
>
> == Solution ==
>
> Add a variant of the RCU-based lookup that waits for writers. This is
> basically the same as the existing RCU-based lookup, but it also takes
> mmap_lock for read and waits for writers to finish before returning
> the VMA. This has some advantages:
>
> 1. Callers do not need to have a fallback path for when they
> collide with writers.
> 2. It can be used in contexts where page faults can happen because
> it can take the mmap_lock for read but never *holds* it.
> 3. Its fast path does not require taking mmap_lock for read.
>
> Basically, when applied correctly, this approach results in faster
> *and* simpler code.
>
> Signed-off-by: Dave Hansen <dave.hansen@xxxxxxxxxxxxxxx>
> Cc: Suren Baghdasaryan <surenb@xxxxxxxxxx>
> Cc: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
> Cc: "Liam R. Howlett" <Liam.Howlett@xxxxxxxxxx>
> Cc: Lorenzo Stoakes <ljs@xxxxxxxxxx>
> Cc: Vlastimil Babka <vbabka@xxxxxxxxxx>
> Cc: Shakeel Butt <shakeel.butt@xxxxxxxxx>
> Cc: linux-mm@xxxxxxxxx
> Cc: Greg Kroah-Hartman <gregkh@xxxxxxxxxxxxxxxxxxx>
> Cc: Arve Hjønnevåg <arve@xxxxxxxxxxx>
> Cc: Todd Kjos <tkjos@xxxxxxxxxxx>
> Cc: Christian Brauner <christian@xxxxxxxxxx>
> Cc: Carlos Llamas <cmllamas@xxxxxxxxxx>
> Cc: Alice Ryhl <aliceryhl@xxxxxxxxxx>
> Cc: "David S. Miller" <davem@xxxxxxxxxxxxx>
> Cc: David Ahern <dsahern@xxxxxxxxxx>
> Cc: netdev@xxxxxxxxxxxxxxx
>
> --
>
> Changes from v1:
> * Add a comment explaining that this can not be mixed with other
> per-VMA lock or mmap_lock users. It is prone to deadlocks if so.
> * Add a FIXME about making the mmap_read_lock() killable
> * Add more chaneglog bits about the possibility for an infinite goto
> loop.
> * Adopt vma_start_read_unlocked() implementation from Lorenzo
> ---
>
> b/include/linux/mmap_lock.h | 3 +++
> b/mm/mmap_lock.c | 27 +++++++++++++++++++++++++++
> 2 files changed, 30 insertions(+)
>
> diff -puN include/linux/mmap_lock.h~lock-vma-under-rcu-wait include/linux/mmap_lock.h
> --- a/include/linux/mmap_lock.h~lock-vma-under-rcu-wait 2026-06-10 15:57:55.828431712 -0700
> +++ b/include/linux/mmap_lock.h 2026-06-10 15:57:55.834431925 -0700
> @@ -257,6 +257,9 @@ static inline bool vma_start_read_locked
> return vma_start_read_locked_nested(vma, 0);
> }
>
> +struct vm_area_struct *vma_start_read_unlocked(struct mm_struct *mm,
> + unsigned long address);
> +
> static inline void vma_end_read(struct vm_area_struct *vma)
> {
> vma_refcount_put(vma);
> diff -puN mm/mmap_lock.c~lock-vma-under-rcu-wait mm/mmap_lock.c
> --- a/mm/mmap_lock.c~lock-vma-under-rcu-wait 2026-06-10 15:57:55.831431819 -0700
> +++ b/mm/mmap_lock.c 2026-06-10 16:02:50.723860779 -0700
> @@ -338,6 +338,33 @@ inval:
> return NULL;
> }
>
> +/*
> + * Find the VMA covering 'address' and lock it for reading. Waits for writers to
> + * finish if the VMA is being modified. Returns NULL if there is no VMA covering
> + * 'address'.
> + *
> + * Use only in code paths where no mmap_lock and no VMA lock is held.
> + *
> + * The fast path does not take mmap_lock.
> + */
> +struct vm_area_struct *vma_start_read_unlocked(struct mm_struct *mm,
> + unsigned long address)
It's harder to review a function without a user but I guess users will
be added later in this patchset.
> +{
> + struct vm_area_struct *vma;
> +
> + /* Fast path: return stable VMA covering 'address': */
> + vma = lock_vma_under_rcu(mm, address);
> + if (vma)
> + return vma;
> +
> + /* Slow path: preclude VMA writers by getting mmap read lock. */
> + guard(rwsem_read)(&mm->mmap_lock);
guard() is nice but mmap_read_{lock|unlock} has those
__mmap_lock_trace_* traces which we lose with guard(). Not sure if
that's a good enough reason to keep using older primitives.
> + if (!vma_start_read_locked(vma))
> + return NULL;
> +
> + return vma;
> +}
> +
> static struct vm_area_struct *lock_next_vma_under_mmap_lock(struct mm_struct *mm,
> struct vma_iterator *vmi,
> unsigned long from_addr)
> _