Re: [PATCH v2 0/3] mm: __access_remote_vm with per-VMA lock

From: Rik van Riel

Date: Fri Jun 26 2026 - 18:55:53 EST


On Fri, 2026-06-26 at 22:33 +0200, David Hildenbrand (Arm) wrote:
>
> We really need someone to look into this with some GUP experience or the
> willingness to properly think the GUP lookup+fault path through, instead of
> adding some creative workarounds to selective GUP user.
>
> I will try to find some time to think it through, but my time would be better
> spent guiding someone (and definitely not someone's LLM) to understand
>
> (1) which interface we could start with (as I said, a GUP interface where we
> pass a VMA-locked / mm-read-locked VMA instead of the MM)

__access_remote_vm seems like a decent place to start,
since that is a pain point for several people, and also
the path into remote get_user_pages with the most callers.

After that there are a few more callers in performance
sensitive paths:
- execve / argv setup -> new mm, no lock contention?
- KVM async pagefault path -> most/all of these to the same VMA?
- make_device_exclusive -> ???
- pin_user_pages_remote -> probably a good target?

>
> (2) which faults we could automatically resolve under VMA lock (I mentioned
> userfaultfd is tricky but the existing GUP call already doesn't
> trigger uffd)

It looks like for normal pages, any access that
does not require expanding the stack could be
done using just the per-VMA locks.

That might allow us to implement a locking
function for __access_remote_vm() that
takes the appropriate lock for the situation,
and also takes care of the expand_stack()
calls, if needed.

We could also take the slow path for an
access that spans multiple VMAs.

Then we could have an inner loop that does
only the page accessing and copying, with
no calls to expand_stack() in the while (len)
loop.

At the end of __access_remote_vm() we can
then unlock according to the way we locked.
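
As a rough illustration of the shape described above,
here is a small userspace model. It is not kernel code;
every name in it (model_vma, model_lock_for_access,
model_unlock, model_copy) is made up for this sketch,
and the "locks" are just counters. It only shows the
decision (per-VMA lock when the access sits inside one
VMA, mmap lock otherwise), the page-at-a-time inner
loop, and unlocking by the mode that was taken.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Userspace model only; every name here is hypothetical. */
#define MODEL_PAGE_SIZE 4096UL

struct model_vma {
	unsigned long start;	/* first address in the VMA */
	unsigned long end;	/* first address past the VMA */
};

enum model_lock { MODEL_LOCK_VMA, MODEL_LOCK_MMAP };

/* Counters standing in for the real locks. */
static int model_vma_locks_held;
static int model_mmap_locks_held;

/* vma is the first VMA ending above addr, or NULL if there is none. */
static enum model_lock model_lock_for_access(const struct model_vma *vma,
					     unsigned long addr, size_t len)
{
	bool fast = vma && addr >= vma->start && addr < vma->end &&
		    len <= vma->end - addr;

	if (fast) {
		model_vma_locks_held++;
		return MODEL_LOCK_VMA;
	}
	/* No VMA, possible stack expansion, or a multi-VMA span. */
	model_mmap_locks_held++;
	return MODEL_LOCK_MMAP;
}

/* Unlock according to the way we locked. */
static void model_unlock(enum model_lock mode)
{
	if (mode == MODEL_LOCK_VMA) {
		assert(model_vma_locks_held > 0);
		model_vma_locks_held--;
	} else {
		assert(model_mmap_locks_held > 0);
		model_mmap_locks_held--;
	}
}

/*
 * Inner loop: copy at most one page per iteration, with no
 * lookups or expansion inside the loop. mem models a flat
 * address space indexed by address. Returns the number of
 * per-page chunks copied.
 */
static size_t model_copy(unsigned char *dst, const unsigned char *mem,
			 unsigned long addr, size_t len)
{
	size_t chunks = 0;

	while (len) {
		size_t offset = addr & (MODEL_PAGE_SIZE - 1);
		size_t bytes = MODEL_PAGE_SIZE - offset;

		if (bytes > len)
			bytes = len;
		memcpy(dst, mem + addr, bytes);
		dst += bytes;
		addr += bytes;
		len -= bytes;
		chunks++;
	}
	return chunks;
}
```

In this model an address below vma->start stands in for
the case where expand_stack() might be needed, so it
falls back to the mmap lock along with the multi-VMA case.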

Adjusting the locking assertions in __get_user_pages_locked
and untagged_addr_remote to allow the per-VMA lock seems
fairly straightforward.

Passing the starting vma all the way into __get_user_pages
would also allow us to skip looking up the VMA a second
time after the first lookup in __access_remote_vm().

>
> (3) whether gup-fast could be reused to some degree, or what it would take in
> order to do that.

If an access is entirely inside a single VMA,
it looks like using gup-fast could be possible?

The whole "look up VMA permissions" thing inside
get_user_pages() should not be necessary for callers
like __access_remote_vm() that have already done
that.

That could be a nice speedup for things like
ptrace, BPF copy_from_user_task, etc.

Another question is whether switching right
to gup-fast would be cleaner than trying to
elide the first VMA lookup from __get_user_pages,
but answering that may require looking at some
prototype code and figuring out more of the
details.

Thank you for asking the questions above.
Looking into those made things a little
clearer.

This looks a whole lot more manageable now.

Are there any major things I overlooked?

Are there any pending patches in -mm or
elsewhere that I should pull into my tree
before starting?

--
All Rights Reversed.