Re: [PATCH 0/3] mm: __access_remote_vm with per-VMA lock

From: David Hildenbrand (Arm)

Date: Fri Jun 19 2026 - 10:27:41 EST


On 6/19/26 16:19, Suren Baghdasaryan wrote:
> On Fri, Jun 19, 2026 at 6:05 AM David Hildenbrand (Arm)
> <david@xxxxxxxxxx> wrote:
>>
>>>
>>> Well I think a critical problem here, as pointed out by Suren, is that holding a
>>> VMA lock means that the VMAs around you can change and in ways that are quite
>>> problematic.
>>>
>>> E.g. The moment you drop the VMA lock that VMA might get freed and then merged
>>> with something else, and the next VMA you consume is the same one you just
>>> partially walked, for instance.
>>>
>>> Now perhaps you could reason your way around this, but I'm pretty sure there are
>>> cases where you might actually miss VMAs due to races (Suren knows best).
>>>
>>> And also without an mmap lock people can unmap and map new VMAs in the range as
>>> you go through which might cause weirdness as well.
>>>
>>> Really, unless you are dealing with a single VMA in the range, I suspect GUP
>>> needs to stabilise that whole range.
>>
>> Well, depends, really. It's not like all GUP operations that target many pages
>> run exclusively under the mmap lock that would prevent any VMA changes.
>>
>> With userfaultfd, for example, we drop the lock in between, to look up the VMA
>> again later. There are various paths where __get_user_pages_locked() is
>> instructed to grab the mmap lock itself, to even temporarily drop it if the mmap
>> lock was dropped.
>>
>> gup_fast_fallback() grabs some pages, then takes the mmap lock and continues
>> from the next address.
>>
>>
>> So it really depends on the use case. I would actually be surprised if there
>> are a lot of use cases that strictly must block concurrent mremap operations etc.
>>
>> The important part is that you process each virtual page address requested
>> exactly once. If the VMA was merged in the meantime, you continue from that
>> address in the previously-processed VMA.
>>
>>
>> Some use cases might indeed want to stabilize the whole range. But I wouldn't
>> expect them to opt in to using per-VMA locks.
>>
>> Just like with any other page table walker, we cannot just convert all in one
>> shot to use per-VMA locks.
>>
>>>
>>> If we could find a way to have GUP fast-path the single VMA case sensibly, then
>>> that's probably workable?
>>
>> Right, that's what I said: start with a single-VMA interface that supports
>> getting called with the per-VMA lock or the mmap lock.
>>
>> If we have to fall back to the mmap lock (userfaultfd? indicated back by the
>> caller), handle it in the caller of that interface for now.
>>
>>>
>>> And I agree special-casing only one place but not others sucks.
>>
>> Yeah, we're not doing that unless inevitable.
>>
>>>
>>> Perhaps we could find a way to get this improvement without it being quite so
>>> 'tacked on' but without needing significant rework of GUP, but in either case I
>>> broadly agree we need to improve the codebase as part of the changes.
>>
>> We shouldn't fear extending GUP in a reasonable way that makes everyone out
>> there profit in the long run :)
>
> I do not disagree with the general premise of making existing
> mechanisms work better rather than implementing parallel ones. I'm
> just pointing out my findings so far when I moved in that direction
> and I'm happy Rik posted an alternative simple way around large
> refactoring and started this discussion. We should definitely try
> reworking GUP to cause less contention. I just don't have enough time
> ATM to drive that, but would be happy to help with the VMA-locking
> parts.

If we have a VMA-lock GUP variant, I guess supporting the write part would also
be fairly easy? Not sure how performance-relevant that is, though.
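
To illustrate the "process each virtual page address requested exactly once"
point from the quoted text above, here is a rough userspace sketch. It is not
kernel code: toy_vma, toy_mm, toy_find_vma(), toy_merge_first_two(),
toy_walk() and toy_demo() are all made-up names, and the "lock drop" is only
simulated by merging two ranges between iterations. The idea it tries to show
is that the cursor is an address, not a VMA pointer, so after a merge the walk
continues from that address inside the already-visited (now larger) range.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PAGE_SZ 0x1000UL

/* Hypothetical stand-in for a VMA: a half-open range [start, end). */
struct toy_vma {
	unsigned long start;
	unsigned long end;
};

/* Hypothetical address space: a small sorted array of disjoint ranges. */
struct toy_mm {
	struct toy_vma vmas[4];
	size_t nr;
};

/* Find the range covering addr, or NULL if addr falls into a hole. */
static const struct toy_vma *toy_find_vma(const struct toy_mm *mm,
					  unsigned long addr)
{
	for (size_t i = 0; i < mm->nr; i++)
		if (addr >= mm->vmas[i].start && addr < mm->vmas[i].end)
			return &mm->vmas[i];
	return NULL;
}

/* Merge the first two adjacent ranges; simulates a concurrent VMA merge. */
static void toy_merge_first_two(struct toy_mm *mm)
{
	assert(mm->nr >= 2 && mm->vmas[0].end == mm->vmas[1].start);
	mm->vmas[0].end = mm->vmas[1].end;
	memmove(&mm->vmas[1], &mm->vmas[2],
		(mm->nr - 2) * sizeof(mm->vmas[0]));
	mm->nr--;
}

/*
 * Walk [start, end) one range at a time. Every iteration looks the range up
 * again from the address cursor, so layout changes between iterations cannot
 * make the walk revisit or skip a page. visits[] is indexed by page offset
 * from start. Returns the number of pages processed, or -1 on a hole.
 */
static long toy_walk(struct toy_mm *mm, unsigned long start,
		     unsigned long end, int merge_after_first,
		     unsigned int *visits)
{
	unsigned long addr = start;
	long done = 0;
	int first = 1;

	while (addr < end) {
		const struct toy_vma *vma = toy_find_vma(mm, addr);

		if (!vma)
			return -1;
		unsigned long stop = vma->end < end ? vma->end : end;

		for (; addr < stop; addr += PAGE_SZ) {
			visits[(addr - start) / PAGE_SZ]++;
			done++;
		}
		/* Simulated lock drop: the layout may change here. */
		if (first && merge_after_first)
			toy_merge_first_two(mm);
		first = 0;
	}
	return done;
}

/* Returns 1 if each page was visited exactly once despite the merge. */
static int toy_demo(void)
{
	struct toy_mm mm = {
		.vmas = { { 0x1000, 0x3000 }, { 0x3000, 0x5000 } },
		.nr = 2,
	};
	unsigned int visits[4] = { 0 };

	if (toy_walk(&mm, 0x1000, 0x5000, 1, visits) != 4)
		return 0;
	for (size_t i = 0; i < 4; i++)
		if (visits[i] != 1)
			return 0;
	return 1;
}
```

In toy_demo() the first range is walked, then merged with its neighbour; the
second lookup at 0x3000 lands in the merged range and the walk resumes there
rather than restarting at 0x1000. A real variant would also need to handle
unmaps and holes appearing mid-walk, which this sketch only reports as -1.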

--
Cheers,

David