Re: [PATCH RESEND] userfaultfd: snapshot VMA state across UFFDIO_COPY retry

From: David Hildenbrand (Arm)

Date: Wed May 20 2026 - 10:15:30 EST

On 5/20/26 15:48, David Hildenbrand (Arm) wrote:
> On 5/20/26 14:53, Mike Rapoport wrote:
>> On Wed, May 20, 2026 at 01:09:04PM +0200, David Hildenbrand (Arm) wrote:
>>>
>>> As this is all uffd specific, I wonder whether that should be "struct
>>> uffd_vma_snapshot"/"vma_uffd_snapshot" etc.
>>
>> It's local to this file, I wouldn't worry about namespacing until there's
>> an actual need if it will ever arise.
>
> Indeed, fair enough.
>
>>
>>> From a high level, this LGTM.
>>>
>>> I wish we could identify relevant VMA changes more easily. Like, using a per-MM
>>> sequence counter that we simply increment on any VMA changes.
>>
>> Do you mean per-VMA?
>> Per-MM counter would capture unrelated changes, e.g. an masvise() for
>> unrelated range.
>
> I was thinking about a per-MM thing for simplicity. If there were any changes,
> we'd retry (-EAGAIN).
>
> IOW, something like &mm->mm_lock_seq, which we have for per-vma locks already.
>
> Not sure if unrelated changes would really be a problem in practice (another
> thread gabbing the mmap_lock in write mode until we serviced the fault).

Assume something like the following:

diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
index 180bad42fc79..608bb3c3a613 100644
--- a/mm/userfaultfd.c
+++ b/mm/userfaultfd.c
@@ -448,9 +448,12 @@ static int mfill_copy_folio_retry(struct mfill_state *state,
{
const struct vm_uffd_ops *orig_ops = vma_uffd_ops(state->vma);
unsigned long src_addr = state->src_addr;
+ unsigned int seq;
void *kaddr;
int err;

+ seq = raw_read_seqcount(&mm->mm_lock_seq);
+
/* retry copying with mm_lock dropped */
mfill_put_vma(state);

@@ -472,7 +475,7 @@ static int mfill_copy_folio_retry(struct mfill_state *state,
* (e.g. replaced with a hugetlb mapping), making the caller's
* ops pointer stale.
*/
- if (vma_uffd_ops(state->vma) != orig_ops)
+ if (read_seqcount_retry(&mm->mm_lock_seq, seq))
return -EAGAIN;

err = mfill_establish_pmd(state);

In case there is concurrent mmap_write lock, we'd retry. But we triggered and
resolved the fault. So the retry would just succeed in copying from the page,
not triggering another fault here.

So I'd not expect this to matter in practice?

We'd have to implement &mm->mm_lock_seq also without per-VMA locks. Which
wouldn't be too bad given that Dave proposed a patch set to just enable per-VMA
locks unconditionally.

As a hotfix, what you have is probably better, but a simplification like this
might later make sense.

--
Cheers,

David