Re: [PATCH RESEND] userfaultfd: snapshot VMA state across UFFDIO_COPY retry
From: Liam R. Howlett
Date: Tue May 26 2026 - 15:08:28 EST
On 26/05/26 04:25PM, Lorenzo Stoakes wrote:
> On Tue, May 26, 2026 at 02:47:45PM +0200, David Hildenbrand (Arm) wrote:
> > On 5/25/26 19:12, Liam R. Howlett wrote:
> > > On 26/05/20 04:38PM, David Hildenbrand (Arm) wrote:
> > >> On 5/20/26 16:12, Mike Rapoport wrote:
> > >>>
> > >>> Let me reiterate:
> > >>>
> > >>> A thread doing UFFDIO_COPY releases the VMA in mfill_copy_folio_retry(),
> > >>> re-gets the VMA and checks if the per-MM counter stayed the same.
> > >>>
> > >>> If another thread makes any change to VMA while mfill_copy_folio_retry()
> > >>> waits to re-get the VMA, the counter would be incremented by another
> > >>> thread. mfill_copy_folio_retry() will see the change after mfill_get_vma()
> > >>> and will bail out with -EAGAIN.
> > >>>
> > >>
> > >> Yeah.
> > >
> > > This isn't bulletproof anyways. The sequence count can wrap. So, if
> > > someone can replace the vma then cause the sequence counter wrap, then
> > > you can be fooled into thinking it's okay (we had this problem years ago
> > > with the vmacache using a 32 bit counter, iirc).
> >
> > If you can get it to wrap for such short durations, then how would sequence
> > counters possibly work in any reasonable context?
I think the answer is because 1. it may not be a short duration once
we're asking userspace to do things and 2. because we aren't depending
on the sequence number alone, usually.
I don't think we should trust it if we're dropping other
locks/guarantees and waiting on userspace.
>
> Surely even for a 32-bit value, we can be pretty confident we're not going to
> see a wrap that matters (the seqnum will != the prev seqnum unless 4 billion VMA
> write locks were obtained)?
The vmacache had a similar issue and it was reported as a potential
security problem, which caused it to switch to a 64bit value. [1] I
mean, it took 40 minutes, but it was possible..
Note that the vmacache was a bit different in that it was invalidating
the cache when the number wrapped.
And the vmacache was mm wide, while the per-vma sequence is mm and
per-vma specific, so it's very much harder to hit a false positive.
But, I guess if you can register a bunch of UFFDs and fork many
processes, you can increase your odds?
If you recall, we did try a 64 bit early on, but switched to a 32bit to
avoid a performance regression.
I think we should take a really hard look at the locking mechanics here
if we're going to try and depend on the sequence number before trusting
it as a correctness guarantee, especially when involving uffd copy.
[1]. https://www.exploit-db.com/exploits/45497
Thanks,
Liam