Re: [PATCH v4] mm/userfaultfd: detect VMA replacement after copy retry in mfill_copy_folio_retry()
From: Peter Xu
Date: Thu Apr 02 2026 - 09:44:27 EST
Hi, Mike,
Let me also leave my comments inline just for you to consider.
On Thu, Apr 02, 2026 at 06:58:33AM +0300, Mike Rapoport wrote:
> Hi David,
>
> It feels that you use an LLM for correspondence. Please tune it down to
> produce more laconic and to the point responses.
>
> On Wed, Apr 01, 2026 at 09:06:36AM +0100, David CARLIER wrote:
> > On Tue, Apr 01, 2026 at 08:49:00AM +0300, Mike Rapoport wrote:
> > > What does "folio allocated from the original VMA's backing store" exactly
> > > mean? Why is this a problem?
> >
> > Fair point, the commit message was vague here. What I meant is:
> >
> > mfill_atomic_pte_copy() captures ops = vma_uffd_ops(state->vma) and
> > passes it to __mfill_atomic_pte(). There, ops->alloc_folio() allocates
> > a folio for the original VMA's inode (e.g. a shmem folio for that
> > specific shmem inode).
>
> I wouldn't say ->alloc_folio() allocates a folio _for_ the inode, it
> allocates it with inode's memory policy. Worst can happen without any
> changes is that the allocated folio will end up in a wrong node.
For shmem it's indeed only about mempolicy, but since we're trying to
export it as an API in this series, IMHO it would be nice to be generic. So
we shouldn't assume it's only about mempolicy; we should detect any context
change and bail out with -EAGAIN, leaving all the remaining checks to the
next UFFDIO_COPY ioctl done on top of the new mapping topology.
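To make the "detect any context change" idea concrete, here is a minimal
userspace model, not kernel code. All model_* names are hypothetical and
only stand in for the ops pointer, vm_file and vm_flags discussed in this
thread; the real check would have to live wherever the locks are re-taken.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-VMA uffd ops table. */
struct model_ops {
	int id;
};

/* Hypothetical stand-in for the VMA fields the discussion cares about. */
struct model_vma {
	const struct model_ops *ops;	/* models vma_uffd_ops(vma) */
	const void *file;		/* models vma->vm_file */
	unsigned long flags;		/* models vma->vm_flags */
};

/* Context recorded before the locks are dropped. */
struct model_snapshot {
	const struct model_ops *ops;
	const void *file;
	unsigned long flags;
};

static void model_snapshot_take(struct model_snapshot *snap,
				const struct model_vma *vma)
{
	snap->ops = vma->ops;
	snap->file = vma->file;
	snap->flags = vma->flags;
}

/*
 * Called after the locks are re-acquired. Any difference, even a harmless
 * one, is reported as -EAGAIN so that the caller simply starts over.
 */
static int model_snapshot_check(const struct model_snapshot *snap,
				const struct model_vma *vma)
{
	if (snap->ops != vma->ops ||
	    snap->file != vma->file ||
	    snap->flags != vma->flags)
		return -EAGAIN;
	return 0;
}
```

The model deliberately compares everything it recorded rather than only
the ops pointer, which is the "generic" behavior argued for above.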
>
> This is still a footgun, but I don't see it as a big deal.
IIUC this is a real bug that was reported. Actually, if my understanding is
correct, we should be able to easily write a reproducer by also registering
the src addr of UFFDIO_COPY with userfaultfd; then the ioctl(UFFDIO_COPY)
thread will block while faulting in the src_addr. Meanwhile, we can change
the VMA layout from another thread to test whatever setup we want.
> Let's revisit it after -rc1 and please make sure to cc "MEMORY MAPPING"
> folks for insights about how to better track VMA changes or their absence.
No strong feeling here if we want to slightly postpone this fix. It does
not look easy to trigger, given that the bug seems to have been present for
a while. It's just that, if my understanding is correct, with the above
reproducer we can easily crash the kernel without a proper fix.
>
> > Then mfill_copy_folio_retry() drops all locks for
> > the copy_from_user retry. After mfill_get_vma() re-acquires them,
> > state->vma may now point to a replacement VMA, but ops is still the
> > stale pointer from before the drop.
>
> And this is a bug in my uffd refactoring, and it needs to be fixed ASAP
> with a simple comparison of old ops and new ops.
>
> > > Second, I have reservations about vma_snapshot implementation. What
> > > invariant does it exactly enforce?
> >
> > The invariant I was going for: "the folio we allocated is still
> > compatible with the VMA we're about to install it into." Since
> > alloc_folio() allocates from the VMA's backing file (inode), checking
> > that vm_file is still the same after re-acquiring locks ensures the
> > folio matches the inode.
>
> Again, it's not that folio matches the inode, but folio is allocated using
> the correct mempolicy.
>
> > The vm_flags comparison was a secondary guard against permission/type
> > changes during the window.
>
> Permissions should be fine, they are checked in userfaultfd_register.
> Some other flags that don't matter to uffd operation may change during the
> window, though and then a comparison of vm_flags will give a false
> positive.
IMHO a false positive is fine in this case when -EAGAIN is used (which I
still think we should do), since it only causes a retry.
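As a rough illustration of why a spurious -EAGAIN is cheap, below is a
small userspace sketch of a bounded retry loop. The callback type and both
function names are hypothetical; in a real program the callback would wrap
the ioctl(UFFDIO_COPY) call and translate its result into 0 or a negative
errno.

```c
#include <errno.h>

/* Hypothetical callback type: returns 0 on success or a negative errno. */
typedef int (*copy_attempt_fn)(void *ctx);

/* Demo callback: reports a spurious -EAGAIN until *remaining reaches 0. */
static int flaky_copy(void *ctx)
{
	int *remaining = ctx;

	if (*remaining > 0) {
		(*remaining)--;
		return -EAGAIN;
	}
	return 0;
}

/*
 * Retry only on -EAGAIN, and give up after max_tries attempts so that a
 * persistent failure cannot spin forever. Any other result is returned
 * to the caller unchanged.
 */
static int copy_with_retry(copy_attempt_fn attempt, void *ctx, int max_tries)
{
	int ret = -EAGAIN;

	for (int i = 0; i < max_tries && ret == -EAGAIN; i++)
		ret = attempt(ctx);
	return ret;
}
```

With this shape, a false positive from the vm_flags comparison costs one
extra pass through the loop and nothing else.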
Thanks,
--
Peter Xu