Re: [PATCH 1/2] mm/userfaultfd: fix stale ops and VMA type mismatch after copy retry

From: Peter Xu

Date: Mon Mar 30 2026 - 15:42:10 EST


Hi, David,

On Sat, Mar 28, 2026 at 05:01:00PM +0000, David Carlier wrote:
> In mfill_atomic_pte_copy(), ops is derived from the VMA once and passed
> to __mfill_atomic_pte(). When the initial copy_from_user() fails under
> pagefault_disable(), mfill_copy_folio_retry() drops all locks, performs
> the copy with page faults enabled, then re-acquires locks via
> mfill_get_vma(). During this window, the VMA can be replaced entirely
> (e.g. munmap + mmap + UFFDIO_REGISTER by another thread), but ops is
> never re-validated.

Thanks for the report, this indeed looks like a bug.

>
> If a shared shmem VMA is replaced by an anonymous VMA, the stale
> shmem_uffd_ops->filemap_add calls shmem_mfill_filemap_add() with an
> anonymous VMA, causing a NULL pointer dereference at file_inode(vma->
> vm_file) since vm_file is NULL for anonymous mappings.
>
> The mmap_changing guard does not fully prevent this because
> userfaultfd_unmap_prep() only increments mmap_changing when
> UFFD_FEATURE_EVENT_UNMAP is enabled, which is optional. Without it,
> munmap proceeds without any signal to the retry path.
>
> The copy_from_user() in the retry runs with page faults enabled and can
> block on slow backing stores (FUSE, NFS), significantly widening the
> race window.
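
Just to make sure I follow, IIUC the problematic interleaving is:

  Thread A (UFFDIO_COPY)                Thread B
  ----------------------                --------
  mfill_atomic_pte_copy()
    ops = vma_uffd_ops(dst_vma)         /* shmem ops frozen */
    copy_from_user() fails (faults off)
    mfill_copy_folio_retry()
      mfill_put_vma()                   /* all locks dropped */
                                        munmap(dst, len)
                                        mmap(dst, ..., MAP_ANONYMOUS, ...)
                                        ioctl(uffd, UFFDIO_REGISTER, ...)
      copy_from_user()                  /* may block on FUSE/NFS */
      mfill_get_vma()                   /* new anon VMA; mmap_changing
                                           not bumped w/o EVENT_UNMAP */
  __mfill_atomic_pte() with stale ops
    -> shmem_mfill_filemap_add()
       -> file_inode(vma->vm_file)      /* vm_file == NULL, oops */
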
>
> Fix this by:
> - Validating that the VMA's userfaultfd context matches state->ctx in
> mfill_get_vma() to detect cross-context VMA replacement.
> - Re-checking that vma_uffd_ops() still matches the frozen ops after
> the retry, and that the VMA is still VM_SHARED when ops expects it
> to be, returning -EAGAIN otherwise.
>
> Signed-off-by: David Carlier <devnexen@xxxxxxxxx>
> ---
> mm/userfaultfd.c | 9 ++++++++-
> 1 file changed, 8 insertions(+), 1 deletion(-)
>
> diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
> index 481ec7eb4442..2a6e034b15aa 100644
> --- a/mm/userfaultfd.c
> +++ b/mm/userfaultfd.c
> @@ -225,8 +225,9 @@ static int mfill_get_vma(struct mfill_state *state)
> */
> down_read(&ctx->map_changing_lock);
> state->vma = dst_vma;
> +
> err = -EAGAIN;
> - if (atomic_read(&ctx->mmap_changing))
> + if (dst_vma->vm_userfaultfd_ctx.ctx != ctx || atomic_read(&ctx->mmap_changing))

This is a valid and good check. Though IMHO it doesn't need to be in the
same patch as the fix for the reported bug; it likely suits a separate
patch.

For example, a new VMA (created as in the scenario you described for when
the bug can happen) can also be registered with the same userfaultfd ctx,
and this check won't help there, since the ctx will still match.
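
Concretely, a sequence like this would still pass the new check:

  Thread A (UFFDIO_COPY)                Thread B
  ----------------------                --------
  mfill_copy_folio_retry()
    mfill_put_vma()
                                        munmap(dst, len)
                                        mmap(dst, ...)        /* new VMA */
                                        ioctl(uffd, UFFDIO_REGISTER, ...)
                                        /* registers the *same* ctx */
    copy_from_user()
    mfill_get_vma()
      /* vm_userfaultfd_ctx.ctx == ctx, check passes */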

> goto out_unlock;
>
> err = -EINVAL;
> @@ -498,6 +499,12 @@ static int __mfill_atomic_pte(struct mfill_state *state,
> ret = mfill_copy_folio_retry(state, folio);
> if (ret)
> goto err_folio_put;
> + if (vma_uffd_ops(state->vma) != ops ||
> + (ops != &anon_uffd_ops &&
> + !(state->vma->vm_flags & VM_SHARED))) {

Hard-coding this here (outside of mfill_atomic_pte_copy) is very likely
not a good idea. Meanwhile, I also feel like it won't completely fix the
problem.

Consider the case where the changed VMA is not a shmem VMA becoming an
anon VMA, but shmem VMA1 becoming shmem VMA2, where the two can even have
the same flags. IMHO in that case we should still fall back with -EAGAIN,
because we still hold a folio here that was allocated for VMA1 (rather
than VMA2).

Ideally, we should check "whether the VMA has changed at all", which would
be a very safe check; however, I don't think we have any good way to
refcount a VMA... at least not one that I'm aware of.

A simple workaround here is to take a snapshot of the VMA attributes and
make sure they didn't change across the retried copy_from_user(), assuming
that guarantees the VMA didn't change. AFAIU, the important bits are at
least these two: (1) the inode representing the backing store of the VMA
mapping (when it's file-backed), and (2) the vma flags.

The patch below does something like what I discussed above; it's the best
I can come up with so far. Not sure if there's an even better way to do
it.

Thanks,

===8<===
diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
index 481ec7eb44420..caac1afabd520 100644
--- a/mm/userfaultfd.c
+++ b/mm/userfaultfd.c
@@ -443,33 +443,90 @@ static int mfill_copy_folio_locked(struct folio *folio, unsigned long src_addr)
return ret;
}

+struct vma_snapshot {
+ struct inode *inode;
+ vm_flags_t flags;
+};
+
+static void vma_snapshot_take(struct vm_area_struct *vma,
+ struct vma_snapshot *s)
+{
+ struct inode *inode;
+
+ s->flags = vma->vm_flags;
+ if (vma->vm_file) {
+ /* We're holding the vma lock, so file and inode are available */
+ inode = vma->vm_file->f_inode;
+ ihold(inode);
+ s->inode = inode;
+ } else {
+ s->inode = NULL;
+ }
+}
+
+static bool vma_snapshot_changed(struct vm_area_struct *vma,
+ struct vma_snapshot *s)
+{
+ /* Have the vma flags changed? */
+ if (s->flags != vma->vm_flags)
+ return true;
+
+ /* If the mapping had a backing store, make sure it didn't change */
+ if (s->inode && (!vma->vm_file || vma->vm_file->f_inode != s->inode))
+ return true;
+
+ /* If not, make sure it's still anonymous */
+ if (!s->inode && !vma_is_anonymous(vma))
+ return true;
+
+ return false;
+}
+
+static void vma_snapshot_release(struct vma_snapshot *s)
+{
+ if (s->inode) {
+ iput(s->inode);
+ s->inode = NULL;
+ }
+}
+
static int mfill_copy_folio_retry(struct mfill_state *state, struct folio *folio)
{
unsigned long src_addr = state->src_addr;
+ struct vma_snapshot s;
void *kaddr;
int err;

+ /* Take a quick snapshot of the current vma */
+ vma_snapshot_take(state->vma, &s);
+
/* retry copying with mm_lock dropped */
mfill_put_vma(state);

kaddr = kmap_local_folio(folio, 0);
err = copy_from_user(kaddr, (const void __user *) src_addr, PAGE_SIZE);
kunmap_local(kaddr);
- if (unlikely(err))
- return -EFAULT;
+ if (unlikely(err)) {
+ err = -EFAULT;
+ goto out;
+ }

flush_dcache_folio(folio);

/* reget VMA and PMD, they could change underneath us */
err = mfill_get_vma(state);
if (err)
- return err;
+ goto out;

- err = mfill_establish_pmd(state);
- if (err)
- return err;
+ if (vma_snapshot_changed(state->vma, &s)) {
+ err = -EAGAIN;
+ goto out;
+ }

- return 0;
+ err = mfill_establish_pmd(state);
+out:
+ vma_snapshot_release(&s);
+ return err;
}

static int __mfill_atomic_pte(struct mfill_state *state,
--
2.50.1


--
Peter Xu