Re: [PATCH v2 4/15] userfaultfd: introduce mfill_get_vma() and mfill_put_vma()

From: Deepanshu Kartikey

Date: Mon Mar 16 2026 - 04:06:14 EST


On Mon, Mar 16, 2026 at 1:19 PM Harry Yoo <harry.yoo@xxxxxxxxxx> wrote:
>
> > It seems there's another attempt to fix the syzbot report from
> > Deepanshu Kartikey [2], which I didn't take a deeper look.
> >
> > At first look [2] looks a bit wrong way to fix to me though,
> > because it allows operating only on a single VMA nothing should really split
> > or shrink the VMA if somebody is holding the VMA lock in read mode
> > (and the validation of the range is done while holding the lock).
> >
> > [2] https://lore.kernel.org/linux-mm/20260316070039.549506-1-kartikey406@xxxxxxxxx
> >

Harry,

You are correct that once vm_refcnt > 0, nobody can split the VMA.
However the split can happen in the race window BEFORE vm_refcnt++
in vma_start_read(), and CHECK 2 can miss this if mmap_write_unlock()
completes before CHECK 2 runs.

Here is the exact race:

vma_start_read():

    /* CHECK 1 */
    if (READ_ONCE(vma->vm_lock_seq) == READ_ONCE(mm->mm_lock_seq.sequence))
        goto err;

    /*
     * RACE WINDOW: vm_refcnt is still 0 here!
     * UFFDIO_UNREGISTER can run:
     *
     *   mmap_write_lock()     -> mm_lock_seq = 11
     *   vma_start_write(vma)  -> vm_lock_seq = 11
     *   __split_vma()         -> vma->vm_end = 0x4ca000
     *   mmap_write_unlock()   -> mm_lock_seq = 12
     *
     * writer completes entirely before vm_refcnt++!
     */

    __refcount_inc_not_zero_limited_acquire(&vma->vm_refcnt, ...);
    /* vm_refcnt = 1 now, but vma->vm_end already modified! */

    /* CHECK 2 */
    if (unlikely(vma->vm_lock_seq == raw_read_seqcount(&mm->mm_lock_seq)))
        /*
         * vm_lock_seq(11) == mm_lock_seq(12)?
         * NO! writer already finished and unlocked!
         * mm_lock_seq incremented to 12 (even=unlocked)
         * CHECK 2 MISSES the race!
         */

    return vma;
    /*
     * returns split vma with vm_end=0x4ca000
     * but vm_refcnt=1 (lock held)
     */
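
To make the interleaving above easier to replay, here is a small
single-threaded userspace model of it. It is only a sketch of the
sequence described in this mail, not kernel code: struct model_mm,
struct model_vma, model_reader() and the starting values (mm_lock_seq = 10,
vm_lock_seq = 0, the pre-split vm_end) are all made up for illustration,
and the real vma_start_read() does more than this.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the fields named in this mail. */
struct model_mm {
	unsigned int mm_lock_seq;
};

struct model_vma {
	unsigned int vm_lock_seq;
	unsigned int vm_refcnt;
	unsigned long vm_end;
};

/* Both checks compare the VMA's sequence with the mm's sequence. */
static bool seq_says_write_locked(const struct model_vma *vma,
				  const struct model_mm *mm)
{
	return vma->vm_lock_seq == mm->mm_lock_seq;
}

/* The writer side as listed in the race window, run start to finish. */
static void model_writer_split(struct model_mm *mm, struct model_vma *vma,
			       unsigned long new_end)
{
	mm->mm_lock_seq++;                   /* mmap_write_lock()    */
	vma->vm_lock_seq = mm->mm_lock_seq;  /* vma_start_write(vma) */
	vma->vm_end = new_end;               /* __split_vma()        */
	mm->mm_lock_seq++;                   /* mmap_write_unlock()  */
}

/*
 * The reader side. If writer_in_window is true, the whole writer runs
 * between CHECK 1 and the refcount increment, as in the scenario above.
 * Returns true if the reader accepts the VMA.
 */
static bool model_reader(struct model_mm *mm, struct model_vma *vma,
			 bool writer_in_window, unsigned long new_end)
{
	if (seq_says_write_locked(vma, mm))      /* CHECK 1 */
		return false;

	if (writer_in_window)
		model_writer_split(mm, vma, new_end);

	vma->vm_refcnt++;

	if (seq_says_write_locked(vma, mm)) {    /* CHECK 2 */
		vma->vm_refcnt--;
		return false;
	}
	return true;
}
```

Starting from mm_lock_seq = 10 and calling
model_reader(&mm, &vma, true, 0x4ca000), CHECK 2 compares 11 with 12, so
the reader accepts a VMA whose vm_end has already been changed.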

Now the mfill_atomic() loop runs with the split VMA:

    while (state.src_addr < src_start + len) {

        /*
         * iteration 1 to N: dst_addr = 0x1b1000 to 0x4c9000
         * all within vma->vm_end(0x4ca000)
         */

        /* iteration N+1: dst_addr = 0x4ca000 */
        err = mfill_atomic_pte(&state);
            mfill_atomic_install_pte(state->vma, dst_addr=0x4ca000)
                folio_add_new_anon_rmap(vma, 0x4ca000)
                    VM_WARN_ON_ONCE(address < vma->vm_start ||
                                    address + (nr << 12) > vma->vm_end);
                    /* 0x4ca000 + (nr << 12) > vma->vm_end(0x4ca000) -> WARN! */
    }
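
The arithmetic behind that warning can be checked in isolation. The
helper below is a minimal sketch of the condition quoted above;
rmap_range_outside_vma() and MODEL_PAGE_SHIFT are hypothetical names for
this example, and vm_start = 0x1b1000 is an assumed value (this mail only
gives the first dst_addr, not the VMA start).

```c
#include <stdbool.h>

#define MODEL_PAGE_SHIFT 12	/* matches the "nr << 12" in the check above */

/*
 * True when [address, address + nr pages) is not fully inside
 * [vm_start, vm_end), i.e. when the quoted condition would warn.
 */
static bool rmap_range_outside_vma(unsigned long address, unsigned long nr,
				   unsigned long vm_start, unsigned long vm_end)
{
	return address < vm_start ||
	       address + (nr << MODEL_PAGE_SHIFT) > vm_end;
}
```

With vm_end = 0x4ca000 and nr = 1, address 0x4c9000 is still inside the
VMA, while address 0x4ca000 is the first one that trips the condition.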

Without my fix:
    VM_WARN_ON_ONCE() triggers in folio_add_new_anon_rmap()

With my fix:
    if (state.dst_addr < state.vma->vm_start ||
        state.dst_addr >= state.vma->vm_end) {
        mfill_put_vma(&state);
        state.dst_start = state.dst_addr;
        state.len = dst_start + len - state.dst_addr;
        err = mfill_get_vma(&state);
        if (err)
            break;
    }
    /* catches split, re-lookups correct VMA safely */
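
The effect of that check on the loop can be sketched in userspace as
follows. This is an illustration under assumptions, not the patch
itself: struct model_range, model_find_range() and model_fill() are
hypothetical, a plain array lookup stands in for
mfill_put_vma()/mfill_get_vma(), and the second range's end address is
made up.

```c
#include <stddef.h>

#define MODEL_PAGE_SIZE 0x1000UL

/* Hypothetical stand-in for a VMA: covers [start, end). */
struct model_range {
	unsigned long start;
	unsigned long end;
};

/* Stand-in for the VMA lookup: find the range containing addr. */
static const struct model_range *
model_find_range(const struct model_range *r, size_t n, unsigned long addr)
{
	for (size_t i = 0; i < n; i++)
		if (addr >= r[i].start && addr < r[i].end)
			return &r[i];
	return NULL;
}

/*
 * Walk [dst_start, dst_start + len) one page at a time. Before touching
 * a page, re-check that it is inside the current range and look the
 * range up again if it is not. Returns the number of pages handled and
 * stores the number of re-lookups in *relookups.
 */
static size_t model_fill(const struct model_range *r, size_t n,
			 unsigned long dst_start, unsigned long len,
			 size_t *relookups)
{
	const struct model_range *cur = model_find_range(r, n, dst_start);
	size_t filled = 0;

	*relookups = 0;
	for (unsigned long addr = dst_start;
	     cur && addr < dst_start + len; addr += MODEL_PAGE_SIZE) {
		if (addr < cur->start || addr >= cur->end) {
			(*relookups)++;
			cur = model_find_range(r, n, addr);
			if (!cur)
				break;	/* lookup failed: stop, like "if (err) break" */
		}
		filled++;
	}
	return filled;
}
```

With ranges [0x1b1000, 0x4ca000) and [0x4ca000, 0x600000), a fill that
starts at 0x1b1000 and crosses 0x4ca000 does exactly one re-lookup and
keeps going. If the second range is absent, the fill stops at 0x4ca000
instead of touching an address outside the first range.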

So both fixes are needed:

Harry's fix (state.len):
    Fixes the uninitialized state.len, so that the first
    mfill_get_vma() call, before the loop, validates the
    correct range.

My fix (bounds check):
    Catches a split VMA that slipped through CHECK 2 and is
    then used inside the loop, because the writer finished
    before CHECK 2 ran.