Re: maple tree change made it possible for VMA iteration to see same VMA twice due to late vma_merge() failure
From: Jann Horn
Date: Wed Aug 16 2023 - 13:14:00 EST
On Wed, Aug 16, 2023 at 6:18 PM Liam R. Howlett <Liam.Howlett@xxxxxxxxxx> wrote:
> * Jann Horn <jannh@xxxxxxxxxx> [230815 15:37]:
> > commit 18b098af2890 ("vma_merge: set vma iterator to correct
> > position.") added a vma_prev(vmi) call to vma_merge() at a point where
> > it's still possible to bail out. My understanding is that this moves
> > the VMA iterator back by one VMA.
> >
> > If you patch some extra logging into the kernel and inject a fake
> > out-of-memory error at the vma_iter_prealloc() call in vma_merge() (a
> > real out-of-memory error there is very unlikely to happen in practice,
> > I think - my understanding is that the kernel will basically kill
> > every process on the system except for init before it starts failing
> > GFP_KERNEL allocations that fit within a single slab, unless the
> > allocation uses GFP_ACCOUNT or stuff like that, which the maple tree
> > doesn't):
[...]
> > then you'll get this fun log output, showing that the same VMA
> > (ffff88810c0b5e00) was visited by two iterations of the VMA iteration
> > loop, and on the second iteration, prev==vma:
> >
> > [ 326.765586] userfaultfd_register: begin vma iteration
> > [ 326.766985] userfaultfd_register: prev=ffff88810c0b5ef0, vma=ffff88810c0b5e00 (0000000000101000-0000000000102000)
> > [ 326.768786] userfaultfd_register: vma_merge returned 0000000000000000
> > [ 326.769898] userfaultfd_register: prev=ffff88810c0b5e00, vma=ffff88810c0b5e00 (0000000000101000-0000000000102000)
> >
> > I don't know if this can lead to anything bad but it seems pretty
> > clearly unintended?
>
> Yes, unintended.
>
> So we are running out of memory, but since vma_merge() doesn't
> differentiate between failure and 'nothing to merge', the caller
> carries on with a moved iterator and revisits the same VMA.
>
> I've been thinking about a way to work this into the interface and I
> don't see a clean way because we (could) do different things before the
> call depending on the situation.
>
> I think we need to undo any vma iterator changes in the failure
> scenarios if there is a chance of the iterator continuing to be used,
> which is probably not limited to just this case.
I don't fully understand the maple tree interface - in the specific
case of vma_merge(), could you move the vma_prev() call down below the
point of no return, after vma_iter_prealloc()? Or does
vma_iter_prealloc() require that the iterator is already in the insert
position?
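
To make the ordering question concrete, here is a toy userspace model of
the two orderings. It is not kernel code: every name in it (toy_iter,
toy_prealloc, toy_merge_rewind_early, toy_merge_rewind_late, toy_walk) is
made up for illustration, the "iterator" is just an array index, and it
assumes that stepping back moves the iterator by exactly one element.
Whether the real vma_iter_prealloc() can run before the iterator is
repositioned is exactly the open question above, so this is only a sketch
of the idea.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: none of these names are kernel APIs. */
struct toy_iter {
    size_t next;      /* index the next loop iteration will visit */
    int fail_budget;  /* number of injected preallocation failures left */
};

/* Stand-in for a fallible preallocation; fails while fail_budget > 0. */
static bool toy_prealloc(struct toy_iter *it)
{
    if (it->fail_budget > 0) {
        it->fail_budget--;
        return false;
    }
    return true;
}

/* Ordering from the report: step back first, then try to allocate. */
bool toy_merge_rewind_early(struct toy_iter *it)
{
    it->next--;              /* iterator moved before the point of no return */
    if (!toy_prealloc(it))
        return false;        /* bails out with the iterator still moved */
    it->next++;              /* success: iterator ends up past the element again */
    return true;
}

/* Suggested ordering: allocate first, step back only after that succeeded. */
bool toy_merge_rewind_late(struct toy_iter *it)
{
    if (!toy_prealloc(it))
        return false;        /* iterator untouched on failure */
    it->next--;              /* past the point of no return */
    it->next++;              /* success: iterator ends up past the element again */
    return true;
}

/*
 * Walk nr elements and call merge() on each, with one failure injected.
 * Visited indices are stored in out[] (capacity cap); returns the count.
 */
size_t toy_walk(bool (*merge)(struct toy_iter *), size_t nr,
                size_t *out, size_t cap)
{
    struct toy_iter it = { .next = 0, .fail_budget = 1 };
    size_t n = 0;

    while (it.next < nr && n < cap) {
        out[n++] = it.next++;  /* visit the current element, then advance */
        merge(&it);            /* result ignored: failure looks like "nothing to merge" */
    }
    return n;
}
```

Walking three elements with toy_merge_rewind_early visits index 0 twice
(0, 0, 1, 2), which matches the shape of the log above; with
toy_merge_rewind_late the walk visits 0, 1, 2 even though the same
failure is injected.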
> I will audit these areas and CC you on the result.
Thanks!