Re: [PATCH v2] mm: userfaultfd: avoid passing an invalid range to vma_merge()

From: Peter Xu
Date: Tue May 16 2023 - 11:07:44 EST


On Tue, May 16, 2023 at 12:07:11AM +0100, Lorenzo Stoakes wrote:
> On Mon, May 15, 2023 at 11:04:27PM +0100, Lorenzo Stoakes wrote:
> [snip]
> > > Could you explain a bit why we don't need to merge in this case?
> > >
> > > I'm considering, for example, when we have:
> > >
> > > vma1(range 0-9, with uffd), vma2(range 10-19, no uffd)
> > >
> > > Then someone unregisters uffd on range (5-9), iiuc it should become:
> > >
> > > vma1(range 0-4, with uffd), vma2(range 5-19, no uffd)
> > >
> > > But if no merge here it's:
> > >
> > > vma1(range 0-4, with uffd), vma3(range 5-9, no uffd), vma2(range 10-19, no uffd)
> > >
> > > Maybe I missed something?
> > >
> >
> > There's something really, really wrong with this. It simply isn't valid to
> > invoke vma_merge() over an existing VMA that != prev where you're not
> > specifying addr = vma->vm_start, end == vma->vm_end.
> >
> > This seems like you're relying on:-
> >
> > ***
> > CCCCCNNNNN -> CCNNNNNNNN

I had a closer look today, I still think this patch is not really the right
one. The split/merge order is something we use everywhere and I am not
convinced it must change as drastic. At least so far it still seems to me
if we do with what current patch proposed we can have vma fragmentations.

I think I see what you meant, but here I think it's a legal case where we
should have PPPP rather than CCCC (PPPPPNNNN --> PPNNNNNNNN).

To be explicit, for register I think it _should_ be the case 0 where we
cannot merge (note: the current code is indeed wrong though, see below):

****
PPPPPPNNNNNN
cannot merge

While for the unregister case here it's case 4:

****
PPPPPPNNNNNN
might become
PPNNNNNNNNNN
case 4 below

Here the problem is not that we need to do split then merge (I think it'll
have the problem of vma defragmentation here), the problem is we simply
passed in the wrong "prev" vma pointer, IMHO. I've patches attached
showing what I meant.

I checked the original commit from Andrea and I found that it _was_ correct:

commit 86039bd3b4e6a1129318cbfed4e0a6e001656635
Author: Andrea Arcangeli <aarcange@xxxxxxxxxx>
Date: Fri Sep 4 15:46:31 2015 -0700

userfaultfd: add new syscall to provide memory externalization

I had a feeling that it's broken during the recent rework on vma (or maybe
even not that close), but I'm not yet sure and need to further check.

> >
> > By specifying parameters that are compatible with N even though you're only
> > partially spanning C?
> >
> > This is crazy, and isn't how this should be used. vma_merge() is not
> > supposed to do partial merges. If it works (presumably it does) this is not
> > by design unless I've lost my mind and I (and others) have somehow not
> > noticed this??
> >
> > I think you're right that now we'll end up with more fragmentation, but
> > what you're suggesting is not how vma_merge() is supposed to work.
> >
> > As I said above, giving vma_merge() invalid parameters is very dangerous as
> > you could end up merging over empty ranges in theory (and could otherwise
> > have corruption).
> >
> > I guess we should probably be passing 0 to the last parameter in
> > split_vma() here then to ensure we do a merge pass too. Will experiment
> > with this.
> >
> > I'm confused as to how the remove from case 8 is not proceeding. I'll look
> > into this some more...
> >
> > Happy to be corrected if I'm misconstruing this!
> >
>
> OK, so I wrote a small program to do perform exactly this case [0] and it seems
> that the outcome is the same before and after this patch - vma_merge() is
> clearly rejecting the case 8 merge (phew!) and in both instances you end up with
> 3 VMAs.
>
> So this patch doesn't change this behaviour and everything is as it was
> before. Ideally we'd let it go for another pass, so maybe we should change the
> split to add a new VMA _afterwards_. Will experiment with that, separately.
>
> But looks like the patch is good as it is.
>
> (if you notice something wrong with the repro, etc. do let me know!)
>
> [0]: https://gist.github.com/lorenzo-stoakes/a11a10f5f479e7a977fc456331266e0e

I think your test case is fine, but... no, this is still not expected. The
vma should just merge.

So I have another closer look on this specific issue, it seems we have a
long standing bug on pgoff calculation here when passing that to
vma_merge(). I've got another patch attached to show what I meant.

To summarize.. now I've got two patches attached:

Patch 1 is something I'd like to propose to replace this patch that fixes
incorrect use of vma_merge() so it should also eliminate the assertion
being triggered (I still think this is a regression but I need to check
which I will do later; I'm not super familiar with maple tree work, maybe
you or Liam can quickly spot).

Patch 2 fixes the long standing issue of vma not being able to merge in
above case, and with patch 2 applied it should start merging all right.

Please have a look, thanks.

---8<---