Re: WARN_ON in move_normal_pmd

From: Linus Torvalds
Date: Fri Mar 24 2023 - 19:38:34 EST


On Fri, Mar 24, 2023 at 6:43 AM Joel Fernandes <joel@xxxxxxxxxxxxxxxxx> wrote:
>
> Wouldn't it be better to instead fix it from the caller side? Like
> making it non-overlapping.

I wonder if we could just do something like this in mremap() instead

- if old/new are mutually PMD_ALIGNED

- *and* there is no vma below new within the same PMD

- then just expand the mremap to be PMD-aligned downwards

IOW, the problem with the exec stack moving case isn't really that
it's overlapping: that part is fine. We're moving downwards, and we
start from the bottom, so the moving part works fine.

No, the problem is that we *start* by moving individual pages, and
then by the time we've a few pages down by a whole PMD, we finish the
source PMD (and we've cleared all the contents of it), but it still
exists.

And at *that* point, when we go and start copying the next page, we're
suddenly fully PMD-aligned, and now we try to copy a whole PMD, and
then that code is unhappy about the fact that the old (empty) PMD is
there in the target.

And for all of this to happen, we need to move things by an exact
multiple of PMD size, because otherwise we'd never get to that aligned
situation at all, and we'd always do all the movement in individual
pages, and everything would be just fine.

And more importantly, if we had just *started* with moving a whole
PMD, this also wouldn't have happened. But we didn't. We started
moving individual pages.

So you could see the warning not as a "this range overlaps" warning
(it's fine, and happens all the time, and we do individual pages that
way quite happily), but really as a "hey, this was very inefficient -
you shouldn't have done those individual pages as several small
independent invidual pages in the first place" warning.

So some kind of

/* Is the movement mutually PMD-aligned? */
if ((old_addr ^ new_addr) & ~PMD_MASK == 0) {
.. try to extend the move_vma() down to the *aligned*
PMD case ..
}

logic in move_page_tables() would get rid of the warning, and would
make the move more efficient since you'd skip the "move individual
pages and allocate a new PMD" case entirely.

This is all fairly com,plicated, and the "try to extend the move
range" would also have to depend on CONFIG_HAVE_MOVE_PMD etc, so I'm
not saying it's trivial.

But it would seem to be a really nice optimization, in addition to
getting rid of the warning.

It could even help real world cases outside of this odd stack
remapping case if users ever end up moving vma's by multiples of
PMD_SIZE, and there aren't other vma's around the source/target that
disable the optimization.

Hmm? Anybody want to look into that? It looks hairy enough that I
think that "you could test this with mutually aligned mremap()
source/targets in some test program" would be a good thing. Because
the pure execve() case is rare enough that using *that* as a test-case
seems like a fool's errand.

(To make things very clear: the important part is that the source and
targets aren't *actually* PMD-aligned, just mutually aligned so that
you *can* do the mremap() by just moving whole PMD's around)

Linus