Re: WARNING: at mm/mremap.c:211 move_page_tables in i386

From: Joel Fernandes
Date: Sat Jul 11 2020 - 19:33:49 EST


On Sat, Jul 11, 2020 at 11:12:58AM -0700, Linus Torvalds wrote:
> On Sat, Jul 11, 2020 at 10:27 AM Naresh Kamboju
> <naresh.kamboju@xxxxxxxxxx> wrote:
> >
> > I have started bisecting this problem and found the first bad commit
>
> Thanks for the effort. Bisection is often a great tool to figure out
> what's wrong.
>
> Sadly, in this case:
>
> > commit 9f132f7e145506efc0744426cb338b18a54afc3b
> > Author: Joel Fernandes (Google) <joel@xxxxxxxxxxxxxxxxx>
> > Date: Thu Jan 3 15:28:41 2019 -0800
> >
> > mm: select HAVE_MOVE_PMD on x86 for faster mremap
>
> Yeah, that's just the commit that enables the code, not the commit
> that introduces the fundamental problem.
>
> That said, this is a prime example of why I absolutely detest patch
> series that do this kind of thing, and are several patches that create
> new functionality, followed by one patch to enable it.
>
> If you can't get things working incrementally, maybe you shouldn't do
> them at all. Doing a big series of "hidden work" and then enabling it
> later is wrong.
>
> In this case, though, the real patch that did the code isn't that kind
> of "big series of hidden work" patch series, it's just the (single)
> previous commit 2c91bd4a4e2e ("mm: speed up mremap by 20x on large
> regions").
>
> So your bisection is useful, it's just that it really points to that
> previous commit, and it's where this code was introduced.

Right, I think I should have squashed the enabling of the config, and the
introduction of the feature in the same patch, but as you pointed that
probably would not have made a difference with this bisect since this a
single patch.

> It's also worth noting that that commit doesn't really *break*
> anything, since it just falls back to the old behavior when it warns.

Agreed, I am also of the opinion that the patch is likely surface an existing
issue and not introducing a new one.

> So to "fix" your test-case, we could just remove the WARN_ON.
>
> But the WARN_ON() is still worrisome, because the thing it tests for
> really _should_ be true.

I'll get some tracing in an emulated i386 environment going and try to figure
out exactly what is going on before the warning triggers. thanks for the other
debug hints in this thread!

thanks,

- Joel

- Joel