Re: v4.6 kernel BUG at mm/rmap.c:1101!

From: Andrea Arcangeli
Date: Tue May 24 2016 - 10:08:20 EST

On Tue, May 24, 2016 at 11:12:23AM +0300, Mika Westerberg wrote:
> Hmm, the kernel shipped with Fedora 23 has that enabled:
> lahna % grep CONFIG_DEBUG_VM /boot/config-4.4.9-300.fc23.x86_64
> # CONFIG_DEBUG_VM_RB is not set

Yes, it would have been more accurate to say "enterprise", not just

It's great to run Fedora with CONFIG_DEBUG_VM=y, and I'd recommend
keeping it that way, since it contributes to stronger runtime
validation of the VM invariants.

I keep CONFIG_DEBUG_VM=y on all my systems too of course.

Also note that the RHEL debug kernel has CONFIG_DEBUG_VM=y enabled,
but only the debug kernel.

In general, while testing new kernels with new VM modifications it's
a good idea to set CONFIG_DEBUG_VM=y, if you can afford the occasional
false positive like in this case and it's not an enterprise production
kernel. There, clearly all testing should have already happened before
the kernel became "enterprise" ready in the first place, so we can
save a few cycles.
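
To illustrate the "save a few cycles" point, here is a minimal
user-space sketch of how a debug-only check can vanish from a build
when its option is off. SKETCH_DEBUG_VM, SKETCH_VM_CHECK, probe() and
run_checks() are hypothetical names made up for this example, not
kernel symbols; the sizeof() trick is modelled on what I believe the
kernel does for disabled VM_BUG_ON, so treat it as an approximation.

```c
#include <stdio.h>

/* Counts how often the checked condition was actually evaluated. */
int evaluations;

/* Hypothetical condition with a visible side effect. */
int probe(void)
{
	evaluations++;
	return 1;
}

#ifdef SKETCH_DEBUG_VM
/* Debug build: evaluate the condition and report when it is true. */
#define SKETCH_VM_CHECK(cond)						\
	do {								\
		if (cond)						\
			fprintf(stderr, "check hit: %s\n", #cond);	\
	} while (0)
#else
/*
 * Non-debug build: the condition is still type-checked by the
 * compiler, but sizeof() never evaluates its operand, so no code
 * runs for it at runtime.
 */
#define SKETCH_VM_CHECK(cond) ((void)sizeof((long)(cond)))
#endif

void run_checks(void)
{
	SKETCH_VM_CHECK(probe());
}
```

Built without -DSKETCH_DEBUG_VM, run_checks() never calls probe();
built with it, every call evaluates the condition and pays for it.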

Lately we got VM_WARN_ON too, and I recently added this to my tree:

+#define VM_WARN_ON_PAGE(cond, page)					\
+	do {								\
+		if (unlikely(cond)) {					\
+			dump_page(page, "VM_WARN_ON_PAGE(" __stringify(cond)")");\
+			__WARN();					\
+		}							\
+	} while (0)

So we could convert some... to reduce the pain of a false positive,
but in cases like the one that triggered I'm not sure it'd be a good
idea to switch it to a WARN_ON, as a failing assert (after the patch)
may be a sign of memory corruption, and keeping going after memory
corruption can actually do more harm than good.
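
The tradeoff between the two styles can be sketched in user space
roughly as follows. Everything here is a stand-in invented for the
example: struct fake_page, fake_dump_page(), the FAKE_* macros and
check_page() are hypothetical and only mimic the shape of the kernel
macros. The warn variant reports and keeps going, the bug variant
reports and stops.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct page, for illustration only. */
struct fake_page {
	unsigned long flags;
	int refcount;
};

/* Number of warnings emitted so far. */
int warn_count;

#define FAKE_STRINGIFY_1(x) #x
#define FAKE_STRINGIFY(x) FAKE_STRINGIFY_1(x)

/* Stand-in for dump_page(): print the page state and the reason. */
void fake_dump_page(const struct fake_page *page, const char *reason)
{
	fprintf(stderr, "page flags=%#lx refcount=%d: %s\n",
		page->flags, page->refcount, reason);
}

/* Warn style: dump the page, count the warning, keep running. */
#define FAKE_VM_WARN_ON_PAGE(cond, page)				\
	do {								\
		if (cond) {						\
			fake_dump_page(page, "FAKE_VM_WARN_ON_PAGE("	\
				       FAKE_STRINGIFY(cond) ")");	\
			warn_count++;					\
		}							\
	} while (0)

/* Bug style: dump the page and stop immediately. */
#define FAKE_VM_BUG_ON_PAGE(cond, page)					\
	do {								\
		if (cond) {						\
			fake_dump_page(page, "FAKE_VM_BUG_ON_PAGE("	\
				       FAKE_STRINGIFY(cond) ")");	\
			abort();					\
		}							\
	} while (0)

/* Uses the warn variant, so the caller still gets a result back. */
int check_page(const struct fake_page *page)
{
	FAKE_VM_WARN_ON_PAGE(page->refcount < 0, page);
	return page->refcount;
}
```

With the warn variant, check_page() returns even for a page whose
state is already inconsistent, which is exactly the "keeping going"
risk described above; swapping in FAKE_VM_BUG_ON_PAGE would stop at
the first failed check instead.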

One thing to keep disabled, however, is CONFIG_DEBUG_VM_RB: that one
is expensive, and that's why it has its own separate knob, so it can
be disabled while keeping CONFIG_DEBUG_VM=y. IIRC I originally kept it
under #if 0... so I wouldn't recommend enabling VM_RB on production
(it's too much overhead), that's a nice validation but for development