Re: v4.6 kernel BUG at mm/rmap.c:1101!

From: Mika Westerberg
Date: Tue May 24 2016 - 10:53:43 EST


On Tue, May 24, 2016 at 04:08:09PM +0200, Andrea Arcangeli wrote:
> On Tue, May 24, 2016 at 11:12:23AM +0300, Mika Westerberg wrote:
> > Hmm, the kernel shipped with Fedora 23 has that enabled:
> >
> > lahna % grep CONFIG_DEBUG_VM /boot/config-4.4.9-300.fc23.x86_64
> > CONFIG_DEBUG_VM=y
> > # CONFIG_DEBUG_VM_VMACACHE is not set
> > # CONFIG_DEBUG_VM_RB is not set
>
> Yes, it would have been more accurate to say "enterprise", not just
> "production".

Fair enough.

> It's great to run Fedora with CONFIG_DEBUG_VM=y and I'd recommend
> keeping it that way, since it contributes to stronger runtime
> validation of the VM invariants.
>
> I keep CONFIG_DEBUG_VM=y on all my systems too of course.
>
> Also note that the RHEL debug kernel has CONFIG_DEBUG_VM=y enabled
> too, but only the debug kernel.
>
> In general, while testing new kernels with new VM modifications it's
> a good idea to set CONFIG_DEBUG_VM=y, if you can afford the occasional
> false positive like in this case and it's not an enterprise production
> kernel, where clearly all testing should already have happened before
> it became "enterprise" ready in the first place, so we can save a
> few cycles.
>
> Lately we got VM_WARN_ON too, and I recently added this to my tree:
>
> +#define VM_WARN_ON_PAGE(cond, page) \
> +	do { \
> +		if (unlikely(cond)) { \
> +			dump_page(page, "VM_WARN_ON_PAGE(" __stringify(cond)")");\
> +			__WARN(); \
> +		} \
> +	} while (0)
>
> So we could convert some... to reduce the pain of a false positive,
> but in cases like the one that triggered I'm not sure it'd be a good
> idea to switch it to a WARN_ON, as it may be a sign of memory
> corruption if the assert fails (after the patch), and keeping going
> after memory corruption can actually do more harm than good.
>
> One thing to keep disabled, however, is CONFIG_DEBUG_VM_RB. That one
> is expensive, which is why it has its own separate knob so it can be
> disabled while keeping CONFIG_DEBUG_VM=y. IIRC I originally kept it
> under #if 0... so I wouldn't recommend enabling VM_RB in production
> (it's too much overhead); it's a nice validation, but for development
> only.

Understood. Thanks for the thorough explanation :)
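
For anyone following along, here is a rough userspace sketch of how a
warn-and-continue check like the VM_WARN_ON_PAGE above behaves. It is
not kernel code: struct page, dump_page(), warn_count, STRINGIFY,
SKETCH_DEBUG_VM and check_page_mapcount() are all stand-ins made up for
illustration, and the warn_count bump stands in for __WARN(). It assumes
GCC or Clang for __builtin_expect.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical toggle standing in for CONFIG_DEBUG_VM=y. */
#define SKETCH_DEBUG_VM 1

/* Userspace stand-ins for the kernel helpers used by the macro. */
#define unlikely(x) __builtin_expect(!!(x), 0)
#define STRINGIFY_1(x) #x
#define STRINGIFY(x) STRINGIFY_1(x)

/* Toy page descriptor; the real struct page looks nothing like this. */
struct page {
	unsigned long flags;
	int mapcount;
};

/* Counts warnings so a caller can tell that a check fired. */
static int warn_count;

/* Print the toy page state plus the reason for the dump. */
static void dump_page(const struct page *page, const char *reason)
{
	assert(page != NULL);
	fprintf(stderr, "page:%p flags:%#lx mapcount:%d\n",
		(const void *)page, page->flags, page->mapcount);
	fprintf(stderr, "page dumped because: %s\n", reason);
}

#ifdef SKETCH_DEBUG_VM
/* Dump the page and record a warning, then keep running. */
#define VM_WARN_ON_PAGE(cond, page) \
	do { \
		if (unlikely(cond)) { \
			dump_page(page, "VM_WARN_ON_PAGE(" STRINGIFY(cond) ")"); \
			warn_count++; /* stand-in for __WARN() */ \
		} \
	} while (0)
#else
/* Check compiled out: cond is still type-checked but never evaluated. */
#define VM_WARN_ON_PAGE(cond, page) \
	do { (void)sizeof(cond); } while (0)
#endif

/* Hypothetical caller: returns 0 if the page looks sane, -1 otherwise. */
static int check_page_mapcount(const struct page *page)
{
	VM_WARN_ON_PAGE(page->mapcount < 0, page);
	return page->mapcount < 0 ? -1 : 0;
}
```

Calling check_page_mapcount() on a page with a negative mapcount dumps
the page and bumps warn_count, and execution continues. That is the
trade-off discussed above: a BUG-style assert would stop right there,
which is safer if the failure really does indicate memory corruption.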