Re: BUG_ON() in workingset_node_shadows_dec() triggers

From: Johannes Weiner
Date: Tue Oct 04 2016 - 05:32:34 EST


Hi Linus,

On Mon, Oct 03, 2016 at 09:00:55PM -0700, Linus Torvalds wrote:
> Johannes? Please make this your first priority. And in the meantime I
> will make that VM_BUG_ON() be a VM_WARN_ON_ONCE().

I'm sorry about the trouble. I'll look into where the underflow comes
from that triggers this new check, and thanks for fixing it up in the
meantime.

> And dammit, if anybody else feels that they had done "debugging
> messages with BUG_ON()", I would suggest you
>
> (a) rethink your approach to programming
>
> (b) send me patches to remove the crap entirely, or make them real
> *DEBUGGING* messages, not "kill the whole machine" messages.

Yeah, it's what CONFIG_DEBUG_VM effectively is :( There are a few
instances where it adds additional output to dmesg or a stat file, but
the vast majority of uses is VM_BUG_ON(), VM_BUG_ON_PAGE() etc., which
all mean "if anything is unexpected, stop in your tracks and drop me
into a kdump kernel." People do think of it as a development tool
rather than verbose diagnostics in production kernels, so I assume
none of the conditions protected by that config option are curated for
whether the machine could or should continue to limp along after they
trigger, or at least much less so than the bare BUG vs WARN instances.

But I agree that most, if not all of these checks could warn under a
ratelimit and be more useful for bleeding edge situations like the
fedora kernel than the binary thinking of development kernels and
production kernels as separate things with no overlap. And kernel
people can set panic_on_warn if they want quick death and/or kdump.

In the workingset code, if we detect radix tree nodes in a state in
which they shouldn't be on the shadow node LRU, we could simply warn,
abort the reclaim process and leave them off the LRU. Something like
the below patch. I hate what it does to the codeflow, though, I don't
want half of the code to be clutter that handles shouldnt-bes. But it
also feels wrong to warn about a corrupted node and then keep working
on it against who knows what state. I wonder if that's the fundamental
dilemma that keeps us sniffing that assert() glue...

diff --git a/mm/workingset.c b/mm/workingset.c
index 617475f529f4..5f07db171c03 100644
--- a/mm/workingset.c
+++ b/mm/workingset.c
@@ -418,23 +418,28 @@ static enum lru_status shadow_lru_isolate(struct list_head *item,
* no pages, so we expect to be able to remove them all and
* delete and free the empty node afterwards.
*/
- BUG_ON(!workingset_node_shadows(node));
- BUG_ON(workingset_node_pages(node));
+ if (WARN_ON_ONCE(!workingset_node_shadows(node)))
+ goto out_invalid;
+ if (WARN_ON_ONCE(workingset_node_pages(node)))
+ goto out_invalid;

for (i = 0; i < RADIX_TREE_MAP_SIZE; i++) {
if (node->slots[i]) {
- BUG_ON(!radix_tree_exceptional_entry(node->slots[i]));
+ if (WARN_ON_ONCE(!radix_tree_exceptional_entry(node->slots[i])))
+ goto out_invalid;
node->slots[i] = NULL;
workingset_node_shadows_dec(node);
- BUG_ON(!mapping->nrexceptional);
+ if (WARN_ON_ONCE(!mapping->nrexceptional))
+ goto out_invalid;
mapping->nrexceptional--;
}
}
- BUG_ON(workingset_node_shadows(node));
+ if (WARN_ON_ONCE(workingset_node_shadows(node)))
+ goto out_invalid;
inc_node_state(page_pgdat(virt_to_page(node)), WORKINGSET_NODERECLAIM);
- if (!__radix_tree_delete_node(&mapping->page_tree, node))
- BUG();
+ __radix_tree_delete_node(&mapping->page_tree, node);

+out_invalid:
spin_unlock(&mapping->tree_lock);
ret = LRU_REMOVED_RETRY;
out: