Re: [PATCH] oom_kill: add option to disable dump_stack()

From: David Rientjes
Date: Mon Oct 26 2015 - 17:38:19 EST

On Fri, 23 Oct 2015, Aristeu Rozanski wrote:

> One of the largest chunks of log messages in an OOM is from dump_stack() and in
> some cases it isn't even necessary to figure out what's going on. In
> systems with multiple tenants/containers with limited resources each,
> OOMs can be much more frequent and being able to reduce the amount of log
> output for each situation is useful.
>
> This patch adds a sysctl to allow disabling dump_stack() during an OOM while
> keeping the default to behave the same way it behaves today.
>
> Cc: Greg Thelen <gthelen@xxxxxxxxxx>
> Cc: Johannes Weiner <hannes@xxxxxxxxxxx>
> Cc: linux-mm@xxxxxxxxx
> Cc: cgroups@xxxxxxxxxxxxxxx
> Signed-off-by: Aristeu Rozanski <arozansk@xxxxxxxxxx>

There's lots of information in the oom log that is irrelevant depending on
the context in which the oom condition occurred. Removing the stack trace
would have made things like commit 9a185e5861e8 ("/proc/stat: convert to
single_open_size()") harder to fix. In that case, we were calling the oom
killer on large file reads from procfs when we could easily have
used vmalloc() instead.

When you have a memcg oom kill, the dump of the system's memory state can
usually be suppressed because the kill only occurred because a memcg
hierarchy reached its limit and has nothing to do with the exhaustion of RAM.

We already control oom output with global sysctls like vm.oom_dump_tasks
and memcg tunables like memory.oom_verbose. I'm not sure that adding more
and more tunables simply to control the oom killer output is in the best
interest of either procfs or a long-term maintainable kernel.
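
For reference, a global sysctl such as vm.oom_dump_tasks is exposed as a
file under /proc/sys, so its current value can be inspected from userspace.
The following is only a minimal sketch: the helper names sysctl_path() and
read_sysctl_int() and the root parameter are hypothetical and exist purely
for illustration. It also assumes the sysctl name has no dots inside a
single path component, which holds for the vm.* knobs.

```python
import os


def sysctl_path(name, root="/proc/sys"):
    """Map a dotted sysctl name (e.g. vm.oom_dump_tasks) to its procfs path.

    Assumption: no path component contains a dot of its own.
    """
    return os.path.join(root, *name.split("."))


def read_sysctl_int(name, root="/proc/sys"):
    """Read an integer-valued sysctl from its file under root."""
    with open(sysctl_path(name, root)) as f:
        return int(f.read().strip())


if __name__ == "__main__":
    try:
        print("vm.oom_dump_tasks =", read_sysctl_int("vm.oom_dump_tasks"))
    except OSError as exc:
        # The file may be missing or unreadable, e.g. on a non-Linux system.
        print("could not read vm.oom_dump_tasks:", exc)
```

The root parameter is there only so the helper can be pointed at a
different directory when experimenting.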

I can understand the usefulness of having a very small amount of output to
the kernel log and then enabling tunables to investigate why oom kills are
happening, but in many situations I've found that only the oom killer
output is left behind in the kernel log and the situation is no longer
ongoing, so I can't start diagnosing the problem if I don't know what
triggered it.

I think adding additional sysctls to control oom killer output is a step in
the wrong direction. I do agree with removing anything that is irrelevant in
all situations, however.