Re: [PATCH 0/2 v4] oom: capture unreclaimable slab info in oom message when kernel panic

From: Yang Shi
Date: Mon Sep 25 2017 - 17:53:09 EST




On 9/25/17 1:32 PM, Michal Hocko wrote:
On Mon 25-09-17 23:55:19, Yang Shi wrote:


On 9/25/17 7:23 AM, Michal Hocko wrote:
On Thu 21-09-17 06:38:50, Yang Shi wrote:
Recently we ran into an oom issue: the kernel panicked because there was no killable process.
The dmesg shows huge unreclaimable slabs using almost 100% of memory, but kdump did not capture a vmcore for some reason.

So it seems worthwhile to capture unreclaimable slab info in the oom message when the kernel panics, to aid troubleshooting and cover this corner case.
Since the kernel is already panicking, capturing more information is worthwhile and does not affect the normal oom killer.

With the patchset, tools/vm/slabinfo has a new option, "-U", to show unreclaimable slab only.

And oom will print all non-zero (num_objs * size != 0) unreclaimable slabs in the oom killer message.
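
The filter described above can be sketched in user space as follows. This is only an illustration, not the code from the patch: struct cache_stat, its fields and dump_unreclaimable() are hypothetical stand-ins, not the kernel's struct kmem_cache or any function in mm/.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for a slab cache (assumption, not struct kmem_cache). */
struct cache_stat {
    const char *name;
    unsigned long num_objs;   /* objects currently allocated */
    unsigned long size;       /* object size in bytes */
    bool reclaimable;         /* true if the cache is reclaimable */
};

/*
 * Print every unreclaimable cache whose footprint is non-zero
 * (num_objs * size != 0) and return how many were printed.
 */
static size_t dump_unreclaimable(const struct cache_stat *caches, size_t n,
                                 FILE *out)
{
    size_t printed = 0;

    for (size_t i = 0; i < n; i++) {
        const struct cache_stat *c = &caches[i];

        if (c->reclaimable)
            continue;
        /* Same as num_objs * size != 0, without risking overflow. */
        if (c->num_objs == 0 || c->size == 0)
            continue;

        fprintf(out, "%-20s %10luKB\n", c->name,
                c->num_objs * c->size / 1024);
        printed++;
    }
    return printed;
}
```

Skipping empty caches keeps the report short, which matters when the only output channel is a console that scrolls.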

Well, I do understand that this _might_ be useful but it also might
generate a _lot_ of output. The oom report can be quite verbose already
so is this something we want to have enabled by default?

The unreclaimable slab message will only be printed when the kernel panics (no
killable process or panic_on_oom is set). So, it will not bother normal oom.
Since the kernel is already panicking, it might be preferable to have more
information reported.

Well, this certainly depends. If you have a limited console output (e.g.
no serial console) then the additional information can easily scroll away the
potentially much more useful information from the early oom report. We
already do have a control to enable/disable tasks dumping which can be
very long as well.

We definitely can add a proc knob to control it if we want to disable the
message even when the kernel panics.

Well, I do not have a strong opinion on this. I can see cases where this
kind of information would be useful but most OOM reports I have seen
were simply user space pinned memory. Slab memory leaks are seen very
seldom. Do you think a pr_debug and slab stats for all ooms would still be
useful?

It might be. But we can use slabinfo to get all slab stats in the non-panic oom case; patch 1/2 (tools: slabinfo: add "-U" option to show unreclaimable slabs only) should cover this case.

Maybe we can set an unreclaimable slab/total mem ratio. For example, when unreclaimable slab size >= 50% of total memory size, we print out slab stats in oom? And the ratio might be adjustable in /proc.
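
A rough user-space sketch of such a ratio check, just to make the idea concrete. The function name slab_dump_wanted() and its parameters are made up for illustration; no such helper or /proc knob exists in the posted patches.

```c
#include <stdbool.h>

/*
 * Return true when unreclaimable slab pages make up at least
 * ratio_percent of total memory. All names are hypothetical.
 */
static bool slab_dump_wanted(unsigned long unreclaimable_pages,
                             unsigned long total_pages,
                             unsigned int ratio_percent)
{
    if (total_pages == 0)
        return false;

    /* Integer-only comparison: unreclaimable / total >= ratio / 100. */
    return (unsigned long long)unreclaimable_pages * 100 >=
           (unsigned long long)total_pages * ratio_percent;
}
```

With ratio_percent set to 50 this dumps slab stats only when unreclaimable slabs take half of memory or more, so ordinary ooms caused by user space memory stay as terse as they are today.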

Or just replace pr_info with pr_debug. Once oom happens, if a lot of memory is consumed by unreclaimable slabs, we can just enable the debug info and then try to reproduce.

Thanks,
Yang