On Mon 25-09-17 23:55:19, Yang Shi wrote:
On 9/25/17 7:23 AM, Michal Hocko wrote:
On Thu 21-09-17 06:38:50, Yang Shi wrote:
Recently we ran into an OOM issue: the kernel panicked because there was no killable process.
dmesg showed that huge unreclaimable slabs were using almost 100% of memory, but kdump failed to capture a vmcore for some reason.
So it seems worthwhile to include unreclaimable slab info in the OOM message when the kernel panics, to aid troubleshooting and cover this corner case.
Since the kernel is already panicking, capturing more information seems worthwhile and does not affect the normal OOM killer path.
With the patchset, tools/vm/slabinfo has a new option, "-U", to show unreclaimable slabs only.
In addition, the OOM code will print all non-zero (num_objs * size != 0) unreclaimable slabs in the OOM killer message.
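For illustration only, the filter described above (panic path only, unreclaimable caches only, num_objs * size != 0) might look roughly like the userspace sketch below. struct slab_stat, should_dump() and dump_unreclaimable() are hypothetical names invented for this example; they are not taken from the patchset or from the kernel's slab code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical, simplified stand-in for per-cache slab statistics.
 * This is NOT the kernel's struct kmem_cache.
 */
struct slab_stat {
	const char *name;
	unsigned long num_objs;	/* total objects in the cache */
	unsigned long obj_size;	/* bytes per object */
	bool reclaimable;	/* true if the cache is reclaimable */
};

/* Decide whether one cache belongs in the panic-time report. */
static bool should_dump(const struct slab_stat *s, bool panicking)
{
	if (!panicking)		/* the normal OOM killer path stays quiet */
		return false;
	if (s->reclaimable)	/* only unreclaimable slabs are of interest */
		return false;
	/* Same meaning as num_objs * size != 0, without the multiplication. */
	return s->num_objs != 0 && s->obj_size != 0;
}

/* Print every qualifying cache and return how many lines were emitted. */
static size_t dump_unreclaimable(const struct slab_stat *stats, size_t n,
				 bool panicking, FILE *out)
{
	size_t printed = 0;

	for (size_t i = 0; i < n; i++) {
		if (!should_dump(&stats[i], panicking))
			continue;
		fprintf(out, "%-24s %10luKB\n", stats[i].name,
			stats[i].num_objs * stats[i].obj_size / 1024);
		printed++;
	}
	return printed;
}
```

The return value is the number of lines emitted, which gives a rough idea of how much extra console output a given set of caches would produce.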
Well, I do understand that this _might_ be useful but it also might
generate a _lot_ of output. The oom report can be quite verbose already
so is this something we want to have enabled by default?
The unreclaimable slab message will only be printed when the kernel panics (no
killable process or panic_on_oom is set), so it will not affect the normal OOM path.
Since the kernel is already panicking, it is probably preferable to have more
information reported.
Well, this certainly depends. If you have a limited console output (e.g.
no serial console) then the additional information can easily scroll the
potentially much more useful information from the early oom report out of
view. We already have a control to enable/disable task dumping, which can
be very long as well.
We can certainly add a proc knob to control it if we want to be able to
disable the message even when the kernel panics.
Well, I do not have a strong opinion on this. I can see cases where this
kind of information would be useful but most OOM reports I have seen
were simply user space pinned memory. Slab memory leaks are seen very
seldom. Do you think a pr_debug and slab stats for all ooms would still be
useful?