Re: [PATCH 0/2 v4] oom: capture unreclaimable slab info in oom message when kernel panic

From: Michal Hocko
Date: Mon Sep 25 2017 - 16:32:43 EST


On Mon 25-09-17 23:55:19, Yang Shi wrote:
>
>
> On 9/25/17 7:23 AM, Michal Hocko wrote:
> > On Thu 21-09-17 06:38:50, Yang Shi wrote:
> > > Recently we ran into an oom issue: the kernel panicked because there was no killable process.
> > > The dmesg shows huge unreclaimable slabs using almost 100% of memory, but kdump did not capture a vmcore for some reason.
> > >
> > > So it may be better to capture unreclaimable slab info in the oom message when the kernel panics, to aid troubleshooting and cover this corner case.
> > > Since the kernel is already panicking, capturing more information seems worthwhile and does not affect the normal oom killer.
> > >
> > > With the patchset, tools/vm/slabinfo has a new option, "-U", to show unreclaimable slabs only.
> > >
> > > Also, oom will print all non-zero (num_objs * size != 0) unreclaimable slabs in the oom killer message.
> >
> > Well, I do understand that this _might_ be useful but it also might
> > generate a _lot_ of output. The oom report can be quite verbose already
> > so is this something we want to have enabled by default?
>
> The unreclaimable slab message will only be printed when the kernel panics (no
> killable process or panic_on_oom is set). So it will not bother the normal oom.
> Since the kernel is already panicking, it might be preferable to have more
> information reported.

Well, this certainly depends. If you have a limited console output (e.g.
no serial console) then the additional information can easily scroll the
potentially much more useful information from the early oom report away. We
already do have a control to enable/disable tasks dumping which can be
very long as well.
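
For illustration only, here is a minimal userspace sketch of how such a
control can be checked. It assumes the control in question is the
vm.oom_dump_tasks sysctl, exposed as /proc/sys/vm/oom_dump_tasks; the
helper names parse_sysctl_int() and read_sysctl_int() are hypothetical
and not part of any kernel or library API.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Parse the text content of an integer sysctl file (e.g. "1\n").
 * Hypothetical helper for illustration.
 * Returns 0 and stores the value in *out on success, -1 otherwise.
 */
int parse_sysctl_int(const char *text, long *out)
{
	char *end;
	long val;

	errno = 0;
	val = strtol(text, &end, 10);
	if (end == text || errno == ERANGE)
		return -1;	/* no digits found, or value out of range */

	*out = val;
	return 0;
}

/*
 * Read an integer sysctl from the given path, assumed here to be
 * something like /proc/sys/vm/oom_dump_tasks.
 * Returns 0 on success, -1 if the file cannot be read or parsed.
 */
int read_sysctl_int(const char *path, long *out)
{
	char buf[32];
	FILE *f = fopen(path, "r");

	if (!f)
		return -1;
	if (!fgets(buf, sizeof(buf), f)) {
		fclose(f);
		return -1;
	}
	fclose(f);

	return parse_sysctl_int(buf, out);
}
```

Calling read_sysctl_int("/proc/sys/vm/oom_dump_tasks", &val) and testing
val != 0 would tell whether the per-task dump is enabled. A knob for the
slab dump could presumably be read the same way, but no such knob exists
in this patchset.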

> We can definitely add a proc knob to control it if we want to be able to
> disable the message even when the kernel panics.

Well, I do not have a strong opinion on this. I can see cases where this
kind of information would be useful but most OOM reports I have seen
were simply caused by memory pinned by user space. Slab memory leaks are
seen very rarely. Do you think a pr_debug and slab stats for all ooms
would still be useful?
--
Michal Hocko
SUSE Labs