Re: mm, vmscan: commit makes PAE kernel crash nightly (bisected)

From: Michal Hocko
Date: Tue Jan 17 2017 - 08:45:21 EST

Next message: Thomas Gleixner: "Re: [PATCH 05/12] x86/cqm,perf/core: Cgroup support prepare"
Previous message: Peter Zijlstra: "Re: [PATCH -next] init/main: Init jump_labels before they are used to build zonelists"
In reply to: Trevor Cordes: "Re: mm, vmscan: commit makes PAE kernel crash nightly (bisected)"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On Sun 15-01-17 00:27:52, Trevor Cordes wrote:
> On 2017-01-12 Michal Hocko wrote:
> > On Wed 11-01-17 16:52:32, Trevor Cordes wrote:
> > [...]
> > > I'm not sure how I can tell if my bug is because of memcgs so here
> > > is a full first oom example (attached).
> >
> > 4.7 kernel doesn't contain 71c799f4982d ("mm: add per-zone lru list
> > stat") so the OOM report will not tell us whether the Normal zone
> > doesn't age active lists, unfortunately.
>
> I compiled the patch Mel provided into the stock F23 kernel
> 4.8.13-100.fc23.i686+PAE and it ran for 2 nights. It didn't oom the
> first night, but did the second night. So the bug persists even with
> that patch. However, it does *seem* a bit "better" since it took 2
> nights (usually takes only one, but maybe 10% of the time it does take
> two) before oom'ing, *and* it allowed my reboot script to reboot it
> cleanly when it saw the oom (which happens only 25% of the time).
>
> I'm attaching the 4.8.13 oom message which should have the memcg info
> (71c799f4982d) you are asking for above?

It doesn't have the memcg info, which is not part of the current
vanilla kernel output either. But it does have the per-zone LRU counters,
which is what I was after. So you have the correct patch. Sorry if I
confused you.

> Hopefully?

[167409.074463] nmbd invoked oom-killer: gfp_mask=0x27000c0(GFP_KERNEL_ACCOUNT|__GFP_NOTRACK), order=1, oom_score_adj=0

Again, a lowmem request.

[...]
[167409.074576] Normal free:3484kB min:3544kB low:4428kB high:5312kB active_anon:0kB inactive_anon:0kB active_file:3412kB inactive_file:1560kB unevictable:0kB writepending:0kB present:892920kB managed:815216kB mlocked:0kB slab_reclaimable:711068kB slab_unreclaimable:49496kB kernel_stack:2904kB pagetables:0kB bounce:0kB free_pcp:240kB local_pcp:120kB free_cma:0kB

But have a look here. There are basically no pages on the Normal zone
LRU lists. There is a huge amount of slab allocated here, but we are not
able to reclaim it because we scale slab reclaim based on the LRU
reclaim. This is an inherent problem of the current design and we should
address it. It is nothing really new. We just didn't have many affected
users, because having the majority of memory consumed by slab is not a
usual situation. It seems you just hit a more aggressive slab user with
newer kernels.

Using the 32-bit kernel really makes all this worse, because all those
allocations have to go to the Normal and DMA zones, which pushes LRU
pages out of those zones.

> > You can easily check whether this is memcg related by trying to run
> > the same workload with cgroup_disable=memory kernel command line
> > parameter. This will put all the memcg specifics out of the way.
>
> I will try booting now into cgroup_disable=memory to see if that helps
> at all. I'll reply back in 48 hours, or when it oom's, whichever comes
> first.

This most probably will not help.

> Also, should I bother trying the latest git HEAD to see if that solves
> anything? Thanks!

It might help with respect to the slab consumers, but I am afraid there
is nothing I would consider a fix for the general problem of slab
shrinking.

--
Michal Hocko
SUSE Labs
