Re: mm, vmscan: commit makes PAE kernel crash nightly (bisected)

From: Trevor Cordes
Date: Sun Jan 15 2017 - 01:36:32 EST


On 2017-01-12 Michal Hocko wrote:
> On Wed 11-01-17 16:52:32, Trevor Cordes wrote:
> [...]
> > I'm not sure how I can tell if my bug is because of memcgs so here
> > is a full first oom example (attached).
>
> 4.7 kernel doesn't contain 71c799f4982d ("mm: add per-zone lru list
> stat") so the OOM report will not tell us whether the Normal zone
> doesn't age active lists, unfortunately.

I built the stock F23 kernel 4.8.13-100.fc23.i686+PAE with the patch
Mel provided and ran it for 2 nights. It didn't OOM the first night,
but did the second night, so the bug persists even with that patch.
However, it does *seem* a bit "better": it took 2 nights to OOM (it
usually takes only one, and two only about 10% of the time), *and* my
reboot script was able to reboot the box cleanly once it saw the OOM
(a clean reboot normally succeeds only about 25% of the time).

I'm attaching the 4.8.13 OOM message, which should hopefully include
the per-zone LRU stats (71c799f4982d) you asked for above.

> You can easily check whether this is memcg related by trying to run
> the same workload with cgroup_disable=memory kernel command line
> parameter. This will put all the memcg specifics out of the way.

I will try booting now into cgroup_disable=memory to see if that helps
at all. I'll reply back in 48 hours, or when it oom's, whichever comes
first.
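
For anyone else repeating this test, here is a rough sketch of how to
confirm after reboot that the parameter actually took effect. It
assumes the usual /proc/cmdline and /proc/cgroups layouts (the fourth
column of /proc/cgroups being the "enabled" flag); the helper names are
made up for illustration and are not part of any existing tool.

```python
def cmdline_disables_memcg(cmdline):
    """Return True if the kernel command line carries cgroup_disable=memory."""
    for token in cmdline.split():
        if token.startswith("cgroup_disable="):
            # The option takes a comma-separated list of controller names.
            names = token.split("=", 1)[1].split(",")
            if "memory" in names:
                return True
    return False


def memcg_enabled(proc_cgroups):
    """Return the 'enabled' flag of the memory controller, or None if absent."""
    for line in proc_cgroups.splitlines():
        if line.startswith("#") or not line.strip():
            continue  # skip the header line and blank lines
        fields = line.split()
        # Assumed columns: subsys_name hierarchy num_cgroups enabled
        if len(fields) >= 4 and fields[0] == "memory":
            return fields[3] == "1"
    return None


if __name__ == "__main__":
    with open("/proc/cmdline") as f:
        print("cgroup_disable=memory on cmdline:",
              cmdline_disables_memcg(f.read()))
    with open("/proc/cgroups") as f:
        print("memory controller enabled:", memcg_enabled(f.read()))
```

If the first check prints True and the second prints False, the run is
really happening with memcg out of the picture.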

Also, should I bother trying the latest git HEAD to see if that solves
anything? Thanks!

Attachment: oom2
Description: Binary data