Re: [RFC] 3.10 kernel- oom with about 24G free memory

From: Michal Hocko
Date: Fri Feb 10 2017 - 02:10:03 EST


On Fri 10-02-17 09:13:58, Yisheng Xie wrote:
> hi Michal,
> Thanks for your comment.
>
> On 2017/2/9 21:41, Michal Hocko wrote:
> > On Thu 09-02-17 14:26:28, Michal Hocko wrote:
> >> On Thu 09-02-17 20:54:49, Yisheng Xie wrote:
> >>> Hi all,
> >>> I get an oom on a linux 3.10 kvm guest OS. When the oom triggers,
> >>> it has about 24G of free memory (and the host OS has about 10G free),
> >>> and the watermarks are definitely ok.
> >>>
> >>> I also checked the memcg limit value, but could not find the
> >>> root cause.
> >>>
> >>> Has anybody ever met a similar problem, or does anybody have any idea about it?
> >>>
> >>> Any comment is more than welcome!
> >>>
> >>> Thanks
> >>> Yisheng Xie
> >>>
> >>> -------------
> >>> [ 81.234289] DefSch0200 invoked oom-killer: gfp_mask=0xd0, order=0, oom_score_adj=0
> >>> [ 81.234295] DefSch0200 cpuset=/ mems_allowed=0
> >>> [ 81.234299] CPU: 3 PID: 8284 Comm: DefSch0200 Tainted: G O E ----V------- 3.10.0-229.42.1.105.x86_64 #1
> >>> [ 81.234301] Hardware name: OpenStack Foundation OpenStack Nova, BIOS rel-1.8.1-0-g4adadbd-20161111_105425-HGH1000008200 04/01/2014
> >>> [ 81.234303] ffff880ae2900000 000000002b3489d7 ffff880b6cec7c58 ffffffff81608d3d
> >>> [ 81.234307] ffff880b6cec7ce8 ffffffff81603d1c 0000000000000000 ffff880b6cd09000
> >>> [ 81.234311] ffff880b6cec7cd8 000000002b3489d7 ffff880b6cec7ce0 ffffffff811bdd77
> >>> [ 81.234314] Call Trace:
> >>> [ 81.234323] [<ffffffff81608d3d>] dump_stack+0x19/0x1b
> >>> [ 81.234327] [<ffffffff81603d1c>] dump_header+0x8e/0x214
> >>> [ 81.234333] [<ffffffff811bdd77>] ? mem_cgroup_iter+0x177/0x2b0
> >>> [ 81.234339] [<ffffffff8115d83e>] check_panic_on_oom+0x2e/0x60
> >>> [ 81.234342] [<ffffffff811c17bf>] mem_cgroup_oom_synchronize+0x34f/0x580
> >>
> >> OK, so this is a memcg OOM killer which panics because the configuration
> >> says so. The OOM report doesn't say so and that is the bug. dump_header
> >> is memcg aware and mem_cgroup_out_of_memory initializes oom_control
> >> properly. Is this a vanilla kernel?
>
> That means we should raise the limit of that memcg to avoid memcg OOM killer, right?

Why do you configure the system to panic on memcg OOM in the first
place? This is the wrong thing to do in 99% of cases.
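
For reference, a minimal sketch of how to check that configuration. It
assumes the documented semantics of the vm.panic_on_oom sysctl (0: never
panic, 1: panic only on an unconstrained global OOM, 2: always panic, even
when the OOM happens under a memory cgroup). The helper names are made up
for illustration and are not part of any kernel tooling.

```python
from pathlib import Path

# The sysctl as exposed through procfs.
PANIC_ON_OOM = Path("/proc/sys/vm/panic_on_oom")


def memcg_oom_panics(value: int) -> bool:
    """True if a memcg OOM would panic the whole system for this value.

    Assumption: only panic_on_oom == 2 panics on constrained
    (memcg/cpuset/mempolicy) OOMs, as the sysctl documentation states.
    """
    return value == 2


def read_panic_on_oom(path: Path = PANIC_ON_OOM):
    """Return the current sysctl value, or None if it cannot be read."""
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    value = read_panic_on_oom()
    if value is None:
        print("vm.panic_on_oom is not readable on this system")
    elif memcg_oom_panics(value):
        print(f"vm.panic_on_oom={value}: a memcg OOM panics the system")
    else:
        print(f"vm.panic_on_oom={value}: a memcg OOM only kills a task")
```

If this reports 2, setting the sysctl back to 0 or 1 lets the memcg OOM
killer pick a victim inside the group instead of taking the machine down.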

--
Michal Hocko
SUSE Labs