Re: getting oom/stalls for ltp test cpuset01 with latest/4.9 kernel

From: Michal Hocko
Date: Wed Jan 11 2017 - 11:32:10 EST


On Wed 11-01-17 16:20:45, Ganapatrao Kulkarni wrote:
> Hi,
>
> we are seeing OOM/stall messages when we run the ltp cpuset01 test
> (cpuset01 -I 360) for a few minutes, even though the numa system has
> adequate memory on both nodes.
>
> we have observed the same on both arm64/thunderx numa and x86 numa systems!
>
> using latest ltp from master branch version 20160920-197-gbc4d3db
> and linux kernel version 4.9
>
> is this a known bug already?
>
> below is the oops log:
> [ 2280.275193] cgroup: new mount options do not match the existing
> superblock, will be ignored
> [ 2316.565940] cgroup: new mount options do not match the existing
> superblock, will be ignored
> [ 2393.388361] cpuset01: page allocation stalls for 10051ms, order:0, mode:0x24280ca(GFP_HIGHUSER_MOVABLE|__GFP_ZERO)

For some reason I thought we were printing the nodemask here. We are
not... Which sucks in situations like this. I will cook up a patch...

[...]
> [ 2393.388457] Node 1 Normal free:11937124kB min:45532kB low:62044kB
> high:78556kB active_anon:58896kB inactive_anon:58552kB
> active_file:288kB inactive_file:0kB unevictable:4kB
> writepending:23384kB present:16777216kB managed:16512808kB mlocked:4kB
> slab_reclaimable:37876kB slab_unreclaimable:44812kB
> kernel_stack:4264kB pagetables:27612kB bounce:0kB free_pcp:2240kB
> local_pcp:0kB free_cma:0kB

There is a lot of free memory in this node, and it seems to be the only
eligible one because there are no details about Node 0 zones. So there
shouldn't be any real reason to stall this allocation, unless there was
huge memory pressure and the relief came only recently, when the current
task just managed to get out of reclaim and report the stall.
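
To make the "lot of free memory" reading concrete, here is a rough
sketch that pulls the kB counters out of the quoted Node 1 zone line and
compares free against the high watermark. The helper name
parse_zone_line is made up for illustration, and the regex assumes the
"name:<value>kB" layout seen in this particular log; it is not a general
parser for the kernel's show_mem output.

```python
import re

# Zone summary copied from the quoted log (wrapped lines joined, trailing
# fields trimmed for brevity).
ZONE_LINE = (
    "Node 1 Normal free:11937124kB min:45532kB low:62044kB "
    "high:78556kB active_anon:58896kB inactive_anon:58552kB "
    "present:16777216kB managed:16512808kB"
)


def parse_zone_line(line):
    """Hypothetical helper: return (node, zone, {field: kB}) for one line."""
    header = re.match(r"Node (\d+) (\w+) ", line)
    if header is None:
        raise ValueError("not a zone summary line")
    # Every counter in this log has the form name:<digits>kB.
    fields = {name: int(value)
              for name, value in re.findall(r"(\w+):(\d+)kB", line)}
    return int(header.group(1)), header.group(2), fields


node, zone, kb = parse_zone_line(ZONE_LINE)

# Free memory well above the high watermark means the zone itself is not
# short on pages at the moment the report was printed.
above_high = kb["free"] > kb["high"]
free_pct = 100.0 * kb["free"] / kb["managed"]

print(f"Node {node} {zone}: free={kb['free']}kB high={kb['high']}kB "
      f"above_high={above_high} free/managed={free_pct:.1f}%")
```

This only describes the state at the time of the report; it cannot show
whether the zone was under pressure earlier during the stall.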

Is there any other workload running on this system?
[...]
> [ 2397.331098] cpuset01 invoked oom-killer:
> gfp_mask=0x24280ca(GFP_HIGHUSER_MOVABLE|__GFP_ZERO), nodemask=1,
> order=0, oom_score_adj=0

Please attach the full oom report.
--
Michal Hocko
SUSE Labs