Re: [PATCH 0/3] OOM detection rework v4

From: Michal Hocko
Date: Wed Jan 20 2016 - 07:24:53 EST


On Sun 03-01-16 00:47:30, Tetsuo Handa wrote:
[...]
> The output showed that __zone_watermark_ok() returning false on both DMA32 and DMA
> zones is the trigger of the OOM killer invocation. Direct reclaim is constantly
> reclaiming some pages, but I guess the freelists for 2 <= order < MAX_ORDER are empty.

Yes, and this is to be expected. Direct reclaim doesn't guarantee any
progress for high-order allocations. We might be reclaiming pages which
cannot be coalesced into larger blocks.
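
Here is a toy model of why order-0 progress need not help an order-2
request. It treats a zone as a flat array of page states and counts
naturally aligned, fully free blocks of a given order, which are the only
blocks the buddy allocator can hand out at that order. This is a
hypothetical sketch, not code from mm/page_alloc.c, and block_is_free()
and count_free_blocks() are made-up names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Toy model, not kernel code: free_map[pfn] is true when that page is
 * free. An order-N block is 2^N pages starting at a pfn that is a
 * multiple of 2^N; it can only be allocated as one block when every
 * page in it is free.
 */
static bool block_is_free(const bool *free_map, size_t start, unsigned int order)
{
	for (size_t i = 0; i < ((size_t)1 << order); i++)
		if (!free_map[start + i])
			return false;
	return true;
}

/* Count naturally aligned, completely free blocks of @order. */
static size_t count_free_blocks(const bool *free_map, size_t nr_pages,
				unsigned int order)
{
	size_t block = (size_t)1 << order;
	size_t count = 0;

	assert(free_map != NULL);
	for (size_t start = 0; start + block <= nr_pages; start += block)
		if (block_is_free(free_map, start, order))
			count++;
	return count;
}
```

With eight pages where every other one is free, count_free_blocks(map, 8, 0)
is 4 but count_free_blocks(map, 8, 2) is 0: reclaim made progress, yet no
order-2 block exists.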

> That trigger was introduced by commit 97a16fc82a7c5b0c ("mm, page_alloc: only
> enforce watermarks for order-0 allocations"), and "mm, oom: rework oom detection"
> patch hits the trigger.
[...]
> [ 154.829582] zone=DMA32 reclaimable=308907 available=312734 no_progress_loops=0 did_some_progress=50
> [ 154.831562] zone=DMA reclaimable=2 available=1728 no_progress_loops=0 did_some_progress=50
> [ 154.838499] fork invoked oom-killer: order=2, oom_score_adj=0, gfp_mask=0x27000c0(GFP_KERNEL|GFP_NOTRACK|0x100000)
> [ 154.841167] fork cpuset=/ mems_allowed=0
[...]
> [ 154.917857] Node 0 DMA32 free:17996kB min:5172kB low:6464kB high:7756kB ....
[...]
> [ 154.931918] Node 0 DMA: 107*4kB (UME) 72*8kB (ME) 47*16kB (UME) 19*32kB (UME) 9*64kB (ME) 1*128kB (M) 3*256kB (M) 2*512kB (E) 2*1024kB (UM) 0*2048kB 0*4096kB = 6908kB
> [ 154.937453] Node 0 DMA32: 1113*4kB (UME) 1400*8kB (UME) 116*16kB (UM) 15*32kB (UM) 1*64kB (M) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 18052kB
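
The free-list lines above can be checked mechanically. The sketch below
copies the per-order counts from the Node 0 DMA and DMA32 lines (11 orders,
4kB through 4096kB), recomputes the printed totals and counts the blocks
large enough for an order-2 request. It ignores migratetypes (the (UME)
annotations), and the helper names are hypothetical, for illustration only.

```c
#include <assert.h>
#include <stddef.h>

/* 11 orders, matching the 4kB ... 4096kB columns in the log. */
#define NR_ORDERS 11

/* Per-order free block counts copied from the "Node 0 DMA:" line. */
static const unsigned long dma_counts[NR_ORDERS] = {
	107, 72, 47, 19, 9, 1, 3, 2, 2, 0, 0
};

/* Per-order free block counts copied from the "Node 0 DMA32:" line. */
static const unsigned long dma32_counts[NR_ORDERS] = {
	1113, 1400, 116, 15, 1, 0, 0, 0, 0, 0, 0
};

/* Total free memory in kB; an order-o block is 4kB << o. */
static unsigned long total_free_kb(const unsigned long *counts)
{
	unsigned long kb = 0;

	for (unsigned int o = 0; o < NR_ORDERS; o++)
		kb += counts[o] * (4UL << o);
	return kb;
}

/* Number of free blocks large enough for a request of @order. */
static unsigned long blocks_at_or_above(const unsigned long *counts,
					unsigned int order)
{
	unsigned long n = 0;

	assert(order < NR_ORDERS);
	for (unsigned int o = order; o < NR_ORDERS; o++)
		n += counts[o];
	return n;
}
```

The totals come out as the printed 6908kB and 18052kB, and DMA32 has 132
blocks at order 2 or above (116 + 15 + 1), so its order >= 2 free lists
are not all empty in this snapshot.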

It is really strange that __zone_watermark_ok claimed DMA32 unusable
here. With a target of 312734 pages, the watermark check for this
particular order should pass easily, and there are 116*16kB, 15*32kB and
1*64kB blocks "usable" for our request because GFP_KERNEL can use both
Unmovable and Movable blocks. So it makes sense to wait until the basic
order-0 (NR_FREE_PAGES) watermark is met and then continue with this
particular allocation request.
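
The reasoning above can be sketched as a simplified, hypothetical model
loosely following the structure of __zone_watermark_ok() after commit
97a16fc82a7c5b0c: an order-0 check on the page count (minus the highatomic
reserve) against min plus the lowmem reserve, then a search for any free
block of sufficient order. struct zone_model, watermark_ok_model() and the
usable-migratetype mask are illustrative names, not kernel API, and the mask
only encodes the point that GFP_KERNEL can use Unmovable and Movable blocks.
Fed the DMA32 numbers from the log (available=312734 pages, min:5172kB =
1293 4kB pages, the order 2/3/4 blocks, an assumed lowmem reserve of 0 and
all blocks placed on the Movable list since the U/M split is unknown), it
returns true.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_ORDERS 11

/* Reduced migratetype set for the sketch; the kernel has more types. */
enum model_mt { MT_UNMOVABLE, MT_MOVABLE, MT_RECLAIMABLE, NR_MODEL_MT };

struct zone_model {
	long free_pages;             /* pages fed to the check */
	long lowmem_reserve;         /* reserve for the requesting zone index */
	long nr_reserved_highatomic; /* pages held back for atomic requests */
	unsigned long nr_free[NR_ORDERS][NR_MODEL_MT]; /* free blocks */
};

/*
 * Hypothetical model loosely following __zone_watermark_ok() after
 * commit 97a16fc82a7c; ignores CMA, ALLOC_HIGH and ALLOC_HARDER.
 * usable_mt_mask has bit (1u << mt) set for each list the request may use.
 */
static bool watermark_ok_model(const struct zone_model *z, unsigned int order,
			       long min, unsigned int usable_mt_mask)
{
	long avail = z->free_pages - ((1L << order) - 1);

	assert(order < NR_ORDERS);

	/* Ordinary requests may not dip into the highatomic reserve. */
	avail -= z->nr_reserved_highatomic;

	/* Basic order-0 check: pages above min plus lowmem reserve. */
	if (avail <= min + z->lowmem_reserve)
		return false;
	if (order == 0)
		return true;

	/* High-order part: one free block of >= order on a usable list. */
	for (unsigned int o = order; o < NR_ORDERS; o++)
		for (int mt = 0; mt < NR_MODEL_MT; mt++)
			if ((usable_mt_mask & (1u << mt)) && z->nr_free[o][mt])
				return true;
	return false;
}
```

The same model shows how a non-zero nr_reserved_highatomic alone could flip
the order-0 part: with a made-up 2000 free pages, an order-2 request and 800
reserved pages, the 1197 remaining pages do not clear a min of 1293.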

The nr_reserved_highatomic might be high enough to matter, but then [1]
shows the reserve being 0. So this doesn't make much sense to me. I will
dig into it some more.

[1] http://lkml.kernel.org/r/201601161007.DDG56185.QOHMOFOLtSFJVF@xxxxxxxxxxxxxxxxxxx
--
Michal Hocko
SUSE Labs