Re: [PATCH] mm, oom: protect !costly allocations some more (was: Re: [PATCH 0/3] OOM detection rework v4)

From: Michal Hocko
Date: Tue Mar 08 2016 - 04:08:32 EST


On Tue 08-03-16 12:51:04, Sergey Senozhatsky wrote:
> Hello Michal,
>
> On (03/07/16 17:08), Michal Hocko wrote:
> > On Mon 29-02-16 22:02:13, Michal Hocko wrote:
> > > Andrew,
> > > could you queue this one as well, please? This is more a band aid than a
> > > real solution which I will be working on as soon as I am able to
> > > reproduce the issue but the patch should help to some degree at least.
> >
> > Joonsoo wasn't very happy about this approach so let me try a different
> > way. What do you think about the following? Hugh, Sergey does it help
> > for your load? I have tested it with the Hugh's load and there was no
> > major difference from the previous testing so at least nothing has blown
> > up as I am not able to reproduce the issue here.
>
> (next-20160307 + "[PATCH] mm, oom: protect !costly allocations some more")
>
> seems it's significantly less likely to oom-kill now, but I still can see
> something like this

Thanks for the testing. This is highly appreciated. If you are able to
reproduce this then collecting the compaction-related tracepoints would
be really helpful.
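
For reference, a minimal sketch of switching the compaction event group
on through tracefs. The helper name and the default mount point are
assumptions for illustration (tracefs may be mounted at
/sys/kernel/tracing instead), and writing there requires root:

```python
from pathlib import Path


def set_compaction_tracing(enabled, tracefs="/sys/kernel/debug/tracing"):
    """Switch every event in the 'compaction' group on or off.

    `tracefs` is the assumed mount point of the tracing filesystem.
    """
    enable_file = Path(tracefs) / "events" / "compaction" / "enable"
    # Writing "1" enables all events in the group, "0" disables them.
    enable_file.write_text("1" if enabled else "0")
    return enable_file
```

After reproducing the OOM, the collected events can be read from the
`trace` file in the same tracefs directory.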

> [ 501.942745] coretemp-sensor invoked oom-killer: gfp_mask=0x27000c0(GFP_KERNEL_ACCOUNT|__GFP_NOTRACK), order=2, oom_score_adj=0
[...]
> [ 501.942853] active_anon:151312 inactive_anon:54791 isolated_anon:0
> active_file:31213 inactive_file:302048 isolated_file:0
> unevictable:0 dirty:44 writeback:221 unstable:0
> slab_reclaimable:43570 slab_unreclaimable:5651
> mapped:16660 shmem:29495 pagetables:2542 bounce:0
> free:10884 free_pcp:214 free_cma:0
[...]
> [ 501.942867] DMA32 free:23664kB min:6232kB low:9332kB high:12432kB active_anon:516228kB inactive_anon:129136kB active_file:96508kB inactive_file:954780kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:3194880kB managed:3107512kB mlocked:0kB dirty:136kB writeback:440kB mapped:51816kB shmem:91488kB slab_reclaimable:129856kB slab_unreclaimable:13876kB kernel_stack:2160kB pagetables:7888kB unstable:0kB bounce:0kB free_pcp:724kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:128 all_unreclaimable? no
> [ 501.942870] lowmem_reserve[]: 0 0 824 824
> [ 501.942876] Normal free:4784kB min:1696kB low:2540kB high:3384kB active_anon:89020kB inactive_anon:90028kB active_file:28248kB inactive_file:253308kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:917504kB managed:844512kB mlocked:0kB dirty:40kB writeback:444kB mapped:14700kB shmem:26492kB slab_reclaimable:44396kB slab_unreclaimable:8620kB kernel_stack:1328kB pagetables:2280kB unstable:0kB bounce:0kB free_pcp:244kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:60 all_unreclaimable? no

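Both DMA32 and Normal zones are above their high watermarks (23664kB
free vs. 12432kB high, and 4784kB free vs. 3384kB high), so this OOM is
due to memory fragmentation rather than a lack of free memory.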
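
The comparison above can be sketched as a small script over the per-zone
lines of the report. This is only an illustration: the function name is
hypothetical, the sample lines are truncated copies of the ones quoted
above, and the kernel's real watermark check also takes lowmem_reserve
and the requested order into account.

```python
import re


def zone_watermarks(line):
    """Return (zone, free_kB, high_kB) from one per-zone OOM report line."""
    m = re.search(r"(\w+) free:(\d+)kB min:\d+kB low:\d+kB high:(\d+)kB", line)
    if m is None:
        raise ValueError("not a per-zone line")
    return m.group(1), int(m.group(2)), int(m.group(3))


# Truncated copies of the zone lines from the report quoted above.
report = [
    "[ 501.942867] DMA32 free:23664kB min:6232kB low:9332kB high:12432kB",
    "[ 501.942876] Normal free:4784kB min:1696kB low:2540kB high:3384kB",
]

for line in report:
    zone, free, high = zone_watermarks(line)
    print(f"{zone}: free {free}kB, high {high}kB, above high: {free > high}")
```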

> [ 501.942912] DMA32: 564*4kB (UME) 2700*8kB (UM) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 23856kB
> [ 501.942921] Normal: 959*4kB (ME) 128*8kB (UM) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 4860kB

There are no usable order-2+ pages even though we know that compaction
was active and didn't back out early. I might be missing something, of
course, and the patch might still be tweaked to be more conservative.
Tracepoints should tell us more, though.
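
A rough sketch of how the free lists above can be checked mechanically.
The function names are hypothetical, and a 4kB page size is assumed
(matching the smallest block size in the report), so order-2 means 16kB
blocks:

```python
import re


def parse_buddy_line(line):
    """Map block size in kB to the number of free blocks of that size."""
    return {int(size): int(count)
            for count, size in re.findall(r"(\d+)\*(\d+)kB", line)}


def summarize(line, order, page_kb=4):
    """Return (total free kB, number of free blocks of at least `order`)."""
    blocks = parse_buddy_line(line)
    total_kb = sum(size * count for size, count in blocks.items())
    min_size_kb = page_kb << order  # order-2 with 4kB pages is 16kB
    usable = sum(count for size, count in blocks.items()
                 if size >= min_size_kb)
    return total_kb, usable


dma32 = ("DMA32: 564*4kB (UME) 2700*8kB (UM) 0*16kB 0*32kB 0*64kB 0*128kB "
         "0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 23856kB")
normal = ("Normal: 959*4kB (ME) 128*8kB (UM) 0*16kB 0*32kB 0*64kB 0*128kB "
          "0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 4860kB")

for line in (dma32, normal):
    total_kb, usable = summarize(line, order=2)
    print(f"{line.split(':')[0]}: {total_kb}kB free, "
          f"{usable} blocks of order 2 or higher")
```

The totals match the ones the kernel printed, and both zones have zero
blocks of order 2 or higher, which is why the order=2 request fails
despite the free memory.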

Thanks!
--
Michal Hocko
SUSE Labs