Re: [PATCH 0/5] Candidate fixes for premature OOM kills with node-lru v2

From: Joonsoo Kim
Date: Tue Jul 26 2016 - 04:07:06 EST


On Thu, Jul 21, 2016 at 03:10:56PM +0100, Mel Gorman wrote:
> Both Joonsoo Kim and Minchan Kim have reported premature OOM kills.
> The common element is a zone-constrained allocation failings. Two factors
> appear to be at fault -- pgdat being considered unreclaimable prematurely
> and insufficient rotation of the active list.
>
> The series is in three basic parts;
>
> Patches 1-3 add per-zone stats back in. The actual stats patch is different
> to Minchan's as the original patch did not account for unevictable
> LRU which would corrupt counters. The second two patches remove
> approximations based on pgdat statistics. It's effectively a
> revert of "mm, vmstat: remove zone and node double accounting
> by approximating retries" but different LRU stats are used. This
> is better than a full revert or a reworking of the series as it
> preserves history of why the zone stats are necessary.
>
> If this work out, we may have to leave the double accounting in
> place for now until an alternative cheap solution presents itself.
>
> Patch 4 rotates inactive/active lists for lowmem allocations. This is also
> quite different to Minchan's patch as the original patch did not
> account for memcg and would rotate if *any* eligible zone needed
> rotation which may rotate excessively. The new patch considers the
> ratio for all eligible zones which is more in line with node-lru
> in general.
>
> Patch 5 accounts for skipped pages as partial scanned. This avoids the pgdat
> being prematurely marked unreclaimable while still allowing it to
> be marked unreclaimable if there are no reclaimable pages.
>
> These patches did not OOM for me on a 2G 32-bit KVM instance while running
> a stress test for an hour. Preliminary tests on a 64-bit system using a
> parallel dd workload did not show anything alarming.
>
> If an OOM is detected then please post the full OOM message.

Before attaching OOM message, I should note that my test case also triggers
OOM in old kernel if there are four parallel file-readers. With node-lru and
patch 1~5, OOM is triggered even if there are one or more parallel file-readers.
With node-lru and patch 1~4, OOM is triggered if there are two or more
parallel file-readers.

Here goes OOM message.

fork invoked oom-killer: gfp_mask=0x24200ca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=0 [108/9620]
fork cpuset=/ mems_allowed=0
CPU: 0 PID: 4304 Comm: fork Not tainted 4.7.0-rc7-next-20160720+ #713
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.9.1-0-gb3ef39f-prebuilt.qemu-project.org 04/01/2014
0000000000000000 ffff8800209ab960 ffffffff8142bd03 ffff8800209abb58
ffff8800209a0000 ffff8800209ab9d8 ffffffff81241a59 ffffffff81e70020
ffff8800209ab988 ffffffff810dddcd ffff8800209ab9a8 0000000000000206
Call Trace:
[<ffffffff8142bd03>] dump_stack+0x85/0xc2
[<ffffffff81241a59>] dump_header+0x5c/0x22e
[<ffffffff810dddcd>] ? trace_hardirqs_on+0xd/0x10
[<ffffffff811b33e1>] oom_kill_process+0x221/0x3f0
[<ffffffff811b3a22>] out_of_memory+0x422/0x560
[<ffffffff811b9f69>] __alloc_pages_nodemask+0x1069/0x10c0
[<ffffffff81211a41>] ? alloc_pages_vma+0xc1/0x300
[<ffffffff81211a41>] alloc_pages_vma+0xc1/0x300
[<ffffffff811e851f>] ? wp_page_copy+0x7f/0x640
[<ffffffff811e851f>] wp_page_copy+0x7f/0x640
[<ffffffff811e974b>] do_wp_page+0x13b/0x6e0
[<ffffffff811ec704>] handle_mm_fault+0xaf4/0x1310
[<ffffffff811ebc4b>] ? handle_mm_fault+0x3b/0x1310
[<ffffffff8106eb90>] ? __do_page_fault+0x160/0x4e0
[<ffffffff8106ec19>] __do_page_fault+0x1e9/0x4e0
[<ffffffff8106efed>] trace_do_page_fault+0x5d/0x290
[<ffffffff810674ca>] do_async_page_fault+0x1a/0xa0
[<ffffffff8185bee8>] async_page_fault+0x28/0x30
[<ffffffff810a73d3>] ? __task_pid_nr_ns+0xb3/0x1b0
[<ffffffff8143ab9c>] ? __put_user_4+0x1c/0x30
[<ffffffff810b7205>] ? schedule_tail+0x55/0x70
[<ffffffff81859f3c>] ret_from_fork+0xc/0x40
Mem-Info:
active_anon:26762 inactive_anon:95 isolated_anon:0
active_file:42543 inactive_file:347438 isolated_file:0
unevictable:0 dirty:0 writeback:0 unstable:0
slab_reclaimable:5476 slab_unreclaimable:23140
mapped:389534 shmem:95 pagetables:20927 bounce:0
free:6948 free_pcp:222 free_cma:0
Node 0 active_anon:107048kB inactive_anon:380kB active_file:170008kB inactive_file:1389752kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:1558136kB dirty:0kB writeback:0kB shmem:0kB shmem_$
hp: 0kB shmem_pmdmapped: 0kB anon_thp: 380kB writeback_tmp:0kB unstable:0kB pages_scanned:4697206 all_unreclaimable? yes
Node 0 DMA free:2168kB min:204kB low:252kB high:300kB active_anon:3544kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15992kB managed:15908kB mlocked:0kB sl$
b_reclaimable:0kB slab_unreclaimable:2684kB kernel_stack:1760kB pagetables:3092kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
lowmem_reserve[]: 0 493 493 1955
Node 0 DMA32 free:6508kB min:6492kB low:8112kB high:9732kB active_anon:81264kB inactive_anon:0kB active_file:101204kB inactive_file:228kB unevictable:0kB writepending:0kB present:2080632kB managed:508584k$
mlocked:0kB slab_reclaimable:21904kB slab_unreclaimable:89876kB kernel_stack:46400kB pagetables:80616kB bounce:0kB free_pcp:544kB local_pcp:120kB free_cma:0kB
lowmem_reserve[]: 0 0 0 1462
Node 0 Movable free:19116kB min:19256kB low:24068kB high:28880kB active_anon:22240kB inactive_anon:380kB active_file:68812kB inactive_file:1389688kB unevictable:0kB writepending:0kB present:1535864kB mana$
ed:1500964kB mlocked:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:368kB local_pcp:0kB free_cma:0kB
lowmem_reserve[]: 0 0 0 0
Node 0 DMA: 0*4kB 1*8kB (M) 1*16kB (U) 1*32kB (M) 1*64kB (U) 0*128kB 0*256kB 2*512kB (UM) 1*1024kB (U) 0*2048kB 0*4096kB = 2168kB
Node 0 DMA32: 51*4kB (UME) 96*8kB (ME) 46*16kB (UME) 41*32kB (ME) 32*64kB (ME) 11*128kB (UM) 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 6476kB
Node 0 Movable: 1*4kB (M) 1*8kB (M) 1*16kB (M) 1*32kB (M) 1*64kB (M) 0*128kB 1*256kB (M) 1*512kB (M) 0*1024kB 1*2048kB (M) 4*4096kB (M) = 19324kB
Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
390134 total pagecache pages
0 pages in swap cache


>
> Optionally please test without patch 5 if an OOM occurs.

Here goes without patch 5.

fork invoked oom-killer: gfp_mask=0x26000c0(GFP_KERNEL|__GFP_NOTRACK), order=0, oom_score_adj=0 [2[2/9152]
fork cpuset=/ mems_allowed=0
CPU: 5 PID: 1269 Comm: fork Not tainted 4.7.0-rc7-next-20160720+ #714
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.9.1-0-gb3ef39f-prebuilt.qemu-project.org 04/01/2014
0000000000000000 ffff8800136138e8 ffffffff8142bd23 ffff880013613ae0
ffff88000fa6ca00 ffff880013613960 ffffffff81241a79 ffffffff81e70020
ffff880013613910 ffffffff810dddcd ffff880013613930 0000000000000206
Call Trace:
[<ffffffff8142bd23>] dump_stack+0x85/0xc2
[<ffffffff81241a79>] dump_header+0x5c/0x22e
[<ffffffff810dddcd>] ? trace_hardirqs_on+0xd/0x10
[<ffffffff811b33e1>] oom_kill_process+0x221/0x3f0
[<ffffffff811b3a22>] out_of_memory+0x422/0x560
[<ffffffff811b9f69>] __alloc_pages_nodemask+0x1069/0x10c0
[<ffffffff8120fb01>] ? alloc_pages_current+0xa1/0x1f0
[<ffffffff8120fb01>] alloc_pages_current+0xa1/0x1f0
[<ffffffff81219f33>] ? new_slab+0x473/0x5e0
[<ffffffff81219f33>] new_slab+0x473/0x5e0
[<ffffffff8121b16f>] ___slab_alloc+0x27f/0x550
[<ffffffff8121b491>] ? __slab_alloc+0x51/0x90
[<ffffffff81081e11>] ? copy_process.part.29+0xc11/0x1b90
[<ffffffff81081e11>] ? copy_process.part.29+0xc11/0x1b90
[<ffffffff8121b491>] __slab_alloc+0x51/0x90
[<ffffffff8121b6dc>] kmem_cache_alloc+0x20c/0x2b0
[<ffffffff81081e11>] ? copy_process.part.29+0xc11/0x1b90
[<ffffffff81081e11>] copy_process.part.29+0xc11/0x1b90
[<ffffffff81082f86>] _do_fork+0xe6/0x6a0
[<ffffffff810835e9>] SyS_clone+0x19/0x20
[<ffffffff81003e13>] do_syscall_64+0x73/0x1e0
[<ffffffff81859dc3>] entry_SYSCALL64_slow_path+0x25/0x25
Mem-Info:
active_anon:26003 inactive_anon:95 isolated_anon:0
active_file:289026 inactive_file:96101 isolated_file:21
unevictable:0 dirty:0 writeback:0 unstable:0
slab_reclaimable:6056 slab_unreclaimable:23737
mapped:384788 shmem:95 pagetables:23282 bounce:0
free:7815 free_pcp:179 free_cma:0
Node 0 active_anon:104012kB inactive_anon:380kB active_file:1156104kB inactive_file:384404kB unevictable:0kB isolated(anon):0kB isolated(file):84kB mapped:1539152kB dirty:0kB writeback:0kB shmem:0kB shmem_
thp: 0kB shmem_pmdmapped: 2048kB anon_thp: 380kB writeback_tmp:0kB unstable:0kB pages_scanned:2512936 all_unreclaimable? yes
Node 0 DMA free:2172kB min:204kB low:252kB high:300kB active_anon:3204kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15992kB managed:15908kB mlocked:0kB sla
b_reclaimable:16kB slab_unreclaimable:2944kB kernel_stack:1584kB pagetables:3188kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
lowmem_reserve[]: 0 493 493 1955
Node 0 DMA32 free:6320kB min:6492kB low:8112kB high:9732kB active_anon:79128kB inactive_anon:0kB active_file:69016kB inactive_file:15872kB unevictable:0kB writepending:0kB present:2080632kB managed:508584k
B mlocked:0kB slab_reclaimable:24208kB slab_unreclaimable:92004kB kernel_stack:44064kB pagetables:89940kB bounce:0kB free_pcp:264kB local_pcp:100kB free_cma:0kB
lowmem_reserve[]: 0 0 0 1462
Node 0 Movable free:22768kB min:19256kB low:24068kB high:28880kB active_anon:21676kB inactive_anon:380kB active_file:1085592kB inactive_file:369724kB unevictable:0kB writepending:0kB present:1535864kB mana
ged:1500964kB mlocked:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:452kB local_pcp:80kB free_cma:0kB
lowmem_reserve[]: 0 0 0 0
Node 0 DMA: 3*4kB (M) 0*8kB 1*16kB (M) 1*32kB (M) 1*64kB (M) 0*128kB 2*256kB (UM) 1*512kB (M) 1*1024kB (U) 0*2048kB 0*4096kB = 2172kB
Node 0 DMA32: 94*4kB (ME) 48*8kB (ME) 22*16kB (ME) 10*32kB (UME) 3*64kB (ME) 1*128kB (M) 0*256kB 2*512kB (UM) 4*1024kB (M) 0*2048kB 0*4096kB = 6872kB
Node 0 Movable: 0*4kB 0*8kB 1*16kB (M) 3*32kB (M) 4*64kB (M) 1*128kB (M) 10*256kB (M) 3*512kB (M) 0*1024kB 1*2048kB (M) 4*4096kB (M) = 23024kB
Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
385234 total pagecache pages
0 pages in swap cache
Swap cache stats: add 0, delete 0, find 0/0

Thanks.

> include/linux/mm_inline.h | 19 ++---------
> include/linux/mmzone.h | 7 ++++
> include/linux/swap.h | 1 +
> mm/compaction.c | 20 +----------
> mm/migrate.c | 2 ++
> mm/page-writeback.c | 17 +++++-----
> mm/page_alloc.c | 59 +++++++++++----------------------
> mm/vmscan.c | 84 ++++++++++++++++++++++++++++++++++++++---------
> mm/vmstat.c | 6 ++++
> 9 files changed, 116 insertions(+), 99 deletions(-)
>
> --
> 2.6.4
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@xxxxxxxxxx For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@xxxxxxxxx";> email@xxxxxxxxx </a>