How to fix oom-killer on DRA7x when enabling LPAE?

From: Brian McFarland
Date: Wed Oct 11 2017 - 17:05:40 EST


I'm running on a 4.4.45+ release based on TI's 6AM.1.3 release with an
Android MM user space.

http://git.omapzoom.org/kernel/?p=kernel/omap.git

LPAE was previously disabled on the system because we were only using
2GB of RAM. We've expanded that to 4GB of RAM and enabled LPAE to take
advantage of it, and now we see oom-killer crashes.


It's always a low-order allocation that fails, though the exact source
and gfp_mask vary.

Example oom-killer events / gfp masks:

GLThread 583 invoked oom-killer: gfp_mask=0x24000c4, order=0, oom_score_adj=0
Binder_A invoked oom-killer: gfp_mask=0x26000c0, order=1, oom_score_adj=-705
top invoked oom-killer: gfp_mask=0x24000d0, order=0, oom_score_adj=-1000
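
For reference, the three masks above can be decoded with a short script.
This is only a sketch: the bit values are a subset of the ___GFP_*
definitions as I understand them from mainline v4.4 include/linux/gfp.h,
and it assumes the TI tree has not renumbered them. Any bit not in the
table is printed as raw hex rather than guessed.

```python
# Subset of flag bits, assumed to match mainline v4.4 include/linux/gfp.h.
# Assumption: the vendor 4.4.45-based tree uses the same numbering.
GFP_FLAGS = {
    0x0000001: "__GFP_DMA",
    0x0000002: "__GFP_HIGHMEM",
    0x0000004: "__GFP_DMA32",
    0x0000008: "__GFP_MOVABLE",
    0x0000010: "__GFP_RECLAIMABLE",
    0x0000020: "__GFP_HIGH",
    0x0000040: "__GFP_IO",
    0x0000080: "__GFP_FS",
    0x0200000: "__GFP_NOTRACK",
    0x0400000: "__GFP_DIRECT_RECLAIM",
    0x2000000: "__GFP_KSWAPD_RECLAIM",
}


def decode_gfp(mask):
    """Return the names of the known flag bits set in mask, lowest bit first.

    Bits that are not in GFP_FLAGS are appended as a single hex string.
    """
    names = [name for bit, name in sorted(GFP_FLAGS.items()) if mask & bit]
    unknown = mask & ~sum(GFP_FLAGS)  # keys are distinct bits, so sum == OR
    if unknown:
        names.append(hex(unknown))
    return names


if __name__ == "__main__":
    for mask in (0x24000C4, 0x26000C0, 0x24000D0):
        print(hex(mask), "=", " | ".join(decode_gfp(mask)))
```

Under those assumed definitions, none of the three masks has
__GFP_HIGHMEM set, so all three requests would have to be satisfied
from low memory.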

Mem info from one such event:

[ 358.267219] Mem-Info:
[ 358.270438] active_anon:246392 inactive_anon:22796 isolated_anon:0
[ 358.270438] active_file:55368 inactive_file:74077 isolated_file:0
[ 358.270438] unevictable:0 dirty:5 writeback:0 unstable:0
[ 358.270438] slab_reclaimable:3246 slab_unreclaimable:6453
[ 358.270438] mapped:145719 shmem:23090 pagetables:5828 bounce:0
[ 358.270438] free:492267 free_pcp:182 free_cma:10778
[ 358.288491] DMA free:23856kB min:2688kB low:7020kB high:7692kB
active_anon:88860kB inactive_anon:89928kB active_file:60kB
inactive_file:92kB unevictable:0kB isolated(anon):0kB
isolated(file):0kB present:782336kB managed:627212kB mlocked:0kB
dirty:0kB writeback:0kB mapped:206996kB shmem:89944kB
slab_reclaimable:12984kB slab_unreclaimable:25812kB
kernel_stack:9552kB pagetables:1556kB unstable:0kB bounce:0kB
free_pcp:0kB local_pcp:0kB free_cma:20812kB writeback_tmp:0kB
pages_scanned:0 all_unreclaimable? no
[ 358.312859] lowmem_reserve[]: 0 0 3283 3283
[ 358.315307] HighMem free:1949088kB min:512kB low:23908kB
high:27536kB active_anon:896808kB inactive_anon:1256kB
active_file:221312kB inactive_file:292232kB unevictable:0kB
isolated(anon):0kB isolated(file):0kB present:3386368kB
managed:3386368kB mlocked:0kB dirty:20kB writeback:0kB mapped:375880kB
shmem:2416kB slab_reclaimable:0kB slab_unreclaimable:0kB
kernel_stack:0kB pagetables:21756kB unstable:0kB bounce:0kB
free_pcp:792kB local_pcp:120kB free_cma:22300kB writeback_tmp:0kB
pages_scanned:0 all_unreclaimable? no
[ 358.342671] lowmem_reserve[]: 0 0 0 0
[ 358.344770] DMA: 935*4kB (MEC) 502*8kB (UMEC) 114*16kB (UMC)
29*32kB (UMC) 8*64kB (C) 1*128kB (C) 0*256kB 0*512kB 0*1024kB 0*2048kB
3*4096kB (C) = 23436kB
[ 358.354477] HighMem: 9863*4kB (UMC) 5032*8kB (UMC) 1566*16kB (UMC)
501*32kB (UMC) 233*64kB (UMC) 48*128kB (UMC) 20*256kB (UMC) 9*512kB
(UM) 4*1024kB (MC) 0*2048kB 438*4096kB (MC) = 1949724kB
[ 358.366813] 151233 total pagecache pages
[ 358.369254] 0 pages in swap cache
[ 358.371281] Swap cache stats: add 0, delete 0, find 0/0
[ 358.378158] Free swap = 0kB
[ 358.382138] Total swap = 0kB
[ 358.383621] 1042176 pages RAM
[ 358.394011] 846592 pages HighMem/MovableOnly
[ 358.402905] platform dabr_udc.0: SETUP : ff.ff vffff i0000 l0 DATA_IN
[ 358.402989] 38781 pages reserved
[ 358.402991] 49152 pages cma reserve
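
The per-order free list for the DMA zone in the dump above can be
summed with a small script, which also shows how much of it sits in
blocks tagged only (C). This is a rough sketch: the line is pasted in
by hand with the mail wrapping removed, the helper names are made up
for illustration, and it assumes the (C) tag marks CMA pageblocks.

```python
import re

# DMA free-list line from the dump above, mail line-wrapping removed.
DMA_LINE = (
    "DMA: 935*4kB (MEC) 502*8kB (UMEC) 114*16kB (UMC) 29*32kB (UMC) "
    "8*64kB (C) 1*128kB (C) 0*256kB 0*512kB 0*1024kB 0*2048kB "
    "3*4096kB (C) = 23436kB"
)

# Matches "<count>*<size>kB" with an optional "(<types>)" tag after it.
ENTRY = re.compile(r"(\d+)\*(\d+)kB(?: \((\w+)\))?")


def parse_free_list(line):
    """Return a list of (count, size_kb, types) tuples, one per order."""
    return [(int(c), int(s), t) for c, s, t in ENTRY.findall(line)]


def total_kb(entries):
    """Sum of all free blocks, in kB."""
    return sum(count * size for count, size, _ in entries)


def cma_only_kb(entries):
    """Sum of the blocks whose only migrate-type tag is C, in kB."""
    return sum(count * size for count, size, types in entries if types == "C")


if __name__ == "__main__":
    entries = parse_free_list(DMA_LINE)
    print("total:", total_kb(entries), "kB")
    print("CMA-only orders:", cma_only_kb(entries), "kB")
```

The total agrees with the 23436kB printed at the end of that line. Every
block of 64kB or larger in the DMA zone is tagged (C) only.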


Things I've tried (that have failed or come up with no clues):

- I've attempted both removing reserved memory carve outs from our
device tree (normally there for other cores on the SoC), and adjusting
vmalloc size to provide more low memory to the system. I'm able to
grant the kernel about 100MB extra low mem, but the problem still
occurs.

- kmemleak does not show any obvious issues (I thought it might since
the extra 100MB gets swallowed up).

- Ran some tests looking at ftrace for kmem_cache_alloc, kmalloc, and
friends on both non-LPAE and LPAE configurations. I'm not seeing any
obvious differences between the two.

Seems strange that order 1 or 0 allocations would fail when we're
reporting more than 20MB of free low mem.
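
One way to read the numbers from the DMA zone line in the dump is to
subtract free_cma from free before comparing against the watermarks.
As far as I understand the 4.4 watermark check, free CMA pages are
discounted for requests that are not movable. The sketch below is
arithmetic on the values from the dump only. It is a simplification
that ignores lowmem_reserve and the allocation order, and it is not
the kernel's actual code path.

```python
# Values copied from the "DMA" zone line of the dump, all in kB.
DMA_ZONE = {
    "free": 23856,
    "free_cma": 20812,
    "min": 2688,
    "low": 7020,
    "high": 7692,
}


def non_cma_free_kb(zone):
    """Free memory in the zone that is not inside the CMA area."""
    return zone["free"] - zone["free_cma"]


def watermarks_met(zone):
    """Compare non-CMA free memory against each watermark.

    Simplified: ignores lowmem_reserve and the allocation order.
    """
    usable = non_cma_free_kb(zone)
    return {name: usable > zone[name] for name in ("min", "low", "high")}


if __name__ == "__main__":
    print("non-CMA free:", non_cma_free_kb(DMA_ZONE), "kB")
    for name, ok in watermarks_met(DMA_ZONE).items():
        print(name, "watermark met:", ok)
```

By this reading, only 3044kB of the 23856kB free in the DMA zone lies
outside CMA, which is just above min (2688kB) and well below low
(7020kB).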

I'm about out of ideas for debugging this, aside from going through
kernel/mm line by line or trying to understand each of the
CONFIG_ARM_LPAE changes.

Any suggestions would be appreciated.

Regards,
Brian