Re: [PATCH] mm: khugepaged: recalculate min_free_kbytes after memory hotplug as expected by khugepaged

From: Vijay Balakrishna
Date: Thu Sep 17 2020 - 14:04:33 EST




On 9/17/2020 5:12 AM, Michal Hocko wrote:
> On Wed 16-09-20 11:28:40, Vijay Balakrishna wrote:
> [...]
>> OOM splat below. I see we had kmem leak detection turned on here. We
>> haven't run stress with kmem leak detection since uncovering the low
>> min_free_kbytes. During the investigation we wanted to make sure there
>> were no kmem leaks; we didn't find any significant leaks.

>> [330319.766059] systemd invoked oom-killer:
>> gfp_mask=0x40cc0(GFP_KERNEL|__GFP_COMP), order=1, oom_score_adj=0

> [...]
>> [330319.861064] Mem-Info:
>> [330319.863519] active_anon:60744 inactive_anon:109226 isolated_anon:0
>> active_file:6418 inactive_file:3869 isolated_file:2
>> unevictable:0 dirty:8 writeback:1 unstable:0
>> slab_reclaimable:34660 slab_unreclaimable:795718
>> mapped:1256 shmem:165765 pagetables:689 bounce:0
>> free:340962 free_pcp:4672 free_cma:0

> The memory consumption is predominantly in slab (unreclaimable). Only
> ~8% of the memory is on LRUs (anonymous + file). Slab (both reclaimable
> and unreclaimable) is ~40%. So there is still a lot of memory
> unaccounted for (direct users of the page allocator). This would partially
> explain why the oom killer is not able to make progress and eventually
> panics, because it is the kernel that is blowing up the memory consumption.
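
For reference, a quick back-of-the-envelope with the Mem-Info numbers above
(assuming a 4K page size and the 8GB of total memory mentioned below, i.e.
roughly 2,097,152 pages):

  LRU pages  = 60744 + 109226 + 6418 + 3869 = 180,257  (~8.6%)
  slab pages = 34660 + 795718               = 830,378  (~39.6%)

which matches the ~8% / ~40% split described above.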

> There is still ~1G of free memory but the problem is that this is a
> GFP_KERNEL request which is not allowed to consume Movable memory.
> Zone Normal is depleted and therefore it cannot satisfy this request
> even when there are some order-1 pages available.

>> [330319.928124] Node 0 Normal free:12652kB min:14344kB low:19092kB
>> high:23840kB active_anon:55340kB inactive_anon:60276kB active_file:60kB
>> inactive_file:128kB unevictable:0kB writepending:4kB present:6220656kB
>> managed:4750196kB mlocked:0kB kernel_stack:9568kB pagetables:2756kB
>> bounce:0kB free_pcp:10056kB local_pcp:1376kB free_cma:0kB
> [...]
>> [330319.996879] Node 0 Normal: 3138*4kB (UME) 38*8kB (UM) 0*16kB 0*32kB
>> 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 12856kB
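
To make the zone restriction concrete, here is a small userspace sketch (a
toy model, not the kernel's allocator code; the Movable zone numbers are
made-up placeholders). The order-1 GFP_KERNEL request cannot fall back to
Movable, and zone Normal, where it would have to be satisfied here, is below
its min watermark with only ~300kB sitting in order>=1 blocks:

#include <stdbool.h>
#include <stdio.h>

/*
 * Toy model of the situation above, NOT the kernel's allocator code:
 * an order-1 GFP_KERNEL allocation is restricted to zone Normal and must
 * keep that zone above its min watermark, so it fails even though the
 * node as a whole still has plenty of free memory (mostly in Movable).
 */
struct zone_state {
	const char *name;
	long free_kb;       /* total free memory in the zone */
	long min_kb;        /* min watermark */
	long order1_kb;     /* free memory sitting in order>=1 blocks */
	bool gfp_kernel_ok; /* may a GFP_KERNEL request use this zone? */
};

/* Is there room for an 8kB (order-1) block without dipping below min? */
static bool has_order1_room(const struct zone_state *z)
{
	return z->free_kb - 8 >= z->min_kb && z->order1_kb >= 8;
}

int main(void)
{
	/* "Normal" numbers come from the Node 0 Normal lines above
	 * (38 order-1 blocks = 304kB); "Movable" numbers are hypothetical. */
	struct zone_state zones[] = {
		{ "Normal",  12652,   14344, 304,    true  },
		{ "Movable", 1300000, 4000,  900000, false },
	};

	for (int i = 0; i < 2; i++) {
		const struct zone_state *z = &zones[i];
		printf("%-8s room for order-1: %-3s  usable by GFP_KERNEL: %s\n",
		       z->name,
		       has_order1_room(z) ? "yes" : "no",
		       z->gfp_kernel_ok ? "yes" : "no");
	}
	return 0;
}

Run as-is it reports that no zone is both usable by GFP_KERNEL and above its
watermark, which is why the allocator ends up in the OOM path despite the
node still showing over 1GB free.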

> I do not see the state of swap in the oom splat so I assume you have
> swap disabled. If that is the case then the memory reclaim cannot really
> do much for this request. There is almost no page cache to reclaim.

No swap is configured in our system.

> That being said, I do not see how an increased min_free_kbytes could help
> in this particular OOM situation. If there is really any relation it is
> more of an unintended side effect.

I haven't had a chance to rerun stress with kmem leak detection to know whether we still see OOM kills after restoring min_free_kbytes.

> [...]
>> Extreme values can damage your system. Setting min_free_kbytes to an
>> extremely low value prevents the system from reclaiming memory, which can
>> result in system hangs and OOM-killing processes. However, setting
>> min_free_kbytes too high (for example, to 5–10% of total system memory)
>> causes the system to enter an out-of-memory state immediately, resulting in
>> the system spending too much time reclaiming memory.

> The auto tuned value should never reach such a low value as to cause
> problems.

>> The auto tuned value is incorrect post hotplug memory operation; in our
>> use case memory hot add occurs very early during boot.
> Define incorrect. What are the actual values? Have you tried to increase
> the value manually after the hotplug?

In our case, on a SoC with 8GB of memory, the system tuned min_free_kbytes
as follows:
- first to 22528
- then we perform memory hot add very early in boot
- now min_free_kbytes is 8703
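
For what it is worth, both numbers look consistent with the two tuning paths
involved. A rough back-of-the-envelope (assuming a 4K page size, 2MB
pageblocks, khugepaged counting a single eligible zone, and ~4.7GB of
lowmem; treat it as an estimate rather than an exact trace):

  khugepaged's set_recommended_min_free_kbytes():
    pageblock(512 pages) * zones(1) * (2 + 3*3) = 5632 pages = 22528 kB

  init_per_zone_wmark_min() after the hot add:
    int_sqrt(16 * lowmem_kbytes) ~= sqrt(16 * 4,734,000) ~= 8703 kB

So the hotplug path appears to recompute min_free_kbytes from sqrt(lowmem)
and clobber the higher value khugepaged had established, which is what this
patch addresses.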

Before looking at the code, I first manually restored min_free_kbytes soon after boot, reran stress, and didn't notice the symptoms I mentioned in the change log.
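
In case it is useful for reproducing, the manual restore amounts to writing
the old value back to /proc/sys/vm/min_free_kbytes early in boot (or using
"sysctl -w vm.min_free_kbytes=22528"). A minimal C helper, with the 22528
value from above used as the default purely as an example:

#include <stdio.h>
#include <stdlib.h>

/* Write a value (default 22528) to /proc/sys/vm/min_free_kbytes,
 * equivalent to "sysctl -w vm.min_free_kbytes=22528". Needs root. */
int main(int argc, char **argv)
{
	const char *val = argc > 1 ? argv[1] : "22528";
	FILE *f = fopen("/proc/sys/vm/min_free_kbytes", "w");

	if (!f) {
		perror("open /proc/sys/vm/min_free_kbytes");
		return EXIT_FAILURE;
	}
	fprintf(f, "%s\n", val);
	if (fclose(f) != 0) {
		perror("write min_free_kbytes");
		return EXIT_FAILURE;
	}
	return EXIT_SUCCESS;
}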

Thanks,
Vijay