Re: [[PATCH]] mm: khugepaged: recalculate min_free_kbytes after memory hotplug as expected by khugepaged

From: Michal Hocko
Date: Thu Sep 17 2020 - 08:13:15 EST


On Wed 16-09-20 11:28:40, Vijay Balakrishna wrote:
[...]
> OOM splat below. I see we had kmem leak detection turned on here. We
> haven't run stress with kmem leak detection since uncovering the low
> min_free_kbytes. During the investigation we wanted to make sure there are
> no kmem leaks, and we didn't find any significant leaks.
>
> [330319.766059] systemd invoked oom-killer:
> gfp_mask=0x40cc0(GFP_KERNEL|__GFP_COMP), order=1, oom_score_adj=0

[...]
> [330319.861064] Mem-Info:
> [330319.863519] active_anon:60744 inactive_anon:109226 isolated_anon:0
> active_file:6418 inactive_file:3869 isolated_file:2
> unevictable:0 dirty:8 writeback:1 unstable:0
> slab_reclaimable:34660 slab_unreclaimable:795718
> mapped:1256 shmem:165765 pagetables:689 bounce:0
> free:340962 free_pcp:4672 free_cma:0

The memory consumption is predominantly in slab (unreclaimable). Only
~8% of the memory is on LRUs (anonymous + file). Slab (both reclaimable
and unreclaimable) is ~40%. So there is still a lot of memory
unaccounted for (direct users of the page allocator). This would partially
explain why the oom killer is not able to make progress and eventually
panics: it is the kernel itself that is blowing up the memory consumption.
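For anyone who wants to redo the arithmetic, here is a minimal Python sketch
that converts the Mem-Info page counts into MiB. It assumes 4kB pages, which
is what the order-0 entries in the buddy list of the splat show. The excerpt
does not include the total amount of RAM, so the sketch only reports absolute
sizes and leaves the percentages alone.

```python
# Page counts copied from the Mem-Info block of the OOM splat.
PAGE_KB = 4  # assumption: 4kB pages, as the "3138*4kB" buddy entry suggests

counters = {
    "active_anon": 60744,
    "inactive_anon": 109226,
    "active_file": 6418,
    "inactive_file": 3869,
    "slab_reclaimable": 34660,
    "slab_unreclaimable": 795718,
    "free": 340962,
}


def pages_to_mib(pages):
    """Convert a page count to MiB using the assumed page size."""
    return pages * PAGE_KB / 1024


# Memory on the LRU lists (anonymous + file).
lru_pages = sum(
    counters[name]
    for name in ("active_anon", "inactive_anon", "active_file", "inactive_file")
)
# Slab, both reclaimable and unreclaimable.
slab_pages = counters["slab_reclaimable"] + counters["slab_unreclaimable"]
free_pages = counters["free"]

for label, pages in (("LRU", lru_pages), ("slab", slab_pages), ("free", free_pages)):
    print(f"{label:5s} {pages:8d} pages  {pages_to_mib(pages):8.1f} MiB")
```

Slab comes out at several times the LRU total, and the free counter at a bit
over 1G, which matches the reading above.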

There is still ~1G of free memory but the problem is that this is a
GFP_KERNEL request which is not allowed to consume Movable memory.
Zone Normal is depleted (its free memory is below the min watermark) and
therefore it cannot satisfy this request even though there are some
order-1 pages available.

> [330319.928124] Node 0 Normal free:12652kB min:14344kB low:19092kB
> high:23840kB active_anon:55340kB inactive_anon:60276kB active_file:60kB
> inactive_file:128kB unevictable:0kB writepending:4kB present:6220656kB
> managed:4750196kB mlocked:0kB kernel_stack:9568kB pagetables:2756kB
> bounce:0kB free_pcp:10056kB local_pcp:1376kB free_cma:0kB
[...]
> [330319.996879] Node 0 Normal: 3138*4kB (UME) 38*8kB (UM) 0*16kB 0*32kB
> 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 12856kB
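The Zone Normal numbers can be cross-checked with a short Python sketch. It
is a deliberately simplified comparison of free memory against the min
watermark. The real allocator check also takes the lowmem reserve, the
allocation flags and the requested order into account, none of which is
modelled here. The variable names are made up for illustration.

```python
# Values copied from the "Node 0 Normal" lines of the OOM splat.
free_kb = 12652  # free:12652kB
min_kb = 14344   # min:14344kB

# Non-empty buddy free lists: block size in kB -> number of free blocks.
buddy = {4: 3138, 8: 38}

# Total free memory according to the buddy lists.
buddy_total_kb = sum(size_kb * count for size_kb, count in buddy.items())

# Simplified watermark test: is the zone below its min watermark?
below_min = free_kb < min_kb
shortfall_kb = min_kb - free_kb

# Order-1 blocks (8kB with 4kB pages) still sitting on the free list.
order1_blocks = buddy[8]

print(f"buddy total: {buddy_total_kb}kB")
print(f"below min watermark: {below_min} (short by {shortfall_kb}kB)")
print(f"free order-1 blocks: {order1_blocks}")
```

So order-1 blocks do exist, but the zone as a whole sits below min. The
buddy total differs slightly from the free: counter, presumably because the
two are not sampled at exactly the same moment.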

I do not see the state of swap in the oom splat so I assume you have
swap disabled. If that is the case then the memory reclaim cannot really
do much for this request. There is almost no page cache to reclaim.

That being said I do not see how an increased min_free_kbytes could help
for this particular OOM situation. If there is really any relation it is
more of an unintended side effect.

[...]
> > > Extreme values can damage your system. Setting min_free_kbytes to an
> > > extremely low value prevents the system from reclaiming memory, which can
> > > result in system hangs and OOM-killing processes. However, setting
> > > min_free_kbytes too high (for example, to 5–10% of total system memory)
> > > causes the system to enter an out-of-memory state immediately, resulting in
> > > the system spending too much time reclaiming memory.
> >
> > The auto tuned value should never reach such a low value to cause
> > problems.
>
> The auto tuned value is incorrect after a memory hotplug operation; in our
> use case memory hot add occurs very early during boot.

Define incorrect. What are the actual values? Have you tried to increase
the value manually after the hotplug?
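To collect the actual values, something like the following minimal Python
sketch could be run before and after the hotplug. It only reads
/proc/sys/vm/min_free_kbytes; the helper names are made up for illustration.
Writing a new value to the same file needs root and is not shown.

```python
from pathlib import Path

# Standard procfs location of the tunable.
MIN_FREE_KBYTES = Path("/proc/sys/vm/min_free_kbytes")


def parse_kbytes(text):
    """Parse the single integer (in kB) that the procfs file contains."""
    return int(text.strip())


def read_min_free_kbytes(path=MIN_FREE_KBYTES):
    """Return the current min_free_kbytes value in kB."""
    return parse_kbytes(Path(path).read_text())


if __name__ == "__main__":
    # Only attempt the read where procfs actually exposes the file.
    if MIN_FREE_KBYTES.exists():
        print(f"min_free_kbytes = {read_min_free_kbytes()} kB")
    else:
        print("min_free_kbytes is not available on this system")
```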

--
Michal Hocko
SUSE Labs