Re: [PATCH] mm/page_owner: print largest memory consumer when OOM panic occurs

From: Qian Cai
Date: Mon Dec 23 2019 - 07:32:55 EST




> On Dec 23, 2019, at 6:33 AM, Miles Chen <miles.chen@xxxxxxxxxxxx> wrote:
>
> Motivation:
> -----------
>
> When debugging an OOM kernel panic, it is difficult to know how much
> memory was allocated by kernel drivers or vmalloc() just by checking the
> Mem-Info or Node/Zone info. For example:
>
> Mem-Info:
> active_anon:5144 inactive_anon:16120 isolated_anon:0
> active_file:0 inactive_file:0 isolated_file:0
> unevictable:0 dirty:0 writeback:0 unstable:0
> slab_reclaimable:739 slab_unreclaimable:442469
> mapped:534 shmem:21050 pagetables:21 bounce:0
> free:14808 free_pcp:3389 free_cma:8128
>
> Node 0 active_anon:20576kB inactive_anon:64480kB active_file:0kB
> inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
> mapped:2136kB dirty:0kB writeback:0kB shmem:84200kB shmem_thp: 0kB
> shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:0kB unstable:0kB
> all_unreclaimable? yes
>
> Node 0 DMA free:14476kB min:21512kB low:26888kB high:32264kB
> reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB
> active_file: 0kB inactive_file:0kB unevictable:0kB writepending:0kB
> present:1048576kB managed:952736kB mlocked:0kB kernel_stack:0kB
> pagetables:0kB bounce:0kB free_pcp:2716kB local_pcp:0kB free_cma:0kB
>
> The information above tells us the memory usage of the known memory
> categories, and we can check them for abnormally large numbers. However,
> if a memory leak cannot be observed in the categories above, we need to
> reproduce the issue with CONFIG_PAGE_OWNER enabled.
>
> It is possible to read the page owner information from coredump files.
> However, coredump files may not always be available, so my approach is
> to print out the largest page consumer when OOM kernel panic occurs.

Many patches like this, which help debug special cases, have been shot down in the past. I don't see much difference this time. If you are worried about a memory leak, enable kmemleak and then reproduce the issue. Otherwise, we will end up with too many heuristics just for debugging.

>
> The heuristic approach assumes that the OOM kernel panic is caused by
> allocations from a single backtrace. The assumption is not always true,
> but it worked in many cases in our testing.
>
> We have been testing this heuristic approach on Android devices since May 2019.
> In 38 internal OOM kernel panic reports:
>
> 31/38: can be analyzed by using existing information
> 7/38: need page owner information; the heuristic approach in this patch
> prints the correct backtraces of the abnormal memory allocations, with no
> need to reproduce the issues.
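The "single largest backtrace" heuristic described in the quoted patch can be illustrated with a small userspace sketch. It sums pages per allocation backtrace and reports the backtrace holding the most pages. This is not the patch's code. It assumes each allocation has already been reduced to a stack handle plus an allocation order. The names `struct page_record`, `struct stack_total`, `MAX_STACKS` and `largest_consumer()` are hypothetical, chosen only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified record: a handle identifying the allocation
 * backtrace, plus the allocation order (the record covers 2^order pages). */
struct page_record {
	unsigned int handle;
	unsigned int order;
};

/* Hypothetical per-backtrace accumulator, counted in pages. */
struct stack_total {
	unsigned int handle;
	unsigned long pages;
};

#define MAX_STACKS 64	/* arbitrary cap for this sketch */

/*
 * Sum pages per stack handle and return the handle with the largest total,
 * storing that total in *pages_out. With no records, returns 0 and sets
 * *pages_out to 0. Distinct handles beyond MAX_STACKS are ignored, and on a
 * tie the first handle seen wins.
 */
unsigned int largest_consumer(const struct page_record *recs, size_t n,
			      unsigned long *pages_out)
{
	struct stack_total totals[MAX_STACKS];
	size_t nstacks = 0, i, j;
	unsigned int best_handle = 0;
	unsigned long best_pages = 0;

	assert(recs != NULL || n == 0);

	for (i = 0; i < n; i++) {
		/* Linear lookup of this backtrace's accumulator. */
		for (j = 0; j < nstacks; j++)
			if (totals[j].handle == recs[i].handle)
				break;
		if (j == nstacks) {
			if (nstacks == MAX_STACKS)
				continue;	/* table full: skip this backtrace */
			totals[nstacks].handle = recs[i].handle;
			totals[nstacks].pages = 0;
			nstacks++;
		}
		totals[j].pages += 1UL << recs[i].order;
	}

	/* Pick the backtrace that holds the most pages. */
	for (j = 0; j < nstacks; j++) {
		if (totals[j].pages > best_pages) {
			best_pages = totals[j].pages;
			best_handle = totals[j].handle;
		}
	}

	*pages_out = best_pages;
	return best_handle;
}
```

For example, records {1,0}, {2,3}, {1,2}, {2,0} and {3,1} give handle 1 five pages, handle 2 nine pages and handle 3 two pages, so handle 2 is reported. A linear lookup keeps the sketch short. It also shows why the heuristic only helps when one backtrace dominates: several mid-sized leakers would each look unremarkable on their own.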