Re: [PATCH 3/3] mm: oom: show unreclaimable slab info when unreclaimable slabs > user memory
From: David Rientjes
Date: Mon Oct 16 2017 - 20:16:07 EST

On Wed, 11 Oct 2017, Yang Shi wrote:
> @@ -161,6 +162,25 @@ static bool oom_unkillable_task(struct task_struct *p,
> return false;
> }
>
> +/*
> + * Print out unreclaimble slabs info when unreclaimable slabs amount is greater
> + * than all user memory (LRU pages)
> + */
> +static bool is_dump_unreclaim_slabs(void)
> +{
> +	unsigned long nr_lru;
> +
> +	nr_lru = global_node_page_state(NR_ACTIVE_ANON) +
> +		 global_node_page_state(NR_INACTIVE_ANON) +
> +		 global_node_page_state(NR_ACTIVE_FILE) +
> +		 global_node_page_state(NR_INACTIVE_FILE) +
> +		 global_node_page_state(NR_ISOLATED_ANON) +
> +		 global_node_page_state(NR_ISOLATED_FILE) +
> +		 global_node_page_state(NR_UNEVICTABLE);
> +
> +	return (global_node_page_state(NR_SLAB_UNRECLAIMABLE) > nr_lru);
> +}

I think this is an excessive requirement to meet before dumping potentially
very helpful information to the kernel log. On my 256GB system, this would
probably require >128GB of unreclaimable slab to trigger. If a single
leaking slab cache were to blame for this excessive usage, it would suffice
to print a single line showing the slab cache with the greatest memory
footprint.
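
To illustrate the "single line for the largest cache" idea, here is a
minimal userspace sketch. It is not kernel code: struct cache_stat and
largest_cache() are hypothetical names used only for illustration, and a
real implementation would have to walk the kernel's own slab cache list.

```c
#include <stddef.h>

/* Hypothetical userspace stand-in for per-cache accounting; not a kernel type. */
struct cache_stat {
	const char *name;
	unsigned long pages;	/* pages currently backing this cache */
};

/* Return the cache with the largest footprint, or NULL if the array is empty. */
static const struct cache_stat *largest_cache(const struct cache_stat *c,
					      size_t n)
{
	const struct cache_stat *max = NULL;
	size_t i;

	for (i = 0; i < n; i++)
		if (!max || c[i].pages > max->pages)
			max = &c[i];
	return max;
}
```

On a tie the first cache encountered is kept, which is enough to point at
a single leaker.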

The check also prevents us from diagnosing issues where reclaimable slab
isn't actually reclaimed as expected, so the scope is too narrow.

Previous iterations of this patchset were actually better because they
presented useful data without gating it on an excessive requirement that
covers only a very narrow scope.

Please simply dump statistics for all slab caches where the memory
footprint is greater than 5% of system memory.
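
The 5% filter could look roughly like the userspace sketch below. It is
only an illustration under stated assumptions: struct cache_stat,
over_five_percent() and dump_large_caches() are hypothetical names, the
footprint is counted in pages, and the kernel version would print through
its own logging helpers rather than fprintf().

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical userspace stand-in for per-cache accounting; not a kernel type. */
struct cache_stat {
	const char *name;
	unsigned long pages;	/* pages currently backing this cache */
};

/* Nonzero if the cache uses more than 5% (1/20) of total_pages. */
static int over_five_percent(unsigned long cache_pages,
			     unsigned long total_pages)
{
	/* Dividing the total avoids overflow from multiplying cache_pages. */
	return cache_pages > total_pages / 20;
}

/* Print one line per cache above the threshold; return how many were printed. */
static size_t dump_large_caches(FILE *out, const struct cache_stat *c,
				size_t n, unsigned long total_pages)
{
	size_t i, printed = 0;

	for (i = 0; i < n; i++) {
		if (!over_five_percent(c[i].pages, total_pages))
			continue;
		fprintf(out, "%-20s %lu pages\n", c[i].name, c[i].pages);
		printed++;
	}
	return printed;
}
```

On a 256GB system, 5% works out to 12.8GB per cache, so a single leaking
cache would be reported long before unreclaimable slab exceeds all user
memory.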