Re: [PATCH 3/3] mm: oom: show unreclaimable slab info when unreclaimable slabs > user memory

From: Yang Shi
Date: Fri Oct 06 2017 - 12:38:16 EST




On 10/6/17 2:37 AM, Michal Hocko wrote:
On Thu 05-10-17 05:29:10, Yang Shi wrote:
The kernel may panic when an OOM happens and there is no killable
process. Sometimes this is caused by huge unreclaimable slabs used by
the kernel.

Although kdump could help debug such a problem, kdump is not available
on all architectures and it may malfunction at times. And, since the
kernel is already panicking, it is worth capturing such information in
dmesg to aid troubleshooting.

Print out unreclaimable slab info (used size and total size) for each
cache whose actual memory usage is not zero (num_objs * size != 0) when
the amount of unreclaimable slabs is greater than total user memory
(LRU pages).

The output looks like:

Unreclaimable slab info:
Name                      Used        Total
rpc_buffers               31KB         31KB
rpc_tasks                  7KB          7KB
ebitmap_node            1964KB       1964KB
avtab_node              5024KB       5024KB
xfs_buf                 1402KB       1402KB
xfs_ili                  134KB        134KB
xfs_efi_item             115KB        115KB
xfs_efd_item             115KB        115KB
xfs_buf_item             134KB        134KB
xfs_log_item_desc        342KB        342KB
xfs_trans               1412KB       1412KB
xfs_ifork                212KB        212KB
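
For illustration only, the trigger condition and the per-cache row
described above can be sketched in userspace C. This is not the kernel
code: struct cache_stat, should_dump() and format_cache() are
hypothetical names made up for this sketch, and the page counts are
assumed to be supplied by the caller. Only the comparison, the
num_objs filter and the format string are taken from the patch.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-cache numbers the patch reads. */
struct cache_stat {
	const char *name;
	unsigned long active_objs;	/* objects currently in use */
	unsigned long num_objs;		/* objects allocated in total */
	unsigned long size;		/* object size in bytes */
};

/* Dump only when unreclaimable slab exceeds user (LRU) memory. */
static int should_dump(unsigned long unreclaimable_pages,
		       unsigned long lru_pages)
{
	return unreclaimable_pages > lru_pages;
}

/*
 * Format one row with the same format string as the patch.
 * Returns 0 and writes nothing when the cache holds no objects,
 * otherwise returns what snprintf() returns.
 */
static int format_cache(char *buf, size_t len, const struct cache_stat *c)
{
	if (c->num_objs == 0)
		return 0;

	return snprintf(buf, len, "%-17s %10luKB %10luKB", c->name,
			(c->active_objs * c->size) / 1024,
			(c->num_objs * c->size) / 1024);
}
```

A caller would check should_dump() once and then call format_cache()
for every cache, printing each row for which it returns non-zero.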

OK, this looks better. The naming is not the greatest but I will not
nitpick on this. I have one question though.


Signed-off-by: Yang Shi <yang.s@xxxxxxxxxxxxxxx>
[...]
+void dump_unreclaimable_slab(void)
+{
+	struct kmem_cache *s, *s2;
+	struct slabinfo sinfo;
+
+	/*
+	 * Here acquiring slab_mutex is risky since we don't prefer to get
+	 * sleep in oom path. But, without mutex hold, it may introduce a
+	 * risk of crash.
+	 * Use mutex_trylock to protect the list traverse, dump nothing
+	 * without acquiring the mutex.
+	 */
+	if (!mutex_trylock(&slab_mutex)) {
+		pr_warn("excessive unreclaimable slab but cannot dump stats\n");
+		return;
+	}
+
+	pr_info("Unreclaimable slab info:\n");
+	pr_info("Name Used Total\n");
+
+	list_for_each_entry_safe(s, s2, &slab_caches, list) {
+		if (!is_root_cache(s) || (s->flags & SLAB_RECLAIM_ACCOUNT))
+			continue;
+
+		memset(&sinfo, 0, sizeof(sinfo));

Why do you zero out the structure? All the fields you are printing are
filled out in get_slabinfo.

No special reason; it just wipes out potentially stale data on the stack.

Yang
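
To illustrate what the memset guards against, here is a minimal
userspace sketch. It is not the kernel code: struct demo_info,
demo_fill() and demo_read_extra() are hypothetical names, and the
"extra" field is an assumed example of a member that the filler does
not set. For the fields the patch actually prints, the point above
still holds that get_slabinfo fills them, so this only shows the
general defensive pattern.

```c
#include <string.h>

/* Hypothetical structure; "extra" is a field the filler skips. */
struct demo_info {
	unsigned long active_objs;
	unsigned long num_objs;
	unsigned long extra;
};

/* Fills only the two fields that the caller is going to print. */
static void demo_fill(struct demo_info *info)
{
	info->active_objs = 3;
	info->num_objs = 4;
}

/*
 * Without the memset, info.extra would hold whatever happened to be
 * on the stack. With it, the field is guaranteed to read as zero.
 */
static unsigned long demo_read_extra(void)
{
	struct demo_info info;

	memset(&info, 0, sizeof(info));
	demo_fill(&info);
	return info.extra;
}
```

If every field that is later read is always written by the filler, as
is the case for the fields printed in the patch, the memset is
redundant and can be dropped.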


+		get_slabinfo(s, &sinfo);
+
+		if (sinfo.num_objs > 0)
+			pr_info("%-17s %10luKB %10luKB\n", cache_name(s),
+				(sinfo.active_objs * s->size) / 1024,
+				(sinfo.num_objs * s->size) / 1024);
+	}
+	mutex_unlock(&slab_mutex);
+}
+
#if defined(CONFIG_MEMCG) && !defined(CONFIG_SLOB)
void *memcg_slab_start(struct seq_file *m, loff_t *pos)
{
--
1.8.3.1
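
The "try the lock, dump nothing on contention" approach in the patch
can be sketched in userspace with POSIX threads. This is only an
analogy, not the kernel code: demo_lock and demo_try_dump() are
hypothetical names, and pthread_mutex_trylock() stands in for the
kernel's mutex_trylock(). Note that the return conventions differ:
pthread_mutex_trylock() returns 0 on success, while the kernel's
mutex_trylock() returns 1 on success and 0 on contention.

```c
#include <pthread.h>
#include <stdio.h>

/* Hypothetical lock standing in for slab_mutex. */
static pthread_mutex_t demo_lock = PTHREAD_MUTEX_INITIALIZER;

/*
 * Never blocks. Returns 1 if the stats were dumped, or 0 if the lock
 * was busy and the dump was skipped.
 */
static int demo_try_dump(void)
{
	if (pthread_mutex_trylock(&demo_lock) != 0) {
		fprintf(stderr, "cannot dump stats: lock is busy\n");
		return 0;
	}

	/* The protected list walk would go here. */
	printf("dumping stats under the lock\n");

	pthread_mutex_unlock(&demo_lock);
	return 1;
}
```

The trade-off is the one the comment in the patch describes: the
caller never sleeps on the lock, at the cost of occasionally getting
no dump at all.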