Re: [question] how to figure out OOM reason? should dump slab/vmallocinfo when OOM?
From: David Rientjes
Date: Tue Jan 21 2014 - 15:41:52 EST
On Tue, 21 Jan 2014, Jianguo Wu wrote:
> > The problem is that slabinfo becomes excessively verbose and dumping it
> > all to the kernel log often causes important messages to be lost.
> > This is why we control things like the tasklist dump with a VM sysctl. It
> > would be possible to dump, say, the top ten slab caches with the highest
> > memory usage, but it will only be helpful for slab leaks. Typically there
> > are better debugging tools available than analyzing the kernel log; if you
> > see unusually high slab memory in the meminfo dump, you can enable it.
> >
>
> But, when OOM has happened, we can only use kernel log, slab/vmalloc info from proc
> is stale. Maybe we can dump slab/vmalloc with a VM sysctl, and only top 10/20 entries?
>
You could, but it's a tradeoff between how much to dump to a general
resource such as the kernel log and how many sysctls we add that control
every possible thing. Slab leaks would definitely be a minority of oom
conditions and you should normally be able to reproduce them by running
the same workload; just use slabtop(1) or manually inspect /proc/slabinfo
while such a workload is running for indicators. I don't think we want to
add the information by default, though, nor do we want to add sysctls to
control the behavior (you'd still need to reproduce the issue after
enabling it).
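
For the manual /proc/slabinfo inspection, something along these lines would
list the largest caches while the workload runs. It is only a sketch: it
assumes the version 2.1 slabinfo layout, estimates each cache's footprint as
num_slabs * pagesperslab * page size, and the function names are made up for
illustration.

```python
import os


def parse_slabinfo(text, page_size=4096):
    """Return [(cache_name, estimated_bytes)] from /proc/slabinfo text.

    Assumes the 2.1 layout:
      name active_objs num_objs objsize objperslab pagesperslab
        : tunables ... : slabdata active_slabs num_slabs sharedavail
    """
    caches = []
    for line in text.splitlines():
        # Skip the version banner, the column-header comment and blank lines.
        if not line.strip() or line.startswith(("slabinfo", "#")):
            continue
        head = line.split(":", 1)[0].split()
        tail = line.rsplit("slabdata", 1)
        if len(head) < 6 or len(tail) != 2:
            continue  # not a line in the layout we assumed
        slabdata = tail[1].split()
        if len(slabdata) < 2:
            continue
        pages_per_slab = int(head[5])
        num_slabs = int(slabdata[1])
        caches.append((head[0], num_slabs * pages_per_slab * page_size))
    return caches


def top_caches(text, n=10, page_size=4096):
    """Return the n caches with the largest estimated footprint."""
    caches = parse_slabinfo(text, page_size)
    return sorted(caches, key=lambda c: c[1], reverse=True)[:n]


def read_top_caches(path="/proc/slabinfo", n=10):
    """Read slabinfo from disk (usually needs root) and rank the caches."""
    with open(path) as f:
        return top_caches(f.read(), n, os.sysconf("SC_PAGE_SIZE"))
```

Calling read_top_caches() every few seconds and comparing the results should
make a steadily growing cache stand out; slabtop(1) gives much the same view
interactively.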

We are currently discussing userspace oom handlers, though, that would
allow you to run a process that would be notified and allowed to allocate
a small amount of memory on oom conditions. It would then be trivial to
dump any information you feel pertinent in userspace prior to killing
something. I like to inspect heap profiles for memory hogs while
debugging our malloc() issues, for example, and you could look more
closely at kernel memory.
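
As a rough idea of what such a handler might log for kernel memory, the sketch
below summarizes the kernel-side fields of /proc/meminfo. How the handler gets
notified is deliberately left out, since that interface is still under
discussion; the function names and the choice of fields are assumptions made
for illustration only.

```python
# Fields of /proc/meminfo that describe kernel-side memory usage.
KERNEL_FIELDS = ("Slab", "SReclaimable", "SUnreclaim",
                 "VmallocUsed", "KernelStack", "PageTables")


def parse_meminfo(text):
    """Map each /proc/meminfo field name to its number (kB for most fields)."""
    info = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        parts = value.split()
        if sep and parts and parts[0].isdigit():
            info[name.strip()] = int(parts[0])
    return info


def kernel_memory_report(text):
    """Return one line per kernel-memory field found, with its share of MemTotal."""
    info = parse_meminfo(text)
    total = info.get("MemTotal", 0)
    lines = []
    for field in KERNEL_FIELDS:
        if field in info:
            kb = info[field]
            pct = 100.0 * kb / total if total else 0.0
            lines.append(f"{field}: {kb} kB ({pct:.1f}% of MemTotal)")
    return lines


def report_from_proc(path="/proc/meminfo"):
    """Read meminfo from disk and build the report."""
    with open(path) as f:
        return kernel_memory_report(f.read())
```

A handler with only a small memory allowance would want to keep this kind of
reporting short, for instance by writing the lines straight to a preopened log
file.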

I'll cc you on future discussions of that feature.