Re: OOM-killer and strange RSS value in 3.9-rc7

From: Michal Hocko
Date: Thu Apr 18 2013 - 13:55:22 EST


On Fri 19-04-13 00:55:31, Han Pingtian wrote:
> On Thu, Apr 18, 2013 at 07:17:36AM -0700, Michal Hocko wrote:
> > On Thu 18-04-13 18:15:41, Han Pingtian wrote:
> > > On Wed, Apr 17, 2013 at 07:19:09AM -0700, Michal Hocko wrote:
> > > > On Wed 17-04-13 17:47:50, Han Pingtian wrote:
> > > > > [ 5233.949714] Node 1 DMA free:3968kB min:7808kB low:9728kB high:11712kB active_anon:0kB inactive_anon:3584kB active_file:2240kB inactive_file:576kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:4194304kB managed:3854464kB mlocked:0kB dirty:64kB writeback:448kB mapped:0kB shmem:64kB slab_reclaimable:106496kB slab_unreclaimable:3654976kB kernel_stack:14912kB pagetables:18496kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:531 all_unreclaimable? yes
> > > >
> > > > This smells like either a slab-backed memory leak or something that went
> > > > crazy and allocated a huge amount of slab. You have 3.6G (out of 4G
> > > > available) of slab_unreclaimable. I would check /proc/slabinfo to see which
> > > > cache consumes that huge amount of memory.
> > >
> > > Thanks for your reply. But I cannot find any clues in the slabinfo:
> >
> > awk '{val=$3*$4; printf "%s %d\n", $1, val}' /proc/slabinfo | sort -k2 -n
> > says:
> > [...]
> > kmalloc-65536 41943040
> > kmemleak_object 112746000
> > pgtable-2^12 113246208
> > kmalloc-8192 122159104
> > kmalloc-32768 137887744
> > task_struct 241293920
> > kmalloc-2048 306446336
> > kmalloc-96 307652928
> > kmalloc-16384 516620288
> >
> Oh, I see. I only calculated "$2*$4" and got some small numbers. Thanks.

OK, this is interesting. Only 865M out of 3.5G slabs are on the partial
or full lists. I do not have much time to look at this more closely but
it would suggest that free slabs do not get returned to the system.
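
For reference, the active-versus-total comparison can be done in one pass.
This is only a sketch: the function name slab_usage is made up here, and it
assumes the slabinfo 2.x column order (name, active_objs, num_objs, objsize),
so that $2*$4 is the memory in objects that are in use and $3*$4 is the memory
in all allocated objects.

```shell
# Hypothetical helper: summarize a slabinfo-format file (default /proc/slabinfo).
# Assumes the slabinfo 2.x column order: name active_objs num_objs objsize ...
slab_usage() {
    awk '
        # Skip the version line and the "# name ..." header line.
        /^slabinfo/ || /^#/ { next }
        {
            active = $2 * $4    # bytes in objects currently in use
            total  = $3 * $4    # bytes in all allocated objects
            sum_active += active
            sum_total  += total
            # %.0f instead of %d so large byte counts are not truncated
            # by awk implementations with 32-bit integer formatting.
            printf "%s %.0f %.0f\n", $1, active, total
        }
        END { printf "TOTAL %.0f %.0f\n", sum_active, sum_total }
    ' "${1:-/proc/slabinfo}"
}
```

Something like "slab_usage | sort -k3 -n | tail" would then rank the caches by
total size, with the TOTAL line showing how far apart the two sums are.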

> > Hmm, how many processes do you have running? Having 240M in task_structs
> > sounds quite excessive. Also there seems to be quite a lot of memory used
> > in the generic 16K, 96B and 2K caches. The core kernel usually does not use
> > those on its own so I would be inclined to suspect some driver.
> There are 671 processes running and most of them are kernel threads, I
> think:

awk '{val=$2*$4; sum+=val; printf "%s %d\n", $1, val}' a | grep task_struct
task_struct 27080016

looks only slightly more reasonable: it is still way too high and
it doesn't seem to match the number of processes you see.
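
A rough sanity check of that mismatch, as a sketch only: the helper name
bytes_per_task is made up, and dividing by the ps count assumes one
task_struct per listed process, which ignores any additional threads.

```shell
# Hypothetical helper: average slab bytes per task (integer division).
# Usage: bytes_per_task BYTES TASKS
bytes_per_task() {
    echo $(( $1 / $2 ))
}

# Figures from this thread: active ($2*$4) task_struct bytes over the
# 671 processes reported by ps.
bytes_per_task 27080016 671

# The same with the total ($3*$4) task_struct figure.
bytes_per_task 241293920 671
```

That works out to about 40K per process for the active objects and about
360K per process for the total, so neither number fits 671 tasks.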

What is the kernel that you are using and what config?

> [root@riblp3 ~]# ps haux|wc -l
> 671
> [root@riblp3 ~]# ps haux|awk '{print $11}'|grep '^\['|wc -l
> 620
> [root@riblp3 ~]#
>
[...]
--
Michal Hocko
SUSE Labs