Re: OOM Killer and add_to_page_cache_locked

From: Michal Hocko
Date: Fri Jun 07 2013 - 11:36:40 EST


On Fri 07-06-13 17:13:55, Piotr Nowojski wrote:
> On 06.06.2013 17:57, Michal Hocko wrote:
> >>>In our system we have hit a very annoying situation (bug?) with
> >>>cgroups. I'm writing to you because I found your posts on
> >>>mailing lists on a similar topic. Maybe you could help us or point
> >>>us in a direction where to look or ask.
> >>>
> >>>We have a system with ~15GB RAM (+2GB swap), and we are running ~10
> >>>heavy IO processes. Each process constantly uses 200-210MB RAM
> >>>(RSS) and a lot of page cache. All processes are in a cgroup with
> >>>the following limits:
> >>>
> >>>/sys/fs/cgroup/taskell2 $ cat memory.limit_in_bytes memory.memsw.limit_in_bytes
> >>>14183038976
> >>>15601344512
> >I assume that memory.use_hierarchy is 1, right?
> The system has been rebooted since the last test, so I cannot be 100%
> sure that it was set, but it should have been. I'm currently
> rerunning the scenario that led to the described problem with:
>
> /sys/fs/cgroup/taskell2# cat memory.use_hierarchy ../memory.use_hierarchy
> 1
> 0

OK, good. Your numbers suggest that the hierarchy _is_ in use. I just
wanted to be 100% sure.
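For reference, the two limits quoted earlier can be decoded with a little arithmetic. The Python sketch below is a hypothetical illustration, not something from the thread: it only reuses the byte values from the `cat` output, assumes cgroup v1 semantics (memory.limit_in_bytes caps RAM charged to the group, page cache included, while memory.memsw.limit_in_bytes caps RAM plus swap) and assumes 4 KiB pages. It does not read anything from /sys/fs/cgroup.

```python
# Hypothetical helper: interpret the cgroup v1 memory limits quoted in this
# thread. Values are copied from the `cat` output; nothing is read from sysfs.
PAGE_SIZE = 4096          # assumption: 4 KiB pages (typical on x86-64)
MIB = 1024 * 1024

limit_in_bytes = 14183038976        # memory.limit_in_bytes: RAM incl. page cache
memsw_limit_in_bytes = 15601344512  # memory.memsw.limit_in_bytes: RAM + swap

# Because the memsw limit bounds memory + swap, this difference is how much
# swap the group can still use while its RAM usage sits at the RAM limit.
swap_headroom = memsw_limit_in_bytes - limit_in_bytes

for name, value in [
    ("memory.limit_in_bytes", limit_in_bytes),
    ("memory.memsw.limit_in_bytes", memsw_limit_in_bytes),
    ("swap headroom", swap_headroom),
]:
    print(f"{name:28} {value:>12} B = {value / MIB:8.1f} MiB "
          f"({value // PAGE_SIZE} pages)")
```

With these values the group may use about 13.2 GiB of RAM and, once it is at that level, roughly 1.3 GiB of swap, on a machine with ~15GB RAM and 2GB swap.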

[...]
> >The core thing to find out is why the hard limit reclaim is not able to
> >free anything. Unfortunately we do not have memcg reclaim statistics, so
> >it will be a bit harder. I would start with the above patch first and
> >then I can prepare some debugging patches for you.
> I will try a 3.6 (probably 3.7) kernel after the weekend - unfortunately

I would simply try 3.9 (stable) and skip those two.

> repeating the whole scenario takes 10-30 hours because of the very
> slowly growing page cache.

OK, this is good to know.

> >Also, does 3.4 vanilla (or the stable kernel) behave the same way? Does
> >the current vanilla kernel behave the same way?
> I don't know; we are using the standard kernel that comes with Ubuntu.

Yes, but I guess Ubuntu, like any other distro, puts some patches on top
of the vanilla kernel.

> >Finally, have you been seeing the issue for a longer time, or did it
> >start showing up only now?
> >
> This system is very new. We started testing the scenario which
> triggered the OOM about one week ago, and we hit this issue
> immediately. Previously, with different scenarios and different
> memory usage by the processes, we didn't have this issue.

OK

--
Michal Hocko
SUSE Labs