Re: OOM Killer and add_to_page_cache_locked

From: Michal Hocko
Date: Tue Jun 11 2013 - 05:49:42 EST


On Tue 11-06-13 10:35:01, Piotr Nowojski wrote:
> On 07.06.2013 17:36, Michal Hocko wrote:
> >On Fri 07-06-13 17:13:55, Piotr Nowojski wrote:
> >>On 06.06.2013 17:57, Michal Hocko wrote:
> >>>>>In our system we have hit some very annoying situation (bug?) with
> >>>>>cgroups. I'm writing to you, because I have found your posts on
> >>>>>mailing lists with similar topic. Maybe you could help us or point
> >>>>>some direction where to look for/ask.
> >>>>>
> >>>>>We have system with ~15GB RAM (+2GB SWAP), and we are running ~10
> >>>>>heavy IO processes. Each process is using constantly 200-210MB RAM
> >>>>>(RSS) and a lot of page cache. All processes are in cgroup with
> >>>>>following limits:
> >>>>>
> >>>>>/sys/fs/cgroup/taskell2 $ cat memory.limit_in_bytes memory.memsw.limit_in_bytes
> >>>>>14183038976
> >>>>>15601344512
> >>>I assume that memory.use_hierarchy is 1, right?
> >>The system has been rebooted since the last test, so I cannot be 100%
> >>sure that it was set, but it should have been. Currently I'm
> >>rerunning the scenario that led to the described problem with:
> >>
> >>/sys/fs/cgroup/taskell2# cat memory.use_hierarchy ../memory.use_hierarchy
> >>1
> >>0
> >OK, good. Your numbers suggest that the hierarchy _is_ in use. I just
> >wanted to be 100% sure.
> >
>
> I don't know what solved this problem, but we weren't able to
> reproduce it over the whole weekend. Most likely there was
> some problem with our code initializing the cgroups configuration
> regarding use_hierarchy (can writing 1 to memory.use_hierarchy
> silently fail?).

No, it complains with EINVAL or EBUSY, but maybe you have tripped over
the bash built-in echo, which doesn't return error codes properly AFAIR.
Always make sure you use /bin/echo. If you are doing initialization in
parallel, then this might indeed race, and use_hierarchy would fail to be
set to 1 if any children have been created in the meantime.
But again, your numbers suggested that the parent group collected
charges from children, so this would be rather unexpected.
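
For initialization code that does not go through a shell at all, the same
check can be done directly. The following is only a minimal sketch, not
tested against a real cgroup mount: the function name set_use_hierarchy
is hypothetical, and it assumes the cgroup v1 memory controller layout
used in this thread. It writes with a raw write(2) so that an EINVAL or
EBUSY from the kernel is reported, then reads the value back to catch a
lost race.

```python
import errno
import os


def set_use_hierarchy(cgroup_dir):
    """Write "1" to memory.use_hierarchy in cgroup_dir and verify it stuck.

    Hypothetical helper for illustration; cgroup_dir is assumed to be a
    cgroup v1 memory controller directory.
    """
    path = os.path.join(cgroup_dir, "memory.use_hierarchy")
    fd = os.open(path, os.O_WRONLY)
    try:
        # os.write() is a direct write(2) call, so a refusal by the kernel
        # surfaces here as OSError instead of being lost in a buffer.
        os.write(fd, b"1")
    except OSError as exc:
        if exc.errno in (errno.EINVAL, errno.EBUSY):
            raise RuntimeError(
                "kernel refused use_hierarchy=1 for %s: %s" % (path, exc)
            ) from exc
        raise
    finally:
        os.close(fd)

    # Read the value back so that a silently ignored write is caught now
    # rather than discovered later from unexpected charging behaviour.
    with open(path) as f:
        value = f.read().strip()
    if value != "1":
        raise RuntimeError("%s is %r, expected '1'" % (path, value))
    return value
```

Calling set_use_hierarchy("/sys/fs/cgroup/taskell2") before any child
groups are created, and again as an assertion after initialization, would
make a failed or raced write visible immediately.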

> I have added assertions that check this parameter before starting
> and after initialization of our application. If the problem recurs, I
> will proceed as you suggested before and try the latest kernels.
>
> Thanks, Piotr Nowojski

--
Michal Hocko
SUSE Labs