Re: [v9 3/5] mm, oom: cgroup-aware OOM killer

From: Tejun Heo
Date: Tue Oct 03 2017 - 10:36:10 EST


Hello, Michal.

On Tue, Oct 03, 2017 at 04:22:46PM +0200, Michal Hocko wrote:
> On Tue 03-10-17 15:08:41, Roman Gushchin wrote:
> > On Tue, Oct 03, 2017 at 03:36:23PM +0200, Michal Hocko wrote:
> [...]
> > > I guess we want to inherit the value on the memcg creation but I agree
> > > that enforcing parent setting is weird. I will think about it some more
> > > but I agree that it is saner to only enforce per memcg value.
> >
> > I'm not against, but we should come up with a good explanation, why we're
> > inheriting it; or not inherit.
>
> Inheriting sounds like a less surprising behavior. Once you opt in for
> oom_group you can expect that descendants are going to assume the same
> unless they explicitly state otherwise.

Here's a counterexample.

Let's say there's a container which hosts one main application, and
the container shares its host with other containers.

* Let's say the container is a regular containerized OS instance and
can't really guarantee system integrity if one of its processes gets
randomly killed.

* However, the application that it's running inside an isolated cgroup
is more intelligent and composed of multiple interchangeable
processes and can treat killing of a random process as partial
capacity loss.

When the host is setting up the outer container, it doesn't
necessarily know whether the containerized environment would be able
to handle partial OOM kills or not. It's akin to panic_on_oom setting
at system level - it's the containerized instance itself which knows
whether it can handle partial OOM kills or not. This is why this knob
should be delegatable.

Now, the container itself has group OOM set and the isolated main
application is starting up. It obviously wants partial OOM kills
rather than group killing. This is the same principle. The
application which is being contained in the cgroup is the one which
knows how it can handle OOM conditions, not the outer environment, so
it obviously needs to be able to set the configuration it wants.
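
To make the scenario concrete, here is a rough toy model in Python. It
is not kernel code and not the patchset's interface: the Cgroup class,
add_child() and oom_victims() are made-up names for illustration. It
assumes that a child takes the parent's value at creation unless it
says otherwise, that only the victim's own cgroup setting is consulted,
and that a group kill covers the cgroup and its descendants.

```python
class Cgroup:
    """Toy cgroup: a name, its own oom_group flag, processes, children."""

    def __init__(self, name, oom_group=False):
        self.name = name
        self.oom_group = oom_group
        self.procs = []
        self.children = []

    def add_child(self, name, oom_group=None):
        # Assumption: take the parent's value at creation unless the child
        # states otherwise; nothing is enforced from above afterwards.
        value = self.oom_group if oom_group is None else oom_group
        child = Cgroup(name, value)
        self.children.append(child)
        return child

    def all_procs(self):
        # Processes in this cgroup and in all of its descendants.
        procs = list(self.procs)
        for child in self.children:
            procs.extend(child.all_procs())
        return procs


def oom_victims(cgroup, proc):
    # Consult only the setting of the cgroup the chosen process lives in.
    if cgroup.oom_group:
        return cgroup.all_procs()
    return [proc]


# The outer container asks for group kills; the application inside opts out.
container = Cgroup("container", oom_group=True)
container.procs = ["init", "sshd"]
app = container.add_child("app", oom_group=False)
app.procs = ["worker-1", "worker-2", "worker-3"]

print(oom_victims(app, "worker-2"))
print(oom_victims(container, "sshd"))
```

In this model, a kill inside the application cgroup takes out a single
worker, while a kill among the container's own processes takes out the
whole instance. If the parent's setting were enforced on descendants,
the first case could not be expressed at all.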

Thanks.

--
tejun