Re: [v9 3/5] mm, oom: cgroup-aware OOM killer

From: Michal Hocko
Date: Wed Oct 04 2017 - 05:29:50 EST

On Tue 03-10-17 07:35:59, Tejun Heo wrote:
> Hello, Michal.
> On Tue, Oct 03, 2017 at 04:22:46PM +0200, Michal Hocko wrote:
> > On Tue 03-10-17 15:08:41, Roman Gushchin wrote:
> > > On Tue, Oct 03, 2017 at 03:36:23PM +0200, Michal Hocko wrote:
> > [...]
> > > > I guess we want to inherit the value on the memcg creation but I agree
> > > > that enforcing parent setting is weird. I will think about it some more
> > > > but I agree that it is saner to only enforce per memcg value.
> > >
> > > I'm not against, but we should come up with a good explanation, why we're
> > > inheriting it; or not inherit.
> >
> > Inheriting sounds like a less surprising behavior. Once you opt in for
> > oom_group you can expect that descendants are going to assume the same
> > unless they explicitly state otherwise.
> Here's a counter example.
> Let's say there's a container which hosts one main application, and
> the container shares its host with other containers.
> * Let's say the container is a regular containerized OS instance and
> can't really guarantee system integrity if one of its processes gets
> randomly killed.
> * However, the application that it's running inside an isolated cgroup
> is more intelligent and composed of multiple interchangeable
> processes and can treat killing of a random process as partial
> capacity loss.
> When the host is setting up the outer container, it doesn't
> necessarily know whether the containerized environment would be able
> to handle partial OOM kills or not. It's akin to panic_on_oom setting
> at system level - it's the containerized instance itself which knows
> whether it can handle partial OOM kills or not. This is why this knob
> should be delegatable.
> Now, the container itself has group OOM set and the isolated main
> application is starting up. It obviously wants partial OOM kills
> rather than group killing. This is the same principle. The
> application which is being contained in the cgroup is the one which
> knows how it can handle OOM conditions, not the outer environment, so
> it obviously needs to be able to set the configuration it wants.

Yes, this makes a lot of sense. On the other hand, we used to copy other
reclaim-specific attributes like swappiness and oom_kill_disable.

I guess we should be OK with "non-hierarchical" behavior when it is
documented properly so that there are no surprises.
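
To make the difference concrete, here is a toy model of the two semantics
discussed in this thread. It is plain Python, not kernel code, and ToyMemcg,
create_child and both kills_* helpers are names made up for illustration
only. It assumes the value is copied from the parent at creation time and
that a child may override it afterwards, as in the container example quoted
above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToyMemcg:
    """Hypothetical stand-in for a memory cgroup; not the kernel structure."""
    name: str
    parent: Optional["ToyMemcg"] = None
    oom_group: bool = False


def create_child(parent: ToyMemcg, name: str, inherit: bool) -> ToyMemcg:
    # Inheritance here means only "copy the parent's value at creation".
    # The child can still change its own setting later.
    return ToyMemcg(name, parent, parent.oom_group if inherit else False)


def kills_whole_group(cg: ToyMemcg) -> bool:
    # Non-hierarchical: only the cgroup's own value is consulted.
    return cg.oom_group


def kills_whole_group_enforced(cg: ToyMemcg) -> bool:
    # Hierarchical enforcement: any ancestor with oom_group set wins.
    node: Optional[ToyMemcg] = cg
    while node is not None:
        if node.oom_group:
            return True
        node = node.parent
    return False


# The outer container opts in to group kill; the application inside it
# inherits that value on creation and then explicitly opts out.
container = ToyMemcg("container", oom_group=True)
app = create_child(container, "app", inherit=True)
app.oom_group = False
```

With non-hierarchical evaluation the application's opt-out is respected,
while parent enforcement would override it, which is the case the counter
example argues against.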

Michal Hocko