Re: [v9 3/5] mm, oom: cgroup-aware OOM killer

From: Roman Gushchin
Date: Tue Oct 03 2017 - 10:38:51 EST


On Tue, Oct 03, 2017 at 04:22:46PM +0200, Michal Hocko wrote:
> On Tue 03-10-17 15:08:41, Roman Gushchin wrote:
> > On Tue, Oct 03, 2017 at 03:36:23PM +0200, Michal Hocko wrote:
> [...]
> > > I guess we want to inherit the value on the memcg creation but I agree
> > > that enforcing parent setting is weird. I will think about it some more
> > > but I agree that it is saner to only enforce per memcg value.
> >
> > I'm not against, but we should come up with a good explanation, why we're
> > inheriting it; or not inherit.
>
> Inheriting sounds like a less surprising behavior. Once you opt in for
> oom_group you can expect that descendants are going to assume the same
> unless they explicitly state otherwise.
>
> [...]
> > > > > > @@ -962,6 +968,48 @@ static void oom_kill_process(struct oom_control *oc, const char *message)
> > > > > > __oom_kill_process(victim);
> > > > > > }
> > > > > >
> > > > > > +static int oom_kill_memcg_member(struct task_struct *task, void *unused)
> > > > > > +{
> > > > > > + if (!tsk_is_oom_victim(task)) {
> > > > >
> > > > > How can this happen?
> > > >
> > > > We do start with killing the largest process, and then iterate over all tasks
> > > > in the cgroup. So, this check is required to avoid killing tasks which are
> > > > already in the termination process.
> > >
> > > Do you mean we have tsk_is_oom_victim && MMF_OOM_SKIP == T?
> >
> > No, just tsk_is_oom_victim. We're are killing the biggest task, and then _all_
> > tasks. This is a way to skip the biggest task, and do not kill it again.
>
> OK, I have missed that part. Why are we doing that actually? Why don't
> we simply do
> /* If oom_group flag is set, kill all belonging tasks */
> if (mem_cgroup_oom_group(oc->chosen_memcg))
> mem_cgroup_scan_tasks(oc->chosen_memcg, oom_kill_memcg_member,
> NULL);
>
> we are going to kill all the tasks anyway.

Well, the idea behind was that killing the biggest process give us better
chances to get out of global memory shortage and guarantee forward progress.
I can drop it, if it considered to be excessive.