Re: [patch -mm v2 2/3] mm, memcg: replace cgroup aware oom killer mount option with tunable

From: Roman Gushchin
Date: Tue Jan 30 2018 - 07:14:00 EST


On Tue, Jan 30, 2018 at 01:08:52PM +0100, Michal Hocko wrote:
> On Tue 30-01-18 11:58:51, Roman Gushchin wrote:
> > On Tue, Jan 30, 2018 at 09:54:45AM +0100, Michal Hocko wrote:
> > > On Mon 29-01-18 11:11:39, Tejun Heo wrote:
> >
> > Hello, Michal!
> >
> > > diff --git a/Documentation/cgroup-v2.txt b/Documentation/cgroup-v2.txt
> > > index 2eaed1e2243d..67bdf19f8e5b 100644
> > > --- a/Documentation/cgroup-v2.txt
> > > +++ b/Documentation/cgroup-v2.txt
> > > @@ -1291,8 +1291,14 @@ This affects both system- and cgroup-wide OOMs. For a cgroup-wide OOM
> > > the memory controller considers only cgroups belonging to the sub-tree
> > > of the OOM'ing cgroup.
> > >
> > > -The root cgroup is treated as a leaf memory cgroup, so it's compared
> > > -with other leaf memory cgroups and cgroups with oom_group option set.
> > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> > IMO, this statement is important. Isn't it?
> >
> > > +Leaf cgroups are compared based on their cumulative memory usage. The
> > > +root cgroup is treated as a leaf memory cgroup as well, so it's
> > > +compared with other leaf memory cgroups. Due to internal implementation
> > > +restrictions the size of the root cgroup is a cumulative sum of
> > > +oom_badness of all its tasks (in other words oom_score_adj of each task
> > > +is obeyed). Relying on oom_score_adj (appart from OOM_SCORE_ADJ_MIN)
> > > +can lead to overestimating of the root cgroup consumption and it is
> >
> > Hm, and underestimating too. Also OOM_SCORE_ADJ_MIN isn't any different
> > in this case. Say, all tasks except a small one have OOM_SCORE_ADJ set to
> > -999, this means the root croup has extremely low chances to be elected.
> >
> > > +therefore discouraged. This might change in the future, though.
> >
> > Other than that looks very good to me.
>
> This?
>
> diff --git a/Documentation/cgroup-v2.txt b/Documentation/cgroup-v2.txt
> index 2eaed1e2243d..34ad80ee90f2 100644
> --- a/Documentation/cgroup-v2.txt
> +++ b/Documentation/cgroup-v2.txt
> @@ -1291,8 +1291,15 @@ This affects both system- and cgroup-wide OOMs. For a cgroup-wide OOM
> the memory controller considers only cgroups belonging to the sub-tree
> of the OOM'ing cgroup.
>
> -The root cgroup is treated as a leaf memory cgroup, so it's compared
> -with other leaf memory cgroups and cgroups with oom_group option set.
> +Leaf cgroups and cgroups with oom_group option set are compared based
> +on their cumulative memory usage. The root cgroup is treated as a
> +leaf memory cgroup as well, so it's compared with other leaf memory
> +cgroups. Due to internal implementation restrictions the size of
> +the root cgroup is a cumulative sum of oom_badness of all its tasks
> +(in other words oom_score_adj of each task is obeyed). Relying on
> +oom_score_adj (appart from OOM_SCORE_ADJ_MIN) can lead to over or
> +underestimating of the root cgroup consumption and it is therefore
> +discouraged. This might change in the future, though.

Acked-by: Roman Gushchin <guro@xxxxxx>

Thank you!