Re: [patch -mm 0/4] mm, memcg: introduce oom policies
From: Roman Gushchin
Date: Wed Jan 17 2018 - 06:46:45 EST
On Tue, Jan 16, 2018 at 06:14:58PM -0800, David Rientjes wrote:
> There are three significant concerns about the cgroup aware oom killer as
> it is implemented in -mm:
>
> (1) allows users to evade the oom killer by creating subcontainers or
> using other controllers since scoring is done per cgroup and not
> hierarchically,
>
> (2) does not allow the user to influence the decisionmaking, such that
> important subtrees cannot be preferred or biased, and
>
> (3) unfairly compares the root mem cgroup using completely different
> criteria than leaf mem cgroups and allows wildly inaccurate results
> if oom_score_adj is used.
>
> This patchset aims to fix (1) completely and, by doing so, introduces a
> completely extensible user interface that can be expanded in the future.
>
> It eliminates the mount option for the cgroup aware oom killer entirely
> since it is now enabled through the root mem cgroup's oom policy.
>
> It eliminates a pointless tunable, memory.oom_group, that unnecessarily
> pollutes the mem cgroup v2 filesystem and is invalid when cgroup v2 is
> mounted with the "groupoom" option.
You're introducing a new oom_policy knob, which has two separate sets
of possible values for the root and non-root cgroups. I don't think
it aligns with the existing cgroup v2 design.
For the root cgroup it works exactly as mount option, and both "none"
and "cgroup" values have no meaning outside of the root cgroup. We can
discuss if a knob on root cgroup is better than a mount option, or not
(I don't think so), but it has nothing to do with oom policy as you
define it for non-root cgroups.
For non-root cgroups you're introducing "all" and "tree", and the _only_
difference is that in the "all" mode all processes will be killed, rather
than the biggest in the "tree". I find these names confusing, in reality
it's more "evaluate together and kill all" and "evaluate together and
kill one".
So, it's not really the fully hierarchical approach, which I thought,
you were arguing for. You can easily do the same with adding the third
value to the memory.groupoom knob, as I've suggested earlier (say, "disable,
"kill" and "evaluate"), and will be much less confusing.
Thanks!
Roman