Re: [RESEND v12 0/6] cgroup-aware OOM killer

From: Johannes Weiner
Date: Fri Oct 27 2017 - 16:06:09 EST


On Thu, Oct 26, 2017 at 02:03:41PM -0700, David Rientjes wrote:
> On Thu, 26 Oct 2017, Johannes Weiner wrote:
>
> > > The nack is for three reasons:
> > >
> > > (1) unfair comparison of root mem cgroup usage to bias against that mem
> > > cgroup from oom kill in system oom conditions,
> > >
> > > (2) the ability of users to completely evade the oom killer by attaching
> > > all processes to child cgroups either purposefully or unpurposefully,
> > > and
> > >
> > > (3) the inability of userspace to effectively control oom victim
> > > selection.
> >
> > My apologies if my summary was too reductionist.
> >
> > That being said, the arguments you repeat here have come up in
> > previous threads and been responded to. This doesn't change my
> > conclusion that your NAK is bogus.
>
> They actually haven't been responded to, Roman was working through v11 and
> made a change on how the root mem cgroup usage was calculated that was
> better than previous iterations but still not an apples to apples
> comparison with other cgroups. The problem is that the calculation for
> leaf cgroups includes additional memory classes, so it biases against
> processes that are moved to non-root mem cgroups. Simply creating mem
> cgroups and attaching processes should not independently cause them to
> become more preferred: it should be a fair comparison between the root mem
> cgroup and the set of leaf mem cgroups as implemented. That is very
> trivial to do with hierarchical oom cgroup scoring.

There is absolutely no value in your repeating the same stuff over and
over again without considering what other people are telling you.

Hierarchical oom scoring has other downsides, and most of us agree
that they aren't preferable over the differences in scoring the root
vs scoring other cgroups - in particular because the root cannot be
controlled, doesn't even have local statistics, and so is unlikely to
contain important work on a containerized system. Getting the ballpark
right for the vast majority of usecases is more than good enough here.

> Since the ability of userspace to control oom victim selection is not
> addressed whatsoever by this patchset, and the suggested method cannot be
> implemented on top of this patchset as you have argued because it requires
> a change to the heuristic itself, the patchset needs to become complete
> before being mergeable.

It is complete. It just isn't a drop-in replacement for what you've
been doing out-of-tree for years. Stop making your problem everybody
else's problem.

You can change the heuristics later, as you have done before. Or
you can add another configuration flag and we can phase out the old
mode, like we do all the time.