Re: [v6 2/4] mm, oom: cgroup-aware OOM killer

From: David Rientjes
Date: Thu Aug 31 2017 - 16:02:03 EST


On Thu, 31 Aug 2017, Roman Gushchin wrote:

> So, it looks to me that we're close to an acceptable version,
> and the only remaining question is the default behavior
> (when oom_group is not set).
>

Nit: without knowledge of the implementation, I still don't think I would
know what an "out of memory group" is. Out of memory doesn't necessarily
imply a kill. I suggest oom_kill_all or something that includes the verb.

> Michal suggests to ignore non-oom_group memcgs, and compare tasks with
> memcgs with oom_group set. This makes the whole thing completely opt-in,
> but then we probably need another knob (or value) to select between
> "select memcg, kill biggest task" and "select memcg, kill all tasks".

It seems like that would either bias toward or bias against cgroups that
opt-in. I suggest comparing memory cgroups at each level in the hierarchy
based on your new badness heuristic, regardless of any tunables it has
enabled. Then kill either the largest process or all the processes
attached depending on oom_group or oom_kill_all.

> Also, as the whole thing is based on comparison between processes and
> memcgs, we probably need oom_priority for processes.

I think with the constraints of cgroup v2 that a victim memcg must first
be chosen, and then a victim process attached to that memcg must be chosen
or all eligible processes attached to it be killed, depending on the
tunable.

The simplest and most clear way to define this, in my opinion, is to
implement a heuristic that compares sibling memcgs based on usage, as you
have done. This can be overridden by a memory.oom_priority that userspace
defines, and is enough support for them to change victim selection (no
mount option needed, just set memory.oom_priority). Then kill the largest
process or all eligible processes attached. We only use per-process
priority to override process selection compared to sibling memcgs, but
with cgroup v2 process constraints it doesn't seem like that is within the
scope of your patchset.