Re: [PATCH 3/3] mm, oom: introduce memory.oom.group

From: Johannes Weiner
Date: Wed Aug 01 2018 - 13:48:06 EST


On Mon, Jul 30, 2018 at 11:01:00AM -0700, Roman Gushchin wrote:
> For some workloads an intervention from the OOM killer
> can be painful. Killing a random task can bring
> the workload into an inconsistent state.
>
> Historically, there are two common solutions for this
> problem:
> 1) enabling panic_on_oom,
> 2) using a userspace daemon to monitor OOMs and kill
> all outstanding processes.
>
> Both approaches have their downsides:
> rebooting on each OOM is an obvious waste of capacity,
> and handling all in userspace is tricky and requires
> a userspace agent, which will monitor all cgroups
> for OOMs.
>
> In most cases an in-kernel after-OOM cleaning-up
> mechanism can eliminate the necessity of enabling
> panic_on_oom. Also, it can simplify the cgroup
> management for userspace applications.
>
> This commit introduces a new knob for cgroup v2 memory
> controller: memory.oom.group. The knob determines
> whether the cgroup should be treated as a single
> unit by the OOM killer. If set, the cgroup and its
> descendants are killed together or not at all.
>
> To determine which cgroup has to be killed, we do
> traverse the cgroup hierarchy from the victim task's
> cgroup up to the OOMing cgroup (or root) and looking
> for the highest-level cgroup with memory.oom.group set.
>
> Tasks with the OOM protection (oom_score_adj set to -1000)
> are treated as an exception and are never killed.
>
> This patch doesn't change the OOM victim selection algorithm.
>
> Signed-off-by: Roman Gushchin <guro@xxxxxx>
> Cc: Michal Hocko <mhocko@xxxxxxxx>
> Cc: Johannes Weiner <hannes@xxxxxxxxxxx>
> Cc: David Rientjes <rientjes@xxxxxxxxxx>
> Cc: Tetsuo Handa <penguin-kernel@xxxxxxxxxxxxxxxxxxx>
> Cc: Tejun Heo <tj@xxxxxxxxxx>

The semantics make sense to me and the code is straight-forward. With
Michal's other feedback incorporated, please feel free to add:

Acked-by: Johannes Weiner <hannes@xxxxxxxxxxx>