On Tue, 23 May 2017, Konstantin Khlebnikov wrote:
> This is a worthwhile addition. Let's call it "oom_victim" for short.
> It allows locating the leaky parts if they are spread over sub-containers
> within a common limit.
> But it doesn't tell which limit caused the kill; for hierarchical limits
> this might not be so easy.
> I think oom_kill is better suited for automatic actions: restart the
> affected hierarchy, increase limits, etc.
> But oom_victim allows determining which container was affected by the
> global oom killer.
> So it's probably worth merging them and having the global killer also
> increment oom_kill for the victim memcg:
>
> 	if (!is_memcg_oom(oc)) {
> 		count_vm_event(OOM_KILL);
> 		mem_cgroup_count_vm_event(mm, OOM_KILL);
> 	} else
> 		mem_cgroup_event(oc->memcg, OOM_KILL);
Our complete solution includes a complementary memory.oom_kill_control
file that allows users to register for eventfd(2) notification when the
kernel oom killer kills a victim; we can do this because we have had
complete support for userspace oom handling for years.
When read, it exports three classes of information:
- the "total" (hierarchical) and "local" (memcg specific) number of oom
kills for system oom conditions (overcommit),
- the "total" and "local" number of oom kills for memcg oom conditions,
and
 - the total number of processes in the hierarchy for which an oom victim
   was reaped successfully, and the number for which reaping failed.
One benefit of this is that it prevents us from having to scrape the
kernel log for oom events, which has been troublesome in the past;
userspace can still easily do so when the eventfd triggers for the kill
notification.