Re: [PATCH 5/5] mm, oom_reaper: implement OOM victims queuing

From: Michal Hocko
Date: Thu Feb 04 2016 - 09:54:06 EST


On Thu 04-02-16 19:49:29, Tetsuo Handa wrote:
[...]
> I think we want to rewrite this patch's description from a different point
> of view.
>
> As of "[PATCH 1/5] mm, oom: introduce oom reaper", we assumed that the OOM
> reaper would be used to manage OOM livelocks caused by system-wide OOM
> events. Therefore, the OOM reaper had high scheduling priority and we
> considered its side effects a reasonable constraint.
>
> But as the discussion progressed, we started trying to manage OOM livelocks
> caused by non system-wide OOM events (e.g. memcg OOM) with the OOM reaper
> as well. Therefore, the OOM reaper now has normal scheduling priority. For
> non system-wide OOM events, the side effects of the OOM reaper might not be
> a reasonable constraint. Some administrators might expect that the OOM
> reaper does not break coredumping unless the system is under a system-wide
> OOM event.

I am willing to discuss this as an option after we actually hear about a
_real_ usecase.

[...]

> But if we consider non system-wide OOM events, hitting this race is not so
> unlikely. This queue is useful for situations where memcg1 and memcg2 hit
> memcg OOM at the same time and victim1 in memcg1 cannot terminate
> immediately.

This can happen of course, but the likelihood is _much_ smaller without
the global OOM because the memcg OOM killer is invoked from a lockless
context, so the oom context cannot block the victim from proceeding.

> I expect parallel reaping (shown below) because there is no need to
> serialize victim tasks (e.g. to wait for reaping of victim1 in memcg1,
> which can take up to 1 second to complete, before starting to reap victim2
> in memcg2) if we implement this queue.

I would really prefer to go a simpler way first and extend the code when
we see that the current approach is insufficient for real-life loads.
Please do not get me wrong, of course the code can be enhanced in many
different ways and optimized for lots of pathological cases, but I really
believe that we should start with correctness first and only later care
about optimizing corner cases. Realistically, who really cares about
oom_reaper acting on the Nth completely stuck task after N seconds?
For now, my target is to guarantee that oom_reaper will _eventually_
process a queued task and reap its memory if the target hasn't exited
yet. I do not think this is an unreasonable goal...
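To make that concrete, here is a minimal sketch of the kind of queuing I
have in mind: the OOM killer pushes the victim onto a simple linked list
and a dedicated kthread pops and reaps entries one at a time. The names
(queue_oom_victim, oom_reaper_list, the task_struct link field,
oom_reap_task) are illustrative only, not necessarily what the patch ends
up using:

	/* Sketch only: a single-linked victim list drained by one kthread. */
	static struct task_struct *oom_reaper_list;	/* illustrative name */
	static DEFINE_SPINLOCK(oom_reaper_lock);
	static DECLARE_WAIT_QUEUE_HEAD(oom_reaper_wait);

	/* Called by the OOM killer after a victim has been selected.
	 * Real code also has to make sure the same task is not queued twice. */
	static void queue_oom_victim(struct task_struct *tsk)
	{
		get_task_struct(tsk);
		spin_lock(&oom_reaper_lock);
		tsk->oom_reaper_list = oom_reaper_list;	/* illustrative field */
		oom_reaper_list = tsk;
		spin_unlock(&oom_reaper_lock);
		wake_up(&oom_reaper_wait);
	}

	/* The reaper kthread: take one queued victim at a time and reap it. */
	static int oom_reaper(void *unused)
	{
		while (true) {
			struct task_struct *tsk = NULL;

			wait_event_freezable(oom_reaper_wait, oom_reaper_list != NULL);

			spin_lock(&oom_reaper_lock);
			if (oom_reaper_list) {
				tsk = oom_reaper_list;
				oom_reaper_list = tsk->oom_reaper_list;
			}
			spin_unlock(&oom_reaper_lock);

			if (tsk) {
				/* tear down the victim's address space */
				oom_reap_task(tsk);
				put_task_struct(tsk);
			}
		}
		return 0;
	}

A single kthread keeps the whole thing trivially serialized; if parallel
reaping ever turns out to matter for real-life workloads, it can be layered
on top of the same queue later.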

Thanks!
--
Michal Hocko
SUSE Labs