Re: [PATCH 3/2] oom: clear TIF_MEMDIE after oom_reaper managed to unmap the address space

From: Tetsuo Handa
Date: Thu Jan 28 2016 - 17:26:51 EST

Michal Hocko wrote:
> On Thu 28-01-16 20:24:36, Tetsuo Handa wrote:
> [...]
> > I like the OOM reaper approach but I can't agree on merging the OOM reaper
> > without providing a guaranteed last resort at the same time. If you do want
> > to start the OOM reaper as simple as possible (without being bothered by
> > a lot of possible corner cases), please pursue a guaranteed last resort
> > at the same time.
> I am getting tired of this level of argumentation. oom_reaper in its
> current form is a step forward. I have acknowledged there are possible
> improvements doable on top but I do not see them necessary for the core
> part being merged. I am not trying to rush this in because I am very
> well aware of how subtle and complex all the interactions might be.
> So please stop your "we must have it all at once" attitude. This is
> nothing we have to rush in. We are not talking about a regression which
> has to be absolutely fixed in a few days.

I'm not asking you to merge a perfect version of oom_reaper from the
beginning. I know that is too difficult. Instead, I'm asking you to allow
timeout-based approaches (shown below) as a temporary workaround,
because there are environments which cannot wait for oom_reaper to become
reliable enough. Would you please reply to the thread which proposed a
guaranteed last resort (shown below)?

Tetsuo Handa wrote:
> I consider phases for managing system-wide OOM events as follows.
> (1) Design and use a system with appropriate memory capacity in mind.
> (2) When (1) failed, the OOM killer is invoked. The OOM killer selects
> an OOM victim and allows that victim access to memory reserves by
> setting TIF_MEMDIE on it.
> (3) When (2) did not solve the OOM condition, start allowing all tasks
> access to memory reserves by your approach.
> (4) When (3) did not solve the OOM condition, start selecting more OOM
> victims by my approach.
> (5) When (4) did not solve the OOM condition, trigger the kernel panic.
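
To make the escalation in phases (2) to (5) concrete, here is a rough
userspace sketch of it as a state machine. It is not kernel code and not
taken from any posted patch: the enum, the function name, and the idea of
a single "timed_out" flag per phase are all hypothetical, and it assumes
that "did not solve the OOM condition" is detected by a timeout expiring.
Phase (1) is a design-time step, so it has no state here.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical phase names for illustration; none exist in the kernel. */
enum oom_phase {
	OOM_PHASE_KILL_VICTIM = 2,	/* (2) select a victim, set TIF_MEMDIE */
	OOM_PHASE_OPEN_RESERVES,	/* (3) all tasks may use memory reserves */
	OOM_PHASE_MORE_VICTIMS,		/* (4) select more OOM victims */
	OOM_PHASE_PANIC,		/* (5) guaranteed last resort */
};

/*
 * Return the phase to be in next. "timed_out" is an assumed signal that
 * the current phase failed to resolve the OOM condition in time. The
 * panic phase is terminal, so there is always a bounded number of steps.
 */
enum oom_phase oom_next_phase(enum oom_phase cur, bool timed_out)
{
	assert(cur >= OOM_PHASE_KILL_VICTIM && cur <= OOM_PHASE_PANIC);

	if (!timed_out || cur == OOM_PHASE_PANIC)
		return cur;
	return (enum oom_phase)(cur + 1);
}
```

The point of the sketch is only that every phase has a successor, so the
system cannot stay stuck in one phase forever if the timeout keeps firing.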