Re: [PATCH] mm,oom: Re-enable OOM killer using timers.

From: David Rientjes
Date: Thu Jan 14 2016 - 18:10:05 EST

On Thu, 14 Jan 2016, Johannes Weiner wrote:

> > This is where me and you disagree; the goal should not be to continue to
> > oom kill more and more processes since there is no guarantee that further
> > kills will result in forward progress. These additional kills can result
> > in the same livelock that is already problematic, and killing additional
> > processes has made the situation worse since memory reserves are more
> > depleted.
> >
> > I believe what is better is to exhaust reclaim, check if the page
> > allocator is constantly looping due to waiting for the same victim to
> > exit, and then allowing that allocation with memory reserves, see the
> > attached patch which I have proposed before.
>
> If giving the reserves to another OOM victim is bad, how is giving
> them to the *allocating* task supposed to be better?

Unfortunately, due to rss and oom priority, it is possible to repeatedly
select processes which are all waiting for the same mutex. This is
possible when loading shards, for example, where all processes have the
same oom priority and are livelocked on i_mutex, which is the most common
occurrence in our environments. The livelock came about because we
selected a process that could not make forward progress, and there is no
guarantee that we will not continue to select such processes.

Giving the page allocator access to memory reserves eventually allows all
allocators to successfully allocate, giving the holder of i_mutex the
ability to eventually drop it. This happens in a very rate-limited manner,
depending on how my patch defines when the page allocator has looped
enough waiting for the same process to exit.
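
To make the rate limiting concrete, here is a minimal user-space sketch of
the idea, not the patch itself. Every name in it (stall_tracker,
may_use_reserves, STALL_RETRY_LIMIT) and the threshold value are
assumptions made up for illustration; the real criterion for "looped
enough" is whatever the patch defines.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical threshold: consecutive retries against the same victim
 * that count as "looped enough". The value is an assumption. */
#define STALL_RETRY_LIMIT 16

/* Hypothetical per-allocation state; not a kernel structure. */
struct stall_tracker {
	int victim_pid;       /* victim last waited on, -1 if none */
	unsigned int retries; /* consecutive retries on that victim */
};

static void stall_tracker_init(struct stall_tracker *st)
{
	assert(st != NULL);
	st->victim_pid = -1;
	st->retries = 0;
}

/*
 * Call once per allocator retry, after reclaim has been exhausted.
 * Returns true when this retry may be satisfied from memory reserves.
 */
static bool may_use_reserves(struct stall_tracker *st, int current_victim_pid)
{
	assert(st != NULL);

	/* A different victim means the stall window starts over. */
	if (current_victim_pid != st->victim_pid) {
		st->victim_pid = current_victim_pid;
		st->retries = 0;
		return false;
	}

	/* Still waiting on the same victim, but not for long enough yet. */
	if (++st->retries < STALL_RETRY_LIMIT)
		return false;

	/* Rate limit: grant one reserve-backed allocation, then require
	 * a full new window before granting another. */
	st->retries = 0;
	return true;
}
```

With this shape, an allocation stuck behind one unkillable victim gets at
most one dip into reserves per STALL_RETRY_LIMIT retries, so reserves are
consumed slowly while the holder of i_mutex is given a chance to proceed.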

In the past, we have even increased the scheduling priority of oom killed
processes so that they have a greater likelihood of picking up i_mutex and

> We need to make the OOM killer conclude in a fixed amount of time, no
> matter what happens. If the system is irrecoverably deadlocked on
> memory it needs to panic (and reboot) so we can get on with it. And
> it's silly to panic while there are still killable tasks available.

What is the solution when there are no additional processes that may be
killed? It is better to give access to memory reserves so that a single
stalling allocation can succeed and the livelock can be resolved, rather
than panicking.