Re: [PATCH] mm,oom: Re-enable OOM killer using timers.
From: David Rientjes
Date: Tue Jan 19 2016 - 18:14:03 EST
On Fri, 15 Jan 2016, Tetsuo Handa wrote:
> Leaving a system OOM-livelocked forever is very very annoying thing.
Agreed.
> My goal is to ask the OOM killer not to toss the OOM killer's duty away.
> What is important for me is that the OOM killer takes next action when
> current action did not solve the OOM situation.
>
What is the "next action" when there are no more processes on your system,
or attached to your memcg hierarchy, that are killable?
Of course your proposal offers no solution for that. Extend it further:
what is the "next action" when the process holding the mutex needed by the
victim is oom disabled?
I don't think it's in the best interest of the user to randomly kill
processes until one exits, implicitly hoping that one of your
selections will be able to do so (your notion of "pick and pray").
> > These additional kills can result
> > in the same livelock that is already problematic, and killing additional
> > processes has made the situation worse since memory reserves are more
> > depleted.
>
> Why are you still assuming that memory reserves are more depleted if we kill
> additional processes? We are introducing the OOM reaper which can compensate
> memory reserves if we kill additional processes. We can make the OOM reaper
> update oom priority of all processes that use a mm the OOM killer chose
> ( http://lkml.kernel.org/r/201601131915.BCI35488.FHSFQtVMJOOOLF@xxxxxxxxxxxxxxxxxxx )
> so that we can help the OOM reaper compensate memory reserves by helping
> the OOM killer to select a different mm.
>
We are not adjusting the selection heuristic, which is already
deterministic and which people fine-tune through procfs, to account for
what the oom reaper can free.
Even if you can free memory immediately, there is no guarantee that a
process holding a mutex needed for the victim to exit will be able to
allocate from that memory. Continuing to kill more and more processes may
eventually resolve a situation that simply granting temporary access to
memory reserves would also have resolved, but at the cost of, well, many
processes.
The final solution may combine both approaches, which are the only real
ways to make forward progress. We could first try allowing temporary
access to memory reserves when a livelock has been detected, similar to my
patch, and then, if that fails, fall back to killing additional processes,
since the oom reaper should be able to free at least some of that memory
immediately.
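
To make the ordering concrete, here is a rough userspace sketch of that
two-stage escalation. It is not kernel code and is not taken from either
patch: the names oom_step, oom_state and oom_next_step() are hypothetical,
and the timeout is an assumed stand-in for whatever livelock detection is
eventually used.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical escalation steps; none of these names exist in the kernel. */
enum oom_step {
	OOM_STEP_WAIT,           /* current stage is still within its timeout */
	OOM_STEP_GRANT_RESERVES, /* livelock suspected: open reserves temporarily */
	OOM_STEP_KILL_NEXT,      /* reserves did not help: select another victim */
};

struct oom_state {
	unsigned long stage_since; /* tick at which the current stage began */
	bool reserves_granted;     /* has the reserve stage already been tried? */
};

/*
 * Decide what to do at tick 'now'. A stage that has lasted at least
 * 'timeout' ticks is treated as a detected livelock (an assumption made
 * for this sketch only).
 */
static enum oom_step oom_next_step(struct oom_state *st, unsigned long now,
				   unsigned long timeout)
{
	assert(st != NULL);

	if (now - st->stage_since < timeout)
		return OOM_STEP_WAIT;

	st->stage_since = now; /* restart the timeout for the next stage */

	if (!st->reserves_granted) {
		/* First escalation: cheaper, costs no additional process. */
		st->reserves_granted = true;
		return OOM_STEP_GRANT_RESERVES;
	}

	/* Second escalation: a new victim starts again without reserves. */
	st->reserves_granted = false;
	return OOM_STEP_KILL_NEXT;
}
```

The point of the ordering is that the stage which costs no additional
process is always tried before another victim is selected.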
However, I think the best course of action at the moment is to review and
get the oom reaper merged, if applicable, since it should greatly aid this
issue and then look at livelock issues as they arise once it is deployed.
I'm not enthusiastic about adding additional heuristics and tunables for
theoretical issues that may arise, especially considering the oom reaper
is not even upstream.