Re: [patch] mm, oom: prevent additional oom kills before memory is freed

From: Michal Hocko
Date: Fri Jun 16 2017 - 04:06:32 EST

On Thu 15-06-17 15:42:23, David Rientjes wrote:
> On Fri, 16 Jun 2017, Michal Hocko wrote:
> > I am sorry, but I have worked really hard to make the oom reaper a reliable way
> > to make all the potential oom lockups go away. I do not want to
> > reintroduce another potential lockup now.
> Can you show where this "potential lockup" has ever appeared in a bug report or
> a testcase?

I am not aware of any specific bug report. But the main point of the
reaper is to close all _possible_ lockups caused by an oom victim being
stuck somewhere. exit_aio waits for all kiocbs. Can we guarantee that none
of them will depend on an allocation (directly or via a lock chain) to
proceed? Likewise, ksm_exit/khugepaged_exit depend on mmap_sem for write
to proceed. Are we _guaranteed_ that nobody can hold mmap_sem for read at
that time while depending on an allocation? Can we guarantee that the
__mmput path will keep working without any dependency on allocation in
the future?

> I have never seen __mmput() block when trying to free the
> memory it maps.
> > I also do not see why any
> > solution should be rushed into. I have proposed a way to go and unless
> > it is clear that this is not a way forward then I simply do not agree
> > with any partial workarounds or shortcuts.
> This is not a shortcut, it is a bug fix. 4.12 kills 1-4 processes
> unnecessarily as a result of setting MMF_OOM_SKIP incorrectly before the
> mm's memory can be freed. If you have not seen this issue before, which
> is why you asked if I ever observed it in practice, then you have not
> stress tested oom reaping. It is very observable and reproducible.

I am not questioning that it works for your particular test. I just
argue that it reduces the robustness of the oom reaper because it allows
an oom victim to leave the reaper without MMF_OOM_SKIP set, and that flag
is the core concept that guarantees forward progress. So we should think
about something more appropriate.
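The forward-progress argument can be sketched as a bounded-retry reaper. This is a simplified model, not mm/oom_kill.c: `struct mm_model`, the flag encoding, `MAX_REAP_RETRIES` and its value, and the helper names are all assumptions for illustration. What it shows is that the skip flag is set even when reaping fails, so a stuck victim cannot stop the OOM killer from eventually selecting another task.

```c
#include <stdbool.h>

#define MMF_OOM_SKIP_BIT 0x1u   /* hypothetical flag encoding */
#define MAX_REAP_RETRIES 10     /* assumed retry bound for illustration */

struct mm_model {
    unsigned int flags;
    bool mmap_sem_contended;    /* true if the lock can never be taken */
    bool reaped;
};

/* One reap attempt: fails if the address-space lock is unavailable. */
bool reap_once(struct mm_model *mm)
{
    if (mm->mmap_sem_contended)
        return false;
    mm->reaped = true;
    return true;
}

/* Returns the number of attempts used; always leaves the skip flag set. */
int reap_task(struct mm_model *mm)
{
    int attempts = 0;

    while (attempts < MAX_REAP_RETRIES) {
        attempts++;
        if (reap_once(mm))
            break;
    }
    /* The guarantee: however reaping went, this victim is hidden from
     * further OOM selection, so the next OOM event is not left waiting
     * on it forever. */
    mm->flags |= MMF_OOM_SKIP_BIT;
    return attempts;
}

/* In this model, a victim without the skip flag makes the OOM killer wait
 * instead of choosing another task. */
bool oom_may_select_next(const struct mm_model *mm)
{
    return (mm->flags & MMF_OOM_SKIP_BIT) != 0;
}
```

The disagreement in this thread sits at the final flag assignment: setting it before the memory is actually freed is what allows the extra kills, while leaving it to an exit path that might never complete removes the guarantee.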

> I do
> not agree that adding additional and obscure locking into __mmput() is the
> solution to what is plainly and obviously fixed with this simple patch.

Well, the __mmput path already depends on mmap_sem for write, so this is
not a new concept. I am not saying that using mmap_sem is the only way. I
will think about it more.

> 4.12 needs to stop killing 2-5 processes on every oom condition instead of
> 1.

Believe me, I am neither dismissing the issue nor the fact that it _has_
to be fixed. I just disagree that we should make the oom reaper less robust.

Michal Hocko