Re: [PATCH] mm/oom_kill.c: don't kill TASK_UNINTERRUPTIBLE tasks

From: David Rientjes
Date: Mon Sep 21 2015 - 19:33:38 EST


On Sat, 19 Sep 2015, Tetsuo Handa wrote:

> I think that use of ALLOC_NO_WATERMARKS via TIF_MEMDIE is the underlying
> cause. ALLOC_NO_WATERMARKS via TIF_MEMDIE is intended for terminating the
> OOM victim task as soon as possible, but it turned out that it will not
> work if there is invisible lock dependency. Therefore, why not to give up
> "there should be only up to 1 TIF_MEMDIE task" rule?
>

I don't see the connection between TIF_MEMDIE and ALLOC_NO_WATERMARKS
being problematic. It is simply the mechanism by which we give oom killed
processes access to memory reserves if they need it. I believe you are
referring only to the oom killer stalling when it finds an oom victim.

> What this patch (and many others posted in various forms many times over
> past years) does is to give up "there should be only up to 1 TIF_MEMDIE
> task" rule. I think that we need to tolerate more than 1 TIF_MEMDIE tasks
> and somehow manage in a way memory reserves will not deplete.
>

Your proposal, which I mostly agree with, tries to kill additional
processes so that they allocate and drop the lock that the original victim
depends on. My approach, from
http://marc.info/?l=linux-kernel&m=144010444913702, is the same, but
without the killing. It's unnecessary to kill every process on the system
that is depending on the same lock, and we can't know which processes are
stalling on that lock and which are not.

I think it's much easier to simply identify such a situation where a
process has not exited in a timely manner and then provide processes
access to memory reserves without being killed. We hope that the victim
will have queued its mutex_lock() and allocators that are holding the lock
will drop it after successfully utilizing memory reserves.

We can mitigate immediate depletion of memory reserves by requiring every
allocator to reclaim (or compact) and then call the oom killer, which
identifies the timeout, before it is granted access to memory reserves for
a single allocation, followed by schedule_timeout_killable(1) and
returning.
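
To make the ordering above concrete, here is a rough userspace model of the
decision, not the patch itself. Every name in it (struct oom_state,
OOM_VICTIM_TIMEOUT, oom_victim_timed_out(), may_use_reserves_once()) is
hypothetical, and the timeout value is an arbitrary assumption. It only
shows that reserves are considered for one attempt, and only after reclaim
has been tried and the victim has been found to be stalled.

```c
#include <stdbool.h>

/* Hypothetical userspace model; none of these names exist in the kernel. */
struct oom_state {
	bool victim_selected;		/* an oom victim has been chosen */
	bool victim_exited;		/* that victim has already exited */
	unsigned long kill_time;	/* tick at which the victim was chosen */
};

/* Assumed value: ticks to wait before treating the victim as stalled. */
#define OOM_VICTIM_TIMEOUT	100UL

/* True once a selected victim has failed to exit within the timeout. */
static bool oom_victim_timed_out(const struct oom_state *s, unsigned long now)
{
	if (!s->victim_selected || s->victim_exited)
		return false;
	return now - s->kill_time >= OOM_VICTIM_TIMEOUT;
}

/*
 * Decide whether a single allocation attempt may use memory reserves.
 * An allocator that has not tried reclaim (or compaction) first is
 * refused, which is what slows down depletion of the reserves.
 */
static bool may_use_reserves_once(const struct oom_state *s,
				  unsigned long now, bool reclaim_attempted)
{
	if (!reclaim_attempted)
		return false;
	return oom_victim_timed_out(s, now);
}
```

In this model the caller would get the answer for one attempt only and
would have to go through reclaim and the timeout check again for the next
allocation.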

I don't know of any alternative solutions where we can guarantee that
memory reserves cannot be depleted unless memory reserves are 100% of
memory.