Re: can't oom-kill zap the victim's memory?

From: Tetsuo Handa
Date: Fri Oct 02 2015 - 09:06:18 EST


Michal Hocko wrote:
> On Mon 28-09-15 15:24:06, David Rientjes wrote:
> > I agree that i_mutex seems to be one of the most common offenders.
> > However, I'm not sure I understand why holding it while trying to allocate
> > infinitely for an order-0 allocation is problematic wrt the proposed
> > kthread.
>
> I didn't say it would be problematic. We are talking past each other
> here. All I wanted to say was that a separate kernel oom thread wouldn't
> _help_ with the lock dependencies.
>
Oops. I misunderstood. I thought you were skeptical about the memory unmapping
approach because of lock dependencies. Rather, you are skeptical about using a
dedicated kernel thread for the memory unmapping approach.

> > The kthread itself need only take mmap_sem for read. If all
> > threads sharing the mm with a victim have been SIGKILL'd, they should get
> > TIF_MEMDIE set when reclaim fails and be able to allocate so that they can
> > drop mmap_sem.
>
> which is the case if the direct oom context used trylock...
> So just to make it clear. I am not objecting a specialized oom kernel
> thread. It would work as well. I am just not convinced that it is really
> needed because the direct oom context can use trylock and do the same
> work directly.

Well, I think it depends on where we call the memory unmapping code from.

The first candidate is oom_kill_process(), because that is where the mm
struct to unmap is determined. But since select_bad_process() aborts upon
encountering a TIF_MEMDIE task, we will never call the memory unmapping
code again if the first down_read_trylock(&mm->mmap_sem) attempt in
oom_kill_process() fails. (Here I assume that we allow all OOM victims
to access memory reserves so that subsequent
down_read_trylock(&mm->mmap_sem) attempts could succeed.)
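
To illustrate the failure mode of the first candidate, here is a toy
user-space model, not kernel code. Every toy_* name is hypothetical, the
trylock is a plain flag standing in for mm->mmap_sem, and victim selection
is reduced to "pick the first task". It only shows that a single failed
trylock is never retried once the victim carries the TIF_MEMDIE stand-in.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: none of these types are the kernel's. */
struct toy_mm {
	bool held_for_write;	/* stands in for contention on mm->mmap_sem */
	bool unmapped;		/* set once the "unmapping" has run */
};

struct toy_task {
	bool memdie;		/* stands in for TIF_MEMDIE */
	struct toy_mm *mm;
};

/* Stand-in for a read trylock: succeeds only if no writer holds the lock. */
static bool toy_read_trylock(const struct toy_mm *mm)
{
	return !mm->held_for_write;
}

/* One unmapping attempt; returns true if the unmapping ran. */
static bool toy_try_unmap(struct toy_mm *mm)
{
	assert(mm != NULL);
	if (!toy_read_trylock(mm))
		return false;
	mm->unmapped = true;
	return true;
}

/* Models the abort: an existing victim means no task is selected. */
static struct toy_task *toy_select_bad_process(struct toy_task *tasks, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (tasks[i].memdie)
			return NULL;
	return n ? &tasks[0] : NULL;	/* simplification: pick the first task */
}

/* Models the first candidate: mark the victim, then try to unmap once. */
static void toy_oom_kill_process(struct toy_task *victim)
{
	victim->memdie = true;
	toy_try_unmap(victim->mm);
}

/* One OOM invocation; returns true if a victim was selected. */
static bool toy_out_of_memory(struct toy_task *tasks, size_t n)
{
	struct toy_task *victim = toy_select_bad_process(tasks, n);

	if (!victim)
		return false;
	toy_oom_kill_process(victim);
	return true;
}
```

In this model, if the lock is held during the first toy_out_of_memory()
call, later calls return early even after the lock has been released, so
the mm is never unmapped.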

The second candidate is select_bad_process(), because there we can call
the memory unmapping code again upon encountering a TIF_MEMDIE task.

The third candidate is the caller of out_of_memory(), because there we can
call the memory unmapping code again even when the OOM victims are blocked.
Our discussion seems to assume that TIF_MEMDIE tasks can make forward
progress and die. But since TIF_MEMDIE tasks might encounter unkillable
locks after returning from allocation (e.g.
http://lkml.kernel.org/r/201509290118.BCJ43256.tSFFFMOLHVOJOQ@xxxxxxxxxxxxxxxxxxx ),
it will be safer not to assume that out_of_memory() can always be called.
So, I thought that a dedicated kernel thread would make it easy to call the
memory unmapping code periodically, again and again.
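
As a rough sketch of the retry idea, the loop below is a toy user-space
model, not a kthread. The reap_* names are hypothetical, the lock is a
counter that assumes the holder eventually lets go, and the sleep between
passes is only a comment. The point is that the retries do not depend on
anyone calling out_of_memory() again.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: this is not the kernel's mm_struct. */
struct reap_mm {
	int busy_passes;	/* passes for which the lock stays held (assumption) */
	bool unmapped;		/* set once the "unmapping" has run */
};

/* Stand-in for a read trylock; each failed attempt uses up one busy pass. */
static bool reap_read_trylock(struct reap_mm *mm)
{
	if (mm->busy_passes > 0) {
		mm->busy_passes--;
		return false;
	}
	return true;
}

/*
 * Models the body of a dedicated thread: retry the unmapping on every pass
 * until it succeeds. Returns the pass number on which the unmapping ran,
 * or -1 if max_passes was exhausted.
 */
static int reap_retry_loop(struct reap_mm *mm, int max_passes)
{
	assert(mm != NULL && max_passes > 0);

	for (int pass = 1; pass <= max_passes; pass++) {
		if (reap_read_trylock(mm)) {
			mm->unmapped = true;
			return pass;
		}
		/* A real thread would sleep here before the next pass. */
	}
	return -1;
}
```

With the lock busy for two passes, reap_retry_loop() succeeds on the third
pass. If the lock is never released, the loop simply gives up after
max_passes, so the sketch says nothing about resolving the lock dependency
itself.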