Re: [RFC 3/4] mm, oom: do not rely on TIF_MEMDIE for exit_oom_victim

From: Michal Hocko
Date: Tue Sep 13 2016 - 03:21:18 EST


On Tue 13-09-16 15:25:51, Tetsuo Handa wrote:
> Michal Hocko wrote:
> > On Sat 10-09-16 21:55:49, Tetsuo Handa wrote:
> > > Tetsuo Handa wrote:
> > > > If you worry about tasks which are sitting on a memory which is not
> > > > reclaimable by the oom reaper, why you don't worry about tasks which
> > > > share mm and do not share signal (i.e. clone(CLONE_VM && !CLONE_SIGHAND)
> > > > tasks) ? Thawing only tasks which share signal is a halfway job.
> > > >
> > >
> > > Here is a different approach which does not thaw tasks as of mark_oom_victim()
> > > but thaws tasks as of oom_killer_disable(). I think that we don't need to
> > > distinguish OOM victims and killed/exiting tasks when we disable the OOM
> > > killer, for trying to reclaim as much memory as possible is preferable for
> > > reducing the possibility of memory allocation failure after the OOM killer
> > > is disabled.
> >
> > This makes the oom_killer_disable suspend specific which is imho not
> > necessary. While we do not have any other user outside of the suspend
> > path right now and I hope we will not need any in a foreseeable future
> > there is no real reason to do a hack like this if we can make the
> > implementation suspend independent.
>
> My intention is to somehow get rid of oom_killer_disable(). While I wrote
> this approach, I again came to wonder why we need to disable the OOM killer
> during suspend.
>
> If the reason is that the OOM killer thaws already frozen OOM victims,
> we won't have reason to disable the OOM killer if the OOM killer does not
> thaw OOM victims. We can rely on the OOM killer/reaper immediately before
> start taking a memory snapshot for suspend.

Yes, if we didn't have to wake already frozen tasks then life would
be easier. But as I've already mentioned, the async oom doesn't cover all
we need, and tasks can also be frozen from userspace, which means
that this is under user control.

> If the reason is that the OOM killer changes SIGKILL pending state of
> already frozen OOM victims during taking a memory snapshot, I think that
> sending SIGKILL via not only SysRq-f but also SysRq-i will be problematic.

Sysrq+i will not be a problem because that will not thaw any frozen
tasks.

> If the reason is that the OOM reaper changes content of mm_struct of
> OOM victims during taking a memory snapshot,

I do not think this is a problem. But I have to think about this some
more. My thinking is that even if we saved the original content before
reaping it, all that matters is that the victim just goes away so it
cannot observe the corruption.

--
Michal Hocko
SUSE Labs