Re: [PATCH] mm,oom: Re-enable OOM killer using timers.

From: David Rientjes
Date: Tue Jan 26 2016 - 18:44:50 EST


On Fri, 22 Jan 2016, Tetsuo Handa wrote:

> > > (1) Design and use a system with appropriate memory capacity in mind.
> > >
> > > (2) When (1) failed, the OOM killer is invoked. The OOM killer selects
> > > an OOM victim and allows that victim access to memory reserves by
> > > setting TIF_MEMDIE on it.
> > >
> > > (3) When (2) did not solve the OOM condition, start allowing all tasks
> > > access to memory reserves by your approach.
> > >
> > > (4) When (3) did not solve the OOM condition, start selecting more OOM
> > > victims by my approach.
> > >
> > > (5) When (4) did not solve the OOM condition, trigger the kernel panic.
> > >
> >
> > This was all mentioned previously, and I suggested that the panic only
> > occur when memory reserves have been depleted, otherwise there is still
> > the potential for the livelock to be solved. That is a patch that would
> > apply today, before any of this work, since we never want to loop
> > endlessly in the page allocator when memory reserves are fully depleted.
> >
> > This is all really quite simple.
> >
>
> So, David is OK with above approach, right?
> Then, Michal and Johannes, are you OK with above approach?
>

The first step before implementing access to memory reserves on livelock
(my patch) and oom killing additional processes on livelock (your patch)
is to detect the appropriate place to panic() when reserves are depleted.
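
For reference, the escalation order quoted above can be written down as a
small state machine. This is only a rough sketch: the enum and the function
name are hypothetical and appear in neither patch, and it models just the
quoted ordering of steps (2) to (5), not the condition under which each
step is actually reached.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stage names; the values match steps (2)-(5) quoted above. */
enum oom_stage {
	OOM_STAGE_KILL_VICTIM = 2,	/* select a victim, set TIF_MEMDIE */
	OOM_STAGE_OPEN_RESERVES = 3,	/* allow all tasks access to reserves */
	OOM_STAGE_MORE_VICTIMS = 4,	/* select additional victims */
	OOM_STAGE_PANIC = 5,		/* nothing left to try */
};

/*
 * Return the stage to use next. If the previous stage resolved the OOM
 * condition, start over from the first stage; otherwise move one step up
 * the ladder and stay at the panic stage once it has been reached.
 */
static enum oom_stage oom_next_stage(enum oom_stage cur, bool oom_resolved)
{
	assert(cur >= OOM_STAGE_KILL_VICTIM && cur <= OOM_STAGE_PANIC);

	if (oom_resolved)
		return OOM_STAGE_KILL_VICTIM;
	if (cur == OOM_STAGE_PANIC)
		return OOM_STAGE_PANIC;
	return (enum oom_stage)(cur + 1);
}
```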

This has historically been done in the oom killer when there are no oom
killable processes left. That's easy to figure out and should still be
done, but we are now introducing the possibility of memory reserves being
fully depleted while there are oom killable processes left or victims that
cannot exit.

So we need a patch to the page allocator, applicable today before any of
the above is worked on, that detects when reserves are depleted and calls
panic() rather than looping forever in the page allocator. I'd suggest
that this work be done as a follow-up to Michal's patchset to rework the
page allocator retry logic.

It's not entirely trivial because we want to detect situations when
high-order < PAGE_ALLOC_COSTLY_ORDER allocations are looping forever and
we are failing due to fragmentation as well. If all cpus are looping
trying to allocate a task_struct, and there are eligible zones with some
free memory but it is not allocatable, we still want to panic().
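
A minimal userspace sketch of that check, under stated assumptions: the
struct, the SKETCH_MAX_ORDER constant and both function names are
hypothetical, and "reserves depleted" is modelled as "no eligible zone has
a free block of at least the requested order, even with watermarks
ignored". It is meant to illustrate the fragmentation case, not to propose
the actual patch.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_MAX_ORDER 11	/* hypothetical number of free-list orders */

/* Hypothetical per-zone snapshot: free block count for each order. */
struct zone_snapshot {
	unsigned long nr_free[SKETCH_MAX_ORDER];
};

/* True if the zone holds a free block of at least the requested order. */
static bool zone_can_satisfy(const struct zone_snapshot *z, unsigned int order)
{
	for (unsigned int o = order; o < SKETCH_MAX_ORDER; o++)
		if (z->nr_free[o] > 0)
			return true;
	return false;
}

/*
 * Panic only when reclaim/compaction made no progress and no eligible
 * zone could satisfy the request with watermarks ignored. A zone with
 * plenty of order-0 pages but nothing large enough still counts as
 * unable to satisfy a higher-order request.
 */
static bool should_panic(const struct zone_snapshot *zones, int nr_zones,
			 unsigned int order, bool made_progress)
{
	assert(order < SKETCH_MAX_ORDER);

	if (made_progress)
		return false;
	for (int i = 0; i < nr_zones; i++)
		if (zone_can_satisfy(&zones[i], order))
			return false;
	return true;
}
```

With a single zone holding only order-0 pages, an order-0 request does not
panic, while an order-2 request with no progress does, even though the
zone reports free memory.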

> What I'm not sure about in the above approach is the handling of
> !__GFP_NOFAIL && !__GFP_FS allocation requests and the use of
> ALLOC_NO_WATERMARKS without TIF_MEMDIE.
>
> Basically, we want to make small allocation requests succeed unless
> __GFP_NORETRY is given. Currently such allocation requests do not fail
> unless TIF_MEMDIE is given by the OOM killer. But how hard do we want to
> continue looping when we reach (3) because the timeout for waiting for
> the TIF_MEMDIE task at (2) has expired?
>

In my patch, that is tunable by the user through a new sysctl, which
defines when the oom killer is considered livelocked because the victim
cannot exit. Before that timeout expires, I think we'd do
*did_some_progress = 1 for !__GFP_FS as is done today; once it expires,
we'd trigger the oom killer livelock detection in my patch to allow the
allocation to succeed with ALLOC_NO_WATERMARKS.
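
Roughly, that decision could look like the sketch below. The struct, the
enum and the function name are hypothetical, the timeout field only stands
in for the new sysctl, and ticks are plain unsigned counters rather than
real jiffies. The __GFP_FS case before expiry is left out of scope and
simply retries without touching did_some_progress.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical outcomes of waiting on a TIF_MEMDIE victim. */
enum oom_wait_action {
	OOM_WAIT_RETRY,		/* keep looping in the allocator */
	OOM_WAIT_NO_WATERMARKS,	/* livelock: allocate with ALLOC_NO_WATERMARKS */
};

struct oom_wait {
	unsigned long victim_marked_at;	/* tick when TIF_MEMDIE was set */
	unsigned long livelock_timeout;	/* ticks; stands in for the sysctl */
};

static enum oom_wait_action oom_wait_decide(const struct oom_wait *w,
					    unsigned long now, bool gfp_fs,
					    unsigned long *did_some_progress)
{
	assert(w != NULL && did_some_progress != NULL);

	/* Unsigned subtraction stays correct if the tick counter wraps. */
	if (now - w->victim_marked_at >= w->livelock_timeout)
		return OOM_WAIT_NO_WATERMARKS;

	/* Before expiry, !__GFP_FS requests report progress, as today. */
	if (!gfp_fs)
		*did_some_progress = 1;
	return OOM_WAIT_RETRY;
}
```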