Re: can't oom-kill zap the victim's memory?

From: Tetsuo Handa
Date: Thu Oct 01 2015 - 08:13:44 EST


David Rientjes wrote:
> On Wed, 30 Sep 2015, Tetsuo Handa wrote:
>
> > If we choose only 1 OOM victim, the possibility of hitting this memory
> > unmapping livelock is (say) 1%. But if we choose multiple OOM victims, the
> > possibility becomes (almost) 0%. And if we still hit this livelock even
> > after choosing many OOM victims, it is time to call panic().
> >
>
> Again, this is a fundamental disagreement between your approach of
> randomly killing processes hoping that we target one that can make a quick
> exit vs. my approach where we give threads access to memory reserves after
> reclaim has failed in an oom livelock so they at least make forward
> progress. We're going around in circles.

I don't like that memory management subsystem shows an expectant attitude
when memory allocation is failing. There are many possible silent hang up
paths. And my customer's servers might be hitting such paths. But I can't
go in front of their servers and capture SysRq. Thus, I want to let memory
management subsystem try to recover automatically; at least emit some
diagnostic kernel messages automatically.

>
> > (Well, do we need to change __alloc_pages_slowpath() that OOM victims do not
> > enter direct reclaim paths in order to avoid being blocked by unkillable fs
> > locks?)
> >
>
> OOM victims shouldn't need to enter reclaim, and there have been patches
> before to abort reclaim if current has a pending SIGKILL,

Yes. shrink_inactive_list() and throttle_direct_reclaim() recognize
fatal_signal_pending() tasks.

> if they have
> access to memory reserves.

What does this mean?

shrink_inactive_list() and throttle_direct_reclaim() do not check whether
OOM victims have access to memory reserves, do they?

We don't allow access to memory reserves by OOM victims without TIF_MEMDIE.
I think that we should favor kthread and dying threads over normal threads
at __alloc_pages_slowpath() but there is no response on
http://lkml.kernel.org/r/1442939668-4421-1-git-send-email-penguin-kernel@xxxxxxxxxxxxxxxxxxx .

> Nothing prevents the victim from already being
> in reclaim, however, when it is killed.

I think this is problematic because there are unkillable locks in reclaim
paths. The memory management subsystem reports nothing.

>
> > > Perhaps this is an argument that we need to provide access to memory
> > > reserves for threads even for !__GFP_WAIT and !__GFP_FS in such scenarios,
> > > but I would wait to make that extension until we see it in practice.
> >
> > I think that GFP_ATOMIC allocations already access memory reserves via
> > ALLOC_HIGH priority.
> >
>
> Yes, that's true. It doesn't help for GFP_NOFS, however. It may be
> possible that GFP_ATOMIC reserves have been depleted or there is a
> GFP_NOFS allocation that gets stuck looping forever that doesn't get the
> ability to allocate without watermarks.

Why can't we emit some diagnostic kernel messages automatically?
Memory allocation requests which did not complete within e.g. 30 seconds
deserve possible memory allocation deadlock warning messages.

> I'd wait to see it in practice
> before making this extension since it relies on scanning the tasklist.
>

Is this extension something like check_hung_uninterruptible_tasks()?
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/