Re: FYI: mmap_sem OOM patch

From: David Rientjes
Date: Tue Jul 13 2010 - 17:08:39 EST

Next message: tip-bot for Yinghai Lu: "[tip:x86/setup] x86, setup: Make the setup code also accept console=uart8250"
Previous message: Felipe W Damasio: "Re: [PATCH] tproxy: nf_tproxy_assign_sock() can handle tw sockets"
In reply to: KOSAKI Motohiro: "Re: FYI: mmap_sem OOM patch"
Next in thread: Peter Zijlstra: "Re: FYI: mmap_sem OOM patch"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On Tue, 13 Jul 2010, KOSAKI Motohiro wrote:

> > > I disagree. __GFP_NOFAIL mean this allocation failure can makes really
> > > dangerous result. Instead, OOM-Killer should try to kill next process.
> > > I think.
> > >
> >
> > That's not what happens, __alloc_pages_high_priority() will loop forever
> > for __GFP_NOFAIL, the oom killer is never recalled.
>
> Yup, please reread the discusstion.
>

There's nothing in the discussion that addresses the fact that
__alloc_pages_high_priority() loops infinitely for __GFP_NOFAIL without
again calling the oom killer. You may be proposing a change to that when
you said "OOM-Killer should try to kill next process. I think," but I
can't speculate on that. If you are, please propose a patch.

The success of whether we can allocate without watermarks is dependent on
where we set them and what types of exclusions we allow. This isn't local
only to the page allocator but rather to the entire kernel since
GFP_ATOMIC allocations, for example, can deplete the min watermark by
~60%. The remaining memory is what we allow access to for
ALLOC_NO_WATERMARKS: those allocations in the direct reclaim patch and
those that have been oom killed.

Thus, it's important that oom killed tasks do not completely deplete
memory reserves, otherwise __GFP_NOFAIL allocations will loop forever
without killing additional tasks, as you say. That's sane since oom
killing additional tasks wouldn't allow them access to any additional
memory, anyway, so the allocation would still fail (we only call the oom
killer for those allocations that are retried) and the victim could not
exit.

With that said, I'm not exactly sure what change you're proposing when you
say the oom killer should try to kill another process because it won't
lead to any future memory freeing unless the victim can exit without
allocating memory. If that's not possible, you've proliferated a ton of
innocent kills that were triggered because of one __GFP_NOFAIL attempt.

I agree with Peter that it would be ideal to remove __GFP_NOFAIL, but I
think we require changes in the retry logic first before that is possible.
Right now, we insist on retrying all blockable !__GFP_NORETRY allocations
under PAGE_ALLOC_COSTLY_ORDER indefinitely. That, in a sense, is already
__GFP_NOFAIL behavior that is implicit: that's why we typically see
__GFP_NOFAIL with GFP_NOFS instead. With GFP_NOFS, we never kill the oom
killer in the first place, so memory allocation is only more successful on
a subsequent attempt by either direct reclaim or memory compaction.

There's nothing preventing users from doing

do {
page = alloc_page(GFP_KERNEL);
} while (!page);

if the retry logic were reworked to start failing allocations or we
removed __GFP_NOFAIL. Thus, I think __GFP_NOFAIL should be substituted
with a different gfp flag that acts similar but failable: use compaction,
reclaim, and the oom killer where possible and in that order if there is
no success and then retry to a high watermark one final time. If the
allocation is still unsuccessful, return NULL. This allows users to do
what getblk() does, for instance, by implementing their own memory freeing
functions. It also allows them to use all of the page allocator's powers
(compaction, reclaim, oom killer) without infinitely looping by either
using an order under PAGE_ALLOC_COSTLY_ORDER or insisting on __GFP_NOFAIL.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Next message: tip-bot for Yinghai Lu: "[tip:x86/setup] x86, setup: Make the setup code also accept console=uart8250"
Previous message: Felipe W Damasio: "Re: [PATCH] tproxy: nf_tproxy_assign_sock() can handle tw sockets"
In reply to: KOSAKI Motohiro: "Re: FYI: mmap_sem OOM patch"
Next in thread: Peter Zijlstra: "Re: FYI: mmap_sem OOM patch"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]