Re: [patch -mm 8/9 v2] oom: avoid oom killer for lowmem allocations

From: David Rientjes
Date: Tue Feb 16 2010 - 19:21:26 EST


On Wed, 17 Feb 2010, KAMEZAWA Hiroyuki wrote:

> > On Wed, 17 Feb 2010, KAMEZAWA Hiroyuki wrote:
> >
> > > > > > I'll add this check to __alloc_pages_may_oom() for the !(gfp_mask &
> > > > > > __GFP_NOFAIL) path since we're all content with endlessly looping.
> > > > >
> > > > > Thanks. Yes endlessly looping is far preferable to randomly oopsing
> > > > > or corrupting memory.
> > > > >
> > > >
> > > > Here's the new patch for your consideration.
> > > >
> > >
> > > Then, can we take a kdump in this endlessly looping situation?
> > >
> > > panic_on_oom=always + kdump can do that.
> > >
> >
> > The endless loop is only helpful if something is going to free memory
> > external to the current page allocation: either another task with
> > __GFP_WAIT | __GFP_FS that invokes the oom killer, a task that frees
> > memory, or a task that exits.
> >
> > The most notable endless loop in the page allocator is the one when a task
> > has been oom killed, gets access to memory reserves, and then cannot find
> > a page for a __GFP_NOFAIL allocation:
> >
> > 	do {
> > 		page = get_page_from_freelist(gfp_mask, nodemask, order,
> > 			zonelist, high_zoneidx, ALLOC_NO_WATERMARKS,
> > 			preferred_zone, migratetype);
> >
> > 		if (!page && gfp_mask & __GFP_NOFAIL)
> > 			congestion_wait(BLK_RW_ASYNC, HZ/50);
> > 	} while (!page && (gfp_mask & __GFP_NOFAIL));
> >
> > We don't expect any such allocations to happen during the exit path, but
> > we could probably find some in the fs layer.
> >
> > I don't want to check sysctl_panic_on_oom in the page allocator because it
> > would start panicking the machine unnecessarily for the integrity
> > metadata GFP_NOIO | __GFP_NOFAIL allocation, for any
> > order > PAGE_ALLOC_COSTLY_ORDER, or for users who can't lock the zonelist
> > for oom kill that wouldn't have panicked before.
> >
>
> Then, why don't you check high_zoneidx in oom_kill.c?
>

out_of_memory() doesn't return a value to tell the page allocator whether
it should retry the allocation or just return NULL; all of that policy is
kept in mm/page_alloc.c. For high_zoneidx < ZONE_NORMAL, we want to fail
the allocation when !(gfp_mask & __GFP_NOFAIL) and call the oom killer
when it's __GFP_NOFAIL.
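
As a rough illustration of that policy, here is a standalone userspace
sketch. It is not kernel code: the SKETCH_* names, the enum values and the
sketch_oom_policy() helper are all made up for the example and only stand
in for __GFP_NOFAIL, PAGE_ALLOC_COSTLY_ORDER and the zone indices. It only
models the decision described above, not the real allocator slowpath.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for kernel definitions; values are illustrative. */
enum sketch_zone { SKETCH_ZONE_DMA, SKETCH_ZONE_NORMAL, SKETCH_ZONE_HIGHMEM };

#define SKETCH_GFP_NOFAIL    0x1u  /* stands in for __GFP_NOFAIL */
#define SKETCH_COSTLY_ORDER  3u    /* stands in for PAGE_ALLOC_COSTLY_ORDER */

enum sketch_decision {
	SKETCH_FAIL_ALLOC,       /* give up and return NULL to the caller */
	SKETCH_CALL_OOM_KILLER   /* invoke the oom killer, then retry */
};

/*
 * Decide what to do once reclaim has made no progress.
 * Allocations that are allowed to fail never reach the oom killer when
 * they are high-order or restricted to lowmem; __GFP_NOFAIL ones always do.
 */
enum sketch_decision sketch_oom_policy(unsigned int gfp_mask,
				       unsigned int order,
				       enum sketch_zone high_zoneidx)
{
	bool may_fail = !(gfp_mask & SKETCH_GFP_NOFAIL);

	if (may_fail) {
		/* The oom killer will not help higher order allocs. */
		if (order > SKETCH_COSTLY_ORDER)
			return SKETCH_FAIL_ALLOC;
		/* Don't kill innocent tasks for a lowmem allocation. */
		if (high_zoneidx < SKETCH_ZONE_NORMAL)
			return SKETCH_FAIL_ALLOC;
	}
	return SKETCH_CALL_OOM_KILLER;
}
```

So an order-0 allocation restricted to the DMA zone fails without
__GFP_NOFAIL and reaches the oom killer with it, while an order-0
ZONE_NORMAL allocation reaches the oom killer either way.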
---
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1696,6 +1696,9 @@ __alloc_pages_may_oom(gfp_t gfp_mask, unsigned int order,
/* The OOM killer will not help higher order allocs */
if (order > PAGE_ALLOC_COSTLY_ORDER)
goto out;
+ /* The OOM killer does not needlessly kill tasks for lowmem */
+ if (high_zoneidx < ZONE_NORMAL)
+ goto out;
/*
* GFP_THISNODE contains __GFP_NORETRY and we never hit this.
* Sanity check for bare calls of __GFP_THISNODE, not real OOM.
@@ -1924,15 +1927,23 @@ rebalance:
if (page)
goto got_pg;

- /*
- * The OOM killer does not trigger for high-order
- * ~__GFP_NOFAIL allocations so if no progress is being
- * made, there are no other options and retrying is
- * unlikely to help.
- */
- if (order > PAGE_ALLOC_COSTLY_ORDER &&
- !(gfp_mask & __GFP_NOFAIL))
- goto nopage;
+ if (!(gfp_mask & __GFP_NOFAIL)) {
+ /*
+ * The oom killer is not called for high-order
+ * allocations that may fail, so if no progress
+ * is being made, there are no other options and
+ * retrying is unlikely to help.
+ */
+ if (order > PAGE_ALLOC_COSTLY_ORDER)
+ goto nopage;
+ /*
+ * The oom killer is not called for lowmem
+ * allocations to prevent needlessly killing
+ * innocent tasks.
+ */
+ if (high_zoneidx < ZONE_NORMAL)
+ goto nopage;
+ }

goto restart;
}