Re: [PATCH 3/4] mm: unreserve highatomic free pages fully before OOM

From: Minchan Kim
Date: Tue Oct 11 2016 - 03:24:57 EST


On Tue, Oct 11, 2016 at 08:50:48AM +0200, Michal Hocko wrote:
> On Tue 11-10-16 14:01:41, Minchan Kim wrote:
> > Hi Michal,
> >
> > On Mon, Oct 10, 2016 at 09:41:40AM +0200, Michal Hocko wrote:
> > > On Fri 07-10-16 23:43:45, Minchan Kim wrote:
> > > > On Fri, Oct 07, 2016 at 11:09:17AM +0200, Michal Hocko wrote:
> [...]
> > > > > @@ -2102,10 +2109,12 @@ static void unreserve_highatomic_pageblock(const struct alloc_context *ac)
> > > > > set_pageblock_migratetype(page, ac->migratetype);
> > > > > move_freepages_block(zone, page, ac->migratetype);
> > > > > spin_unlock_irqrestore(&zone->lock, flags);
> > > > > - return;
> > > > > + return true;
> > > >
> > > > Such cut-off makes reserved pageblock remained before the OOM.
> > > > We call it as premature OOM kill.
> > >
> > > Not sure I understand. The above should get rid of all atomic reserves
> > > before we go OOM. We can do it all at once but that sounds too
> >
> > The problem is there is race between page freeing path and unreserve
> > logic so that some pages could be in highatomic free list even though
> > zone->nr_reserved_highatomic is already zero.
>
> Does it make any sense to handle such an unlikely case?

I agree if it's really hard to solve but why should we remain
such hole in the algorithm if we can fix easily?

>
> > So, at least, it would be better to have a draining step at some point
> > where was (no_progress_loops == MAX_RECLAIM RETRIES) in my patch.
> >
> > Also, your patch makes retry loop greater than MAX_RECLAIM_RETRIES
> > if unreserve_highatomic_pageblock returns true. Theoretically,
> > it would make live lock. You might argue it's *really really* rare
> > but I don't want to add such subtle thing.
> > Maybe, we could drain when no_progress_loops == MAX_RECLAIM_RETRIES.
>
> What would be the scenario when we would really livelock here? How can
> we have unreserve_highatomic_pageblock returning true for ever?

Other context freeing highorder page/reallocating repeatedly while
a process stucked direct reclaim is looping with should_reclaim_retry.

>
> > > aggressive to me. If we just do one at the time we have a chance to
> > > keep some reserves if the OOM situation is really ephemeral.
> > >
> > > Does this patch work in your usecase?
> >
> > I didn't test but I guess it works but it has problems I mentioned
> > above.
>
> Please do not make this too over complicated and be practical. I do not
> really want to dismiss your usecase but I am really not convinced that
> such a "perfectly fit into all memory" situations are sustainable and
> justify to make the whole code more complex. I agree that we can at
> least try to do something to release those reserves but let's do it
> as simple as possible.

If you think it's too complicated, how about this?

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index fd91b8955b26..e3ce442e9976 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2098,7 +2098,8 @@ static void reserve_highatomic_pageblock(struct page *page, struct zone *zone,
* intense memory pressure but failed atomic allocations should be easier
* to recover from than an OOM.
*/
-static void unreserve_highatomic_pageblock(const struct alloc_context *ac)
+static bool unreserve_highatomic_pageblock(const struct alloc_context *ac,
+ bool drain)
{
struct zonelist *zonelist = ac->zonelist;
unsigned long flags;
@@ -2106,11 +2107,12 @@ static void unreserve_highatomic_pageblock(const struct alloc_context *ac)
struct zone *zone;
struct page *page;
int order;
+ bool ret = false;

for_each_zone_zonelist_nodemask(zone, z, zonelist, ac->high_zoneidx,
ac->nodemask) {
/* Preserve at least one pageblock */
- if (zone->nr_reserved_highatomic <= pageblock_nr_pages)
+ if (!drain && zone->nr_reserved_highatomic <= pageblock_nr_pages)
continue;

spin_lock_irqsave(&zone->lock, flags);
@@ -2154,12 +2156,24 @@ static void unreserve_highatomic_pageblock(const struct alloc_context *ac)
* may increase.
*/
set_pageblock_migratetype(page, ac->migratetype);
- move_freepages_block(zone, page, ac->migratetype);
- spin_unlock_irqrestore(&zone->lock, flags);
- return;
+ ret = move_freepages_block(zone, page,
+ ac->migratetype);
+ /*
+ * By race with page freeing functions, !highatomic
+ * pageblocks can have free pages in highatomic free
+ * list so if drain is true, try to unreserve every
+ * free pages in highatomic free list without bailing
+ * out.
+ */
+ if (!drain) {
+ spin_unlock_irqrestore(&zone->lock, flags);
+ return ret;
+ }
}
spin_unlock_irqrestore(&zone->lock, flags);
}
+
+ return ret;
}

/* Remove an element from the buddy allocator from the fallback list */
@@ -3358,7 +3372,7 @@ __alloc_pages_direct_reclaim(gfp_t gfp_mask, unsigned int order,
* Shrink them them and try again
*/
if (!page && !drained) {
- unreserve_highatomic_pageblock(ac);
+ unreserve_highatomic_pageblock(ac, false);
drain_all_pages(NULL);
drained = true;
goto retry;
@@ -3475,8 +3489,11 @@ should_reclaim_retry(gfp_t gfp_mask, unsigned order,
* Make sure we converge to OOM if we cannot make any progress
* several times in the row.
*/
- if (*no_progress_loops > MAX_RECLAIM_RETRIES)
+ if (*no_progress_loops > MAX_RECLAIM_RETRIES) {
+ if (unreserve_highatomic_pageblock(ac, true))
+ return true;
return false;
+ }

/*
* Keep reclaiming pages while there is a chance this will lead
>
> --
> Michal Hocko
> SUSE Labs
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@xxxxxxxxxx For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@xxxxxxxxx";> email@xxxxxxxxx </a>