Re: [rfc 3/4] mm, page_alloc: avoid expensive reclaim when compaction may not succeed

From: Michal Hocko
Date: Thu Sep 05 2019 - 05:00:14 EST


[Ccing Mike for checking on the hugetlb side of this change]

On Wed 04-09-19 12:54:22, David Rientjes wrote:
> Memory compaction has a couple significant drawbacks as the allocation
> order increases, specifically:
>
> - isolate_freepages() is responsible for finding free pages to use as
> migration targets and is implemented as a linear scan of memory
> starting at the end of a zone,
>
> - failing order-0 watermark checks in memory compaction does not account
> for how far below the watermarks the zone actually is: to enable
> migration, there must be *some* free memory available. Per the above,
> watermarks are not always suffficient if isolate_freepages() cannot
> find the free memory but it could require hundreds of MBs of reclaim to
> even reach this threshold (read: potentially very expensive reclaim with
> no indication compaction can be successful), and
>
> - if compaction at this order has failed recently so that it does not even
> run as a result of deferred compaction, looping through reclaim can often
> be pointless.
>
> For hugepage allocations, these are quite substantial drawbacks because
> these are very high order allocations (order-9 on x86) and falling back to
> doing reclaim can potentially be *very* expensive without any indication
> that compaction would even be successful.
>
> Reclaim itself is unlikely to free entire pageblocks and certainly no
> reliance should be put on it to do so in isolation (recall lumpy reclaim).
> This means we should avoid reclaim and simply fail hugepage allocation if
> compaction is deferred.
>
> It is also not helpful to thrash a zone by doing excessive reclaim if
> compaction may not be able to access that memory. If order-0 watermarks
> fail and the allocation order is sufficiently large, it is likely better
> to fail the allocation rather than thrashing the zone.
>
> Signed-off-by: David Rientjes <rientjes@xxxxxxxxxx>
> ---
> mm/page_alloc.c | 22 ++++++++++++++++++++++
> 1 file changed, 22 insertions(+)
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -4458,6 +4458,28 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
> if (page)
> goto got_pg;
>
> + if (order >= pageblock_order && (gfp_mask & __GFP_IO)) {
> + /*
> + * If allocating entire pageblock(s) and compaction
> + * failed because all zones are below low watermarks
> + * or is prohibited because it recently failed at this
> + * order, fail immediately.
> + *
> + * Reclaim is
> + * - potentially very expensive because zones are far
> + * below their low watermarks or this is part of very
> + * bursty high order allocations,
> + * - not guaranteed to help because isolate_freepages()
> + * may not iterate over freed pages as part of its
> + * linear scan, and
> + * - unlikely to make entire pageblocks free on its
> + * own.
> + */
> + if (compact_result == COMPACT_SKIPPED ||
> + compact_result == COMPACT_DEFERRED)
> + goto nopage;
> + }
> +
> /*
> * Checks for costly allocations with __GFP_NORETRY, which
> * includes THP page fault allocations

--
Michal Hocko
SUSE Labs