Re: [PATCH] mm, vmscan: prevent infinite loop for costly GFP_NOIO | __GFP_RETRY_MAYFAIL allocations

From: Sven van Ashbrook
Date: Mon Feb 26 2024 - 11:09:37 EST


Vlastimil,

We noticed that this patch is now added to Andrew Morton's
mm-hotfixes-unstable branch.

How can we help to get this into mm-hotfixes-stable
and from there, into mainline ?

We are still stress-testing using low memory suspend. Anything else
that is required, and we can help with?

Sven

On Wed, Feb 21, 2024 at 6:44 AM Vlastimil Babka <vbabka@xxxxxxx> wrote:
>
> Sven reports an infinite loop in __alloc_pages_slowpath() for costly
> order __GFP_RETRY_MAYFAIL allocations that are also GFP_NOIO. Such
> combination can happen in a suspend/resume context where a GFP_KERNEL
> allocation can have __GFP_IO masked out via gfp_allowed_mask.
>
> Quoting Sven:
>
> 1. try to do a "costly" allocation (order > PAGE_ALLOC_COSTLY_ORDER)
> with __GFP_RETRY_MAYFAIL set.
>
> 2. page alloc's __alloc_pages_slowpath tries to get a page from the
> freelist. This fails because there is nothing free of that costly
> order.
>
> 3. page alloc tries to reclaim by calling __alloc_pages_direct_reclaim,
> which bails out because a zone is ready to be compacted; it pretends
> to have made a single page of progress.
>
> 4. page alloc tries to compact, but this always bails out early because
> __GFP_IO is not set (it's not passed by the snd allocator, and even
> if it were, we are suspending so the __GFP_IO flag would be cleared
> anyway).
>
> 5. page alloc believes reclaim progress was made (because of the
> pretense in item 3) and so it checks whether it should retry
> compaction. The compaction retry logic thinks it should try again,
> because:
> a) reclaim is needed because of the early bail-out in item 4
> b) a zonelist is suitable for compaction
>
> 6. goto 2. indefinite stall.
>
> (end quote)
>
> The immediate root cause is confusing the COMPACT_SKIPPED returned from
> __alloc_pages_direct_compact() (step 4) due to lack of __GFP_IO to be
> indicating a lack of order-0 pages, and in step 5 evaluating that in
> should_compact_retry() as a reason to retry, before incrementing and
> limiting the number of retries. There are however other places that
> wrongly assume that compaction can happen while we lack __GFP_IO.
>
> To fix this, introduce gfp_compaction_allowed() to abstract the __GFP_IO
> evaluation and switch the open-coded test in try_to_compact_pages() to
> use it.
>
> Also use the new helper in:
> - compaction_ready(), which will make reclaim not bail out in step 3, so
> there's at least one attempt to actually reclaim, even if chances are
> small for a costly order
> - in_reclaim_compaction() which will make should_continue_reclaim()
> return false and we don't over-reclaim unnecessarily
> - in __alloc_pages_slowpath() to set a local variable can_compact,
> which is then used to avoid retrying reclaim/compaction for costly
> allocations (step 5) if we can't compact and also to skip the early
> compaction attempt that we do in some cases
>
> Reported-by: Sven van Ashbrook <svenva@xxxxxxxxxxxx>
> Closes: https://lore.kernel.org/all/CAG-rBihs_xMKb3wrMO1%2B-%2Bp4fowP9oy1pa_OTkfxBzPUVOZF%2Bg@xxxxxxxxxxxxxx/
> Fixes: 3250845d0526 ("Revert "mm, oom: prevent premature OOM killer invocation for high order request"")
> Cc: <stable@xxxxxxxxxxxxxxx>
> Signed-off-by: Vlastimil Babka <vbabka@xxxxxxx>
> ---
> include/linux/gfp.h | 9 +++++++++
> mm/compaction.c | 7 +------
> mm/page_alloc.c | 10 ++++++----
> mm/vmscan.c | 5 ++++-
> 4 files changed, 20 insertions(+), 11 deletions(-)
>
> diff --git a/include/linux/gfp.h b/include/linux/gfp.h
> index de292a007138..e2a916cf29c4 100644
> --- a/include/linux/gfp.h
> +++ b/include/linux/gfp.h
> @@ -353,6 +353,15 @@ static inline bool gfp_has_io_fs(gfp_t gfp)
> return (gfp & (__GFP_IO | __GFP_FS)) == (__GFP_IO | __GFP_FS);
> }
>
> +/*
> + * Check if the gfp flags allow compaction - GFP_NOIO is a really
> + * tricky context because the migration might require IO.
> + */
> +static inline bool gfp_compaction_allowed(gfp_t gfp_mask)
> +{
> + return IS_ENABLED(CONFIG_COMPACTION) && (gfp_mask & __GFP_IO);
> +}
> +
> extern gfp_t vma_thp_gfp_mask(struct vm_area_struct *vma);
>
> #ifdef CONFIG_CONTIG_ALLOC
> diff --git a/mm/compaction.c b/mm/compaction.c
> index 4add68d40e8d..b961db601df4 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -2723,16 +2723,11 @@ enum compact_result try_to_compact_pages(gfp_t gfp_mask, unsigned int order,
> unsigned int alloc_flags, const struct alloc_context *ac,
> enum compact_priority prio, struct page **capture)
> {
> - int may_perform_io = (__force int)(gfp_mask & __GFP_IO);
> struct zoneref *z;
> struct zone *zone;
> enum compact_result rc = COMPACT_SKIPPED;
>
> - /*
> - * Check if the GFP flags allow compaction - GFP_NOIO is really
> - * tricky context because the migration might require IO
> - */
> - if (!may_perform_io)
> + if (!gfp_compaction_allowed(gfp_mask))
> return COMPACT_SKIPPED;
>
> trace_mm_compaction_try_to_compact_pages(order, gfp_mask, prio);
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 150d4f23b010..a663202045dc 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -4041,6 +4041,7 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
> struct alloc_context *ac)
> {
> bool can_direct_reclaim = gfp_mask & __GFP_DIRECT_RECLAIM;
> + bool can_compact = gfp_compaction_allowed(gfp_mask);
> const bool costly_order = order > PAGE_ALLOC_COSTLY_ORDER;
> struct page *page = NULL;
> unsigned int alloc_flags;
> @@ -4111,7 +4112,7 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
> * Don't try this for allocations that are allowed to ignore
> * watermarks, as the ALLOC_NO_WATERMARKS attempt didn't yet happen.
> */
> - if (can_direct_reclaim &&
> + if (can_direct_reclaim && can_compact &&
> (costly_order ||
> (order > 0 && ac->migratetype != MIGRATE_MOVABLE))
> && !gfp_pfmemalloc_allowed(gfp_mask)) {
> @@ -4209,9 +4210,10 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
>
> /*
> * Do not retry costly high order allocations unless they are
> - * __GFP_RETRY_MAYFAIL
> + * __GFP_RETRY_MAYFAIL and we can compact
> */
> - if (costly_order && !(gfp_mask & __GFP_RETRY_MAYFAIL))
> + if (costly_order && (!can_compact ||
> + !(gfp_mask & __GFP_RETRY_MAYFAIL)))
> goto nopage;
>
> if (should_reclaim_retry(gfp_mask, order, ac, alloc_flags,
> @@ -4224,7 +4226,7 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
> * implementation of the compaction depends on the sufficient amount
> * of free memory (see __compaction_suitable)
> */
> - if (did_some_progress > 0 &&
> + if (did_some_progress > 0 && can_compact &&
> should_compact_retry(ac, order, alloc_flags,
> compact_result, &compact_priority,
> &compaction_retries))
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 4f9c854ce6cc..4255619a1a31 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -5753,7 +5753,7 @@ static void shrink_lruvec(struct lruvec *lruvec, struct scan_control *sc)
> /* Use reclaim/compaction for costly allocs or under memory pressure */
> static bool in_reclaim_compaction(struct scan_control *sc)
> {
> - if (IS_ENABLED(CONFIG_COMPACTION) && sc->order &&
> + if (gfp_compaction_allowed(sc->gfp_mask) && sc->order &&
> (sc->order > PAGE_ALLOC_COSTLY_ORDER ||
> sc->priority < DEF_PRIORITY - 2))
> return true;
> @@ -5998,6 +5998,9 @@ static inline bool compaction_ready(struct zone *zone, struct scan_control *sc)
> {
> unsigned long watermark;
>
> + if (!gfp_compaction_allowed(sc->gfp_mask))
> + return false;
> +
> /* Allocation can already succeed, nothing to do */
> if (zone_watermark_ok(zone, sc->order, min_wmark_pages(zone),
> sc->reclaim_idx, 0))
> --
> 2.43.1
>