Re: [PATCH] mm: Require LRU reclaim progress before retrying direct reclaim

From: Pedro Falcato

Date: Wed Apr 15 2026 - 10:57:44 EST

On Fri, Apr 10, 2026 at 11:15:49AM +0100, Matt Fleming wrote:
> From: Matt Fleming <mfleming@xxxxxxxxxxxxxx>
>
> should_reclaim_retry() uses zone_reclaimable_pages() to estimate whether
> retrying reclaim could eventually satisfy an allocation. It's possible
> for reclaim to make minimal or no progress on an LRU type despite having
> ample reclaimable pages, e.g. anonymous pages when the only swap is
> RAM-backed (zram). This can cause the reclaim path to loop indefinitely.
>
> Track LRU reclaim progress (anon vs file) through a new struct
> reclaim_progress passed out of try_to_free_pages(), and only count a
> type's reclaimable pages if at least reclaim_progress_pct% was actually
> reclaimed in the last cycle.

I think there is at least one problem with this heuristic: you are counting
everything that hasn't made progress as "we cannot reclaim it", when in reality
reclaim can fail to make progress on any given folio simply because, e.g., it is
referenced and we want to give it another spin in the LRU.

My theory (from merely reading the patch, maybe I missed something) is that
a pathological case for this is a lot of folios added to the LRU in a row
that are all set referenced (or dirty), say SWAP_CLUSTER_MAX * MAX_RECLAIM_RETRIES
of them. In that case it will simply OOM too early.
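
To make the concern concrete, here is a toy userspace model (not kernel code;
every name in it is hypothetical, and it assumes "progress" is measured as
freed pages as a percentage of scanned pages, which the patch summary does not
spell out). A referenced folio gets its flag cleared and survives one pass, so
the first pass frees nothing even though the next pass could free everything:

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: hypothetical names, not kernel structures or APIs. */
struct toy_folio {
	bool referenced;	/* accessed since the last scan */
	bool reclaimed;		/* already freed by an earlier pass */
};

/*
 * One reclaim pass over the list. A referenced folio is not freed; its
 * flag is cleared and it gets another trip around the LRU.
 * Returns the number of folios freed in this pass.
 */
static size_t toy_reclaim_pass(struct toy_folio *folios, size_t n)
{
	size_t freed = 0;

	for (size_t i = 0; i < n; i++) {
		if (folios[i].reclaimed)
			continue;
		if (folios[i].referenced) {
			folios[i].referenced = false;	/* second chance */
			continue;
		}
		folios[i].reclaimed = true;
		freed++;
	}
	return freed;
}

/*
 * Progress-gated estimate (assumed form): count the remaining pages as
 * reclaimable only if the last pass freed at least pct percent of what
 * it scanned.
 */
static size_t toy_reclaimable_estimate(size_t scanned, size_t freed,
				       size_t remaining, unsigned int pct)
{
	if (scanned == 0 || freed * 100 < (size_t)pct * scanned)
		return 0;
	return remaining;
}
```

With every folio initially referenced, the first pass frees zero, the gated
estimate drops to zero for any nonzero percentage, and yet a second pass would
free the whole list. Whether the real patch behaves this way depends on
details I may have misread.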

The other question is whether this effectively solves reclaim problems - some
hard numbers would be great.

--
Pedro