Re: [PATCH] mm: Require LRU reclaim progress before retrying direct reclaim
From: Shakeel Butt
Date: Wed Apr 15 2026 - 21:04:33 EST
On Fri, Apr 10, 2026 at 11:15:49AM +0100, Matt Fleming wrote:
> From: Matt Fleming <mfleming@xxxxxxxxxxxxxx>
>
> should_reclaim_retry() uses zone_reclaimable_pages() to estimate whether
> retrying reclaim could eventually satisfy an allocation. It's possible
> for reclaim to make minimal or no progress on an LRU type despite having
> ample reclaimable pages, e.g. anonymous pages when the only swap is
> RAM-backed (zram).
Or incompressible memory in zswap with writeback disabled, or an overcommitted
memory.min.
> This can cause the reclaim path to loop indefinitely.
>
> Track LRU reclaim progress (anon vs file) through a new struct
> reclaim_progress passed out of try_to_free_pages(), and only count a
> type's reclaimable pages if at least reclaim_progress_pct% was actually
> reclaimed in the last cycle.
>
> The threshold is exposed as /proc/sys/vm/reclaim_progress_pct (default
> 1, range 0-100).
Let's not expose any sysctl or user visible API for this heuristic. It will
evolve and then this interface would be awkward and hard to remove.
> Setting it to 0 disables the gate and restores the previous
> behaviour. Environments with only RAM-backed swap (zram) and small
> memory may need a higher value to prevent futile anon LRU churn from
> keeping the allocator spinning.
>
> Suggested-by: Johannes Weiner <hannes@xxxxxxxxxxx>
> Signed-off-by: Matt Fleming <mfleming@xxxxxxxxxxxxxx>
> ---
[...]
>
> @@ -4637,7 +4672,24 @@ should_reclaim_retry(gfp_t gfp_mask, unsigned order,
> !__cpuset_zone_allowed(zone, gfp_mask))
> continue;
>
> - available = reclaimable = zone_reclaimable_pages(zone);
> + /*
> + * Only count reclaimable pages from an LRU type if reclaim
> + * actually made headway on that type in the last cycle.
> + * This prevents the allocator from looping endlessly on
> + * account of a large pool of pages that reclaim cannot make
> + * progress on, e.g. anonymous pages when the only swap is
> + * RAM-backed (zram).
> + */
> + reclaimable = 0;
> + reclaimable_file = zone_reclaimable_file_pages(zone);
> + reclaimable_anon = zone_reclaimable_anon_pages(zone);
Here we are getting the current reclaimable pages.
> +
> + if (reclaim_progress_sufficient(progress->nr_file, reclaimable_file))
> + reclaimable += reclaimable_file;
> + if (reclaim_progress_sufficient(progress->nr_anon, reclaimable_anon))
> + reclaimable += reclaimable_anon;
And here we are comparing the last iteration's reclaim progress against the
current reclaimable pages. Is this intentional, to keep things simple?
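To make the question concrete, here is a userspace sketch of how I read the
gate (reclaim_progress_sufficient() and reclaim_progress_pct are the patch's
names; the body is my guess at the comparison, not the actual implementation):

```c
#include <stdbool.h>

/* Mirrors the patch's reclaim_progress_pct sysctl, default 1. */
static unsigned int reclaim_progress_pct = 1;

/*
 * An LRU type counts as reclaimable only if the pages reclaimed from
 * it in the *last* cycle were at least reclaim_progress_pct% of the
 * *current* reclaimable pool for that type -- which is the
 * last-iteration vs. current mismatch asked about above.
 */
static bool reclaim_progress_sufficient(unsigned long nr_reclaimed,
					unsigned long reclaimable)
{
	if (!reclaim_progress_pct)
		return true;	/* 0 disables the gate */

	return nr_reclaimed * 100 >= reclaimable * reclaim_progress_pct;
}
```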
> +
> + available = reclaimable;
> available += zone_page_state_snapshot(zone, NR_FREE_PAGES);
>
Another heuristic we can play with is to also pass through the vmscan scan
count. If for a couple of consecutive iterations we continue to see low reclaim
efficiency, go for OOM. Also maybe compare the scan count with the watermark:
I expect we don't see much difference in scan count between consecutive reclaim
iterations, so it is a good representative of reclaimable memory.
The reclaim efficiency heuristic should handle the swap-on-zram and
incompressible-zswap-with-no-writeback cases. Treating the scan count as a
proxy for reclaimable memory should handle the overcommitted memory.min case.
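Roughly what I have in mind, as a userspace sketch (the function name,
thresholds, and retry bookkeeping are all hypothetical, just to show the
shape of the heuristic):

```c
#include <stdbool.h>

#define RECLAIM_EFF_PCT		5	/* assumed efficiency floor, in % */
#define MAX_LOW_EFF_ITERS	3	/* assumed consecutive-iteration cap */

/*
 * Track reclaim efficiency (reclaimed/scanned) across retries and give
 * up (i.e. go for OOM) once several consecutive iterations stay below
 * the floor. The scan count also doubles as a proxy for reclaimable
 * memory: if we cannot even scan enough pages to cover the watermark,
 * retrying is futile.
 */
static bool reclaim_should_give_up(unsigned long nr_scanned,
				   unsigned long nr_reclaimed,
				   unsigned long watermark,
				   unsigned int *low_eff_iters)
{
	bool low_eff = nr_scanned &&
		       nr_reclaimed * 100 < nr_scanned * RECLAIM_EFF_PCT;

	if (low_eff)
		(*low_eff_iters)++;
	else
		*low_eff_iters = 0;

	if (*low_eff_iters >= MAX_LOW_EFF_ITERS)
		return true;

	/* scanned pages as proxy for reclaimable memory */
	return nr_scanned < watermark;
}
```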