Re: [PATCH] mm: Require LRU reclaim progress before retrying direct reclaim

In reply to: Matt Fleming: "Re: [PATCH] mm: Require LRU reclaim progress before retrying direct reclaim"

From: Shakeel Butt

Date: Thu Apr 16 2026 - 17:58:53 EST

On Thu, Apr 16, 2026 at 09:44:55AM +0800, Barry Song wrote:
> On Fri, Apr 10, 2026 at 6:16 PM Matt Fleming <matt@xxxxxxxxxxxxxxxx> wrote:
> >
> > From: Matt Fleming <mfleming@xxxxxxxxxxxxxx>
> >
> > should_reclaim_retry() uses zone_reclaimable_pages() to estimate whether
> > retrying reclaim could eventually satisfy an allocation. It's possible
> > for reclaim to make minimal or no progress on an LRU type despite having
> > ample reclaimable pages, e.g. anonymous pages when the only swap is
> > RAM-backed (zram). This can cause the reclaim path to loop indefinitely.
>
> I am still struggling to understand when zram-backed
> reclamation cannot make progress. Is it because zram is
> full, or because folio_alloc_swap() fails?
>
> Or does zs_malloc() fail, causing pageout() to fail?
> Even incompressible pages are still written as
> ZRAM_HUGE pages and reclaimed successfully.

We should have counters for these, right?

>
> >
> > Track LRU reclaim progress (anon vs file) through a new struct
> > reclaim_progress passed out of try_to_free_pages(), and only count a
> > type's reclaimable pages if at least reclaim_progress_pct% was actually
> > reclaimed in the last cycle.
>
> I would rather detect what causes the lack of progress
> and implement a better fallback.

This is a good question. I think we have appropriate counters in /proc/vmstat
for cases where pages keep getting recycled in the LRUs instead of being
reclaimed.

Matt, do you see anything unexpected in /proc/vmstat?
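
For anyone wanting to check this from userspace, here is a minimal sketch that
samples /proc/vmstat twice and reports the steal/scan ratio per LRU type. The
choice of counters (pgscan_anon, pgsteal_anon, pgscan_file, pgsteal_file,
pgscan_direct, pgsteal_direct, pswpout) is my assumption about what is relevant
here, not something taken from the patch; some of them are missing on older
kernels, so absent names are skipped. The helper names and the 10 second
interval are arbitrary.

```python
#!/usr/bin/env python3
"""Sample /proc/vmstat twice and report reclaim efficiency per LRU type."""
import time

# Assumed set of interesting counters; availability depends on kernel version.
COUNTERS = (
    "pgscan_anon", "pgsteal_anon",
    "pgscan_file", "pgsteal_file",
    "pgscan_direct", "pgsteal_direct",
    "pswpout",
)


def parse_vmstat(text):
    """Parse 'name value' lines into a dict, ignoring malformed lines."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            stats[parts[0]] = int(parts[1])
    return stats


def delta(before, after, names=COUNTERS):
    """Per-counter difference for counters present in both samples."""
    return {n: after[n] - before[n]
            for n in names if n in before and n in after}


def steal_ratio(d, lru):
    """Fraction of scanned pages that were reclaimed for 'anon' or 'file'.

    Returns None when nothing was scanned in the interval.
    """
    scanned = d.get(f"pgscan_{lru}", 0)
    if scanned == 0:
        return None
    return d.get(f"pgsteal_{lru}", 0) / scanned


def read_vmstat(path="/proc/vmstat"):
    with open(path) as f:
        return parse_vmstat(f.read())


if __name__ == "__main__":
    before = read_vmstat()
    time.sleep(10)  # arbitrary sampling interval
    d = delta(before, read_vmstat())
    for name, value in sorted(d.items()):
        print(f"{name:16} {value}")
    for lru in ("anon", "file"):
        ratio = steal_ratio(d, lru)
        shown = "n/a (nothing scanned)" if ratio is None else f"{ratio:.1%}"
        print(f"{lru} steal/scan: {shown}")
```

A large pgscan_anon delta with a pgsteal_anon delta near zero while the
workload is stuck would suggest anon pages are being scanned but not reclaimed,
which is the situation the patch description talks about. It would not say why,
so zram-side statistics would still be needed to answer Barry's question.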