Re: [PATCH v7 03/15] mm/mglru: relocate the LRU scan batch limit to callers

From: Shakeel Butt

Date: Fri May 29 2026 - 17:29:53 EST


On Fri, May 29, 2026 at 02:01:43PM +0800, Kairui Song wrote:
> On Fri, May 29, 2026 at 1:40 PM Shakeel Butt <shakeel.butt@xxxxxxxxx> wrote:
> >
> > On Tue, Apr 28, 2026 at 02:06:54AM +0800, Kairui Song via B4 Relay wrote:
> > > From: Kairui Song <kasong@xxxxxxxxxxx>
> > >
> > > Same as active / inactive LRU, MGLRU isolates and scans folios in batches.
> > > The batch split is hidden deep in the helper, which makes the code
> > > harder to follow. The helper's arguments are also confusing since callers
> > > usually request more folios than the batch size, so the helper almost
> > > never processes the full requested amount.
> > >
> > > Move the batch splitting into the top loop to make it cleaner. There
> > > should be no behavior change.
> > >
> > > Reviewed-by: Axel Rasmussen <axelrasmussen@xxxxxxxxxx>
> > > Reviewed-by: Baolin Wang <baolin.wang@xxxxxxxxxxxxxxxxx>
> > > Reviewed-by: Barry Song <baohua@xxxxxxxxxx>
> > > Reviewed-by: Chen Ridong <chenridong@xxxxxxxxxxxxxxx>
> > > Signed-off-by: Kairui Song <kasong@xxxxxxxxxxx>
> > > ---
> > > mm/vmscan.c | 16 +++++++++-------
> > > 1 file changed, 9 insertions(+), 7 deletions(-)
> > >
> > > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > > index 7f011ff4c478..a011733a6392 100644
> > > --- a/mm/vmscan.c
> > > +++ b/mm/vmscan.c
> > > @@ -4695,10 +4695,10 @@ static int scan_folios(unsigned long nr_to_scan, struct lruvec *lruvec,
> > >  	int scanned = 0;
> > >  	int isolated = 0;
> > >  	int skipped = 0;
> > > -	int scan_batch = min(nr_to_scan, MAX_LRU_BATCH);
> > > -	int remaining = scan_batch;
> > > +	unsigned long remaining = nr_to_scan;
> > >  	struct lru_gen_folio *lrugen = &lruvec->lrugen;
> > >
> > > +	VM_WARN_ON_ONCE(nr_to_scan > MAX_LRU_BATCH);
> >
> > Do we really need a warning here? Also, why are we limiting it to MAX_LRU_BATCH?
> > For memcg/proactive reclaim, we can get a larger number. What will break if we
> > remove this limitation?
>
> Hi,
> Isolating a large chunk of folios off the list is usually a bad idea.
> Livelock is one concern. Besides, concurrent reclaimers won't see them
> anymore, and LRU operations on them will be skipped (e.g. rotation).
> Under heavy pressure, this could lead to premature OOM because some
> reclaimers will see the LRU as empty, or to accuracy loss.

Oh, this is the isolation count/limit. CLRU isolates SWAP_CLUSTER_MAX folios at a
time. Any reason why MGLRU does 4096 (on a 64-bit machine)?

Later we should evaluate what the right isolation count would be and use the same
value for both, unless there is some inherent reason behind the difference.
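
To make the comparison concrete, here is a rough userspace sketch, not
kernel code. It models the caller-side batch split this patch moves to and
counts how many isolation rounds a request needs under each cap. The
function names (scan_one_batch, reclaim_in_batches) are made up for
illustration, the assert() only stands in for the VM_WARN_ON_ONCE() in the
hunk above, and the constants are what I believe the 64-bit values to be,
so treat them as assumptions.

```c
#include <assert.h>

/* Assumed 64-bit values; check the tree before relying on them. */
#define SWAP_CLUSTER_MAX 32UL
#define MIN_LRU_BATCH    64UL                   /* BITS_PER_LONG on 64-bit */
#define MAX_LRU_BATCH    (MIN_LRU_BATCH * 64)   /* 4096 */

/* Hypothetical stand-in for the helper: it trusts the caller's cap. */
static unsigned long scan_one_batch(unsigned long nr_to_scan,
				    unsigned long batch)
{
	assert(nr_to_scan <= batch);	/* models VM_WARN_ON_ONCE() */
	return nr_to_scan;		/* pretend every folio was scanned */
}

/* Hypothetical caller loop: the request is split into batches here. */
static unsigned long reclaim_in_batches(unsigned long nr_to_scan,
					unsigned long batch,
					unsigned long *nr_batches)
{
	unsigned long scanned = 0;

	*nr_batches = 0;
	while (nr_to_scan) {
		unsigned long chunk = nr_to_scan < batch ? nr_to_scan : batch;

		scanned += scan_one_batch(chunk, batch);
		nr_to_scan -= chunk;
		(*nr_batches)++;
	}
	return scanned;
}
```

In this model, a request for 10000 folios takes 3 rounds with a 4096 cap
and 313 rounds with a 32 cap, so the cap decides how many folios are off the
list at once versus how often the list is revisited.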

>
> There is an LRU isolate throttle for CLRU, which is missing for MGLRU,

I am not a fan of that specific (too_many_isolated) throttling. In the long term,
we should properly throttle the number of direct reclaimers.