Re: [PATCH v3 06/14] mm/mglru: use a smaller batch for reclaim

From: Kairui Song

Date: Fri Apr 03 2026 - 05:11:01 EST

On Fri, Apr 03, 2026 at 03:50:37PM +0800, Barry Song wrote:
> On Fri, Apr 3, 2026 at 2:53 AM Kairui Song via B4 Relay
> <devnull+kasong.tencent.com@xxxxxxxxxx> wrote:
> >
> > From: Kairui Song <kasong@xxxxxxxxxxx>
> >
> > With a fixed number to reclaim calculated at the beginning, making each
> > following step smaller should reduce the lock contention and avoid
> > over-aggressive reclaim of folios, as it will abort earlier when the
> > number of folios to be reclaimed is reached.
> >
> > Reviewed-by: Axel Rasmussen <axelrasmussen@xxxxxxxxxx>
> > Reviewed-by: Chen Ridong <chenridong@xxxxxxxxxxxxxxx>
> > Reviewed-by: Baolin Wang <baolin.wang@xxxxxxxxxxxxxxxxx>
> > Signed-off-by: Kairui Song <kasong@xxxxxxxxxxx>
> > ---
> > mm/vmscan.c | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > index 643f9fc10214..9c28afb0219c 100644
> > --- a/mm/vmscan.c
> > +++ b/mm/vmscan.c
> > @@ -5008,7 +5008,7 @@ static bool try_to_shrink_lruvec(struct lruvec *lruvec, struct scan_control *sc)
> > break;
> > }
> >
> > - nr_batch = min(nr_to_scan, MAX_LRU_BATCH);
> > + nr_batch = min(nr_to_scan, MIN_LRU_BATCH);
>
> I’m fine with the smaller batch size, but I wonder if
> MIN_LRU_BATCH is too small.

Thanks for the review, Barry!

I think it's quite a reasonable value. For comparison, the classical
LRU's batch size is SWAP_CLUSTER_MAX (32), which is even smaller than
MIN_LRU_BATCH (64).

I ran many different benchmarks on this; the results can be found in
the V2 / V1 cover letters (it was getting too long, so I didn't include
these results in V3, but I did retest). The new value looked good from
large servers to small VMs.

It's also a much more reasonable value for batch throttling and dirty
writeback IMO.

>
> Just curious if we are calling get_nr_to_scan() more frequently
> before we can abort the while (true) loop if reclamation
> is not making good progress.
>
> Assume get_nr_to_scan() also has a cost. Not sure if a
> value between MIN_LRU_BATCH and MAX_LRU_BATCH
> would be better.

We are actually calling it less frequently: in a previous commit
it was moved out of the loop to act as a budget control. That's
also where using a smaller batch starts to make more sense.
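
To illustrate the budget point, here is a simplified userspace model of
that loop shape. It is only a sketch, not the actual mm/vmscan.c code:
simulate_reclaim(), struct sim_result, SMALL_BATCH and LARGE_BATCH are
made-up names, 4096 for the large batch is an assumption, and the model
assumes every scanned folio gets reclaimed, which real reclaim will not do.

```c
/* Illustrative values: 64 is the MIN_LRU_BATCH value mentioned in this
 * thread; 4096 for the large batch is an assumption for the comparison. */
#define SMALL_BATCH 64UL
#define LARGE_BATCH 4096UL

struct sim_result {
	unsigned long scanned;   /* folios scanned in total */
	unsigned long reclaimed; /* folios reclaimed in total */
	unsigned long rounds;    /* loop iterations (one lock round trip each) */
};

static unsigned long min_ul(unsigned long a, unsigned long b)
{
	return a < b ? a : b;
}

/*
 * Hypothetical model: nr_to_scan is a budget computed once by the caller,
 * and the loop consumes it in steps of at most 'batch'. It aborts as soon
 * as the reclaim target is met or the budget runs out.
 */
static struct sim_result simulate_reclaim(unsigned long nr_to_scan,
					  unsigned long nr_to_reclaim,
					  unsigned long batch)
{
	struct sim_result res = { 0, 0, 0 };

	if (batch == 0)
		return res; /* a zero batch could never make progress */

	while (nr_to_scan > 0) {
		unsigned long nr_batch = min_ul(nr_to_scan, batch);

		res.scanned += nr_batch;
		res.reclaimed += nr_batch; /* assumption: all scanned folios are freed */
		res.rounds++;
		nr_to_scan -= nr_batch;

		/* The abort check only runs between batches, so the
		 * overshoot past the target is bounded by the batch size. */
		if (res.reclaimed >= nr_to_reclaim)
			break;
	}

	return res;
}
```

In this model, with a budget of 8192 and a target of 100,
simulate_reclaim(8192, 100, SMALL_BATCH) stops after 2 rounds with 128
folios reclaimed, while simulate_reclaim(8192, 100, LARGE_BATCH) stops
after 1 round with 4096 reclaimed.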

The overhead of other function calls also seems trivial.

I also wonder if we can unify or remove some
SWAP_CLUSTER_MAX usage; that value might no longer be
suitable in many places.