Re: [PATCH v5 04/14] mm/mglru: restructure the reclaim loop
From: Kairui Song
Date: Thu Apr 23 2026 - 12:56:02 EST
On Thu, Apr 16, 2026 at 02:33:48PM +0800, Barry Song wrote:
> On Mon, Apr 13, 2026 at 12:48 AM Kairui Song via B4 Relay
> <devnull+kasong.tencent.com@xxxxxxxxxx> wrote:
> >
> > From: Kairui Song <kasong@xxxxxxxxxxx>
> >
> > The current loop will calculate the scan number on each iteration. The
> > number of folios to scan is based on the LRU length, with some unclear
> > behaviors, eg, the scan number is only shifted by reclaim priority when
> > aging is not needed or when at the default priority, and it couples
> > the number calculation with aging and rotation.
> >
> > Adjust, simplify it, and decouple aging and rotation. Just calculate the
> > scan number for once at the beginning of the reclaim, always respect the
> > reclaim priority, and make the aging and rotation more explicit.
> >
> > This slightly changes how aging and offline memcg reclaim works:
> > Previously, aging was always skipped at DEF_PRIORITY even when
> > eviction was impossible. Now, aging is always triggered when it
> > is necessary to make progress. The old behavior may waste a reclaim
> > iteration only to escalate priority, potentially causing over-reclaim
> > of slab and breaking reclaim balance in multi-cgroup setups.
> >
> > Similar for offline memcg. Previously, offline memcg wouldn't be
> > aged unless it didn't have any evictable folios. Now, we might age
> > it if it has only 3 generations and the reclaim priority is less
> > than DEF_PRIORITY, which should be fine. On one hand, offline memcg
> > might still hold long-term folios, and in fact, a long-existing offline
> > memcg must be pinned by some long-term folios like shmem. These folios
> > might be used by other memcg, so aging them as ordinary memcg seems
> > correct. Besides, aging enables further reclaim of an offlined memcg,
> > which will certainly happen if we keep shrinking it. And offline
> > memcg might soon be no longer an issue with reparenting.
> >
> > And while at it, make it clear that unevictable memcg will get rotated
> > so following reclaim will more likely to skip them, as a optimization.
> > And apply a minimal batch factor when reclaim is running with higher
> > priority.
> >
> > Overall, the memcg LRU rotation, as described in mmzone.h,
> > remains the same.
> >
> > Reviewed-by: Axel Rasmussen <axelrasmussen@xxxxxxxxxx>
> > Signed-off-by: Kairui Song <kasong@xxxxxxxxxxx>
> > ---
> > mm/vmscan.c | 72 +++++++++++++++++++++++++++++++++----------------------------
> > 1 file changed, 39 insertions(+), 33 deletions(-)
> >
> > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > index 963362523782..d4aaaa62056d 100644
> > --- a/mm/vmscan.c
> > +++ b/mm/vmscan.c
> > @@ -4913,49 +4913,41 @@ static int evict_folios(unsigned long nr_to_scan, struct lruvec *lruvec,
> > }
> >
> > static bool should_run_aging(struct lruvec *lruvec, unsigned long max_seq,
> > - int swappiness, unsigned long *nr_to_scan)
> > + struct scan_control *sc, int swappiness)
> > {
> > DEFINE_MIN_SEQ(lruvec);
> >
> > - *nr_to_scan = 0;
> > /* have to run aging, since eviction is not possible anymore */
> > if (evictable_min_seq(min_seq, swappiness) + MIN_NR_GENS > max_seq)
> > return true;
> >
> > - *nr_to_scan = lruvec_evictable_size(lruvec, swappiness);
> > + /* try to get away with not aging at the default priority */
>
> Not a native speaker, and I’ve been struggling a bit with this sentence.
> Does it mean “try to avoid aging at the default priority”?

Yes, good suggestion. I'll update this comment while at it.

> > + if (sc->priority == DEF_PRIORITY)
> > + return false;
>
>
> "This slightly changes how aging and offline memcg reclaim works:
>
> Previously, aging was always skipped at DEF_PRIORITY even when
> eviction was impossible. Now, aging is always triggered when it
> is necessary to make progress."
>
> It seems clear that you are returning false for DEF_PRIORITY.
> How should I understand “aging is always triggered”?
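
It can return true above: the evictable_min_seq() + MIN_NR_GENS > max_seq
check comes before the DEF_PRIORITY check, so aging still runs at
DEF_PRIORITY when eviction is no longer possible. But yes, the commit
message can be improved. I'll update it slightly.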
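
To illustrate the ordering, here is a simplified userspace sketch. It is
not the kernel code: model_should_run_aging() and enum aging_decision are
made-up names for illustration, min_seq stands in for the result of
evictable_min_seq(min_seq, swappiness), and the checks that follow the
DEF_PRIORITY test are not modeled at all.

```c
#include <stdbool.h>

/* Assumed to match the kernel's definitions at the time of writing. */
#define MIN_NR_GENS	2UL
#define DEF_PRIORITY	12

/* Hypothetical result type, only for this illustration. */
enum aging_decision {
	AGING_REQUIRED,		/* eviction is impossible, aging must run */
	AGING_SKIPPED,		/* default priority, avoid aging */
	AGING_UNDECIDED,	/* left to the later checks, not modeled here */
};

/*
 * Model of the first two checks in should_run_aging() after this patch.
 * The order matters: the "have to run aging" check is evaluated first,
 * so DEF_PRIORITY only skips aging when eviction is still possible.
 */
static enum aging_decision model_should_run_aging(unsigned long min_seq,
						  unsigned long max_seq,
						  int priority)
{
	/* have to run aging, since eviction is not possible anymore */
	if (min_seq + MIN_NR_GENS > max_seq)
		return AGING_REQUIRED;

	/* try to avoid aging at the default priority */
	if (priority == DEF_PRIORITY)
		return AGING_SKIPPED;

	return AGING_UNDECIDED;
}
```

So with min_seq = 3 and max_seq = 4 the model reports AGING_REQUIRED even
at DEF_PRIORITY, while min_seq = 1 and max_seq = 4 at DEF_PRIORITY gives
AGING_SKIPPED.
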
> > +/*
> > + * For future optimizations:
> > + * 1. Defer try_to_inc_max_seq() to workqueues to reduce latency for memcg
> > + * reclaim.
> > + */
> > static bool try_to_shrink_lruvec(struct lruvec *lruvec, struct scan_control *sc)
> > {
> > + bool need_rotate = false;
> > long nr_batch, nr_to_scan;
> > - unsigned long scanned = 0;
> > int swappiness = get_swappiness(lruvec, sc);
> > + struct mem_cgroup *memcg = lruvec_memcg(lruvec);
> > +
> > + nr_to_scan = get_nr_to_scan(lruvec, sc, memcg, swappiness);
> > + if (!nr_to_scan)
> > + need_rotate = true;
> >
> > - while (true) {
> > + while (nr_to_scan > 0) {
> > int delta;
> > + DEFINE_MAX_SEQ(lruvec);
> >
> > - nr_to_scan = get_nr_to_scan(lruvec, sc, swappiness);
> > - if (nr_to_scan <= 0)
> > + if (mem_cgroup_below_min(sc->target_mem_cgroup, memcg)) {
> > + need_rotate = true;
> > break;
> > + }
> > +
> > + if (should_run_aging(lruvec, max_seq, sc, swappiness)) {
> > + if (try_to_inc_max_seq(lruvec, max_seq, swappiness, false))
>
> Could we move the original comment here:
> /* stop scanning this lruvec as it's low on cold folios */

A later commit will drop the "stop scanning" behavior, but I can keep
the comment for now.

Thanks for the review.