Re: [PATCH v6 00/14] mm/mglru: improve reclaim loop and dirty folio handling

From: Barry Song

Date: Sun Apr 26 2026 - 04:35:09 EST

On Sun, Apr 26, 2026 at 2:59 PM Kairui Song <ryncsn@xxxxxxxxx> wrote:
>
> On Sun, Apr 26, 2026 at 4:58 AM Barry Song (Xiaomi) <baohua@xxxxxxxxxx> wrote:
> >
> > On Sat, Apr 25, 2026 at 9:30 PM Kairui Song <ryncsn@xxxxxxxxx> wrote:
> > [...]
> > >
> > > I just ran a matrix of for kernels (mainline, mm-new HEAD, before this
> > > series, after this series) X 3 different memcg configs (-j96 3G, -j48
> > > 2G, -j24 1G), and none of these showed any regression but all
> > > improvement. That's really odd.
> > >
> > > One possibility is that I removed the:
> > >
> > > if (evictable_min_seq(lrugen->min_seq, swappiness) + MIN_NR_GENS >
> > > lrugen->max_seq)
> > > scanned = 0;
> > >
> > > Which will make the reclaim loop go further and trigger aging.
> > > Previously if reclaim drained the LRU's cold gens, it may go reclaim
> > > slab instead. So idle inodes will be dropped with the mapping and
> > > reclaim more file, and we won't see any refault data from that since
> > > the mapping itself is gone. Sys will be lower too, as IO isn't counted
> > > as sys. Checking your data, despite sys is higher, real is acutually
> > > lower, which matches my guess.
> > >
> > > Will the following patch help? I'm not sure if this is the problem,
> > > but this added back that early abort, personally I don't think this
> > > really makes much sense as it's more like a workaround for other
> > > issues, but if that helps we might better keep it.
> >
> > Hi Kairui, after five hours of sleep I’m feeling much more
> > refreshed and should have identified the issue. I think the
> > problem will be clear once you look at the patch below.
> > Feel free to include it in the new version if you find it
> > helpful.
> >
> > From e3a0b50dc53a3a5f2ef1adfb73111336e3c2d08b Mon Sep 17 00:00:00 2001
> > From: "Barry Song (Xiaomi)" <baohua@xxxxxxxxxx>
> > Date: Sun, 26 Apr 2026 08:34:21 +1200
> > Subject: [PATCH] mm/mglru: Avoid falling back to another type when
> > scan_folios() leaves no remaining pages
> >
> > While remaining reaches 0 in scan_folios(), we quickly fall back
> > to the other type in isolate_folios(). This is incorrect, as the
> > current type may still have sufficient folios. Falling back can
> > undermine the positive_ctrl_err() result from get_type_to_scan(),
> > which is derived from swappiness.
> >
> > A simple fix is to continue scanning this type for another round.
> >
> > Signed-off-by: Barry Song (Xiaomi) <baohua@xxxxxxxxxx>
> > ---
> > mm/vmscan.c | 9 +++++++--
> > 1 file changed, 7 insertions(+), 2 deletions(-)
> >
> > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > index ef45f3a4aa38..169fbbe17c7b 100644
> > --- a/mm/vmscan.c
> > +++ b/mm/vmscan.c
> > @@ -4788,8 +4788,13 @@ static int isolate_folios(unsigned long nr_to_scan, struct lruvec *lruvec,
> > *isolate_scanned = scanned;
> > break;
> > }
> > -
> > - type = !type;
> > + /*
> > + * If scanned > 0 and isolated == 0, avoid falling back to the
> > + * other type, as this type remains sufficient. Falling back
> > + * too readily can disrupt the positive_ctrl_err() bias.
> > + */
> > + if (!scanned)
> > + type = !type;
> > }
>
> Thanks so much for catching this! The fix makes perfect sense, it
> restores the pre-patch behavior. I was too aggressive with the
> fallback; positive_ctrl_err() should win whenever the preferred type
> is actually productive.
>
> Would you prefer I squash this into the original patch (with a
> Co-developed-by for you?), or keep it as a standalone fix on top? I'm
> fine either way.

Either approach is fine—use whichever works better for organizing v7.
My boss would definitely be happier with the latter, as it would better
support and encourage more active engagement in upstream development.
It’s rare these days to find a boss like this who is open-minded and
insightful about upstream contributions, especially with everyone
moving toward AI. :-)

>
> One related thought for a follow-up: scanned == 0 from scan_folios()
> is essentially driven by get_nr_gens(lruvec, type) == MIN_NR_GENS. So
> when the type get_type_to_scan() picked is the one that's
> gen-exhausted, maybe the right response is also to age that type
> rather than fall back. Right now should_run_aging() only looks at the
> combined evictable_min_seq, so the ctrl_err preferred type can
> silently stay starved while we evict from the other one, that could be
> an old issue we can fix later.

Yep, I’ve been thinking about the same thing for quite a
few days. This might also help address swappiness. On the
other hand, it could lead to more aging, but it seems like
a necessary trade-off if we want a simple solution that
doesn’t require separate max_seq for file and anon to fix
swappiness for mglru.

I’m also trying another approach. For example, if the
number of folios in the old generation is too low relative
to swappiness, should_run_aging() could return true—similar
to Yu’s earliest patch as below, but with a swappiness bias.

+ /*
+ * It's also ideal to spread pages out evenly, i.e., 1/(MIN_NR_GENS+1)
+ * of the total number of pages for each generation. A reasonable range
+ * for this average portion is [1/MIN_NR_GENS, 1/(MIN_NR_GENS+2)]. The
+ * aging cares about the upper bound of hot pages, while the eviction
+ * cares about the lower bound of cold pages.
+ */
+ if (young * MIN_NR_GENS > total)
+ return true;
+ if (old * (MIN_NR_GENS + 2) < total)
+ return true;

Hopefully, the above can resolve the problem before we have to
resort to separate max_seq, which would break the shared time
axis between file and anon.

Maybe I will send an RFC before LSF/MM/BPF if we have enough
time.

Thanks
Barry