RE: [PATCH RFC] mm/vmscan:Fix the hot/cold inversion when swappiness = 0 or 201

From: wangzicheng

Date: Thu Apr 09 2026 - 04:58:24 EST

> > > I suspect that once we can age file and anonymous pages
> > > separately, this issue will resolve itself. David already has
> > > some code for this [1].
> > >
> > > Not sure when he will have time to push it upstream, but I
> > > may carve out some time to take care of it this month.
> > >
> > > [1] https://lore.kernel.org/linux-mm/aam5nOyXs1sNdjTe@xxxxxxxxxx/
> >
> > Hi, thanks for sharing the idea.
> >
> > Right, a few weeks ago I also got info from CachyOS that they are using
> > following patch for MGLRU:
> >
> > https://github.com/firelzrd/re-swappiness
> >
> > The idea is also to split the seq number for anon / file so swappiness
> > works again.
> >
> > However, I'm really not sure if this is the right approach. It changes
> > the model of MGLRU and things like TTL may no longer work as expected.
> > And TTL does solve real problems too (also from CachyOS):
> >
> > https://github.com/firelzrd/le9uo
> >
> > TTL replaced the le9 patch above in a cleaner way for thrashing
> > prevention.
> >
> > Right now we do a page table walk (and it walks both anon / file)
> > while generating one unified new gen, meaning the folios in that
> > gen have the same (or at least all older than a specific) access
> > time, which is used as the metric for TTL.
> >
> > Besides, having unified gens also helps implement things like
> > workingset reporting where each gen is like a bin for histogram:
> >
> > https://lwn.net/Articles/976985/
> >
> > Aging triggering could be a bit more problematic too.
> > I think the right way is to just do the aging asynchronously, Yu
> > even left a TODO comment in vmscan.c:
> >
> > /*
> > * For future optimizations:
> > > * 1. Defer try_to_inc_max_seq() to workqueues to reduce latency for memcg
> > > * reclaim.
> > */
>
> Aging asynchronously could be a separate topic, as we can
> do many things in an async manner—similar to proposals for
> asynchronous compression. These async approaches may improve
> performance, but they also add complexity—for example, managing
> CPU utilization of reclamation threads to prevent devices from
> overheating.
>

Asynchronous reclamation could indeed help (when swap is enabled).
We have also seen some improvements with a similar approach in
Android workloads. Async aging makes swappiness more effective,
so that more anonymous pages eventually become reclaimable.

Similar to async aging, giving aging more opportunities may also help.
For example, should_run_aging() could return true when the number of
evictable pages is below MIN_LRU_BATCH.
I haven't tested this yet but plan to try it.
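
To make the idea concrete, here is a rough user-space sketch, not a
patch. The struct and function names are hypothetical, the two existing
checks are only my approximation of what should_run_aging() does today,
MIN_LRU_BATCH is assumed to be 64 (BITS_PER_LONG on a 64-bit build), and
"evictable pages" is taken to mean pages in generations old enough to
be evicted:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MIN_NR_GENS   2UL   /* same value as the kernel's MIN_NR_GENS */
#define MIN_LRU_BATCH 64UL  /* assumed: BITS_PER_LONG on a 64-bit build */

/* Hypothetical snapshot of one lruvec; not a kernel structure. */
struct lruvec_model {
	unsigned long max_seq;       /* youngest generation */
	unsigned long min_seq;       /* oldest generation among evictable types */
	unsigned long nr_evictable;  /* pages old enough to evict (assumed metric) */
};

static bool should_run_aging_model(const struct lruvec_model *lv)
{
	assert(lv != NULL);

	/* Too few generations left: eviction cannot proceed without aging. */
	if (lv->min_seq + MIN_NR_GENS > lv->max_seq)
		return true;

	/* Proposed extra trigger (untested): almost nothing left to evict. */
	if (lv->nr_evictable < MIN_LRU_BATCH)
		return true;

	/* Eviction is still possible, but only the minimum number of gens is left. */
	return lv->min_seq + MIN_NR_GENS == lv->max_seq;
}
```

With max_seq = 7 and min_seq = 4, the model asks for aging when only 10
pages are evictable, and does not when 1000 pages are evictable.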

> >
> > Then, we start the aging whenever there are fewer than 4 gens, and
> > allow reclaim to always go on even if there are only 2 gens left.
>
> I don’t think allowing reclamation with two generations left
> will resolve the problem. The fundamental issue with sharing the
> same generations for file and anon is that one type must catch
> up with the other—either through reclamation or via what this
> patch is (admittedly) doing as a workaround. If we have to go
> through reclamation, that effectively makes swappiness invalid
> again.
>
> Allowing reclamation with two generations may let one type move
> ahead briefly, but over a smoothed time window there is no real
> difference, as the other type still has to catch up with the one
> that has fewer generations left.
>

That is true.

In some previous experiments on Android we observed that when tasks
are *frozen* and aging is triggered via the debugfs interface, pages may
gradually accumulate into a single generation. In that state the MGLRU
reclaim pattern controlled by swappiness becomes very similar to
classic LRU reclaim.
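
A toy model of that collapse, under simplifying assumptions: no page is
accessed between aging steps, and each forced aging step folds the
oldest generation into its neighbour to make room for a new youngest
one. The names below are hypothetical and this is not how the kernel
lays out its lists:

```c
#include <stdio.h>

#define MAX_NR_GENS 4

/* Toy model: gens[0] is the oldest generation, gens[MAX_NR_GENS - 1] the youngest. */
struct gen_model {
	unsigned long min_seq;
	unsigned long max_seq;
	unsigned long gens[MAX_NR_GENS];  /* page count per generation */
};

/* One forced aging step with no page accesses in between (assumption). */
static void age_once(struct gen_model *m)
{
	int i;

	/* Fold the oldest generation into its neighbour to free a slot. */
	m->gens[1] += m->gens[0];
	for (i = 0; i < MAX_NR_GENS - 1; i++)
		m->gens[i] = m->gens[i + 1];

	/* The new youngest generation starts empty: nothing was accessed. */
	m->gens[MAX_NR_GENS - 1] = 0;
	m->min_seq++;
	m->max_seq++;
}

/* Number of generations that still hold pages. */
static int nr_populated_gens(const struct gen_model *m)
{
	int i, n = 0;

	for (i = 0; i < MAX_NR_GENS; i++)
		if (m->gens[i])
			n++;
	return n;
}
```

Starting from four populated generations, three calls to age_once()
leave every page in gens[0], so there is no longer any hot/cold
ordering between generations for swappiness to act on.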

> >
> > The performance would be better since there is no more blocking
> > on aging, no change to existing model, and the change should
> > be smaller and easier to review IIUC.
> >
> > One concerning part is doing reclaim while only having 2 gens left.
> > I think it seems OK. It should be rare as 3 gens act as a buffer
> > already, having only 2 gens left means the async aging can't catch
> > up and the system is under extreme pressure, so it's unlikely the folios
> > will get accessed enough times to yield meaningful heat info, and
> > refaults will be a more meaningful help in sorting out the workingset:
> >
> > https://lwn.net/Articles/945266/
> >
> > Cgroup reclaim can do some throttling on that too, and kswapd can
> > still do aging synchronously.
> >
> > Just some ideas, we may need to do some test and benchmark
> > to figure out which is the best solution. Discussion
> > is welcomed! :D
>
> Maybe we can still find a way to address the concerns you raised
> above, as well as TTL—for example, by using separate timestamps
> for anon and file pages.
>
> Thanks
> Barry

Best,
Zicheng