Re: [LSF/MM/BPF TOPIC] Towards Unified and Extensible Memory Reclaim (reclaim_ext)

From: Shakeel Butt

Date: Thu Mar 26 2026 - 09:58:19 EST


On Thu, Mar 26, 2026 at 08:12:10AM +0100, Michal Hocko wrote:
> On Wed 25-03-26 14:06:37, Shakeel Butt wrote:
> > The Problem
> > -----------
> >
> > Memory reclaim in the kernel is a mess. We ship two completely separate
> > eviction algorithms -- traditional LRU and MGLRU -- in the same file.
> > mm/vmscan.c is over 8,000 lines. 40% of it is MGLRU-specific code that
> > duplicates functionality already present in the traditional path. Every
> > bug fix, every optimization, every feature has to be done twice or it
> > only works for half the users. This is not sustainable. It has to stop.
>
> While I do agree that having 2 implementations available, and having to
> maintain them, is not sustainable long term, I would disagree with your
> above line of argumentation. We are not aiming to have the two in feature
> parity, nor are they overlapping in bug space all that much.

There is definitely a basic set of features that we want from any reclaim
mechanism (e.g. writeback and writeback throttling, which MGLRU lacked for a
long time). Wanting that baseline does not mean we should aim for full feature
parity.

As for bugs and debugging, for every report we need to determine whether it
affects one implementation, the other, or both.

>
> > We should unify both algorithms into a single code path. In this path,
> > both algorithms are a set of hooks called from that path.
>
> Isn't this the case to a large extent? MGLRU tends to have a couple of
> entry points in the shared code base (node/memcg scanning code).

Most of the code diverges at the reclaim entry point, and from what I see it is
the code at the lowest layer (shrink_folio_list) that is shared.

>
> > Everyone
> > maintains, understands, and evolves a single codebase. Optimizations are
> > now evaluated against -- and available to -- both algorithms. And the
> > next time someone develops a new LRU algorithm, they can do so in a way
> > that does not add churn to existing code.
>
> I think we should focus on making a single canonical reclaim
> implementation work well. I.e. we deal with most (or ideally all) known
> regressions of MGLRU.

Here we disagree on the approach, or the steps, to reach a single canonical
reclaim implementation. MGLRU bundles many different mechanisms and policies,
and those were never rigorously evaluated individually. To me, that evaluation
needs to happen before we can converge on one solution.

> In the initial presentation of the MGLRU framework
> we were told that the implementation should be extensible to provide more
> creative aging algorithms etc.
>
> > The Fix: One Reclaim, Pluggable and Extensible
> > -----------------------------------------------
> >
> > We need one reclaim system, not two. One code path that everyone
> > maintains, everyone tests, and everyone benefits from. But it needs to
> > be pluggable as there will always be cases where someone wants some
> > customization for their specialized workload or wants to explore some
> > new techniques/ideas, and we do not want to get into the current mess
> > again.
>
> I would go that way only if/after we are done with MGLRU unification and
> after we will have depleted the potential of that approach and hit cases
> where we cannot implement new extensions without going $foo_ext way. TBH
> I am not convinced "make it pluginable to solve hard problems" is the
> best way forward.

The reason I added the pluggable/extensible part to this proposal is that I
want to avoid ending up in the same situation again in the future. There will
always be some users with very specialized workloads who need some fancy/weird
heuristic. Rather than polluting the core reclaim, letting such users implement
their own policies should be part of our long-term strategy. In addition, we
will want to explore different algorithms and techniques, and providing a way
to do that easily without changing the core is needed to future-proof reclaim.
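
To make the "core plus policy hooks" shape concrete, here is a minimal
userspace sketch. It is purely illustrative: struct demo_page,
struct demo_reclaim_ops, demo_reclaim() and the two example policies are
hypothetical names invented for this sketch, not existing or proposed kernel
interfaces. The only point is that the eviction step lives in one shared
function while the choice of victim is delegated to a policy callback.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical page descriptor: only the fields this sketch needs. */
struct demo_page {
	unsigned int last_access;	/* larger value = more recently used */
	int evicted;			/* set by the shared core, never by a policy */
};

/* Hypothetical policy hook table; NOT a kernel API. */
struct demo_reclaim_ops {
	const char *name;
	/* Return the index of the next victim among non-evicted pages, or -1. */
	int (*pick_victim)(const struct demo_page *pages, size_t n);
};

/* Example policy A: pick the least recently used page. */
static int demo_pick_oldest(const struct demo_page *pages, size_t n)
{
	int best = -1;

	for (size_t i = 0; i < n; i++) {
		if (pages[i].evicted)
			continue;
		if (best < 0 || pages[i].last_access < pages[best].last_access)
			best = (int)i;
	}
	return best;
}

/* Example policy B: pick the first non-evicted page in array order. */
static int demo_pick_first(const struct demo_page *pages, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (!pages[i].evicted)
			return (int)i;
	}
	return -1;
}

static const struct demo_reclaim_ops demo_lru_ops = {
	.name = "oldest-first",
	.pick_victim = demo_pick_oldest,
};

static const struct demo_reclaim_ops demo_fifo_ops = {
	.name = "array-order",
	.pick_victim = demo_pick_first,
};

/*
 * Shared core: the single place that performs the eviction. It stands in
 * for the common low-level path; policies only choose, they never evict.
 * Returns the number of pages actually reclaimed.
 */
static size_t demo_reclaim(const struct demo_reclaim_ops *ops,
			   struct demo_page *pages, size_t n,
			   size_t nr_to_reclaim)
{
	size_t reclaimed = 0;

	assert(ops && ops->pick_victim);
	while (reclaimed < nr_to_reclaim) {
		int victim = ops->pick_victim(pages, n);

		if (victim < 0)
			break;		/* policy has nothing left to offer */
		pages[victim].evicted = 1;
		reclaimed++;
	}
	return reclaimed;
}
```

Under this (assumed) split, a fix to demo_reclaim() applies to every policy,
and a new policy is just another ops table. The real hook points would
obviously be more involved than a single pick_victim() callback.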