Re: [PATCH v10 10/14] mm: multi-gen LRU: kill switch

From: Andrew Morton
Date: Tue Apr 26 2022 - 18:22:47 EST


On Tue, 26 Apr 2022 14:57:15 -0600 Yu Zhao <yuzhao@xxxxxxxxxx> wrote:

> On Mon, Apr 11, 2022 at 8:16 PM Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> wrote:
> >
> > On Wed, 6 Apr 2022 21:15:22 -0600 Yu Zhao <yuzhao@xxxxxxxxxx> wrote:
> >
> > > Add /sys/kernel/mm/lru_gen/enabled as a kill switch. Components that
> > > can be disabled include:
> > > 0x0001: the multi-gen LRU core
> > > 0x0002: walking page table, when arch_has_hw_pte_young() returns
> > > true
> > > 0x0004: clearing the accessed bit in non-leaf PMD entries, when
> > > CONFIG_ARCH_HAS_NONLEAF_PMD_YOUNG=y
> > > [yYnN]: apply to all the components above
> > > E.g.,
> > > echo y >/sys/kernel/mm/lru_gen/enabled
> > > cat /sys/kernel/mm/lru_gen/enabled
> > > 0x0007
> > > echo 5 >/sys/kernel/mm/lru_gen/enabled
> > > cat /sys/kernel/mm/lru_gen/enabled
> > > 0x0005
> >
> > I'm shocked that this actually works. How does it work? Existing
> > pages & folios are drained over time or synchronously?
>
> Basically we have a double-throw switch, and once flipped, new (isolated)
> pages can only be added to the lists of the current implementation.
> Existing pages on the lists of the previous implementation are
> synchronously drained (isolated and then re-added), with
> cond_resched() of course.
>
> > Supporting
> > structures remain allocated, available for reenablement?
>
> Correct.
>
> > Why is it thought necessary to have this? Is it expected to be
> > permanent?
>
> This is almost a must for large scale deployments/experiments.
>
> For deployments, we need to keep fix rollout (high priority) and
> feature enabling (low priority) separate. Rolling out multiple
> binaries works but will make the process slower and more painful. So
> generally for each release, there is only one binary to roll out, and
> unless it's impossible, new features are disabled by default. Once a
> rollout completes, i.e., reaches enough population and remains stable,
> new features are turned on gradually. If something goes wrong with a
> new feature, we turn off that feature rather than roll back the
> kernel.
>
> Similarly, for A/B experiments, we don't want to use two binaries.

Please let's spell out this sort of high-level thinking in the
changelog.

From what you're saying, this is a transient thing. It sounds like
this enablement is only needed while mglru is at an early stage. Once
it has matured more, successive rollouts will have essentially the
same mglru implementation and being able to disable mglru at runtime
will no longer be required?

I guess the capability is reasonably simple/small and we can live with
it, but does it have a long-term future?

I mean, when organizations such as Google start adopting the mglru
implementation which is present in Linus's tree, we're, what, a year or
more into the future? Will they still need a kill switch then?
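
For reference, the component mask in the quoted patch description can be
decoded roughly as follows. This is only a minimal userspace sketch: the
bit values and the [yYnN] shorthand come from the description above, but
the component labels and the function names are made up for illustration
and are not taken from the patch.

```python
# Bit values as listed in the quoted patch description.
# The labels are illustrative, not identifiers from the patch.
COMPONENTS = {
    0x0001: "multi-gen LRU core",
    0x0002: "page table walk",
    0x0004: "non-leaf PMD accessed-bit clearing",
}
ALL_COMPONENTS = 0x0007


def to_mask(text):
    """Turn a value as it would be written to 'enabled' into a bitmask."""
    text = text.strip()
    if text in ("y", "Y"):
        return ALL_COMPONENTS      # [yY]: all components
    if text in ("n", "N"):
        return 0                   # [nN]: no components
    return int(text, 16) & ALL_COMPONENTS


def describe(mask):
    """List the components whose bits are set in mask."""
    return [name for bit, name in COMPONENTS.items() if mask & bit]


def format_mask(mask):
    """Format a mask the way the quoted 'cat' output shows it."""
    return f"0x{mask:04x}"


if __name__ == "__main__":
    for written in ("y", "5"):
        mask = to_mask(written)
        print(written, "->", format_mask(mask), describe(mask))
```

With the two values from the quoted example, "y" maps to 0x0007 (all
three components) and "5" maps to 0x0005 (the core plus non-leaf PMD
clearing, without the page table walk).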