Re: [RFC] mm/vmscan: add periodic slab shrinker

From: Dave Chinner
Date: Sun Apr 03 2022 - 21:15:07 EST

Next message: Damien Le Moal: "Re: [PATCH v2 2/2] ata: ahci: Protect users from setting policies their drives don't support"
Previous message: Yuntao Wang: "[PATCH bpf-next v3] libbpf: Don't return -EINVAL if hdr_len < offsetofend(core_relo_len)"
In reply to: Roman Gushchin: "Re: [RFC] mm/vmscan: add periodic slab shrinker"
Next in thread: Roman Gushchin: "Re: [RFC] mm/vmscan: add periodic slab shrinker"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On Sun, Apr 03, 2022 at 08:56:18AM +0800, Hillf Danton wrote:
> On Sat, 2 Apr 2022 10:54:36 -0700 Roman Gushchin wrote:
> > Hello Hillf!
> >
> Hello Roman,
>
> > Thank you for sharing it, really interesting! I=E2=80=99m actually working o=
> > n the same problem.=20
>
> Good to know you have some interest in it.
> Feel free to let me know you would like to take it over to avoid
> repeated works on both sides.
>
> >
> > No code to share yet, but here are some of my thoughts:
> > 1) If there is a =E2=80=9Cnatural=E2=80=9D memory pressure, no additional sl=
> > ab scanning is needed.
>
> Agree - the periodic shrinker can be canceled once kswapd wakes up.

I think we should be waking up per-node kswapd to do the periodic
shrinking, not adding yet another way of executing (thousands of)
shrinkers (across hundreds of nodes) from a single threaded context.

Indeed, the problem of "no reclaim when there is no memory
pressure" also affects the page cache, not just shrinkable caches
and we really don't want periodic reclaim to have a compeltely
different behaviour to general memory reclaim.

i.e. the amount of work that shrinkers need to do in a periodic scan
is largerly determined by the rate of shrinkable cache memory usage
growth rather than memory reclaim priority as it is now. Hence there
needs to be different high level "shrinker needs to do X amount of
work" calculation for periodic reclaim than there is now.

e.g. we calculate a rolling average of the size of the cache and a
rate of change over a series of polling operations (i.e. calling
->scan_count) and then when sustained growth is detected we start
trying to shrink the cache to limit the rate of growth of the cache.

If the cache keeps growing, then it's objects are being repeatedly
referenced and it *should* keep growing. If it's one-off objects
that are causing the growth of the cache and so objects are being
reclaimed by the shrinker, then matching the periodic shrink scan to
the growth rate will substantially reduce the rate of growth of that
cache.

And if it's integrated into the existing do_shrink_slab
calculations, the moment actual memory reclaim calls the shrinker
the periodic scan calculations can be reset back to zero and it
starts again...

> > 2) =46rom a power perspective it=E2=80=99s better to scan more at once, but l=
> > ess often.
>
> The shrinker proposed is a catapult on the vmscan side without knowing
> where the cold slab objects are piling up in Dave's backyard but he is
> free to take different actions than the regular shrinker - IOW this
> shrinker alone does not make much sense wrt shooting six birds without
> the stone on the slab owner side.
>
> It is currently scanning *every* slab cache at an arbitrary frequency,
> once 30 seconds - I am open to a minute or whatever.

Sorry, I don't understand what "Dave's backyard" is or why it would
ever need to be special cased?

> > 3) Maybe we need a feedback loop with the slab allocator: e.g. if slabs are a=
> > lmost full there is more sense to do a proactive scanning and free up some m=
> > emory, otherwise we=E2=80=99ll end up allocating more slabs. But it=E2=80=99=
> > s tricky.
>
> There are 31 bits available in the periodic flag added to shrink control.
>
> > 4) If the scanning is not resulting in any memory reclaim, maybe we should (=
> > temporarily) exclude the corresponding shrinker from the scanning.
>
> Given the periodic flag, Dave is free to ignore the scan request and the
> scan result is currently dropped on the vmscan side because what is
> considered is the cold slab objects that for instance have been inactive
> for more than 30 seconds in every slab cache, rather than kswapd's cake.

I don't understand how passing a "periodic" flag to individual
shrinkers is really useful here. How does the shrinker
implementation use this to determine how much work it needs to do?

i.e. The amount of work a shrinker needs to perform is calculated by
the high level slab scanning code based on relative cache size and
reclaim priority. If there's a periodic scanner, it should be
calculating a specific amount of work for the shrinkers to do way up
in do_shrink_slab() and then asking the shrinker to perform that
work in exactly the way it does now - the shrinker itself doesn't
need to know anything about whether it's a periodic memory reclaim
scan or whether there's actual memory pressure - it just needs to
scan the oldest objects in it's cache to try to reclaim them.

Cheers,

Dave.
--
Dave Chinner
david@xxxxxxxxxxxxx

Next message: Damien Le Moal: "Re: [PATCH v2 2/2] ata: ahci: Protect users from setting policies their drives don't support"
Previous message: Yuntao Wang: "[PATCH bpf-next v3] libbpf: Don't return -EINVAL if hdr_len < offsetofend(core_relo_len)"
In reply to: Roman Gushchin: "Re: [RFC] mm/vmscan: add periodic slab shrinker"
Next in thread: Roman Gushchin: "Re: [RFC] mm/vmscan: add periodic slab shrinker"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]