Re: [PATCH v4 45/48] mm: shrinker: make global slab shrink lockless

From: Dave Chinner
Date: Mon Aug 07 2023 - 19:28:56 EST


On Mon, Aug 07, 2023 at 07:09:33PM +0800, Qi Zheng wrote:
> The shrinker_rwsem is a global read-write lock in shrinkers subsystem,
> which protects most operations such as slab shrink, registration and
> unregistration of shrinkers, etc. This can easily cause problems in the
> following cases.
....
> This commit uses the refcount+RCU method [5] proposed by Dave Chinner
> to re-implement the lockless global slab shrink. The memcg slab shrink is
> handled in the subsequent patch.
....
> ---
> include/linux/shrinker.h | 17 ++++++++++
> mm/shrinker.c | 70 +++++++++++++++++++++++++++++-----------
> 2 files changed, 68 insertions(+), 19 deletions(-)

There's no documentation in the code explaining how the lockless
shrinker algorithm works. It's left to the reader to work out how
this all goes together....

> diff --git a/include/linux/shrinker.h b/include/linux/shrinker.h
> index eb342994675a..f06225f18531 100644
> --- a/include/linux/shrinker.h
> +++ b/include/linux/shrinker.h
> @@ -4,6 +4,8 @@
>
> #include <linux/atomic.h>
> #include <linux/types.h>
> +#include <linux/refcount.h>
> +#include <linux/completion.h>
>
> #define SHRINKER_UNIT_BITS BITS_PER_LONG
>
> @@ -87,6 +89,10 @@ struct shrinker {
> int seeks; /* seeks to recreate an obj */
> unsigned flags;
>
> + refcount_t refcount;
> + struct completion done;
> + struct rcu_head rcu;

What does the refcount protect, why do we need the completion, etc?

> +
> void *private_data;
>
> /* These are for internal use */
> @@ -120,6 +126,17 @@ struct shrinker *shrinker_alloc(unsigned int flags, const char *fmt, ...);
> void shrinker_register(struct shrinker *shrinker);
> void shrinker_free(struct shrinker *shrinker);
>
> +static inline bool shrinker_try_get(struct shrinker *shrinker)
> +{
> + return refcount_inc_not_zero(&shrinker->refcount);
> +}
> +
> +static inline void shrinker_put(struct shrinker *shrinker)
> +{
> + if (refcount_dec_and_test(&shrinker->refcount))
> + complete(&shrinker->done);
> +}
> +
> #ifdef CONFIG_SHRINKER_DEBUG
> extern int __printf(2, 3) shrinker_debugfs_rename(struct shrinker *shrinker,
> const char *fmt, ...);
> diff --git a/mm/shrinker.c b/mm/shrinker.c
> index 1911c06b8af5..d318f5621862 100644
> --- a/mm/shrinker.c
> +++ b/mm/shrinker.c
> @@ -2,6 +2,7 @@
> #include <linux/memcontrol.h>
> #include <linux/rwsem.h>
> #include <linux/shrinker.h>
> +#include <linux/rculist.h>
> #include <trace/events/vmscan.h>
>
> #include "internal.h"
> @@ -577,33 +578,42 @@ unsigned long shrink_slab(gfp_t gfp_mask, int nid, struct mem_cgroup *memcg,
> if (!mem_cgroup_disabled() && !mem_cgroup_is_root(memcg))
> return shrink_slab_memcg(gfp_mask, nid, memcg, priority);
>
> - if (!down_read_trylock(&shrinker_rwsem))
> - goto out;
> -
> - list_for_each_entry(shrinker, &shrinker_list, list) {
> + rcu_read_lock();
> + list_for_each_entry_rcu(shrinker, &shrinker_list, list) {
> struct shrink_control sc = {
> .gfp_mask = gfp_mask,
> .nid = nid,
> .memcg = memcg,
> };
>
> + if (!shrinker_try_get(shrinker))
> + continue;
> +
> + /*
> + * We can safely unlock the RCU lock here since we already
> + * hold the refcount of the shrinker.
> + */
> + rcu_read_unlock();
> +
> ret = do_shrink_slab(&sc, shrinker, priority);
> if (ret == SHRINK_EMPTY)
> ret = 0;
> freed += ret;
> +
> /*
> - * Bail out if someone want to register a new shrinker to
> - * prevent the registration from being stalled for long periods
> - * by parallel ongoing shrinking.
> + * This shrinker may be deleted from shrinker_list and freed
> + * after the shrinker_put() below, but this shrinker is still
> + * used for the next traversal. So it is necessary to hold the
> + * RCU lock first to prevent this shrinker from being freed,
> + * which also ensures that the next shrinker that is traversed
> + * will not be freed (even if it is deleted from shrinker_list
> + * at the same time).
> */

This comment really should be at the head of the function,
describing the algorithm used within the function itself. i.e. how
reference counts are used w.r.t. the rcu_read_lock() usage to
guarantee existence of the shrinker and the validity of the list
walk.

I'm not going to remember all these little details when I look at
this code in another 6 months time, and having to work it out from
first principles every time I look at the code will waste of a lot
of time...

-Dave.
--
Dave Chinner
david@xxxxxxxxxxxxx