Re: [PATCH v6 42/45] mm: shrinker: make global slab shrink lockless

From: Qi Zheng
Date: Wed Dec 06 2023 - 03:37:55 EST


Hi,

On 2023/12/6 16:23, Lai Jiangshan wrote:
On Wed, Dec 6, 2023 at 3:55 PM Qi Zheng <zhengqi.arch@xxxxxxxxxxxxx> wrote:

Hi,

On 2023/12/6 15:47, Lai Jiangshan wrote:
On Tue, Sep 12, 2023 at 9:57 PM Qi Zheng <zhengqi.arch@xxxxxxxxxxxxx> wrote:

- if (!down_read_trylock(&shrinker_rwsem))
- goto out;
-
- list_for_each_entry(shrinker, &shrinker_list, list) {
+ /*
+ * lockless algorithm of global shrink.
+ *
+ * In the unregistration setp, the shrinker will be freed asynchronously
+ * via RCU after its refcount reaches 0. So both rcu_read_lock() and
+ * shrinker_try_get() can be used to ensure the existence of the shrinker.
+ *
+ * So in the global shrink:
+ * step 1: use rcu_read_lock() to guarantee existence of the shrinker
+ * and the validity of the shrinker_list walk.
+ * step 2: use shrinker_try_get() to try get the refcount, if successful,
+ * then the existence of the shrinker can also be guaranteed,
+ * so we can release the RCU lock to do do_shrink_slab() that
+ * may sleep.
+ * step 3: *MUST* to reacquire the RCU lock before calling shrinker_put(),
+ * which ensures that neither this shrinker nor the next shrinker
+ * will be freed in the next traversal operation.

Hello, Qi, Andrew, Paul,

I wonder know how RCU can ensure the lifespan of the next shrinker.
it seems it is diverged from the common pattern usage of RCU+reference.

cpu1:
rcu_read_lock();
shrinker_try_get(this_shrinker);
rcu_read_unlock();
cpu2: shrinker_free(this_shrinker);
cpu2: shrinker_free(next_shrinker); and free the memory of next_shrinker
cpu2: when shrinker_free(next_shrinker), no one updates this_shrinker's next
cpu2: since this_shrinker has been removed first.

No, this_shrinker will not be removed from the shrinker_list until the
last refcount is released. See below:

rcu_read_lock();
shrinker_put(this_shrinker);

CPU 1 CPU 2

--> if (refcount_dec_and_test(&shrinker->refcount))
complete(&shrinker->done);

wait_for_completion(&shrinker->done);
list_del_rcu(&shrinker->list);

since shrinker will not be removed from the shrinker_list until the
last refcount is released.

Is it possible that shrinker_free() can be starved by continuous
scanners getting and putting the refcount?

I actually considered this case, but the probability of this
happening was low, so I discarded the relevant code (v2 --> v3).
If this problem really occurs in a production environment, we
can fix it, like below:

diff --git a/include/linux/shrinker.h b/include/linux/shrinker.h
index 1a00be90d93a..e5ebbbf1414f 100644
--- a/include/linux/shrinker.h
+++ b/include/linux/shrinker.h
@@ -88,6 +88,7 @@ struct shrinker {
long batch; /* reclaim batch size, 0 = default */
int seeks; /* seeks to recreate an obj */
unsigned flags;
+ bool registered;

/*
* The reference count of this shrinker. Registered shrinker have an
@@ -138,7 +139,8 @@ void shrinker_free(struct shrinker *shrinker);

static inline bool shrinker_try_get(struct shrinker *shrinker)
{
- return refcount_inc_not_zero(&shrinker->refcount);
+ return READ_ONCE(shrinker->registered) &&
+ refcount_inc_not_zero(&shrinker->refcount);
}

static inline void shrinker_put(struct shrinker *shrinker)
diff --git a/mm/shrinker.c b/mm/shrinker.c
index dd91eab43ed3..9b8881d178c6 100644
--- a/mm/shrinker.c
+++ b/mm/shrinker.c
@@ -753,6 +753,7 @@ void shrinker_register(struct shrinker *shrinker)
* shrinker_try_get().
*/
refcount_set(&shrinker->refcount, 1);
+ WRITE_ONCE(shrinker->registered, true);
}
EXPORT_SYMBOL_GPL(shrinker_register);

@@ -773,6 +774,7 @@ void shrinker_free(struct shrinker *shrinker)
return;

if (shrinker->flags & SHRINKER_REGISTERED) {
+ WRITE_ONCE(shrinker->registered, false);
/* drop the initial refcount */
shrinker_put(shrinker);
/*

Thanks,
Qi


Thanks
Lai



travel to the freed next_shrinker.

a quick simple fix:

// called with other references other than RCU (i.e. refcount)
static inline rcu_list_deleted(struct list_head *entry)
{
// something like this:
return entry->prev == LIST_POISON2;
}

// in the loop
if (rcu_list_deleted(&shrinker->list)) {
shrinker_put(shrinker);
goto restart;
}
rcu_read_lock();
shrinker_put(shrinker);

Thanks
Lai

+ * step 4: do shrinker_put() paired with step 2 to put the refcount,
+ * if the refcount reaches 0, then wake up the waiter in
+ * shrinker_free() by calling complete().
+ */
+ rcu_read_lock();
+ list_for_each_entry_rcu(shrinker, &shrinker_list, list) {
struct shrink_control sc = {
.gfp_mask = gfp_mask,
.nid = nid,
.memcg = memcg,
};

+ if (!shrinker_try_get(shrinker))
+ continue;
+
+ rcu_read_unlock();
+
ret = do_shrink_slab(&sc, shrinker, priority);
if (ret == SHRINK_EMPTY)
ret = 0;
freed += ret;
- /*
- * Bail out if someone want to register a new shrinker to
- * prevent the registration from being stalled for long periods
- * by parallel ongoing shrinking.
- */
- if (rwsem_is_contended(&shrinker_rwsem)) {
- freed = freed ? : 1;
- break;
- }
+
+ rcu_read_lock();
+ shrinker_put(shrinker);
}