Re: [PATCH 2/5] mm: vmscan: make memcg slab shrink lockless

From: Qi Zheng
Date: Wed Feb 22 2023 - 23:36:17 EST

On 2023/2/23 03:58, Kirill Tkhai wrote:
> On 22.02.2023 10:32, Qi Zheng wrote:
>>
>>
>> On 2023/2/22 05:28, Kirill Tkhai wrote:
>>> On 20.02.2023 12:16, Qi Zheng wrote:
>>> <...>
>>>>  void reparent_shrinker_deferred(struct mem_cgroup *memcg)
>>>>  {
>>>> -    int i, nid;
>>>> +    int i, nid, srcu_idx;
>>>>      long nr;
>>>>      struct mem_cgroup *parent;
>>>>      struct shrinker_info *child_info, *parent_info;
>>>> @@ -429,16 +443,16 @@ void reparent_shrinker_deferred(struct mem_cgroup *memcg)
>>>>          parent = root_mem_cgroup;
>>>>
>>>>      /* Prevent from concurrent shrinker_info expand */
>>>> -    down_read(&shrinker_rwsem);
>>>> +    srcu_idx = srcu_read_lock(&shrinker_srcu);
>>>
>>> Don't we still have to be protected against parallel expand_one_shrinker_info()?
>>>
>>> It looks like the parent->nodeinfo[nid]->shrinker_info pointer may be changed
>>> in expand* right after we've dereferenced it here.
>>
>> Hi Kirill,
>>
>> Oh, indeed. We may wrongly reparent the child's nr_deferred to the old
>> parent's nr_deferred (which is about to be freed by call_srcu()).
>>
>> reparent_shrinker_deferred() is only called on the memcg offline path
>> (not a hot path), so we may be able to use the shrinker_mutex introduced
>> later in this series for protection. What do you think?
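
To spell the window out for the archive (a rough interleaving sketch; the
expand-side lines are paraphrased, not the verbatim expand_one_shrinker_info()
code):

    reparent_shrinker_deferred()          expand_one_shrinker_info()
    ----------------------------          --------------------------
    srcu_read_lock(&shrinker_srcu);
    parent_info = shrinker_info_srcu(parent, nid);
                                          new = kvmalloc_node(...);
                                          /* old counters copied into new */
                                          rcu_assign_pointer(...->shrinker_info, new);
                                          /* old info scheduled for freeing
                                             after an SRCU grace period */
    /* adds into the OLD copy */
    atomic_long_add(nr, &parent_info->nr_deferred[i]);
    srcu_read_unlock(&shrinker_srcu, srcu_idx);
                                          /* grace period ends, old copy is
                                             freed, the added counts are lost */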

> It looks good to me. One more thing I'd analyze is whether two
> reparent_shrinker_deferred() calls may execute in parallel.

I see that mem_cgroup_css_offline() is already protected by
cgroup_mutex, so maybe shrinker_mutex is enough here. :)
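
Concretely, something like the following is what I have in mind (just a
sketch on top of this series, assuming the shrinker_mutex introduced in a
later patch, with shrinker_info_protected() re-annotated to
lockdep_is_held(&shrinker_mutex); not the final code):

    void reparent_shrinker_deferred(struct mem_cgroup *memcg)
    {
        int i, nid;
        long nr;
        struct mem_cgroup *parent;
        struct shrinker_info *child_info, *parent_info;

        parent = parent_mem_cgroup(memcg);
        if (!parent)
            parent = root_mem_cgroup;

        /* Prevent from concurrent shrinker_info expand */
        mutex_lock(&shrinker_mutex);
        for_each_node(nid) {
            child_info = shrinker_info_protected(memcg, nid);
            parent_info = shrinker_info_protected(parent, nid);
            for (i = 0; i < shrinker_nr_max; i++) {
                nr = atomic_long_read(&child_info->nr_deferred[i]);
                atomic_long_add(nr, &parent_info->nr_deferred[i]);
            }
        }
        mutex_unlock(&shrinker_mutex);
    }

That way the offline path simply serializes against shrinker_info expansion
instead of taking part in the SRCU protocol at all.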


> Possibly, we should leave the rwsem there as it was used before.

>>>>      for_each_node(nid) {
>>>> -        child_info = shrinker_info_protected(memcg, nid);
>>>> -        parent_info = shrinker_info_protected(parent, nid);
>>>> +        child_info = shrinker_info_srcu(memcg, nid);
>>>> +        parent_info = shrinker_info_srcu(parent, nid);
>>>>          for (i = 0; i < shrinker_nr_max; i++) {
>>>>              nr = atomic_long_read(&child_info->nr_deferred[i]);
>>>>              atomic_long_add(nr, &parent_info->nr_deferred[i]);
>>>>          }
>>>>      }
>>>> -    up_read(&shrinker_rwsem);
>>>> +    srcu_read_unlock(&shrinker_srcu, srcu_idx);
>>>>  }
>>>>
>>>>  static bool cgroup_reclaim(struct scan_control *sc)
>>>> @@ -891,15 +905,14 @@ static unsigned long shrink_slab_memcg(gfp_t gfp_mask, int nid,
>>>>  {
>>>>      struct shrinker_info *info;
>>>>      unsigned long ret, freed = 0;
>>>> +    int srcu_idx;
>>>>      int i;
>>>>
>>>>      if (!mem_cgroup_online(memcg))
>>>>          return 0;
>>>>
>>>> -    if (!down_read_trylock(&shrinker_rwsem))
>>>> -        return 0;
>>>> -
>>>> -    info = shrinker_info_protected(memcg, nid);
>>>> +    srcu_idx = srcu_read_lock(&shrinker_srcu);
>>>> +    info = shrinker_info_srcu(memcg, nid);
>>>>      if (unlikely(!info))
>>>>          goto unlock;
>>>>
>>>> @@ -949,14 +962,9 @@ static unsigned long shrink_slab_memcg(gfp_t gfp_mask, int nid,
>>>>                  set_shrinker_bit(memcg, nid, i);
>>>>          }
>>>>          freed += ret;
>>>> -
>>>> -        if (rwsem_is_contended(&shrinker_rwsem)) {
>>>> -            freed = freed ? : 1;
>>>> -            break;
>>>> -        }
>>>>      }
>>>>  unlock:
>>>> -    up_read(&shrinker_rwsem);
>>>> +    srcu_read_unlock(&shrinker_srcu, srcu_idx);
>>>>      return freed;
>>>>  }
>>>>  #else /* CONFIG_MEMCG */
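
As a side note for reviewers less familiar with SRCU: the read side above is
just the canonical pattern

    int idx;

    idx = srcu_read_lock(&shrinker_srcu);
    /* dereference and use the shrinker_info */
    srcu_read_unlock(&shrinker_srcu, idx);

and writers free old objects only after an SRCU grace period via
synchronize_srcu()/call_srcu(), so a reader can never block a writer. That
is why both the down_read_trylock() early return and the
rwsem_is_contended() bail-out can go away entirely.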

--
Thanks,
Qi