[PATCH v3] mm, slab: Extend slab/shrink to shrink all memcg caches

From: Waiman Long
Date: Tue Jul 23 2019 - 11:15:04 EST


Currently, a value of "1" is written to the /sys/kernel/slab/<slab>/shrink
file to shrink the slab by flushing out all the per-cpu slabs and free
slabs in partial lists. This can be useful to squeeze out a bit more
memory under extreme conditions as well as to make the active object
counts in /proc/slabinfo more accurate.
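
For example, to shrink a single root cache from the shell:

# echo 1 > /sys/kernel/slab/task_struct/shrink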

This usually applies only to the root caches, as the SLUB_MEMCG_SYSFS_ON
option is typically not enabled and the "slub_memcg_sysfs=1" boot
parameter is not set. Even if memcg sysfs is turned on, it is too
cumbersome and impractical to manage all those per-memcg sysfs files in
a real production system.
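
With memcg sysfs enabled, each memcg cache shows up as its own
directory under the root cache's "cgroup" subdirectory and would have
to be shrunk individually, e.g. (the memcg cache id and cgroup name
below are illustrative):

# echo 1 > "/sys/kernel/slab/task_struct/cgroup/task_struct(42:foo.service)/shrink"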

So there is no practical way to shrink memcg caches. Fix this by making
a write of "1" to the shrink sysfs file of the root cache also scan all
the available memcg caches and shrink them as well. For a non-root
memcg cache (when SLUB_MEMCG_SYSFS_ON or slub_memcg_sysfs is on), only
that cache will be shrunk when written to.

On a 2-socket 64-core 256-thread arm64 system with 64k pages, after
a parallel kernel build, the amount of memory occupied by slabs
before shrinking was:

# grep task_struct /proc/slabinfo
task_struct 53137 53192 4288 61 4 : tunables 0 0 0 : slabdata 872 872 0
# grep "^S[lRU]" /proc/meminfo
Slab: 3936832 kB
SReclaimable: 399104 kB
SUnreclaim: 3537728 kB

After shrinking slabs (by echoing "1" to all shrink files):

# grep "^S[lRU]" /proc/meminfo
Slab: 1356288 kB
SReclaimable: 263296 kB
SUnreclaim: 1092992 kB
# grep task_struct /proc/slabinfo
task_struct 2764 6832 4288 61 4 : tunables 0 0 0 : slabdata 112 112 0
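
The shrink writes above can be scripted, e.g. with a simple shell loop
over all the root caches:

# for f in /sys/kernel/slab/*/shrink; do echo 1 > "$f"; done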

[v3: Drop patch 2 & update doc]

Signed-off-by: Waiman Long <longman@xxxxxxxxxx>
Acked-by: Roman Gushchin <guro@xxxxxx>
Acked-by: Christoph Lameter <cl@xxxxxxxxx>
---
 Documentation/ABI/testing/sysfs-kernel-slab | 13 +++++---
 mm/slab.h                                   |  1 +
 mm/slab_common.c                            | 37 +++++++++++++++++++++
 mm/slub.c                                   |  2 +-
 4 files changed, 48 insertions(+), 5 deletions(-)

diff --git a/Documentation/ABI/testing/sysfs-kernel-slab b/Documentation/ABI/testing/sysfs-kernel-slab
index 29601d93a1c2..ed35833ad7f0 100644
--- a/Documentation/ABI/testing/sysfs-kernel-slab
+++ b/Documentation/ABI/testing/sysfs-kernel-slab
@@ -429,10 +429,15 @@ KernelVersion: 2.6.22
Contact:	Pekka Enberg <penberg@xxxxxxxxxxxxxx>,
		Christoph Lameter <cl@xxxxxxxxxxxxxxxxxxxx>
Description:
-		The shrink file is written when memory should be reclaimed from
-		a cache. Empty partial slabs are freed and the partial list is
-		sorted so the slabs with the fewest available objects are used
-		first.
+		The shrink file is used to reclaim unused slab cache
+		memory from a cache. Empty per-cpu or partial slabs
+		are freed and the partial list is sorted so the slabs
+		with the fewest available objects are used first.
+		It only accepts a value of "1" on write for shrinking
+		the cache. Other input values are considered invalid.
+		Shrinking slab caches might be expensive and can
+		adversely impact other running applications. So it
+		should be used with care.

What: /sys/kernel/slab/cache/slab_size
Date: May 2007
diff --git a/mm/slab.h b/mm/slab.h
index 9057b8056b07..5bf615cb3f99 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -174,6 +174,7 @@ int __kmem_cache_shrink(struct kmem_cache *);
void __kmemcg_cache_deactivate(struct kmem_cache *s);
void __kmemcg_cache_deactivate_after_rcu(struct kmem_cache *s);
void slab_kmem_cache_release(struct kmem_cache *);
+void kmem_cache_shrink_all(struct kmem_cache *s);

struct seq_file;
struct file;
diff --git a/mm/slab_common.c b/mm/slab_common.c
index 807490fe217a..6491c3a41805 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -981,6 +981,43 @@ int kmem_cache_shrink(struct kmem_cache *cachep)
}
EXPORT_SYMBOL(kmem_cache_shrink);

+/**
+ * kmem_cache_shrink_all - shrink a cache and all memcg caches for root cache
+ * @s: The cache pointer
+ */
+void kmem_cache_shrink_all(struct kmem_cache *s)
+{
+	struct kmem_cache *c;
+
+	if (!IS_ENABLED(CONFIG_MEMCG_KMEM) || !is_root_cache(s)) {
+		kmem_cache_shrink(s);
+		return;
+	}
+
+	get_online_cpus();
+	get_online_mems();
+	kasan_cache_shrink(s);
+	__kmem_cache_shrink(s);
+
+	/*
+	 * We have to take the slab_mutex to protect from the memcg list
+	 * modification.
+	 */
+	mutex_lock(&slab_mutex);
+	for_each_memcg_cache(c, s) {
+		/*
+		 * Don't need to shrink deactivated memcg caches.
+		 */
+		if (c->flags & SLAB_DEACTIVATED)
+			continue;
+		kasan_cache_shrink(c);
+		__kmem_cache_shrink(c);
+	}
+	mutex_unlock(&slab_mutex);
+	put_online_mems();
+	put_online_cpus();
+}
+
bool slab_is_available(void)
{
return slab_state >= UP;
diff --git a/mm/slub.c b/mm/slub.c
index e6c030e47364..9736eb10dcb8 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -5294,7 +5294,7 @@ static ssize_t shrink_store(struct kmem_cache *s,
				const char *buf, size_t length)
{
	if (buf[0] == '1')
-		kmem_cache_shrink(s);
+		kmem_cache_shrink_all(s);
	else
		return -EINVAL;
	return length;
--
2.18.1