[PATCH] mm: add swappiness=max arg to memory.reclaim for only anon reclaim

From: Zhongkun He
Date: Tue Mar 18 2025 - 09:55:14 EST


Since commit 68cd9050d871 ("mm: add swappiness= arg to memory.reclaim"),
we can pass an additional swappiness=<val> argument to memory.reclaim.
This is very useful because we can dynamically adjust the reclamation
ratio based on the anonymous folios and file folios of each cgroup. For
example, when swappiness is set to 0, we only reclaim from file folios.

However, we have also encountered a new issue: when swappiness is set to
MAX_SWAPPINESS, reclaim may still target only file folios.

So, we would like to add a new argument 'swappiness=max' to
memory.reclaim, so that proactive memory reclaim only reclaims from
anonymous folios when swappiness is set to max. The swappiness semantics
from a user perspective remain unchanged.

For example, something like this:

echo "2M swappiness=max" > /sys/fs/cgroup/memory.reclaim

will perform reclaim on the rootcg with a swappiness setting of 'max' (a
new mode) regardless of the file folios. Users have a more comprehensive
view of the application's memory distribution because there are many
metrics available. For example, if we find that a certain cgroup has a
large number of inactive anon folios, we can reclaim only those and skip
file folios, because with zram/zswap, the IO tradeoff that
cache_trim_mode or other file-first logic is making doesn't hold:
file refaults will cause IO, whereas anon decompression will not.

With this patch, the swappiness argument of memory.reclaim has a new
mode 'max', which means reclaiming only from anonymous folios in both
traditional LRU and MGLRU.

Here are the previous discussions:
https://lore.kernel.org/all/20250314033350.1156370-1-hezhongkun.hzk@xxxxxxxxxxxxx/
https://lore.kernel.org/all/20250312094337.2296278-1-hezhongkun.hzk@xxxxxxxxxxxxx/

Suggested-by: Yosry Ahmed <yosry.ahmed@xxxxxxxxx>
Signed-off-by: Zhongkun He <hezhongkun.hzk@xxxxxxxxxxxxx>
---
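Not part of the patch: as a rough illustration of the argument handling
in the mm/memcontrol.c hunk, the standalone userspace sketch below models
how a single swappiness token might be interpreted.
parse_swappiness_token() is a hypothetical helper invented for this
note; the kernel matches against the tokens table instead. The sketch
assumes only what the diff shows: numeric values outside
MIN_SWAPPINESS..MAX_SWAPPINESS are rejected with -EINVAL, and the
literal 'max' maps to ONLY_ANON_RECLAIM_MODE.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

#define MIN_SWAPPINESS 0
#define MAX_SWAPPINESS 200
/* Sentinel from the patch: one past the largest numeric swappiness. */
#define ONLY_ANON_RECLAIM_MODE (MAX_SWAPPINESS + 1)

/*
 * Hypothetical userspace helper (not a kernel function): interpret one
 * "swappiness=<val>" token. Returns 0 and stores the value in *out on
 * success, or -EINVAL if the token is malformed or out of range.
 */
static int parse_swappiness_token(const char *tok, int *out)
{
	static const char prefix[] = "swappiness=";
	const char *val;
	char *end;
	long n;

	assert(tok && out);

	if (strncmp(tok, prefix, sizeof(prefix) - 1) != 0)
		return -EINVAL;
	val = tok + sizeof(prefix) - 1;

	/* "max" selects the anon-only mode rather than a numeric value. */
	if (strcmp(val, "max") == 0) {
		*out = ONLY_ANON_RECLAIM_MODE;
		return 0;
	}

	/* Otherwise the value must be a complete decimal integer. */
	errno = 0;
	n = strtol(val, &end, 10);
	if (errno || end == val || *end != '\0')
		return -EINVAL;

	/* Same range check as the existing numeric path in the diff. */
	if (n < MIN_SWAPPINESS || n > MAX_SWAPPINESS)
		return -EINVAL;

	*out = (int)n;
	return 0;
}
```

Because a numeric 201 fails the range check, the sentinel value is only
reachable through the literal 'max' string in this model.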
Documentation/admin-guide/cgroup-v2.rst | 4 ++++
include/linux/swap.h | 4 ++++
mm/memcontrol.c | 5 +++++
mm/vmscan.c | 10 ++++++++++
4 files changed, 23 insertions(+)

diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
index cb1b4e759b7e..c39ef4314499 100644
--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -1343,6 +1343,10 @@ The following nested keys are defined.
same semantics as vm.swappiness applied to memcg reclaim with
all the existing limitations and potential future extensions.

+ If swappiness=max is set, memory reclamation will exclusively
+ target the anonymous folio list for both traditional LRU and
+ MGLRU reclamation algorithms.
+
memory.peak
A read-write single value file which exists on non-root cgroups.

diff --git a/include/linux/swap.h b/include/linux/swap.h
index b13b72645db3..a94efac10fe5 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -419,6 +419,10 @@ extern unsigned long try_to_free_pages(struct zonelist *zonelist, int order,
#define MEMCG_RECLAIM_PROACTIVE (1 << 2)
#define MIN_SWAPPINESS 0
#define MAX_SWAPPINESS 200
+
+/* Only reclaim from anon folios in proactive memory reclaim */
+#define ONLY_ANON_RECLAIM_MODE (MAX_SWAPPINESS + 1)
+
extern unsigned long try_to_free_mem_cgroup_pages(struct mem_cgroup *memcg,
unsigned long nr_pages,
gfp_t gfp_mask,
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 4de6acb9b8ec..0d0400f141d1 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -4291,11 +4291,13 @@ static ssize_t memory_oom_group_write(struct kernfs_open_file *of,

enum {
MEMORY_RECLAIM_SWAPPINESS = 0,
+ MEMORY_RECLAIM_ONLY_ANON_MODE,
MEMORY_RECLAIM_NULL,
};

static const match_table_t tokens = {
{ MEMORY_RECLAIM_SWAPPINESS, "swappiness=%d"},
+ { MEMORY_RECLAIM_ONLY_ANON_MODE, "swappiness=max"},
{ MEMORY_RECLAIM_NULL, NULL },
};

@@ -4329,6 +4331,9 @@ static ssize_t memory_reclaim(struct kernfs_open_file *of, char *buf,
if (swappiness < MIN_SWAPPINESS || swappiness > MAX_SWAPPINESS)
return -EINVAL;
break;
+ case MEMORY_RECLAIM_ONLY_ANON_MODE:
+ swappiness = ONLY_ANON_RECLAIM_MODE;
+ break;
default:
return -EINVAL;
}
diff --git a/mm/vmscan.c b/mm/vmscan.c
index c767d71c43d7..779a9a3cf715 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2438,6 +2438,16 @@ static void get_scan_count(struct lruvec *lruvec, struct scan_control *sc,
goto out;
}

+ /*
+ * Do not bother scanning file folios if the memory reclaim
+ * was invoked by userspace through memory.reclaim with
+ * 'swappiness=max'.
+ */
+ if (sc->proactive && (swappiness == ONLY_ANON_RECLAIM_MODE)) {
+ scan_balance = SCAN_ANON;
+ goto out;
+ }
+
/*
* Do not apply any pressure balancing cleverness when the
* system is close to OOM, scan both anon and file equally
--
2.39.5
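
Not part of the patch: as a sketch of where the new check sits in
get_scan_count(), the simplified userspace model below reproduces only a
few of the early exits. struct reclaim_request and pick_scan_balance()
are hypothetical names. The ordering of the neighbouring checks is an
assumption based on the hunk context and the commit message, and other
checks (for example the near-OOM case) are deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_SWAPPINESS 200
#define ONLY_ANON_RECLAIM_MODE (MAX_SWAPPINESS + 1)

enum scan_balance { SCAN_EQUAL, SCAN_FRACT, SCAN_ANON, SCAN_FILE };

/* Hypothetical stand-in for the few scan_control fields used here. */
struct reclaim_request {
	bool proactive;        /* reclaim was triggered via memory.reclaim */
	int swappiness;        /* 0..200, or ONLY_ANON_RECLAIM_MODE */
	bool cache_trim_mode;  /* file-first heuristic is active */
};

/* Simplified model of the decision order; not the kernel function. */
static enum scan_balance pick_scan_balance(const struct reclaim_request *req)
{
	assert(req);

	/* swappiness=0: reclaim from file folios only. */
	if (req->swappiness == 0)
		return SCAN_FILE;

	/* New check from the patch: proactive reclaim with swappiness=max. */
	if (req->proactive && req->swappiness == ONLY_ANON_RECLAIM_MODE)
		return SCAN_ANON;

	/* File-first logic can win even at MAX_SWAPPINESS. */
	if (req->cache_trim_mode)
		return SCAN_FILE;

	/* Otherwise anon and file are balanced according to swappiness. */
	return SCAN_FRACT;
}
```

In this model a request with swappiness set to MAX_SWAPPINESS and
cache_trim_mode active still scans only file folios, which is the issue
described in the commit message, while swappiness=max returns SCAN_ANON
before the file-first logic is consulted.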