[PATCH mm-unstable v1 4/4] mm/mglru: reclaim offlined memcgs harder

From: Yu Zhao
Date: Fri Dec 08 2023 - 01:14:28 EST


In the effort to reduce zombie memcgs [1], it was discovered that the
memcg LRU doesn't apply enough pressure on offlined memcgs.
Specifically, instead of rotating them to the tail of the current
generation (MEMCG_LRU_TAIL) for a second attempt, it moves them to the
next generation (MEMCG_LRU_YOUNG) after the first attempt.

Not applying enough pressure on offlined memcgs can cause them to
build up, and this can be particularly harmful to memory-constrained
systems.

On Pixel 8 Pro, launching apps for 50 cycles:
Before After Change
Zombie memcgs 45 35 -22%

[1] https://lore.kernel.org/CABdmKX2M6koq4Q0Cmp_-=wbP0Qa190HdEGGaHfxNS05gAkUtPA@xxxxxxxxxxxxxx/

Fixes: e4dde56cd208 ("mm: multi-gen LRU: per-node lru_gen_folio lists")
Signed-off-by: Yu Zhao <yuzhao@xxxxxxxxxx>
Reported-by: T.J. Mercier <tjmercier@xxxxxxxxxx>
Tested-by: T.J. Mercier <tjmercier@xxxxxxxxxx>
Cc: stable@xxxxxxxxxxxxxxx
---
include/linux/mmzone.h | 8 ++++----
mm/vmscan.c | 24 ++++++++++++++++--------
2 files changed, 20 insertions(+), 12 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index e3093ef9530f..2efd3be484fd 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -524,10 +524,10 @@ void lru_gen_look_around(struct page_vma_mapped_walk *pvmw);
* 1. Exceeding the soft limit, which triggers MEMCG_LRU_HEAD;
* 2. The first attempt to reclaim a memcg below low, which triggers
* MEMCG_LRU_TAIL;
- * 3. The first attempt to reclaim a memcg below reclaimable size threshold,
- * which triggers MEMCG_LRU_TAIL;
- * 4. The second attempt to reclaim a memcg below reclaimable size threshold,
- * which triggers MEMCG_LRU_YOUNG;
+ * 3. The first attempt to reclaim a memcg offlined or below reclaimable size
+ * threshold, which triggers MEMCG_LRU_TAIL;
+ * 4. The second attempt to reclaim a memcg offlined or below reclaimable size
+ * threshold, which triggers MEMCG_LRU_YOUNG;
* 5. Attempting to reclaim a memcg below min, which triggers MEMCG_LRU_YOUNG;
* 6. Finishing the aging on the eviction path, which triggers MEMCG_LRU_YOUNG;
* 7. Offlining a memcg, which triggers MEMCG_LRU_OLD.
diff --git a/mm/vmscan.c b/mm/vmscan.c
index cac38e9cac86..dad4b80b04cd 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -4626,7 +4626,12 @@ static bool should_run_aging(struct lruvec *lruvec, unsigned long max_seq,
}

/* try to scrape all its memory if this memcg was deleted */
- *nr_to_scan = mem_cgroup_online(memcg) ? (total >> sc->priority) : total;
+ if (!mem_cgroup_online(memcg)) {
+ *nr_to_scan = total;
+ return false;
+ }
+
+ *nr_to_scan = total >> sc->priority;

/*
* The aging tries to be lazy to reduce the overhead, while the eviction
@@ -4747,14 +4752,9 @@ static int shrink_one(struct lruvec *lruvec, struct scan_control *sc)
bool success;
unsigned long scanned = sc->nr_scanned;
unsigned long reclaimed = sc->nr_reclaimed;
- int seg = lru_gen_memcg_seg(lruvec);
struct mem_cgroup *memcg = lruvec_memcg(lruvec);
struct pglist_data *pgdat = lruvec_pgdat(lruvec);

- /* see the comment on MEMCG_NR_GENS */
- if (!lruvec_is_sizable(lruvec, sc))
- return seg != MEMCG_LRU_TAIL ? MEMCG_LRU_TAIL : MEMCG_LRU_YOUNG;
-
mem_cgroup_calculate_protection(NULL, memcg);

if (mem_cgroup_below_min(NULL, memcg))
@@ -4762,7 +4762,7 @@ static int shrink_one(struct lruvec *lruvec, struct scan_control *sc)

if (mem_cgroup_below_low(NULL, memcg)) {
/* see the comment on MEMCG_NR_GENS */
- if (seg != MEMCG_LRU_TAIL)
+ if (lru_gen_memcg_seg(lruvec) != MEMCG_LRU_TAIL)
return MEMCG_LRU_TAIL;

memcg_memory_event(memcg, MEMCG_LOW);
@@ -4778,7 +4778,15 @@ static int shrink_one(struct lruvec *lruvec, struct scan_control *sc)

flush_reclaim_state(sc);

- return success ? MEMCG_LRU_YOUNG : 0;
+ if (success && mem_cgroup_online(memcg))
+ return MEMCG_LRU_YOUNG;
+
+ if (!success && lruvec_is_sizable(lruvec, sc))
+ return 0;
+
+ /* one retry if offlined or too small */
+ return lru_gen_memcg_seg(lruvec) != MEMCG_LRU_TAIL ?
+ MEMCG_LRU_TAIL : MEMCG_LRU_YOUNG;
}

#ifdef CONFIG_MEMCG
--
2.43.0.472.g3155946c3a-goog