Re: [PATCH v4] mm/mglru: fix cgroup OOM during MGLRU state switching
From: Barry Song
Date: Wed Mar 18 2026 - 03:21:37 EST
On Wed, Mar 18, 2026 at 11:29 AM Leno Hou via B4 Relay
<devnull+lenohou.gmail.com@xxxxxxxxxx> wrote:
>
> From: Leno Hou <lenohou@xxxxxxxxx>
[...]
>
> diff --git a/include/linux/mm_inline.h b/include/linux/mm_inline.h
> index ad50688d89db..1f6b19bf365b 100644
> --- a/include/linux/mm_inline.h
> +++ b/include/linux/mm_inline.h
> @@ -102,6 +102,12 @@ static __always_inline enum lru_list folio_lru_list(const struct folio *folio)
>
> #ifdef CONFIG_LRU_GEN
>
> +static inline bool lru_gen_draining(void)
> +{
> + DECLARE_STATIC_KEY_FALSE(lru_drain_core);
> +
> + return static_branch_unlikely(&lru_drain_core);
> +}
Can we name it lru_gen_switch() or lru_switch?
“drain” suggests we are disabling MGLRU, but the
switch could just as well be enabling it. Also, can
we drop the _core suffix?
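Something along these lines, perhaps (the helper and
key names are only suggestions):

static inline bool lru_gen_switching(void)
{
	DECLARE_STATIC_KEY_FALSE(lru_gen_switch);

	return static_branch_unlikely(&lru_gen_switch);
}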
> #ifdef CONFIG_LRU_GEN_ENABLED
> static inline bool lru_gen_enabled(void)
> {
> @@ -316,6 +322,11 @@ static inline bool lru_gen_enabled(void)
> return false;
> }
>
> +static inline bool lru_gen_draining(void)
lru_gen_switching()?
> +{
> + return false;
> +}
> +
> static inline bool lru_gen_in_fault(void)
> {
> return false;
> diff --git a/mm/rmap.c b/mm/rmap.c
> index 6398d7eef393..0b5f663f3062 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -966,7 +966,7 @@ static bool folio_referenced_one(struct folio *folio,
> nr = folio_pte_batch(folio, pvmw.pte, pteval, max_nr);
> }
>
> - if (lru_gen_enabled() && pvmw.pte) {
> + if (lru_gen_enabled() && !lru_gen_draining() && pvmw.pte) {
Ack. While the LRU is switching, we don’t know where
the surrounding folios are: they could be on the
active/inactive lists or on MGLRU. So the simplest
approach is to disable this look-around optimization.
But please add a comment here explaining it.
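Maybe something like this (wording is just a sketch):

		/*
		 * While MGLRU is being enabled or disabled, the
		 * surrounding folios may be on either the
		 * active/inactive lists or the MGLRU lists, so
		 * skip the look-around optimization.
		 */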
> if (lru_gen_look_around(&pvmw, nr))
> referenced++;
> } else if (pvmw.pte) {
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 33287ba4a500..88b9db06e331 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -886,7 +886,7 @@ static enum folio_references folio_check_references(struct folio *folio,
> if (referenced_ptes == -1)
> return FOLIOREF_KEEP;
>
> - if (lru_gen_enabled()) {
> + if (lru_gen_enabled() && !lru_gen_draining()) {
I’m curious what prompted you to do this; it feels
a bit odd. I assume it effectively makes folios on
MGLRU, as well as those on the active/inactive lists,
always follow the active/inactive logic. That might
be fine, but it needs thorough documentation here.
Another approach would be:
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 33287ba4a500..91b60664b652 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -122,6 +122,9 @@ struct scan_control {
 	/* Proactive reclaim invoked by userspace */
 	unsigned int proactive:1;
 
+	/* Are we reclaiming from MGLRU */
+	unsigned int lru_gen:1;
+
 	/*
 	 * Cgroup memory below memory.low is protected as long as we
 	 * don't threaten to OOM. If any cgroup is reclaimed at
@@ -886,7 +889,7 @@ static enum folio_references folio_check_references(struct folio *folio,
 	if (referenced_ptes == -1)
 		return FOLIOREF_KEEP;
 
-	if (lru_gen_enabled()) {
+	if (sc->lru_gen) {
 		if (!referenced_ptes)
 			return FOLIOREF_RECLAIM;
This makes the logic perfectly correct (you know exactly
where your folios come from), but I’m not sure it’s worth it.
Anyway, I’d like to understand why you always need to
use the active/inactive logic even for folios from MGLRU.
To me, it seems to work only by coincidence, which isn’t good.
Thanks
Barry