Re: [PATCH v2] mm/mglru: use folio_mark_accessed to replace folio_set_active
From: Kairui Song
Date: Mon May 25 2026 - 13:58:56 EST
On Mon, May 25, 2026 at 8:32 PM Barry Song (Xiaomi) <baohua@xxxxxxxxxx> wrote:
>
> MGLRU gives high priority to folios mapped in page tables. As a result,
> folio_set_active() is invoked for all folios read during page faults. In
> practice, however, readahead can bring in many folios that are never
> accessed via page tables.
>
> A previous attempt by Lei Liu proposed introducing a separate LRU for
> readahead[1] to make readahead pages easier to reclaim, but that approach
> is likely over-engineered.
>
> Before commit 4d5d14a01e2c ("mm/mglru: rework workingset protection"),
> folios with PG_active were always placed in the youngest generation,
> leading to over-protection and increased refaults. After that commit,
> PG_active folios are placed in the second youngest generation, which is
> still too optimistic given the presence of readahead. In contrast, the
> classic active/inactive scheme is more conservative.
>
> This patch switches to using folio_mark_accessed(). If
> folio_check_references() later detects referenced PTEs, the folio
> will be promoted based on the reference flag set by
> folio_mark_accessed(). We should also adjust
> WORKINGSET_ACTIVATE and lru_gen_folio_seq(); for example, we should
> not unconditionally protect anon folios accordingly.
>
> The following uses a simple model to demonstrate why the current code is
> not ideal. It runs fio-3.42 in a memcg, reading a file in a strided
> pattern—4KB every 64KB—to simulate prefaulted pages that may not be
> accessed.
Thanks so much for the update! This looks much better than V1 RFC.
> Without the patch, we observed 12883855 file refaults and a very low
> bandwidth of 58.5 MiB/s, because prefaulted but unused pages occupy hot
> positions, continuously pushing out the real working set and causing
> incorrect reclaim. With the patch, we observed 0 refaults and bandwidth
> increased to 5078 MiB/s.
>
> Running the same test on x86:
> w/o patch:
> File Refault Delta is 3240029
> READ: bw=13.2MiB/s io=1183MiB
>
> w/ patch:
> File Refault Delta is 0
> READ: bw=7708MiB/s io=676GiB
Really nice result here.
> diff --git a/include/linux/mm_inline.h b/include/linux/mm_inline.h
> index a171070e15f0..c637e679a450 100644
> --- a/include/linux/mm_inline.h
> +++ b/include/linux/mm_inline.h
> @@ -242,12 +242,11 @@ static inline unsigned long lru_gen_folio_seq(const struct lruvec *lruvec,
> gen = MIN_NR_GENS - folio_test_workingset(folio);
> else if (reclaiming)
> gen = MAX_NR_GENS;
> - else if ((!folio_is_file_lru(folio) && !folio_test_swapcache(folio)) ||
> - (folio_test_reclaim(folio) &&
> - (folio_test_dirty(folio) || folio_test_writeback(folio))))
> + else if (folio_test_reclaim(folio) &&
> + (folio_test_dirty(folio) || folio_test_writeback(folio)))
This change is more than just "use folio_mark_accessed to replace
folio_set_active", you are changing the pattern for all anon folios
here? But I think we are mostly working with file folios here
especially in the first test you posted?
I do like your new if condition though, but maybe it's not the right
time, or better add some words about it, how anon are handled now.
> gen = MIN_NR_GENS;
> else
> - gen = MAX_NR_GENS - folio_test_workingset(folio);
> + gen = MAX_NR_GENS - folio_test_workingset(folio) || folio_test_referenced(folio);
Nice, this aligns exactly with what I want here :), see lru_gen_folio_seq:
https://lore.kernel.org/linux-mm/20260502-mglru-fg-v1-6-913619b014d9@xxxxxxxxxxx/
But I think you got the precedence wrong? Did you mean:
(MAX_NR_GENS - folio_test_workingset(folio)) || folio_test_referenced(folio)
Or?
MAX_NR_GENS - (folio_test_workingset(folio) || folio_test_referenced(folio))
>
> return max(READ_ONCE(lrugen->max_seq) - gen + 1, READ_ONCE(lrugen->min_seq[type]));
> }
> diff --git a/mm/swap.c b/mm/swap.c
> index 5cc44f0de987..f320f93d60df 100644
> --- a/mm/swap.c
> +++ b/mm/swap.c
> @@ -511,8 +511,12 @@ void folio_add_lru(struct folio *folio)
>
> /* see the comment in lru_gen_folio_seq() */
> if (lru_gen_enabled() && !folio_test_unevictable(folio) &&
> - lru_gen_in_fault() && !(current->flags & PF_MEMALLOC))
> - folio_set_active(folio);
> + lru_gen_in_fault() && !(current->flags & PF_MEMALLOC)) {
> + if (folio_test_workingset(folio))
> + folio_set_active(folio);
What if we also move this set_active to mm/workingset.c? I think that
the only place where a folio with PG_workingset can get here?
> + else if (!folio_test_referenced(folio))
> + folio_mark_accessed(folio);
> + }
Same here? Also this folio_mark_accessed only means
folio_set_referenced here, which matches the folio_test_referenced
check you just added in lru_gen_folio_seq right?
>
> folio_batch_add_and_move(folio, lru_add);
> }
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index e452cb043d46..48355f10542b 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -848,7 +848,8 @@ static bool lru_gen_set_refs(struct folio *folio)
> return false;
> }
>
> - set_mask_bits(&folio->flags.f, LRU_REFS_FLAGS, BIT(PG_workingset));
> + if (folio_test_active(folio))
> + set_mask_bits(&folio->flags.f, LRU_REFS_FLAGS, BIT(PG_workingset));
This looks a bit strange, PG_active is always unset for on LRU folios
for MGLRU right?
There is even a comment for that:
"PG_active is always cleared while a folio is on one of lrugen->folios[]"
And lru_gen_set_refs is mostly for on-LRU folios or isolated ones,
either would have this flag. So PG_workingset seems broken by this
point, rmap walk no longer get PG_workingset set, which removes the
"promote on 2nd access" mechanism entirely?
> return true;
> }
> #else
> diff --git a/mm/workingset.c b/mm/workingset.c
> index 07e6836d0502..2f0c08aa8623 100644
> --- a/mm/workingset.c
> +++ b/mm/workingset.c
> @@ -319,11 +319,11 @@ static void lru_gen_refault(struct folio *folio, void *shadow)
>
> atomic_long_add(delta, &lrugen->refaulted[hist][type][tier]);
>
> - /* see folio_add_lru() where folio_set_active() will be called */
> - if (lru_gen_in_fault())
> - mod_lruvec_state(lruvec, WORKINGSET_ACTIVATE_BASE + type, delta);
> -
> if (workingset) {
> + if (lru_gen_in_fault()) {
> + folio_set_active(folio);
> + mod_lruvec_state(lruvec, WORKINGSET_ACTIVATE_BASE + type, delta);
> + }
Overall I agree with this directly that file folios shouldn't get over
protected. It will be even better if we actually activate the folio
using things like refault distance... Which can be done later.
BTW, this would break reactivation if we age periodically. That is
less of a problem compared to overprotection, I think.
And do you think we can keep a folio_set_active if folio_test_anon ==
true in swap.c? Proactive activation relies on the assumption that
applications never expect an jitter for ordinary memory access, which
holds true for anon folios, but file folios don't follow that idea.
And I'm not trying to seize the topic but my previous experiment using
refault distance-based reactivation resulted in a 500% performance
gain for some databases with no other regressions observed:
https://lwn.net/Articles/945266/
So for this patch I think it's on the right track, we can definitely
merge some follow up of this patch first, and then refine the pattern
step by step toward refault distance, MGLRU-FG or both plus others.
That will take a long time, so let's fix the low-hanging fruit first.
Really looking forward to a few more updates from this patch :) !