RE: [PATCH RFC] mm/mglru: lazily activate folios while folios are really mapped

From: wangzicheng

Date: Sat Feb 28 2026 - 05:28:57 EST


Hi Barry,
>
> I find your concern a bit surprising. If I understand correctly,
> you’re observing that file folios are currently being over-reclaimed.
> In that case, placing hot pages at the tail might make them harder
> to reclaim after PTE scanning (since they may still be young), but
> this seems to violate the fundamental principle of LRU. Moreover,
> when scanning encounters young file folios, reclaim will simply
> continue scanning more folios to find reclaimable ones, so scanning
> hot folios only wastes CPU time.
> Since read-ahead cold folios are placed at the head, relatively hotter
> folios may be reclaimed instead, causing refaults and further triggering
> reclaim, which can worsen the situation.
>
Thank you for the detailed explanation.
> >
> > We'll test this when available and report back. We hope to have a
> > chance to discuss this topic at LSF/MM/BPF.
> >
>
> Sure, thanks!
>
> Barry

For evaluation I’m using a workload that repeatedly cold-starts 20+
apps on Android and drives the same user actions in each of them.
I’m comparing the baseline (v6.6) against the patched kernel and
watching `workingset_refault_file` in `/proc/vmstat`, expecting it
to go down.

I did 3 runs per kernel, but `workingset_refault_file` is quite
noisy: the coefficient of variation is around 40%, so the result
doesn’t look statistically solid.

Do you have any suggestions on how to measure the benefit more
robustly? For example:
- different or longer-running workloads,
- better normalization for refaults (per time, per faults, etc.),
- or other vmstat metrics that you found more stable in practice?
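For the normalization option, what I have in mind is something like
the sketch below: snapshot `/proc/vmstat` before and after a run and
report refaults as a fraction of total faults over the same window.
The counter values here are invented placeholders.

```python
# Hypothetical before/after snapshots of two /proc/vmstat counters.
before = {"workingset_refault_file": 1_000_000, "pgfault": 10_000_000}
after  = {"workingset_refault_file": 1_250_000, "pgfault": 12_500_000}

d_refault = after["workingset_refault_file"] - before["workingset_refault_file"]
d_fault = after["pgfault"] - before["pgfault"]

# Refaults normalized by faults in the same window, so runs of
# different lengths or intensities become comparable.
print(f"refault fraction: {d_refault / d_fault:.1%}")
```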

I’m also considering increasing the number of runs and applying a
t-test, or comparing the CDFs of the baseline and patched kernels.
If you have a preferred methodology, I’d like to align with that.
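To make the t-test idea concrete, this is roughly what I’d compute:
Welch’s t statistic (unequal variances, since the two kernels need
not have the same noise profile), stdlib only. The run counts and
numbers are hypothetical.

```python
import math
import statistics

def welch_t(a, b):
    """Welch's t statistic for two independent samples."""
    ma, mb = statistics.mean(a), statistics.mean(b)
    va, vb = statistics.variance(a), statistics.variance(b)  # sample variance
    se = math.sqrt(va / len(a) + vb / len(b))
    return (ma - mb) / se

# Hypothetical per-run workingset_refault_file deltas.
baseline = [152_000, 98_000, 221_000, 180_000, 135_000]
patched  = [110_000, 90_000, 160_000, 120_000, 105_000]

print(f"t = {welch_t(baseline, patched):.2f}")
```

With noise at a ~40% CV, a large t (checked against the
t-distribution with Welch–Satterthwaite degrees of freedom) would
likely need considerably more than 3 runs per kernel.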

Thanks,
Zicheng