Re: [PATCH 0/8] mm/mglru: improve reclaim loop and dirty folio handling

From: Kairui Song

Date: Wed Mar 25 2026 - 05:51:58 EST


On Wed, Mar 25, 2026 at 5:27 PM Eric Naim <dnaim@xxxxxxxxxxx> wrote:
>
> On 3/25/26 1:47 PM, Kairui Song wrote:
> > On Wed, Mar 25, 2026 at 1:04 PM Eric Naim <dnaim@xxxxxxxxxxx> wrote:
> >>
> >> Hi Kairui,
> >>
> >> On 3/18/26 3:08 AM, Kairui Song via B4 Relay wrote:
> > >>> This series cleans up and slightly improves MGLRU's reclaim loop and
> > >>> dirty flush logic. As a result, we can see up to a ~50% reduction in
> > >>> file faults and a 30% increase in MongoDB throughput with YCSB and no
> > >>> swap involved. Other common benchmarks show no regression, LOC is
> > >>> reduced, and we see fewer unexpected OOMs in our production environment.
> >>>
> >
> > ...
> >
> >>
> >> I applied this patch set to 7.0-rc5 and noticed the system locking up when performing the below test.
> >>
> >> fallocate -l 5G 5G
> >> while true; do tail /dev/zero; done
> >> while true; do time cat 5G > /dev/null; sleep $(($(cat /sys/kernel/mm/lru_gen/min_ttl_ms)/1000+1)); done
> >>
> >> After reading [1], I suspect that this was because the system was using zram as swap, and yes if zram is disabled then the lock up does not occur.
> >
> > Hi Eric,
> >
> > Thanks for the report, I was about to send V2 but noticing your report
> > I'll try to reproduce your issue first.
> >
> > So far I didn't notice any regression, is this an issue caused by this
> > patch or is it an existing issue? I don't have any context about how
> > you are doing the test. BTW the calculation in patch "mm/mglru:
> > restructure the reclaim loop" needs to have a lowest bar
> > "max(nr_to_scan, SWAP_CLUSTER_MAX)" for small machines, not sure if
> > related but will add to V2.
> >
>
> As of writing this, I got some new information that makes this a bit more confusing. The kernel that doesn't have the issue was patched with [1] as a means of protecting the working set (similar to lru_gen_min_ttl_ms).
>
> So this time on an unpatched kernel, the system still freezes but quickly recovers itself after about 2 seconds. With this patchset applied, the system freezes but it doesn't quickly recover (if at all).
>
> Curiously, I had the user test again but this time with lru_gen_min_ttl_ms = 100. With this set, the system doesn't freeze at all with or without this patchset.

Ah, thanks, that makes sense now. The downstream patch you mentioned
limits the reclaim of file pages to avoid thrashing, while your test
case exhausts memory on purpose, which forces the kernel to reclaim
all reclaimable folios, including page cache.

A thrashing page cache easily causes desktop hangs. Using the TTL is an
effective way to avoid thrashing and trigger OOM early. That's why the
problem is gone with lru_gen_min_ttl_ms = 100 or with le9.
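
For reference, a minimal sketch of setting the TTL at runtime. It
assumes the same sysfs file your test loop reads
(/sys/kernel/mm/lru_gen/min_ttl_ms); the helper name set_min_ttl and
the LRU_GEN_DIR override are hypothetical and only there to make dry
runs possible:

```shell
# Hypothetical helper, not part of any kernel tooling.
# LRU_GEN_DIR can be pointed at a scratch directory for a dry run.
LRU_GEN_DIR="${LRU_GEN_DIR:-/sys/kernel/mm/lru_gen}"

set_min_ttl() {
    ms="$1"
    # Accept only a non-negative integer number of milliseconds.
    case "$ms" in
        ''|*[!0-9]*)
            echo "usage: set_min_ttl <milliseconds>" >&2
            return 1
            ;;
    esac
    # Writing the real sysfs file requires root.
    echo "$ms" > "$LRU_GEN_DIR/min_ttl_ms"
}
```

Running "set_min_ttl 100" as root would match the setting that made
the freeze go away in your test.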

> > And about the test you posted:
> > while true; do tail /dev/zero; done
> >
> > I believe this will just consume all memory with zero pages and then
> > get OOM killed, that's exactly what the test is meant to do. I'm not
> > sure what you mean by lockup, since you mentioned OOM kill. Did the
> > system actually hang, or is only the desktop dead?
>
> The system actually hung. They needed a hard reset to recover the system. (pure speculation: given a few minutes the system would likely recover itself as this seems to be a common scenario)

Yeah I believe so.

Thrashing prevention is why MGLRU's TTL was introduced, so I do suggest
using that. It can be further improved too.

I'll keep that in mind, try to add some test cases that cover your
case, and make some adjustments.

BTW how does the kernel behave with MGLRU disabled for your case?
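
In case it helps with that comparison, here is a rough sketch that
runs one command with MGLRU switched off and then restores the
previous setting. It assumes the /sys/kernel/mm/lru_gen/enabled knob
described in the MGLRU admin guide (writing "n" disables all
components); the helper name with_mglru_disabled and the LRU_GEN_DIR
override are hypothetical:

```shell
# Hypothetical helper, not part of any kernel tooling.
# LRU_GEN_DIR can be pointed at a scratch directory for a dry run.
LRU_GEN_DIR="${LRU_GEN_DIR:-/sys/kernel/mm/lru_gen}"

with_mglru_disabled() {
    # Remember the current setting (e.g. 0x0007) so it can be restored.
    saved="$(cat "$LRU_GEN_DIR/enabled")" || return 1
    # "n" turns off all MGLRU components; needs root on the real file.
    echo n > "$LRU_GEN_DIR/enabled" || return 1
    rc=0
    "$@" || rc=$?
    # Restore the previous setting regardless of the command's result.
    echo "$saved" > "$LRU_GEN_DIR/enabled"
    return "$rc"
}
```

For example, "with_mglru_disabled sh -c 'time cat 5G > /dev/null'"
would time one read of your test file with MGLRU off.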