Re: [PATCH 0/8] mm/mglru: improve reclaim loop and dirty folio handling

From: Kairui Song

Date: Wed Mar 25 2026 - 01:48:39 EST


On Wed, Mar 25, 2026 at 1:04 PM Eric Naim <dnaim@xxxxxxxxxxx> wrote:
>
> Hi Kairui,
>
> On 3/18/26 3:08 AM, Kairui Song via B4 Relay wrote:
> > This series cleans up and slightly improves MGLRU's reclaim loop and
> > dirty flush logic. As a result, we can see an up to ~50% reduce of file
> > faults and 30% increase in MongoDB throughput with YCSB and no swap
> > involved, other common benchmarks have no regression, and LOC is
> > reduced, with less unexpected OOM in our production environment.
> >

...

> > Before: 2881.41s
> > After patch 3: 2894.09s
> > After patch 4: 2846.73s
> > After patch 5: 2847.91s
> > After patch 6: 2835.17s
> > After patch 7: 2842.90s
> >
> > Also seem only noise level changes, no regression or very slightly better.
> >
> > Link: https://lore.kernel.org/linux-mm/CAMgjq7BoekNjg-Ra3C8M7=8=75su38w=HD782T5E_cxyeCeH_g@xxxxxxxxxxxxxx/ [1]
> > Link: https://github.com/brianfrankcooper/YCSB/blob/master/workloads/workloadb [2]
> > Link: https://lore.kernel.org/all/20221220214923.1229538-1-yuzhao@xxxxxxxxxx/ [3]
> > Link: https://github.com/ryncsn/emm-test-project/tree/master/file-anon-mix-pressure [4]
> >
> > Signed-off-by: Kairui Song <kasong@xxxxxxxxxxx>
>
> I applied this patch set to 7.0-rc5 and noticed the system locking up when performing the below test.
>
> fallocate -l 5G 5G
> while true; do tail /dev/zero; done
> while true; do time cat 5G > /dev/null; sleep $(($(cat /sys/kernel/mm/lru_gen/min_ttl_ms)/1000+1)); done
>
> After reading [1], I suspect that this was because the system was using zram as swap, and yes if zram is disabled then the lock up does not occur.

Hi Eric,

Thanks for the report. I was about to send V2, but having seen your
report I'll try to reproduce the issue first.

So far I haven't noticed any regression. Is this issue caused by this
series, or is it an existing issue? I don't have any context about how
you are running the test. BTW, the calculation in patch "mm/mglru:
restructure the reclaim loop" needs a lower bound of
"max(nr_to_scan, SWAP_CLUSTER_MAX)" for small machines. I'm not sure
if that is related, but I will add it to V2.
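
To illustrate the lower bound mentioned above, here is a minimal
user-space sketch. It is not the code from the patch: the helper names
and the "LRU size shifted by priority" formula are assumptions made
for illustration only. The value 32 for SWAP_CLUSTER_MAX matches the
kernel's definition.

```c
/* Same value as the kernel's SWAP_CLUSTER_MAX (include/linux/swap.h). */
#define SWAP_CLUSTER_MAX 32UL

/*
 * Hypothetical helper: derive a scan target by shifting the LRU size
 * right by the reclaim priority. This formula is an assumption; the
 * real calculation in the series may differ.
 */
static unsigned long raw_nr_to_scan(unsigned long lru_pages, int priority)
{
	return lru_pages >> priority;
}

/*
 * Hypothetical helper: apply the floor, i.e.
 * max(nr_to_scan, SWAP_CLUSTER_MAX), so that a small LRU never ends
 * up with a scan target below one batch.
 */
static unsigned long floored_nr_to_scan(unsigned long lru_pages, int priority)
{
	unsigned long nr_to_scan = raw_nr_to_scan(lru_pages, priority);

	return nr_to_scan > SWAP_CLUSTER_MAX ? nr_to_scan : SWAP_CLUSTER_MAX;
}
```

Under these assumptions, an LRU of 2048 pages at priority 12 yields a
raw target of 0, which the floor raises to 32. An LRU of 1 << 20 pages
yields 256 and is left unchanged.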

And about the test you posted:
while true; do tail /dev/zero; done

I believe this will just consume all memory with zero pages and then
get OOM killed, which is exactly what the test is meant to do. I'm not
sure what you mean by lockup, since you mentioned an OOM kill. Did the
system actually hang, or did only the desktop die?

I just ran that with and without ZRAM on two machines and my laptop,
and everything looks good here with this series.

> zram as swap seems to be unsupported by upstream.

That's simply not true. Distros like Fedora even use ZRAM as swap by
default:
https://fedoraproject.org/wiki/Changes/SwapOnZRAM

And systemd has widely used ZRAM swap support:
https://github.com/systemd/zram-generator

Android also uses it, and we use ZRAM by default in our fleet, which
runs fine.

> the user that tested this wasn't able to get a
> good kernel trace, the only thing left was
> a trace of the OOM killer firing.

No worries, that's fine. Just send me the OOM trace or log; the more
detailed context I get, the better.