Re: [PATCH 7/8] mm/mglru: simplify and improve dirty writeback handling

From: Kairui Song

Date: Tue Mar 24 2026 - 07:13:40 EST


On Tue, Mar 24, 2026 at 5:10 PM Chen Ridong <chenridong@xxxxxxxxxxxxxxx> wrote:
> On 2026/3/18 3:09, Kairui Song via B4 Relay wrote:
> > From: Kairui Song <kasong@xxxxxxxxxxx>
> >
> > The current handling of dirty writeback folios is not working well for
> > file page heavy workloads: Dirty folios are protected and move to next
> > gen upon isolation of getting throttled or reactivated upon pageout
> > (shrink_folio_list).
> >
> > This might help to reduce the LRU lock contention slightly, but as a
> > result, the ping-pong effect of folios between head and tail of last two
> > gens is serious as the shrinker will run into protected dirty writeback
> > folios more frequently compared to activation. The dirty flush wakeup
> > condition is also much more passive compared to active/inactive LRU.
> > Active / inactive LRU wakes the flusher if one batch of folios passed to
> > shrink_folio_list is unevictable due to under writeback, but MGLRU
> > instead has to check this after the whole reclaim loop is done, and then
> > count the isolation protection number compared to the total reclaim
> > number.
> >
> > And we previously saw OOM problems with it, too, which were fixed but
> > still not perfect [1].
> >
> > So instead, just drop the special handling for dirty writeback, just
> > re-activate it like active / inactive LRU. And also move the dirty flush
> > wake up check right after shrink_folio_list. This should improve both
> > throttling and performance.
> >
> > Test with YCSB workloadb showed a major performance improvement:
> >
> > Before this series:
> > Throughput(ops/sec): 61642.78008938203
> > AverageLatency(us): 507.11127774145166
> > pgpgin 158190589
> > pgpgout 5880616
> > workingset_refault 7262988
> >
> > After this commit:
> > Throughput(ops/sec): 80216.04855744806 (+30.1%, higher is better)
> > AverageLatency(us): 388.17633477268913 (-23.5%, lower is better)
> > pgpgin 101871227 (-35.6%, lower is better)
> > pgpgout 5770028
> > workingset_refault 3418186 (-52.9%, lower is better)
> >
> > The refault rate is 50% lower, and throughput is 30% higher, which is a
> > huge gain. We also observed significant performance gain for other
> > real-world workloads.
> >
> > We were concerned that the dirty flush could cause more wear for SSD:
> > that should not be the problem here, since the wakeup condition is when
> > the dirty folios have been pushed to the tail of LRU, which indicates
> > that memory pressure is so high that writeback is blocking the workload
> > already.
> >
> > Signed-off-by: Kairui Song <kasong@xxxxxxxxxxx>
> > Link: https://lore.kernel.org/linux-mm/20241026115714.1437435-1-jingxiangzeng.cas@xxxxxxxxx/ [1]
> > ---
> > mm/vmscan.c | 44 +++++++++++++-------------------------------
> > 1 file changed, 13 insertions(+), 31 deletions(-)
> >

...

>
> I may be missing something, but I think this change moves dirty/writeback
> folios into `shrink_folio_list()` without moving the corresponding reclaim
> feedback as well.
>
> Before this patch, MGLRU mostly filtered dirty/writeback folios in
> `sort_folio()`. After this patch they can be isolated and processed by
> `shrink_folio_list()`, but the new code seems to only keep the flusher wakeup
> and no longer feeds the resulting state back into `sc->nr.*` (`dirty`,
> `congested`, `writeback`, `immediate`, `taken`).
>
> Those counters are consumed later by reclaim/throttling logic, so shouldn't
> MGLRU update them here too, similar to the classic inactive-LRU path?
>

Yeah, how about we make better use of them in a separate patch? So far
MGLRU has pretty much ignored these counters and never populated some
of the sc->nr.* fields. So this isn't something introduced by this
patch; if it is a valid issue, it is an existing one.

This patch only touches sc->nr.unqueued_dirty and sc->nr.file_taken;
combined with the tweaks to dirty handling, the result is pretty good.
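
To make the per-batch check concrete, here is a minimal user-space
sketch, not the actual diff. The struct and function names
(batch_stat, nr_model, account_batch) are hypothetical stand-ins, and
the wakeup rule (the whole batch came back as unqueued dirty) is an
assumption modeled on the active / inactive LRU behaviour described in
the commit message:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-batch result of shrink_folio_list(). */
struct batch_stat {
	unsigned long nr_unqueued_dirty; /* dirty folios not yet queued for writeback */
};

/* Hypothetical subset of sc->nr.*: only the two counters this patch touches. */
struct nr_model {
	unsigned long unqueued_dirty;
	unsigned long file_taken;
};

/*
 * Account one batch of file folios right after it has been processed.
 * Returns true when the flusher should be woken, assumed here to mean
 * "every folio taken in this batch was unqueued dirty".
 */
static bool account_batch(struct nr_model *nr, const struct batch_stat *stat,
			  unsigned long nr_taken)
{
	/* Model invariant: a batch cannot report more dirty folios than it took. */
	assert(stat->nr_unqueued_dirty <= nr_taken);

	nr->unqueued_dirty += stat->nr_unqueued_dirty;
	nr->file_taken += nr_taken;

	/* The nr_taken check (an addition in this sketch) skips empty batches. */
	return nr_taken && stat->nr_unqueued_dirty == nr_taken;
}
```

The point of the sketch is only that the decision is made per batch,
instead of once after the whole reclaim loop has finished.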

How about a separate patch after cleaning up the counters? The next
patch removes the unused ones, and a follow-up patch for things like
throttling can then be tested and reviewed on its own.
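
For reference, a rough user-space sketch of what such a follow-up
could feed back, covering the counters named in the review above
(dirty, congested, writeback, immediate, taken). This is an
illustration under assumptions, not a proposed diff: stat_model,
nr_full_model and feed_back are hypothetical names, and the simple
"add every per-batch value" rule is assumed to mirror the classic
inactive-LRU path:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-batch statistics from shrink_folio_list(). */
struct stat_model {
	unsigned long nr_dirty;
	unsigned long nr_unqueued_dirty;
	unsigned long nr_congested;
	unsigned long nr_writeback;
	unsigned long nr_immediate;
};

/* Hypothetical stand-in for the sc->nr.* counters consumed by throttling. */
struct nr_full_model {
	unsigned long dirty;
	unsigned long unqueued_dirty;
	unsigned long congested;
	unsigned long writeback;
	unsigned long immediate;
	unsigned long taken;
	unsigned long file_taken;
};

/* Accumulate one batch into the reclaim-wide counters. */
static void feed_back(struct nr_full_model *nr, const struct stat_model *stat,
		      unsigned long nr_taken, bool file)
{
	assert(nr && stat);

	nr->dirty += stat->nr_dirty;
	nr->unqueued_dirty += stat->nr_unqueued_dirty;
	nr->congested += stat->nr_congested;
	nr->writeback += stat->nr_writeback;
	nr->immediate += stat->nr_immediate;
	nr->taken += nr_taken;
	if (file)
		nr->file_taken += nr_taken; /* only file batches count here */
}
```

Whether all of these are worth populating for MGLRU, and how the
throttling code should consume them, is exactly what the separate
patch would need to test.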