Re: [PATCH v2 12/12] mm/vmscan: unify writeback reclaim statistic and throttling
From: Kairui Song
Date: Wed Apr 01 2026 - 22:58:00 EST
On Wed, Apr 01, 2026 at 07:39:03PM +0800, Shakeel Butt wrote:
> On Sun, Mar 29, 2026 at 03:52:38AM +0800, Kairui Song via B4 Relay wrote:
> > From: Kairui Song <kasong@xxxxxxxxxxx>
> >
> > Currently MGLRU and non-MGLRU handle the reclaim statistic and
> > writeback handling very differently, especially throttling.
> > Basically MGLRU just ignored the throttling part.
> >
> > Let's just unify this part, use a helper to deduplicate the code
> > so both setups will share the same behavior. Also remove the
> > folio_clear_reclaim in isolate_folio which was actively invalidating
> > the congestion control. PG_reclaim is now handled by shrink_folio_list,
> > keeping it in isolate_folio is not helpful.
> >
> > Test using following reproducer using bash:
> >
> > echo "Setup a slow device using dm delay"
> > dd if=/dev/zero of=/var/tmp/backing bs=1M count=2048
> > LOOP=$(losetup --show -f /var/tmp/backing)
> > mkfs.ext4 -q $LOOP
> > echo "0 $(blockdev --getsz $LOOP) delay $LOOP 0 0 $LOOP 0 1000" | \
> > dmsetup create slow_dev
> > mkdir -p /mnt/slow && mount /dev/mapper/slow_dev /mnt/slow
> >
> > echo "Start writeback pressure"
> > sync && echo 3 > /proc/sys/vm/drop_caches
> > mkdir /sys/fs/cgroup/test_wb
> > echo 128M > /sys/fs/cgroup/test_wb/memory.max
> > (echo $BASHPID > /sys/fs/cgroup/test_wb/cgroup.procs && \
> > dd if=/dev/zero of=/mnt/slow/testfile bs=1M count=192)
> >
> > echo "Clean up"
> > echo "0 $(blockdev --getsz $LOOP) error" | dmsetup load slow_dev
> > dmsetup resume slow_dev
> > umount -l /mnt/slow && sync
> > dmsetup remove slow_dev
> >
> > Before this commit, `dd` will get OOM killed immediately if
> > MGLRU is enabled. Classic LRU is fine.
> >
> > After this commit, congestion control is now effective and no more
>
> What do you mean by congestion control here?
The particular case demonstrated here hits VMSCAN_THROTTLE_CONGESTED, so
I described it as "congestion control". Maybe I'll just say "throttling" to
avoid confusion, since it's not limited to that case.
>
> > spin on LRU or premature OOM.
> >
> > Stress test on other workloads also looking good.
> >
> > Suggested-by: Chen Ridong <chenridong@xxxxxxxxxxxxxxx>
> > Signed-off-by: Kairui Song <kasong@xxxxxxxxxxx>
>
> There is still differences for global and kswapd reclaim in the shrink_node()
> like kswapd throttling and congestion state management and throttling. Any plan
> to unify them?
Of course. Let's fix it step by step; this series is pretty long already.
I originally planned to put this patch in a later series, but as Ridong
pointed out, leaving these counters updated but unused looks really
ugly. And this fix is clean and easy to understand, I think.
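
In case it helps anyone rerunning the quoted reproducer, here is a rough
sketch (not part of the patch) for telling which LRU is active and whether
the test cgroup recorded an OOM kill. The helper names lru_mode and
oom_kill_count are made up for illustration, and the paths in the usage
comment assume cgroup v2 mounted at /sys/fs/cgroup and the test_wb group
from the reproducer.

```shell
#!/bin/sh
# Hypothetical helpers for checking a reproducer run; names are illustrative.

# Print "mglru" if the lru_gen "enabled" bitmask is non-zero, else "classic".
# $1: path to the file, normally /sys/kernel/mm/lru_gen/enabled (e.g. "0x0007").
# A missing or empty file is treated as classic LRU.
lru_mode() {
    val=$(cat "$1" 2>/dev/null) || val=0
    [ -n "$val" ] || val=0
    if [ $(($val)) -ne 0 ]; then
        echo mglru
    else
        echo classic
    fi
}

# Print the oom_kill counter from a cgroup v2 memory.events file.
# $1: path to the file; prints 0 if the field is absent.
oom_kill_count() {
    awk '$1 == "oom_kill" { print $2; found = 1 }
         END { if (!found) print 0 }' "$1"
}

# Usage, run before removing the cgroup:
#   lru_mode /sys/kernel/mm/lru_gen/enabled
#   oom_kill_count /sys/fs/cgroup/test_wb/memory.events
```

A non-zero oom_kill count with lru_mode reporting mglru would match the
"before" behavior described in the commit message. Writing 0 or y to the
lru_gen enabled file switches between the two setups for comparison.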