Re: [RFC PATCH] mm: vmscan: fix dirty folios throttling on cgroup v1 for MGLRU
From: Kairui Song
Date: Wed Mar 25 2026 - 09:47:42 EST
On Wed, Mar 25, 2026 at 09:20:55PM +0800, Baolin Wang wrote:
> Hi Kairui,
>
> On 3/25/26 8:07 PM, Kairui Song wrote:
> > On Wed, Mar 25, 2026 at 07:50:40PM +0800, Baolin Wang wrote:
> > > balance_dirty_pages() does not throttle dirty folios on cgroup v1.
> > > See commit 9badce000e2c ("cgroup, writeback: don't enable cgroup writeback
> > > on traditional hierarchies").
> > >
> > > Moreover, after commit 6b0dfabb3555 ("fs: Remove aops->writepage"), we no
> > > longer attempt to write back filesystem folios through reclaim.
> > >
> > > On large memory systems, the flusher may not be able to write back quickly
> > > enough. Consequently, MGLRU will encounter many folios that are already
> > > under writeback. Since we cannot reclaim these dirty folios, the system
> > > may run out of memory and trigger the OOM killer.
> > >
> > > Hence, for cgroup v1, let's throttle reclaim after waking up the flusher,
> > > similar to commit 81a70c21d917 ("mm/cgroup/reclaim: fix dirty
> > > pages throttling on cgroup v1"), to avoid unnecessary OOMs.
> > >
> > > The following test case can easily reproduce the OOM issue. With this patch
> > > applied, the test passes successfully.
> > >
> > > $mkdir /sys/fs/cgroup/memory/test
> > > $echo 256M > /sys/fs/cgroup/memory/test/memory.limit_in_bytes
> > > $echo $$ > /sys/fs/cgroup/memory/test/cgroup.procs
> > > $dd if=/dev/zero of=/mnt/data.bin bs=1M count=800
> > >
> > > Signed-off-by: Baolin Wang <baolin.wang@xxxxxxxxxxxxxxxxx>
> > > ---
> > > mm/vmscan.c | 13 ++++++++++++-
> > > 1 file changed, 12 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > > index 33287ba4a500..a9648269fae8 100644
> > > --- a/mm/vmscan.c
> > > +++ b/mm/vmscan.c
> > > @@ -5036,9 +5036,20 @@ static bool try_to_shrink_lruvec(struct lruvec *lruvec, struct scan_control *sc)
> > >  	/*
> > >  	 * If too many file cache in the coldest generation can't be evicted
> > >  	 * due to being dirty, wake up the flusher.
> > >  	 */
> > > -	if (sc->nr.unqueued_dirty && sc->nr.unqueued_dirty == sc->nr.file_taken)
> > > +	if (sc->nr.unqueued_dirty && sc->nr.unqueued_dirty == sc->nr.file_taken) {
> > > +		struct pglist_data *pgdat = lruvec_pgdat(lruvec);
> > > +
> > >  		wakeup_flusher_threads(WB_REASON_VMSCAN);
> > > +		/*
> > > +		 * For cgroupv1 dirty throttling is achieved by waking up
> > > +		 * the kernel flusher here and later waiting on folios
> > > +		 * which are in writeback to finish (see shrink_folio_list()).
> > > +		 */
> > > +		if (!writeback_throttling_sane(sc))
> > > +			reclaim_throttle(pgdat, VMSCAN_THROTTLE_WRITEBACK);
> > > +	}
> > > +
> > >  	/* whether this lruvec should be rotated */
> > >  	return nr_to_scan < 0;
> > >  }
> >
> > Hi Baolin
> >
> > Interesting, I want to fix this too, after or with:
> > https://lore.kernel.org/linux-mm/20260318-mglru-reclaim-v1-0-2c46f9eb0508@xxxxxxxxxxx/
>
> Thanks for taking a look.
>
> >
> > With the current fix you posted, MGLRU's dirty throttling is still
> > a bit different from the active/inactive LRU. In fact, MGLRU
> > treats dirty folios quite differently, which causes many other
> > issues too, e.g. dirty folios are much more likely to get stuck at
> > the tail with MGLRU, so simply applying the throttling could be too
> > aggressive. Or the batch could be too large to trigger the
> > throttling at all.
>
> Thanks for sharing this.
>
> > So I'm planning to add the patch below to V2 of that series (this
> > was also suggested by Ridong), what do you think? There are several
> > other throttling issues to be fixed too, more than just the cgroup
> > v1 support. I can add your Suggested-by too.
>
> But I still think this fix deserves its own commit, because it fixes a
> real issue that I ran into. Even if the throttling isn't perfect for
> cgroup v1, it aligns with the legacy LRU behavior and is essential to
> avoid premature OOMs in the first place. The MGLRU dirty folio handling
> improvements can be done as a separate optimization in your series.
>
> Anyway, let's also wait for more feedback from others.
>
Hi Baolin,

Sure, fixing this first is fine by me; I'm just saying that you may still
see unexpected or ineffective throttling with this. There is no conflict
between the two approaches: I can rebase that series on top of yours, and
it would help solve the rest of the issues.
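
For reference, the gate your patch relies on is the existing
writeback_throttling_sane() helper in mm/vmscan.c. A rough sketch of its
logic (quoted from memory, so double-check against your tree):

static bool writeback_throttling_sane(struct scan_control *sc)
{
	/* Global reclaim can rely on balance_dirty_pages() as usual. */
	if (!cgroup_reclaim(sc))
		return true;
#ifdef CONFIG_CGROUP_WRITEBACK
	/* Cgroup writeback, and thus dirty throttling, works on v2 only. */
	if (cgroup_subsys_on_dfl(memory_cgrp_subsys))
		return true;
#endif
	return false;
}

So the reclaim_throttle() fallback in your patch only kicks in for memcg
reclaim on the legacy hierarchy, which matches what commit 81a70c21d917
did for the active/inactive LRU.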