Re: [PATCH v2 08/12] mm/mglru: simplify and improve dirty writeback handling

From: Kairui Song

Date: Sun Mar 29 2026 - 04:21:44 EST

Forwarding sashiko review:

https://sashiko.dev/#/message/20260329-mglru-reclaim-v2-8-b53a3678513c%40tencent.com

> > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > index 8de5c8d5849e..17b5318fad39 100644
> > --- a/mm/vmscan.c
> > +++ b/mm/vmscan.c
> [ ... ]
> > @@ -4633,21 +4632,6 @@ static bool sort_folio(struct lruvec *lruvec, struct folio *folio, struct scan_c
> > return true;
> > }
> >
> > - dirty = folio_test_dirty(folio);
> > - writeback = folio_test_writeback(folio);
> > - if (type == LRU_GEN_FILE && dirty) {
> > - sc->nr.file_taken += delta;
> > - if (!writeback)
> > - sc->nr.unqueued_dirty += delta;
> > - }
> > -
> > - /* waiting for writeback */
> > - if (writeback || (type == LRU_GEN_FILE && dirty)) {
> > - gen = folio_inc_gen(lruvec, folio, true);
> > - list_move(&folio->lru, &lrugen->folios[gen][type][zone]);
> > - return true;
> > - }
> > -
> > return false;
> > }
>
> Could removing this proactive promotion cause a livelock and premature OOM
> during GFP_NOIO or GFP_NOFS direct reclaim?
> Without this check, dirty file folios fall through to isolate_folio(). If
> the reclaim does not have __GFP_IO, isolate_folio() will skip them.
> These skipped folios are spliced back to the front of the oldest generation
> list in scan_folios(). Because isolate_folio() isolates 0 folios,
> evict_folios() returns early.
> This completely bypasses the new wakeup_flusher_threads() call, leaving the
> dirty folios unqueued for writeback, and the oldest generation can never
> advance.

This is a nice found. For GFP_NOIO (or actually !__GFP_IO), we also need
to active and set reclaim for the dirty folios. It's a narrow case though.

Following update should work:

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 8170aee096e9..342ba3afe77c 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -4641,8 +4641,7 @@ static bool isolate_folio(struct lruvec *lruvec, struct folio *folio, struct sca

/* swap constrained */
if (!(sc->gfp_mask & __GFP_IO) &&
- (folio_test_dirty(folio) ||
- (folio_test_anon(folio) && !folio_test_swapcache(folio))))
+ (folio_test_anon(folio) && !folio_test_swapcache(folio)))
return false;

> [ ... ]
> > @@ -4858,12 +4840,27 @@ static int evict_folios(unsigned long nr_to_scan, struct lruvec *lruvec,
> > return scanned;
> > retry:
> > reclaimed = shrink_folio_list(&list, pgdat, sc, &stat, false, memcg);
> > - sc->nr.unqueued_dirty += stat.nr_unqueued_dirty;
> > sc->nr_reclaimed += reclaimed;
> > trace_mm_vmscan_lru_shrink_inactive(pgdat->node_id,
> > type_scanned, reclaimed, &stat, sc->priority,
> > type ? LRU_INACTIVE_FILE : LRU_INACTIVE_ANON);
> >
> > + /*
> > + * If too many file cache in the coldest generation can't be evicted
> > + * due to being dirty, wake up the flusher.
> > + */
> > + if (stat.nr_unqueued_dirty == isolated) {
>
> Is the isolated variable stale when evaluated on the retry path?
> If evict_folios() jumps back to the retry label, shrink_folio_list()
> processes a smaller list of only clean folios. The isolated variable retains
> the size of the original list, while stat.nr_unqueued_dirty can only be as
> large as the new, smaller list.
> Does this logically impossible condition cause any unintended behavior, or
> should the check be moved outside the retry loop to avoid confusion?

This is fine, stat.nr_unqueued_dirty is always smaller than isolated. The
"retry" label above only used to handle some folios that are failed to be
reclaimed after isolation. Meanwhile I do think we should clean up this
retry logic as it will also confuse the tracepoint. Better do it later.