Re: [PATCH 2/4] mm: add atomic flush guard for IOCB_DONTCACHE writeback
From: Jeff Layton
Date: Thu Apr 02 2026 - 08:50:06 EST
On Wed, 2026-04-01 at 22:27 -0700, Christoph Hellwig wrote:
> On Wed, Apr 01, 2026 at 03:10:59PM -0400, Jeff Layton wrote:
> > When the PAGECACHE_TAG_WRITEBACK tag clears after a round of writeback
> > completes, all concurrent IOCB_DONTCACHE writers see the tag clear
> > simultaneously and submit proportional flushes at once — a thundering
> > herd that causes p99.9 tail latency spikes.
> >
> > Add an AS_DONTCACHE_FLUSHING flag to the address_space and use
> > test_and_set_bit() to ensure at most one IOCB_DONTCACHE writer
> > flushes at a time. Other writers that find the bit set skip their
> > flush entirely. The bit is cleared when the flush completes.
>
> This sounds like a bad reimplementation of the single writeback thread
> :)
>
> Have you considered no longer doing in-caller writeback for
> IOCB_DONTCACHE and just leaving it to the writeback daemon?
>
> Either by totally disabling the writeback and just leaving the
> dropbehind bit, or by queuing up wb_writeback_work instances for
> the ranges, or by just increasing the pressure for the writeback
> daemon. Note that with all schemes including the one in this patch
> we might eventually run into writeback scalability limits, which
> will require multiple writeback workers.
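
For anyone skimming the thread, here is roughly what the guard in the
quoted patch does, as a minimal userspace sketch. C11 atomics stand in
for test_and_set_bit() on the mapping's flags, and struct mapping,
mapping_init(), submit_flush() and dontcache_try_flush() are
hypothetical names, not the actual patch code. A writer that finds the
guard already taken returns without flushing.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for an address_space. */
struct mapping {
	atomic_bool dontcache_flushing;	/* plays the role of AS_DONTCACHE_FLUSHING */
	unsigned long nr_flushes;	/* only modified while the guard is held */
};

static void mapping_init(struct mapping *m)
{
	atomic_init(&m->dontcache_flushing, false);
	m->nr_flushes = 0;
}

/* Hypothetical stand-in for submitting the proportional flush. */
static void submit_flush(struct mapping *m, size_t bytes)
{
	(void)bytes;
	assert(atomic_load(&m->dontcache_flushing));	/* caller must own the guard */
	m->nr_flushes++;
}

/*
 * Returns true if this writer did the flush, false if another writer
 * already owned the guard and this one skipped its flush entirely.
 */
static bool dontcache_try_flush(struct mapping *m, size_t bytes)
{
	/* Like test_and_set_bit(): set the flag and return its old value. */
	if (atomic_exchange_explicit(&m->dontcache_flushing, true,
				     memory_order_acquire))
		return false;

	submit_flush(m, bytes);

	/* Drop the guard with release semantics so the next writer can take it. */
	atomic_store_explicit(&m->dontcache_flushing, false, memory_order_release);
	return true;
}
```

Only one writer per mapping is ever submitting at a time, and the
losers neither wait nor hand off their share of the work, which is why
it ends up resembling a single writeback thread.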

I did test a "dropbehind" mode that just set the dropbehind bit without
doing the flush at the end of the write. It was better than stock
dontcache, but the tail latencies were still pretty bad.

I think having each writer do some writeback submission work makes a
lot of sense. It helps keep the number of dirty pages below the dirty
thresholds and doesn't seem to tax each writing task _too_ much. The
trick is avoiding lock contention while doing it.

I think the ideal would be to have some (lockless) mechanism to say
"there is enough data touched by the range just written to kick off a
write that's a suitable size for the backing store". Each writer could
check that and then kick off writeback for an appropriate range. I
think this could even be beneficial in the normal buffered write
codepath too.
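
To make the idea a bit more concrete, here is a rough userspace sketch
(C11 atomics, not kernel code). It assumes the file is carved into
aligned chunks of a hypothetical CHUNK_SIZE standing in for "a suitable
size for the backing store", with a running byte count per chunk;
struct dirty_tracker, tracker_init(), note_write() and kick_writeback()
are made-up names. Simplifications: it counts bytes written rather
than distinct dirty pages, so rewriting the same page repeatedly still
counts toward the threshold, and it ignores the global dirty
thresholds.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define CHUNK_SIZE	(1u << 20)	/* hypothetical preferred I/O size: 1 MiB */
#define NR_CHUNKS	64		/* file range covered by this sketch: 64 MiB */

struct dirty_tracker {
	/* Bytes ever written into each chunk; never reset (64 bits won't wrap). */
	_Atomic uint64_t written[NR_CHUNKS];
};

static atomic_ulong nr_kicks;	/* counts calls to the stand-in below */

/* Hypothetical stand-in for starting writeback of [start, start + len). */
static void kick_writeback(uint64_t start, uint64_t len)
{
	(void)start;
	(void)len;
	atomic_fetch_add(&nr_kicks, 1);
}

static void tracker_init(struct dirty_tracker *t)
{
	for (unsigned i = 0; i < NR_CHUNKS; i++)
		atomic_init(&t->written[i], 0);
}

/*
 * Call after [pos, pos + len) has been copied into the page cache.
 * Kicks writeback for every chunk this write filled and returns how
 * many chunks were kicked. No locks, no guard bit.
 */
static unsigned note_write(struct dirty_tracker *t, uint64_t pos, uint64_t len)
{
	uint64_t end = pos + len;
	unsigned kicked = 0;

	assert(end <= (uint64_t)NR_CHUNKS * CHUNK_SIZE);
	while (pos < end) {
		uint64_t idx = pos / CHUNK_SIZE;
		uint64_t chunk_end = (idx + 1) * CHUNK_SIZE;
		uint64_t n = (end < chunk_end ? end : chunk_end) - pos;
		/* Claim the slice [old, old + n) of this chunk's running total. */
		uint64_t old = atomic_fetch_add_explicit(&t->written[idx], n,
							 memory_order_relaxed);

		/*
		 * fetch_add hands out non-overlapping slices, so each multiple
		 * of CHUNK_SIZE is crossed by exactly one writer. Since
		 * n <= CHUNK_SIZE, that is at most one crossing per chunk here.
		 */
		if ((old + n) / CHUNK_SIZE != old / CHUNK_SIZE) {
			kick_writeback(idx * CHUNK_SIZE, CHUNK_SIZE);
			kicked++;
		}
		pos += n;
	}
	return kicked;
}
```

The appeal is that atomic_fetch_add() gives each writer a disjoint
slice of the chunk's running total, so exactly one writer sees a given
chunk boundary crossed, and deciding who submits needs no lock at all.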

Anyway, I'll play around with this idea some more and come back with a
v2.

Thanks for the review!
--
Jeff Layton <jlayton@xxxxxxxxxx>