Re: [PATCH 1/4] mm: fix IOCB_DONTCACHE write performance with rate-limited writeback

From: Christoph Hellwig

Date: Mon Apr 06 2026 - 01:44:57 EST


On Thu, Apr 02, 2026 at 08:28:42AM -0400, Jeff Layton wrote:
> > On Wed, Apr 01, 2026 at 03:10:58PM -0400, Jeff Layton wrote:
> > > IOCB_DONTCACHE calls filemap_flush_range() with nr_to_write=LONG_MAX
> > > on every write, which flushes all dirty pages in the written range.
> > >
> > > Under concurrent writers this creates severe serialization on the
> > > writeback submission path, causing throughput to collapse to ~47% of
> > > buffered I/O with multi-second tail latency. Even single-client
> > > sequential writes suffer: on a 512GB file with 256GB RAM, the
> > > aggressive flushing triggers dirty throttling that limits throughput
> > > to 575 MB/s vs 1442 MB/s with rate-limited writeback.
> >
> > I'm not sure how you think the first paragraph relates to
> > the second.
> >
>
> The belief is that under a heavy parallel write workload on the same
> inode, the writers all end up stacking up on the mapping's xa_lock.
> However, as Ritesh points out, I should probably confirm that with perf.

But nr_to_write should not change anything. If .range_start and
.range_end are set in a writeback_iter() loop, writeback_iter() will try
to get and write back every page in the range. Setting nr_to_write in
addition to that could only reduce the amount written if it was less than
the size of the range, which in your patch it isn't.
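
To illustrate the point, here is a toy user-space model of a range-bounded
writeback loop with a page budget. This is only a sketch, not kernel code:
struct toy_wbc and toy_writeback() are hypothetical names, and the loop
just assumes that the budget is checked before each dirty page is written.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy user-space model, not kernel code: all names here are hypothetical. */
struct toy_wbc {
	size_t range_start;	/* first page index, inclusive */
	size_t range_end;	/* last page index, inclusive */
	long nr_to_write;	/* remaining page budget */
};

/*
 * Walk the dirty pages in [range_start, range_end], "write" each one by
 * clearing its dirty flag, and stop early once the budget is used up.
 * Returns the number of pages written.
 */
static long toy_writeback(bool *dirty, size_t nr_pages, struct toy_wbc *wbc)
{
	long written = 0;

	assert(wbc->range_start <= wbc->range_end);
	assert(wbc->range_end < nr_pages);

	for (size_t i = wbc->range_start; i <= wbc->range_end; i++) {
		if (!dirty[i])
			continue;
		if (wbc->nr_to_write <= 0)
			break;
		dirty[i] = false;
		wbc->nr_to_write--;
		written++;
	}
	return written;
}
```

In this model, a four-page dirty range is written in full whether the
budget is LONG_MAX or exactly 4. Only a budget below 4 shortens the walk.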

In fact we should probably have a debug check to never set both a range
and nr_to_write as that combination doesn't make sense.
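
A rough sketch of what such a check could look like, again as a toy
user-space model rather than a proposed patch. It assumes that "no range"
is encoded as [0, LLONG_MAX] and "no budget" as LONG_MAX. Both the
encoding and the names toy_wbc_limits and toy_wbc_is_sane() are
assumptions made for illustration only.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical stand-in for the writeback control fields discussed above. */
struct toy_wbc_limits {
	long long range_start;
	long long range_end;
	long nr_to_write;
};

/*
 * Assumption: an unset range is [0, LLONG_MAX] and an unset budget is
 * LONG_MAX. Returns false when both a range and a budget are set.
 */
static bool toy_wbc_is_sane(const struct toy_wbc_limits *wbc)
{
	bool has_range = wbc->range_start != 0 || wbc->range_end != LLONG_MAX;
	bool has_budget = wbc->nr_to_write != LONG_MAX;

	return !(has_range && has_budget);
}
```

Under these assumptions a range with nr_to_write = LONG_MAX passes, while
a range combined with a finite budget would trip the check.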