Re: [PATCH 1/4] mm: fix IOCB_DONTCACHE write performance with rate-limited writeback

From: Christoph Hellwig

Date: Thu Apr 02 2026 - 01:22:18 EST


On Wed, Apr 01, 2026 at 03:10:58PM -0400, Jeff Layton wrote:
> IOCB_DONTCACHE calls filemap_flush_range() with nr_to_write=LONG_MAX
> on every write, which flushes all dirty pages in the written range.
>
> Under concurrent writers this creates severe serialization on the
> writeback submission path, causing throughput to collapse to ~47% of
> buffered I/O with multi-second tail latency. Even single-client
> sequential writes suffer: on a 512GB file with 256GB RAM, the
> aggressive flushing triggers dirty throttling that limits throughput
> to 575 MB/s vs 1442 MB/s with rate-limited writeback.

I'm not sure how you think the first paragraph relates to the
second.

> Replace the filemap_flush_range() call in generic_write_sync() with a
> new filemap_dontcache_writeback_range() that uses two rate-limiting
> mechanisms:
>
> 1. Skip-if-busy: check mapping_tagged(PAGECACHE_TAG_WRITEBACK)
> before flushing. If writeback is already in progress on the
> mapping, skip the flush entirely. This eliminates writeback
> submission contention between concurrent writers.

Makes sense.
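
For reference, a minimal userspace sketch of the skip-if-busy idea. The
toy_* names and struct are hypothetical stand-ins, not kernel code; the
only thing taken from the patch description is the check for writeback
already being in progress on the mapping before starting a flush.

```c
#include <stdbool.h>

/* Hypothetical userspace stand-in for an address_space; not the kernel struct. */
struct toy_mapping {
	unsigned long nr_dirty;		/* pages currently tagged dirty */
	unsigned long nr_writeback;	/* pages currently tagged writeback */
};

/* Models mapping_tagged(mapping, PAGECACHE_TAG_WRITEBACK). */
static bool toy_writeback_in_progress(const struct toy_mapping *m)
{
	return m->nr_writeback > 0;
}

/*
 * Skip-if-busy: start a flush only when no writeback is in flight.
 * Returns the number of pages moved from dirty to writeback.
 */
static unsigned long toy_dontcache_flush(struct toy_mapping *m)
{
	unsigned long submitted;

	if (toy_writeback_in_progress(m))
		return 0;	/* another writer is already submitting */

	submitted = m->nr_dirty;
	m->nr_writeback += submitted;
	m->nr_dirty = 0;
	return submitted;
}
```

In this model the second of two back-to-back writers returns without
touching the submission path, which is the contention the patch
description says it wants to avoid.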

> 2. Proportional cap: when flushing does occur, cap nr_to_write to
> the number of pages just written. This prevents any single
> write from triggering a large flush that would starve concurrent
> readers.

This doesn't make any sense at all.
filemap_flush_range/filemap_writeback always cap the number of written
pages to the range passed in. What do you think changes here?

> + return filemap_writeback(mapping, start, end, WB_SYNC_NONE, &nr,
> + WB_REASON_BACKGROUND);

filemap_writeback only has 5 arguments in any tree I've looked at
including linux-next.