[PATCH 2/4] mm: add atomic flush guard for IOCB_DONTCACHE writeback

From: Jeff Layton

Date: Wed Apr 01 2026 - 15:11:39 EST


When the PAGECACHE_TAG_WRITEBACK tag clears after a round of writeback
completes, all concurrent IOCB_DONTCACHE writers observe the cleared tag
at the same time and submit their proportionally-capped flushes at once:
a thundering herd that causes p99.9 tail latency spikes.

Add an AS_DONTCACHE_FLUSHING flag to the address_space and use
test_and_set_bit() to ensure at most one IOCB_DONTCACHE writer
flushes at a time. Other writers that find the bit set skip their
flush entirely. The bit is cleared when the flush completes.

Together with the existing skip-if-busy check on
PAGECACHE_TAG_WRITEBACK (which provides temporal rate limiting by
skipping flushes while prior writeback is still draining), this
creates a two-level guard: the writeback tag paces flush frequency
to match device speed, while the atomic flag prevents the thundering
herd at tag-clear transitions.

Additionally, add a dirty-pressure escape hatch: when dirty pages reach
75% of the dirty_ratio threshold, bypass the skip-if-busy check and
attempt the flush anyway. Under heavy multi-writer load, skip-if-busy
can cause dirty pages to accumulate (most writers skip because writeback
is effectively always in progress), eventually triggering
balance_dirty_pages() throttling with severe tail latency. By forcing
extra flushes when dirty pressure is high, dontcache writers help drain
dirty pages before the throttle threshold is hit.

Signed-off-by: Jeff Layton <jlayton@xxxxxxxxxx>
---
include/linux/pagemap.h | 1 +
mm/filemap.c | 36 +++++++++++++++++++++++++++++-------
2 files changed, 30 insertions(+), 7 deletions(-)

diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 31a848485ad9d9850d37185418349b89e6efe420..e71bf75f6c22d0da5330c17c6e525cb12d254dfe 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -210,6 +210,7 @@ enum mapping_flags {
AS_WRITEBACK_MAY_DEADLOCK_ON_RECLAIM = 9,
AS_KERNEL_FILE = 10, /* mapping for a fake kernel file that shouldn't
account usage to user cgroups */
+ AS_DONTCACHE_FLUSHING = 11, /* dontcache writeback in progress */
/* Bits 16-25 are used for FOLIO_ORDER */
AS_FOLIO_ORDER_BITS = 5,
AS_FOLIO_ORDER_MIN = 16,
diff --git a/mm/filemap.c b/mm/filemap.c
index af2024b736bef74571cc22ab7e3cde2c8e872efe..1b5577bd4eda8ad8ee182e58acd50d99f0a8f9f5 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -444,11 +444,21 @@ EXPORT_SYMBOL_GPL(filemap_flush_range);
* @end: last byte offset (inclusive) for writeback
* @nr_written: number of bytes just written by the caller
*
- * Rate-limited writeback for IOCB_DONTCACHE writes. Skips the flush
- * entirely if writeback is already in progress on the mapping (skip-if-busy),
- * and when flushing, caps nr_to_write to the number of pages just written
- * (proportional cap). Together these avoid writeback contention between
- * concurrent writers and prevent I/O bursts that starve readers.
+ * Rate-limited writeback for IOCB_DONTCACHE writes. Uses three guards to
+ * avoid writeback contention between concurrent writers:
+ *
+ * 1. Skip-if-busy: if writeback is already in progress on the mapping
+ * (PAGECACHE_TAG_WRITEBACK set), skip the flush — unless dirty pages
+ * are approaching the dirty_ratio threshold, in which case flush anyway
+ * to help drain before balance_dirty_pages() throttles all writers.
+ *
+ * 2. Atomic flush guard: use test_and_set_bit(AS_DONTCACHE_FLUSHING) so
+ * that at most one dontcache writer flushes at a time, preventing a
+ * thundering herd when the writeback tag clears and multiple writers
+ * try to flush simultaneously.
+ *
+ * 3. Proportional cap: cap nr_to_write to the number of pages just written,
+ * preventing any single flush from starving concurrent readers.
*
* Return: %0 on success, negative error code otherwise.
*/
@@ -456,13 +466,25 @@ int filemap_dontcache_writeback_range(struct address_space *mapping,
loff_t start, loff_t end, ssize_t nr_written)
{
long nr;
+ int ret;
+
+ if (mapping_tagged(mapping, PAGECACHE_TAG_WRITEBACK)) {
+ unsigned long thresh, bg_thresh, dirty;

- if (mapping_tagged(mapping, PAGECACHE_TAG_WRITEBACK))
+ global_dirty_limits(&bg_thresh, &thresh);
+ dirty = global_node_page_state(NR_FILE_DIRTY);
+ if (dirty < thresh * 3 / 4)
+ return 0;
+ }
+
+ if (test_and_set_bit(AS_DONTCACHE_FLUSHING, &mapping->flags))
return 0;

nr = (nr_written + PAGE_SIZE - 1) >> PAGE_SHIFT;
- return filemap_writeback(mapping, start, end, WB_SYNC_NONE, &nr,
+ ret = filemap_writeback(mapping, start, end, WB_SYNC_NONE, &nr,
WB_REASON_BACKGROUND);
+ clear_bit(AS_DONTCACHE_FLUSHING, &mapping->flags);
+ return ret;
}
EXPORT_SYMBOL_GPL(filemap_dontcache_writeback_range);


--
2.53.0