[PATCH v3 2/2] ext4: avoid tail write_begin walk for uptodate folios

From: Jia Zhu

Date: Mon Jun 08 2026 - 23:52:41 EST

Ext4 buffered writes into large folios also pay for a full buffer_head
walk in ext4_block_write_begin(). For a small overwrite of an existing
cached folio, the folio is already uptodate and the write only needs to
prepare the buffers up to the end of the written range. Walking the
suffix beyond that still makes the write_begin cost proportional to the
folio size.

Before ext4 enabled large folios for regular files, the same loop was
bounded by a single page of buffers. Enabling large folios made the
existing full-folio walk visible as a regression for cached small
overwrites.

The suffix walk is needed for non-uptodate folios, where ext4 may have
to submit reads for partial blocks, preserve new-buffer cleanup, and run
error zeroing. Keep those folios on the old full walk.

For already-uptodate folios, keep the walk starting at the first buffer
rather than seeking directly to @from. This preserves the existing prefix
buffer state handling. Stop once block_start reaches the end of the
write range, because the skipped suffix would only repeat the
outside-range uptodate handling for buffers beyond @to.
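
The effect of the new loop bound can be modeled in user space. The sketch
below is illustrative only and is not kernel code:
count_buffers_visited() and the index-based ring are hypothetical
stand-ins for the buffer_head list, and the 2 MiB folio with 4 KiB blocks
used in the note after it is an assumed example, not a measured
configuration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical user-space model of the loop bound only.
 * Buffers are modeled as indices 0..nr_bufs-1 in a ring, so the
 * kernel's "bh != head" test becomes "idx != 0" here.
 */
static size_t count_buffers_visited(size_t folio_size, size_t blocksize,
				    size_t to, bool folio_uptodate)
{
	size_t nr_bufs;
	size_t block_start = 0;
	size_t idx = 0;		/* stands in for bh; index 0 is head */
	size_t visited = 0;

	assert(blocksize > 0 && folio_size % blocksize == 0);
	assert(to <= folio_size);	/* mirrors BUG_ON(to > folio_size(folio)) */

	nr_bufs = folio_size / blocksize;

	/* Same shape as the patched for-loop condition. */
	while (block_start < to || (!folio_uptodate && idx != 0)) {
		visited++;
		block_start += blocksize;
		idx = (idx + 1) % nr_bufs;
	}
	return visited;
}
```

Under these assumptions, a 1 KiB overwrite at the start of an uptodate
2 MiB folio visits one buffer, while the same write into a non-uptodate
folio still visits all 512, matching the old full walk.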

On current master, the libMicro ext4 large-folio overwrite test shows
the following full-series result. Results are median usecs/call over 10
runs, lower is better:

case          nofix    this series   improvement
write_u1k     1.418    0.3405        76.0%
write_u10k    1.887    0.4175        77.9%
pwrite_u1k    1.6775   0.3390        79.8%
pwrite_u10k   1.9035   0.4130        78.3%
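
The improvement column is consistent with the relative reduction in
median usecs/call. A minimal sketch of that arithmetic, assuming the
column was computed as (nofix - fixed) / nofix; improvement_pct() is a
hypothetical helper and is not part of libMicro:

```c
#include <assert.h>

/*
 * Hypothetical helper: percentage reduction in usecs/call when going
 * from the unpatched (nofix) median to the patched (fixed) median.
 */
static double improvement_pct(double nofix, double fixed)
{
	assert(nofix > 0.0);
	return (nofix - fixed) / nofix * 100.0;
}
```

For example, improvement_pct(1.418, 0.3405) evaluates to roughly 76.0,
which agrees with the write_u1k row.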

Fixes: 7ac67301e82f0 ("ext4: enable large folio for regular file")
Cc: stable@xxxxxxxxxxxxxxx # v6.16+
Reviewed-by: Jan Kara <jack@xxxxxxx>
Signed-off-by: Jia Zhu <zhujia.zj@xxxxxxxxxxxxx>
---
fs/ext4/inode.c | 11 ++++++-----
1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index c2c2d6ac7f3d1..0fccb8f6a2116 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -1182,6 +1182,7 @@ int ext4_block_write_begin(handle_t *handle, struct folio *folio,
 	int nr_wait = 0;
 	int i;
 	bool should_journal_data = ext4_should_journal_data(inode);
+	bool folio_uptodate = folio_test_uptodate(folio);
 
 	BUG_ON(!folio_test_locked(folio));
 	BUG_ON(to > folio_size(folio));
@@ -1193,13 +1194,13 @@ int ext4_block_write_begin(handle_t *handle, struct folio *folio,
 		head = create_empty_buffers(folio, blocksize, 0);
 	block = EXT4_PG_TO_LBLK(inode, folio->index);
 
-	for (bh = head, block_start = 0; bh != head || !block_start;
+	for (bh = head, block_start = 0;
+	     block_start < to || (!folio_uptodate && bh != head);
 	    block++, block_start = block_end, bh = bh->b_this_page) {
 		block_end = block_start + blocksize;
 		if (block_end <= from || block_start >= to) {
-			if (folio_test_uptodate(folio)) {
+			if (folio_uptodate)
 				set_buffer_uptodate(bh);
-			}
 			continue;
 		}
 		if (WARN_ON_ONCE(buffer_new(bh)))
@@ -1220,7 +1221,7 @@ int ext4_block_write_begin(handle_t *handle, struct folio *folio,
 				if (should_journal_data)
 					do_journal_get_write_access(handle,
 								    inode, bh);
-				if (folio_test_uptodate(folio)) {
+				if (folio_uptodate) {
 					/*
 					 * Unlike __block_write_begin() we leave
 					 * dirtying of new uptodate buffers to
@@ -1237,7 +1238,7 @@ int ext4_block_write_begin(handle_t *handle, struct folio *folio,
 				continue;
 			}
 		}
-		if (folio_test_uptodate(folio)) {
+		if (folio_uptodate) {
 			set_buffer_uptodate(bh);
 			continue;
 		}
--
2.20.1