[PATCH v3 0/2] ext4: avoid tail walks for cached large-folio writes
From: Jia Zhu
Date: Mon Jun 08 2026 - 23:54:47 EST
Hi,
This series addresses a buffered-write regression we found during our
v6.12 -> v6.18 LTS upgrade testing on ext4.
The regression is in the buffer_head path that ext4 buffered writes
still use. A small overwrite of an already cached, uptodate large folio
still walks every buffer_head attached to the folio in both write_begin
and write_end. With order-0 folios the walk was bounded by one page
(PAGE_SIZE / blocksize buffers). After ext4 enabled large folios for
regular files, the same loops became proportional to the folio size
rather than to the size of the write.
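To put numbers on that scaling, here is a toy C model (not kernel code;
buffers_per_folio() and buffers_touched() are hypothetical helpers) that
compares the buffers a full walk visits with the buffers a write actually
overlaps. It assumes one buffer_head per filesystem block and, for
illustration only, 4 KiB blocks and a 2 MiB folio. The folio sizes ext4
really uses depend on the kernel and the workload.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Toy model, not kernel code: a loop that walks every buffer_head on
 * the folio visits folio_size / blocksize buffers (one per block),
 * no matter how many bytes the write touches.
 */
static size_t buffers_per_folio(size_t folio_size, size_t blocksize)
{
	assert(blocksize > 0 && folio_size % blocksize == 0);
	return folio_size / blocksize;
}

/* Buffers that the written range [from, from + len) actually overlaps. */
static size_t buffers_touched(size_t from, size_t len, size_t blocksize)
{
	assert(blocksize > 0 && len > 0);
	size_t first = from / blocksize;
	size_t last = (from + len - 1) / blocksize;
	return last - first + 1;
}
```

Under those assumptions a 1 KiB overwrite overlaps a single buffer,
buffers_touched(0, 1024, 4096) == 1, while a full walk of a 2 MiB folio
visits buffers_per_folio(2 << 20, 4096) == 512 of them, and that walk
happens in both write_begin and write_end.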
I agree that converting ext4 buffered I/O to iomap is the right long-term
direction, and that would avoid this problem. This series is meant as a
small fix for current and LTS kernels that still use the buffer_head path.
Patch 1 follows Willy's suggestion for block_commit_write(): if the folio
was already uptodate on entry, stop the commit walk once the copied range
has been processed.
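The idea behind patch 1 can be sketched as a simplified userspace model
of a commit-style walk. This is an illustration under stated
assumptions, not the patch itself: struct toy_commit_bh and
toy_commit_walk() are hypothetical, the buffers live in an array instead
of the b_this_page ring, and the per-buffer work is reduced to the
uptodate/dirty/partial bookkeeping. The model assumes that when the folio
was uptodate on entry, every buffer past the copied range is uptodate
too, so visiting it could not set partial.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the buffer_head state this model tracks. */
struct toy_commit_bh {
	bool uptodate;
	bool dirty;
};

/*
 * Toy commit walk over nr buffers of blocksize bytes after a copy into
 * [from, to). Buffers overlapping the copy become uptodate and dirty;
 * a non-uptodate buffer outside it sets *partial. Returns the number of
 * buffers visited.
 */
static size_t toy_commit_walk(struct toy_commit_bh *bhs, size_t nr,
			      size_t blocksize, size_t from, size_t to,
			      bool folio_uptodate_on_entry, bool *partial)
{
	size_t visited = 0;

	assert(blocksize > 0 && from < to && to <= nr * blocksize);
	*partial = false;
	for (size_t i = 0; i < nr; i++) {
		size_t block_start = i * blocksize;
		size_t block_end = block_start + blocksize;

		visited++;
		if (block_end <= from || block_start >= to) {
			if (!bhs[i].uptodate)
				*partial = true;
		} else {
			bhs[i].uptodate = true;
			bhs[i].dirty = true;
		}
		/*
		 * Tail break: the copied range has been fully processed and,
		 * under this model's assumption, the remaining buffers of a
		 * folio that was uptodate on entry cannot change *partial.
		 */
		if (folio_uptodate_on_entry && block_end >= to)
			break;
	}
	return visited;
}
```

In this model a 1 KiB copy at the start of a 512-buffer uptodate folio
visits one buffer with the tail break and all 512 without it, and
*partial ends up false either way.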
Patch 2 applies the same conservative shape to ext4_block_write_begin().
It keeps walking from the first buffer, so prefix buffer state handling is
unchanged, and only skips the suffix for folios that were already
uptodate on entry.
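Patch 2's shape can be modelled the same way. In the toy sketch below
(hypothetical names, an array instead of the buffer ring, and the
mapping/reading of buffers that overlap the write reduced to setting a
flag), the walk still starts at buffer 0, so prefix buffers get the same
treatment as before. Only when the folio was uptodate on entry does it
stop at the first buffer that lies entirely past the write. This is a
sketch of the idea as described above, not the ext4 code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the buffer_head state this model tracks. */
struct toy_wb_bh {
	bool uptodate;
	bool mapped;
};

/*
 * Toy write_begin-style walk for a write into [from, to) over nr buffers
 * of blocksize bytes. Returns the number of buffers visited.
 */
static size_t toy_write_begin_walk(struct toy_wb_bh *bhs, size_t nr,
				   size_t blocksize, size_t from, size_t to,
				   bool folio_uptodate_on_entry)
{
	size_t visited = 0;

	assert(blocksize > 0 && from < to && to <= nr * blocksize);
	for (size_t i = 0; i < nr; i++) {
		size_t block_start = i * blocksize;
		size_t block_end = block_start + blocksize;

		/* Suffix of a folio that was uptodate on entry: stop here. */
		if (folio_uptodate_on_entry && block_start >= to)
			break;
		visited++;
		if (block_end <= from || block_start >= to) {
			/* Outside the write: same handling as without the break. */
			if (folio_uptodate_on_entry)
				bhs[i].uptodate = true;
			continue;
		}
		/* Overlaps the write: stand-in for mapping/reading the buffer. */
		bhs[i].mapped = true;
	}
	return visited;
}
```

For a 1 KiB write at offset 8 KiB into a 512-buffer uptodate folio, the
model visits the two prefix buffers and the one buffer the write
overlaps, then stops, instead of visiting all 512.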
The workload is from libMicro, which we use in kernel release testing:
https://github.com/rzezeski/libMicro
The table below also includes a v6.12 baseline measured with the same
release benchmark. The v6.12 and v6.18 columns were measured with
THP=always. The last column is v6.18 with this series applied. Results
are usecs/call, lower is better, and the improvement is relative to
unpatched v6.18.
case           v6.12    v6.18   v6.18+series   improvement
write_u1k      0.609    4.659          0.528         88.7%
write_u10k     1.408    4.869          0.809         83.4%
pwrite_u1k     0.609    4.659          0.538         88.5%
pwrite_u10k    1.399    4.889          0.819         83.2%
writev_u1k     2.238    5.277          1.179         77.7%
writev_u10k   11.057    8.029          4.219         47.5%
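For the five cases that regressed from v6.12 to v6.18 in this test, this
series brings the v6.18 numbers back below the v6.12 cost. writev_u10k
did not regress (v6.18 was already faster than v6.12) and still improves
by 47.5%.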
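The improvement column and the claim above can be re-derived from the
raw numbers. The small C check below (struct bench_row, improvement_pct()
and regressed() are illustrative names) computes
(v6.18 - patched) / v6.18 * 100 for each row. For every case that got
slower from v6.12 to v6.18, the patched v6.18 number can then be
compared against v6.12.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* One row of the libMicro results above, in usecs/call (lower is better). */
struct bench_row {
	const char *name;
	double v612;    /* v6.12, THP=always */
	double v618;    /* v6.18, THP=always, unpatched */
	double patched; /* v6.18 with this series applied */
};

static const struct bench_row rows[] = {
	{ "write_u1k",   0.609,  4.659, 0.528 },
	{ "write_u10k",  1.408,  4.869, 0.809 },
	{ "pwrite_u1k",  0.609,  4.659, 0.538 },
	{ "pwrite_u10k", 1.399,  4.889, 0.819 },
	{ "writev_u1k",  2.238,  5.277, 1.179 },
	{ "writev_u10k", 11.057, 8.029, 4.219 },
};

#define NR_ROWS (sizeof(rows) / sizeof(rows[0]))

/* Improvement relative to unpatched v6.18, in percent. */
static double improvement_pct(const struct bench_row *r)
{
	assert(r->v618 > 0.0);
	return (r->v618 - r->patched) / r->v618 * 100.0;
}

/* True if the case got slower from v6.12 to unpatched v6.18. */
static bool regressed(const struct bench_row *r)
{
	return r->v618 > r->v612;
}
```

Printing improvement_pct() with "%.1f" for each row reproduces the
improvement column, and every row for which regressed() is true has a
patched value below its v6.12 value.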
Previous versions:
v2:
https://lore.kernel.org/r/20260608120131.45146-1-zhujia.zj@xxxxxxxxxxxxx
v1:
https://lore.kernel.org/r/20260603134800.25155-1-zhujia.zj@xxxxxxxxxxxxx
Changes since v2:
- simplify the ext4 loop condition as suggested by Jan;
- add Reviewed-by tags from Jan;
- add stable Cc tags.
Changes since v1:
- replace the ext4 seek-to-@from optimization with a conservative tail
break that preserves prefix buffer handling;
- add the block_commit_write() tail break suggested by Willy;
- add v6.12 and v6.18 benchmark results for the full series.
Jia Zhu (2):
fs/buffer: avoid tail commit walk for uptodate folios
ext4: avoid tail write_begin walk for uptodate folios
fs/buffer.c | 3 +++
fs/ext4/inode.c | 11 ++++++-----
2 files changed, 9 insertions(+), 5 deletions(-)
--
2.20.1