Re: [PATCH] ext4: avoid full buffer walks for large folio partial writes
From: Jia Zhu
Date: Fri Jun 05 2026 - 05:17:51 EST
On Wed, Jun 03, 2026 at 07:11:48PM +0100, Matthew Wilcox wrote:
> Is this a common case for you, or is this something you noticed by
> inspection?
This was found by our kernel release benchmark. We run libMicro as part
of that test suite:
https://github.com/rzezeski/libMicro
The regression shows up in the buffered write/pwrite/writev overwrite
tests on ext4 with large folios enabled.
> Wouldn't you get just as much benefit from this?
Yes. I tested this approach, and it gives almost the same result as my
original partial-commit helper.
I agree this is a better direction for block_commit_write(). It keeps the
existing buffer-head state handling and only stops the tail walk after an
already-uptodate folio has been committed through @to. That removes the
main large-folio cost in our small-overwrite benchmark while keeping the
change much closer to the old code.
> I'm unconvinced that this is safe ...
Agreed. The original ext4_block_write_begin() change was too aggressive.
Seeking directly to @from also skips the prefix buffers, which makes it
harder to prove that the side effects of the old full walk are preserved.
For v2 I plan to drop that part and keep the existing walk from the head.
The ext4 change would only stop after @to when the folio was already
uptodate on entry, similar to your block_commit_write() suggestion:
+	bool folio_uptodate = folio_test_uptodate(folio);
+
 	for (bh = head, block_start = 0;
-	     bh != head || !block_start;
+	     (bh != head || !block_start) &&
+	     (!folio_uptodate || block_start < to);
 	     block++, block_start = block_end, bh = bh->b_this_page) {
 		...
 	}
So the prefix path and all in-range handling stay unchanged. The only
work skipped is the walk over the tail buffers after @to, and only for a
folio that was already uptodate before write_begin() started.
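To illustrate the effect of the extra loop condition, here is a minimal
user-space sketch. It is not kernel code: struct model_bh, link_ring() and
count_visited() are hypothetical names used only for this model, and the
loop body is reduced to a counter. It only mirrors the loop header from the
hunk above on a circular list of buffers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct buffer_head: only the ring link is modelled. */
struct model_bh {
	struct model_bh *b_this_page;
};

/* Link n buffers into a circular list, like the per-folio buffer ring. */
static void link_ring(struct model_bh *bhs, size_t n)
{
	size_t i;

	assert(n > 0);
	for (i = 0; i < n; i++)
		bhs[i].b_this_page = &bhs[(i + 1) % n];
}

/*
 * Count how many buffers the proposed loop header visits.
 * With folio_uptodate == false this is the old full walk; with
 * folio_uptodate == true the walk stops once block_start reaches @to.
 */
static size_t count_visited(struct model_bh *head, size_t blocksize,
			    size_t to, bool folio_uptodate)
{
	struct model_bh *bh;
	size_t block_start, block_end, visited = 0;

	for (bh = head, block_start = 0;
	     (bh != head || !block_start) &&
	     (!folio_uptodate || block_start < to);
	     block_start = block_end, bh = bh->b_this_page) {
		block_end = block_start + blocksize;
		visited++;	/* stands in for the per-buffer work */
	}
	return visited;
}
```

As an assumed example (these sizes are not from the benchmark setup), take
a 2 MiB folio with 4 KiB blocks, so 512 buffers. A 1 KiB overwrite at
offset 0 visits 1 buffer when the folio is already uptodate and 512
otherwise. A 10 KiB overwrite visits 3 buffers instead of 512.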
> ... converting ext4 to use iomap instead of buffer heads.
I strongly agree that iomap is the right direction for ext4. The iomap
buffered write path would make this particular buffer-head walk cost go
away.
The reason I am still looking at this path is that the regression is
visible in our LTS upgrade testing from 6.12 to 6.18. It was introduced
by the ext4 large-folio enablement in v6.16. For example, these are the
results from our libMicro release benchmark with THP always enabled
(usecs/call, lower is better):
case          v6.12    v6.18    regression
write_u1k     0.609    4.659    +665.0%
write_u10k    1.408    4.869    +245.8%
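For reference, the regression column is the relative increase in
usecs/call, (v6.18 - v6.12) / v6.12. A small sketch of that arithmetic,
where regression_pct() is a hypothetical helper name and not part of
libMicro:

```c
/* Relative slowdown in percent: positive means the new kernel is slower. */
static double regression_pct(double old_usecs, double new_usecs)
{
	return (new_usecs - old_usecs) / old_usecs * 100.0;
}
```

Calling regression_pct(0.609, 4.659) and regression_pct(1.408, 4.869)
reproduces the two percentages in the table after rounding to one decimal
place.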
The iomap conversion is the long-term fix, but it does not help kernels
which still use the buffer-head buffered write path. I would like to keep
this as a small regression fix for that path, and make it minimal enough
to be suitable for stable/LTS backport.
Would this v2 direction look OK to you?