Re: [PATCH] ext4: avoid full buffer walks for large folio partial writes
From: Matthew Wilcox
Date: Fri Jun 05 2026 - 10:32:16 EST
On Fri, Jun 05, 2026 at 05:02:53PM +0800, Jia Zhu wrote:
> On Wed, Jun 03, 2026 at 07:11:48PM +0100, Matthew Wilcox wrote:
> > Is this a common case for you, or is this something you noticed by
> > inspection?
>
> This was found by our kernel release benchmark. We run libMicro as part
> of that test suite:
>
> https://github.com/rzezeski/libMicro
>
> The regression shows up in buffered write/pwrite/writev overwrite tests
> on ext4 large folios.

Makes sense. I'll assume this can correspond to a reasonable workload.
It certainly seems like something that could exist.

> > Wouldn't you get just as much benefit from this?
>
> Yes. I tested this approach, and it gives almost the same result as my
> original partial-commit helper.

Excellent! Obviously it'd be even better if we didn't have to walk the
leading buffer_heads ... but there's no way to do this with the data
structure we have.

> Agreed. The original ext4_block_write_begin() change was too aggressive.
> Seeking directly to @from also skips the prefix buffers, which makes the
> old side effects harder to prove.
>
> For v2 I plan to drop that part and keep the existing walk from the head.
> The ext4 change would only stop after @to when the folio was already
> uptodate on entry, similar to your block_commit_write() suggestion:
>
> +       bool folio_uptodate = folio_test_uptodate(folio);
> +
>         for (bh = head, block_start = 0;
> -            bh != head || !block_start;
> +            (bh != head || !block_start) &&
> +            (!folio_uptodate || block_start < to);
>              block++, block_start = block_end, bh = bh->b_this_page) {
>                 ...
>         }
Yes, I think that's a good approach.
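
For anyone following along, here is a minimal user-space sketch of what the
proposed loop condition does to the number of buffers visited. It is not the
ext4 code: struct fake_bh, count_visited() and visited_for_write() are
hypothetical names made up for illustration, and the loop body is reduced to
a counter. Only the loop condition is taken from the snippet quoted above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct buffer_head: only the circular link. */
struct fake_bh {
	struct fake_bh *b_this_page;
};

#define MAX_BUFFERS 1024	/* arbitrary limit for this sketch */

/*
 * Walk the circular buffer list from the head, using the loop condition
 * from the quoted v2 snippet, and count how many buffers are visited.
 * The real loop body (mapping, zeroing, reads) is deliberately left out.
 */
unsigned int count_visited(struct fake_bh *head, unsigned int blocksize,
			   unsigned int to, bool folio_uptodate)
{
	struct fake_bh *bh;
	unsigned int block_start, block_end;
	unsigned int visited = 0;

	for (bh = head, block_start = 0;
	     (bh != head || !block_start) &&
	     (!folio_uptodate || block_start < to);
	     block_start = block_end, bh = bh->b_this_page) {
		block_end = block_start + blocksize;
		visited++;
	}
	return visited;
}

/*
 * Build a ring of nr_buffers fake buffers (one per block of the folio)
 * and report how many the walk touches for a write ending at byte 'to'.
 */
unsigned int visited_for_write(unsigned int nr_buffers, unsigned int blocksize,
			       unsigned int to, bool folio_uptodate)
{
	static struct fake_bh ring[MAX_BUFFERS];
	unsigned int i;

	assert(nr_buffers > 0 && nr_buffers <= MAX_BUFFERS);
	for (i = 0; i < nr_buffers; i++)
		ring[i].b_this_page = &ring[(i + 1) % nr_buffers];

	return count_visited(&ring[0], blocksize, to, folio_uptodate);
}
```

Assuming a 2MiB folio with 4KiB blocks (512 buffers) and a 1KiB write ending
at byte 9216, visited_for_write(512, 4096, 9216, true) visits 3 buffers, while
the same call with folio_uptodate == false still walks all 512. The prefix
buffers before @from are visited in both cases, which is the part the data
structure does not let us skip.
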
> So the prefix path and all in-range handling stay unchanged. The only
> skipped work is the tail part after @to, and only for a folio that was
> already uptodate before write_begin() started.
>
> > ... converting ext4 to use iomap instead of buffer heads.
>
> I strongly agree that iomap is the right direction for ext4. The iomap
> buffered write path would make this particular buffer-head walk cost go
> away.
>
> The reason I am still looking at this path is that the regression is
> visible in our LTS upgrade testing from 6.12 to 6.18. It was introduced
> by the ext4 large-folio enablement in v6.16. For example, in our
> libMicro release benchmark with THP always enabled, usecs/call, lower is
> better:
>
> case          v6.12   v6.18   regression
> write_u1k     0.609   4.659   +665.0%
> write_u10k    1.408   4.869   +245.8%

Ouch ;-) No wonder you want to address this. Do you recover all the
regression with this fix?

> The iomap conversion is the long-term fix, but it does not help kernels
> which still use the buffer-head buffered write path. I would like to keep
> this as a small regression fix for that path, and make it minimal enough
> to be suitable for stable/LTS backport.

Is it that you're using some ext4 features that aren't supported by
iomap yet? Could you say which ones? That might motivate someone to
prioritise that support.

> Would this v2 direction look OK to you?
Absolutely. Very happy with this approach.