Re: [PATCH] ext4: avoid full buffer walks for large folio partial writes
From: Matthew Wilcox
Date: Wed Jun 03 2026 - 14:15:07 EST
On Wed, Jun 03, 2026 at 09:48:00PM +0800, Jia Zhu wrote:
> Ext4 buffered writes into large folios still walk every buffer_head in the
> folio in ext4_block_write_begin() and again in block_commit_write(). Before
> regular files used large folios this was cheap, but a large folio can
> contain hundreds of buffer_heads. Small overwrites of an existing large
> folio therefore pay work proportional to the folio size instead of the
> write size.
Is this a common case for you, or is this something you noticed by
inspection?
> Start the ext4 write_begin walk at the first buffer that overlaps the
> write. For already-uptodate large folio overwrites, add a partial commit
> path which marks only the written buffers uptodate and dirty. Leave
> non-uptodate folios on the old full-buffer commit path so BH_New cleanup
> and folio-uptodate discovery are preserved.
Wouldn't you get just as much benefit from this?
+++ b/fs/buffer.c
@@ -2096,6 +2096,7 @@ void block_commit_write(struct folio *folio, size_t from,
size_t to)
{
size_t block_start, block_end;
bool partial = false;
+ bool uptodate = folio_test_uptodate(folio);
unsigned blocksize;
struct buffer_head *bh, *head;
@@ -2118,6 +2119,8 @@ void block_commit_write(struct folio *folio, size_t from, size_t to)
clear_buffer_new(bh);
block_start = block_end;
+ if (uptodate && block_start >= to)
+ break;
bh = bh->b_this_page;
} while (bh != head);
> @@ -1191,17 +1191,18 @@ int ext4_block_write_begin(handle_t *handle, struct folio *folio,
> head = folio_buffers(folio);
> if (!head)
> head = create_empty_buffers(folio, blocksize, 0);
> - block = EXT4_PG_TO_LBLK(inode, folio->index);
> + if (from == to)
> + return 0;
> + block_start = round_down(from, blocksize);
> + block = EXT4_PG_TO_LBLK(inode, folio->index) +
> + (block_start >> inode->i_blkbits);
> + bh = head;
> + for (i = 0; i < block_start; i += blocksize)
> + bh = bh->b_this_page;
>
> - for (bh = head, block_start = 0; bh != head || !block_start;
> - block++, block_start = block_end, bh = bh->b_this_page) {
> + for (; block_start < to;
> + block++, block_start = block_end, bh = bh->b_this_page) {
> block_end = block_start + blocksize;
> - if (block_end <= from || block_start >= to) {
> - if (folio_test_uptodate(folio)) {
> - set_buffer_uptodate(bh);
> - }
> - continue;
> - }
> if (WARN_ON_ONCE(buffer_new(bh)))
> clear_buffer_new(bh);
> if (!buffer_mapped(bh)) {
>
I'm unconvinced that this is safe ... but all of this is a distraction
form what we should really be doing which is converting ext4 to use
iomap instead of buffer heads.