Re: [PATCH v4 06/23] ext4: pass out extent seq counter when mapping da blocks

From: Zhang Yi

Date: Tue Jun 16 2026 - 08:43:37 EST


On 6/16/2026 6:04 PM, Jan Kara wrote:
On Mon 11-05-26 15:23:26, Zhang Yi wrote:
From: Zhang Yi <yi.zhang@xxxxxxxxxx>

The iomap buffered write path does not hold any locks between querying
inode extent mapping information and performing buffered writes. It
relies on the sequence counter saved in the inode to detect stale
mappings.

Now that I'm looking at it again, I've got a bit confused here. Buffered
write path is holding i_rwsem between mapping blocks and using them so
there shouldn't be races. Perhaps you mean buffered *writeback* path? But
then ext4_da_map_blocks() should not ever get called in the writeback path
because it is allocating delayed blocks... So this change looks unnecessary
to me now. Am I missing something?

Honza


Hi Jan,

Thanks for coming back to this series. Sorry for the inaccurate
description in the commit message. However, this change is still needed.

As mentioned in the comment before the ->iomap_valid() callback in
iomap_write_begin(), consider the following scenario — a buffered write
to a dirty unwritten extent, with this concurrent race:

Buffer write (holds i_rwsem) Writeback (no i_rwsem)
ext4_da_map_blocks()
// ext4_es_lookup_extent()
// finds UNWRITTEN extent
ext4_set_iomap()
// type = IOMAP_UNWRITTEN
// validity_cookie = m_seq
ext4_iomap_writepages()
// writeback for unwritten extent
// ext4_convert_unwritten_extents()
// extent tree: UNWRITTEN → WRITTEN
// advancing i_es_seq
__iomap_write_begin()
// ext4_iomap_valid(): cookie != i_es_seq
// iomap invalidated, re-maps
// gets type = IOMAP_MAPPED (WRITTEN)
// iomap_block_needs_zeroing(): FALSE

Without passing out m_seq, the stale IOMAP_UNWRITTEN type from the iomap
would cause __iomap_write_begin()->iomap_block_needs_zeroing() to zero
out blocks that have already been written, corrupting data on partial
writes.

Thanks,
Yi


Commit 07c440e8da8f ("ext4: pass out extent seq counter when mapping
blocks") added the m_seq field to ext4_map_blocks to pass out extent
sequence numbers, but it missed two callsites within
ext4_da_map_blocks(). These callsites are on the delayed allocation
path, which is also used in the iomap buffered write path. Pass out the
sequence counter to ensure stale mappings can be detected.

Signed-off-by: Zhang Yi <yi.zhang@xxxxxxxxxx>
Reviewed-by: Jan Kara <jack@xxxxxxx>
---
fs/ext4/inode.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 6c4d9137b279..39577a6b65b9 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -1929,7 +1929,7 @@ static int ext4_da_map_blocks(struct inode *inode, struct ext4_map_blocks *map)
ext4_check_map_extents_env(inode);
/* Lookup extent status tree firstly */
- if (ext4_es_lookup_extent(inode, map->m_lblk, NULL, &es, NULL)) {
+ if (ext4_es_lookup_extent(inode, map->m_lblk, NULL, &es, &map->m_seq)) {
map->m_len = min_t(unsigned int, map->m_len,
es.es_len - (map->m_lblk - es.es_lblk));
@@ -1982,7 +1982,7 @@ static int ext4_da_map_blocks(struct inode *inode, struct ext4_map_blocks *map)
* is held in write mode, before inserting a new da entry in
* the extent status tree.
*/
- if (ext4_es_lookup_extent(inode, map->m_lblk, NULL, &es, NULL)) {
+ if (ext4_es_lookup_extent(inode, map->m_lblk, NULL, &es, &map->m_seq)) {
map->m_len = min_t(unsigned int, map->m_len,
es.es_len - (map->m_lblk - es.es_lblk));
--
2.52.0