Re: [PATCH v4 14/23] ext4: implement partial block zero range path using iomap
From: Jan Kara
Date: Tue Jun 16 2026 - 08:29:35 EST
On Mon 11-05-26 15:23:34, Zhang Yi wrote:
> From: Zhang Yi <yi.zhang@xxxxxxxxxx>
>
> Introduce a new iomap_ops instance, ext4_iomap_zero_ops, along with
> ext4_iomap_block_zero_range() to implement block zeroing via the iomap
> infrastructure for ext4.
>
> ext4_iomap_block_zero_range() calls iomap_zero_range() with
> ext4_iomap_zero_begin() as the callback. The callback locates and zeros
> out either a mapped partial block or a dirty, unwritten partial block.
>
> Important constraints:
>
> Zeroing out under an active journal handle can cause deadlock, because
> the order of acquiring the folio lock and starting a handle is
> inconsistent with the iomap writeback path.
>
> Therefore, ext4_iomap_block_zero_range():
> - Must NOT be called under an active handle.
> - Cannot rely on data=ordered mode to ensure zeroed data persistence
> before updating i_disksize (for the cases of post-EOF append write,
> post-EOF fallocate, and truncate up). In subsequent patches, we will
> address this by synchronizing commit I/O but doesn't waiting for
> completion, and updating i_disksize to i_size only after the zeroed
> data has been written back.
>
> Signed-off-by: Zhang Yi <yi.zhang@xxxxxxxxxx>
> ---
> fs/ext4/inode.c | 92 +++++++++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 92 insertions(+)
>
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index c6fe42d012fc..e0dae2501292 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -4101,6 +4101,51 @@ static int ext4_iomap_buffered_da_write_end(struct inode *inode, loff_t offset,
> return 0;
> }
>
> +static int ext4_iomap_zero_begin(struct inode *inode,
> + loff_t offset, loff_t length, unsigned int flags,
> + struct iomap *iomap, struct iomap *srcmap)
> +{
> + struct iomap_iter *iter = container_of(iomap, struct iomap_iter, iomap);
This looks like a layering violation to me. I don't think you can safely
assume the iomap you're passed is a part of iomap_iter...
> + struct ext4_map_blocks map;
> + u8 blkbits = inode->i_blkbits;
> + unsigned int iomap_flags = 0;
> + int ret;
> +
> + ret = ext4_emergency_state(inode->i_sb);
> + if (unlikely(ret))
> + return ret;
> +
> + if (WARN_ON_ONCE(!(flags & IOMAP_ZERO)))
> + return -EINVAL;
> +
> + ret = ext4_iomap_map_blocks(inode, offset, length, NULL, &map);
> + if (ret < 0)
> + return ret;
> +
> + /*
> + * Look up dirty folios for unwritten mappings within EOF. Providing
> + * this bypasses the flush iomap uses to trigger extent conversion
> + * when unwritten mappings have dirty pagecache in need of zeroing.
> + */
> + if (map.m_flags & EXT4_MAP_UNWRITTEN) {
> + loff_t start = ((loff_t)map.m_lblk) << blkbits;
> + loff_t end = ((loff_t)map.m_lblk + map.m_len) << blkbits;
> +
> + iomap_fill_dirty_folios(iter, &start, end, &iomap_flags);
> + if ((start >> blkbits) < map.m_lblk + map.m_len)
> + map.m_len = (start >> blkbits) - map.m_lblk;
> + }
... and you need access to iter only for this which seems to be really a
hack that's trying to outsmart the iomap code. I have to admit I don't
fully understand what you are trying to achieve here. Are you trying to
avoid flushing of the range that will be zeroed out?
> + ret = iomap_zero_range(inode, from, length, did_zero,
> + &ext4_iomap_zero_ops, &ext4_iomap_write_ops,
> + NULL);
> + if (ret)
> + return ret;
> +
> + /*
> + * TODO: The iomap does not distinguish between different types of
> + * zeroing and always sets zero_written if a zeroing operation is
> + * performed, which may result in unnecessary order operations.
> + */
Is this still true after your fix to did_zero handling?
> + if (did_zero && zero_written)
> + *zero_written = *did_zero;
> +
> + return 0;
> +}
> +
> /*
> * Zeros out a mapping of length 'length' starting from file offset
> * 'from'. The range to be zero'd must be contained with in one block.
Honza
--
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR