Re: [PATCH 09/10] ext4: move zero partial block range functions out of active handle
From: Zhang Yi
Date: Mon Mar 23 2026 - 23:14:12 EST
On 3/24/2026 4:17 AM, Jan Kara wrote:
> On Tue 10-03-26 09:41:00, Zhang Yi wrote:
>> From: Zhang Yi <yi.zhang@xxxxxxxxxx>
>>
>> Move ext4_block_zero_eof() and ext4_zero_partial_blocks() calls out of
>> the active handle context, making them independent operations. This is
>> safe because it still ensures data is updated before metadata for
>> data=ordered mode and data=journal mode because we still zero data and
>> ordering data before modifying the metadata.
>>
>> This change is required for iomap infrastructure conversion because the
>> iomap buffered I/O path does not use the same journal infrastructure for
>> partial block zeroing. The lock ordering of folio lock and starting
>> transactions is "folio lock -> transaction start", which is opposite of
>> the current path. Therefore, zeroing partial blocks cannot be performed
>> under the active handle.
>>
>> Signed-off-by: Zhang Yi <yi.zhang@xxxxxxxxxx>
>
> So in this patch you also move ext4_zero_partial_blocks() before the
> transaction start in ext4_zero_range() and ext4_punch_hole(). However
> cannot these operations in theory fail with error such as EDQUOT or ENOSPC
> when they need to grow the extent tree as a result of the operation? In
> that case we'd return such error but partial blocks are already zeroed out
> which is a problem...
Thank you for review this patch set!
Let me see, I believe you are referring to the cases where
ext4_xxx_remove_space() in ext4_punch_hole() and
ext4_alloc_file_blocks() in ext4_zero_range() return normal error codes
such as EDQUOT or ENOSPC.
In ext4_punch_hole(), we first zero out the partial blocks at the
beginning and end of the punch range, and then release the aligned
blocks in the middle. If errors occur during the middle of the
hole-punching operation, there will left some data in the middle of the
punch range, this is weird. Conversely, if the zeroization is performed
in sequence, the result is zeroization at the front and the retention of
valid data at the rear, which is acceptable. right? Besides, this
problem seems not occur in ext4_zero_range() because
ext4_zero_partial_blocks() is still called after
ext4_alloc_file_blocks() after this patch.
Regarding this, I found that the result is the same whether
ext4_alloc_file_blocks() is called inside transaction start or not. The
partial blocks at the head and tail have already been zeroed out after
calling ext4_zero_partial_blocks(), am I missing some thing? Or do you
suggest that we should also move ext4_zero_partial_blocks() to execute
after ext4_xxx_remove_space() in ext4_punch_hole()?
> OTOH in these cases we don't need to order the data
> at all so in principle we shouldn't need to move the zeroing earlier
> here. Am I missing something?
>
Moving the zero partial block before the start handle is not only
because they do not need to order the data, but rather to avoid
deadlock. iomap_zero_range() acquires the folio lock, and if it is
called after ext4_journal_start(), the locking order would be
transaction start -> folio lock, which is the opposite of the locking
order used in the writeback process. If these two operations are
performed concurrently (), it could potentially result in an ABBA
deadlock.
In the subsequent iomap patch [1], I will add a WARNING and a comment
in ext4_iomap_block_zero_range(I don't see anything can prevent this)
to ensure it is not called under an active handle.
[1] https://lore.kernel.org/linux-ext4/20260203062523.3869120-18-yi.zhang@xxxxxxxxxx/
Thanks,
Yi.
> Honza
>
>> @@ -4722,25 +4724,18 @@ static long ext4_zero_range(struct file *file, loff_t offset,
>> if (IS_ALIGNED(offset | end, blocksize))
>> return ret;
>>
>> - /*
>> - * In worst case we have to writeout two nonadjacent unwritten
>> - * blocks and update the inode
>> - */
>> - credits = (2 * ext4_ext_index_trans_blocks(inode, 2)) + 1;
>> - if (ext4_should_journal_data(inode))
>> - credits += 2;
>> - handle = ext4_journal_start(inode, EXT4_HT_MISC, credits);
>> + /* Zero out partial block at the edges of the range */
>> + ret = ext4_zero_partial_blocks(inode, offset, len);
>> + if (ret)
>> + return ret;
>> +
>> + handle = ext4_journal_start(inode, EXT4_HT_MISC, 1);
>> if (IS_ERR(handle)) {
>> ret = PTR_ERR(handle);
>> ext4_std_error(inode->i_sb, ret);
>> return ret;
>> }
>>
>> - /* Zero out partial block at the edges of the range */
>> - ret = ext4_zero_partial_blocks(inode, offset, len);
>> - if (ret)
>> - goto out_handle;
>> -
>> if (new_size)
>> ext4_update_inode_size(inode, new_size);
>> ret = ext4_mark_inode_dirty(handle, inode);
>> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
>> index d5b783a7c814..5288d36b0f09 100644
>> --- a/fs/ext4/inode.c
>> +++ b/fs/ext4/inode.c
>> @@ -4443,8 +4443,12 @@ int ext4_punch_hole(struct file *file, loff_t offset, loff_t length)
>> if (ret)
>> return ret;
>>
>> + ret = ext4_zero_partial_blocks(inode, offset, length);
>> + if (ret)
>> + return ret;
>> +
>> if (ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS))
>> - credits = ext4_chunk_trans_extent(inode, 2);
>> + credits = ext4_chunk_trans_extent(inode, 0);
>> else
>> credits = ext4_blocks_for_truncate(inode);
>> handle = ext4_journal_start(inode, EXT4_HT_TRUNCATE, credits);
>> @@ -4454,10 +4458,6 @@ int ext4_punch_hole(struct file *file, loff_t offset, loff_t length)
>> return ret;
>> }
>>
>> - ret = ext4_zero_partial_blocks(inode, offset, length);
>> - if (ret)
>> - goto out_handle;
>> -
>> /* If there are blocks to remove, do it */
>> start_lblk = EXT4_B_TO_LBLK(inode, offset);
>> end_lblk = end >> inode->i_blkbits;