Re: [PATCH 09/10] ext4: move zero partial block range functions out of active handle
From: Zhang Yi
Date: Mon Mar 23 2026 - 23:14:31 EST
On 3/24/2026 11:10 AM, Zhang Yi wrote:
> On 3/24/2026 4:17 AM, Jan Kara wrote:
>> On Tue 10-03-26 09:41:00, Zhang Yi wrote:
>>> From: Zhang Yi <yi.zhang@xxxxxxxxxx>
>>>
>>> Move ext4_block_zero_eof() and ext4_zero_partial_blocks() calls out of
>>> the active handle context, making them independent operations. This is
>>> safe because it still ensures data is updated before metadata for
>>> data=ordered mode and data=journal mode because we still zero data and
>>> ordering data before modifying the metadata.
>>>
>>> This change is required for iomap infrastructure conversion because the
>>> iomap buffered I/O path does not use the same journal infrastructure for
>>> partial block zeroing. The lock ordering of folio lock and starting
>>> transactions is "folio lock -> transaction start", which is opposite of
>>> the current path. Therefore, zeroing partial blocks cannot be performed
>>> under the active handle.
>>>
>>> Signed-off-by: Zhang Yi <yi.zhang@xxxxxxxxxx>
>>
>> So in this patch you also move ext4_zero_partial_blocks() before the
>> transaction start in ext4_zero_range() and ext4_punch_hole(). However
>> cannot these operations in theory fail with error such as EDQUOT or ENOSPC
>> when they need to grow the extent tree as a result of the operation? In
>> that case we'd return such error but partial blocks are already zeroed out
>> which is a problem...
>
> Thank you for review this patch set!
>
> Let me see, I believe you are referring to the cases where
> ext4_xxx_remove_space() in ext4_punch_hole() and
> ext4_alloc_file_blocks() in ext4_zero_range() return normal error codes
> such as EDQUOT or ENOSPC.
>
> In ext4_punch_hole(), we first zero out the partial blocks at the
> beginning and end of the punch range, and then release the aligned
> blocks in the middle. If errors occur during the middle of the
> hole-punching operation, there will left some data in the middle of the
> punch range, this is weird. Conversely, if the zeroization is performed
> in sequence, the result is zeroization at the front and the retention of
> valid data at the rear, which is acceptable. right? Besides, this
> problem seems not occur in ext4_zero_range() because
> ext4_zero_partial_blocks() is still called after
> ext4_alloc_file_blocks() after this patch.
>
> Regarding this, I found that the result is the same whether
> ext4_alloc_file_blocks() is called inside transaction start or not. The
> partial blocks at the head and tail have already been zeroed out after
> calling ext4_zero_partial_blocks(), am I missing some thing? Or do you
> suggest that we should also move ext4_zero_partial_blocks() to execute
> after ext4_xxx_remove_space() in ext4_punch_hole()?
>
>> OTOH in these cases we don't need to order the data
>> at all so in principle we shouldn't need to move the zeroing earlier
>> here. Am I missing something?
>>
>
> Moving the zero partial block before the start handle is not only
> because they do not need to order the data, but rather to avoid
> deadlock. iomap_zero_range() acquires the folio lock, and if it is
> called after ext4_journal_start(), the locking order would be
> transaction start -> folio lock, which is the opposite of the locking
> order used in the writeback process. If these two operations are
> performed concurrently (), it could potentially result in an ABBA
^^ A
> deadlock.
>
> In the subsequent iomap patch [1], I will add a WARNING and a comment
> in ext4_iomap_block_zero_range(I don't see anything can prevent this)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Sorry, this sentence should be placed at position A.
> to ensure it is not called under an active handle.
>
> [1] https://lore.kernel.org/linux-ext4/20260203062523.3869120-18-yi.zhang@xxxxxxxxxx/
>
> Thanks,
> Yi.
>
>> Honza
>>
>>> @@ -4722,25 +4724,18 @@ static long ext4_zero_range(struct file *file, loff_t offset,
>>> if (IS_ALIGNED(offset | end, blocksize))
>>> return ret;
>>>
>>> - /*
>>> - * In worst case we have to writeout two nonadjacent unwritten
>>> - * blocks and update the inode
>>> - */
>>> - credits = (2 * ext4_ext_index_trans_blocks(inode, 2)) + 1;
>>> - if (ext4_should_journal_data(inode))
>>> - credits += 2;
>>> - handle = ext4_journal_start(inode, EXT4_HT_MISC, credits);
>>> + /* Zero out partial block at the edges of the range */
>>> + ret = ext4_zero_partial_blocks(inode, offset, len);
>>> + if (ret)
>>> + return ret;
>>> +
>>> + handle = ext4_journal_start(inode, EXT4_HT_MISC, 1);
>>> if (IS_ERR(handle)) {
>>> ret = PTR_ERR(handle);
>>> ext4_std_error(inode->i_sb, ret);
>>> return ret;
>>> }
>>>
>>> - /* Zero out partial block at the edges of the range */
>>> - ret = ext4_zero_partial_blocks(inode, offset, len);
>>> - if (ret)
>>> - goto out_handle;
>>> -
>>> if (new_size)
>>> ext4_update_inode_size(inode, new_size);
>>> ret = ext4_mark_inode_dirty(handle, inode);
>>> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
>>> index d5b783a7c814..5288d36b0f09 100644
>>> --- a/fs/ext4/inode.c
>>> +++ b/fs/ext4/inode.c
>>> @@ -4443,8 +4443,12 @@ int ext4_punch_hole(struct file *file, loff_t offset, loff_t length)
>>> if (ret)
>>> return ret;
>>>
>>> + ret = ext4_zero_partial_blocks(inode, offset, length);
>>> + if (ret)
>>> + return ret;
>>> +
>>> if (ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS))
>>> - credits = ext4_chunk_trans_extent(inode, 2);
>>> + credits = ext4_chunk_trans_extent(inode, 0);
>>> else
>>> credits = ext4_blocks_for_truncate(inode);
>>> handle = ext4_journal_start(inode, EXT4_HT_TRUNCATE, credits);
>>> @@ -4454,10 +4458,6 @@ int ext4_punch_hole(struct file *file, loff_t offset, loff_t length)
>>> return ret;
>>> }
>>>
>>> - ret = ext4_zero_partial_blocks(inode, offset, length);
>>> - if (ret)
>>> - goto out_handle;
>>> -
>>> /* If there are blocks to remove, do it */
>>> start_lblk = EXT4_B_TO_LBLK(inode, offset);
>>> end_lblk = end >> inode->i_blkbits;
>