Re: [PATCH v4 07/23] ext4: do not use data=ordered mode for inodes using buffered iomap path
From: Jan Kara
Date: Tue Jun 16 2026 - 06:03:33 EST
On Mon 11-05-26 15:23:27, Zhang Yi wrote:
> From: Zhang Yi <yi.zhang@xxxxxxxxxx>
>
> The data=ordered mode introduces two fundamental conflicts with the
> iomap buffered write path, leading to potential deadlocks.
>
> 1) Lock ordering conflict
> In the iomap writeback path, each folio is processed sequentially:
> the folio lock is acquired first, followed by starting a transaction
> to create block mappings. In data=ordered mode, writeback triggered
> by the journal commit process may attempt to acquire a folio lock
> that is already held by iomap. Meanwhile, iomap, under that same
> folio lock, may start a new transaction and wait for the currently
> committing transaction to finish, resulting in a deadlock.
>
> 2) Partial folio submission not supported
> When block size is smaller than folio size, a folio may contain both
> mapped and unmapped blocks. In data=ordered mode, if the journal
> commit waits for such a folio to be written back while regular
> writeback has already started submitting it (with the writeback flag
> set), mapping the remaining unmapped blocks can deadlock: the
> writeback flag is cleared only after the entire folio has been
> processed and submitted.
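
For illustration only (not part of the patch): the lock-ordering case
in 1) can be modelled as a two-node wait-for graph. This is a
user-space sketch, assuming each actor blocks on at most one other.
The names actor, waits_on and has_deadlock are hypothetical and exist
in neither ext4 nor jbd2.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical actors: the writeback thread and the journal commit thread. */
enum actor { WRITEBACK, COMMIT, NACTORS };
enum { NOBODY = -1 };

/*
 * waits_on[a] is the actor that 'a' is currently blocked on, or NOBODY.
 * A deadlock exists if following the waits_on chain from some actor
 * leads back to that same actor.
 */
static bool has_deadlock(const int waits_on[NACTORS])
{
	for (int start = 0; start < NACTORS; start++) {
		int cur = waits_on[start];

		/* At most NACTORS hops are needed to close any cycle. */
		for (int steps = 0; cur != NOBODY && steps < NACTORS; steps++) {
			assert(cur >= 0 && cur < NACTORS);
			if (cur == start)
				return true;
			cur = waits_on[cur];
		}
	}
	return false;
}
```

With waits_on = { COMMIT, WRITEBACK } (writeback holds the folio lock
and waits for the committing transaction, while the commit waits for
that same folio lock) has_deadlock() reports a cycle. With
waits_on = { COMMIT, NOBODY } (writeback waits for its handle before
it has locked any folio) there is no cycle.
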
>
> To support data=ordered mode, the iomap core would need two invasive
> changes:
> - Acquire the transaction handle before locking any folio for
> writeback.
> - Support partial folio submission.
>
> Both changes are complicated and risk performance regressions.
> Therefore, we must avoid using data=ordered mode when converting to the
> iomap path.
>
> Currently, data=ordered mode is used in three scenarios:
> - Append write
> - Post-EOF partial block truncate-up followed by append write
> - Online defragmentation
>
> We can address the first two without data=ordered mode:
> - For append write: always allocate unwritten blocks (i.e. always
> enable dioread_nolock), preserving the behavior of current
> extent-type inodes.
> - For post-EOF truncate-up + append write: postpone updating i_disksize
> until after the zeroed partial block has been written back.
>
> Online defragmentation does not yet support iomap; this can be resolved
> separately in the future.
>
> Signed-off-by: Zhang Yi <yi.zhang@xxxxxxxxxx>
Looks good. Feel free to add:
Reviewed-by: Jan Kara <jack@xxxxxxx>
Honza
> ---
> fs/ext4/ext4_jbd2.h | 7 ++++++-
> 1 file changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/fs/ext4/ext4_jbd2.h b/fs/ext4/ext4_jbd2.h
> index 63d17c5201b5..26999f173870 100644
> --- a/fs/ext4/ext4_jbd2.h
> +++ b/fs/ext4/ext4_jbd2.h
> @@ -383,7 +383,12 @@ static inline int ext4_should_journal_data(struct inode *inode)
>
> static inline int ext4_should_order_data(struct inode *inode)
> {
> - return ext4_inode_journal_mode(inode) & EXT4_INODE_ORDERED_DATA_MODE;
> + /*
> + * inodes using the iomap buffered I/O path do not use the
> + * data=ordered mode.
> + */
> + return !ext4_inode_buffered_iomap(inode) &&
> + (ext4_inode_journal_mode(inode) & EXT4_INODE_ORDERED_DATA_MODE);
> }
>
> static inline int ext4_should_writeback_data(struct inode *inode)
> --
> 2.52.0
>
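
For illustration only: a user-space sketch of what the hunk above
changes. The model_* names and the flag values are assumptions of this
sketch, not the kernel definitions; the two struct fields stand in for
the results of ext4_inode_journal_mode() and
ext4_inode_buffered_iomap().

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in mode flags; the values are assumptions of this sketch. */
enum model_journal_mode {
	MODEL_JOURNAL_DATA   = 0x01,
	MODEL_ORDERED_DATA   = 0x02,
	MODEL_WRITEBACK_DATA = 0x04,
};

struct model_inode {
	int journal_mode;     /* models ext4_inode_journal_mode() */
	bool buffered_iomap;  /* models ext4_inode_buffered_iomap() */
};

static inline struct model_inode model_inode_make(int mode, bool iomap)
{
	/* Exactly one data mode is expected per inode in this model. */
	assert(mode == MODEL_JOURNAL_DATA || mode == MODEL_ORDERED_DATA ||
	       mode == MODEL_WRITEBACK_DATA);
	return (struct model_inode){ .journal_mode = mode,
				     .buffered_iomap = iomap };
}

/* Predicate before the patch: depends on the journal mode only. */
static inline int model_should_order_data_old(const struct model_inode *inode)
{
	return inode->journal_mode & MODEL_ORDERED_DATA;
}

/* Predicate after the patch: iomap buffered inodes never order data. */
static inline int model_should_order_data_new(const struct model_inode *inode)
{
	return !inode->buffered_iomap &&
	       (inode->journal_mode & MODEL_ORDERED_DATA);
}
```

The only input for which the two predicates disagree is an ordered-mode
inode on the iomap buffered path: the old one is nonzero, the new one
is 0. Note that the new expression yields 0 or 1 rather than the raw
flag bit, which matters only to a caller that compares the result
against the flag value instead of testing it for truth.
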
--
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR