[PATCH v3 07/22] ext4: do not use data=ordered mode for inodes using buffered iomap path

From: Zhang Yi

Date: Tue Apr 21 2026 - 22:20:38 EST

From: Zhang Yi <yi.zhang@xxxxxxxxxx>

Do not use data=ordered mode for inodes using the buffered iomap path.
There are two reasons:

1. The lock ordering between the folio lock and transaction start
conflicts with the data=ordered mode. The iomap writeback path
processes folios one by one: it first takes the folio lock and then
starts a transaction to create the block mapping. In the data=ordered
mode, the journal commit process may write back data itself and try
to acquire the lock of a folio that iomap has already locked, while
iomap, still holding that folio lock, starts a new transaction that
may have to wait for the currently committing transaction to finish.
The two sides then wait on each other, which results in a deadlock.
2. The iomap writeback path doesn't support partial folio submission.
When the block size is smaller than the folio size, a folio may
contain unmapped blocks. In the data=ordered mode, if the regular
writeback process has already started committing such a folio (and
set the writeback flag) while the journal process is waiting for
that folio to be written back, a deadlock may occur while mapping
the remaining unmapped blocks. This is because the writeback flag is
cleared only after the entire folio is processed and committed.
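
The circular wait described in reason 1 can be illustrated with a small
userspace model. This is only a sketch, not kernel code: the actor names,
the wait-for array and the helper functions are all hypothetical and exist
only for this illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical actors in this toy model (not kernel identifiers). */
enum { WRITEBACK = 0, COMMIT = 1, NR_ACTORS = 2, NOBODY = -1 };

/*
 * waits_for[i] is the actor that actor i is blocked on, or NOBODY.
 * Returns true if following the chain from 'start' leads back to 'start'.
 */
bool has_wait_cycle(const int *waits_for, int n, int start)
{
	int cur, steps;

	assert(start >= 0 && start < n);
	cur = waits_for[start];
	for (steps = 0; steps < n && cur != NOBODY; steps++) {
		if (cur == start)
			return true;
		cur = waits_for[cur];
	}
	return false;
}

/* data=ordered: each side waits for something the other side holds. */
bool ordered_mode_deadlocks(void)
{
	int waits_for[NR_ACTORS] = {
		/* holds the folio lock, waits for the committing transaction */
		[WRITEBACK] = COMMIT,
		/* commit writes back data, waits for the same folio lock */
		[COMMIT] = WRITEBACK,
	};

	return has_wait_cycle(waits_for, NR_ACTORS, WRITEBACK);
}

/* Without data=ordered, the commit does not wait for the folio lock. */
bool unordered_mode_deadlocks(void)
{
	int waits_for[NR_ACTORS] = {
		[WRITEBACK] = COMMIT,
		[COMMIT] = NOBODY,
	};

	return has_wait_cycle(waits_for, NR_ACTORS, WRITEBACK);
}
```

In this model, ordered_mode_deadlocks() reports a cycle and
unordered_mode_deadlocks() does not, which matches the motivation for
dropping the data=ordered mode on this path.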

To support the data=ordered mode, we would need to modify the iomap
infrastructure to grab the transaction handle before locking any folio
for writeback. In addition, we would need to add support for submitting
partial folios, which is complicated and tricky, and may also cause
performance regressions. Therefore, we need to get rid of the
data=ordered mode when doing the conversion.

Currently, there are three scenarios where the data=ordered mode is used:

- Append write
- Post-EOF partial block truncate up and append write
- Online defragmentation

For append write, we can get rid of it by always allocating unwritten
blocks, which retains the behavior of the current extents-type inode.
For post-EOF partial block truncate up and append write, we can get rid
of it by postponing the i_disksize update until after the zeroed partial
block is written back. Online defragmentation is not supported yet, so
we can find other solutions later.
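
The effect of the change on ext4_should_order_data() can be sketched as a
userspace truth-table model. Everything prefixed with model_ or MODEL_ is
hypothetical and stands in for the kernel helpers; the flag values are
chosen for the model and are not taken from this patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Model values only; the real flags are defined in fs/ext4/ext4_jbd2.h. */
#define MODEL_JOURNAL_DATA_MODE   0x01
#define MODEL_ORDERED_DATA_MODE   0x02
#define MODEL_WRITEBACK_DATA_MODE 0x04

/* Hypothetical stand-in for the two inode properties the helper checks. */
struct model_inode {
	int journal_mode;    /* one of the MODEL_*_DATA_MODE values */
	bool buffered_iomap; /* stands in for ext4_inode_buffered_iomap() */
};

/*
 * Mirrors the patched predicate: an inode on the buffered iomap path
 * never reports data=ordered, whatever its journal mode is.
 */
int model_should_order_data(const struct model_inode *inode)
{
	assert(inode);
	return !inode->buffered_iomap &&
	       (inode->journal_mode & MODEL_ORDERED_DATA_MODE);
}
```

With this model, only an inode that is in ordered mode and not on the
buffered iomap path returns nonzero; every other combination returns 0.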

Signed-off-by: Zhang Yi <yi.zhang@xxxxxxxxxx>
---
fs/ext4/ext4_jbd2.h | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/fs/ext4/ext4_jbd2.h b/fs/ext4/ext4_jbd2.h
index 63d17c5201b5..26999f173870 100644
--- a/fs/ext4/ext4_jbd2.h
+++ b/fs/ext4/ext4_jbd2.h
@@ -383,7 +383,12 @@ static inline int ext4_should_journal_data(struct inode *inode)

 static inline int ext4_should_order_data(struct inode *inode)
 {
-	return ext4_inode_journal_mode(inode) & EXT4_INODE_ORDERED_DATA_MODE;
+	/*
+	 * inodes using the iomap buffered I/O path do not use the
+	 * data=ordered mode.
+	 */
+	return !ext4_inode_buffered_iomap(inode) &&
+	       (ext4_inode_journal_mode(inode) & EXT4_INODE_ORDERED_DATA_MODE);
 }

static inline int ext4_should_writeback_data(struct inode *inode)
--
2.52.0