[PATCH 03/27] ext4: don't write back data before punch hole in nojournal mode

From: Zhang Yi
Date: Mon Oct 21 2024 - 23:14:40 EST


From: Zhang Yi <yi.zhang@xxxxxxxxxx>

There is no need to write back all data before punching a hole in
data=ordered|writeback mode since it will be dropped soon after removing
space, so just remove the filemap_write_and_wait_range() in these modes.
However, in data=journal mode, we need to write dirty pages out before
discarding page cache in case of crash before committing the freeing
data transaction, which could expose old, stale data.

Signed-off-by: Zhang Yi <yi.zhang@xxxxxxxxxx>
---
fs/ext4/inode.c | 26 +++++++++++++++-----------
1 file changed, 15 insertions(+), 11 deletions(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index f8796f7b0f94..94b923afcd9c 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -3965,17 +3965,6 @@ int ext4_punch_hole(struct file *file, loff_t offset, loff_t length)

trace_ext4_punch_hole(inode, offset, length, 0);

- /*
- * Write out all dirty pages to avoid race conditions
- * Then release them.
- */
- if (mapping_tagged(mapping, PAGECACHE_TAG_DIRTY)) {
- ret = filemap_write_and_wait_range(mapping, offset,
- offset + length - 1);
- if (ret)
- return ret;
- }
-
inode_lock(inode);

/* No need to punch hole beyond i_size */
@@ -4037,6 +4026,21 @@ int ext4_punch_hole(struct file *file, loff_t offset, loff_t length)
ret = ext4_update_disksize_before_punch(inode, offset, length);
if (ret)
goto out_dio;
+
+ /*
+ * For journalled data we need to write (and checkpoint) pages
+ * before discarding page cache to avoid inconsitent data on
+ * disk in case of crash before punching trans is committed.
+ */
+ if (ext4_should_journal_data(inode)) {
+ ret = filemap_write_and_wait_range(mapping,
+ first_block_offset, last_block_offset);
+ if (ret)
+ goto out_dio;
+ }
+
+ ext4_truncate_folios_range(inode, first_block_offset,
+ last_block_offset + 1);
truncate_pagecache_range(inode, first_block_offset,
last_block_offset);
}
--
2.46.1