Re: [PATCH v4 03/10] ext4: don't write back data before punch hole in nojournal mode
From: Ojaswin Mujoo
Date: Tue Dec 17 2024 - 09:33:17 EST
On Mon, Dec 16, 2024 at 09:39:08AM +0800, Zhang Yi wrote:
> From: Zhang Yi <yi.zhang@xxxxxxxxxx>
>
> There is no need to write back all data before punching a hole in
> non-journaled mode since it will be dropped soon after removing space.
> Therefore, the call to filemap_write_and_wait_range() can be eliminated.
Hi, sorry I'm a bit late to this however following the discussion here
[1], I believe the initial concern was that we don't in PATCH v1 01/10
was that after truncating the pagecache, the ext4_alloc_file_blocks()
call might fail with errors like EIO, ENOMEM etc leading to inconsistent
data.
Is my understanding correct that we realised that these are very rare
cases and are not worth the performance penalty of writeback? In which
case, is it really okay to just let the scope for corruption exist even
though its rare. There might be some other error cases we might be
missing which might be more easier to hit. For eg I think we can also
fail ext4_alloc_file_blocks() with ENOSPC in case there is a written to
unwritten extent conversion causing an extent split leading to extent
tree node allocation. (Maybe can be avoided by using PRE_IO with
EXT4_GET_BLOCKS_CREATE_UNWRIT_EXT in the first ext4_alloc_file_blocks() call)
So does it make sense to retain the writeback behavior or am I just
being paranoid :)
Regards,
ojaswin
> Besides, similar to ext4_zero_range(), we must address the case of
> partially punched folios when block size < page size. It is essential to
> remove writable userspace mappings to ensure that the folio can be
> faulted again during subsequent mmap write access.
>
> In journaled mode, we need to write dirty pages out before discarding
> page cache in case of crash before committing the freeing data
> transaction, which could expose old, stale data, even if synchronization
> has been performed.
>
> Signed-off-by: Zhang Yi <yi.zhang@xxxxxxxxxx>
> ---
> fs/ext4/inode.c | 18 +++++-------------
> 1 file changed, 5 insertions(+), 13 deletions(-)
>
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index bf735d06b621..a5ba2b71d508 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -4018,17 +4018,6 @@ int ext4_punch_hole(struct file *file, loff_t offset, loff_t length)
>
> trace_ext4_punch_hole(inode, offset, length, 0);
>
> - /*
> - * Write out all dirty pages to avoid race conditions
> - * Then release them.
> - */
> - if (mapping_tagged(mapping, PAGECACHE_TAG_DIRTY)) {
> - ret = filemap_write_and_wait_range(mapping, offset,
> - offset + length - 1);
> - if (ret)
> - return ret;
> - }
> -
> inode_lock(inode);
>
> /* No need to punch hole beyond i_size */
> @@ -4090,8 +4079,11 @@ int ext4_punch_hole(struct file *file, loff_t offset, loff_t length)
> ret = ext4_update_disksize_before_punch(inode, offset, length);
> if (ret)
> goto out_dio;
> - truncate_pagecache_range(inode, first_block_offset,
> - last_block_offset);
> +
> + ret = ext4_truncate_page_cache_block_range(inode,
> + first_block_offset, last_block_offset + 1);
> + if (ret)
> + goto out_dio;
> }
>
> if (ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS))
> --
> 2.46.1
>