Re: [PATCH 2/2] ext4: add inode table check in __ext4_get_inode_loc to aovid possible infinite loop

From: Jan Kara
Date: Wed Aug 17 2022 - 10:32:03 EST


On Wed 17-08-22 21:27:01, Baokun Li wrote:
> In do_writepages, if the value returned by ext4_writepages is "-ENOMEM"
> and "wbc->sync_mode == WB_SYNC_ALL", retry until the condition is not met.
>
> In __ext4_get_inode_loc, if the bh returned by sb_getblk is NULL,
> the function returns -ENOMEM.
>
> In __getblk_slow, if the return value of grow_buffers is less than 0,
> the function returns NULL.
>
> When the three processes are connected in series like the following stack,
> an infinite loop may occur:
>
> do_writepages <--- keep retrying
> ext4_writepages
> mpage_map_and_submit_extent
> mpage_map_one_extent
> ext4_map_blocks
> ext4_ext_map_blocks
> ext4_ext_handle_unwritten_extents
> ext4_ext_convert_to_initialized
> ext4_split_extent
> ext4_split_extent_at
> __ext4_ext_dirty
> __ext4_mark_inode_dirty
> ext4_reserve_inode_write
> ext4_get_inode_loc
> __ext4_get_inode_loc <--- return -ENOMEM
> sb_getblk
> __getblk_gfp
> __getblk_slow <--- return NULL
> grow_buffers
> grow_dev_page <--- return -ENXIO
> ret = (block < end_block) ? 1 : -ENXIO;
>
> In this issue, bg_inode_table_hi is overwritten as an incorrect value.
> As a result, `block < end_block` cannot be met in grow_dev_page.
> Therefore, __ext4_get_inode_loc always returns '-ENOMEM' and do_writepages
> keeps retrying. As a result, the writeback process is in the D state due
> to an infinite loop.
>
> Add a check on inode table block in the __ext4_get_inode_loc function by
> referring to ext4_read_inode_bitmap to avoid this infinite loop.
>
> Signed-off-by: Baokun Li <libaokun1@xxxxxxxxxx>

Thanks for the fixes. Normally, we check that inode table is fine in
ext4_check_descriptors() (and those checks are much stricter) so it seems
unnecessary to check it again here. I understand that in your case it was
resize that corrupted the group descriptor after the filesystem was mounted
which is nasty but there's much more metadata that can be corrupted like
this and it's infeasible to check each metadata block before we use it.

IMHO a proper fix to this class of issues would be for sb_getblk() to
return proper error so that we can distinguish ENOMEM from other errors.
But that will be a larger undertaking...

Honza

> ---
> fs/ext4/inode.c | 10 +++++++++-
> 1 file changed, 9 insertions(+), 1 deletion(-)
>
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index 601214453c3a..5e171879fa23 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -4466,9 +4466,17 @@ static int __ext4_get_inode_loc(struct super_block *sb, unsigned long ino,
> inodes_per_block = EXT4_SB(sb)->s_inodes_per_block;
> inode_offset = ((ino - 1) %
> EXT4_INODES_PER_GROUP(sb));
> - block = ext4_inode_table(sb, gdp) + (inode_offset / inodes_per_block);
> iloc->offset = (inode_offset % inodes_per_block) * EXT4_INODE_SIZE(sb);
>
> + block = ext4_inode_table(sb, gdp);
> + if ((block <= le32_to_cpu(EXT4_SB(sb)->s_es->s_first_data_block)) ||
> + (block >= ext4_blocks_count(EXT4_SB(sb)->s_es))) {
> + ext4_error(sb, "Invalid inode table block %llu in "
> + "block_group %u", block, iloc->block_group);
> + return -EFSCORRUPTED;
> + }
> + block += (inode_offset / inodes_per_block);
> +
> bh = sb_getblk(sb, block);
> if (unlikely(!bh))
> return -ENOMEM;
> --
> 2.31.1
>
--
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR