Re: [PATCH -next] ext4: Fix symlink file size not match to file content
From: Zhang Yi
Date: Wed Apr 06 2022 - 08:33:46 EST
On 2022/3/21 23:11, Jan Kara wrote:
> Hello Yi!
>
> On Mon 21-03-22 22:38:49, Zhang Yi wrote:
>> On 2022/3/21 19:37, Jan Kara wrote:
>>> On Mon 21-03-22 19:34:08, Ye Bin wrote:
>>>> We got issue as follows:
>>>> [home]# fsck.ext4 -fn ram0yb
>>>> e2fsck 1.45.6 (20-Mar-2020)
>>>> Pass 1: Checking inodes, blocks, and sizes
>>>> Pass 2: Checking directory structure
>>>> Symlink /p3/d14/d1a/l3d (inode #3494) is invalid.
>>>> Clear? no
>>>> Entry 'l3d' in /p3/d14/d1a (3383) has an incorrect filetype (was 7, should be 0).
>>>> Fix? no
>>>>
>>>> As symlink file size not match to file content. If symlink data block
>>>> writeback failed, will call ext4_finish_bio to end io. In this path don't
>>>> mark buffer error. When umount do checkpoint can't detect buffer error,
>>>> then will cleanup journal. Actually, correct data may be in journal area.
>>>> To solve this issue, mark buffer error when detect bio error in
>>>> ext4_finish_bio.
>>>
>>> Thanks for the patch! Let me rephrase the text a bit:
>>>
>>> As the symlink file size does not match the file content. If the writeback
>>> of the symlink data block failed, ext4_finish_bio() handles the end of IO.
>>> However this function fails to mark the buffer with BH_write_io_error and
>>> so when unmount does journal checkpoint it cannot detect the writeback
>>> error and will cleanup the journal. Thus we've lost the correct data in the
>>> journal area. To solve this issue, mark the buffer as BH_write_io_error in
>>> ext4_finish_bio().
>>>
>>
>> Thinking about this issue in depth, the symlink data block is one kind of
>> metadata, but the page mapping of such a block belongs to the ext4 inode,
>> which is not consistent with other metadata blocks, e.g. directory blocks
>> and extent blocks. This is why we have already fixed the same issue for
>> other metadata blocks in commit fcf37549ae19e9 "jbd2: ensure abort the
>> journal if detect IO error when writing original buffer back" but missed
>> the case of the symlink data block. So, after Ye Bin's fix, I think it's
>> worth unifying the symlink data block mapping to bdev, any suggestions?
>
> Well, symlink with external block is essentially a case of data=journal
> data block. So even if we would handle symlinks, we would still need to
> deal with other inodes with journalled data. Also we need to keep the
> symlink contents in the page cache to make it simple for generic VFS code
> handling symlinks. So I don't see how we could substantially unify
> things...
>
Yeah, this fix is still needed for other regular files' journalled data when
the filesystem is mounted in data=journal mode. But if we just consider whether
we could unify the journal mode of ext4's metadata blocks, it seems that using
data=journal mode for the symlink's external data block is also complicated and
confusing in the creation procedure. Instead, if we use ext4_bread(), it makes
things clear, and it also seems to have no side effects on reading symlinks. I
wrote an RFC patch to do this, please take a look at my latest mail "[RFC PATCH]
ext4: convert symlink external data block mapping to bdev".
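
To make the failure mode discussed in this thread concrete, here is a minimal
userspace sketch in C. It is not kernel code and not the actual patch: struct
model_buffer, model_finish_bio(), model_checkpoint() and
model_journal_copy_survives() are hypothetical names that only mirror the
roles of the buffer head, ext4_finish_bio() and the jbd2 checkpoint. The
assumption is simply that checkpoint drops the journal copy unless it sees an
error flag on the buffer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; these are NOT the kernel's structures. */
struct model_buffer {
    bool write_io_error; /* stands in for BH_write_io_error */
    bool in_journal;     /* a good copy of the data still lives in the journal */
};

/*
 * Models the end-of-IO handler. `mark_error` selects the behaviour:
 * false = error is silently dropped (before the fix),
 * true  = error is recorded on the buffer (after the fix).
 */
void model_finish_bio(struct model_buffer *bh, int bio_status, bool mark_error)
{
    assert(bh != NULL);
    if (bio_status != 0 && mark_error)
        bh->write_io_error = true;
}

/*
 * Models the checkpoint at unmount time. If an error flag is visible, the
 * journal copy is kept and -1 is returned; otherwise the journal copy is
 * cleaned up and 0 is returned.
 */
int model_checkpoint(struct model_buffer *bh)
{
    assert(bh != NULL);
    if (bh->write_io_error)
        return -1;
    bh->in_journal = false;
    return 0;
}

/* Returns true if the journal copy survives a failed writeback. */
bool model_journal_copy_survives(bool mark_error)
{
    struct model_buffer bh = { .write_io_error = false, .in_journal = true };

    model_finish_bio(&bh, -5, mark_error); /* -5 stands in for -EIO */
    (void)model_checkpoint(&bh);
    return bh.in_journal;
}
```

In this model, model_journal_copy_survives(false) returns false (the only good
copy is cleaned up, as in the fsck report), while
model_journal_copy_survives(true) returns true because checkpoint can now see
the writeback error.
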
Thanks,
Yi.