Re: [PATCH 07/28] ext4: call journal revoke when freeing ea_inode blocks

From: Andreas Dilger
Date: Mon Jun 05 2017 - 18:08:31 EST


On May 31, 2017, at 10:12 AM, Darrick J. Wong <darrick.wong@xxxxxxxxxx> wrote:
>
> On Wed, May 31, 2017 at 01:14:56AM -0700, Tahsin Erdogan wrote:
>> ea_inode contents are treated as metadata, that's why it is journaled
>> during initial writes. Failing to call revoke during freeing could cause
>> user data to be overwritten with original ea_inode contents during journal
>> replay.
>>
>> Signed-off-by: Tahsin Erdogan <tahsin@xxxxxxxxxx>
>> ---
>> fs/ext4/extents.c | 3 ++-
>> fs/ext4/indirect.c | 3 ++-
>> 2 files changed, 4 insertions(+), 2 deletions(-)
>>
>> diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
>> index 3e36508610b7..e0a8425ff74d 100644
>> --- a/fs/ext4/extents.c
>> +++ b/fs/ext4/extents.c
>> @@ -2488,7 +2488,8 @@ int ext4_ext_index_trans_blocks(struct inode *inode, int extents)
>>
>> static inline int get_default_free_blocks_flags(struct inode *inode)
>> {
>> - if (S_ISDIR(inode->i_mode) || S_ISLNK(inode->i_mode))
>> + if (S_ISDIR(inode->i_mode) || S_ISLNK(inode->i_mode) ||
>> + ext4_test_inode_flag(inode, EXT4_INODE_EA_INODE))
>> return EXT4_FREE_BLOCKS_METADATA | EXT4_FREE_BLOCKS_FORGET;
>> else if (ext4_should_journal_data(inode))
>> return EXT4_FREE_BLOCKS_FORGET;
>> diff --git a/fs/ext4/indirect.c b/fs/ext4/indirect.c
>> index bc15c2c17633..7ffa290cbb8e 100644
>> --- a/fs/ext4/indirect.c
>> +++ b/fs/ext4/indirect.c
>> @@ -829,7 +829,8 @@ static int ext4_clear_blocks(handle_t *handle, struct inode *inode,
>> int flags = EXT4_FREE_BLOCKS_VALIDATED;
>> int err;
>>
>> - if (S_ISDIR(inode->i_mode) || S_ISLNK(inode->i_mode))
>> + if (S_ISDIR(inode->i_mode) || S_ISLNK(inode->i_mode) ||
>> + ext4_test_inode_flag(inode, EXT4_INODE_EA_INODE))
>
> I appreciate the thoroughness of doing this even for blockmapped
> ea_inode files, and I'm not complaining about this hunk at all. :)
>
> However, please consider requiring the extents feature + format as a
> prerequisite for ea_inodes. ext4 has traditionally been very ...
> permissive about supporting a diverse range of feature options, but the
> cost of that diversity is that the feature support matrix that the
> community has to support is already untestably large.
>
> I think it would be wise not to support !extents && ea_inode,
> particularly since blockmaps aren't protected by metadata_csum and so in
> the long run it's probably best to minimize the introduction of new
> blockmap files (on ext4 anyway).

Sorry, I have to disagree on this one.

The Lustre code ONLY uses the xattr inode with non-extent (indirect)
mapped inodes on the metadata target. This is because the MDT only
stores inodes and directories (and a handful of regular files) that
don't benefit from extents at all, but rather are hurt because directory
blocks are typically allocated incrementally over time and result in
fragmented block numbers. In this case, extents can increase the dir
size by 3x over indirect mapped files without any benefit.

Since the MDT only holds inodes, it never needs to be larger than 16TB
(by default only 2KB/inode gives a max MDT size of 8TB) so there is no
need for more than 2^32 blocks in the filesystem.

Cheers, Andreas

>
> --D
>
>> flags |= EXT4_FREE_BLOCKS_FORGET | EXT4_FREE_BLOCKS_METADATA;
>> else if (ext4_should_journal_data(inode))
>> flags |= EXT4_FREE_BLOCKS_FORGET;
>> --
>> 2.13.0.219.gdb65acc882-goog


Cheers, Andreas





Attachment: signature.asc
Description: Message signed with OpenPGP