Re: [PATCH] ext4: cache es->s_journal_inum in ext4_sb_info

From: Zhang Yi
Date: Wed Mar 26 2025 - 04:34:02 EST


On 2025/3/26 14:39, Ojaswin Mujoo wrote:
> On Wed, Mar 26, 2025 at 12:01:45PM +0800, Zhang Yi wrote:
>> On 2025/3/26 10:16, Baokun Li wrote:
>>> On 2025/3/26 1:57, Ojaswin Mujoo wrote:
>>>> On Tue, Mar 18, 2025 at 10:31:29PM -0400, Theodore Ts'o wrote:
>>>>> On Tue, Mar 18, 2025 at 01:42:31PM +0530, Ojaswin Mujoo wrote:
>>>>>>> So this is something we need to do if the journal is actived, and if
>>>>>>> it's active, then sbi->s_journal will be non-NULL, and so we can just
>>>>>>> check to see if inode == sbi->s_journal instead.  This will simplify
>>>>>> I believe you mean inode == sbi->s_journal->j_inode here right?
>>>>> Yes, that's what I meant; sorry for the not catching this before I
>>>>> sent my reply.
>>>>>
>>>>> Cheers,
>>>>>
>>>>>                     - Ted
>>>> Hi Ted, Baokun,
>>>>
>>>> I got some time to revisit this. Seems like checking against
>>>> s_journal->j_inode is not enough. This is because both
>>>> ext4_check_blockref() and check_block_validity() can be called even
>>>> before journal->j_inode is set:
>>>>
>>>> ext4_open_inode_journal
>>>>    ext4_get_journal_inode
>>>>       __ext4_iget
>>>>           ext4_ind_check_inode
>>>>               ext4_check_blockref  /* j_inode not set */
>>>>
>>>>    journal = jbd2_journal_init_inode
>>>>       bmap
>>>>           ext4_bmap
>>>>              iomap_bmap
>>>>                ext4_iomap_begin
>>>>                    ext4_map_blocks
>>>>                        check_block_validity
>>>>
>>>>    journal->j_inode = inode
>>>>
>>>>
>>>> Now, I think in this case the best solution might be to use the extra
>>>> field like we do in this patch but set  EXT4_SB(sb)->s_journal_ino
>>>> sufficiently early.
>>>>
>>>> Thoughts?
>>>
>>> Because system zone setup happens after the journal are loaded, I think we
>>> can skip the check if the journal haven't been loaded yet, like this:
>>>
>>> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
>>> index d04d8a7f12e7..38dc72ff7e78 100644
>>> --- a/fs/ext4/inode.c
>>> +++ b/fs/ext4/inode.c
>>> @@ -383,9 +383,10 @@ static int __check_block_validity(struct inode *inode, const char *func,
>>>                                 unsigned int line,
>>>                                 struct ext4_map_blocks *map)
>>>  {
>>> +       journal_t *journal = EXT4_SB(inode->i_sb)->s_journal;
>>> +
>>>         if (ext4_has_feature_journal(inode->i_sb) &&
>>> -           (inode->i_ino ==
>>> - le32_to_cpu(EXT4_SB(inode->i_sb)->s_es->s_journal_inum)))
>>> +           (!journal || inode == journal->j_inode))
>>>                 return 0;
>>>         if (!ext4_inode_block_valid(inode, map->m_pblk, map->m_len)) {
>>>                 ext4_error_inode(inode, func, line, map->m_pblk,
>>>
>>> If any part of the journal area overlaps with the system zone, we'll catch
>>> it when we add the journal area to the system zone later.
>>>
>>>
>>
>> Since the creation of the system zone relies on the journal being
>> loaded, I think there is no risk in proceeding to call
>> ext4_inode_block_valid() to perform a basic block range check for
>> the journal inode, or even better.
>>
>> Thanks,
>> Yi.
>
> Got it Yi, makes sense to me. So I believe you are suggesting something
> like:
>
> @@ -384,9 +384,10 @@ static int __check_block_validity(struct inode *inode, const char *func,
> unsigned int line,
> struct ext4_map_blocks *map)
> {
> + journal_t *journal = EXT4_SB(inode->i_sb)->s_journal;
> +
> if (ext4_has_feature_journal(inode->i_sb) &&

We are going to check ->s_journal, so I suppose we could drop this
feature check as well. Others looks good to me.

> - (inode->i_ino ==
> - le32_to_cpu(EXT4_SB(inode->i_sb)->s_es->s_journal_inum)))
> + (journal && journal->j_inode == inode))
> return 0;
> if (!ext4_inode_block_valid(inode, map->m_pblk, map->m_len)) {
> ext4_error_inode(inode, func, line, map->m_pblk,
>
>>
>
> So that even if it is a journal inode we can go ahead and perform some basic checks
> as the system zone rbtree will anyways be NULL at this point. From a cursory look,
> it seems that __ext4_iget(..., journal_inode) -> ext4_ext_check_inode() already relies
> on the fact that system zone is NULL, so we should be okay here as well.

Yeah, that's right. :)

Cheers,
Yi.

>
> If this looks good, I'll send a v2 with the suggested changes.
>
> Thanks,
> ojaswin