Re: [PATCH] ext4: cache es->s_journal_inum in ext4_sb_info

From: Baokun Li
Date: Wed Mar 26 2025 - 05:27:06 EST


On 2025/3/26 16:33, Zhang Yi wrote:
On 2025/3/26 14:39, Ojaswin Mujoo wrote:
On Wed, Mar 26, 2025 at 12:01:45PM +0800, Zhang Yi wrote:
On 2025/3/26 10:16, Baokun Li wrote:
On 2025/3/26 1:57, Ojaswin Mujoo wrote:
On Tue, Mar 18, 2025 at 10:31:29PM -0400, Theodore Ts'o wrote:
On Tue, Mar 18, 2025 at 01:42:31PM +0530, Ojaswin Mujoo wrote:
So this is something we need to do if the journal is actived, and if
it's active, then sbi->s_journal will be non-NULL, and so we can just
check to see if inode == sbi->s_journal instead.  This will simplify
I believe you mean inode == sbi->s_journal->j_inode here right?
Yes, that's what I meant; sorry for the not catching this before I
sent my reply.

Cheers,

                    - Ted
Hi Ted, Baokun,

I got some time to revisit this. Seems like checking against
s_journal->j_inode is not enough. This is because both
ext4_check_blockref() and check_block_validity() can be called even
before journal->j_inode is set:

ext4_open_inode_journal
   ext4_get_journal_inode
      __ext4_iget
          ext4_ind_check_inode
              ext4_check_blockref  /* j_inode not set */

   journal = jbd2_journal_init_inode
      bmap
          ext4_bmap
             iomap_bmap
               ext4_iomap_begin
                   ext4_map_blocks
                       check_block_validity

   journal->j_inode = inode


Now, I think in this case the best solution might be to use the extra
field like we do in this patch but set  EXT4_SB(sb)->s_journal_ino
sufficiently early.

Thoughts?
Because system zone setup happens after the journal are loaded, I think we
can skip the check if the journal haven't been loaded yet, like this:

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index d04d8a7f12e7..38dc72ff7e78 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -383,9 +383,10 @@ static int __check_block_validity(struct inode *inode, const char *func,
                                unsigned int line,
                                struct ext4_map_blocks *map)
 {
+       journal_t *journal = EXT4_SB(inode->i_sb)->s_journal;
+
        if (ext4_has_feature_journal(inode->i_sb) &&
-           (inode->i_ino ==
- le32_to_cpu(EXT4_SB(inode->i_sb)->s_es->s_journal_inum)))
+           (!journal || inode == journal->j_inode))
                return 0;
        if (!ext4_inode_block_valid(inode, map->m_pblk, map->m_len)) {
                ext4_error_inode(inode, func, line, map->m_pblk,

If any part of the journal area overlaps with the system zone, we'll catch
it when we add the journal area to the system zone later.


Since the creation of the system zone relies on the journal being
loaded, I think there is no risk in proceeding to call
ext4_inode_block_valid() to perform a basic block range check for
the journal inode, or even better.
Indeed, performing some basic anomaly checks in advance can prevent
journal replay from worsening the situation in abnormal cases. Moreover,
since s_journal is NULL at this point, we won't schedule s_sb_upd_work
even if the check fails, which is safe.

Thanks,
Yi.
Got it Yi, makes sense to me. So I believe you are suggesting something
like:

@@ -384,9 +384,10 @@ static int __check_block_validity(struct inode *inode, const char *func,
unsigned int line,
struct ext4_map_blocks *map)
{
+ journal_t *journal = EXT4_SB(inode->i_sb)->s_journal;
+
if (ext4_has_feature_journal(inode->i_sb) &&
We are going to check ->s_journal, so I suppose we could drop this
feature check as well. Others looks good to me.
Seconded.

- (inode->i_ino ==
- le32_to_cpu(EXT4_SB(inode->i_sb)->s_es->s_journal_inum)))
+ (journal && journal->j_inode == inode))
return 0;
if (!ext4_inode_block_valid(inode, map->m_pblk, map->m_len)) {
ext4_error_inode(inode, func, line, map->m_pblk,

So that even if it is a journal inode we can go ahead and perform some basic checks
as the system zone rbtree will anyways be NULL at this point. From a cursory look,
it seems that __ext4_iget(..., journal_inode) -> ext4_ext_check_inode() already relies
on the fact that system zone is NULL, so we should be okay here as well.
Yeah, that's right. :)

Cheers,
Yi.

If this looks good, I'll send a v2 with the suggested changes.

Thanks,
ojaswin

Please mention in the commit message that we're now doing some basic
checks on the journal area.


Cheers,
Baokun