Re: [PATCH] ocfs2: revalidate the journal dinode before toggling dirty

From: ZhengYuan Huang

Date: Mon May 11 2026 - 22:45:47 EST

On Mon, May 11, 2026 at 2:15 PM Joseph Qi <joseph.qi@xxxxxxxxxxxxxxxxx> wrote:
>
>
>
> On 5/11/26 10:58 AM, ZhengYuan Huang wrote:
> > On Sun, May 10, 2026 at 12:02 PM Joseph Qi <joseph.qi@xxxxxxxxxxxxxxxxx> wrote:
> >>
> >>
> >>
> >> On 5/9/26 9:52 PM, ZhengYuan Huang wrote:
> >>> [BUG]
> >>> A fuzzed OCFS2 image can corrupt the current-slot journal dinode while
> >>> mount is still in progress. The mount path first reports the invalid
> >>> journal block and then crashes in shutdown:
> >>>
> >>> kernel BUG at fs/ocfs2/journal.c:1034!
> >>> Oops: invalid opcode: 0000 [#1] SMP KASAN NOPTI
> >>> RIP: 0010:ocfs2_journal_toggle_dirty+0x2d6/0x340 fs/ocfs2/journal.c:1034
> >>> Call Trace:
> >>> ocfs2_journal_shutdown+0x414/0xc30 fs/ocfs2/journal.c:1116
> >>> ocfs2_mount_volume fs/ocfs2/super.c:1785 [inline]
> >>> ocfs2_fill_super+0x30a9/0x3cd0 fs/ocfs2/super.c:1083
> >>> get_tree_bdev_flags+0x38b/0x640 fs/super.c:1698
> >>> get_tree_bdev+0x24/0x40 fs/super.c:1721
> >>> ocfs2_get_tree+0x21/0x30 fs/ocfs2/super.c:1184
> >>> vfs_get_tree+0x9a/0x370 fs/super.c:1758
> >>> fc_mount fs/namespace.c:1199 [inline]
> >>> do_new_mount_fc fs/namespace.c:3642 [inline]
> >>> do_new_mount fs/namespace.c:3718 [inline]
> >>> path_mount+0x5b8/0x1ea0 fs/namespace.c:4028
> >>> do_mount fs/namespace.c:4041 [inline]
> >>> __do_sys_mount fs/namespace.c:4229 [inline]
> >>> __se_sys_mount fs/namespace.c:4206 [inline]
> >>> __x64_sys_mount+0x282/0x320 fs/namespace.c:4206
> >>> ...
> >>>
> >>>
> >>> [CAUSE]
> >>> ocfs2_journal_toggle_dirty() assumes journal->j_bh still contains the
> >>> same validated dinode that ocfs2_journal_init() locked earlier, and it
> >>> uses BUG_ON() when the buffer no longer looks like a dinode. That
> >>> assumption is too strong. The mount path can force the same current-slot
> >>> journal inode block back in from disk through
> >>> ocfs2_read_journal_inode(..., OCFS2_BH_IGNORE_CACHE) while
> >>> ocfs2_mark_dead_nodes() scans the journal slots. If that reread finds
> >>> corrupted metadata, mount unwinds through ocfs2_journal_shutdown(),
> >>> which reuses journal->j_bh and turns the metadata corruption into a
> >>> kernel BUG.
> >>>
> >>
> >> I'm a bit confused.
> >> Since the journal dinode is validated first, the image has already been checked.
> >> With the mount still in progress, how could it become corrupted at runtime?
> >>
> >> Thanks,
> >> Joseph
> >
> > Thanks for taking a look.
> >
> > Yes, the journal dinode is validated when it is first initialized. My
> > concern is that later in the mount path, the same journal inode block
> > may be read again from disk with OCFS2_BH_IGNORE_CACHE, so the buffer
> > used by ocfs2_journal_shutdown() may no longer be the same validated
> > contents.
> >
> After the validation in ocfs2_journal_init(), the in-memory copy won't
> spontaneously become invalid.
>
> And if it is broken by a re-write (e.g. recovery), this is a bug in the
> re-write flow and we have to fix the flow itself.
>
>
> > This does not mean the filesystem itself corrupts the block during
> > mount. Rather, after the initial validation and before the later use,
> > the block contents may change due to unexpected disk corruption, I/O
> > problems, or a forced reread of corrupted on-disk metadata. In that
> > case, ocfs2_journal_toggle_dirty() should not rely only on the earlier
> > validation.
> >
> ocfs2_validate_inode_block() is a bit heavy. So if we want to prevent a
> BUG_ON in case of unexpected disk corruption (still a bit strange: it is
> fine at init and then suddenly broken...), a simpler alternative would be
> to just replace BUG_ON with WARN_ON and return an error.
>
> Thanks,
> Joseph
>
> > Since this is a cold mount/shutdown error path, adding this extra
> > validation should not have a meaningful performance impact. I see it
> > as a small robustness improvement to avoid turning bad metadata into a
> > kernel BUG.
> >

Thanks for the clarification. I have updated the patch according to
your suggestion and sent out v2.
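For illustration only, below is a rough userspace model of the scenario and of the suggested approach: one shared buffer is validated, then refilled by a forced reread, and the later dirty toggle warns and returns -EIO instead of hitting a BUG. Everything here (the demo_* names, the signature string, the flag bit) is hypothetical and simplified; it is not the v2 patch and does not reproduce the real OCFS2 structures or macros.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical signature; NOT the real OCFS2 on-disk value. */
#define DEMO_INODE_SIGNATURE "INODE01"

/* Hypothetical, heavily simplified journal dinode. */
struct demo_dinode {
	char i_signature[8];
	unsigned int journal_flags;	/* bit 0 models the "dirty" flag */
};

/* Hypothetical stand-in for journal->j_bh: one buffer shared by every
 * reader of the journal inode block. */
struct demo_journal {
	struct demo_dinode *j_bh;
};

static int demo_is_valid_dinode(const struct demo_dinode *di)
{
	return memcmp(di->i_signature, DEMO_INODE_SIGNATURE,
		      sizeof(DEMO_INODE_SIGNATURE)) == 0;
}

/* Models a forced reread that bypasses the cache: the shared buffer is
 * overwritten with whatever is on "disk", even if it was validated
 * earlier. */
static void demo_reread_from_disk(struct demo_journal *j,
				  const struct demo_dinode *on_disk)
{
	memcpy(j->j_bh, on_disk, sizeof(*on_disk));
}

/* Instead of BUG_ON(!valid), warn and return an error so the caller
 * can unwind the mount cleanly. */
static int demo_toggle_dirty(struct demo_journal *j, int dirty)
{
	assert(j != NULL && j->j_bh != NULL);

	if (!demo_is_valid_dinode(j->j_bh)) {
		fprintf(stderr, "WARNING: journal dinode is no longer valid\n");
		return -EIO;
	}
	if (dirty)
		j->j_bh->journal_flags |= 1u;
	else
		j->j_bh->journal_flags &= ~1u;
	return 0;
}
```

With a valid buffer, demo_toggle_dirty() sets or clears the flag and returns 0; after demo_reread_from_disk() copies in a block with a bad signature, the same call returns -EIO, which the (hypothetical) shutdown path can propagate instead of crashing.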

Thanks,
ZhengYuan Huang