Re: [PATCH] ocfs2: fix use-after-free in ocfs2_inode_lock_full_nested during unmount

From: Jiakai Xu

Date: Sat May 09 2026 - 00:28:58 EST


> It seems this is not enough, or TOCTOU still exists. Say:
>
> Thread A Thread B
> osb = OCFS2_SB(inode->i_sb)
> ocfs2_dismount_volume()
> -> sb->s_fs_info = NULL
> -> kfree(osb)
> use freed osb
>

Hi Joseph,

Thank you very much for the review! You are absolutely right about the
TOCTOU issue — simply adding a NULL check after OCFS2_SB() cannot
prevent the race where thread A reads a valid osb pointer before thread
B frees it.

> BTW, how did you find this issue?

I found this issue through fuzzing. The crash report shows a page fault
at __pv_queued_spin_lock_slowpath via the call path:

ocfs2_permission -> ocfs2_inode_lock_tracker ->
ocfs2_inode_lock_full_nested -> ocfs2_is_hard_readonly ->
spin_lock(&osb->osb_lock)

The fault address was in the kernel static data region, indicating that
the osb structure had been freed and its memory reused.

I have been thinking about a more robust fix and would like to get your
opinion on the following approach:

Currently, ocfs2_dismount_volume() is called from ocfs2_put_super(),
which runs inside generic_shutdown_super() while s_umount is still held.
The osb structure is freed at this point, but inodes with elevated
refcounts (e.g., held by inotify) survive evict_inodes() and may still
trigger filesystem operations (like ocfs2_permission) that access osb.

The idea is to move the osb cleanup out of ocfs2_dismount_volume() and
into an ocfs2-specific ->kill_sb() callback, so that the cleanup happens
after generic_shutdown_super() has completed and all concurrent VFS
operations have drained.

Specifically:

1. Remove ocfs2_delete_osb(), kfree(osb), and sb->s_fs_info = NULL from
ocfs2_dismount_volume(). Keep all the subsystem shutdown (journal,
dlm, recovery, quota, etc.) there.

2. Add a new ocfs2_kill_sb() that wraps kill_block_super():

static void ocfs2_kill_sb(struct super_block *sb)
{
struct ocfs2_super *osb = OCFS2_SB(sb);

kill_block_super(sb);
// At this point generic_shutdown_super() has completed,
// SB_DYING is set, and no new VFS operations can enter.

if (osb) {
ocfs2_delete_osb(osb);
kfree(osb);
sb->s_fs_info = NULL;
}
}

3. Update ocfs2_fs_type to use ocfs2_kill_sb instead of kill_block_super.

4. The NULL check in ocfs2_inode_lock_full_nested() can optionally be
kept as a defense-in-depth measure, though it is no longer strictly
necessary if the life-cycle ordering is correct.

This pattern is similar to ext4 — ext4_kill_sb() calls kill_block_super()
first and then handles cleanup after (e.g., journal_bdev_file).

Does this approach make sense?

Best regards,
Jiakai