Re: [PATCH] ocfs2: fix possible deadlock between unlink and dio_end_io_write

From: Heming Zhao

Date: Fri Mar 06 2026 - 00:23:25 EST


On Fri, Mar 06, 2026 at 11:22:11AM +0800, Joseph Qi wrote:
> ocfs2_unlink takes orphan dir inode_lock first and then ip_alloc_sem,
> while in ocfs2_dio_end_io_write, it acquires these locks in reverse
> order. This creates an ABBA lock ordering violation on lock classes
> ocfs2_sysfile_lock_key[ORPHAN_DIR_SYSTEM_INODE] and
> ocfs2_file_ip_alloc_sem_key.
>
> Lock Chain #0 (orphan dir inode_lock -> ip_alloc_sem):
> ocfs2_unlink
>   ocfs2_prepare_orphan_dir
>     ocfs2_lookup_lock_orphan_dir
>       inode_lock(orphan_dir_inode) <- Lock A
>     __ocfs2_prepare_orphan_dir
>       ocfs2_prepare_dir_for_insert
>         ocfs2_extend_dir
>           ocfs2_expand_inline_dir
>             down_write(&oi->ip_alloc_sem) <- Lock B
>
> Lock Chain #1 (ip_alloc_sem -> orphan dir inode_lock):
> ocfs2_dio_end_io_write
>   down_write(&oi->ip_alloc_sem) <- Lock B
>   ocfs2_del_inode_from_orphan()
>     inode_lock(orphan_dir_inode) <- Lock A
>
> Deadlock Scenario:
>   CPU0 (unlink)                     CPU1 (dio_end_io_write)
>   ------                            ------
>   inode_lock(orphan_dir_inode)
>                                     down_write(ip_alloc_sem)
>   down_write(ip_alloc_sem)
>                                     inode_lock(orphan_dir_inode)
>
> Since ip_alloc_sem protects allocation changes, which are unrelated to
> the operations in ocfs2_del_inode_from_orphan, move
> ocfs2_del_inode_from_orphan out of ip_alloc_sem to fix the deadlock.
>
> Reported-by: syzbot+67b90111784a3eac8c04@xxxxxxxxxxxxxxxxxxxxxxxxx
> Closes: https://syzkaller.appspot.com/bug?extid=67b90111784a3eac8c04
> Fixes: a86a72a4a4e0 ("ocfs2: take ip_alloc_sem in ocfs2_dio_get_block & ocfs2_dio_end_io_write")
> Signed-off-by: Joseph Qi <joseph.qi@xxxxxxxxxxxxxxxxx>

LGTM.
Reviewed-by: Heming Zhao <heming.zhao@xxxxxxxx>
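
For anyone who wants to convince themselves of the ordering argument
without a cluster, here is a minimal userspace sketch of it. It is not
kernel code: the tracker, the enum and every function name below are
hypothetical, and it only models the two lock classes from the report as
a tiny lockdep-like order checker. It also assumes that
ocfs2_del_inode_from_orphan drops the orphan dir inode_lock before it
returns.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-ins for the two lock classes named in the report. */
enum lock_class {
	ORPHAN_DIR_INODE_LOCK,	/* Lock A */
	IP_ALLOC_SEM,		/* Lock B */
	NR_CLASSES
};

/* Toy order checker: remembers "x was held while y was taken" edges. */
struct order_tracker {
	bool held[NR_CLASSES];
	bool before[NR_CLASSES][NR_CLASSES];
	bool inversion;
};

static void tracker_init(struct order_tracker *t)
{
	memset(t, 0, sizeof(*t));
}

static void tracker_acquire(struct order_tracker *t, enum lock_class c)
{
	for (int h = 0; h < NR_CLASSES; h++) {
		if (!t->held[h] || h == (int)c)
			continue;
		/* c -> h was recorded earlier and now we see h -> c: ABBA. */
		if (t->before[c][h])
			t->inversion = true;
		t->before[h][c] = true;
	}
	t->held[c] = true;
}

static void tracker_release(struct order_tracker *t, enum lock_class c)
{
	t->held[c] = false;
}

/* Lock Chain #0: unlink takes A, then B. */
static void model_unlink(struct order_tracker *t)
{
	tracker_acquire(t, ORPHAN_DIR_INODE_LOCK);
	tracker_acquire(t, IP_ALLOC_SEM);
	tracker_release(t, IP_ALLOC_SEM);
	tracker_release(t, ORPHAN_DIR_INODE_LOCK);
}

/* Lock Chain #1 before the patch: B is held across the orphan delete. */
static void model_end_io_unpatched(struct order_tracker *t)
{
	tracker_acquire(t, IP_ALLOC_SEM);
	tracker_acquire(t, ORPHAN_DIR_INODE_LOCK);
	tracker_release(t, ORPHAN_DIR_INODE_LOCK);
	tracker_release(t, IP_ALLOC_SEM);
}

/* After the patch: orphan delete finishes first, then B is taken. */
static void model_end_io_patched(struct order_tracker *t)
{
	tracker_acquire(t, ORPHAN_DIR_INODE_LOCK);
	tracker_release(t, ORPHAN_DIR_INODE_LOCK);
	tracker_acquire(t, IP_ALLOC_SEM);
	tracker_release(t, IP_ALLOC_SEM);
}

/* Runs both paths one after the other and reports any order inversion. */
bool model_has_inversion(bool patched)
{
	struct order_tracker t;

	tracker_init(&t);
	model_unlink(&t);
	if (patched)
		model_end_io_patched(&t);
	else
		model_end_io_unpatched(&t);
	return t.inversion;
}
```

Calling model_has_inversion(false) reports the A->B / B->A inversion,
while model_has_inversion(true) does not, because the patched end_io path
never holds both locks at the same time. The paths run sequentially here,
so this only checks ordering; it says nothing about the real runtime
behaviour of ocfs2.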

btw, I have a question below that is unrelated to this bug.
> ---
> fs/ocfs2/aops.c | 3 +--
> 1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c
> index 17ba79f443ee..09146b43d1f0 100644
> --- a/fs/ocfs2/aops.c
> +++ b/fs/ocfs2/aops.c
> @@ -2294,8 +2294,6 @@ static int ocfs2_dio_end_io_write(struct inode *inode,
> goto out;
> }
>
> - down_write(&oi->ip_alloc_sem);
> -
> /* Delete orphan before acquire i_rwsem. */

The comment above looks weird. Based on commit a86a72a4a4e0, the correct wording seems to be:
/* Delete orphan without acquiring i_rwsem. */

Heming
> if (dwc->dw_orphaned) {
> BUG_ON(dwc->dw_writer_pid != task_pid_nr(current));
> @@ -2308,6 +2306,7 @@ static int ocfs2_dio_end_io_write(struct inode *inode,
> mlog_errno(ret);
> }
>
> + down_write(&oi->ip_alloc_sem);
> di = (struct ocfs2_dinode *)di_bh->b_data;
>
> ocfs2_init_dinode_extent_tree(&et, INODE_CACHE(inode), di_bh);
> --
> 2.39.3
>