Re: [PATCH v7 0/4] ext4: fix xattr iput deadlock with s_writepages_rwsem
From: Jan Kara
Date: Wed Jun 17 2026 - 14:14:56 EST
On Tue 16-06-26 23:15:54, Yun Zhou wrote:
> This series fixes a circular lock dependency reported by syzbot:
>
> s_writepages_rwsem --> jbd2_handle --> xattr_sem --> s_writepages_rwsem
>
> The deadlock occurs when iput() on an EA inode triggers write_inode_now()
> while xattr_sem and a jbd2 handle are held. The triggering path is
> during mount-time orphan cleanup (!SB_ACTIVE) where iput_final() calls
> write_inode_now() synchronously.
>
> Patch 1 blocks the deadlock by skipping extra isize expansion when
> !SB_ACTIVE -- this prevents the xattr manipulation path from being
> entered during mount.
>
> Patch 2 is a belt-and-suspenders semantic improvement: an inode under
> eviction never needs extra isize expansion.
>
> Patches 3-4 are a structural improvement using a per-sb workqueue:
>
> Patch 3 introduces ext4_put_ea_inode(), which does direct iput() when
> SB_ACTIVE (zero overhead) and defers to a workqueue when !SB_ACTIVE.
> It also converts the first call site (ext4_xattr_block_set release
> path) which previously called iput under xattr_sem + jbd2 handle.
>
> Patch 4 converts the remaining EA inode iput() calls that execute
> under locks. Sites where direct iput() is provably safe (i_nlink=0
> after dec_ref, or lookup-only paths) are left unchanged with comments.
>
> Link: https://syzkaller.appspot.com/bug?extid=5d19358d7eb30ffb0cc5
Please don't send the series so quickly. I'd say twice per week is about the
maximum sensible cadence. It takes time (easily several days) for people to
get to your patches, and sending new versions sometimes even several times
per day just creates a mess in the mailbox.

Also, in a previous version I gave my Reviewed-by tag for patch 1, and a
comment plus a Reviewed-by tag for patch 2. Please reflect that in the next
posting.
Honza
> v7:
> - Replaced the deferred-iput array threading approach (v4-v6) with a
> simpler per-sb workqueue + lock-free llist design. No function
> signature changes needed. ext4_put_ea_inode() does direct iput when
> SB_ACTIVE (zero overhead in normal operation) and defers to the
> workqueue only during mount (!SB_ACTIVE).
> - Converted the iput in ext4_xattr_delete_inode()'s quota accounting
> loop to ext4_put_ea_inode() to eliminate a lockdep-reportable lock
> ordering violation (jbd2_handle -> iput -> s_writepages_rwsem).
> - Moved flush_work() before the if (sbi->s_journal) check in
> ext4_put_super() to cover nojournal mode.
>
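For readers following the thread, the v7 design described above (direct put
when SB_ACTIVE, otherwise queue on a lock-free list and drop the reference
later from a worker) can be sketched roughly as below. This is a userspace
model only, not the code from the series: model_inode, model_sb,
put_ea_inode_model() and drain_deferred() are hypothetical names, the "active"
flag stands in for SB_ACTIVE, and C11 atomics stand in for the kernel's llist
helpers.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an EA inode; only the refcount matters here. */
struct model_inode {
	int refcount;
	struct model_inode *next;	/* link used while queued for a deferred put */
};

/* Hypothetical stand-in for the per-superblock state. */
struct model_sb {
	bool active;				/* models SB_ACTIVE */
	_Atomic(struct model_inode *) deferred;	/* head of the lock-free list */
};

static void model_sb_init(struct model_sb *sb, bool active)
{
	sb->active = active;
	atomic_init(&sb->deferred, (struct model_inode *)NULL);
}

/* Models iput(): drop one reference. */
static void model_iput(struct model_inode *inode)
{
	assert(inode->refcount > 0);
	inode->refcount--;
}

/* Lock-free push onto the list head (compare-and-swap retry loop). */
static void deferred_push(struct model_sb *sb, struct model_inode *inode)
{
	struct model_inode *head = atomic_load(&sb->deferred);

	do {
		inode->next = head;
	} while (!atomic_compare_exchange_weak(&sb->deferred, &head, inode));
}

/*
 * Models the behaviour described for ext4_put_ea_inode(): put directly when
 * the superblock is active, otherwise only queue the inode so that no
 * reference is dropped while the caller still holds its locks.
 */
static void put_ea_inode_model(struct model_sb *sb, struct model_inode *inode)
{
	if (sb->active)
		model_iput(inode);
	else
		deferred_push(sb, inode);
}

/*
 * Models the worker: detach the whole list in one atomic step, then put each
 * queued inode outside of any caller lock. Returns the number of inodes put.
 */
static int drain_deferred(struct model_sb *sb)
{
	struct model_inode *node =
		atomic_exchange(&sb->deferred, (struct model_inode *)NULL);
	int count = 0;

	while (node) {
		struct model_inode *next = node->next;

		model_iput(node);
		node = next;
		count++;
	}
	return count;
}
```

In this model, drain_deferred() plays the role of the work function. The
flush_work() call in ext4_put_super() mentioned above would correspond to
calling it one last time at unmount so that no queued inode is leaked.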
> v6:
> - ext4_inline_data_truncate(): use local ea_inode_array instead of
> passing NULL, freed after ext4_journal_stop(). Fixes a deadlock
> reachable via crafted filesystem where inline data xattr entry has
> e_value_inum set: orphan cleanup -> ext4_truncate ->
> ext4_inline_data_truncate -> iput under !SB_ACTIVE.
>
> v5:
> - Split into 3 patches for easier review.
> - Add explicit !SB_ACTIVE early-return in ext4_try_to_expand_extra_isize()
> to block ALL mount-time paths (ext4_process_orphan -> ext4_truncate ->
> ext4_mark_inode_dirty), not just the eviction path. v4 only relied on
> EXT4_STATE_NO_EXPAND which doesn't cover orphan truncation.
>
> v4:
> - Comprehensive rewrite of the deferred iput mechanism.
> - Thread ea_inode_array through ext4_expand_extra_isize_ea() and
> ext4_xattr_move_to_block() so ALL ea_inode iputs in the expand
> path are deferred, not just those in ext4_xattr_block_set().
> - Add NULL safety to ext4_expand_inode_array(): when ea_inode_array
> pointer is NULL, fall back to synchronous iput (for callers like
> ext4_initxattrs that only run with SB_ACTIVE).
> - Use __GFP_NOFAIL to guarantee deferred array growth, eliminating
> fallback to synchronous iput under locks.
> - Update ext4_xattr_ibody_set() and ext4_xattr_set_entry() signatures
> to accept ea_inode_array, converting ALL iput(ea_inode) calls.
> - Set EXT4_STATE_NO_EXPAND in ext4_evict_inode() before
> ext4_mark_inode_dirty().
>
> v3:
> - Check ext4_expand_inode_array() return value; fallback to
> direct iput() on ENOMEM to prevent inode leak.
> - Make ext4_xattr_set_handle() take an optional ea_inode_array
> output parameter so callers can free after ext4_journal_stop(),
> avoiding the jbd2_handle vs s_writepages_rwsem AB-BA.
> - Pass ea_inode_array directly to ext4_xattr_release_block()
> instead of using a local array freed under xattr_sem.
> - Move ext4_xattr_inode_array_free() after ext4_journal_stop()
>
> v2:
> - Defer iput() in ext4_xattr_block_set() via ea_inode_array,
> freed after xattr_sem is released. Fixes the root cause.
>
> v1:
> - Set EXT4_STATE_NO_EXPAND in ext4_evict_inode() to skip expand
> on inodes being deleted. Only fixes the syzbot reproducer, not
> the underlying lock ordering violation.
>
> Yun Zhou (4):
> ext4: skip extra isize expansion during mount to prevent deadlock
> ext4: set EXT4_STATE_NO_EXPAND in ext4_evict_inode
> ext4: introduce ext4_put_ea_inode() for safe deferred iput
> ext4: convert remaining EA inode iput() calls to ext4_put_ea_inode()
>
> fs/ext4/ext4.h | 5 +++
> fs/ext4/inode.c | 11 +++++
> fs/ext4/super.c | 6 +++
> fs/ext4/xattr.c | 105 +++++++++++++++++++++++++++++++++++++++++++-----
> fs/ext4/xattr.h | 2 +
> 5 files changed, 120 insertions(+), 9 deletions(-)
>
> --
> 2.43.0
>
--
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR