[PATCH v7 0/4] ext4: fix xattr iput deadlock with s_writepages_rwsem
From: Yun Zhou
Date: Tue Jun 16 2026 - 11:30:09 EST
This series fixes a circular lock dependency reported by syzbot:
s_writepages_rwsem --> jbd2_handle --> xattr_sem --> s_writepages_rwsem
The deadlock occurs when iput() on an EA inode triggers write_inode_now()
while xattr_sem and a jbd2 handle are held. The trigger is mount-time
orphan cleanup (!SB_ACTIVE), where iput_final() calls write_inode_now()
synchronously.
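
For readers unfamiliar with this kind of report, the cycle above can be
modelled in user space as a lock-order graph. The sketch below is an
illustration only and is not how lockdep is implemented: the enum values,
struct lock_graph, and record_order() are hypothetical names chosen for
this example. It records "B acquired while A held" edges and refuses an
edge that would close a cycle.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical lock classes, named after the locks in the report. */
enum lock_class { WRITEPAGES_RWSEM, JBD2_HANDLE, XATTR_SEM, NR_CLASSES };

/* edge[a][b] is true once b has been acquired while a was held. */
struct lock_graph {
	bool edge[NR_CLASSES][NR_CLASSES];
};

static void graph_init(struct lock_graph *g)
{
	memset(g, 0, sizeof(*g));
}

/* Depth-first search: is 'to' reachable from 'from' along recorded edges? */
static bool reachable(const struct lock_graph *g, int from, int to, bool *seen)
{
	if (from == to)
		return true;
	seen[from] = true;
	for (int next = 0; next < NR_CLASSES; next++) {
		if (g->edge[from][next] && !seen[next] &&
		    reachable(g, next, to, seen))
			return true;
	}
	return false;
}

/*
 * Record the ordering "held -> acquired". Returns false, without adding
 * the edge, if 'held' is already reachable from 'acquired', because the
 * new edge would then close a cycle.
 */
static bool record_order(struct lock_graph *g, int held, int acquired)
{
	bool seen[NR_CLASSES] = { false };

	if (reachable(g, acquired, held, seen))
		return false;
	g->edge[held][acquired] = true;
	return true;
}
```

Recording s_writepages_rwsem -> jbd2_handle and then jbd2_handle ->
xattr_sem succeeds in this model; the third edge, xattr_sem ->
s_writepages_rwsem, is rejected because it closes the cycle.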
Patch 1 blocks the deadlock by skipping extra isize expansion when
!SB_ACTIVE -- this prevents the xattr manipulation path from being
entered during mount.
Patch 2 is a belt-and-suspenders semantic improvement: an inode under
eviction never needs extra isize expansion.
Patches 3-4 are a structural improvement using a per-sb workqueue:
Patch 3 introduces ext4_put_ea_inode(), which does direct iput() when
SB_ACTIVE (zero overhead) and defers to a workqueue when !SB_ACTIVE.
It also converts the first call site (the ext4_xattr_block_set() release
path), which previously called iput() under xattr_sem and a jbd2 handle.
Patch 4 converts the remaining EA inode iput() calls that execute
under locks. Sites where direct iput() is provably safe (i_nlink=0
after dec_ref, or lookup-only paths) are left unchanged with comments.
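
The direct-or-deferred put described for patches 3-4 can be sketched as a
user-space model. This is not the code from the series: struct model_sb,
struct model_inode, model_put_ea_inode(), and model_drain_deferred() are
hypothetical names, C11 atomics stand in for the kernel llist, and an
explicit drain call stands in for the workqueue work function.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an EA inode; only what the sketch needs. */
struct model_inode {
	int refcount;
	struct model_inode *next;	/* link used while queued for a deferred put */
};

/* Hypothetical stand-in for the per-superblock state. */
struct model_sb {
	bool active;				/* models SB_ACTIVE */
	_Atomic(struct model_inode *) deferred;	/* lock-free LIFO of pending puts */
};

/* Models iput(): drop one reference. */
static void model_iput(struct model_inode *inode)
{
	inode->refcount--;
}

/*
 * Put directly when the superblock is active; otherwise push the inode
 * onto the lock-free list so that the put happens later, outside the
 * caller's locks.
 */
static void model_put_ea_inode(struct model_sb *sb, struct model_inode *inode)
{
	struct model_inode *head;

	if (sb->active) {
		model_iput(inode);
		return;
	}
	head = atomic_load(&sb->deferred);
	do {
		inode->next = head;	/* head is refreshed when the CAS fails */
	} while (!atomic_compare_exchange_weak(&sb->deferred, &head, inode));
}

/*
 * Models the deferred work: detach the whole list in one atomic step,
 * then put each inode. Returns the number of inodes drained.
 */
static int model_drain_deferred(struct model_sb *sb)
{
	struct model_inode *node =
		atomic_exchange(&sb->deferred, (struct model_inode *)NULL);
	int drained = 0;

	while (node) {
		struct model_inode *next = node->next;

		model_iput(node);
		drained++;
		node = next;
	}
	return drained;
}
```

In this model a put issued while active is false leaves the refcount
untouched until model_drain_deferred() runs, which corresponds to the
final iput() no longer happening under xattr_sem or a jbd2 handle.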
Link: https://syzkaller.appspot.com/bug?extid=5d19358d7eb30ffb0cc5
v7:
- Replaced the deferred-iput array threading approach (v4-v6) with a
simpler per-sb workqueue + lock-free llist design. No function
signature changes needed. ext4_put_ea_inode() does direct iput when
SB_ACTIVE (zero overhead in normal operation) and defers to the
workqueue only during mount (!SB_ACTIVE).
- Converted the iput in ext4_xattr_delete_inode()'s quota accounting
loop to ext4_put_ea_inode() to eliminate a lockdep-reportable lock
ordering violation (jbd2_handle -> iput -> s_writepages_rwsem).
- Moved flush_work() before the if (sbi->s_journal) check in
ext4_put_super() to cover nojournal mode.
v6:
- ext4_inline_data_truncate(): use a local ea_inode_array instead of
passing NULL, freed after ext4_journal_stop(). Fixes a deadlock
reachable via a crafted filesystem where an inline data xattr entry has
e_value_inum set: orphan cleanup -> ext4_truncate ->
ext4_inline_data_truncate -> iput under !SB_ACTIVE.
v5:
- Split into 3 patches for easier review.
- Add explicit !SB_ACTIVE early-return in ext4_try_to_expand_extra_isize()
to block ALL mount-time paths (ext4_process_orphan -> ext4_truncate ->
ext4_mark_inode_dirty), not just the eviction path. v4 relied only on
EXT4_STATE_NO_EXPAND, which does not cover orphan truncation.
v4:
- Comprehensive rewrite of the deferred iput mechanism.
- Thread ea_inode_array through ext4_expand_extra_isize_ea() and
ext4_xattr_move_to_block() so ALL ea_inode iputs in the expand
path are deferred, not just those in ext4_xattr_block_set().
- Add NULL safety to ext4_expand_inode_array(): when ea_inode_array
pointer is NULL, fall back to synchronous iput (for callers like
ext4_initxattrs that only run with SB_ACTIVE).
- Use __GFP_NOFAIL to guarantee deferred array growth, eliminating
fallback to synchronous iput under locks.
- Update ext4_xattr_ibody_set() and ext4_xattr_set_entry() signatures
to accept ea_inode_array, converting ALL iput(ea_inode) calls.
- Set EXT4_STATE_NO_EXPAND in ext4_evict_inode() before
ext4_mark_inode_dirty().
v3:
- Check ext4_expand_inode_array() return value; fall back to
direct iput() on ENOMEM to prevent inode leak.
- Make ext4_xattr_set_handle() take an optional ea_inode_array
output parameter so callers can free after ext4_journal_stop(),
avoiding the jbd2_handle vs s_writepages_rwsem AB-BA.
- Pass ea_inode_array directly to ext4_xattr_release_block()
instead of using a local array freed under xattr_sem.
- Move ext4_xattr_inode_array_free() after ext4_journal_stop().
v2:
- Defer iput() in ext4_xattr_block_set() via ea_inode_array,
freed after xattr_sem is released. Fixes the root cause.
v1:
- Set EXT4_STATE_NO_EXPAND in ext4_evict_inode() to skip expand
on inodes being deleted. Only fixes the syzbot reproducer, not
the underlying lock ordering violation.
Yun Zhou (4):
ext4: skip extra isize expansion during mount to prevent deadlock
ext4: set EXT4_STATE_NO_EXPAND in ext4_evict_inode
ext4: introduce ext4_put_ea_inode() for safe deferred iput
ext4: convert remaining EA inode iput() calls to ext4_put_ea_inode()
 fs/ext4/ext4.h  |   5 +++
 fs/ext4/inode.c |  11 +++++
 fs/ext4/super.c |   6 +++
 fs/ext4/xattr.c | 105 +++++++++++++++++++++++++++++++++++++++++++-----
 fs/ext4/xattr.h |   2 +
 5 files changed, 120 insertions(+), 9 deletions(-)
--
2.43.0