[PATCH v5 0/3] ext4: fix xattr iput deadlock with s_writepages_rwsem

From: Yun Zhou

Date: Mon Jun 15 2026 - 07:58:49 EST


This series fixes a circular lock dependency reported by syzbot:

s_writepages_rwsem --> jbd2_handle --> xattr_sem --> s_writepages_rwsem

The deadlock occurs when iput() on an ea_inode triggers write_inode_now(),
which takes s_writepages_rwsem through the writeback path, while xattr_sem
and a jbd2 handle are held. The triggering path is mount-time orphan
cleanup (!SB_ACTIVE), where iput_final() calls write_inode_now()
synchronously.
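
As an illustration only, the dependency chain above can be modeled as a small
directed graph in userspace C. This is a toy, not how lockdep is implemented;
the lock class names come from the report, while the enum and the function
names (record_dependency, would_deadlock) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Lock classes named in the syzbot report; the enum itself is illustrative. */
enum lock_class { S_WRITEPAGES_RWSEM, JBD2_HANDLE, XATTR_SEM, NR_CLASSES };

/* dep[a][b] is true when b was acquired while a was held (a --> b). */
static bool dep[NR_CLASSES][NR_CLASSES];

void record_dependency(enum lock_class held, enum lock_class acquired)
{
	assert(held < NR_CLASSES && acquired < NR_CLASSES);
	dep[held][acquired] = true;
}

/* Depth-first search: is 'target' reachable from 'from' along recorded edges? */
bool reaches(enum lock_class from, enum lock_class target, bool *seen)
{
	if (from == target)
		return true;
	if (seen[from])
		return false;
	seen[from] = true;
	for (int next = 0; next < NR_CLASSES; next++)
		if (dep[from][next] &&
		    reaches((enum lock_class)next, target, seen))
			return true;
	return false;
}

/* Adding held --> acquired closes a cycle iff acquired already reaches held. */
bool would_deadlock(enum lock_class held, enum lock_class acquired)
{
	bool seen[NR_CLASSES] = { false };

	return reaches(acquired, held, seen);
}
```

After recording s_writepages_rwsem --> jbd2_handle and jbd2_handle -->
xattr_sem, asking whether xattr_sem --> s_writepages_rwsem is safe reports a
cycle, which is the edge that iput() under xattr_sem introduces.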

Patch 1 breaks the cycle by skipping extra isize expansion while the
superblock is not yet active (!SB_ACTIVE), so the xattr manipulation path
is never entered during mount.

Patch 2 is a belt-and-suspenders semantic improvement: an inode under
eviction never needs extra isize expansion.
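
The two guards described for patches 1 and 2 can be sketched as a userspace
model. This is not the patch itself: the model_* types, the flag value, and
the -1 return code are placeholders chosen for illustration, and the real
checks live in ext4_try_to_expand_extra_isize() and ext4_evict_inode().

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for the kernel's SB_ACTIVE flag; the bit value is illustrative. */
#define MODEL_SB_ACTIVE (1UL << 0)

struct model_super_block {
	unsigned long s_flags;
};

struct model_inode {
	struct model_super_block *i_sb;
	bool no_expand;		/* models EXT4_STATE_NO_EXPAND */
	int expand_attempts;	/* counts how often expansion was tried */
};

/* Returns 0 if expansion was attempted, -1 (placeholder) if it was skipped. */
int model_try_to_expand_extra_isize(struct model_inode *inode)
{
	assert(inode && inode->i_sb);

	/* Guard 1: mount-time paths run before the superblock is active. */
	if (!(inode->i_sb->s_flags & MODEL_SB_ACTIVE))
		return -1;

	/* Guard 2: an inode marked no-expand (e.g. under eviction) is skipped. */
	if (inode->no_expand)
		return -1;

	inode->expand_attempts++;
	return 0;
}
```

In this model, an inode on an inactive superblock never reaches the expansion
step, regardless of whether the no-expand state has been set on it.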

Patch 3 is a structural improvement: defer all ea_inode iput() calls
until after xattr_sem is released, reducing lock nesting depth and
preventing future code paths from reintroducing the lock ordering issue.
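
The deferral pattern in patch 3 can be sketched in userspace as follows. This
is a simplified model under stated assumptions, not the kernel code: every
model_* name is hypothetical, a boolean stands in for xattr_sem, and
allocation failure simply aborts, whereas the series relies on __GFP_NOFAIL
in the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

struct model_inode {
	int refcount;
	bool put_under_lock;	/* set if a put ever happened with the lock held */
};

/* Models whether xattr_sem is currently held. */
static bool xattr_sem_held;

void model_iput(struct model_inode *inode)
{
	if (xattr_sem_held)
		inode->put_under_lock = true;
	inode->refcount--;
}

struct model_inode_array {
	size_t count;
	size_t capacity;
	struct model_inode **inodes;
};

/* Queue an inode for a later put instead of dropping it immediately. */
void model_defer_iput(struct model_inode_array *array, struct model_inode *inode)
{
	if (array->count == array->capacity) {
		size_t new_cap = array->capacity ? array->capacity * 2 : 4;
		struct model_inode **grown =
			realloc(array->inodes, new_cap * sizeof(*grown));

		if (!grown)
			abort();	/* the model has no fallback path */
		array->inodes = grown;
		array->capacity = new_cap;
	}
	array->inodes[array->count++] = inode;
}

/* Drop all queued references; must run only after the lock is released. */
void model_inode_array_free(struct model_inode_array *array)
{
	assert(!xattr_sem_held);
	for (size_t i = 0; i < array->count; i++)
		model_iput(array->inodes[i]);
	free(array->inodes);
	array->inodes = NULL;
	array->count = array->capacity = 0;
}

/* Models an xattr update that releases two ea_inode references. */
void model_xattr_update(struct model_inode *a, struct model_inode *b)
{
	struct model_inode_array deferred = { 0 };

	xattr_sem_held = true;			/* lock taken */
	model_defer_iput(&deferred, a);		/* would have been iput() here */
	model_defer_iput(&deferred, b);
	xattr_sem_held = false;			/* lock released */

	model_inode_array_free(&deferred);	/* puts happen outside the lock */
}
```

The point of the design is that the final put, which may trigger writeback,
only ever runs once the lock has been dropped; the put_under_lock field
exists solely so the model can check that property.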

Link: https://syzkaller.appspot.com/bug?extid=5d19358d7eb30ffb0cc5

v5:
- Split into 3 patches for easier review.
- Add explicit !SB_ACTIVE early-return in ext4_try_to_expand_extra_isize()
  to block ALL mount-time paths (ext4_process_orphan -> ext4_truncate ->
  ext4_mark_inode_dirty), not just the eviction path. v4 relied only on
  EXT4_STATE_NO_EXPAND, which doesn't cover orphan truncation.

v4:
- Comprehensive rewrite of the deferred iput mechanism.
- Thread ea_inode_array through ext4_expand_extra_isize_ea() and
  ext4_xattr_move_to_block() so ALL ea_inode iputs in the expand
  path are deferred, not just those in ext4_xattr_block_set().
- Add NULL safety to ext4_expand_inode_array(): when ea_inode_array
  pointer is NULL, fall back to synchronous iput (for callers like
  ext4_initxattrs that only run with SB_ACTIVE).
- Use __GFP_NOFAIL to guarantee deferred array growth, eliminating
  fallback to synchronous iput under locks.
- Update ext4_xattr_ibody_set() and ext4_xattr_set_entry() signatures
  to accept ea_inode_array, converting ALL iput(ea_inode) calls.
- Set EXT4_STATE_NO_EXPAND in ext4_evict_inode() before
  ext4_mark_inode_dirty().

v3:
- Check ext4_expand_inode_array() return value; fall back to
  direct iput() on ENOMEM to prevent inode leak.
- Make ext4_xattr_set_handle() take an optional ea_inode_array
  output parameter so callers can free after ext4_journal_stop(),
  avoiding the jbd2_handle vs s_writepages_rwsem AB-BA.
- Pass ea_inode_array directly to ext4_xattr_release_block()
  instead of using a local array freed under xattr_sem.
- Move ext4_xattr_inode_array_free() after ext4_journal_stop().

v2:
- Defer iput() in ext4_xattr_block_set() via ea_inode_array,
  freed after xattr_sem is released. Fixes the root cause.

v1:
- Set EXT4_STATE_NO_EXPAND in ext4_evict_inode() to skip expand
  on inodes being deleted. Only fixes the syzbot reproducer, not
  the underlying lock ordering violation.

Yun Zhou (3):
  ext4: skip extra isize expansion during mount to prevent deadlock
  ext4: set EXT4_STATE_NO_EXPAND in ext4_evict_inode
  ext4: defer iput() on ea_inodes to reduce lock holding scope

 fs/ext4/acl.c            |  2 +-
 fs/ext4/crypto.c         |  4 +-
 fs/ext4/inline.c         |  8 ++--
 fs/ext4/inode.c          | 26 +++++++++--
 fs/ext4/xattr.c          | 93 ++++++++++++++++++++++++----------------
 fs/ext4/xattr.h          | 10 +++--
 fs/ext4/xattr_security.c |  3 +-
 7 files changed, 95 insertions(+), 51 deletions(-)

--
2.43.0