Re: [PATCH v6] ext4: fix ABBA deadlock in ext4_xattr_inode_cache_find()
From: Jan Kara
Date: Fri Jun 26 2026 - 10:32:12 EST
On Fri 26-06-26 05:48:21, Aditya Srivastava wrote:
> From: Aditya Prakash Srivastava <aditya.ansh182@xxxxxxxxx>
>
> Syzbot/stress-ng reported an ABBA deadlock in ext4 when exercising
> concurrent xattr workloads (using the ea_inode mount/format option).
>
> The deadlock occurs between the running transaction and the eviction
> thread:
> - Task 1 (stress-ng): Holds a reference to a shared mbcache_entry (ce)
> and calls ext4_xattr_inode_cache_find() -> ext4_iget() to retrieve
> the corresponding EA inode. Since the EA inode is currently being
> evicted, ext4_iget() blocks in __wait_on_freeing_inode() waiting for
> eviction to complete.
> - Task 2 (eviction thread): Currently evicting the same EA inode in
> ext4_evict_ea_inode(). It calls mb_cache_entry_wait_unused(oe) which
> blocks waiting for Task 1 to release the reference to the mbcache_entry.
>
> To break this deadlock, implement a new ext4_iget() configuration flag
> named EXT4_IGET_NOWAIT. When set, perform a non-blocking lookup of the
> inode via VFS's find_inode_nowait() API.
>
> If the inode is currently being evicted (marked with I_FREEING or
> I_WILL_FREE) or created (I_CREATING), or if it is not present in the VFS
> inode cache (cache miss), simply skip it (returning -ENOENT) rather than
> waiting for eviction/creation to complete, breaking the ABBA cycle.
>
> Since we return -ENOENT immediately on a cache miss, we never attempt to
> allocate a new inode or call iget_locked(), completely eliminating any
> TOCTOU race window.
>
> If the returned inode is I_NEW, wait for its initialization to clear via
> wait_on_new_inode(). If initialization fails and the inode is unhashed
> during wait_on_new_inode() waking up (e.g., due to an I/O read error in
> another thread), safely drop the reference and return -ENOENT. This
> unhashed check is executed unconditionally on all cache-hit pathways to
> properly handle concurrent initialization failures.
>
> Finally, standard validation checks (including is_bad_inode,
> EXT4_EA_INODE_FL, file_acl, and xattr flags) are executed as normal inside
> check_igot_inode() to fully guarantee VFS-layer safety.
>
> In ext4_xattr_inode_cache_find(), invoke ext4_iget() with the new
> EXT4_IGET_NOWAIT flag to perform the non-blocking cache search.
>
> Suggested-by: Jan Kara <jack@xxxxxxx>
> Reported-by: Colin Ian King <colin.i.king@xxxxxxxxx>
> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219283
> Fixes: 0a46ef234756 ("ext4: do not create EA inode under buffer lock")
> Signed-off-by: Aditya Prakash Srivastava <aditya.ansh182@xxxxxxxxx>
Looks good. Feel free to add:
Reviewed-by: Jan Kara <jack@xxxxxxx>
Honza
> ---
> Changes in v6:
> - Drop the redundant is_freeing tracking variable completely, as pointed
> out by Jan Kara.
> - Return -ENOENT instead of -ESTALE on cache misses and unhashed inode
> checks, as requested by Jan Kara.
>
> Changes in v5:
> - Address two critical concurrency issues flagged by the Sashiko AI bot in v4:
> 1. Resolve the Time-Of-Check to Time-Of-Use (TOCTOU) race window between
> find_inode_nowait() and iget_locked() by returning -ESTALE immediately
> on a VFS cache miss. This completely bypasses fallback to iget_locked()
> and prevents potential ABBA deadlocks.
> 2. Fix the improperly nested inode_unhashed() safety check by moving it
> outside the I_NEW condition block, ensuring it runs unconditionally
> on all cache-hit pathways to prevent false-positive filesystem
> corruption errors during concurrent initialization failures.
>
> Changes in v4:
> - Check if the inode was unhashed during wait_on_new_inode() waking up
> to handle transient initialization failures (like I/O read errors)
> gracefully. Dropping the reference and returning -ESTALE prevents
> false filesystem corruption errors (__ext4_error), as found by the
> Sashiko AI bot.
>
> Changes in v3:
> - Implement a new ext4_iget() configuration flag named EXT4_IGET_NOWAIT to
> fully contain the non-blocking lookup and VFS-level validations within
> inode.c, as requested by Jan Kara.
> - Skip inodes currently being created (I_CREATING), following Jan Kara's
> direct feedback.
> - Remove all open-coded match helpers and VFS state-checks from xattr.c.
>
> Changes in v2:
> - Read inode state locklessly using inode_state_read_once() to resolve
> a lockdep assertion on cache hit.
> - Manually restore essential inode/ea_inode validations on the retrieved
> inode (is_bad_inode, EXT4_EA_INODE_FL, file_acl, and xattr checks) to
> match VFS safety guarantees and prevent using corrupted/failed inodes.
>
> fs/ext4/ext4.h | 3 ++-
> fs/ext4/inode.c | 35 ++++++++++++++++++++++++++++++++---
> fs/ext4/xattr.c | 2 +-
> 3 files changed, 35 insertions(+), 5 deletions(-)
>
> diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
> index b37c136ea3ab..c76dd0bdd3d8 100644
> --- a/fs/ext4/ext4.h
> +++ b/fs/ext4/ext4.h
> @@ -3144,7 +3144,8 @@ typedef enum {
> EXT4_IGET_SPECIAL = 0x0001, /* OK to iget a system inode */
> EXT4_IGET_HANDLE = 0x0002, /* Inode # is from a handle */
> EXT4_IGET_BAD = 0x0004, /* Allow to iget a bad inode */
> - EXT4_IGET_EA_INODE = 0x0008 /* Inode should contain an EA value */
> + EXT4_IGET_EA_INODE = 0x0008, /* Inode should contain an EA value */
> + EXT4_IGET_NOWAIT = 0x0010 /* Non-blocking lookup (skip if freeing) */
> } ext4_iget_flags;
>
> extern struct inode *__ext4_iget(struct super_block *sb, unsigned long ino,
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index ce99807c5f5b..a091a43959a3 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -5270,6 +5270,20 @@ void ext4_set_inode_mapping_order(struct inode *inode)
> mapping_set_folio_order_range(inode->i_mapping, min_order, max_order);
> }
>
> +static int ext4_iget_match(struct inode *inode, u64 ino, void *data)
> +{
> + if (inode->i_ino != ino)
> + return 0;
> + spin_lock(&inode->i_lock);
> + if (inode_state_read(inode) & (I_FREEING | I_WILL_FREE | I_CREATING)) {
> + spin_unlock(&inode->i_lock);
> + return -1;
> + }
> + __iget(inode);
> + spin_unlock(&inode->i_lock);
> + return 1;
> +}
> +
> struct inode *__ext4_iget(struct super_block *sb, unsigned long ino,
> ext4_iget_flags flags, const char *function,
> unsigned int line)
> @@ -5298,9 +5312,24 @@ struct inode *__ext4_iget(struct super_block *sb, unsigned long ino,
> return ERR_PTR(-EFSCORRUPTED);
> }
>
> - inode = iget_locked(sb, ino);
> - if (!inode)
> - return ERR_PTR(-ENOMEM);
> + if (flags & EXT4_IGET_NOWAIT) {
> + inode = find_inode_nowait(sb, ino, ext4_iget_match, NULL);
> + if (!inode)
> + return ERR_PTR(-ENOENT);
> +
> + if (inode_state_read_once(inode) & I_NEW)
> + wait_on_new_inode(inode);
> +
> + if (unlikely(inode_unhashed(inode))) {
> + iput(inode);
> + return ERR_PTR(-ENOENT);
> + }
> + } else {
> + inode = iget_locked(sb, ino);
> + if (!inode)
> + return ERR_PTR(-ENOMEM);
> + }
> +
> if (!(inode_state_read_once(inode) & I_NEW)) {
> ret = check_igot_inode(inode, flags, function, line);
> if (ret) {
> diff --git a/fs/ext4/xattr.c b/fs/ext4/xattr.c
> index 982a1f831e22..21b5670d8503 100644
> --- a/fs/ext4/xattr.c
> +++ b/fs/ext4/xattr.c
> @@ -1550,7 +1550,7 @@ ext4_xattr_inode_cache_find(struct inode *inode, const void *value,
>
> while (ce) {
> ea_inode = ext4_iget(inode->i_sb, ce->e_value,
> - EXT4_IGET_EA_INODE);
> + EXT4_IGET_EA_INODE | EXT4_IGET_NOWAIT);
> if (IS_ERR(ea_inode))
> goto next_entry;
> ext4_xattr_inode_set_class(ea_inode);
> --
> 2.47.3
>
--
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR