Re: [PATCH v5] ext4: fix ABBA deadlock in ext4_xattr_inode_cache_find()

From: Aditya Prakash Srivastava

Date: Thu Jun 25 2026 - 07:10:29 EST


Hi Jan,

My sincere apologies for the rapid succession of patches and the noise
on the list. You are completely right; I got over-excited trying to
quickly address bugs flagged by the Sashiko-AI bot in previous
iterations, and I should have thought them through more carefully. I
will be more careful going forward.

Regarding your feedback:

You are entirely correct that 'is_freeing' is redundant. Since
find_inode_nowait() already returns NULL if the match callback returns
-1, we can simply drop that tracking variable and check !inode.

I also agree that returning -ENOENT on cache misses and unhashed checks
is much more semantically accurate than -ESTALE.

Thank you again for your patience, invaluable reviews, and guidance.

Best regards,
Aditya


On Thu, Jun 25, 2026 at 3:53 PM Jan Kara <jack@xxxxxxx> wrote:
>
> On Thu 25-06-26 06:50:09, Aditya Srivastava wrote:
> > From: Aditya Prakash Srivastava <aditya.ansh182@xxxxxxxxx>
> >
> > Syzbot/stress-ng reported an ABBA deadlock in ext4 when exercising
> > concurrent xattr workloads (using the ea_inode mount/format option).
> >
> > The deadlock occurs between the running transaction and the eviction
> > thread:
> > - Task 1 (stress-ng): Holds a reference to a shared mbcache_entry (ce)
> > and calls ext4_xattr_inode_cache_find() -> ext4_iget() to retrieve
> > the corresponding EA inode. Since the EA inode is currently being
> > evicted, ext4_iget() blocks in __wait_on_freeing_inode() waiting for
> > eviction to complete.
> > - Task 2 (eviction thread): Currently evicting the same EA inode in
> > ext4_evict_ea_inode(). It calls mb_cache_entry_wait_unused(oe) which
> > blocks waiting for Task 1 to release the reference to the mbcache_entry.
> >
> > To break this deadlock, implement a new ext4_iget() configuration flag
> > named EXT4_IGET_NOWAIT. When set, perform a non-blocking lookup of the
> > inode via VFS's find_inode_nowait() API.
> >
> > If the inode is currently being evicted (marked with I_FREEING or
> > I_WILL_FREE) or created (I_CREATING), or if it is not present in the VFS
> > inode cache (cache miss), simply skip it (returning -ESTALE) rather than
> > waiting for eviction/creation to complete, breaking the ABBA cycle.
> >
> > Since we return -ESTALE immediately on a cache miss, we never attempt to
> > allocate a new inode or call iget_locked(), completely eliminating any
> > TOCTOU race window.
> >
> > If the returned inode is I_NEW, wait for its initialization to clear via
> > wait_on_new_inode(). If initialization fails and the inode is unhashed
> > during wait_on_new_inode() waking up (e.g., due to an I/O read error in
> > another thread), safely drop the reference and return -ESTALE. This
> > unhashed check is executed unconditionally on all cache-hit pathways to
> > properly handle concurrent initialization failures.
> >
> > Finally, standard validation checks (including is_bad_inode,
> > EXT4_EA_INODE_FL, file_acl, and xattr flags) are executed as normal inside
> > check_igot_inode() to fully guarantee VFS-layer safety.
> >
> > In ext4_xattr_inode_cache_find(), invoke ext4_iget() with the new
> > EXT4_IGET_NOWAIT flag to perform the non-blocking cache search.
> >
> > Suggested-by: Jan Kara <jack@xxxxxxx>
> > Reported-by: Colin Ian King <colin.i.king@xxxxxxxxx>
> > Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219283
> > Fixes: 0a46ef234756 ("ext4: do not create EA inode under buffer lock")
> > Signed-off-by: Aditya Prakash Srivastava <aditya.ansh182@xxxxxxxxx>
>
> Three patch submissions in a day? I think you are overdoing it. Please test
> more, think more about what you do (i.e., really understand what the code
> does, not just whatever some AI agent suggested) and submit less.
>
> > +static int ext4_iget_match(struct inode *inode, u64 ino, void *data)
> > +{
> > + bool *is_freeing = data;
> > +
> > + if (inode->i_ino != ino)
> > + return 0;
> > + spin_lock(&inode->i_lock);
> > + if (inode_state_read(inode) & (I_FREEING | I_WILL_FREE | I_CREATING)) {
> > + if (is_freeing)
> > + *is_freeing = true;
> > + spin_unlock(&inode->i_lock);
> > + return -1;
> > + }
> > + __iget(inode);
> > + spin_unlock(&inode->i_lock);
> > + return 1;
> > +}
> > +
> > struct inode *__ext4_iget(struct super_block *sb, unsigned long ino,
> > ext4_iget_flags flags, const char *function,
> > unsigned int line)
> > @@ -5298,9 +5316,26 @@ struct inode *__ext4_iget(struct super_block *sb, unsigned long ino,
> > return ERR_PTR(-EFSCORRUPTED);
> > }
> >
> > - inode = iget_locked(sb, ino);
> > - if (!inode)
> > - return ERR_PTR(-ENOMEM);
> > + if (flags & EXT4_IGET_NOWAIT) {
> > + bool is_freeing = false;
> > +
> > + inode = find_inode_nowait(sb, ino, ext4_iget_match, &is_freeing);
> > + if (is_freeing || !inode)
>
> is_freeing is pointless. Just drop it and use !inode.
>
> > + return ERR_PTR(-ESTALE);
>
> I'd prefer -ENOENT here instead of -ESTALE. It's kind of more specific.
>
> > + if (inode_state_read_once(inode) & I_NEW)
> > + wait_on_new_inode(inode);
> > +
> > + if (unlikely(inode_unhashed(inode))) {
> > + iput(inode);
> > + return ERR_PTR(-ESTALE);
>
> Here -ENOENT as well please.
>
> Otherwise the patch looks good to me.
>
> Honza
>
> > + }
> > + } else {
> > + inode = iget_locked(sb, ino);
> > + if (!inode)
> > + return ERR_PTR(-ENOMEM);
> > + }
> > +
> > if (!(inode_state_read_once(inode) & I_NEW)) {
> > ret = check_igot_inode(inode, flags, function, line);
> > if (ret) {
> > diff --git a/fs/ext4/xattr.c b/fs/ext4/xattr.c
> > index 982a1f831e22..21b5670d8503 100644
> > --- a/fs/ext4/xattr.c
> > +++ b/fs/ext4/xattr.c
> > @@ -1550,7 +1550,7 @@ ext4_xattr_inode_cache_find(struct inode *inode, const void *value,
> >
> > while (ce) {
> > ea_inode = ext4_iget(inode->i_sb, ce->e_value,
> > - EXT4_IGET_EA_INODE);
> > + EXT4_IGET_EA_INODE | EXT4_IGET_NOWAIT);
> > if (IS_ERR(ea_inode))
> > goto next_entry;
> > ext4_xattr_inode_set_class(ea_inode);
> > --
> > 2.47.3
> >
> --
> Jan Kara <jack@xxxxxxxx>
> SUSE Labs, CR