Re: [PATCH v10 5/5] ext4: prevent deadlock from duplicate EA inode references on corrupted fs

From: Zhou, Yun

Date: Sun Jun 28 2026 - 23:18:28 EST



On 6/29/26 02:57, XIAO WU wrote:

Hi,

I've been following the sashiko-bot reviews on this series and was able
to reproduce the llist corruption issue that the bot has flagged: it
triggers a kernel BUG at ext4_put_super() when an EA inode is leaked
onto the orphan list at unmount.

The sashiko review is at:
https://sashiko.dev/#/patchset/20260625152941.24788-1-yun.zhou@xxxxxxxxxxxxx

> +/* Put all EA inodes on a processed llist via ext4_put_ea_inode. */
> +static void ext4_put_ea_inode_llist(struct super_block *sb,
> +                                    struct llist_head *processed)
> +{
> +    struct llist_node *node = llist_del_all(processed);
> +    struct llist_node *next;
> +
> +    while (node) {
> +        struct ext4_inode_info *ei = container_of(node,
> +                        struct ext4_inode_info, i_ea_iput_node);
> +        next = node->next;
> +        ext4_put_ea_inode(sb, &ei->vfs_inode);
> +        node = next;
> +    }
> +}

The per-call `processed` llist is declared on the stack of
ext4_xattr_delete_inode().  If two threads concurrently evict files
that share the same EA inode (same large xattr value), both threads
call llist_add() on the SAME embedded i_ea_iput_node, each trying to
add it to their own stack-local llist head.

Since llist_add() unconditionally writes `node->next = head->first`
(which is a stack address from the caller's frame), the two threads
corrupt each other's `node->next` pointer.  When
ext4_put_ea_inode_llist() later traverses the list, it follows a
dangling next pointer into freed/concurrent stack memory, causing the
EA inode to be silently skipped during deferred iput processing.
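A minimal userspace sketch of the pointer effect described above. It assumes a simplified, single-threaded model of llist_add(): the real helper retries its update of head->first with try_cmpxchg(), and that retry loop is left out here. The model_* names are hypothetical. The sketch only shows what happens to the lists if one embedded node is added to two of them. It says nothing about whether the two llist_add() calls can actually overlap in ext4.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Simplified, single-threaded userspace model of the kernel's llist.
 * Assumption: only the pointer updates of llist_add() are modelled; the
 * real function wraps them in a try_cmpxchg() retry loop on head->first.
 */
struct llist_node { struct llist_node *next; };
struct llist_head { struct llist_node *first; };

/* Same effect as llist_add(): push 'node' in front of head->first. */
static bool model_llist_add(struct llist_node *node, struct llist_head *head)
{
	struct llist_node *first = head->first;

	node->next = first;	/* clobbers any link 'node' had on another list */
	head->first = node;
	return first == NULL;	/* llist_add() returns true if the list was empty */
}

/* Walk the list the way ext4_put_ea_inode_llist() does: follow ->next. */
static bool model_llist_contains(const struct llist_head *head,
				 const struct llist_node *target)
{
	for (const struct llist_node *p = head->first; p; p = p->next)
		if (p == target)
			return true;
	return false;
}
```

In this model, if `shared` is pushed onto list A (which already holds `a_own`) and then onto list B (which holds `b_own`), walking A visits `shared` and then B's `b_own`, and `a_own` is no longer reachable from A.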

=== Reproduction ===

Kernel: 7.1.0-next-20260624-gb27bd6a65c17 #1 SMP PREEMPT_RT
Config: CONFIG_EXT4_FS=y, CONFIG_EXT4_FS_POSIX_ACL=y, CONFIG_KASAN=y
QEMU:   QEMU Standard PC (Q35 + ICH9, 2009)

The PoC creates two files sharing the same large xattr value (thus
sharing the same EA inode), then concurrently unlinks them from two
pthreads synchronized by a barrier on the same CPU.  This triggers the
llist_add() race on the shared i_ea_iput_node, leaving the EA inode
unprocessed.  The EA inode (nlink=0) sits on the orphan list, and
umount hits the BUG() assertion.
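The race step described above can be sketched as follows. This is a hypothetical harness, because the actual repro-xiaowu source is not included in the thread. The xattr name user.ea_race, the 8 KiB value size, the f1/f2 file names and the race_once() helper are all assumptions. The sketch only reaches the EA-inode path on an ext4 filesystem that has the ea_inode feature enabled. On other filesystems setxattr() may fail, and the harness then races plain unlinks.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <pthread.h>
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/xattr.h>
#include <unistd.h>

/* Hypothetical value size: larger than one 4k block, so ext4 with the
 * ea_inode feature has to store the value in an EA inode. */
#define EA_VALUE_SIZE 8192

struct racer {
	pthread_barrier_t *barrier;
	const char *path;
	int err;		/* errno from unlink(), 0 on success */
};

static void *racer_fn(void *arg)
{
	struct racer *r = arg;
	cpu_set_t set;

	/* Best effort: pin the calling thread to CPU 0, ignore failure. */
	CPU_ZERO(&set);
	CPU_SET(0, &set);
	(void)pthread_setaffinity_np(pthread_self(), sizeof(set), &set);

	pthread_barrier_wait(r->barrier);	/* release both unlinks together */
	r->err = unlink(r->path) == 0 ? 0 : errno;
	return NULL;
}

/*
 * One race iteration in 'dir': create f1 and f2 carrying an identical large
 * xattr value, then unlink them concurrently. Returns 0 if both unlinks
 * succeeded, -1 otherwise. *xattr_ok is 1 only if both setxattr() calls
 * succeeded (i.e. the EA-inode path may have been exercised).
 */
static int race_once(const char *dir, int *xattr_ok)
{
	static char value[EA_VALUE_SIZE];
	char p1[4096], p2[4096];
	const char *paths[2] = { p1, p2 };
	pthread_barrier_t barrier;
	pthread_t thread;

	memset(value, 'A', sizeof(value));
	snprintf(p1, sizeof(p1), "%s/f1", dir);
	snprintf(p2, sizeof(p2), "%s/f2", dir);

	*xattr_ok = 1;
	for (int i = 0; i < 2; i++) {
		int fd = open(paths[i], O_CREAT | O_WRONLY | O_TRUNC, 0644);

		if (fd < 0)
			return -1;
		close(fd);
		if (setxattr(paths[i], "user.ea_race", value, sizeof(value), 0) != 0)
			*xattr_ok = 0;
	}

	if (pthread_barrier_init(&barrier, NULL, 2) != 0)
		return -1;

	struct racer r[2] = { { &barrier, p1, 0 }, { &barrier, p2, 0 } };

	/* The second racer runs in the calling thread, so a failed
	 * pthread_create() can never leave a thread stuck at the barrier. */
	if (pthread_create(&thread, NULL, racer_fn, &r[0]) != 0) {
		pthread_barrier_destroy(&barrier);
		return -1;
	}
	racer_fn(&r[1]);
	pthread_join(thread, NULL);
	pthread_barrier_destroy(&barrier);

	return (r[0].err || r[1].err) ? -1 : 0;
}
```

To mirror the described sequence, call race_once() in a loop on a freshly mounted ext4 filesystem with ea_inode and then unmount it. The run in this thread reports "Iterations: 2000". The mkfs, mount and umount steps are left out of the sketch.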


Thank you very much for testing and sharing the PoC. I used it to
verify the issue. It did uncover a pre-existing bug, which I have fixed
in a new patch, but I could not reproduce the issue reported by
sashiko-ai.

# ./repro-xiaowu
=== ext4 EA inode llist Race PoC ===
Iterations: 2000
Discarding device blocks: done
Creating filesystem with 16384 4k blocks and 16384 inodes

Allocating group tables: done
Writing inode tables: done
Creating journal (1024 blocks): done
Writing superblocks and filesystem accounting information: done

Filesystem ready at /mnt/ea_race
Starting race threads...
[ 44.741752][ T3761] EXT4-fs error (device loop0): ext4_xattr_inode_cache_find:1616: inode #14: comm repro-xiaowu: missing EA_INODE flag
[ 44.743426][ T3762] EXT4-fs error (device loop0): ext4_xattr_inode_cache_find:1616: inode #14: comm repro-xiaowu: missing EA_INODE flag
[ 44.923046][ T3762] EXT4-fs error (device loop0): ext4_xattr_inode_cache_find:1616: inode #14: comm repro-xiaowu: missing EA_INODE flag
[ 44.924562][ T3761] EXT4-fs error (device loop0): ext4_xattr_inode_cache_find:1616: inode #14: comm repro-xiaowu: missing EA_INODE flag

Race loop complete.
Check dmesg for crash evidence.
(none)
Done.

Thanks again,
Yun