Re: [PATCH v10 5/5] ext4: prevent deadlock from duplicate EA inode references on corrupted fs
From: Zhou, Yun
Date: Sun Jun 28 2026 - 23:18:28 EST
On 6/29/26 02:57, XIAO WU wrote:
Hi,
I've been following the sashiko-bot reviews on this series and was able
to reproduce the llist corruption issue that the bot has flagged — it
triggers a kernel BUG at ext4_put_super() when an EA inode is leaked
onto the orphan list at unmount.
The sashiko review is at:
https://sashiko.dev/#/patchset/20260625152941.24788-1-yun.zhou@xxxxxxxxxxxxx
> +/* Put all EA inodes on a processed llist via ext4_put_ea_inode. */
> +static void ext4_put_ea_inode_llist(struct super_block *sb,
> +				    struct llist_head *processed)
> +{
> +	struct llist_node *node = llist_del_all(processed);
> +	struct llist_node *next;
> +
> +	while (node) {
> +		struct ext4_inode_info *ei = container_of(node,
> +				struct ext4_inode_info, i_ea_iput_node);
> +		next = node->next;
> +		ext4_put_ea_inode(sb, &ei->vfs_inode);
> +		node = next;
> +	}
> +}
> +}
The per-call `processed` llist is declared on the stack of
ext4_xattr_delete_inode(). If two threads concurrently evict files
that share the same EA inode (same large xattr value), both threads
call llist_add() on the SAME embedded i_ea_iput_node, each trying to
add it to their own stack-local llist head.
Since llist_add() unconditionally writes `node->next = head->first`
(which is a stack address from the caller's frame), the two threads
corrupt each other's `node->next` pointer. When
ext4_put_ea_inode_llist() later traverses the list, it follows a
dangling next pointer into freed/concurrent stack memory, causing the
EA inode to be silently skipped during deferred iput processing.
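The hazard described above can be modelled in userspace. This is only a
single-threaded sketch under stated assumptions, not kernel code: the
model_node, model_head, model_add and model_count names are hypothetical
stand-ins, and model_add is a non-atomic simplification of llist_add().
It shows only that linking one embedded node into a second list
overwrites the link the first list depended on.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins for struct llist_node / llist_head. */
struct model_node { struct model_node *next; };
struct model_head { struct model_node *first; };

/* Non-atomic model of llist_add(): push n onto the front of h. */
static void model_add(struct model_head *h, struct model_node *n)
{
	n->next = h->first;	/* overwrites any link n already carried */
	h->first = n;
}

/* Count the nodes reachable from h by following ->next. */
static size_t model_count(const struct model_head *h)
{
	size_t count = 0;
	const struct model_node *n;

	for (n = h->first; n; n = n->next)
		count++;
	return count;
}

/*
 * Add one shared node to two different heads and return how many nodes
 * list_a can still reach. list_a held two nodes before the second add.
 */
static size_t reachable_after_double_add(void)
{
	struct model_head list_a = { NULL }, list_b = { NULL };
	struct model_node other = { NULL }, shared = { NULL };

	model_add(&list_a, &other);	/* list_a: other */
	model_add(&list_a, &shared);	/* list_a: shared -> other */
	assert(model_count(&list_a) == 2);

	model_add(&list_b, &shared);	/* shared.next = list_b's old first (NULL) */
	assert(list_a.first == &shared && shared.next == NULL);

	return model_count(&list_a);	/* other is no longer reachable */
}
```

In this model reachable_after_double_add() returns 1 rather than 2: the
second add silently drops `other` from list_a. Whether the same
interleaving can actually occur in ext4_xattr_delete_inode() depends on
locking and reference counting that the sketch does not model.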
=== Reproduction ===
Kernel: 7.1.0-next-20260624-gb27bd6a65c17 #1 SMP PREEMPT_RT
Config: CONFIG_EXT4_FS=y, CONFIG_EXT4_FS_POSIX_ACL=y, CONFIG_KASAN=y
QEMU: QEMU Standard PC (Q35 + ICH9, 2009)
The PoC creates two files sharing the same large xattr value (thus
sharing the same EA inode), then concurrently unlinks them from two
pthreads synchronized by a barrier on the same CPU. This triggers the
llist_add() race on the shared i_ea_iput_node, leaving the EA inode
unprocessed. The EA inode (nlink=0) sits on the orphan list, and
umount hits the BUG() assertion.
Thank you very much for testing and sharing the PoC. I used it to verify
the issue and found a pre-existing bug (fixed by a new patch), but I
could not reproduce the issue reported by sashiko-ai.
# ./repro-xiaowu
=== ext4 EA inode llist Race PoC ===
Iterations: 2000
Discarding device blocks: done
Creating filesystem with 16384 4k blocks and 16384 inodes
Allocating group tables: done
Writing inode tables: done
Creating journal (1024 blocks): done
Writing superblocks and filesystem accounting information: done
Filesystem ready at /mnt/ea_race
Starting race threads...
[ 44.741752][ T3761] EXT4-fs error (device loop0): ext4_xattr_inode_cache_find:1616: inode #14: comm repro-xiaowu: missing EA_INODE flag
[ 44.743426][ T3762] EXT4-fs error (device loop0): ext4_xattr_inode_cache_find:1616: inode #14: comm repro-xiaowu: missing EA_INODE flag
[ 44.923046][ T3762] EXT4-fs error (device loop0): ext4_xattr_inode_cache_find:1616: inode #14: comm repro-xiaowu: missing EA_INODE flag
[ 44.924562][ T3761] EXT4-fs error (device loop0): ext4_xattr_inode_cache_find:1616: inode #14: comm repro-xiaowu: missing EA_INODE flag
Race loop complete.
Check dmesg for crash evidence.
(none)
Done.
Thanks again,
Yun