Re: WARNING in btrfs_run_delayed_refs

From: Qu Wenruo
Date: Tue Sep 14 2021 - 23:13:07 EST




On 2021/9/15 10:56 AM, Hao Sun wrote:
On Wed, Sep 15, 2021 at 10:20 AM, Qu Wenruo <quwenruo.btrfs@xxxxxxx> wrote:



On 2021/9/15 10:14 AM, Hao Sun wrote:
Hello,

When using Healer to fuzz the latest Linux kernel, the following crash
was triggered.

HEAD commit: 6880fa6c5660 Linux 5.15-rc1
git tree: upstream
console output:
https://drive.google.com/file/d/1gd0dl74MyvvVAYqsCDKSGmcfpZszD0kt/view?usp=sharing
kernel config: https://drive.google.com/file/d/1rUzyMbe5vcs6khA3tL9EHTLJvsUdWcgB/view?usp=sharing
C reproducer: https://drive.google.com/file/d/1WKQukijOJ7D0NYk1iKf47FESjYfAjrlz/view?usp=sharing
Syzlang reproducer:
https://drive.google.com/file/d/1Gi9-Mgbrjw1OI-ymO4zDVIFej2Qf4ppL/view?usp=sharing

If you fix this issue, please add the following tag to the commit:
Reported-by: Hao Sun <sunhao.th@xxxxxxxxx>

loop11: detected capacity change from 0 to 32768
BTRFS info (device loop11): disk space caching is enabled
BTRFS info (device loop11): has skinny extents
BTRFS info (device loop11): enabling ssd optimizations
FAULT_INJECTION: forcing a failure.
name failslab, interval 1, probability 0, space 0, times 0
CPU: 0 PID: 7769 Comm: syz-executor Not tainted 5.15.0-rc1 #16
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/01/2014
Call Trace:
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x8d/0xcf lib/dump_stack.c:106
fail_dump lib/fault-inject.c:52 [inline]
should_fail+0x13c/0x160 lib/fault-inject.c:146
should_failslab+0x5/0x10 mm/slab_common.c:1328
slab_pre_alloc_hook.constprop.99+0x4e/0xc0 mm/slab.h:494
slab_alloc_node mm/slub.c:3120 [inline]
slab_alloc mm/slub.c:3214 [inline]
kmem_cache_alloc+0x44/0x280 mm/slub.c:3219
__btrfs_free_extent.isra.53+0x7b/0x1180 fs/btrfs/extent-tree.c:2942
run_delayed_tree_ref fs/btrfs/extent-tree.c:1687 [inline]
run_one_delayed_ref fs/btrfs/extent-tree.c:1711 [inline]
btrfs_run_delayed_refs_for_head fs/btrfs/extent-tree.c:1952 [inline]
__btrfs_run_delayed_refs+0x83e/0x1a00 fs/btrfs/extent-tree.c:2017
btrfs_run_delayed_refs+0xb1/0x2b0 fs/btrfs/extent-tree.c:2148
btrfs_commit_transaction+0x7d/0x1430 fs/btrfs/transaction.c:2065
btrfs_sync_fs+0x9a/0x430 fs/btrfs/super.c:1426
btrfs_ioctl+0x209b/0x3be0 fs/btrfs/ioctl.c:4970
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:874 [inline]
__se_sys_ioctl fs/ioctl.c:860 [inline]
__x64_sys_ioctl+0xb6/0x100 fs/ioctl.c:860
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x34/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x46ae99
Code: f7 d8 64 89 02 b8 ff ff ff ff c3 66 0f 1f 44 00 00 48 89 f8 48
89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d
01 f0 ff ff 73 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007f8ac08c7c48 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 000000000078c0a0 RCX: 000000000046ae99
RDX: 0000000000000000 RSI: 0000000000009408 RDI: 0000000000000003
RBP: 00007f8ac08c7c80 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000003
R13: 0000000000000000 R14: 000000000078c0a0 R15: 00007ffccc1d6390
------------[ cut here ]------------
WARNING: CPU: 0 PID: 7769 at fs/btrfs/extent-tree.c:2150
btrfs_run_delayed_refs+0x245/0x2b0 fs/btrfs/extent-tree.c:2150

This is again btrfs_abort_transaction().

This makes me wonder whether we should add ENOMEM to the errors excluded
from the transaction abort warning, so that ENOMEM fault injection no
longer trips it.

Would you mind testing the following diff?

Thanks,
Qu

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 8c6ee947a68d..6bc79f6716fa 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -3548,7 +3548,8 @@ do {                                                            \
        /* Report first abort since mount */                    \
        if (!test_and_set_bit(BTRFS_FS_STATE_TRANS_ABORTED,     \
                        &((trans)->fs_info->fs_state))) {       \
-               if ((errno) != -EIO && (errno) != -EROFS) {             \
+               if ((errno) != -EIO && (errno) != -EROFS &&             \
+                   (errno) != -ENOMEM) {                               \
                        WARN(1, KERN_DEBUG                              \
                        "BTRFS: Transaction aborted (error %d)\n",      \
                        (errno));                                       \
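
For readers who want to see the effect of the changed condition in isolation, here is a minimal userspace sketch. The helper name abort_should_warn() is hypothetical and is not a kernel function; it only mirrors the errno comparison from the diff above and leaves out the test_and_set_bit() "first abort since mount" check.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical helper (not part of btrfs): returns true when the patched
 * condition would reach the WARN() for the given error code.
 * Assumes a kernel-style negative errno value.
 */
static bool abort_should_warn(int err)
{
	assert(err < 0); /* callers pass -EIO, -ENOMEM, ... */

	/* Same comparison as the patched macro: these three stay quiet. */
	return err != -EIO && err != -EROFS && err != -ENOMEM;
}
```

With this sketch, abort_should_warn(-ENOMEM) is false after the change, while an error such as -ENOSPC would still reach the WARN().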


Just tested it. This did fix most `WARNING` reports, e.g., "WARNING
in btrfs_add_link" and "WARNING in btrfs_run_delayed_refs".
I think it would be better if we could tell whether the `ENOMEM` was
caused by fault injection or not.

This is really hard to distinguish.

If the fuzzing tool could do it by correlating the transaction abort
message with the fault injection log, it would save us a lot of time and
prevent false alerts.
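
As a rough illustration of that kind of correlation on the fuzzer side, here is a userspace sketch. It assumes the console log is already split into lines and has the shape of the report quoted in this thread: a "FAULT_INJECTION: forcing a failure." banner, then a "CPU: N PID: P" line, and later a "WARNING: CPU: N PID: P at ..." line. The function name warning_follows_injection() is hypothetical; this is not how Healer or any other fuzzer actually works.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: inspect the first WARNING line in a console log.
 * Returns  1 if its PID matches the most recent fault-injection banner,
 *          0 if a WARNING is found without a matching injection,
 *         -1 if the log contains no parsable WARNING line.
 */
static int warning_follows_injection(const char *const *lines, size_t n)
{
	int injected_pid = -1; /* PID recorded after the last injection banner */
	int expect_pid = 0;    /* banner seen, PID line not parsed yet */

	assert(lines != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		const char *p;
		int cpu, pid;

		if (strstr(lines[i], "FAULT_INJECTION: forcing a failure")) {
			expect_pid = 1;
			continue;
		}
		/* Check WARNING first: it also contains the "CPU:" substring. */
		p = strstr(lines[i], "WARNING: CPU:");
		if (p != NULL) {
			if (sscanf(p, "WARNING: CPU: %d PID: %d", &cpu, &pid) == 2)
				return pid == injected_pid;
			continue;
		}
		p = expect_pid ? strstr(lines[i], "CPU:") : NULL;
		if (p != NULL && sscanf(p, "CPU: %d PID: %d", &cpu, &pid) == 2) {
			injected_pid = pid;
			expect_pid = 0;
		}
	}
	return -1;
}
```

Matching on PID alone is a simplification: it does not check that the injected failure is on the call path that led to the abort, so a real tool would likely need a stronger link between the two events.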

For now, I guess the above diff would be a quick and dirty filter for
ENOMEM injection tests.

Thanks,
Qu