Re: kernel BUG in __clear_extent_bit
From: Hao Sun
Date: Wed Sep 22 2021 - 22:25:12 EST
Qu Wenruo <quwenruo.btrfs@xxxxxxx> wrote on Wed, Sep 15, 2021 at 1:33 PM:
>
>
>
> On 2021/9/15 10:20 AM, Hao Sun wrote:
> > Hello,
> >
> > When using Healer to fuzz the latest Linux kernel, the following crash
> > was triggered.
> >
> > HEAD commit: 6880fa6c5660 Linux 5.15-rc1
> > git tree: upstream
> > console output:
> > https://drive.google.com/file/d/1-9wwV6-OmBcJvHGCbMbP5_uCVvrUdTp3/view?usp=sharing
> > kernel config: https://drive.google.com/file/d/1rUzyMbe5vcs6khA3tL9EHTLJvsUdWcgB/view?usp=sharing
> > C reproducer: https://drive.google.com/file/d/1eXePTqMQ5ZA0TWtgpTX50Ez4q9ZKm_HE/view?usp=sharing
> > Syzlang reproducer:
> > https://drive.google.com/file/d/11s13louoKZ7Uz0mdywM2jmE9B1JEIt8U/view?usp=sharing
> >
> > If you fix this issue, please add the following tag to the commit:
> > Reported-by: Hao Sun <sunhao.th@xxxxxxxxx>
> >
> > loop1: detected capacity change from 0 to 32768
> > BTRFS info (device loop1): disk space caching is enabled
> > BTRFS info (device loop1): has skinny extents
> > BTRFS info (device loop1): enabling ssd optimizations
> > FAULT_INJECTION: forcing a failure.
> > name failslab, interval 1, probability 0, space 0, times 0
> > CPU: 1 PID: 25852 Comm: syz-executor Not tainted 5.15.0-rc1 #16
> > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
> > rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/01/2014
> > Call Trace:
> > __dump_stack lib/dump_stack.c:88 [inline]
> > dump_stack_lvl+0x8d/0xcf lib/dump_stack.c:106
> > fail_dump lib/fault-inject.c:52 [inline]
> > should_fail+0x13c/0x160 lib/fault-inject.c:146
> > should_failslab+0x5/0x10 mm/slab_common.c:1328
> > slab_pre_alloc_hook.constprop.99+0x4e/0xc0 mm/slab.h:494
> > slab_alloc_node mm/slub.c:3120 [inline]
> > slab_alloc mm/slub.c:3214 [inline]
> > kmem_cache_alloc+0x44/0x280 mm/slub.c:3219
> > alloc_extent_state+0x1e/0x1c0 fs/btrfs/extent_io.c:340
>
> This is one of the core systems btrfs uses, and we really don't want
> it to fail.
>
> That is why it already does some preallocation to prevent failure.
>
> But in the error-injection case, we can still hit the BUG_ON() that is
> used to catch ENOMEM.
>
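To make the preallocation point concrete, here is a simplified user-space sketch of the general "preallocate, then BUG_ON() if the fallback allocation fails" pattern described above. It is not the btrfs code: `struct extent_state` is reduced to two fields, and `alloc_state()`, `get_state_for_split()` and the `inject_alloc_failure` flag (a stand-in for failslab) are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* User-space stand-in for the kernel macro: crash loudly on an "impossible" condition. */
#define BUG_ON(cond)                                                    \
	do {                                                            \
		if (cond) {                                             \
			fprintf(stderr, "BUG at %s:%d\n", __FILE__, __LINE__); \
			abort();                                        \
		}                                                       \
	} while (0)

/* Hypothetical, heavily simplified extent state. */
struct extent_state {
	unsigned long long start;
	unsigned long long end;
};

/* Hypothetical failslab stand-in: while set, every allocation fails. */
static bool inject_alloc_failure;

static struct extent_state *alloc_state(void)
{
	if (inject_alloc_failure)
		return NULL;
	return calloc(1, sizeof(struct extent_state));
}

/*
 * Hypothetical helper: the caller is expected to preallocate a state before
 * entering a section where it must not fail (the lock is not modeled here).
 * If no preallocated state was supplied, try once more; if that also fails,
 * the only "handling" is BUG_ON(), which is what fault injection can reach.
 */
static struct extent_state *get_state_for_split(struct extent_state *prealloc)
{
	if (!prealloc)
		prealloc = alloc_state();
	BUG_ON(!prealloc);
	return prealloc;
}
```

With a state preallocated, even a forced allocation failure is harmless; with `prealloc == NULL` and `inject_alloc_failure` set, the sketch aborts, which mirrors the BUG_ON() crashes reported in this thread.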
Hello,
The fuzzer repeatedly triggered the following crashes when fault
injection was enabled.
HEAD commit: 92477dd1faa6 Merge tag 's390-5.15-ebpf-jit-fixes'
git tree: upstream
kernel config: https://drive.google.com/file/d/1KgvcM8i_3hQiOL3fUh3JFpYNQM4itvV4/view?usp=sharing
[1] kernel BUG in btrfs_free_tree_block (fs/btrfs/extent-tree.c:3297):
https://paste.ubuntu.com/p/ZtzVKWbcGm/
[2] kernel BUG in clear_state_bit (fs/btrfs/extent_io.c:658):
https://paste.ubuntu.com/p/hps2wXPG2b/
[3] kernel BUG in set_extent_bit (fs/btrfs/extent_io.c:1021):
https://paste.ubuntu.com/p/dcptjYYxgd/
[4] kernel BUG in set_state_bits (fs/btrfs/extent_io.c:939):
https://paste.ubuntu.com/p/NV9qtKB4KZ/
All of the above crashes were triggered directly by the `BUG_ON()` macro
at the corresponding location.
Most of these `BUG_ON()`s were hit because of `ENOMEM` under fault injection.
Would it be better for btrfs to handle the `ENOMEM` error, e.g., by
returning an error gracefully, rather than panicking the kernel?
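For illustration, a minimal user-space sketch of what "handle ENOMEM gracefully" could mean: the allocating helper returns `-ENOMEM` instead of calling `BUG_ON()`, and the caller propagates the error. All names (`alloc_state()`, `mark_range()`, `inject_alloc_failure`) are hypothetical and this is not a proposed btrfs patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical, heavily simplified extent state. */
struct extent_state {
	unsigned long long start;
	unsigned long long end;
};

/* Hypothetical failslab stand-in: while set, every allocation fails. */
static bool inject_alloc_failure;

static struct extent_state *alloc_state(void)
{
	if (inject_alloc_failure)
		return NULL;
	return calloc(1, sizeof(struct extent_state));
}

/* Report allocation failure to the caller instead of BUG_ON(!state). */
static int set_range_state(unsigned long long start, unsigned long long end,
			   struct extent_state **out)
{
	struct extent_state *state = alloc_state();

	if (!state)
		return -ENOMEM;
	state->start = start;
	state->end = end;
	*out = state;
	return 0;
}

/* Hypothetical caller: propagates the error upward rather than crashing. */
static int mark_range(unsigned long long start, unsigned long long end)
{
	struct extent_state *state = NULL;
	int ret = set_range_state(start, end, &state);

	if (ret)
		return ret;
	/* ... the state would be used here ... */
	free(state);
	return 0;
}
```

The difficult part in a real filesystem is that every caller up the chain must be able to unwind cleanly on `-ENOMEM`; the sketch only shows the shape of the error path.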
Regards
Hao