Re: kernel BUG in __clear_extent_bit
From: David Sterba
Date: Fri Sep 24 2021 - 11:16:31 EST
On Thu, Sep 23, 2021 at 10:24:51AM +0800, Hao Sun wrote:
> Qu Wenruo <quwenruo.btrfs@xxxxxxx> wrote on Wed, Sep 15, 2021 at 1:33 PM:
> >
> >
> >
> > On 2021/9/15 10:20 AM, Hao Sun wrote:
> > > Hello,
> > >
> > > When using Healer to fuzz the latest Linux kernel, the following crash
> > > was triggered.
> > >
> > > HEAD commit: 6880fa6c5660 Linux 5.15-rc1
> > > git tree: upstream
> > > console output:
> > > https://drive.google.com/file/d/1-9wwV6-OmBcJvHGCbMbP5_uCVvrUdTp3/view?usp=sharing
> > > kernel config: https://drive.google.com/file/d/1rUzyMbe5vcs6khA3tL9EHTLJvsUdWcgB/view?usp=sharing
> > > C reproducer: https://drive.google.com/file/d/1eXePTqMQ5ZA0TWtgpTX50Ez4q9ZKm_HE/view?usp=sharing
> > > Syzlang reproducer:
> > > https://drive.google.com/file/d/11s13louoKZ7Uz0mdywM2jmE9B1JEIt8U/view?usp=sharing
> > >
> > > If you fix this issue, please add the following tag to the commit:
> > > Reported-by: Hao Sun <sunhao.th@xxxxxxxxx>
> > >
> > > loop1: detected capacity change from 0 to 32768
> > > BTRFS info (device loop1): disk space caching is enabled
> > > BTRFS info (device loop1): has skinny extents
> > > BTRFS info (device loop1): enabling ssd optimizations
> > > FAULT_INJECTION: forcing a failure.
> > > name failslab, interval 1, probability 0, space 0, times 0
> > > CPU: 1 PID: 25852 Comm: syz-executor Not tainted 5.15.0-rc1 #16
> > > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
> > > rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/01/2014
> > > Call Trace:
> > > __dump_stack lib/dump_stack.c:88 [inline]
> > > dump_stack_lvl+0x8d/0xcf lib/dump_stack.c:106
> > > fail_dump lib/fault-inject.c:52 [inline]
> > > should_fail+0x13c/0x160 lib/fault-inject.c:146
> > > should_failslab+0x5/0x10 mm/slab_common.c:1328
> > > slab_pre_alloc_hook.constprop.99+0x4e/0xc0 mm/slab.h:494
> > > slab_alloc_node mm/slub.c:3120 [inline]
> > > slab_alloc mm/slub.c:3214 [inline]
> > > kmem_cache_alloc+0x44/0x280 mm/slub.c:3219
> > > alloc_extent_state+0x1e/0x1c0 fs/btrfs/extent_io.c:340
> >
> > This is one of the core systems btrfs uses, and we really don't want
> > it to fail.
> >
> > Thus in fact it does some preallocation to prevent failure.
> >
> > But in the error injection case, we can still hit the BUG_ON() that is
> > used to catch ENOMEM.
> >
>
> Hello,
>
> The fuzzer repeatedly triggered the following crashes when `fault
> injection` was enabled.
>
> HEAD commit: 92477dd1faa6 Merge tag 's390-5.15-ebpf-jit-fixes'
> git tree: upstream
> kernel config: https://drive.google.com/file/d/1KgvcM8i_3hQiOL3fUh3JFpYNQM4itvV4/view?usp=sharing
> [1] kernel BUG in btrfs_free_tree_block (fs/btrfs/extent-tree.c:3297):
> https://paste.ubuntu.com/p/ZtzVKWbcGm/
> [2] kernel BUG in clear_state_bit (fs/btrfs/extent_io.c:658!):
> https://paste.ubuntu.com/p/hps2wXPG2b/
> [3] kernel BUG in set_extent_bit (fs/btrfs/extent_io.c:1021):
> https://paste.ubuntu.com/p/dcptjYYxgd/
> [4] kernel BUG in set_state_bits (fs/btrfs/extent_io.c:939):
> https://paste.ubuntu.com/p/NV9qtKB4KZ/
>
> All the above crashes were triggered directly by the `BUG_ON()` macro
> in the corresponding location.
> Most of the `BUG_ON()`s were hit due to `ENOMEM` from the injected faults.
> Would it be better for btrfs to handle the `ENOMEM` error, e.g.,
> gracefully return, rather than panic the kernel?
If it were that easy we would have done it already. Unfortunately, in
some deep call chains, under locks, or in contexts where the whole
operation is split across subsystems or threads, it's not always
possible to roll back. Some tricks like preallocation let us bail out
early, but we can't preallocate everything. The allocations are done
under GFP_NOFS, which still has the no-fail semantics. The errors you
report do not normally happen because the allocator tries hard to return
some memory.
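
To illustrate the preallocation trick, here is a minimal userspace
sketch, not btrfs code. The names (range_node, range_list,
range_list_add) are made up for the example, and the atomic flag is only
a toy stand-in for a real lock. The point is that the allocation happens
before the lock is taken, so the only failure that needs handling occurs
while nothing has been modified yet:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical record, loosely standing in for a per-range state object. */
struct range_node {
	unsigned long start;
	unsigned long end;
	struct range_node *next;
};

/* Hypothetical container; the atomic flag is a toy stand-in for a real lock. */
struct range_list {
	atomic_flag lock;
	struct range_node *head;
};

static void range_list_lock(struct range_list *list)
{
	while (atomic_flag_test_and_set(&list->lock))
		; /* spin until the flag is released */
}

static void range_list_unlock(struct range_list *list)
{
	atomic_flag_clear(&list->lock);
}

/*
 * Link a node that the caller allocated earlier. Nothing in here can fail,
 * so there is nothing to roll back. The assert plays the role of a BUG_ON():
 * reaching this point without memory is treated as a programming error.
 */
static void range_list_insert_locked(struct range_list *list,
				     struct range_node *prealloc)
{
	assert(prealloc != NULL);
	prealloc->next = list->head;
	list->head = prealloc;
}

/* Returns 0 on success, -1 if allocation failed before any state changed. */
int range_list_add(struct range_list *list, unsigned long start,
		   unsigned long end)
{
	/* Allocate outside the lock, where failing is still cheap to handle. */
	struct range_node *prealloc = malloc(sizeof(*prealloc));

	if (!prealloc)
		return -1;
	prealloc->start = start;
	prealloc->end = end;

	range_list_lock(list);
	range_list_insert_locked(list, prealloc);
	range_list_unlock(list);
	return 0;
}
```

This only works when the caller knows up front how many objects the
locked section will need. When that number depends on what is found
under the lock, the early bail-out is no longer available, which is the
case described above.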