Re: [PATCH v2] btrfs: fix rw device counting in __btrfs_free_extra_devids

From: David Sterba
Date: Thu Aug 12 2021 - 06:41:54 EST


On Tue, Jul 27, 2021 at 03:13:03PM +0800, Desmond Cheong Zhi Xi wrote:
> When removing a writeable device in __btrfs_free_extra_devids, the rw
> device count should be decremented.
>
> This error was caught by Syzbot which reported a warning in
> close_fs_devices because fs_devices->rw_devices was not 0 after
> closing all devices. Here is the call trace that was observed:
>
> btrfs_mount_root():
> btrfs_scan_one_device():
> device_list_add(); <---------------- device added
> btrfs_open_devices():
> open_fs_devices():
> btrfs_open_one_device(); <-------- writable device opened,
> rw device count ++
> btrfs_fill_super():
> open_ctree():
> btrfs_free_extra_devids():
> __btrfs_free_extra_devids(); <--- writable device removed,
> rw device count not decremented
> fail_tree_roots:
> btrfs_close_devices():
> close_fs_devices(); <------- rw device count off by 1
>
> As a note, prior to commit cf89af146b7e ("btrfs: dev-replace: fail
> mount if we don't have replace item with target device"), rw_devices
> was decremented on removing a writable device in
> __btrfs_free_extra_devids only if the BTRFS_DEV_STATE_REPLACE_TGT bit
> was not set for the device. However, this check does not need to be
> reinstated as it is now redundant and incorrect.
>
> In __btrfs_free_extra_devids, we skip removing the device if it is the
> target for replacement. This is done by checking whether device->devid
> == BTRFS_DEV_REPLACE_DEVID. Since BTRFS_DEV_STATE_REPLACE_TGT is set
> only on the device with devid BTRFS_DEV_REPLACE_DEVID, no devices
> should have the BTRFS_DEV_STATE_REPLACE_TGT bit set after the check,
> and so it's redundant to test for that bit.
>
> Additionally, following commit 82372bc816d7 ("Btrfs: make
> the logic of source device removing more clear"), rw_devices is
> incremented whenever a writeable device is added to the alloc
> list (including the target device in btrfs_dev_replace_finishing), so
> all removals of writable devices from the alloc list should also be
> accompanied by a decrement to rw_devices.
>
> Fixes: cf89af146b7e ("btrfs: dev-replace: fail mount if we don't have replace item with target device")
> Reported-by: syzbot+a70e2ad0879f160b9217@xxxxxxxxxxxxxxxxxxxxxxxxx
> Tested-by: syzbot+a70e2ad0879f160b9217@xxxxxxxxxxxxxxxxxxxxxxxxx
> Signed-off-by: Desmond Cheong Zhi Xi <desmondcheongzx@xxxxxxxxx>
> Reviewed-by: Anand Jain <anand.jain@xxxxxxxxxx>
> ---
> fs/btrfs/volumes.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index 807502cd6510..916c25371658 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -1078,6 +1078,7 @@ static void __btrfs_free_extra_devids(struct btrfs_fs_devices *fs_devices,
> if (test_bit(BTRFS_DEV_STATE_WRITEABLE, &device->dev_state)) {
> list_del_init(&device->dev_alloc_list);
> clear_bit(BTRFS_DEV_STATE_WRITEABLE, &device->dev_state);
> + fs_devices->rw_devices--;
> }
> list_del_init(&device->dev_list);
> fs_devices->num_devices--;

I've hit a crash on master branch with stacktrace very similar to one
this bug was supposed to fix. It's a failed assertion on device close.
This patch was the last one to touch it and it matches some of the
keywords, namely the BTRFS_DEV_STATE_REPLACE_TGT bit that used to be in
the original patch but was not reinstated in your fix.

I'm not sure how reproducible it is, right now I have only one instance
and am hunting another strange problem. They could be related.

assertion failed: !test_bit(BTRFS_DEV_STATE_REPLACE_TGT, &device->dev_state), in fs/btrfs/volumes.c:1150

https://susepaste.org/view/raw/18223056 full log with other stacktraces,
possibly relatedg

[ 3310.383268] kernel BUG at fs/btrfs/ctree.h:3431!
[ 3310.384586] invalid opcode: 0000 [#1] PREEMPT SMP
[ 3310.385848] CPU: 1 PID: 3982 Comm: umount Tainted: G W 5.14.0-rc5-default+ #1532
[ 3310.388216] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba527-rebuilt.opensuse.org 04/01/2014
[ 3310.391054] RIP: 0010:assertfail.constprop.0+0x18/0x1a [btrfs]
[ 3310.397628] RSP: 0018:ffffb7a5454c7db8 EFLAGS: 00010246
[ 3310.399079] RAX: 0000000000000068 RBX: ffff978364b91c00 RCX: 0000000000000000
[ 3310.400990] RDX: 0000000000000000 RSI: ffffffffabee13c4 RDI: 00000000ffffffff
[ 3310.402504] RBP: ffff9783523a4c00 R08: 0000000000000001 R09: 0000000000000001
[ 3310.404025] R10: 0000000000000000 R11: 0000000000000001 R12: ffff9783523a4d18
[ 3310.405074] R13: 0000000000000000 R14: 0000000000000004 R15: 0000000000000003
[ 3310.406130] FS: 00007f61c8f42800(0000) GS:ffff9783bd800000(0000) knlGS:0000000000000000
[ 3310.407649] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 3310.409022] CR2: 000056190cffa810 CR3: 0000000030b96002 CR4: 0000000000170ea0
[ 3310.410111] Call Trace:
[ 3310.410561] btrfs_close_one_device.cold+0x11/0x55 [btrfs]
[ 3310.411788] close_fs_devices+0x44/0xb0 [btrfs]
[ 3310.412654] btrfs_close_devices+0x48/0x160 [btrfs]
[ 3310.413449] generic_shutdown_super+0x69/0x100
[ 3310.414155] kill_anon_super+0x14/0x30
[ 3310.414802] btrfs_kill_super+0x12/0x20 [btrfs]
[ 3310.415767] deactivate_locked_super+0x2c/0xa0
[ 3310.416562] cleanup_mnt+0x144/0x1b0
[ 3310.417153] task_work_run+0x59/0xa0
[ 3310.417744] exit_to_user_mode_loop+0xe7/0xf0
[ 3310.418440] exit_to_user_mode_prepare+0xaf/0xf0
[ 3310.419425] syscall_exit_to_user_mode+0x19/0x50
[ 3310.420281] do_syscall_64+0x4a/0x90
[ 3310.420934] entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 3310.421718] RIP: 0033:0x7f61c91654db