Re: [PATCH v2] btrfs: fix rw device counting in __btrfs_free_extra_devids
From: David Sterba
Date: Thu Aug 12 2021 - 11:53:30 EST
On Thu, Aug 12, 2021 at 11:43:16PM +0800, Desmond Cheong Zhi Xi wrote:
> On 12/8/21 6:38 pm, David Sterba wrote:
> > On Tue, Jul 27, 2021 at 03:13:03PM +0800, Desmond Cheong Zhi Xi wrote:
> >> When removing a writeable device in __btrfs_free_extra_devids, the rw
> >> device count should be decremented.
> >>
> >> This error was caught by Syzbot which reported a warning in
> >> close_fs_devices because fs_devices->rw_devices was not 0 after
> >> closing all devices. Here is the call trace that was observed:
> >>
> >> btrfs_mount_root():
> >>   btrfs_scan_one_device():
> >>     device_list_add();              <-- device added
> >>   btrfs_open_devices():
> >>     open_fs_devices():
> >>       btrfs_open_one_device();      <-- writable device opened,
> >>                                         rw device count ++
> >> btrfs_fill_super():
> >>   open_ctree():
> >>     btrfs_free_extra_devids():
> >>       __btrfs_free_extra_devids();  <-- writable device removed,
> >>                                         rw device count not decremented
> >> fail_tree_roots:
> >>   btrfs_close_devices():
> >>     close_fs_devices();             <-- rw device count off by 1
> >>
> >> As a note, prior to commit cf89af146b7e ("btrfs: dev-replace: fail
> >> mount if we don't have replace item with target device"), rw_devices
> >> was decremented on removing a writable device in
> >> __btrfs_free_extra_devids only if the BTRFS_DEV_STATE_REPLACE_TGT bit
> >> was not set for the device. However, this check does not need to be
> >> reinstated as it is now redundant and incorrect.
> >>
> >> In __btrfs_free_extra_devids, we skip removing the device if it is the
> >> target for replacement. This is done by checking whether device->devid
> >> == BTRFS_DEV_REPLACE_DEVID. Since BTRFS_DEV_STATE_REPLACE_TGT is set
> >> only on the device with devid BTRFS_DEV_REPLACE_DEVID, no devices
> >> should have the BTRFS_DEV_STATE_REPLACE_TGT bit set after the check,
> >> and so it's redundant to test for that bit.
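For context (paraphrasing the current code from memory, not a verbatim
copy), the device walk in __btrfs_free_extra_devids() is shaped roughly
like this, with the devid check sitting before the writeable handling:

    list_for_each_entry_safe(device, next, &fs_devices->devices, dev_list) {
            ...
            /* the replace target is skipped here by devid ... */
            if (device->devid == BTRFS_DEV_REPLACE_DEVID)
                    continue;
            ...
            /* ... so nothing reaching this point can have REPLACE_TGT set */
            if (test_bit(BTRFS_DEV_STATE_WRITEABLE, &device->dev_state)) {
                    list_del_init(&device->dev_alloc_list);
                    clear_bit(BTRFS_DEV_STATE_WRITEABLE, &device->dev_state);
            }
            list_del_init(&device->dev_list);
            fs_devices->num_devices--;
            btrfs_free_device(device);
    }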
> >>
> >> Additionally, following commit 82372bc816d7 ("Btrfs: make
> >> the logic of source device removing more clear"), rw_devices is
> >> incremented whenever a writeable device is added to the alloc
> >> list (including the target device in btrfs_dev_replace_finishing), so
> >> all removals of writable devices from the alloc list should also be
> >> accompanied by a decrement to rw_devices.
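In other words the accounting is meant to stay paired; roughly (again
paraphrased, e.g. from btrfs_open_one_device(), the exact conditions may
differ):

    /* open side: joining the alloc list and bumping rw_devices go together */
    fs_devices->rw_devices++;
    list_add(&device->dev_alloc_list, &fs_devices->alloc_list);

    /* ... so removing a device from the alloc list should undo both */
    list_del_init(&device->dev_alloc_list);
    fs_devices->rw_devices--;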
> >>
> >> Fixes: cf89af146b7e ("btrfs: dev-replace: fail mount if we don't have replace item with target device")
> >> Reported-by: syzbot+a70e2ad0879f160b9217@xxxxxxxxxxxxxxxxxxxxxxxxx
> >> Tested-by: syzbot+a70e2ad0879f160b9217@xxxxxxxxxxxxxxxxxxxxxxxxx
> >> Signed-off-by: Desmond Cheong Zhi Xi <desmondcheongzx@xxxxxxxxx>
> >> Reviewed-by: Anand Jain <anand.jain@xxxxxxxxxx>
> >> ---
> >> fs/btrfs/volumes.c | 1 +
> >> 1 file changed, 1 insertion(+)
> >>
> >> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> >> index 807502cd6510..916c25371658 100644
> >> --- a/fs/btrfs/volumes.c
> >> +++ b/fs/btrfs/volumes.c
> >> @@ -1078,6 +1078,7 @@ static void __btrfs_free_extra_devids(struct btrfs_fs_devices *fs_devices,
> >>                  if (test_bit(BTRFS_DEV_STATE_WRITEABLE, &device->dev_state)) {
> >>                          list_del_init(&device->dev_alloc_list);
> >>                          clear_bit(BTRFS_DEV_STATE_WRITEABLE, &device->dev_state);
> >> +                        fs_devices->rw_devices--;
> >>                  }
> >>                  list_del_init(&device->dev_list);
> >>                  fs_devices->num_devices--;
> >
> > I've hit a crash on the master branch with a stacktrace very similar to
> > the one this patch was supposed to fix. It's a failed assertion on device
> > close. This patch was the last one to touch that code and it matches some
> > of the keywords, namely the BTRFS_DEV_STATE_REPLACE_TGT bit that used to
> > be checked in the original code but was not reinstated in your fix.
> >
> > I'm not sure how reproducible it is; right now I have only one instance
> > and am hunting another strange problem. They could be related.
> >
> > assertion failed: !test_bit(BTRFS_DEV_STATE_REPLACE_TGT, &device->dev_state), in fs/btrfs/volumes.c:1150
> >
> > https://susepaste.org/view/raw/18223056 full log with other stacktraces,
> > possibly related.
> >
>
> Looking at the logs, it seems that a dev_replace was started, then
> suspended. But it wasn't canceled or resumed before the fs devices were
> closed.
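That would fit with where it trips: the failed ASSERT is one of the sanity
checks at the end of the device close path, which from memory look roughly
like this (paraphrased, not a verbatim copy of btrfs_close_one_device()):

    /* tail of btrfs_close_one_device(): the device is expected to be
     * back in a pristine state at this point */
    ...
    ASSERT(!test_bit(BTRFS_DEV_STATE_REPLACE_TGT, &device->dev_state));
    ASSERT(list_empty(&device->dev_alloc_list));
    ...

If a suspended replace is never resumed or canceled before close, it's not
obvious what would have cleared that bit on the target device by the time
these checks run.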
>
> I'll investigate further, just throwing some observations out there.
Thanks. I'm testing the patch revert; no crash after the first loop, and
I'll run a few more rounds to be sure, as the crash is not entirely
reliable to reproduce.

Sending the revert is a last-resort option, as we're approaching the end
of the 5.14 dev cycle and the crash prevents testing (unlike the fuzzer
warning).