Re: [PATCH v2] btrfs: fix rw device counting in __btrfs_free_extra_devids

From: Desmond Cheong Zhi Xi
Date: Fri Aug 13 2021 - 05:57:45 EST


On 13/8/21 4:51 pm, David Sterba wrote:
On Fri, Aug 13, 2021 at 01:31:25AM +0800, Desmond Cheong Zhi Xi wrote:
On 12/8/21 11:50 pm, David Sterba wrote:
On Thu, Aug 12, 2021 at 11:43:16PM +0800, Desmond Cheong Zhi Xi wrote:
On 12/8/21 6:38 pm, David Sterba wrote:
On Tue, Jul 27, 2021 at 03:13:03PM +0800, Desmond Cheong Zhi Xi wrote:
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -1078,6 +1078,7 @@ static void __btrfs_free_extra_devids(struct btrfs_fs_devices *fs_devices,
if (test_bit(BTRFS_DEV_STATE_WRITEABLE, &device->dev_state)) {
list_del_init(&device->dev_alloc_list);
clear_bit(BTRFS_DEV_STATE_WRITEABLE, &device->dev_state);
+ fs_devices->rw_devices--;
}
list_del_init(&device->dev_list);
fs_devices->num_devices--;

I've hit a crash on master branch with stacktrace very similar to one
this bug was supposed to fix. It's a failed assertion on device close.
This patch was the last one to touch it and it matches some of the
keywords, namely the BTRFS_DEV_STATE_REPLACE_TGT bit that used to be in
the original patch but was not reinstated in your fix.

I'm not sure how reproducible it is, right now I have only one instance
and am hunting another strange problem. They could be related.

assertion failed: !test_bit(BTRFS_DEV_STATE_REPLACE_TGT, &device->dev_state), in fs/btrfs/volumes.c:1150

https://susepaste.org/view/raw/18223056 full log with other stacktraces,
possibly relatedg


Looking at the logs, it seems that a dev_replace was started, then
suspended. But it wasn't canceled or resumed before the fs devices were
closed.

I'll investigate further, just throwing some observations out there.

Thanks. I'm testing the patch revert, no crash after first loop, I'll
run a few more to be sure as it's not entirely reliable.

Sending the revert is option of last resort as we're approaching end of
5.14 dev cycle and the crash prevents testing (unlike the fuzzer
warning).


I might be missing something, so any thoughts would be appreciated. But
I don't think the assertion in btrfs_close_one_device is correct.

From what I see, this crash happens when close_ctree is called while a
dev_replace hasn't completed. In close_ctree, we suspend the
dev_replace, but keep the replace target around so that we can resume
the dev_replace procedure when we mount the root again. This is the call
trace:

close_ctree():
btrfs_dev_replace_suspend_for_unmount();
btrfs_close_devices():
btrfs_close_fs_devices():
btrfs_close_one_device():
ASSERT(!test_bit(BTRFS_DEV_STATE_REPLACE_TGT,
&device->dev_state));

However, since the replace target sticks around, there is a device with
BTRFS_DEV_STATE_REPLACE_TGT set, and we fail the assertion in
btrfs_close_one_device.

Two options I can think of:

- We could remove the assertion.

- Or we could clear the BTRFS_DEV_STATE_REPLACE_TGT bit in
btrfs_dev_replace_suspend_for_unmount. This is fine since the bit is set
again in btrfs_init_dev_replace if the dev_replace->replace_state is
BTRFS_IOCTL_DEV_REPLACE_STATE_SUSPENDED. But this approach strikes me as
a little odd because the device is still the replace target when
mounting in the future.

The option #2 does not sound safe because the TGT bit is checked in
several places where device list is queried for various reasons, even
without a mounted filesystem.

Removing the assertion makes more sense but I'm still not convinced that
the this is expected/allowed state of a closed device.


Would it be better if we cleared the REPLACE_TGT bit only when closing
the device where device->devid == BTRFS_DEV_REPLACE_DEVID?

The first conditional in btrfs_close_one_device assumes that we can come
across such a device. If we come across it, we should properly reset it.

If other devices has this bit set, the ASSERT will still catch it and
let us know something is wrong.

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 70f94b75f25a..a5afebb78ecf 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -1130,6 +1130,9 @@ static void btrfs_close_one_device(struct btrfs_device *device)
fs_devices->rw_devices--;
}
+ if (device->devid == BTRFS_DEV_REPLACE_DEVID)
+ clear_bit(BTRFS_DEV_STATE_REPLACE_TGT, &device->dev_state);
+
if (test_bit(BTRFS_DEV_STATE_MISSING, &device->dev_state))
fs_devices->missing_devices--;