Re: [PATCH v2] btrfs: fix rw device counting in __btrfs_free_extra_devids

From: Desmond Cheong Zhi Xi
Date: Thu Aug 19 2021 - 23:09:14 EST


On 20/8/21 1:34 am, David Sterba wrote:
On Fri, Aug 20, 2021 at 01:11:58AM +0800, Desmond Cheong Zhi Xi wrote:
The option #2 does not sound safe because the TGT bit is checked in
several places where device list is queried for various reasons, even
without a mounted filesystem.

Removing the assertion makes more sense but I'm still not convinced that
the this is expected/allowed state of a closed device.


Would it be better if we cleared the REPLACE_TGT bit only when closing
the device where device->devid == BTRFS_DEV_REPLACE_DEVID?

The first conditional in btrfs_close_one_device assumes that we can come
across such a device. If we come across it, we should properly reset it.

If other devices has this bit set, the ASSERT will still catch it and
let us know something is wrong.

That sounds great.

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 70f94b75f25a..a5afebb78ecf 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -1130,6 +1130,9 @@ static void btrfs_close_one_device(struct btrfs_device *device)
fs_devices->rw_devices--;
}
+ if (device->devid == BTRFS_DEV_REPLACE_DEVID)
+ clear_bit(BTRFS_DEV_STATE_REPLACE_TGT, &device->dev_state);
+
if (test_bit(BTRFS_DEV_STATE_MISSING, &device->dev_state))
fs_devices->missing_devices--;

I'll do a few test rounds, thanks.

Just following up. Did that resolve the issue or is further
investigation needed?

The fix seems to work, I haven't seen the assertion fail anymore,
incidentally the crash also stopped to show up on an unpatched branch.


Sounds good, thanks for the update. If there's anything else I can help with, please let me know.