[PATCH 5.10 127/137] btrfs: fix unmountable seed device after fstrim

From: Greg Kroah-Hartman
Date: Tue Jun 08 2021 - 15:10:41 EST


From: Anand Jain <anand.jain@xxxxxxxxxx>

commit 5e753a817b2d5991dfe8a801b7b1e8e79a1c5a20 upstream.

The following test case reproduces an issue in which in-use blocks on
the readonly seed device are wrongly freed when fstrim is called on the
rw sprout device.

Create a seed device and add a sprout device to it:

$ mkfs.btrfs -fq -dsingle -msingle /dev/loop0
$ btrfstune -S 1 /dev/loop0
$ mount /dev/loop0 /btrfs
$ btrfs dev add -f /dev/loop1 /btrfs
BTRFS info (device loop0): relocating block group 290455552 flags system
BTRFS info (device loop0): relocating block group 1048576 flags system
BTRFS info (device loop0): disk added /dev/loop1
$ umount /btrfs

Mount the sprout device and run fstrim:

$ mount /dev/loop1 /btrfs
$ fstrim /btrfs
$ umount /btrfs

Now try to mount the seed device, and it fails:

$ mount /dev/loop0 /btrfs
mount: /btrfs: wrong fs type, bad option, bad superblock on /dev/loop0, missing codepage or helper program, or other error.

Block 5292032 is missing on the readonly seed device:

$ dmesg -kt | tail
<snip>
BTRFS error (device loop0): bad tree block start, want 5292032 have 0
BTRFS warning (device loop0): couldn't read-tree root
BTRFS error (device loop0): open_ctree failed

From the dump-tree of the seed device (taken before the fstrim), block
5292032 belonged to the block group starting at 5242880:

$ btrfs inspect dump-tree -e /dev/loop0 | grep -A1 BLOCK_GROUP
<snip>
item 3 key (5242880 BLOCK_GROUP_ITEM 8388608) itemoff 16169 itemsize 24
block group used 114688 chunk_objectid 256 flags METADATA
<snip>

The dump-tree of the sprout device (taken before the fstrim) shows the
same block group 5242880, which fstrim used to find free space to free:

$ btrfs inspect dump-tree -e /dev/loop1 | grep -A1 BLOCK_GROUP
<snip>
item 1 key (5242880 BLOCK_GROUP_ITEM 8388608) itemoff 16226 itemsize 24
block group used 32768 chunk_objectid 256 flags METADATA
<snip>

Tracing the kernel with BPF while fstrim runs shows that the missing
block 5292032 lies within the range of the discarded blocks:

kprobe:btrfs_discard_extent {
  printf("freeing start %llu end %llu num_bytes %llu:\n",
         arg1, arg1+arg2, arg2);
}

freeing start 5259264 end 5406720 num_bytes 147456
<snip>

Fix this by not issuing the discard command to the readonly seed device.

Reported-by: Chris Murphy <lists@xxxxxxxxxxxxxxxxx>
CC: stable@xxxxxxxxxxxxxxx # 4.4+
Reviewed-by: Filipe Manana <fdmanana@xxxxxxxx>
Signed-off-by: Anand Jain <anand.jain@xxxxxxxxxx>
Signed-off-by: David Sterba <dsterba@xxxxxxxx>
Signed-off-by: Sudip Mukherjee <sudipm.mukherjee@xxxxxxxxx>
Signed-off-by: Greg Kroah-Hartman <gregkh@xxxxxxxxxxxxxxxxxxx>
---
fs/btrfs/extent-tree.c | 10 +++++++---
1 file changed, 7 insertions(+), 3 deletions(-)

--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -1297,16 +1297,20 @@ int btrfs_discard_extent(struct btrfs_fs
 		for (i = 0; i < bbio->num_stripes; i++, stripe++) {
 			u64 bytes;
 			struct request_queue *req_q;
+			struct btrfs_device *device = stripe->dev;
 
-			if (!stripe->dev->bdev) {
+			if (!device->bdev) {
 				ASSERT(btrfs_test_opt(fs_info, DEGRADED));
 				continue;
 			}
-			req_q = bdev_get_queue(stripe->dev->bdev);
+			req_q = bdev_get_queue(device->bdev);
 			if (!blk_queue_discard(req_q))
 				continue;
 
-			ret = btrfs_issue_discard(stripe->dev->bdev,
+			if (!test_bit(BTRFS_DEV_STATE_WRITEABLE, &device->dev_state))
+				continue;
+
+			ret = btrfs_issue_discard(device->bdev,
 						  stripe->physical,
 						  stripe->length,
 						  &bytes);