Re: [PATCH v2 3/3] btrfs: update stripe_extent delete loop assumptions

From: Qu Wenruo
Date: Thu Jul 11 2024 - 03:55:21 EST




On 2024/7/11 17:14, Qu Wenruo wrote:


On 2024/7/11 16:25, Qu Wenruo wrote:


On 2024/7/11 15:51, Johannes Thumshirn wrote:
From: Johannes Thumshirn <johannes.thumshirn@xxxxxxx>

btrfs_delete_raid_extent() was written under the assumption that its
call chain always passes a start, length tuple that matches a single
extent. But btrfs_delete_raid_extent() is called by
do_free_extent_accounting(), which in turn is called by __btrfs_free_extent().

But from the call site __btrfs_free_extent(), it is still called for a single extent.

Otherwise we would get an error and abort the current transaction.

Or does it mean one data extent can have multiple RST entries?

Is that non-zoned RST-specific behavior?
As far as I remember, we split ordered extents for zoned devices, so there should always be one extent for each split OE.


OK, it's indeed RST-specific behavior (at least for RST on non-zoned devices).

I can get the following layout:

item 15 key (258 EXTENT_DATA 419430400) itemoff 15306 itemsize 53
generation 10 type 1 (regular)
extent data disk byte 1808793600 nr 117440512
extent data offset 0 nr 117440512 ram 117440512
extent compression 0 (none)

That is a large data extent, 112MiB in length.

Meanwhile there are 3 split RST entries for it:

item 13 key (1808793600 RAID_STRIPE 33619968) itemoff 15835 itemsize 32
stripe 0 devid 2 physical 1787822080
stripe 1 devid 1 physical 1808793600
item 14 key (1842413568 RAID_STRIPE 58789888) itemoff 15803 itemsize 32
stripe 0 devid 2 physical 1821442048
stripe 1 devid 1 physical 1842413568
item 15 key (1901203456 RAID_STRIPE 25030656) itemoff 15771 itemsize 32
stripe 0 devid 2 physical 1880231936
stripe 1 devid 1 physical 1901203456

So yes, it's possible to have multiple RST entries for a single data extent; it's no longer the old zoned behavior.
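
Just as a sanity check (an illustration only, not part of the patch;
the numbers are copied from the dumps above), the three RAID_STRIPE
item lengths indeed add up to that single data extent:

#include <assert.h>
#include <stdio.h>

int main(void)
{
	unsigned long long rst_len[3] = { 33619968ULL, 58789888ULL, 25030656ULL };
	unsigned long long total = 0;

	for (int i = 0; i < 3; i++)
		total += rst_len[i];

	/* 117440512 bytes is the EXTENT_DATA nr, i.e. 112MiB */
	assert(total == 117440512ULL);
	printf("total: %llu bytes = %llu MiB\n", total, total / (1024 * 1024));
	return 0;
}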

In that case, the patch looks fine to me.

Reviewed-by: Qu Wenruo <wqu@xxxxxxxx>

Thanks,
Qu



Thanks,
Qu


But this call chain passes in a start address and a length that can
possibly match multiple on-disk extents.

Mind giving a more detailed example of this?

Thanks,
Qu


To handle this, we have to adjust the start and length of each btree
node lookup so that we do not delete beyond the requested range.

Signed-off-by: Johannes Thumshirn <johannes.thumshirn@xxxxxxx>
---
  fs/btrfs/raid-stripe-tree.c | 5 +++++
  1 file changed, 5 insertions(+)

diff --git a/fs/btrfs/raid-stripe-tree.c b/fs/btrfs/raid-stripe-tree.c
index fd56535b2289..6f65be334637 100644
--- a/fs/btrfs/raid-stripe-tree.c
+++ b/fs/btrfs/raid-stripe-tree.c
@@ -66,6 +66,11 @@ int btrfs_delete_raid_extent(struct btrfs_trans_handle *trans, u64 start, u64 le
          if (ret)
              break;
+        start += key.offset;
+        length -= key.offset;
+        if (length == 0)
+            break;
+
          btrfs_release_path(path);
      }
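
And just to illustrate how the added lines walk such a layout, here is
a rough user-space sketch (not kernel code; start, length and the three
per-entry lengths are taken from the dumps above, with key_offset[]
standing in for the key.offset of each RAID_STRIPE item the lookup
finds):

#include <stdio.h>

int main(void)
{
	unsigned long long start = 1808793600ULL;	/* data extent disk bytenr */
	unsigned long long length = 117440512ULL;	/* data extent length */
	unsigned long long key_offset[3] = { 33619968ULL, 58789888ULL, 25030656ULL };

	for (int i = 0; i < 3; i++) {
		printf("delete RST entry [%llu, +%llu)\n", start, key_offset[i]);
		/* mirrors the lines added by the patch */
		start += key_offset[i];
		length -= key_offset[i];
		if (length == 0)
			break;
	}
	return 0;
}

Each iteration starts exactly at the key.objectid of the next
RAID_STRIPE item (1808793600, 1842413568, 1901203456), and the loop
stops once the whole requested range has been deleted.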