Re: [PATCH v2 06/14] btrfs: fix deletion of a range spanning parts two RAID stripe extents
From: Johannes Thumshirn
Date: Fri Jan 10 2025 - 06:34:45 EST
On 09.01.25 16:24, Filipe Manana wrote:
> On Tue, Jan 7, 2025 at 12:50 PM Johannes Thumshirn <jth@xxxxxxxxxx> wrote:
>>
>> From: Johannes Thumshirn <johannes.thumshirn@xxxxxxx>
>>
>> When a user requests the deletion of a range that spans multiple stripe
>> extents and btrfs_search_slot() returns the second RAID stripe extent,
>> we need to pick the previous item and truncate it. If there is still a
>> range left to delete, we then move on to the next item.
>>
>> The following diagram illustrates the operation:
>>
>> |--- RAID Stripe Extent ---||--- RAID Stripe Extent ---|
>> |--- keep ---|--- drop ---|
>>
>> While at it, comment the trivial case of a whole item delete as well.
>>
>> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@xxxxxxx>
>> ---
>> fs/btrfs/raid-stripe-tree.c | 28 ++++++++++++++++++++++++++++
>> 1 file changed, 28 insertions(+)
>>
>> diff --git a/fs/btrfs/raid-stripe-tree.c b/fs/btrfs/raid-stripe-tree.c
>> index 79f8f692aaa8f6df2c9482fbd7777c2812528f65..893d963951315abfc734e1ca232b3087b7889431 100644
>> --- a/fs/btrfs/raid-stripe-tree.c
>> +++ b/fs/btrfs/raid-stripe-tree.c
>> @@ -103,6 +103,31 @@ int btrfs_delete_raid_extent(struct btrfs_trans_handle *trans, u64 start, u64 le
>> found_end = found_start + key.offset;
>> ret = 0;
>>
>> + /*
>> + * The stripe extent starts before the range we want to delete,
>> + * but the range spans more than one stripe extent:
>> + *
>> + * |--- RAID Stripe Extent ---||--- RAID Stripe Extent ---|
>> + * |--- keep ---|--- drop ---|
>> + *
>> + * This means we have to get the previous item, truncate its
>> + * length and then restart the search.
>> + */
>> + if (found_start > start) {
>> +
>> + ret = btrfs_previous_item(stripe_root, path, start,
>> + BTRFS_RAID_STRIPE_KEY);
>> + if (ret < 0)
>> + break;
>> + ret = 0;
>> +
>> + leaf = path->nodes[0];
>> + slot = path->slots[0];
>> + btrfs_item_key_to_cpu(leaf, &key, slot);
>> + found_start = key.objectid;
>> + found_end = found_start + key.offset;
>
> Hum, this isn't safe, ignoring the case where btrfs_previous_item()
> returns 1, meaning there's no previous item.
>
> In that case btrfs_previous_item() returns with the path still pointing
> to the same leaf and slot, and then below we delete the item instead of
> trimming it (increasing its range start and decreasing its length).
Good catch!

But what should we do when we end up in this situation? Doesn't that
mean that either do_free_extent_accounting() passed in a bogus range or
btrfs_previous_item() should've done one more call to btrfs_prev_leaf()?
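
To make the question concrete, here is a minimal user-space sketch (not
kernel code) of the step under discussion. Everything in it is hypothetical:
struct stripe_extent, model_previous_item() and model_truncate_previous()
are stand-ins invented for illustration. Only the return convention (1 means
"no previous item" and the position is left unchanged) is taken from the
review comment above. The sketch reports "no previous item" as a distinct
result instead of falling through to the whole-item delete; whether the real
code should turn that into an error or handle it some other way is the open
question.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical user-space stand-in for a RAID stripe extent item. */
struct stripe_extent {
	uint64_t start;	/* models key.objectid */
	uint64_t len;	/* models key.offset */
};

/*
 * Hypothetical stand-in for the "previous item" lookup: returns 0 and moves
 * *slot back by one if a previous item exists, returns 1 and leaves *slot
 * untouched otherwise.
 */
static int model_previous_item(size_t *slot)
{
	if (*slot == 0)
		return 1;
	(*slot)--;
	return 0;
}

/*
 * Model of the "range starts inside the previous stripe extent" step.
 *
 * Returns 0 if the previous item was truncated so that it ends at @start.
 * Returns 1 if no previous item covers @start; nothing is modified then,
 * so the caller cannot mistake the current item for the one to trim.
 */
static int model_truncate_previous(struct stripe_extent *items, size_t slot,
				   uint64_t start)
{
	uint64_t found_start, found_end;

	/* This branch is only entered when the found item starts after @start. */
	assert(items[slot].start > start);

	if (model_previous_item(&slot) > 0)
		return 1;

	found_start = items[slot].start;
	found_end = found_start + items[slot].len;

	/* The previous item must actually contain @start to be trimmed. */
	if (found_start >= start || found_end <= start)
		return 1;

	/* Keep [found_start, start), drop the tail. */
	items[slot].len = start - found_start;
	return 0;
}
```

With two extents [0, 100) and [100, 200) and a deletion starting at 50, the
model truncates the first extent to a length of 50 and leaves the second
untouched. With only [100, 200) present it returns 1 and modifies nothing.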