Re: [PATCH v8 03/11] btrfs: add support for inserting raid stripe extents

From: Qu Wenruo
Date: Thu Sep 14 2023 - 06:07:19 EST




On 2023/9/14 19:21, Johannes Thumshirn wrote:
On 14.09.23 11:25, Qu Wenruo wrote:
+static int btrfs_insert_one_raid_extent(struct btrfs_trans_handle *trans,
+ int num_stripes,
+ struct btrfs_io_context *bioc)
+{
+ struct btrfs_fs_info *fs_info = trans->fs_info;
+ struct btrfs_key stripe_key;
+ struct btrfs_root *stripe_root = btrfs_stripe_tree_root(fs_info);
+ u8 encoding = btrfs_bg_type_to_raid_encoding(bioc->map_type);
+ struct btrfs_stripe_extent *stripe_extent;
+ size_t item_size;
+ int ret;
+
+ item_size = struct_size(stripe_extent, strides, num_stripes);

I guess David has already pointed out this can be done at initialization
and make it const.

Will do


+
+ stripe_extent = kzalloc(item_size, GFP_NOFS);
+ if (!stripe_extent) {
+ btrfs_abort_transaction(trans, -ENOMEM);
+ btrfs_end_transaction(trans);
+ return -ENOMEM;
+ }
+
+ btrfs_set_stack_stripe_extent_encoding(stripe_extent, encoding);
+ for (int i = 0; i < num_stripes; i++) {
+ u64 devid = bioc->stripes[i].dev->devid;
+ u64 physical = bioc->stripes[i].physical;
+ u64 length = bioc->stripes[i].length;
+ struct btrfs_raid_stride *raid_stride =
+ &stripe_extent->strides[i];
+
+ if (length == 0)
+ length = bioc->size;
+
+ btrfs_set_stack_raid_stride_devid(raid_stride, devid);
+ btrfs_set_stack_raid_stride_physical(raid_stride, physical);
+ btrfs_set_stack_raid_stride_length(raid_stride, length);
+ }
+
+ stripe_key.objectid = bioc->logical;
+ stripe_key.type = BTRFS_RAID_STRIPE_KEY;
+ stripe_key.offset = bioc->size;
+
+ ret = btrfs_insert_item(trans, stripe_root, &stripe_key, stripe_extent,
+ item_size);

Have you tested in near-real-world on how continous the RST items could
be for RAID0/RAID10?

My concern here is, we may want to try our best to reduce the size of
RST, due to the 64K BTRFS_STRIPE_LEN.


There are two things I can do for it. First is trying to merge contiguus
RST items

This is much easier, as the RST lookup code is already taking the length
into consideration, thus only the add path need some work.

Although I'm not sure how effective it would be in real world.
As if the merge rate is only 5%, then it barely makes a difference.

Maybe you don't need to implement a full merge in this version, but just
do some trace events to see the merge rate?

and second make BTRFS_STRIPE_LEN a mkfs time constant instead
of a compile time constant.

Please be very careful about this, we have quite some bitmap relying on
this. (IIRC RAID56 and scrub)

Currently unsigned long can only support up to 64 bits, thus the maximum
stripe length would be 256K, but I'm pretty sure there would be other
hidden traps somewhere else.

Otherwise the main workflow of RST looks good to me.

Thanks,
Qu

But both can be done in a second step.

+ switch (map_type & BTRFS_BLOCK_GROUP_PROFILE_MASK) {
+ case BTRFS_BLOCK_GROUP_DUP:
+ case BTRFS_BLOCK_GROUP_RAID1:
+ case BTRFS_BLOCK_GROUP_RAID1C3:
+ case BTRFS_BLOCK_GROUP_RAID1C4:
+ ret = btrfs_insert_mirrored_raid_extents(trans, ordered_extent,
+ map_type);
+ break;
+ case BTRFS_BLOCK_GROUP_RAID0:
+ ret = btrfs_insert_striped_raid_extents(trans, ordered_extent,
+ map_type);
+ break;
+ case BTRFS_BLOCK_GROUP_RAID10:
+ ret = btrfs_insert_striped_mirrored_raid_extents(trans, ordered_extent, map_type);
+ break;
+ default:
+ ret = -EINVAL;

Maybe we want to be a little more noisy?

OK.