Re: [PATCH v5 1/3] btrfs: don't try to relocate the data relocation block-group
From: Naohiro Aota
Date: Wed May 29 2024 - 00:48:30 EST
On Fri, May 24, 2024 at 06:29:09PM GMT, Johannes Thumshirn wrote:
> From: Johannes Thumshirn <johannes.thumshirn@xxxxxxx>
>
> When relocating block-groups, either via auto reclaim or manual
> balance, skip the data relocation block-group.
>
> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@xxxxxxx>
> ---
Thinking of this patch gave me one question. What happens when we manually
relocate a data relocation block group with a RAID-type conversion?
prepare_allocation_zoned() give the data_reloc_bg hint which points to
pre-conversion RAID level. That block group is rejected because of
ffe_ctl->flags mismatch. And, find_free_extent() begins its loop with
ffe_ctl->index = (post-conversion RAID index). All the BGs with that RAID
level are rejected because they are not fs_info->data_reloc_bg.
When ffe_ctl->index goes to the pre-conversion RAID index, the data
relocation BG could be selected. But, as we reject a SINGLE BG for RAID
(DUP) allocation, that won't work for SINGLE to RAID conversion.
The loop will eventually end up allocating a new chunk with a
post-conversion RAID level. However, it is still not eligible for an
allocation because it is not fs_info->data_reloc_bg.
This question may be out of the scope of this patch. But, that scenario
should be addressed in this series or another series.
Considering this scenario, I'm not sure just skipping a data relocation BG
is a good solution. It will make it impossible to convert a data relocation
BG.
> fs/btrfs/block-group.c | 2 ++
> fs/btrfs/relocation.c | 7 +++++++
> fs/btrfs/volumes.c | 2 ++
> 3 files changed, 11 insertions(+)
>
> diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
> index 9910bae89966..9a01bbad45f6 100644
> --- a/fs/btrfs/block-group.c
> +++ b/fs/btrfs/block-group.c
> @@ -1921,6 +1921,8 @@ void btrfs_reclaim_bgs_work(struct work_struct *work)
> div64_u64(zone_unusable * 100, bg->length));
> trace_btrfs_reclaim_block_group(bg);
> ret = btrfs_relocate_chunk(fs_info, bg->start);
> + if (ret == -EBUSY)
> + ret = 0;
> if (ret) {
> btrfs_dec_block_group_ro(bg);
> btrfs_err(fs_info, "error relocating chunk %llu",
> diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
> index 5f1a909a1d91..39e2db9af64f 100644
> --- a/fs/btrfs/relocation.c
> +++ b/fs/btrfs/relocation.c
> @@ -4037,6 +4037,13 @@ int btrfs_relocate_block_group(struct btrfs_fs_info *fs_info, u64 group_start)
> int rw = 0;
> int err = 0;
>
> + spin_lock(&fs_info->relocation_bg_lock);
> + if (group_start == fs_info->data_reloc_bg) {
> + spin_unlock(&fs_info->relocation_bg_lock);
> + return -EBUSY;
> + }
> + spin_unlock(&fs_info->relocation_bg_lock);
> +
> /*
> * This only gets set if we had a half-deleted snapshot on mount. We
> * cannot allow relocation to start while we're still trying to clean up
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index 3f70f727dacf..75da3a32885b 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -3367,6 +3367,8 @@ int btrfs_relocate_chunk(struct btrfs_fs_info *fs_info, u64 chunk_offset)
> btrfs_scrub_pause(fs_info);
> ret = btrfs_relocate_block_group(fs_info, chunk_offset);
> btrfs_scrub_continue(fs_info);
> + if (ret == -EBUSY)
> + return 0;
> if (ret) {
> /*
> * If we had a transaction abort, stop all running scrubs.
>
> --
> 2.43.0
>