Re: [PATCH v4 2/3] btrfs: Split remaining space to discard in chunks

From: Qu Wenruo
Date: Mon Sep 16 2024 - 06:39:20 EST




On 2024/9/16 19:46, Luca Stefani wrote:
Per Qu Wenruo, on a very large and mostly empty disk, e.g. an 8TiB device,
we do split the discard according to our super block locations, but the
last super block ends at 256G, so we can still submit one huge discard
for the range [256G, 8T), causing a very long delay.

We now split the space left to discard into pieces of at most
BTRFS_MAX_DATA_CHUNK_SIZE, in preparation for handling cancellation signals.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=219180
Link: https://bugzilla.suse.com/show_bug.cgi?id=1229737
Signed-off-by: Luca Stefani <luca.stefani.ge1@xxxxxxxxx>
---
fs/btrfs/extent-tree.c | 24 +++++++++++++++++++-----
1 file changed, 19 insertions(+), 5 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index a5966324607d..cbe66d0acff8 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -1239,7 +1239,7 @@ static int btrfs_issue_discard(struct block_device *bdev, u64 start, u64 len,
u64 *discarded_bytes)
{
int j, ret = 0;
- u64 bytes_left, end;
+ u64 bytes_left, bytes_to_discard, end;
u64 aligned_start = ALIGN(start, 1 << SECTOR_SHIFT);
/* Adjust the range to be aligned to 512B sectors if necessary. */
@@ -1300,13 +1300,27 @@ static int btrfs_issue_discard(struct block_device *bdev, u64 start, u64 len,
bytes_left = end - start;
}
- if (bytes_left) {
+ while (bytes_left) {
+ if (bytes_left > BTRFS_MAX_DATA_CHUNK_SIZE)
+ bytes_to_discard = BTRFS_MAX_DATA_CHUNK_SIZE;

That MAX_DATA_CHUNK_SIZE is only possible for RAID0/RAID10/RAID5/RAID6, by spanning the device extents across multiple devices.

For each device, the maximum size is limited to 1G (check init_alloc_chunk_ctl_policy_regular()).

So you can just limit it to 1G instead.
(If you want, you can also extract that into a macro as a cleanup).

Furthermore, you can use min() instead of an if ().

So you only need:

bytes_to_discard = min(SZ_1G, bytes_left);

Otherwise this looks good enough to me.
If the 1G size is not good enough, we can later tune it to smaller values.

Personally speaking I think 1G would be enough.

Thanks,
Qu
+ else
+ bytes_to_discard = bytes_left;
+
ret = blkdev_issue_discard(bdev, start >> SECTOR_SHIFT,
- bytes_left >> SECTOR_SHIFT,
+ bytes_to_discard >> SECTOR_SHIFT,
GFP_NOFS);
- if (!ret)
- *discarded_bytes += bytes_left;
+
+ if (ret) {
+ if (ret != -EOPNOTSUPP)
+ break;
+ continue;
+ }
+
+ start += bytes_to_discard;
+ bytes_left -= bytes_to_discard;
+ *discarded_bytes += bytes_to_discard;
}
+
return ret;
}
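
For reference, the chunking discussed above can be sketched in plain userspace C. This is only an illustration under stated assumptions, not the btrfs code: discard_fn, discard_in_chunks(), count_calls() and SKETCH_SZ_1G are hypothetical names, and the callback stands in for blkdev_issue_discard(). Unlike the patch, the sketch stops on any error, including the "not supported" case.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_SZ_1G (1ULL << 30)	/* stand-in for the kernel's SZ_1G */

/* Hypothetical callback standing in for blkdev_issue_discard(); 0 on success. */
typedef int (*discard_fn)(uint64_t start, uint64_t len, void *ctx);

static inline uint64_t min_u64(uint64_t a, uint64_t b)
{
	return a < b ? a : b;
}

/*
 * Discard [start, start + len) in pieces of at most max_chunk bytes.
 * Stops at the first error and returns it; *discarded is advanced only
 * for pieces that succeeded.
 */
static int discard_in_chunks(uint64_t start, uint64_t len, uint64_t max_chunk,
			     discard_fn fn, void *ctx, uint64_t *discarded)
{
	int ret = 0;

	assert(max_chunk > 0);	/* a zero chunk size would never make progress */

	while (len) {
		uint64_t chunk = min_u64(max_chunk, len);

		ret = fn(start, chunk, ctx);
		if (ret)
			break;

		start += chunk;
		len -= chunk;
		*discarded += chunk;
	}
	return ret;
}

/* Example callback: only counts how many requests were issued. */
static int count_calls(uint64_t start, uint64_t len, void *ctx)
{
	(void)start;
	(void)len;
	++*(uint64_t *)ctx;
	return 0;
}
```

With a 1G cap, the [256G, 8T) range from the commit message becomes 7936 separate requests instead of one, which gives the loop a natural place to check for a pending signal between requests.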