On 2025/4/8 23:23, Heming Zhao wrote:
The commit 4eb7b93e0310 ("ocfs2: improve write IO performance when
fragmentation is high") introduced another regression.
The following ocfs2-test case can trigger this issue:
${RESV_UNWRITTEN_BIN} -f ${WORK_PLACE}/large_testfile -s 0 -l \
$((${FILE_MAJOR_SIZE_M}*1024*1024))
In my env, test disk size (by "fdisk -l <dev>"):
53687091200 bytes, 104857600 sectors.
Above command is:
/usr/local/ocfs2-test/bin/resv_unwritten -f \
/mnt/ocfs2/ocfs2-activate-discontig-bg-dir/large_testfile -s 0 -l \
53187969024
Error log:
[*] Reserve 50724M space for a LARGE file, reserve 200M space for future test.
ioctl error 28: "No space left on device"
resv allocation failed Unknown error -1
reserve unwritten region from 0 to 53187969024.
Call flow:
__ocfs2_change_file_space //by ioctl OCFS2_IOC_RESVSP64
ocfs2_allocate_unwritten_extents //start:0 len:53187969024
while()
+ ocfs2_get_clusters //cpos:0, alloc_size:1623168 (cluster number)
+ ocfs2_extend_allocation
+ ocfs2_lock_allocators
| + choose OCFS2_AC_USE_MAIN & ocfs2_cluster_group_search
|
+ ocfs2_add_inode_data
ocfs2_add_clusters_in_btree
__ocfs2_claim_clusters
ocfs2_claim_suballoc_bits
+ During the allocation of the final part of the large file
(around 47GB), no chain had the required contiguous
bits_wanted. Consequently, the allocation failed.
How to fix:
When OCFS2 is encountering fragmented allocation, the file system should
stop attempting bits_wanted contiguous allocation and instead provide the
largest available contiguous free bits from the cluster groups.
Signed-off-by: Heming Zhao <heming.zhao@xxxxxxxx>
Fixes: 4eb7b93e0310 ("ocfs2: improve write IO performance when fragmentation is high")
---
fs/ocfs2/suballoc.c | 36 +++++++++++++++++++++++++++++-------
1 file changed, 29 insertions(+), 7 deletions(-)
diff --git a/fs/ocfs2/suballoc.c b/fs/ocfs2/suballoc.c
index fde75f2af37a..2e1689fc6cf7 100644
--- a/fs/ocfs2/suballoc.c
+++ b/fs/ocfs2/suballoc.c
@@ -1796,6 +1796,7 @@ static int ocfs2_search_chain(struct ocfs2_alloc_context *ac,
{
int status;
u16 chain;
+ u32 contig_bits;
u64 next_group;
struct inode *alloc_inode = ac->ac_inode;
struct buffer_head *group_bh = NULL;
@@ -1821,10 +1822,23 @@ static int ocfs2_search_chain(struct ocfs2_alloc_context *ac,
status = -ENOSPC;
/* for now, the chain search is a bit simplistic. We just use
* the 1st group with any empty bits. */
- while ((status = ac->ac_group_search(alloc_inode, group_bh,
- bits_wanted, min_bits,
- ac->ac_max_block,
- res)) == -ENOSPC) {
+ while (1) {
+ if (ac->ac_which == OCFS2_AC_USE_MAIN_DISCONTIG) {
+ contig_bits = le16_to_cpu(bg->bg_contig_free_bits);
+ if (!contig_bits)
+ contig_bits = ocfs2_find_max_contig_free_bits(bg->bg_bitmap,
+ le16_to_cpu(bg->bg_bits), 0);
+ if (bits_wanted > contig_bits)
+ bits_wanted = contig_bits;
+ if (bits_wanted < min_bits)
+ bits_wanted = min_bits;
This seems only valid when bits_wanted changed?
BTW, the previous fix hasn't been merged yet:
https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/fs/ocfs2?h=next-20250411&id=767ba8b7cba3ca7a560d632f267f7aea35d54810
So should we drop that first and fold it into a whole one?