Re: [PATCH] ext4: correct best extent lstart adjustment logic

From: Baokun Li
Date: Wed Jan 31 2024 - 22:31:25 EST


On 2024/1/31 20:46, Jan Kara wrote:
[Added Ojaswin to CC as an author of the discussed patch]

On Mon 22-01-24 20:33:32, Baokun Li wrote:
When yangerkun reviewed commit 93cdf49f6eca ("ext4: Fix best extent lstart
adjustment logic in ext4_mb_new_inode_pa()"), it was found that the best
extent did not completely cover the original request after adjusting the
best extent lstart in ext4_mb_new_inode_pa() as follows:

original request: 2/10(8)
normalized request: 0/64(64)
best extent: 0/9(9)

When we check if best ex can be kept at start of goal, ac_o_ex.fe_logical
is 2, which is less than the adjusted best extent logical end of 9, so we
think the adjustment is done. But obviously 0/9(9) doesn't cover 2/10(8),
so we should instead check whether the original request logical end is
less than or equal to the adjusted best extent logical end.
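
To make the counterexample concrete, here is a minimal user-space sketch of the lstart selection as it behaves before the patch. It is not the kernel code: struct simple_extent, simple_end() and pick_lstart_current() are hypothetical names, and it assumes one block per cluster so that the logical end is simply start + length.

```c
/* Hypothetical simplified extent: logical start block and length in blocks.
 * Assumption: one block per cluster, so no cluster-to-block conversion. */
struct simple_extent {
	long long logical;
	long long len;
};

/* Logical end (exclusive) of an extent under the assumption above. */
static long long simple_end(const struct simple_extent *ex)
{
	return ex->logical + ex->len;
}

/* Sketch of the pre-patch choice of lstart for a best extent of best_len
 * blocks, given the original request and the normalized goal range. */
static long long pick_lstart_current(const struct simple_extent *orig,
				     const struct simple_extent *goal,
				     long long best_len)
{
	struct simple_extent ex = { .logical = 0, .len = best_len };

	/* Case 1: keep best ex at end of goal if it still covers orig start. */
	ex.logical = simple_end(goal) - best_len;
	if (orig->logical >= ex.logical)
		return ex.logical;

	/* Case 2: keep best ex at start of goal if it covers orig start. */
	ex.logical = goal->logical;
	if (orig->logical < simple_end(&ex))
		return ex.logical;

	/* Case 3: keep best ex at start of the original request. */
	return orig->logical;
}
```

With orig = 2/10(8), goal = 0/64(64) and a best extent length of 9, pick_lstart_current() returns 0, i.e. best extent 0/9(9), whose logical end 9 is below the original logical end 10.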

Hello Jan,

Thanks for the detailed explanation! 😉

I'm sorry for the slightly delayed reply. Why do you think it is a problem if the
resulting extent doesn't cover the full original range?

We adjust lstart when ac_o_ex.fe_len < ac_b_ex.fe_len and
ac_b_ex.fe_len < ac->ac_orig_goal_len, in which case the length of
the allocation is greater than the length of the original request,
and we would normally assume that this allocation would satisfy
the request for the block allocation without the need for an
additional allocation.

     /* we can't allocate as much as normalizer wants.
      * so, found space must get proper lstart
      * to cover original request */

The comment in the code states that we need to "cover original
request", but the code below it does not actually do that. This puzzled
yangerkun, who came up with the counterexample above, which is why we
consider it a problem.

We must always
cover the first block of the original extent so that the allocation makes
forward progress. But otherwise we choose to align to the start / end of
the goal range to reduce fragmentation even if we don't cover the whole
requested range - the rest of the range will be covered by the next
allocation.

Totally agree. For the example above, if we end up with a total of 64
blocks, then the final extent distribution might look like this:

Before:  [0/9(9)], [9/64(55)]
Patched: [0/2(2)], [2/11(9)], [11/64(53)]

So the question is really whether we want fewer allocations now
or fewer fragments later.

Also there is a problem with trying to cover the whole original
range described in [1]. Essentially the goal range does not need to cover
the whole original range and if we try to align the allocated range to
cover the whole original range, it may result in exceeding the goal range
and thus overlapping preallocations and triggering asserts in the prealloc
code.

So if we decided we want to handle the case you describe in a better way,
we'd need something making sure we don't exceed the goal range.

Honza

[1] https://lore.kernel.org/all/Y+UzQJRIJEiAr4Z4@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/

goal_start          B    original_start   A              goal_end
  |-----------------|----------*----------|-----------------|
     best_ex_len                              best_ex_len

The current logic guarantees that the goal range will not be exceeded.
If original_start + best_ex_len > goal_end, then in case1 the ex_end
will be adjusted to align with the goal_end, and if the
goal_end < original_end, then another block allocation will be triggered,
which is fine. But in other cases, we can guarantee that the original
request will be covered by the adjusted best ex.

The problem is in case2: when we align ex_fe_start with goal_start, we
stop adjusting as soon as the extent contains original_start. It may
still not contain original_end, which triggers an additional block
allocation, whereas falling through to case3 would cover the entire
original request.
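
The effect of the changed case2 check can be sketched in user space as follows. This is only an illustration under stated assumptions, not the kernel code: struct simple_extent, simple_end() and pick_lstart_patched() are hypothetical names, one block per cluster is assumed, and best_len is assumed to be no larger than the goal length.

```c
/* Hypothetical simplified extent: logical start block and length in blocks.
 * Assumption: one block per cluster, so no cluster-to-block conversion. */
struct simple_extent {
	long long logical;
	long long len;
};

/* Logical end (exclusive) of an extent under the assumption above. */
static long long simple_end(const struct simple_extent *ex)
{
	return ex->logical + ex->len;
}

/* Sketch of the lstart choice with the patched case 2 check. */
static long long pick_lstart_patched(const struct simple_extent *orig,
				     const struct simple_extent *goal,
				     long long best_len)
{
	struct simple_extent ex = { .logical = 0, .len = best_len };
	long long o_ex_end = simple_end(orig);

	/* Case 1: keep best ex at end of goal if it still covers orig start. */
	ex.logical = simple_end(goal) - best_len;
	if (orig->logical >= ex.logical)
		return ex.logical;

	/* Case 2 (patched): keep best ex at start of goal only if it
	 * covers the original end, not just the original start. */
	ex.logical = goal->logical;
	if (o_ex_end <= simple_end(&ex))
		return ex.logical;

	/* Case 3: keep best ex at start of the original request. Reaching
	 * this point means case 1 failed, so orig->logical + best_len is
	 * below the goal end and the goal range is not exceeded. */
	return orig->logical;
}
```

For orig = 2/10(8), goal = 0/64(64) and a best extent length of 9, this returns 2, i.e. best extent 2/11(9), which covers the whole original request. For orig = 60/68(8) with the same goal, case 1 applies and the result is 55/64(9), which stays inside the goal range and leaves the tail of the request to a later allocation.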

In general, this patch will not cause the goal range to be exceeded.
Moreover, the best extent len is not modified during the adjustment
process, and it is already checked by the previous assertion, so replace
the check for fe_len with a check for the best extent logical end.

Cc: stable@xxxxxxxxxx
Fixes: 93cdf49f6eca ("ext4: Fix best extent lstart adjustment logic in ext4_mb_new_inode_pa()")
Signed-off-by: yangerkun <yangerkun@xxxxxxxxxx>
Signed-off-by: Baokun Li <libaokun1@xxxxxxxxxx>
---
fs/ext4/mballoc.c | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index f44f668e407f..fa5977fe8d72 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -5146,6 +5146,7 @@ ext4_mb_new_inode_pa(struct ext4_allocation_context *ac)
 			.fe_len = ac->ac_orig_goal_len,
 		};
 		loff_t orig_goal_end = extent_logical_end(sbi, &ex);
+		loff_t o_ex_end = extent_logical_end(sbi, &ac->ac_o_ex);
 
 		/* we can't allocate as much as normalizer wants.
 		 * so, found space must get proper lstart
@@ -5161,7 +5162,7 @@ ext4_mb_new_inode_pa(struct ext4_allocation_context *ac)
 		 * 1. Check if best ex can be kept at end of goal (before
 		 * cr_best_avail trimmed it) and still cover original start
 		 * 2. Else, check if best ex can be kept at start of goal and
-		 * still cover original start
+		 * still cover original end
 		 * 3. Else, keep the best ex at start of original request.
 		 */
 		ex.fe_len = ac->ac_b_ex.fe_len;
@@ -5171,7 +5172,7 @@ ext4_mb_new_inode_pa(struct ext4_allocation_context *ac)
 			goto adjust_bex;
 
 		ex.fe_logical = ac->ac_g_ex.fe_logical;
-		if (ac->ac_o_ex.fe_logical < extent_logical_end(sbi, &ex))
+		if (o_ex_end <= extent_logical_end(sbi, &ex))
 			goto adjust_bex;
 
 		ex.fe_logical = ac->ac_o_ex.fe_logical;
@@ -5179,7 +5180,7 @@ ext4_mb_new_inode_pa(struct ext4_allocation_context *ac)
 		ac->ac_b_ex.fe_logical = ex.fe_logical;
 
 		BUG_ON(ac->ac_o_ex.fe_logical < ac->ac_b_ex.fe_logical);
-		BUG_ON(ac->ac_o_ex.fe_len > ac->ac_b_ex.fe_len);
+		BUG_ON(o_ex_end > extent_logical_end(sbi, &ex));
 		BUG_ON(extent_logical_end(sbi, &ex) > orig_goal_end);
 	}
 
--
2.31.1

Cheers!
--
With Best Regards,
Baokun Li