Re: [PATCH v6 05/11] ext4: Separate block bitmap and buddy bitmap freeing in ext4_mb_clear_bb()

From: Kemeng Shi
Date: Sun Sep 03 2023 - 23:00:36 EST




on 9/1/2023 5:34 PM, Ritesh Harjani wrote:
> Kemeng Shi <shikemeng@xxxxxxxxxxxxxxx> writes:
>
>> This patch separates block bitmap and buddy bitmap freeing in order to
>> udpate block bitmap with ext4_mb_mark_context in following patch.
> ^^^ update.
>
>>
>> Separated freeing is safe with concurrent allocation as long as:
>> 1. Firstly allocate block in buddy bitmap, and then in block bitmap.
>> 2. Firstly free block in block bitmap, and then buddy bitmap.
>> Then freed block will only be available to allocation when both buddy
>> bitmap and block bitmap are updated by freeing.
>> Allocation obeys rule 1 already, just do sperated freeing with rule 2.
>
> So we also don't need ext4_load_buddy_gfp() before freeing on-disk
> bitmap right. Continue below...
>
>>
>> Separated freeing has no race with generate_buddy as:
>> Once ext4_mb_load_buddy_gfp is executed successfully, the update-to-date
>> buddy page can be found in sbi->s_buddy_cache and no more buddy
>> initialization of the buddy page will be executed concurrently until
>> buddy page is unloaded. As we always do free in "load buddy, free,
>> unload buddy" sequence, separated freeing has no race with generate_buddy.
>>
>> Signed-off-by: Kemeng Shi <shikemeng@xxxxxxxxxxxxxxx>
>> ---
>> fs/ext4/mballoc.c | 18 ++++++++----------
>> 1 file changed, 8 insertions(+), 10 deletions(-)
>>
>> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
>> index e650eac22237..01ad36a1cc96 100644
>> --- a/fs/ext4/mballoc.c
>> +++ b/fs/ext4/mballoc.c
>> @@ -6519,6 +6519,14 @@ static void ext4_mb_clear_bb(handle_t *handle, struct inode *inode,
>> if (err)
>> goto error_return;
>
>
> Let me add the a piece of code before the added changes and continue...
>
> err = ext4_mb_load_buddy_gfp(sb, block_group, &e4b,
> GFP_NOFS|__GFP_NOFAIL);
> if (err)
> goto error_return;
>>
>> + ext4_lock_group(sb, block_group);
>> + mb_clear_bits(bitmap_bh->b_data, bit, count_clusters);
>> + ret = ext4_free_group_clusters(sb, gdp) + count_clusters;
>> + ext4_free_group_clusters_set(sb, gdp, ret);
>> + ext4_block_bitmap_csum_set(sb, gdp, bitmap_bh);
>> + ext4_group_desc_csum_set(sb, block_group, gdp);
>> + ext4_unlock_group(sb, block_group);
>> +
>
> ...Is it required for ext4_mb_load_buddy_gfp() to be called before
> freeing on-disk bitmap blocks? Can you explain why please?
> At least it is not very clear to me that why do we need
> ext4_mb_load_buddy_gfp() before clearing of bitmap blocks. If it's not
> needed then I think we should separate out bitmap freeing logic and
> buddy freeing logic even further.
Yes, ext4_mb_load_buddy_gfp is no needed for clearing of bitmap, put
it before bit clearing for catching error and aborting mark early
to reduce functional change.
>
> Thoughts?
>
>> /*
>> * We need to make sure we don't reuse the freed block until after the
>> * transaction is committed. We make an exception if the inode is to be
>> @@ -6541,13 +6549,8 @@ static void ext4_mb_clear_bb(handle_t *handle, struct inode *inode,
>> new_entry->efd_tid = handle->h_transaction->t_tid;
>>
>> ext4_lock_group(sb, block_group);
>> - mb_clear_bits(bitmap_bh->b_data, bit, count_clusters);
>> ext4_mb_free_metadata(handle, &e4b, new_entry);
>> } else {
>> - /* need to update group_info->bb_free and bitmap
>> - * with group lock held. generate_buddy look at
>> - * them with group lock_held
>> - */
>> if (test_opt(sb, DISCARD)) {
>> err = ext4_issue_discard(sb, block_group, bit,
>> count_clusters, NULL);
>> @@ -6560,14 +6563,9 @@ static void ext4_mb_clear_bb(handle_t *handle, struct inode *inode,
>> EXT4_MB_GRP_CLEAR_TRIMMED(e4b.bd_info);
>>
>> ext4_lock_group(sb, block_group);
>> - mb_clear_bits(bitmap_bh->b_data, bit, count_clusters);
>> mb_free_blocks(inode, &e4b, bit, count_clusters);
>> }
>>
>> - ret = ext4_free_group_clusters(sb, gdp) + count_clusters;
>> - ext4_free_group_clusters_set(sb, gdp, ret);
>> - ext4_block_bitmap_csum_set(sb, gdp, bitmap_bh);
>> - ext4_group_desc_csum_set(sb, block_group, gdp);
>> ext4_unlock_group(sb, block_group);
>>
>> if (sbi->s_log_groups_per_flex) {
>
>
> Adding piece of code here...
>
> if (sbi->s_log_groups_per_flex) {
> ext4_group_t flex_group = ext4_flex_group(sbi, block_group);
> atomic64_add(count_clusters,
> &sbi_array_rcu_deref(sbi, s_flex_groups,
> flex_group)->free_clusters);
> }
>
> <...>
>
> /* We dirtied the bitmap block */
> BUFFER_TRACE(bitmap_bh, "dirtied bitmap block");
> err = ext4_handle_dirty_metadata(handle, NULL, bitmap_bh);
> /* And the group descriptor block */
> BUFFER_TRACE(gd_bh, "dirtied group descriptor block");
> ret = ext4_handle_dirty_metadata(handle, NULL, gd_bh);
> if (!err)
> err = ret;
>
> I was thinking even this can go around bitmap freeing logic above. So
> the next patch becomes very clear that all of the bitmap freeing logic
> is just simply moved into ext4_mb_mark_context() function.
>
> -ritesh
>