Re: [PATCH v6 05/11] ext4: Separate block bitmap and buddy bitmap freeing in ext4_mb_clear_bb()

From: Kemeng Shi
Date: Tue Sep 12 2023 - 07:33:01 EST




on 9/12/2023 6:13 PM, Ritesh Harjani wrote:
> Kemeng Shi <shikemeng@xxxxxxxxxxxxxxx> writes:
>
>> on 9/4/2023 11:00 AM, Kemeng Shi wrote:
>>>
>>>
>>> on 9/1/2023 5:34 PM, Ritesh Harjani wrote:
>>>> Kemeng Shi <shikemeng@xxxxxxxxxxxxxxx> writes:
>>>>
>>>>> This patch separates block bitmap and buddy bitmap freeing in order to
>>>>> udpate block bitmap with ext4_mb_mark_context in following patch.
>>>> ^^^ update.
>>>>
>>>>>
>>>>> Separated freeing is safe with concurrent allocation as long as:
>>>>> 1. Firstly allocate block in buddy bitmap, and then in block bitmap.
>>>>> 2. Firstly free block in block bitmap, and then buddy bitmap.
>>>>> Then freed block will only be available to allocation when both buddy
>>>>> bitmap and block bitmap are updated by freeing.
>>>>> Allocation obeys rule 1 already, just do sperated freeing with rule 2.
>>>>
>>>> So we also don't need ext4_load_buddy_gfp() before freeing on-disk
>>>> bitmap right. Continue below...
>>>>
>>>>>
>>>>> Separated freeing has no race with generate_buddy as:
>>>>> Once ext4_mb_load_buddy_gfp is executed successfully, the update-to-date
>>>>> buddy page can be found in sbi->s_buddy_cache and no more buddy
>>>>> initialization of the buddy page will be executed concurrently until
>>>>> buddy page is unloaded. As we always do free in "load buddy, free,
>>>>> unload buddy" sequence, separated freeing has no race with generate_buddy.
>>>>>
>>>>> Signed-off-by: Kemeng Shi <shikemeng@xxxxxxxxxxxxxxx>
>>>>> ---
>>>>> fs/ext4/mballoc.c | 18 ++++++++----------
>>>>> 1 file changed, 8 insertions(+), 10 deletions(-)
>>>>>
>>>>> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
>>>>> index e650eac22237..01ad36a1cc96 100644
>>>>> --- a/fs/ext4/mballoc.c
>>>>> +++ b/fs/ext4/mballoc.c
>>>>> @@ -6519,6 +6519,14 @@ static void ext4_mb_clear_bb(handle_t *handle, struct inode *inode,
>>>>> if (err)
>>>>> goto error_return;
>>>>
>>>>
>>>> Let me add the a piece of code before the added changes and continue...
>>>>
>>>> err = ext4_mb_load_buddy_gfp(sb, block_group, &e4b,
>>>> GFP_NOFS|__GFP_NOFAIL);
>>>> if (err)
>>>> goto error_return;
>>>>>
>>>>> + ext4_lock_group(sb, block_group);
>>>>> + mb_clear_bits(bitmap_bh->b_data, bit, count_clusters);
>>>>> + ret = ext4_free_group_clusters(sb, gdp) + count_clusters;
>>>>> + ext4_free_group_clusters_set(sb, gdp, ret);
>>>>> + ext4_block_bitmap_csum_set(sb, gdp, bitmap_bh);
>>>>> + ext4_group_desc_csum_set(sb, block_group, gdp);
>>>>> + ext4_unlock_group(sb, block_group);
>>>>> +
>>>>
>>>> ...Is it required for ext4_mb_load_buddy_gfp() to be called before
>>>> freeing on-disk bitmap blocks? Can you explain why please?
>>>> At least it is not very clear to me that why do we need
>>>> ext4_mb_load_buddy_gfp() before clearing of bitmap blocks. If it's not
>>>> needed then I think we should separate out bitmap freeing logic and
>>>> buddy freeing logic even further.
>>> Yes, ext4_mb_load_buddy_gfp is no needed for clearing of bitmap, put
>
> Earlier you mentioned here that there is no dependency.
>
>
>>> it before bit clearing for catching error and aborting mark early
>>> to reduce functional change.
>> Hi Ritesh, sorry for bother. I'm going to push a new version and I perfer
>> to call ext4_mb_load_buddy_gfp early to catch potential error in advance.
>> Just wonder is this good to you. Like to hear any advice. Thanks!
>
> Isn't there actually a dependency that we should first always call
> ext4_mb_load_buddy_gfp() before clearing the bitmap?
>
> Because say if we take care of the bitmap handling first, i.e. we clear
> the bitmap and also call ext4_handle_dirty_metadata() first. Now assume
> we later call ext4_mb_load_buddy_gfp(). Now also assume the buddy pages
> were not already loaded in memory (reclaimed due to memory pressure),
> in that case we will read the block bitmap of that group from the
> on-disk bitmap copy, which we already changed above.
> So that means the buddy handling logic will already find the block
> bitmap as cleared and should report problems right?
>
> Then isn't there an actual dependency here, meaning
> ext4_mb_load_buddy_gfp() should always be called first before bitmap
> handling logic. Thoughts?
>
My fault, ext4_mb_load_buddy_gfp should be called before on-disk bitmap
handling if buddy page was not loaded...

> -ritesh
>
>
>>>>
>>>> Thoughts?
>>>>
>>>>> /*
>>>>> * We need to make sure we don't reuse the freed block until after the
>>>>> * transaction is committed. We make an exception if the inode is to be
>>>>> @@ -6541,13 +6549,8 @@ static void ext4_mb_clear_bb(handle_t *handle, struct inode *inode,
>>>>> new_entry->efd_tid = handle->h_transaction->t_tid;
>>>>>
>>>>> ext4_lock_group(sb, block_group);
>>>>> - mb_clear_bits(bitmap_bh->b_data, bit, count_clusters);
>>>>> ext4_mb_free_metadata(handle, &e4b, new_entry);
>>>>> } else {
>>>>> - /* need to update group_info->bb_free and bitmap
>>>>> - * with group lock held. generate_buddy look at
>>>>> - * them with group lock_held
>>>>> - */
>>>>> if (test_opt(sb, DISCARD)) {
>>>>> err = ext4_issue_discard(sb, block_group, bit,
>>>>> count_clusters, NULL);
>>>>> @@ -6560,14 +6563,9 @@ static void ext4_mb_clear_bb(handle_t *handle, struct inode *inode,
>>>>> EXT4_MB_GRP_CLEAR_TRIMMED(e4b.bd_info);
>>>>>
>>>>> ext4_lock_group(sb, block_group);
>>>>> - mb_clear_bits(bitmap_bh->b_data, bit, count_clusters);
>>>>> mb_free_blocks(inode, &e4b, bit, count_clusters);
>>>>> }
>>>>>
>>>>> - ret = ext4_free_group_clusters(sb, gdp) + count_clusters;
>>>>> - ext4_free_group_clusters_set(sb, gdp, ret);
>>>>> - ext4_block_bitmap_csum_set(sb, gdp, bitmap_bh);
>>>>> - ext4_group_desc_csum_set(sb, block_group, gdp);
>>>>> ext4_unlock_group(sb, block_group);
>>>>>
>>>>> if (sbi->s_log_groups_per_flex) {
>>>>
>>>>
>>>> Adding piece of code here...
>>>>
>>>> if (sbi->s_log_groups_per_flex) {
>>>> ext4_group_t flex_group = ext4_flex_group(sbi, block_group);
>>>> atomic64_add(count_clusters,
>>>> &sbi_array_rcu_deref(sbi, s_flex_groups,
>>>> flex_group)->free_clusters);
>>>> }
>>>>
>>>> <...>
>>>>
>>>> /* We dirtied the bitmap block */
>>>> BUFFER_TRACE(bitmap_bh, "dirtied bitmap block");
>>>> err = ext4_handle_dirty_metadata(handle, NULL, bitmap_bh);
>>>> /* And the group descriptor block */
>>>> BUFFER_TRACE(gd_bh, "dirtied group descriptor block");
>>>> ret = ext4_handle_dirty_metadata(handle, NULL, gd_bh);
>>>> if (!err)
>>>> err = ret;
>>>>
>>>> I was thinking even this can go around bitmap freeing logic above. So
>>>> the next patch becomes very clear that all of the bitmap freeing logic
>>>> is just simply moved into ext4_mb_mark_context() function.
>>>>
>>>> -ritesh
>>>>
>>>
>>>
>