Re: [PATCH V3 4/5] ext4: get discard out of jbd2 commit kthread contex
From: Wang Jianchao
Date: Thu Aug 26 2021 - 03:16:11 EST
On 2021/8/4 11:45 PM, Jan Kara wrote:
> On Sat 24-07-21 15:41:23, Wang Jianchao wrote:
>> From: Wang Jianchao <wangjianchao@xxxxxxxxxxxx>
>>
>> Right now, discard is issued and waited to be completed in jbd2
>> commit kthread context after the logs are committed. When large
>> amount of files are deleted and discard is flooding, jbd2 commit
>> kthread can be blocked for long time. Then all of the metadata
>> operations can be blocked to wait the log space.
>>
>> One case is the page fault path with read mm->mmap_sem held, which
>> wants to update the file time but has to wait for the log space.
>> When other threads in the task wants to do mmap, then write mmap_sem
>> is blocked. Finally all of the following read mmap_sem requirements
>> are blocked, even the ps command which need to read the /proc/pid/
>> -cmdline. Our monitor service which needs to read /proc/pid/cmdline
>> used to be blocked for 5 mins.
>>
>> This patch frees the blocks back to buddy after commit and then do
>> discard in a async kworker context in fstrim fashion, namely,
>> - mark blocks to be discarded as used if they have not been allocated
>> - do discard
>> - mark them free
>> After this, jbd2 commit kthread won't be blocked any more by discard
>> and we won't get NOSPC even if the discard is slow or throttled.
>>
>> Link: https://marc.info/?l=linux-kernel&m=162143690731901&w=2
>> Suggested-by: Theodore Ts'o <tytso@xxxxxxx>
>> Signed-off-by: Wang Jianchao <wangjianchao@xxxxxxxxxxxx>
>
> Looks good to me. Just one small comment below. With that addressed feel
> free to add:
>
> Reviewed-by: Jan Kara <jack@xxxxxxx>
>
>
>> @@ -3474,6 +3530,14 @@ int ext4_mb_release(struct super_block *sb)
>> struct kmem_cache *cachep = get_groupinfo_cache(sb->s_blocksize_bits);
>> int count;
>>
>> + if (test_opt(sb, DISCARD)) {
>> + /*
>> + * wait the discard work to drain all of ext4_free_data
>> + */
>> + queue_work(ext4_discard_wq, &sbi->s_discard_work);
>
> Do we really need to queue the work here? The filesystem should be
> quiescent by now, we take care to queue the work whenever we add item to
> empty list. So it should be enough to have flush_work() here and then
> possibly
>
> WARN_ON_ONCE(!list_empty(&sbi->s_discard_list))
>
> Or am I missing something?
queue_work here is indeed redundant.
Thanks so much for you point out this.
Jianchao
>
> Honza
>
>> + flush_work(&sbi->s_discard_work);
>> + }
>> +