Re: [PATCH] bcache: add cond_resched() in __bch_cache_cmp()

From: Coly Li
Date: Thu Mar 07 2019 - 21:21:33 EST


On 2019/3/8 8:35 äå, Shile Zhang wrote:
>
> On 2019/3/7 23:44, Vojtech Pavlik wrote:
>> On Thu, Mar 07, 2019 at 11:36:18PM +0800, Coly Li wrote:
>>> On 2019/3/7 11:06 äå, Shile Zhang wrote:
>>>> On 2019/3/7 18:34, Coly Li wrote:
>>>>> On 2019/3/7 1:15 äå, shile.zhang@xxxxxxxxxxxxxxxxx wrote:
>>>>>> From: Shile Zhang <shile.zhang@xxxxxxxxxxxxxxxxx>
>>>>>>
>>>>>> Read /sys/fs/bcache/<uuid>/cacheN/priority_stats can take very long
>>>>>> time with huge cache after long run.
>>>>>>
>>>>>> Signed-off-by: Shile Zhang <shile.zhang@xxxxxxxxxxxxxxxxx>
>>>>> Hi Shile,
>>>>>
>>>>> Do you test your change ? It will be helpful with more performance
>>>>> data
>>>>> (what problem that you improved).
>>>> In case of 960GB SSD cache device, once read of the 'priority_stats'
>>>> costs about 600ms in our test environment.
>>>>
>>> After the fix, how much time it takes ?
>>>
>>>
>>>> The perf tool shown that near 50% CPU time consumed by 'sort()', this
>>>> means once sort will hold the CPU near 300ms.
>>>>
>>>> In our case, the statistics collector reads the 'priority_stats'
>>>> periodically, it will trigger the schedule latency jitters of the
>>>>
>>>> task which shared same CPU core.
>>>>
>>> Hmm, it seems you just make the sort slower, and nothing more changes.
>>> Am I right ?
>> Well, it has to make the sort slower, but it'll also avoid hogging the
>> CPU (on a non-preemptible kernel), avoiding a potential soft lockup
>> warning and allowing other tasks to run.
>>
> Yes, there is a risk that other tasks have no chance to run due to sort
> hogging the CPU, it is harmful to some schedule-latency sensitive tasks.
> This change just try to reduce the impact of sort, but not a performance
> improvement of it. I'm not sure if a better way can handle it more
> efficiency.
I know the above concept, since I would expect when people talking about
performance improvement, it would be better to provide real performance
number. Under some conditions it might be hard, but in this exact
example, it won't.

Could you please provide some data that on your configuration, how slow
'sort' becomes ?

AND, reading priority_stats in period for performance monitoring is not
good idea. The problem is not from 'sort', it is from
mutex_lock(&ca->set->bucket_lock) lines above the sort in sysfs.c, when
you have a quite large or very busy cache device, you may see normal I/O
performance drops due to too much time holding bucket_lock here.

Anyway, this is just my suggestion. Back to this patch, please provide
performance number.

Thanks.

Coly Li