Re: [PATCH] bcache: add cond_resched() in __bch_cache_cmp()

From: Coly Li
Date: Wed Aug 14 2019 - 12:50:14 EST


On 2019/8/14 10:23 äå, Heitor Alves de Siqueira wrote:
> Hi Coly,
>
> We've had users impacted by system stalls and were able to trace it back to the
> bcache priority_stats query. After investigating a bit further, it seems that
> the sorting step in the quantiles calculation can cause heavy CPU
> contention. This has a severe performance impact on any task that is running in
> the same CPU as the sysfs query, and caused issues even for non-bcache
> workloads.
>
> We did some test runs with fio to get a better picture of the impact on
> read/write workloads while a priority_stats query is running, and came up with
> some interesting results. The bucket locking doesn't seem to have that
> much performance impact even in full-write workloads, but the 'sort' can affect
> bcache device throughput and latency pretty heavily (and any other tasks that
> are "unlucky" to be scheduled together with it). In some of our tests, there was
> a performance reduction of almost 90% in random read IOPS to the bcache device
> (refer to the comparison graph at [0]). There's a few more details on the
> Launchpad bug [1] we've created to track this, together with the complete fio
> results + comparison graphs.
>
> The cond_resched() patch suggested by Shile Zhang actually improved performance
> a lot, and eliminated the stalls we've observed during the priority_stats
> query. Even though it may cause the sysfs query to take a bit longer, it seems
> like a decent tradeoff for general performance when running that query on a
> system under heavy load. It's also a cheap short-term solution until we can look
> into a more complex re-write of the priority_stats calculation (perhaps one that
> doesn't require the locking?). Could we revisit Shile's patch, and consider
> merging it?
>
> Thanks!
> Heitor
>
> [0] https://people.canonical.com/~halves/priority_stats/read/4k-iops-2Dsmooth.png
> [1] https://bugs.launchpad.net/bugs/1840043
>

Hi Heitor,

With your very detailed explanation I come to understand why people
cares about performance impact of pririty_stats. In the case of system
monitoring, how long priority_stats returns is less important than
overall system throughput.

Yes I agree with your opinion and the benchmark chart makes me confident
with the original patch. I will add this patch to v5.4 window with
tested-by: Heitor Alves de Siqueira <halves@xxxxxxxxxxxxx>

Thanks for your detailed information. And thanks for Shile Zhang
originally composing this patch.

--

Coly Li