Re: [PATCH v6 0/6] Optimize cgroup context switch

From: Ian Rogers
Date: Fri Feb 14 2020 - 14:32:26 EST


On a thread related to these patches, Peter had previously asked what
the performance numbers looked like. I've tested on Westmere and
Cascade Lake platforms. The benchmark is a set of processes in
different cgroups reading and writing to a file descriptor, where each
blocking read forces a context switch. To guarantee the context
switches, all of the processes are pinned to a single CPU, and the
benchmark verifies that the expected number of context switches
matches the number actually performed. The benchmark varies the number
of perf events and cgroups; it also looks at the effect of monitoring
just 1 cgroup out of an increasing set of cgroups.
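
The benchmark itself isn't included here, but as a rough sketch of the
approach (hypothetical code, not the actual benchmark; the cgroup and
perf event setup is assumed to happen separately, e.g. via cgroupfs
and perf_event_open):

/* Hypothetical sketch of the benchmark idea: two processes pinned to
 * the same CPU ping-pong a byte over a pair of pipes, so every
 * blocking read forces a context switch. */
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>

#define ITERS 100000

int main(void)
{
        int ping[2], pong[2];
        struct timespec s, e;
        cpu_set_t set;
        char b = 0;

        if (pipe(ping) || pipe(pong))
                return 1;
        /* Pin to one CPU; the child inherits the affinity. */
        CPU_ZERO(&set);
        CPU_SET(0, &set);
        if (sched_setaffinity(0, sizeof(set), &set))
                return 1;
        if (fork() == 0) {
                for (int i = 0; i < ITERS; i++) {
                        read(ping[0], &b, 1); /* block until parent writes */
                        write(pong[1], &b, 1);
                }
                return 0;
        }
        clock_gettime(CLOCK_MONOTONIC, &s);
        for (int i = 0; i < ITERS; i++) {
                write(ping[1], &b, 1);
                read(pong[0], &b, 1);         /* blocks: context switch */
        }
        clock_gettime(CLOCK_MONOTONIC, &e);
        /* Each iteration is two switches (parent->child->parent). */
        printf("%.1f us per context switch\n",
               ((e.tv_sec - s.tv_sec) * 1e9 + (e.tv_nsec - s.tv_nsec)) /
               (2.0 * ITERS) / 1e3);
        return 0;
}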

Before the patches on Westmere, if we do system-wide profiling of 10
events and then increase the number of cgroups to 208 while monitoring
just one, the context switch times go from 4.6us to 15.3us. If we
monitor every cgroup then the context switch times are 172.5us. With
the patches, the time for monitoring 1 cgroup goes from 4.6us to
14.9us, and when monitoring all cgroups the context switch times are
14.1us. The small speed up when monitoring 1 cgroup out of a set comes
from the O(n) search for an event's cgroup on most context switches
now being O(log(n)). When all cgroups are monitored, the number of
events in the kernel is the product of the number of events and the
number of cgroups, giving a larger value for 'n' and a more dramatic
speed up: 172.5us becomes 14.1us.
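
The O(log(n)) behaviour comes from the final patch in the series,
which adds the cgroup to the sort key of the per-CPU event groups RB
tree, so all events for one cgroup sit adjacent in the tree. As a
simplified, hypothetical illustration of that ordering (the real
comparator in kernel/events/core.c handles more state; here qsort
stands in for the tree):

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical, simplified event key: with the cgroup in the key, the
 * first event for a given cgroup is an O(log(n)) descent away instead
 * of an O(n) scan over all events. */
struct event_key {
        int cpu;
        uint64_t cgroup_id;   /* 0 for system-wide (non-cgroup) events */
        uint64_t group_index; /* insertion order, keeps keys unique */
};

static int event_key_cmp(const void *va, const void *vb)
{
        const struct event_key *a = va, *b = vb;

        if (a->cpu != b->cpu)
                return a->cpu < b->cpu ? -1 : 1;
        if (a->cgroup_id != b->cgroup_id)
                return a->cgroup_id < b->cgroup_id ? -1 : 1;
        if (a->group_index != b->group_index)
                return a->group_index < b->group_index ? -1 : 1;
        return 0;
}

int main(void)
{
        struct event_key k[] = {
                { 0, 42, 3 }, { 0, 7, 1 }, { 0, 42, 0 }, { 0, 0, 2 },
        };

        /* After sorting, events sharing a cgroup id are adjacent. */
        qsort(k, 4, sizeof(k[0]), event_key_cmp);
        for (int i = 0; i < 4; i++)
                printf("cpu=%d cgrp=%llu idx=%llu\n", k[i].cpu,
                       (unsigned long long)k[i].cgroup_id,
                       (unsigned long long)k[i].group_index);
        return 0;
}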

In summary, what we see for performance is that before the patches
context switch times are affected by the number of cgroups monitored,
while after the patches there is still a context switch cost to
monitoring events, but it is similar whether 1 or all cgroups are
being monitored. This fits the intuition of what the patches are
trying to do: avoid searching over events that belong to cgroups the
current task isn't within. The results are consistent but less
dramatic for smaller numbers of events and cgroups. We've not
identified a slowdown from the patches, but there is a degree of noise
in the timing data. Broadly, with turbo disabled on the test machines,
the patches make context switch performance the same or faster. For a
more representative number of events and cgroups, say 6 and 32, we see
the context switch time improve from 29.4us to 13.2us when all cgroups
are monitored.
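
For context on the mechanism, the cover letter quoted below has
sched_in merge the per-CPU and per-cgroup event lists in
visit_groups_merge() using a min-heap of iterators, so events are
visited in order without scanning lists that don't apply to the
current task. A hypothetical userspace sketch of that style of k-way
merge, simplified to arrays of ints standing in for group_index
values:

#include <stdio.h>

/* Each list contributes one iterator; the heap always exposes the
 * iterator with the smallest head, giving a k-way merge. */
struct iter {
        const int *pos, *end;
};

static void sift_down(struct iter *h, int n, int i)
{
        for (;;) {
                int l = 2 * i + 1, r = l + 1, s = i;
                struct iter tmp;

                if (l < n && *h[l].pos < *h[s].pos)
                        s = l;
                if (r < n && *h[r].pos < *h[s].pos)
                        s = r;
                if (s == i)
                        return;
                tmp = h[i];
                h[i] = h[s];
                h[s] = tmp;
                i = s;
        }
}

int main(void)
{
        /* Three event lists, e.g. per-CPU plus two cgroup levels. */
        const int cpu[] = { 1, 5, 9 }, cg[] = { 2, 3, 8 }, an[] = { 4, 7 };
        struct iter h[] = {
                { cpu, cpu + 3 }, { cg, cg + 3 }, { an, an + 2 },
        };
        int n = 3;

        for (int i = n / 2 - 1; i >= 0; i--)     /* heapify */
                sift_down(h, n, i);
        while (n) {
                printf("visit %d\n", *h[0].pos); /* smallest head */
                if (++h[0].pos == h[0].end)
                        h[0] = h[--n];           /* iterator exhausted */
                sift_down(h, n, 0);
        }
        return 0;
}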

Thanks,
Ian


On Thu, Feb 13, 2020 at 11:51 PM Ian Rogers <irogers@xxxxxxxxxx> wrote:
>
> Avoid iterating over all per-CPU events during cgroup changing context
> switches by organizing events by cgroup.
>
> To make an efficient set of iterators, introduce a min-heap utility
> with a test.
>
> The v6 patch set is smaller by 4 patches; it updates the cgroup id
> handling and fixes part of the min_heap rename from v5.
>
> The v5 patch set renames min_max_heap to min_heap as suggested by
> Peter Zijlstra; it also addresses comments around preferring
> __always_inline over inline.
>
> The v4 patch set addresses review comments on the v3 patch set by
> Peter Zijlstra.
>
> These patches include a caching algorithm to improve the search for
> the first event in a group by Kan Liang <kan.liang@xxxxxxxxxxxxxxx>,
> as well as rebasing his "optimize event_filter_match during sched_in"
> patch from https://lkml.org/lkml/2019/8/7/771.
>
> The v2 patch set was modified by Peter Zijlstra in his perf/cgroup
> branch:
> https://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git
>
> These patches follow Peter's reorganization and his fixes to the
> perf_cpu_context min_heap storage code.
>
> Ian Rogers (5):
> lib: introduce generic min-heap
> perf: Use min_heap in visit_groups_merge
> perf: Add per perf_cpu_context min_heap storage
> perf/cgroup: Grow per perf_cpu_context heap storage
> perf/cgroup: Order events in RB tree by cgroup id
>
> Peter Zijlstra (1):
> perf/cgroup: Reorder perf_cgroup_connect()
>
> include/linux/min_heap.h | 135 ++++++++++++++++++++
> include/linux/perf_event.h | 7 ++
> kernel/events/core.c | 251 +++++++++++++++++++++++++++++++------
> lib/Kconfig.debug | 10 ++
> lib/Makefile | 1 +
> lib/test_min_heap.c | 194 ++++++++++++++++++++++++++++
> 6 files changed, 563 insertions(+), 35 deletions(-)
> create mode 100644 include/linux/min_heap.h
> create mode 100644 lib/test_min_heap.c
>
> --
> 2.25.0.265.gbab2e86ba0-goog
>