Re: [RFC][PATCH] perf: Rewrite core context handling

From: Alexey Budankov
Date: Mon Oct 15 2018 - 03:26:14 EST


Hi,

On 10.10.2018 13:45, Peter Zijlstra wrote:
> Hi all,
>
> There have been various issues and limitations with the way perf uses
> (task) contexts to track events. Most notable is the single hardware PMU
> task context, which has resulted in a number of yucky things (both
> proposed and merged).
>
> Notably:
>
> - HW breakpoint PMU
> - ARM big.little PMU
> - Intel Branch Monitoring PMU
>
> Since we now track the events in RB trees, we can 'simply' add a pmu
> order to them and have them grouped that way, reducing to a single
> context. Of course, reality never quite works out that simple, and below
> ends up adding an intermediate data structure to bridge the context ->
> pmu mapping.
>
> Something a little like:
>
>              ,------------------------[1:n]---------------------.
>              V                                                  V
>     perf_event_context <-[1:n]-> perf_event_pmu_context <--- perf_event
>              ^                      ^     |                     |
>              `--------[1:n]---------'     `-[n:1]-> pmu <-[1:n]-'
>
> This patch builds (provided you disable CGROUP_PERF), boots and survives
> perf-top without the machine catching fire.
>
> There's still a fair bit of loose ends (look for XXX), but I think this
> is the direction we should be going.
>
> Comments?
>
> Not-Quite-Signed-off-by: Peter Zijlstra (Intel) <peterz@xxxxxxxxxxxxx>
> ---
> arch/powerpc/perf/core-book3s.c | 4
> arch/x86/events/core.c | 4
> arch/x86/events/intel/core.c | 6
> arch/x86/events/intel/ds.c | 6
> arch/x86/events/intel/lbr.c | 16
> arch/x86/events/perf_event.h | 6
> include/linux/perf_event.h | 80 +-
> include/linux/sched.h | 2
> kernel/events/core.c | 1412 ++++++++++++++++++++--------------------
> 9 files changed, 815 insertions(+), 721 deletions(-)

The rewrite is impressive; however, as it stands it doesn't reduce the code base
(815 insertions vs. 721 deletions). Nonetheless, there is clear demand for per-pmu
event group tracking and rotation in a single cpu context (HW breakpoints,
ARM big.little, Intel LBRs), and groups ordering on the RB-tree can supply it.
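
To make the per-pmu ordering idea concrete, here is a rough userspace sketch of
how groups could be keyed. It is only an illustration under assumptions, not the
code from the patch: struct demo_group, its pmu_order field and the demo_*
helpers are hypothetical names, and a sorted array with qsort() stands in for
the RB-tree, which would maintain the same order incrementally.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-in for a group leader (not struct perf_event). */
struct demo_group {
	int cpu;                   /* -1: task event that may run on any cpu */
	int pmu_order;             /* assumed: small integer identifying the pmu */
	unsigned long group_index; /* insertion order within the tree */
};

/* Three-way compare on the assumed key (cpu, pmu_order, group_index). */
static int demo_group_cmp(const void *a, const void *b)
{
	const struct demo_group *l = a, *r = b;

	if (l->cpu != r->cpu)
		return l->cpu < r->cpu ? -1 : 1;
	if (l->pmu_order != r->pmu_order)
		return l->pmu_order < r->pmu_order ? -1 : 1;
	if (l->group_index != r->group_index)
		return l->group_index < r->group_index ? -1 : 1;
	return 0;
}

/* Put the groups into key order, so that groups of one pmu are adjacent. */
static void demo_sort_groups(struct demo_group *g, size_t n)
{
	assert(g != NULL || n == 0);
	if (n == 0)
		return;
	qsort(g, n, sizeof(*g), demo_group_cmp);
}

/*
 * Index of the first group for (cpu, pmu_order) in a sorted array, or n if
 * there is none. A tree lookup would find this position in O(log n); the
 * linear scan here only keeps the sketch short.
 */
static size_t demo_first_for_pmu(const struct demo_group *g, size_t n,
				 int cpu, int pmu_order)
{
	for (size_t i = 0; i < n; i++)
		if (g[i].cpu == cpu && g[i].pmu_order == pmu_order)
			return i;
	return n;
}
```

With such a key, all groups of one pmu on one cpu form a contiguous run, so
scheduling or rotating them per pmu becomes a walk over that run.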

This might be driven into the kernel either by new perf features that build on
that RB-tree groups ordering, or by refactoring the existing code, but in a way
that reduces the overall code base and thus lowers the support cost.

Thanks,
Alexey