Re: [PATCH v2 0/3] perf-stat: share hardware PMCs with BPF

From: Arnaldo Carvalho de Melo
Date: Wed Mar 17 2021 - 09:12:25 EST


Em Wed, Mar 17, 2021 at 02:29:28PM +0900, Namhyung Kim escreveu:
> Hi Song,
>
> On Wed, Mar 17, 2021 at 6:18 AM Song Liu <songliubraving@xxxxxx> wrote:
> >
> > perf uses performance monitoring counters (PMCs) to monitor system
> > performance. The PMCs are limited hardware resources. For example,
> > Intel CPUs have three fixed and four programmable PMCs per CPU.
> >
> > Modern data center systems use these PMCs in many different ways:
> > system level monitoring, (maybe nested) container level monitoring, per
> > process monitoring, profiling (in sample mode), etc. In some cases,
> > there are more active perf_events than available hardware PMCs. To allow
> > all perf_events to have a chance to run, it is necessary to do expensive
> > time multiplexing of events.
> >
> > On the other hand, many monitoring tools count the common metrics (cycles,
> > instructions). It is a waste to have multiple tools create multiple
> > perf_events of "cycles" and occupy multiple PMCs.
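
As an aside on the multiplexing cost mentioned above: when events are time
multiplexed, perf scales each count by time_enabled/time_running, so the
reported numbers become estimates rather than exact counts. A minimal sketch
of that scaling (the helper name is illustrative, not a perf API):

```python
# Sketch of the standard perf scaling applied to a multiplexed count.
# scale_count() is an illustrative helper, not a perf API.
def scale_count(raw_count, time_enabled, time_running):
    # When an event only ran for part of the time it was enabled,
    # perf extrapolates: count * time_enabled / time_running.
    if time_running == 0:
        return 0
    return round(raw_count * time_enabled / time_running)

# An event that ran for 25% of its enabled time: the reported count
# is extrapolated (an estimate), not directly measured.
print(scale_count(1_000_000, 100, 25))  # -> 4000000
```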
>
> Right, it'd be really helpful when the PMCs are frequently or mostly shared.
> But it'd also increase the overhead for uncontended cases as BPF programs
> need to run on every context switch. Depending on the workload, it may
> cause a non-negligible performance impact. So users should be aware of it.

Would be interesting to, humm, measure both cases to have a firm number
for the impact: how many instructions are added when sharing using
--bpf-counters?

I.e. compare the "expensive time multiplexing of events" with its
avoidance by using --bpf-counters.
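
One way to sketch such a comparison (assuming a perf build with
--bpf-counters support on PATH; the events, workload, and fallback here are
illustrative, and timing the run only captures wall clock, not the added
instruction count):

```python
# Sketch: run the same workload under normal counting and under
# --bpf-counters, and compare wall-clock time of the two sessions.
# Assumes a perf binary with --bpf-counters; falls back gracefully if not.
import shutil
import subprocess
import time

def run_stat(extra_args):
    # Count a trivial workload; cycles/instructions are the shared events.
    cmd = ["perf", "stat", *extra_args,
           "-e", "cycles,instructions", "sleep", "1"]
    t0 = time.monotonic()
    subprocess.run(cmd, capture_output=True)
    return time.monotonic() - t0

if shutil.which("perf"):
    plain = run_stat([])
    shared = run_stat(["--bpf-counters"])
    print(f"plain: {plain:.3f}s  bpf-counters: {shared:.3f}s")
else:
    print("perf not found; skipping")
```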

Song, have you performed such measurements?

- Arnaldo

> Thanks,
> Namhyung
>
> >
> > bperf tries to reduce such waste by allowing multiple perf_events of
> > "cycles" or "instructions" (at different scopes) to share PMCs. Instead
> > of having each perf-stat session read its own perf_events, bperf uses
> > BPF programs to read the perf_events and aggregate readings to BPF maps.
> > Then, the perf-stat session(s) reads the values from these BPF maps.
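
A toy model of that leader/follower split (plain Python standing in for the
BPF programs and maps; all names here are illustrative, not the code in this
series):

```python
# Toy model of bperf's design: one "leader" owns the hardware counter and
# aggregates readings into a shared map; each "follower" (a perf-stat
# session) computes its own delta from that map instead of owning a PMC.

class Leader:
    """Owns the (simulated) PMC and aggregates readings into a map."""
    def __init__(self):
        self.shared_map = {"cycles": 0}   # stands in for a BPF map
        self._hw = 0                      # stands in for the real PMC

    def on_context_switch(self, cycles_elapsed):
        # The real leader BPF program runs on events such as context
        # switch and folds the PMC delta into the shared map.
        self._hw += cycles_elapsed
        self.shared_map["cycles"] = self._hw

class Follower:
    """A perf-stat session reading the shared map, not its own PMC."""
    def __init__(self, leader):
        self.leader = leader
        self.start = leader.shared_map["cycles"]

    def read(self):
        return self.leader.shared_map["cycles"] - self.start

leader = Leader()
leader.on_context_switch(1000)
s1 = Follower(leader)          # first session starts counting here
leader.on_context_switch(500)
s2 = Follower(leader)          # second session shares the same counter
leader.on_context_switch(250)
print(s1.read(), s2.read())    # -> 750 250
```

The point of the split is that only the leader touches the PMC, so any
number of sessions can count "cycles" concurrently without multiplexing.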
> >
> > Changes v1 => v2:
> > 1. Add documentation.
> > 2. Add a shell test.
> > 3. Rename options, default path of the attr-map, and some variables.
> > 4. Add a separate patch that moves clock_gettime() in __run_perf_stat()
> > to after enable_counters().
> > 5. Make perf_cpu_map for all cpus a global variable.
> > 6. Use sysfs__mountpoint() for default attr-map path.
> > 7. Use cpu__max_cpu() instead of libbpf_num_possible_cpus().
> > 8. Add flag "enabled" to the follower program. Then move follower attach
> > to bperf__load() and simplify bperf__enable().
> >
> > Song Liu (3):
> > perf-stat: introduce bperf, share hardware PMCs with BPF
> > perf-stat: measure t0 and ref_time after enable_counters()
> > perf-test: add a test for perf-stat --bpf-counters option
> >
> > tools/perf/Documentation/perf-stat.txt | 11 +
> > tools/perf/Makefile.perf | 1 +
> > tools/perf/builtin-stat.c | 20 +-
> > tools/perf/tests/shell/stat_bpf_counters.sh | 34 ++
> > tools/perf/util/bpf_counter.c | 519 +++++++++++++++++-
> > tools/perf/util/bpf_skel/bperf.h | 14 +
> > tools/perf/util/bpf_skel/bperf_follower.bpf.c | 69 +++
> > tools/perf/util/bpf_skel/bperf_leader.bpf.c | 46 ++
> > tools/perf/util/bpf_skel/bperf_u.h | 14 +
> > tools/perf/util/evsel.h | 20 +-
> > tools/perf/util/target.h | 4 +-
> > 11 files changed, 742 insertions(+), 10 deletions(-)
> > create mode 100755 tools/perf/tests/shell/stat_bpf_counters.sh
> > create mode 100644 tools/perf/util/bpf_skel/bperf.h
> > create mode 100644 tools/perf/util/bpf_skel/bperf_follower.bpf.c
> > create mode 100644 tools/perf/util/bpf_skel/bperf_leader.bpf.c
> > create mode 100644 tools/perf/util/bpf_skel/bperf_u.h
> >
> > --
> > 2.30.2
