Re: [PATCH v3] perf record: add dummy event during system wide synthesis

From: Ian Rogers
Date: Wed May 20 2020 - 02:00:02 EST


On Tue, May 19, 2020 at 6:54 PM Arnaldo Carvalho de Melo
<arnaldo.melo@xxxxxxxxx> wrote:
>
> Em Wed, Apr 22, 2020 at 10:36:15AM -0700, Ian Rogers escreveu:
> > During the processing of /proc during event synthesis new processes may
> > start. Add a dummy event if /proc is to be processed, to capture mmaps
> > for starting processes. This reuses the existing logic for
> > initial-delay.
> >
> > v3 fixes the attr test of test-record-C0
> > v2 fixes the dummy event configuration and a branch stack issue.
>
> Something I noticed only now is that this ends up in the perf.data file,
> and we don't need it at all there, i.e.
>
> # perf record -I
>
> I.e. system wide, asking for registers now ends up with:
>
> [root@quaco ~]# perf record -I
> ^C[ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 2.855 MB perf.data (4902 samples) ]
> [root@quaco ~]# perf evlist
> cycles
> dummy:HG
> [root@quaco ~]# perf evlist -v
> cycles: size: 120, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|ID|CPU|PERIOD|REGS_INTR, read_format: ID, disabled: 1, inherit: 1, freq: 1, precise_ip: 3, sample_id_all: 1, exclude_guest: 1, sample_regs_intr: 0xff0fff
> dummy:HG: type: 1, size: 120, config: 0x9, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|ID|CPU|PERIOD|REGS_INTR, read_format: ID, inherit: 1, mmap: 1, comm: 1, freq: 1, task: 1, sample_id_all: 1, mmap2: 1, comm_exec: 1, ksymbol: 1, bpf_event: 1, sample_regs_intr: 0xff0fff
> [root@quaco ~]#
>
> For perf top is ok to reuse the main evlist, as those are not going to
> hit the disk, but for 'perf record' it pollutes the perf.data file with
> that dummy event.
>
> This was a problem introduced with initial-delay, that IIRC predates the
> side band thread tho, I'll have to think about it, just writing this
> down to revisit this, as may raise some eyebrows by now being more
> exposed.

Agreed. We've had to adjust some tooling like the protobuf convertor
because of this:
https://github.com/google/perf_data_converter/pull/88

Thanks,
Ian

> - Arnaldo
>
> > Suggested-by: Stephane Eranian <eranian@xxxxxxxxxx>
> > Signed-off-by: Ian Rogers <irogers@xxxxxxxxxx>
> > ---
> > tools/perf/builtin-record.c | 19 +++++++---
> > tools/perf/tests/attr/system-wide-dummy | 50 +++++++++++++++++++++++++
> > tools/perf/tests/attr/test-record-C0 | 12 +++++-
> > tools/perf/util/evsel.c | 5 ++-
> > 4 files changed, 78 insertions(+), 8 deletions(-)
> > create mode 100644 tools/perf/tests/attr/system-wide-dummy
> >
> > diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
> > index 1ab349abe904..8d1e93351298 100644
> > --- a/tools/perf/builtin-record.c
> > +++ b/tools/perf/builtin-record.c
> > @@ -805,19 +805,28 @@ static int record__open(struct record *rec)
> > int rc = 0;
> >
> > /*
> > - * For initial_delay we need to add a dummy event so that we can track
> > - * PERF_RECORD_MMAP while we wait for the initial delay to enable the
> > - * real events, the ones asked by the user.
> > + * For initial_delay or system wide, we need to add a dummy event so
> > + * that we can track PERF_RECORD_MMAP to cover the delay of waiting or
> > + * event synthesis.
> > */
> > - if (opts->initial_delay) {
> > + if (opts->initial_delay || target__has_cpu(&opts->target)) {
> > if (perf_evlist__add_dummy(evlist))
> > return -ENOMEM;
> >
> > + /* Disable tracking of mmaps on lead event. */
> > pos = evlist__first(evlist);
> > pos->tracking = 0;
> > + /* Set up dummy event. */
> > pos = evlist__last(evlist);
> > pos->tracking = 1;
> > - pos->core.attr.enable_on_exec = 1;
> > + /*
> > + * Enable the dummy event when the process is forked for
> > + * initial_delay, immediately for system wide.
> > + */
> > + if (opts->initial_delay)
> > + pos->core.attr.enable_on_exec = 1;
> > + else
> > + pos->immediate = 1;
> > }
> >
> > perf_evlist__config(evlist, opts, &callchain_param);
> > diff --git a/tools/perf/tests/attr/system-wide-dummy b/tools/perf/tests/attr/system-wide-dummy
> > new file mode 100644
> > index 000000000000..eba723cc0d38
> > --- /dev/null
> > +++ b/tools/perf/tests/attr/system-wide-dummy
> > @@ -0,0 +1,50 @@
> > +# Event added by system-wide or CPU perf-record to handle the race of
> > +# processes starting while /proc is processed.
> > +[event]
> > +fd=1
> > +group_fd=-1
> > +cpu=*
> > +pid=-1
> > +flags=8
> > +type=1
> > +size=120
> > +config=9
> > +sample_period=4000
> > +sample_type=455
> > +read_format=4
> > +# Event will be enabled right away.
> > +disabled=0
> > +inherit=1
> > +pinned=0
> > +exclusive=0
> > +exclude_user=0
> > +exclude_kernel=0
> > +exclude_hv=0
> > +exclude_idle=0
> > +mmap=1
> > +comm=1
> > +freq=1
> > +inherit_stat=0
> > +enable_on_exec=0
> > +task=1
> > +watermark=0
> > +precise_ip=0
> > +mmap_data=0
> > +sample_id_all=1
> > +exclude_host=0
> > +exclude_guest=0
> > +exclude_callchain_kernel=0
> > +exclude_callchain_user=0
> > +mmap2=1
> > +comm_exec=1
> > +context_switch=0
> > +write_backward=0
> > +namespaces=0
> > +use_clockid=0
> > +wakeup_events=0
> > +bp_type=0
> > +config1=0
> > +config2=0
> > +branch_sample_type=0
> > +sample_regs_user=0
> > +sample_stack_user=0
> > diff --git a/tools/perf/tests/attr/test-record-C0 b/tools/perf/tests/attr/test-record-C0
> > index 93818054ae20..317730b906dd 100644
> > --- a/tools/perf/tests/attr/test-record-C0
> > +++ b/tools/perf/tests/attr/test-record-C0
> > @@ -9,6 +9,14 @@ cpu=0
> > # no enable on exec for CPU attached
> > enable_on_exec=0
> >
> > -# PERF_SAMPLE_IP | PERF_SAMPLE_TID PERF_SAMPLE_TIME | # PERF_SAMPLE_PERIOD
> > +# PERF_SAMPLE_IP | PERF_SAMPLE_TID | PERF_SAMPLE_TIME |
> > +# PERF_SAMPLE_ID | PERF_SAMPLE_PERIOD
> > # + PERF_SAMPLE_CPU added by -C 0
> > -sample_type=391
> > +sample_type=455
> > +
> > +# Dummy event handles mmaps, comm and task.
> > +mmap=0
> > +comm=0
> > +task=0
> > +
> > +[event:system-wide-dummy]
> > diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
> > index 6a571d322bb2..ca8f9533d8f9 100644
> > --- a/tools/perf/util/evsel.c
> > +++ b/tools/perf/util/evsel.c
> > @@ -1163,11 +1163,14 @@ void perf_evsel__config(struct evsel *evsel, struct record_opts *opts,
> > }
> >
> > /*
> > + * A dummy event never triggers any actual counter and therefore
> > + * cannot be used with branch_stack.
> > + *
> > * For initial_delay, a dummy event is added implicitly.
> > * The software event will trigger -EOPNOTSUPP error out,
> > * if BRANCH_STACK bit is set.
> > */
> > - if (opts->initial_delay && is_dummy_event(evsel))
> > + if (is_dummy_event(evsel))
> > perf_evsel__reset_sample_bit(evsel, BRANCH_STACK);
> > }
> >
> > --
> > 2.26.2.303.gf8c07b1a785-goog
> >
>
> --
>
> - Arnaldo