Re: [RFC PATCH v16 8/8] perf test: Add test for Intel TPEBS counting mode

From: Namhyung Kim
Date: Thu Jul 11 2024 - 17:45:39 EST


Hello,

On Tue, Jul 09, 2024 at 06:23:51AM +0000, Wang, Weilin wrote:
> > On Mon, Jul 8, 2024 at 9:58 PM Wang, Weilin <weilin.wang@xxxxxxxxx>
> > wrote:
> > >
> > >
> > >
> > > > -----Original Message-----
> > > > From: Namhyung Kim <namhyung@xxxxxxxxxx>
> > > > Sent: Monday, July 8, 2024 9:44 PM
> > > > To: Wang, Weilin <weilin.wang@xxxxxxxxx>
> > > > Cc: Ian Rogers <irogers@xxxxxxxxxx>; Arnaldo Carvalho de Melo
> > > > <acme@xxxxxxxxxx>; Peter Zijlstra <peterz@xxxxxxxxxxxxx>; Ingo Molnar
> > > > <mingo@xxxxxxxxxx>; Alexander Shishkin
> > > > <alexander.shishkin@xxxxxxxxxxxxxxx>; Jiri Olsa <jolsa@xxxxxxxxxx>; Hunter,
> > > > Adrian <adrian.hunter@xxxxxxxxx>; Kan Liang <kan.liang@xxxxxxxxxxxxxxx>;
> > > > linux-perf-users@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx; Taylor,
> > Perry
> > > > <perry.taylor@xxxxxxxxx>; Alt, Samantha <samantha.alt@xxxxxxxxx>;
> > Biggers,
> > > > Caleb <caleb.biggers@xxxxxxxxx>
> > > > Subject: Re: [RFC PATCH v16 8/8] perf test: Add test for Intel TPEBS
> > counting
> > > > mode
> > > >
> > > > Hello Weilin,
> > > >
> > > > On Sat, Jul 6, 2024 at 4:30 PM <weilin.wang@xxxxxxxxx> wrote:
> > > > >
> > > > > From: Weilin Wang <weilin.wang@xxxxxxxxx>
> > > > >
> > > > > Intel TPEBS sampling mode is supported through perf record. The
> > counting
> > > > mode
> > > > > code uses perf record to capture retire_latency value and use it in metric
> > > > > calculation. This test checks the counting mode code.
> > > > >
> > > > > Signed-off-by: Weilin Wang <weilin.wang@xxxxxxxxx>
> > > > > ---
> > > > > .../perf/tests/shell/test_stat_intel_tpebs.sh | 18
> > ++++++++++++++++++
> > > > > 1 file changed, 18 insertions(+)
> > > > > create mode 100755 tools/perf/tests/shell/test_stat_intel_tpebs.sh
> > > > >
> > > > > diff --git a/tools/perf/tests/shell/test_stat_intel_tpebs.sh
> > > > b/tools/perf/tests/shell/test_stat_intel_tpebs.sh
> > > > > new file mode 100755
> > > > > index 000000000000..fea8cb1b8367
> > > > > --- /dev/null
> > > > > +++ b/tools/perf/tests/shell/test_stat_intel_tpebs.sh
> > > > > @@ -0,0 +1,18 @@
> > > > > +#!/bin/bash
> > > > > +# test Intel TPEBS counting mode
> > > > > +# SPDX-License-Identifier: GPL-2.0
> > > > > +
> > > > > +set -e
> > > > > +
> > > > > +# Use this event for testing because it should exist in all platforms
> > > > > +event=cache-misses:R
> > > > > +
> > > > > +# Without this cmd option, default value or zero is returned
> > > > > +echo "Testing without --record-tpebs"
> > > > > +result=$(perf stat -e "$event" true 2>&1)
> > > > > +[[ "$result" =~ $event ]] || exit 1
> > > > > +
> > > > > +# In platforms that do not support TPEBS, it should execute without
> > error.
> > > > > +echo "Testing with --record-tpebs"
> > > > > +result=$(perf stat -e "$event" --record-tpebs -a sleep 0.01 2>&1)
> > > >
> > > > It never finishes on my AMD machine.
> > > >
> > > Hi Namhyung,
> > >
> > > Do you see any message while it executes? Is the perf record forked
> > successfully
> > > but failed to return?
> >
> > I don't know.. all I can get is like below:
> >
> > $ sudo ./perf test tpebs -vv
> > 121: test Intel TPEBS counting mode:
> > --- start ---
> > test child forked, pid 583475
> > Testing without --record-tpebs
> > Testing with --record-tpebs
> > ^C
>
> I think the problem is that the forked "perf record" encountered error, which
> caused the control fifo failed to send a "ACK" back and the PIPE hanged.
>
> Could you please try out the diff below and see if the test would finish?
>
> As for the "perf record" error, I think it might because of the fake
> event(cache-misses:R) cannot be supported in AMD. Could you please test run
> a "perf stat -e cache-misses:R --record-tpebs true" and see if it complains about
> the event?

So I tried the below patch and it worked.

$ ./perf test -v tpebs
121: test Intel TPEBS counting mode:
--- start ---
test child forked, pid 2190174
Testing without --record-tpebs
Testing with --record-tpebs
---- end(-1) ----
121: test Intel TPEBS counting mode : FAILED!

It would be better if it can skip rather than fail on
non-supported machines.

Also I saw this message when I run the command manually.

$ ./perf stat -e cache-misses:R --record-tpebs -v true
Control descriptor is not initialized
Retire_latency of event cache-misses:R is required
Prepare perf record for retire_latency
Error:
The cache-misses:pu event is not supported.
incompatible file format
incompatible file format (rerun with -v to learn more)
failed: did not received an ack
cache-misses:R: 0 1 1

Performance counter stats for 'true':

0 cache-misses:R

0.000004939 seconds time elapsed

0.000000000 seconds user
0.000000000 seconds sys

I'm not sure why it showed the incompatible file format message.

>
> diff --git a/tools/perf/util/intel-tpebs.c b/tools/perf/util/intel-tpebs.c
> index a0585a6571b5..5b8e104f36f1 100644
> --- a/tools/perf/util/intel-tpebs.c
> +++ b/tools/perf/util/intel-tpebs.c
> @@ -263,6 +263,7 @@ int tpebs_start(struct evlist *evsel_list)
> }
>
> if (tpebs_event_size > 0) {
> + struct pollfd pollfd = { .events = POLLIN, };
> int control_fd[2], ack_fd[2], len;
> char ack_buf[8];
>
> @@ -297,6 +298,19 @@ int tpebs_start(struct evlist *evsel_list)
> goto out;
> }
>
> + /* wait for an ack */
> + pollfd.fd = ack_fd[0];
> +
> + if (!poll(&pollfd, 1, 2000)) {

Is it 2 seconds? Any specific reason for the value?
At least we need a comment to explain the value (or just saying it's a
random one).

Thanks,
Namhyung


> + pr_err("failed: perf record ack timeout\n");
> + goto out;
> + }
> +
> + if (!(pollfd.revents & POLLIN)) {
> + pr_err("failed: did not received an ack\n");
> + goto out;
> + }
> +
> ret = read(ack_fd[0], ack_buf, sizeof(ack_buf));
> if (ret > 0)
> ret = strcmp(ack_buf, "ack\n");
>
> Thanks,
> Weilin