RE: [RFC PATCH v10 3/8] perf stat: Fork and launch perf record when perf stat needs to get retire latency value for a metric.

From: Wang, Weilin
Date: Fri May 31 2024 - 02:46:59 EST




> -----Original Message-----
> From: Namhyung Kim <namhyung@xxxxxxxxxx>
> Sent: Thursday, May 30, 2024 11:41 PM
> To: Wang, Weilin <weilin.wang@xxxxxxxxx>
> Cc: Ian Rogers <irogers@xxxxxxxxxx>; Arnaldo Carvalho de Melo
> <acme@xxxxxxxxxx>; Peter Zijlstra <peterz@xxxxxxxxxxxxx>; Ingo Molnar
> <mingo@xxxxxxxxxx>; Alexander Shishkin
> <alexander.shishkin@xxxxxxxxxxxxxxx>; Jiri Olsa <jolsa@xxxxxxxxxx>; Hunter,
> Adrian <adrian.hunter@xxxxxxxxx>; Kan Liang <kan.liang@xxxxxxxxxxxxxxx>;
> linux-perf-users@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx; Taylor, Perry
> <perry.taylor@xxxxxxxxx>; Alt, Samantha <samantha.alt@xxxxxxxxx>; Biggers,
> Caleb <caleb.biggers@xxxxxxxxx>
> Subject: Re: [RFC PATCH v10 3/8] perf stat: Fork and launch perf record when
> perf stat needs to get retire latency value for a metric.
>
> On Tue, May 28, 2024 at 11:43 PM <weilin.wang@xxxxxxxxx> wrote:
> >
> > From: Weilin Wang <weilin.wang@xxxxxxxxx>
> >
> > When retire_latency value is used in a metric formula, evsel would fork a perf
> > record process with "-e" and "-W" options. Perf record will collect required
> > retire_latency values in parallel while perf stat is collecting counting values.
> >
> > At the point of time that perf stat stops counting, evsel would stop perf
> record
> > by sending sigterm signal to perf record process. Sampled data will be
> process
> > to get retire latency value.
> >
> > Another thread is required to synchronize between perf stat and perf record
> > when we pass data through pipe.
> >
> > Signed-off-by: Weilin Wang <weilin.wang@xxxxxxxxx>
> > ---
> [SNIP]
> > +int tpebs_set_evsel(struct evsel *evsel, int cpu_map_idx, int thread)
> > +{
> > + struct perf_counts_values *count;
> > + struct tpebs_retire_lat *t;
> > + bool found = false;
> > + __u64 val;
> > + int ret;
> > +
> > + /* Non reitre_latency evsel should never enter this function. */
> > + if (!evsel__is_retire_lat(evsel))
> > + return -1;
> > +
> > + ret = tpebs_stop();
> > + if (ret)
> > + return ret;
> > +
> > + count = perf_counts(evsel->counts, cpu_map_idx, thread);
> > +
> > + list_for_each_entry(t, &tpebs_results, nd) {
> > + if (!strcmp(t->tpebs_name, evsel->name) || !strcmp(t-
> >tpebs_name, evsel->metric_id)) {
> > + found = true;
> > + break;
> > + }
> > + }
> > +
> > + /* Set ena and run to non-zero */
> > + count->ena = count->run = 1;
> > + count->lost = 0;
> > +
> > + if (!found) {
> > + /*
> > + * Set default value or 0 when retire_latency for this event is
> > + * not found from sampling data (enable_tpebs_recording not set
> > + * or 0 sample recorded).
> > + */
> > + val = 0;
> > + return 0;
> > + }
> > +
> > + /*
> > + * Only set retire_latency value to the first CPU and thread.
> > + */
> > + if (cpu_map_idx == 0 && thread == 0) {
> > + /* Lost precision when casting from double to __u64. Any
> improvement? */
>
> As I said before I think you can set t->val * 1000 and then
> set the evsel->scale to 1e3 or 1e-3.

Hi Namhyung,

Sorry if this is a repeated message. I thought I replied to your suggestion
on this last time. I'm thinking we should keep it like this for now and make
this change unless we find the precision loss is critical. Because I thought
we don't want to add special code to handle the calculation and final print
to keep code simple.

I kept this comment here so that we don't forget about it. Please let me
know if you'd like me to remove it.

Thanks,
Weilin

>
> Thanks,
> Namhyung
>
>
> > + val = t->val;
> > + } else
> > + val = 0;
> > +
> > + count->val = val;
> > + return 0;
> > +}
> > +