Re: [RFC PATCH v6 4/5] perf stat: Add retire latency print functions to print out at the very end of print out

From: Namhyung Kim
Date: Mon Apr 01 2024 - 17:15:36 EST


On Mon, Apr 1, 2024 at 2:08 PM Wang, Weilin <weilin.wang@xxxxxxxxx> wrote:
>
> On Mon, Apr 1, 2024 at 2:04 PM Namhyung Kim <namhyung@xxxxxxxxxx> wrote:
> >
> > On Fri, Mar 29, 2024 at 12:12 PM <weilin.wang@xxxxxxxxx> wrote:
> > >
> > > From: Weilin Wang <weilin.wang@xxxxxxxxx>
> > >
> > > Add print functions so that users can read retire latency values.
> > >
> > > Example output:
> > > In this simple example, there is no MEM_INST_RETIRED.STLB_HIT_STORES sample.
> > > Therefore, the MEM_INST_RETIRED.STLB_HIT_STORES:p retire_latency value, count
> > > and sum are all 0.
> > >
> > > Performance counter stats for 'system wide':
> > >
> > > 181,047,168 cpu_core/TOPDOWN.SLOTS/ # 0.6 % tma_dtlb_store
> > > 3,195,608 cpu_core/topdown-retiring/
> > > 40,156,649 cpu_core/topdown-mem-bound/
> > > 3,550,925 cpu_core/topdown-bad-spec/
> > > 117,571,818 cpu_core/topdown-fe-bound/
> > > 57,118,087 cpu_core/topdown-be-bound/
> > > 69,179 cpu_core/EXE_ACTIVITY.BOUND_ON_STORES/
> > > 4,582 cpu_core/MEM_INST_RETIRED.STLB_HIT_STORES/
> > > 30,183,104 cpu_core/CPU_CLK_UNHALTED.DISTRIBUTED/
> > > 30,556,790 cpu_core/CPU_CLK_UNHALTED.THREAD/
> > > 168,486 cpu_core/DTLB_STORE_MISSES.WALK_ACTIVE/
> > > 0.00 MEM_INST_RETIRED.STLB_HIT_STORES:p 0 0
> >
> > The output is not aligned and I think it's hard to read.
> > I think it should print the result like this:
> >
> > <sum> <event-name> # <val> average retire latency
>
> Since we would like to use the average retire latency, I think putting the
> average at the beginning would be more consistent. So a format like:
> <val> <event-name> <sum> <count> or <val> <event-name> <count> <sum> ?

But that's not consistent with the other lines. When I look at perf stat
output, I expect it to show the total count. The average latency is a
derived value, so I think it can be treated as a metric.
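
To make that concrete, here is a rough sketch of the layout I have in mind.
The helper name format_retire_latency() and the column widths are made up
for illustration and are not taken from the patch or from existing perf
code. It assumes the sum and the sample count are already accumulated:

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper, not an existing perf function: format one line as
 *   <sum> <event-name> # <val> average retire latency
 * so the leading column is a total like the other counters and the
 * average shows up in the metric column. Widths are illustrative only.
 */
static int format_retire_latency(char *buf, size_t len, const char *name,
				 uint64_t sum, uint64_t count)
{
	/* No samples: report 0 rather than dividing by zero. */
	double avg = count ? (double)sum / (double)count : 0.0;

	return snprintf(buf, len,
			"%18" PRIu64 "  %-40s # %8.2f average retire latency",
			sum, name, avg);
}
```

With no samples, as in the example output above, this would give a sum of 0
and an average of 0.00 after the '#'.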

Thanks,
Namhyung