Re: [PATCH 6/7] perf expr: Add metric literals for topology.
From: Jiri Olsa
Date: Wed Nov 10 2021 - 13:29:26 EST
On Wed, Nov 10, 2021 at 09:58:57AM -0800, Ian Rogers wrote:
> On Wed, Nov 10, 2021 at 6:35 AM Jiri Olsa <jolsa@xxxxxxxxxx> wrote:
> >
> > On Wed, Nov 10, 2021 at 06:19:04AM -0800, Ian Rogers wrote:
> > > On Wed, Nov 10, 2021 at 4:56 AM Jiri Olsa <jolsa@xxxxxxxxxx> wrote:
> > > >
> > > > On Fri, Nov 05, 2021 at 10:09:42AM -0700, Ian Rogers wrote:
> > > > > Allow the number of cpus, cores, dies and packages to be queried by a
> > > > > metric expression.
> > > > >
> > > > > Signed-off-by: Ian Rogers <irogers@xxxxxxxxxx>
> > > > > ---
> > > > > tools/perf/tests/expr.c | 12 +++++++++++-
> > > > > tools/perf/util/expr.c | 27 +++++++++++++++++++++++++++
> > > > > 2 files changed, 38 insertions(+), 1 deletion(-)
> > > > >
> > > > > diff --git a/tools/perf/tests/expr.c b/tools/perf/tests/expr.c
> > > > > index 9ee2dc91c27b..0c09ccc76665 100644
> > > > > --- a/tools/perf/tests/expr.c
> > > > > +++ b/tools/perf/tests/expr.c
> > > > > @@ -66,7 +66,7 @@ int test__expr(struct test *t __maybe_unused, int subtest __maybe_unused)
> > > > > {
> > > > > struct expr_id_data *val_ptr;
> > > > > const char *p;
> > > > > - double val;
> > > > > + double val, num_cpus, num_cores, num_dies, num_packages;
> > > > > int ret;
> > > > > struct expr_parse_ctx *ctx;
> > > > >
> > > > > @@ -161,6 +161,16 @@ int test__expr(struct test *t __maybe_unused, int subtest __maybe_unused)
> > > > > NULL, ctx) == 0);
> > > > > TEST_ASSERT_VAL("find ids", hashmap__size(ctx->ids) == 0);
> > > > >
> > > > > + /* Test toplogy constants appear well ordered. */
> > > > > + expr__ctx_clear(ctx);
> > > > > + TEST_ASSERT_VAL("#num_cpus", expr__parse(&num_cpus, ctx, "#num_cpus") == 0);
> > > > > + TEST_ASSERT_VAL("#num_cores", expr__parse(&num_cores, ctx, "#num_cores") == 0);
> > > > > + TEST_ASSERT_VAL("#num_cpus >= #num_cores", num_cpus >= num_cores);
> > > > > + TEST_ASSERT_VAL("#num_dies", expr__parse(&num_dies, ctx, "#num_dies") == 0);
> > > > > + TEST_ASSERT_VAL("#num_cores >= #num_dies", num_cores >= num_dies);
> > > > > + TEST_ASSERT_VAL("#num_packages", expr__parse(&num_packages, ctx, "#num_packages") == 0);
> > > > > + TEST_ASSERT_VAL("#num_dies >= #num_packages", num_dies >= num_packages);
> > > > > +
> > > > > expr__ctx_free(ctx);
> > > > >
> > > > > return 0;
> > > > > diff --git a/tools/perf/util/expr.c b/tools/perf/util/expr.c
> > > > > index 7464739c2890..15af8b8ef5e7 100644
> > > > > --- a/tools/perf/util/expr.c
> > > > > +++ b/tools/perf/util/expr.c
> > > > > @@ -5,6 +5,8 @@
> > > > > #include <stdlib.h>
> > > > > #include <string.h>
> > > > > #include "metricgroup.h"
> > > > > +#include "cpumap.h"
> > > > > +#include "cputopo.h"
> > > > > #include "debug.h"
> > > > > #include "expr.h"
> > > > > #include "expr-bison.h"
> > > > > @@ -375,9 +377,34 @@ double expr_id_data__value(const struct expr_id_data *data)
> > > > >
> > > > > double expr__get_literal(const char *literal)
> > > > > {
> > > > > + static struct cpu_topology *topology;
> > > > > +
> > > > > if (!strcmp("#smt_on", literal))
> > > > > return smt_on() > 0 ? 1.0 : 0.0;
> > > > >
> > > > > + if (!strcmp("#num_cpus", literal))
> > > > > + return cpu__max_present_cpu();
> > > > > +
> > > > > + /*
> > > > > + * Assume that topology strings are consistent, such as CPUs "0-1"
> > > > > + * wouldn't be listed as "0,1", and so after deduplication the number of
> > > > > + * these strings gives an indication of the number of packages, dies,
> > > > > + * etc.
> > > > > + */
> > > > > + if (!topology) {
> > > > > + topology = cpu_topology__new();
> > > >
> > > > any chance we could propagate expr_scanner_ctx in here and store topology
> > > > to it and release it at the end? I think we have several places like this,
> > > > so it'd be nice not to make more if it's possible ;-)
> > >
> > > The topology here is static and so will only get computed once per
> > > execution rather than once pre expression parse. I was worried about
> > > the cost of recomputing the topology for something like 'perf stat -I
> > > 1000 -M ...' in which case the static will do less recomputation.
> >
> > can't we have the topology created/release on the top fo the parsing
> > and released after all expressions are parsed?
> >
> > or we could come up with some generic way to handle this kind of release
>
> Creating the topology at the top of metric parsing would incur a sysfs
> parsing cost on metrics regardless of whether they used the topology.
> I feel a lazy approach is better to avoid this cost. We could make
> this part of the expression context and lazily initialize it there.
> It'd be good to keep the expression context around in 'perf stat' in
> that case. I quite like this approach as really the topology should be
> part of the perf environment and part of the header. It is interesting
> to think how can we get metrics to work with 'perf stat record'.
> Currently you will get metrics for something like 'perf stat record -e
> instructions,cycles -a sleep 2; perf stat report', but none baked in
> metrics don't work and we don't even have the metric-id to make this
> work (even using '-M IPC' doesn't work). Putting the environment in
> the session, making the topology part of it, writing/reading data
> based on that seems like worthwhile clean up but beyond the scope of
> what I was trying to do here.
>
> In terms of releasing for the sake of memory leaks, leak sanitizer
> doesn't treat memory allocated and reachable from a global as a leak.
> We can't precompute the topology and so this style of approach is what
> I'm used to. Putting the topology into the parsing context and lazily
> initializing would be the second best option imo, and I can do that if
> this is a blocker.
ok, let's leave it as it is.. thanks for the details
jirka