Re: [PATCH v4 12/12] perf test: improve pmu event metric testing

From: Ian Rogers
Date: Sun May 03 2020 - 13:31:56 EST


On Sun, May 3, 2020 at 10:06 AM Jiri Olsa <jolsa@xxxxxxxxxx> wrote:
>
> On Sun, May 03, 2020 at 08:26:22AM -0700, Ian Rogers wrote:
> > On Sun, May 3, 2020 at 7:56 AM Jiri Olsa <jolsa@xxxxxxxxxx> wrote:
> > >
> > > On Fri, May 01, 2020 at 10:33:33AM -0700, Ian Rogers wrote:
> > > > Add a basic floating point number test to expr.
> > > > Break pmu-events test into 2 and add a test to verify that all pmu metric
> > > > expressions simply parse. Try to parse all metric ids/events, failing if
> > > > metrics for the current architecture fail to parse.
> > > >
> > > > Tested on skylakex with the patch set in place. May fail on other
> > > > architectures if metrics are invalid.
> > >
> > > yep, failing for me (-vvv output below).. could you plz
> > > detect that and skip the test?
> >
> > Thanks, filtering the verbose output, we have just 1 parse event failure:
> >
> > Parse event failed: id 'arb/event=0x80,umask=0x2,thresh=1/' metric 'DRAM_Parallel_Reads' expr 'arb@event\=0x80\,umask\=0x2@ / arb@event\=0x80\,umask\=0x2\,thresh\=1@'
> > Error string 'unknown term 'thresh' for pmu 'uncore_arb'' help 'valid terms: event,edge,inv,umask,cmask,config,config1,config2,name,period,freq,branch_type,time,call-graph,stack-size,no-inherit,inherit,max-stack,nr,no-overwrite,overwrite,driver-config,percore,aux-output,aux-sample-size'
> >
> > This looks like a bug in skl-metrics.json:
> >
> > {
> >     "BriefDescription": "Average number of parallel data read requests to external memory. Accounts for demand loads and L1/L2 prefetches",
> >     "MetricExpr": "arb@event\\=0x80\\,umask\\=0x2@ / arb@event\\=0x80\\,umask\\=0x2\\,thresh\\=1@",
> >     "MetricGroup": "Memory_BW",
> >     "MetricName": "DRAM_Parallel_Reads"
> > },
> >
> > which can be fixed by removing "\\,thresh\\=1", but looking at the
> > expression, this would just make it yield a value of 1. As this is
> > an Intel json file, could they comment? Jiri, could you be missing a
> > patch on the kernel side? We could lower this failure to just a
> > diagnostic message to land this set of patches; let me know what
> > you'd like me to do.
>
> I applied this on current Arnaldo's perf/core.. not sure there's
> more pending changes out there
>
> I'd like not to delay this patchset too long.. could we push the
> first 10 patches and solve the rest in separate change?

Thanks, I've attached a patch that can be squashed into 12 to make the
error non-fatal. Patch 11 tries to make the diagnostics around adding a
PMU event clearer; apart from adding and removing warning messages, it
has no functional effect. I don't mind the first 10 being merged and
these coming later. I'd also be fine with just patch 11 coming later, as
it'd be nice to have the test in so that metrics can get fixed.
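Judging from the failure report quoted above, the metric expression token
arb@event\=0x80\,umask\=0x2\,thresh\=1@ turns into the event id
arb/event=0x80,umask=0x2,thresh=1/ ('@' stands in for '/', and a
backslash escapes ',' and '='), which the uncore_arb PMU then rejects
because 'thresh' is not among its valid terms. A rough sketch of those
two steps follows. metric_token_to_id() and find_unknown_term() are
hypothetical helpers written for illustration, not perf's own parser;
the valid-terms list is copied from the error help text.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Valid terms for uncore_arb, copied from the error help text above. */
static const char arb_valid_terms[] =
	"event,edge,inv,umask,cmask,config,config1,config2,name,period,"
	"freq,branch_type,time,call-graph,stack-size,no-inherit,inherit,"
	"max-stack,nr,no-overwrite,overwrite,driver-config,percore,"
	"aux-output,aux-sample-size";

/*
 * Hypothetical helper, not perf's parser: turn a metric expression token
 * such as "arb@event\=0x80\,umask\=0x2@" into "arb/event=0x80,umask=0x2/".
 * '@' becomes '/', and a backslash escapes the character after it.
 */
static void metric_token_to_id(const char *tok, char *out, size_t outlen)
{
	size_t n = 0;

	assert(outlen > 0);
	for (; *tok && n + 1 < outlen; tok++) {
		if (*tok == '\\' && tok[1])
			out[n++] = *++tok;	/* keep only the escaped char */
		else if (*tok == '@')
			out[n++] = '/';
		else
			out[n++] = *tok;
	}
	out[n] = '\0';
}

/*
 * Hypothetical helper: walk the "name=value" terms between the slashes of
 * an event id. Return 1 and copy the name into 'bad' for the first term
 * missing from the comma-separated 'valid' list; return 0 if all are known.
 */
static int find_unknown_term(const char *id, const char *valid,
			     char *bad, size_t badlen)
{
	const char *p = strchr(id, '/');

	if (!p)
		return 0;
	for (p++; *p && *p != '/';) {
		size_t len = strcspn(p, "=,/");	/* length of the term name */
		const char *v = valid;
		int found = 0;

		while (*v && !found) {
			size_t vlen = strcspn(v, ",");

			found = vlen == len && strncmp(v, p, len) == 0;
			v += vlen;
			if (*v == ',')
				v++;
		}
		if (!found) {
			snprintf(bad, badlen, "%.*s", (int)len, p);
			return 1;
		}
		p += strcspn(p, ",/");		/* skip "=value" */
		if (*p == ',')
			p++;
	}
	return 0;
}
```

Fed the second token of the DRAM_Parallel_Reads expression, this yields
the id 'arb/event=0x80,umask=0x2,thresh=1/' and flags 'thresh', matching
the error; the first token's id, arb/event=0x80,umask=0x2/, passes.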

Thanks,
Ian

> thanks,
> jirka
>
diff --git a/tools/perf/tests/pmu-events.c b/tools/perf/tests/pmu-events.c
index 5ab1809b741b..c18b9ce8cace 100644
--- a/tools/perf/tests/pmu-events.c
+++ b/tools/perf/tests/pmu-events.c
@@ -404,12 +404,13 @@ static int check_parse_id(const char *id, bool same_cpu, struct pmu_event *pe)
 	memset(&error, 0, sizeof(error));
 	ret = parse_events(evlist, id, &error);
 	if (ret && same_cpu) {
-		pr_debug("Parse event failed: id '%s' metric '%s' expr '%s'\n",
-			id, pe->metric_name, pe->metric_expr);
-		pr_debug("Error string '%s' help '%s'\n",
+		fprintf(stderr,
+			"\nWARNING: Parse event failed metric '%s' id '%s' expr '%s'\n",
+			pe->metric_name, id, pe->metric_expr);
+		fprintf(stderr, "Error string '%s' help '%s'\n",
 			error.str, error.help);
 	} else if (ret) {
-		pr_debug("Parse event failed, but for an event that may not be supported by this CPU.\nid '%s' metric '%s' expr '%s'\n",
+		pr_debug3("Parse event failed, but for an event that may not be supported by this CPU.\nid '%s' metric '%s' expr '%s'\n",
 			id, pe->metric_name, pe->metric_expr);
 	}
 	evlist__delete(evlist);
@@ -417,7 +418,8 @@ static int check_parse_id(const char *id, bool same_cpu, struct pmu_event *pe)
 	free(error.help);
 	free(error.first_str);
 	free(error.first_help);
-	return same_cpu ? ret : 0;
+	/* TODO: too many metrics are broken to fail on this test currently. */
+	return 0;
 }

static int test_parsing(void)
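
For reference, the check_parse_id() hunks reduce to a small decision
table, modelled below in C. The names report_for(), result_before() and
result_after() and the enum are made up for illustration and do not
exist in pmu-events.c; only the conditions and return values mirror the
diff. pr_debug3() is presumably printed only at verbosity 3 or higher
(e.g. -vvv).

```c
#include <stdbool.h>

/* Hypothetical enum: where check_parse_id() reports a parse failure. */
enum report { REPORT_NONE, REPORT_WARNING, REPORT_DEBUG3 };

/* Reporting path after the change: same-CPU failures become visible. */
static enum report report_for(int ret, bool same_cpu)
{
	if (ret && same_cpu)
		return REPORT_WARNING;	/* fprintf(stderr, "\nWARNING: ...") */
	if (ret)
		return REPORT_DEBUG3;	/* pr_debug3(), verbose >= 3 */
	return REPORT_NONE;
}

/* Old return value: a same-CPU parse failure failed the test. */
static int result_before(int ret, bool same_cpu)
{
	return same_cpu ? ret : 0;
}

/* New return value: never fails, per the TODO in the second hunk. */
static int result_after(int ret, bool same_cpu)
{
	(void)ret;
	(void)same_cpu;
	return 0;
}
```

In other words, a broken metric for the running CPU is now a loud
warning rather than a test failure, which is what lets the series land
while metrics such as DRAM_Parallel_Reads get fixed separately.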