Re: [PATCH v2] perf stat: Fix crash on arm64

From: Ian Rogers

Date: Thu Mar 26 2026 - 17:21:24 EST


On Thu, Mar 26, 2026 at 9:39 AM Leo Yan <leo.yan@xxxxxxx> wrote:
>
> On Wed, Mar 25, 2026 at 03:27:05PM -0700, Ian Rogers wrote:
>
> [...]
>
> > > As Arnaldo mentioned in v1, I also found Segmentation fault when
> > > testing this patch:
> > >
> > > Program received signal SIGSEGV, Segmentation fault.
> > > metricgroup__copy_metric_events (evlist=0xaaaaab037750, cgrp=0x0, new_metric_events=0xaaaaab038210, old_metric_events=0xaaaaab038d20) at util/metricgroup.c:1662
> > > 1662 evsel = evlist__find_evsel(evlist, old_me->evsel->core.idx);
> > > (gdb) bt
> > > #0 metricgroup__copy_metric_events (evlist=0xaaaaab037750, cgrp=0x0, new_metric_events=0xaaaaab038210, old_metric_events=0xaaaaab038d20) at util/metricgroup.c:1662
> > > #1 0x0000aaaaaab05870 in add_default_events () at builtin-stat.c:2110
> > > #2 0x0000aaaaaab08300 in cmd_stat (argc=0, argv=0xfffffffffaa0) at builtin-stat.c:2838
> > > #3 0x0000aaaaaab40998 in run_builtin (p=0xaaaaaaf9d428 <commands+360>, argc=4, argv=0xfffffffffaa0) at perf.c:348
> > > #4 0x0000aaaaaab40c14 in handle_internal_command (argc=4, argv=0xfffffffffaa0) at perf.c:398
> > > #5 0x0000aaaaaab40ddc in run_argv (argcp=0xfffffffff8bc, argv=0xfffffffff8b0) at perf.c:442
> > > #6 0x0000aaaaaab41110 in main (argc=4, argv=0xfffffffffaa0) at perf.c:549
> > >
> > > Last week I tested v1 and confirmed the issue was gone with the change,
> > > I will dig a bit in tomorrow and share back if any finding.
> > >
> > > Apologies for my lazy, as I should double check once Arnaldo
> > > pointed out in v1.
> >
> >
> > From the stack trace I don't see anything that should allow this.
> > Either the list of metrics is broken, or the struct metric_event's
> > evsel is NULL. But that should never happen as we wouldn't have a
> > metric at that point. The sorting shouldn't affect that. If you can
> > reproduce the issue, verbose logs may help.
>
> I believe I encountered a different issue, which is irrelevant to this
> patch. So Breno's patch is fine for me.
>
> On one of my board, I can see the log:
>
> Events in 'frontend_bound' fully contained within 'retiring'
> Events in 'bad_speculation' fully contained within 'retiring'
> Events in 'backend_bound' fully contained within 'retiring'

I looked at nvidia/t410/metrics.json but I'm not clear on where the
issue is coming from.

>
> Then, in parse_groups(), it selects fully contained list:
>
> list_for_each_entry(n, &metric_list, nd) {
> ...
> if (expr__subset_of_ids(n->pctx, m->pctx)) {
> metric_evlist = n->evlist;
> break;
> }
> }
>
> // As metric_evlist has been set, so below does not run
> if (!metric_evlist) {
> ret = parse_ids(metric_no_merge, fake_pmu, m->pctx, m->modifier,
> m->group_events, tool_events, &m->evlist);
> if (ret)
> goto out;
>
> metric_evlist = m->evlist;
> }
>
> The problem is it skips to call parse_ids() and m->pctx is not
> initialized. This leads to pick_display_evsel() returns NULL and
> pass NULL pointer to metricgroup__lookup(). In the end, the NULL
> pointer is consumed in metricgroup__copy_metric_events().
>
> How about below change ? I tested it which can dismiss the issue.
>
> Subject: [PATCH] perf metricgroup: Fix segment fault in
> metricgroup__copy_metric_events()
>
> In parse_groups(), when find a event is fully contained by a previous
> event, it skips to call parse_ids(), as a result, m->pctx->ids is not
> initialized. Then, setup_metric_events() returns an empty metric
> events, pick_display_evsel() consumes the returned metric_events and
> feeds to metricgroup__lookup() with passing "evsel = NULL".

Fully contained groups exist on x86, why isn't this problem breaking
whenever this happens? Stepping through "contained" examples, I see
that the ids aren't initialized. I think something else must be
happening.

> metricgroup__copy_metric_events() access old_me->evsel->core.idx, due
> to the evsel is set to NULL in metricgroup__lookup(), it causes
> segment fault.
>
> This commit corrects the fully contained case to allow it to parse IDs,
> thus this can effectively avoid setup NULL pointer in
> metricgroup__lookup().
>
> Fixes: 5ecd5a0c7d1c ("perf metrics: Modify setup and deduplication")
> Signed-off-by: Leo Yan <leo.yan@xxxxxxx>
> ---
> tools/perf/util/metricgroup.c | 11 +++++------
> 1 file changed, 5 insertions(+), 6 deletions(-)
>
> diff --git a/tools/perf/util/metricgroup.c b/tools/perf/util/metricgroup.c
> index 7e39d469111b..528a6462876b 100644
> --- a/tools/perf/util/metricgroup.c
> +++ b/tools/perf/util/metricgroup.c
> @@ -1431,7 +1431,7 @@ static int parse_groups(struct evlist *perf_evlist,
> list_for_each_entry(m, &metric_list, nd) {
> struct metric_event *me;
> struct evsel **metric_events;
> - struct evlist *metric_evlist = NULL;
> + struct evlist *metric_evlist = NULL, *contained = NULL;
> struct metric *n;
> struct metric_expr *expr;
>
> @@ -1463,19 +1463,18 @@ static int parse_groups(struct evlist *perf_evlist,
> if (expr__subset_of_ids(n->pctx, m->pctx)) {
> pr_debug("Events in '%s' fully contained within '%s'\n",
> m->metric_name, n->metric_name);
> - metric_evlist = n->evlist;
> + contained = n->evlist;
> break;
> }
> -
> }
> }
> if (!metric_evlist) {
> + metric_evlist = contained ? contained : m->evlist;
> +
> ret = parse_ids(metric_no_merge, fake_pmu, m->pctx, m->modifier,
> - m->group_events, tool_events, &m->evlist);
> + m->group_events, tool_events, &metric_evlist);

Won't this match the behavior of metric_no_merge/--metric-no-merge,
since for every metric the events for that metric are being appended
to the evlist?

Thanks,
Ian


> if (ret)
> goto out;
> -
> - metric_evlist = m->evlist;
> }
> ret = setup_metric_events(fake_pmu ? "all" : m->pmu, m->pctx->ids,
> metric_evlist, &metric_events);
> --
> 2.34.1