Re: [PATCH v2] perf stat: Fix crash on arm64
From: Leo Yan
Date: Wed Mar 25 2026 - 17:01:21 EST
On Wed, Mar 25, 2026 at 03:24:30AM -0700, Breno Leitao wrote:
> Perf stat is crashing on arm64 hosts with the following issue:
>
> # make -C tools/perf DEBUG=1
> # perf stat sleep 1
> perf: util/evsel.c:2034: get_group_fd: Assertion `!(!leader->core.fd)' failed.
> [1] 1220794 IOT instruction (core dumped) ./perf stat
>
> The sorting function introduced by commit a745c0831c15c ("perf stat:
> Sort default events/metrics") compares events based on their individual
> properties. This can cause events from different groups to be
> interleaved, resulting in group members appearing before their leaders
> in the sorted evlist.
>
> When the iterator opens events in list order, a group member may be
> processed before its leader has been opened.
>
> For example, CPU_CYCLES (idx=32) with leader STALL_SLOT_BACKEND (idx=37)
> could be sorted before its leader, causing the crash when CPU_CYCLES
> tries to get its group fd from the not-yet-opened leader.
>
> Fix this by comparing events based on their leader's attributes instead
> of their own attributes when the events are in different groups. This
> ensures all members of a group share the same sort key as their leader,
> keeping groups together and guaranteeing leaders are opened before their
> members.
>
> Reported-by: Denis Yaroshevskiy <dyaroshev@xxxxxxxx>
> Fixes: a745c0831c15c ("perf stat: Sort default events/metrics")
> Tested-by: Dmitry Ilvokhin <d@xxxxxxxxxxxx>
> Signed-off-by: Breno Leitao <leitao@xxxxxxxxxx>
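
For reference, the ordering rule described above can be sketched as a
standalone comparator. This is only an illustration under stated
assumptions, not the code from the patch: struct ev, its fields,
cmp_by_leader() and demo_leader_precedes_member() are hypothetical names,
and placing the leader first within its own group is my assumption about
what "leaders are opened before their members" requires.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for perf's struct evsel; field names are made up. */
struct ev {
	const char *name;
	int idx;           /* position in the unsorted list */
	int key;           /* per-event property the sort orders by */
	struct ev *leader; /* a leader points to itself */
};

/* qsort() comparator over an array of struct ev pointers. */
int cmp_by_leader(const void *pa, const void *pb)
{
	const struct ev *a = *(const struct ev *const *)pa;
	const struct ev *b = *(const struct ev *const *)pb;

	assert(a->leader && b->leader);

	if (a->leader != b->leader) {
		/* Different groups: compare the leaders, not the events. */
		if (a->leader->key != b->leader->key)
			return a->leader->key < b->leader->key ? -1 : 1;
		return a->leader->idx < b->leader->idx ? -1 : 1;
	}

	/* Same group: leader first, then members in original order. */
	if (a == b)
		return 0;
	if (a == a->leader)
		return -1;
	if (b == b->leader)
		return 1;
	return (a->idx > b->idx) - (a->idx < b->idx);
}

/*
 * Rebuild the case from the commit message (member idx=32, leader idx=37)
 * plus one unrelated ungrouped event, sort, and report whether the member
 * ends up directly after its leader. The key values are invented so that
 * the member's own key would sort it first.
 */
int demo_leader_precedes_member(void)
{
	struct ev leader = { "STALL_SLOT_BACKEND", 37, 2, NULL };
	struct ev member = { "CPU_CYCLES", 32, 0, &leader };
	struct ev other = { "other", 35, 1, NULL };
	struct ev *list[] = { &member, &other, &leader };
	size_t i, n = sizeof(list) / sizeof(list[0]);
	int leader_pos = -1, member_pos = -1;

	leader.leader = &leader;
	other.leader = &other;

	qsort(list, n, sizeof(list[0]), cmp_by_leader);

	for (i = 0; i < n; i++) {
		if (list[i] == &leader)
			leader_pos = (int)i;
		if (list[i] == &member)
			member_pos = (int)i;
	}

	return member_pos == leader_pos + 1;
}
```

Sorting on each event's own key would put CPU_CYCLES ahead of the other
two entries here, which is the interleaving the commit message describes.
Keying on the leader keeps the group contiguous.
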

As Arnaldo mentioned in v1, I also hit a segmentation fault when
testing this patch:

Program received signal SIGSEGV, Segmentation fault.
metricgroup__copy_metric_events (evlist=0xaaaaab037750, cgrp=0x0, new_metric_events=0xaaaaab038210, old_metric_events=0xaaaaab038d20) at util/metricgroup.c:1662
1662 evsel = evlist__find_evsel(evlist, old_me->evsel->core.idx);
(gdb) bt
#0 metricgroup__copy_metric_events (evlist=0xaaaaab037750, cgrp=0x0, new_metric_events=0xaaaaab038210, old_metric_events=0xaaaaab038d20) at util/metricgroup.c:1662
#1 0x0000aaaaaab05870 in add_default_events () at builtin-stat.c:2110
#2 0x0000aaaaaab08300 in cmd_stat (argc=0, argv=0xfffffffffaa0) at builtin-stat.c:2838
#3 0x0000aaaaaab40998 in run_builtin (p=0xaaaaaaf9d428 <commands+360>, argc=4, argv=0xfffffffffaa0) at perf.c:348
#4 0x0000aaaaaab40c14 in handle_internal_command (argc=4, argv=0xfffffffffaa0) at perf.c:398
#5 0x0000aaaaaab40ddc in run_argv (argcp=0xfffffffff8bc, argv=0xfffffffff8b0) at perf.c:442
#6 0x0000aaaaaab41110 in main (argc=4, argv=0xfffffffffaa0) at perf.c:549

Last week I tested v1 and confirmed the issue was gone with the change.
I will dig into this tomorrow and share any findings.

Apologies for my laziness; I should have double-checked once Arnaldo
pointed this out in v1.

Thanks,
Leo