Re: [REGRESSION] Perf (userspace) broken on big.LITTLE systems since v6.5
From: Marc Zyngier
Date: Tue Nov 21 2023 - 10:24:34 EST
On Tue, 21 Nov 2023 13:40:31 +0000,
Marc Zyngier <maz@xxxxxxxxxx> wrote:
>
> [Adding key people on Cc]
>
> On Tue, 21 Nov 2023 12:08:48 +0000,
> Hector Martin <marcan@xxxxxxxxx> wrote:
> >
> > Perf broke on all Apple ARM64 systems (tested almost everything), and
> > according to maz also on Juno (so, probably all big.LITTLE) since v6.5.
>
> I can confirm that at least on 6.7-rc2, perf is pretty busted on any
> asymmetric ARM platform. It isn't clear what criteria are used to pick
> the PMU, but nothing works anymore.
>
> The saving grace in my case is that Debian still ships a 6.1 perftool
> package, but that's obviously not going to last.
>
> I'm happy to test potential fixes.
At Mark's request, I've dumped the output of a couple of perf runs (as of
-rc2) with -vvv. It is quite entertaining (this one is pinned with taskset
to an 'icestorm' CPU):
<quote>
maz@valley-girl:~/hot-poop/arm-platforms/tools/perf$ sudo taskset -c 0 ./perf stat -vvv -e apple_icestorm_pmu/cycles/ -e apple_firestorm_pmu/cycles/ -e cycles ls
Using CPUID 0x00000000612f0280
Attempt to add: apple_icestorm_pmu/cycles=0/
..after resolving event: apple_icestorm_pmu/cycles=0/
Opening: unknown-hardware:HG
------------------------------------------------------------
perf_event_attr:
type 0 (PERF_TYPE_HARDWARE)
config 0xb00000000
disabled 1
------------------------------------------------------------
sys_perf_event_open: pid 0 cpu -1 group_fd -1 flags 0x8
sys_perf_event_open failed, error -95
Attempt to add: apple_firestorm_pmu/cycles=0/
..after resolving event: apple_firestorm_pmu/cycles=0/
Control descriptor is not initialized
Opening: apple_icestorm_pmu/cycles/
------------------------------------------------------------
perf_event_attr:
type 0 (PERF_TYPE_HARDWARE)
size 136
config 0 (PERF_COUNT_HW_CPU_CYCLES)
sample_type IDENTIFIER
read_format TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
disabled 1
inherit 1
enable_on_exec 1
exclude_guest 1
------------------------------------------------------------
sys_perf_event_open: pid 1045843 cpu -1 group_fd -1 flags 0x8 = 3
Opening: apple_firestorm_pmu/cycles/
------------------------------------------------------------
perf_event_attr:
type 0 (PERF_TYPE_HARDWARE)
size 136
config 0 (PERF_COUNT_HW_CPU_CYCLES)
sample_type IDENTIFIER
read_format TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
disabled 1
inherit 1
enable_on_exec 1
exclude_guest 1
------------------------------------------------------------
sys_perf_event_open: pid 1045843 cpu -1 group_fd -1 flags 0x8 = 4
Opening: cycles
------------------------------------------------------------
perf_event_attr:
type 0 (PERF_TYPE_HARDWARE)
size 136
config 0 (PERF_COUNT_HW_CPU_CYCLES)
sample_type IDENTIFIER
read_format TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
disabled 1
inherit 1
enable_on_exec 1
exclude_guest 1
------------------------------------------------------------
sys_perf_event_open: pid 1045843 cpu -1 group_fd -1 flags 0x8 = 5
arch builtin-diff.o builtin-mem.o common-cmds.h perf-completion.sh
bench builtin-evlist.c builtin-probe.c CREDITS perf.h
Build builtin-evlist.o builtin-probe.o design.txt perf-in.o
builtin-annotate.c builtin-ftrace.c builtin-record.c dlfilters perf-iostat
builtin-annotate.o builtin-ftrace.o builtin-record.o Documentation perf-iostat.sh
builtin-bench.c builtin.h builtin-report.c FEATURE-DUMP perf.o
builtin-bench.o builtin-help.c builtin-report.o include perf-read-vdso.c
builtin-buildid-cache.c builtin-help.o builtin-sched.c jvmti perf-sys.h
builtin-buildid-cache.o builtin-inject.c builtin-script.c libapi PERF-VERSION-FILE
builtin-buildid-list.c builtin-inject.o builtin-script.o libperf perf-with-kcore
builtin-buildid-list.o builtin-kallsyms.c builtin-stat.c libsubcmd pmu-events
builtin-c2c.c builtin-kallsyms.o builtin-stat.o libsymbol python
builtin-c2c.o builtin-kmem.c builtin-timechart.c Makefile python_ext_build
builtin-config.c builtin-kvm.c builtin-top.c Makefile.config scripts
builtin-config.o builtin-kvm.o builtin-top.o Makefile.perf tests
builtin-daemon.c builtin-kwork.c builtin-trace.c MANIFEST trace
builtin-daemon.o builtin-list.c builtin-version.c perf ui
builtin-data.c builtin-list.o builtin-version.o perf-archive util
builtin-data.o builtin-lock.c check-headers.sh perf-archive.sh
builtin-diff.c builtin-mem.c command-list.txt perf.c
apple_icestorm_pmu/cycles/: -1: 0 873709 0
apple_firestorm_pmu/cycles/: -1: 0 873709 0
cycles: -1: 0 873709 0
apple_icestorm_pmu/cycles/: 0 873709 0
apple_firestorm_pmu/cycles/: 0 873709 0
cycles: 0 873709 0
Performance counter stats for 'ls':
<not counted> apple_icestorm_pmu/cycles/ (0.00%)
<not counted> apple_firestorm_pmu/cycles/ (0.00%)
<not counted> cycles (0.00%)
0.000002250 seconds time elapsed
0.000000000 seconds user
0.000000000 seconds sys
</quote>
If I run the same thing on another CPU cluster (firestorm), I get
this:
<quote>
maz@valley-girl:~/hot-poop/arm-platforms/tools/perf$ sudo taskset -c 2 ./perf stat -vvv -e apple_icestorm_pmu/cycles/ -e apple_firestorm_pmu/cycles/ -e cycles ls
Using CPUID 0x00000000612f0280
Attempt to add: apple_icestorm_pmu/cycles=0/
..after resolving event: apple_icestorm_pmu/cycles=0/
Opening: unknown-hardware:HG
------------------------------------------------------------
perf_event_attr:
type 0 (PERF_TYPE_HARDWARE)
config 0xb00000000
disabled 1
------------------------------------------------------------
sys_perf_event_open: pid 0 cpu -1 group_fd -1 flags 0x8
sys_perf_event_open failed, error -95
Attempt to add: apple_firestorm_pmu/cycles=0/
..after resolving event: apple_firestorm_pmu/cycles=0/
Control descriptor is not initialized
Opening: apple_icestorm_pmu/cycles/
------------------------------------------------------------
perf_event_attr:
type 0 (PERF_TYPE_HARDWARE)
size 136
config 0 (PERF_COUNT_HW_CPU_CYCLES)
sample_type IDENTIFIER
read_format TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
disabled 1
inherit 1
enable_on_exec 1
exclude_guest 1
------------------------------------------------------------
sys_perf_event_open: pid 1045925 cpu -1 group_fd -1 flags 0x8 = 3
Opening: apple_firestorm_pmu/cycles/
------------------------------------------------------------
perf_event_attr:
type 0 (PERF_TYPE_HARDWARE)
size 136
config 0 (PERF_COUNT_HW_CPU_CYCLES)
sample_type IDENTIFIER
read_format TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
disabled 1
inherit 1
enable_on_exec 1
exclude_guest 1
------------------------------------------------------------
sys_perf_event_open: pid 1045925 cpu -1 group_fd -1 flags 0x8 = 4
Opening: cycles
------------------------------------------------------------
perf_event_attr:
type 0 (PERF_TYPE_HARDWARE)
size 136
config 0 (PERF_COUNT_HW_CPU_CYCLES)
sample_type IDENTIFIER
read_format TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
disabled 1
inherit 1
enable_on_exec 1
exclude_guest 1
------------------------------------------------------------
sys_perf_event_open: pid 1045925 cpu -1 group_fd -1 flags 0x8 = 5
arch builtin-diff.o builtin-mem.o common-cmds.h perf-completion.sh
bench builtin-evlist.c builtin-probe.c CREDITS perf.h
Build builtin-evlist.o builtin-probe.o design.txt perf-in.o
builtin-annotate.c builtin-ftrace.c builtin-record.c dlfilters perf-iostat
builtin-annotate.o builtin-ftrace.o builtin-record.o Documentation perf-iostat.sh
builtin-bench.c builtin.h builtin-report.c FEATURE-DUMP perf.o
builtin-bench.o builtin-help.c builtin-report.o include perf-read-vdso.c
builtin-buildid-cache.c builtin-help.o builtin-sched.c jvmti perf-sys.h
builtin-buildid-cache.o builtin-inject.c builtin-script.c libapi PERF-VERSION-FILE
builtin-buildid-list.c builtin-inject.o builtin-script.o libperf perf-with-kcore
builtin-buildid-list.o builtin-kallsyms.c builtin-stat.c libsubcmd pmu-events
builtin-c2c.c builtin-kallsyms.o builtin-stat.o libsymbol python
builtin-c2c.o builtin-kmem.c builtin-timechart.c Makefile python_ext_build
builtin-config.c builtin-kvm.c builtin-top.c Makefile.config scripts
builtin-config.o builtin-kvm.o builtin-top.o Makefile.perf tests
builtin-daemon.c builtin-kwork.c builtin-trace.c MANIFEST trace
builtin-daemon.o builtin-list.c builtin-version.c perf ui
builtin-data.c builtin-list.o builtin-version.o perf-archive util
builtin-data.o builtin-lock.c check-headers.sh perf-archive.sh
builtin-diff.c builtin-mem.c command-list.txt perf.c
apple_icestorm_pmu/cycles/: -1: 1035101 469125 469125
apple_firestorm_pmu/cycles/: -1: 1035035 469125 469125
cycles: -1: 1034653 469125 469125
apple_icestorm_pmu/cycles/: 1035101 469125 469125
apple_firestorm_pmu/cycles/: 1035035 469125 469125
cycles: 1034653 469125 469125
Performance counter stats for 'ls':
1,035,101 apple_icestorm_pmu/cycles/
1,035,035 apple_firestorm_pmu/cycles/
1,034,653 cycles
0.000001333 seconds time elapsed
0.000000000 seconds user
0.000000000 seconds sys
</quote>
which doesn't make any sense either. I really don't understand what
PERF_TYPE_HARDWARE is doing here (the *real* PMU types are 10 and 11),
nor what this 'cycles=0' stuff is.
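For reference, a minimal sketch in C, assuming the extended PERF_TYPE_HARDWARE layout described in the uapi <linux/perf_event.h> (PMU type in config bits 63:32, generic hardware event id in bits 31:0, named PERF_PMU_TYPE_SHIFT and PERF_HW_EVENT_MASK there). Under that assumption the probe's config 0xb00000000 decodes to PMU type 11 with event 0 (PERF_COUNT_HW_CPU_CYCLES), and the -95 it got back is EOPNOTSUPP. The dynamic per-PMU types (the 10 and 11 above) are exported in sysfs under /sys/bus/event_source/devices/<pmu>/type. The helper names are hypothetical, and the constants are local copies so the sketch also builds against older kernel headers.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Local copies of the extended-type layout constants (assumed to mirror
 * PERF_PMU_TYPE_SHIFT and PERF_HW_EVENT_MASK in <linux/perf_event.h>). */
#define EXT_PMU_TYPE_SHIFT 32
#define EXT_HW_EVENT_MASK  0xffffffffULL

/* Generic hardware event id for cycles (PERF_COUNT_HW_CPU_CYCLES == 0). */
#define HW_CPU_CYCLES 0ULL

/* Hypothetical helper: build attr.config for a PERF_TYPE_HARDWARE event
 * aimed at one specific PMU instead of "any CPU PMU". */
static uint64_t encode_ext_hw_config(uint32_t pmu_type, uint64_t event_id)
{
	assert(event_id <= EXT_HW_EVENT_MASK); /* event id must fit in 31:0 */
	return ((uint64_t)pmu_type << EXT_PMU_TYPE_SHIFT) | event_id;
}

/* Upper 32 bits: PMU type (0 means no specific PMU was requested). */
static uint32_t ext_pmu_type(uint64_t config)
{
	return (uint32_t)(config >> EXT_PMU_TYPE_SHIFT);
}

/* Lower 32 bits: generic hardware event id. */
static uint64_t ext_hw_event(uint64_t config)
{
	return config & EXT_HW_EVENT_MASK;
}

/* Hypothetical helper: read a PMU's dynamic type from sysfs.
 * Returns the type, or -1 if the PMU does not exist or the file is unreadable. */
static int read_pmu_type(const char *pmu_name)
{
	char path[256];
	FILE *f;
	int type = -1;
	int n = snprintf(path, sizeof(path),
			 "/sys/bus/event_source/devices/%s/type", pmu_name);

	if (n < 0 || n >= (int)sizeof(path))
		return -1; /* encoding error or path truncated */
	f = fopen(path, "r");
	if (!f)
		return -1;
	if (fscanf(f, "%d", &type) != 1)
		type = -1;
	fclose(f);
	return type;
}
```

For instance, after checking that read_pmu_type("apple_firestorm_pmu") did not return -1, passing its result to encode_ext_hw_config() with HW_CPU_CYCLES yields the same kind of config the probe above used, with attr.type left at PERF_TYPE_HARDWARE.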
/puzzled
M.
--
Without deviation from the norm, progress is not possible.