On Tue, Nov 12, 2024 at 11:53 AM Leo Yan <leo.yan@xxxxxxx> wrote:
On Sat, Oct 26, 2024 at 05:17:57AM -0700, Ian Rogers wrote:
Whilst for many tools it is an expected behavior that failure to open
a perf event is a failure, ARM decided to name PMU events the same as
legacy events and then failed to rename such events on a server uncore
SLC PMU. As perf's default behavior when no PMU is specified is to
open the event on all PMUs that advertise/"have" the event, this
yielded failures when trying to make the priority of legacy and
sysfs/json events uniform - something requested by RISC-V and ARM. A
legacy event user on ARM hardware may find their event opened on an
uncore PMU, which for perf record will fail. Arnaldo suggested skipping
such events, which this patch implements. Rather than have the skipping
conditional on running on ARM, the skipping is done on all
architectures, as such a fundamental behavioral difference could lead
to problems with tools built on or depending on perf.
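To see which PMUs "have" a given event name in sysfs, something like the
following shell sketch may help. It only looks at the sysfs events
directories (the same layout as /sys/bus/event_source/devices/<pmu>/events),
so it says nothing about json or legacy event matching inside perf. The
function name pmus_with_event and its optional second argument are
hypothetical, added here for illustration only.

```shell
#!/bin/sh
# Hypothetical helper (not part of perf): print each PMU whose sysfs
# "events" directory contains an entry named $1.
# $2 optionally overrides the sysfs root; that override is an assumption
# made here so the sketch can be tried against a fake directory tree.
pmus_with_event() {
    event=$1
    root=${2:-/sys/bus/event_source/devices}
    for f in "$root"/*/events/"$event"; do
        # An unmatched glob is left as a literal string, so skip it.
        [ -e "$f" ] || continue
        # Strip the trailing "/events/<event>" to get the PMU directory,
        # then keep only its last path component (the PMU name).
        pmu_dir=${f%/events/*}
        echo "${pmu_dir##*/}"
    done
}
```

Running `pmus_with_event cycles` on a machine whose uncore PMUs expose a
"cycles" event would be expected to list those PMUs, which are the ones
perf record may try and then skip.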
An example of perf record failing to open events on x86 is:
```
$ perf record -e data_read,cycles,LLC-prefetch-read -a sleep 0.1
Error:
Failure to open event 'data_read' on PMU 'uncore_imc_free_running_0' which will be removed.
The sys_perf_event_open() syscall returned with 22 (Invalid argument) for event (data_read).
"dmesg | grep -i perf" may provide additional information.
Error:
Failure to open event 'data_read' on PMU 'uncore_imc_free_running_1' which will be removed.
The sys_perf_event_open() syscall returned with 22 (Invalid argument) for event (data_read).
"dmesg | grep -i perf" may provide additional information.
Error:
Failure to open event 'LLC-prefetch-read' on PMU 'cpu' which will be removed.
The LLC-prefetch-read event is not supported.
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 2.188 MB perf.data (87 samples) ]
$ perf report --stats
Aggregated stats:
TOTAL events: 17255
MMAP events: 284 ( 1.6%)
COMM events: 1961 (11.4%)
EXIT events: 1 ( 0.0%)
FORK events: 1960 (11.4%)
SAMPLE events: 87 ( 0.5%)
MMAP2 events: 12836 (74.4%)
KSYMBOL events: 83 ( 0.5%)
BPF_EVENT events: 36 ( 0.2%)
FINISHED_ROUND events: 2 ( 0.0%)
ID_INDEX events: 1 ( 0.0%)
THREAD_MAP events: 1 ( 0.0%)
CPU_MAP events: 1 ( 0.0%)
TIME_CONV events: 1 ( 0.0%)
FINISHED_INIT events: 1 ( 0.0%)
cycles stats:
SAMPLE events: 87
```
Thanks to James for reminding me. Tested on an AVA platform:
# tree /sys/bus/event_source/devices/arm_dsu_*/events
...
/sys/bus/event_source/devices/arm_dsu_9/events
├── bus_access
├── bus_cycles
├── cycles
├── l3d_cache
├── l3d_cache_allocate
├── l3d_cache_refill
├── l3d_cache_wb
└── memory_error
# ./perf record -- sleep 0.1
Error:
Failure to open event 'cycles:PH' on PMU 'arm_dsu_0' which will be removed.
cycles:PH: PMU Hardware doesn't support sampling/overflow-interrupts.
Try 'perf stat'
Error:
Failure to open event 'cycles:PH' on PMU 'arm_dsu_1' which will be removed.
cycles:PH: PMU Hardware doesn't support sampling/overflow-interrupts.
Try 'perf stat'
...
Error:
Failure to open event 'cycles:PH' on PMU 'arm_dsu_15' which will be removed.
cycles:PH: PMU Hardware doesn't support sampling/overflow-interrupts.
Try 'perf stat'
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.008 MB perf.data (8 samples) ]
# ./perf report --stats
Aggregated stats:
TOTAL events: 67
MMAP events: 40 (59.7%)
COMM events: 1 ( 1.5%)
SAMPLE events: 8 (11.9%)
KSYMBOL events: 6 ( 9.0%)
BPF_EVENT events: 6 ( 9.0%)
FINISHED_ROUND events: 1 ( 1.5%)
ID_INDEX events: 1 ( 1.5%)
THREAD_MAP events: 1 ( 1.5%)
CPU_MAP events: 1 ( 1.5%)
TIME_CONV events: 1 ( 1.5%)
FINISHED_INIT events: 1 ( 1.5%)
cycles:P stats:
SAMPLE events: 8
# ./perf stat -- sleep 0.1
Performance counter stats for 'sleep 0.1':
0.87 msec task-clock # 0.009 CPUs utilized
1 context-switches # 1.148 K/sec
0 cpu-migrations # 0.000 /sec
52 page-faults # 59.685 K/sec
877,835 instructions # 1.14 insn per cycle
# 0.25 stalled cycles per insn
772,102 cycles # 886.210 M/sec
191,914 stalled-cycles-frontend # 24.86% frontend cycles idle
219,183 stalled-cycles-backend # 28.39% backend cycles idle
184,099 branches # 211.307 M/sec
8,548 branch-misses # 4.64% of all branches
0.101623529 seconds time elapsed
0.001645000 seconds user
0.000000000 seconds sys
Tested-by: Leo Yan <leo.yan@xxxxxxx>
Thanks Leo! As the Tested-by makes sense only if you've applied all 4
patches, which your testing and James' testing shows you've both done,
I'll add the tags to all 4 patches. I'll do likewise with Atish,
rebase and resend the patches.
Thanks,
Ian