[PATCH 0/7] Add data type profiling support for arm64
From: Li Huafei
Date: Fri Mar 14 2025 - 04:22:00 EST
Hi,
This patchset supports arm64 perf data type profiling. Data type
profiling was introduced by Namhyung [1], which associates PMU sampling
(here referring to memory access-related event sampling) with the
referenced data types, providing developers with an effective tool for
analyzing the impact of memory usage and layout. For more detailed
background, please refer to [2].
Namhyung initially supported this feature only on x86, and later Athira
added support for it on powerpc [3]. Unlike the x86 implementation, the
powerpc implementation parses operands directly from raw instruction
code instead of using the results from assembler disassembly. As Athira
mentioned, this is mainly because not all memory access instructions on
powerpc have explicit memory reference assembler notations '()' in their
assembly code. On arm64, all memory access instructions have the
notation '[]', so my implementation is similar to x86, using the
disassembly results from objdump, llvm, or libcapstone, and parsing
based on strings. I believe this has the advantage of reusing the
complex instruction parsing logic of the assembler, but it may not
perform as well as raw instruction parsing in terms of efficiency.
Below is a brief description of this patchset:
- Patch 1 first identifies load and store instructions and provides a
parsing function.
- Patches 2-3 are refactoring patches. They primarily move the code for
extracting registers and offsets to specific architecture
implementations. Additionally, a new callback function
'extract_reg_offset' is introduced to avoid having too many
architecture-specific implementations in the function
'annotate_get_insn_location()'.
- Patch 4 implements the extract_reg_offset callback for arm64.
Currently, it does not support parsing instructions with register
pairs or register offsets in operands. Register pairs often appear in
stack push/pop instructions, and register offsets are common when
accessing per-CPU variables, both of which require special handling.
- Patch 5 adds support for instruction tracing on arm64, primarily
addressing the issue where DWARF does not generate information for
intermediate pointers in pointer chains.
- Patches 6-7 further enhance instruction tracing. Patch 6 supports
parsing accesses to global variables, while Patch 7 focuses on
resolving accesses to the kernel's current pointer.
There are still areas for improvement in the current implementation:
- Support more types of memory access instructions, such as those
involving register pairs and register offsets.
- Handle all data processing instructions (e.g., mov, add), as these
instructions can change the state of registers and may affect the
accuracy of instruction tracking.
- Supporting parsing of special memory access scenarios like per-CPU
variables and arrays.
The patch set is based on 6.14-rc6 (commit 80e54e84911a). After applying
this patch set, the date type profiling results on arm64 are as follows
(SPE support is required):
# perf mem record -a -K -- sleep 1
# perf annotate --data-type --type-stat --stdio
Only instruction-based sampling period is currently supported by Arm SPE.
Annotate data type stats:
total 556, ok 357 (64.2%), bad 199 (35.8%)
-----------------------------------------------------------
10 : no_sym
36 : no_insn_ops
65 : no_var
70 : no_typeinfo
18 : bad_offset
59 : insn_track
Annotate type: 'struct rq' in [kernel.kallsyms] (29 samples):
============================================================================
Percent offset size field
100.00 0 0xe80 struct rq {
0.00 0 0x4 raw_spinlock_t __lock {
0.00 0 0x4 arch_spinlock_t raw_lock {
0.00 0 0x4 union {
0.00 0 0x4 atomic_t val {
0.00 0 0x4 int counter;
};
0.00 0 0x2 struct {
0.00 0 0x1 u8 locked;
0.00 0x1 0x1 u8 pending;
};
0.00 0 0x4 struct {
0.00 0 0x2 u16 locked_pending;
0.00 0x2 0x2 u16 tail;
};
};
};
};
13.79 0x4 0x4 unsigned int nr_running;
13.79 0x8 0x4 unsigned int nr_numa_running;
0.00 0xc 0x4 unsigned int nr_preferred_running;
0.00 0x10 0x4 unsigned int numa_migrate_on;
0.00 0x18 0x8 long unsigned int last_blocked_load_update_tick;
0.00 0x20 0x4 unsigned int has_blocked_load;
0.00 0x40 0x20 call_single_data_t nohz_csd {
0.00 0x40 0x10 struct __call_single_node node {
0.00 0x40 0x8 struct llist_node llist {
0.00 0x40 0x8 struct llist_node* next;
};
0.00 0x48 0x4 union {
0.00 0x48 0x4 unsigned int u_flags;
0.00 0x48 0x4 atomic_t a_flags {
0.00 0x48 0x4 int counter;
};
};
...
Thanks,
Huafei
[1] https://lore.kernel.org/lkml/20231213001323.718046-1-namhyung@xxxxxxxxxx/
[2] https://lwn.net/Articles/955709/
[3] https://lore.kernel.org/all/20240718084358.72242-1-atrajeev@xxxxxxxxxxxxxxxxxx/#r
Li Huafei (7):
perf annotate: Handle arm64 load and store instructions
perf annotate: Advance the mem_ref check to mov__parse()
perf annotate: Add 'extract_reg_offset' callback function to extract
register number and access offset
perf annotate: Support for the 'extract_reg_offset' callback function
in arm64
perf annotate-data: Support instruction tracking for arm64
perf annotate-data: Handle arm64 global variable access
perf annotate-data: Handle the access to the 'current' pointer on
arm64
tools/perf/arch/arm64/annotate/instructions.c | 302 +++++++++++++++++-
.../perf/arch/powerpc/annotate/instructions.c | 10 +
tools/perf/arch/x86/annotate/instructions.c | 99 ++++++
tools/perf/util/Build | 1 +
tools/perf/util/annotate-data.c | 23 +-
tools/perf/util/annotate-data.h | 4 +-
tools/perf/util/annotate.c | 112 +------
tools/perf/util/disasm.c | 14 +
tools/perf/util/disasm.h | 4 +
tools/perf/util/dwarf-regs-arm64.c | 25 ++
tools/perf/util/include/dwarf-regs.h | 7 +
11 files changed, 490 insertions(+), 111 deletions(-)
create mode 100644 tools/perf/util/dwarf-regs-arm64.c
--
2.25.1