Re: [PATCH v2 00/16] perf arm64: Support data type profiling
From: James Clark
Date: Wed Apr 22 2026 - 05:54:13 EST
On 03/04/2026 10:47, Tengda Wu wrote:
This patch series implements data type profiling support for arm64,
building upon the foundational work previously contributed by Huafei [1].
While the initial version laid the groundwork for arm64 data type analysis,
this series iterates on that work by refining instruction parsing and
extending support for core architectural features.
The series is organized as follows:
1. Fix disassembly mismatches (Patches 01-02)
Current perf annotate supports three disassembly backends: llvm,
capstone, and objdump. On arm64, inconsistencies between the output
of these backends (specifically llvm/capstone vs. objdump) often
prevent the tracker from correctly identifying registers and offsets.
These patches resolve these mismatches, ensuring consistent instruction
parsing across all supported backends.
Did you try recording the perf datasym workload? With llvm-objdump I only get hits on data1 and not on data2. With binutils I don't get any hits on that struct at all, although the rest of the samples (in ld-linux-aarch64.so etc.) look roughly the same between binutils and llvm. I would have thought a simple example like datasym would work with both.
$ perf record -e arm_spe_0/load_filter=1,store_filter=1,min_latency=30/u -c 10000 -- perf test -w datasym
$ perf annotate --data-type --type-stat --itrace=i1i --stdio
With llvm-objdump-14:
Annotate data type stats:
total 25, ok 19 (76.0%), bad 6 (24.0%)
-----------------------------------------------------------
1 : no_sym
1 : no_mem_ops
3 : no_var
1 : no_typeinfo
9 : insn_track
Annotate type: 'struct buf' in build/local/perf (6663 samples):
===============================================================
Percent offset size field
100.00 0 0x40 struct buf {
100.00 0 0x1 char data1;
0.00 0x1 0x37 char[] reserved;
0.00 0x38 0x1 char data2;
};
With binutils that entry is missing:
Annotate data type stats:
total 25, ok 14 (56.0%), bad 11 (44.0%)
-----------------------------------------------------------
1 : no_sym
1 : no_cuinfo
3 : no_var
6 : no_typeinfo
4 : insn_track
...
But with the following patch I get plausible output for datasym with llvm, where both entries in the struct have hits. It looks like you need to add the offset when calling get_global_var_type() for TSR_KIND_GLOBAL_ADDR; otherwise all entries point to the first member of the struct:
Annotate data type stats:
total 4, ok 2 (50.0%), bad 2 (50.0%)
-----------------------------------------------------------
1 : no_sym
1 : no_typeinfo
2 : insn_track
Annotate type: 'struct buf' in build/local/perf (35 samples):
=====================================================================
Percent offset size field
100.00 0 0x39 struct buf {
40.00 0 0x1 char data1;
0.00 0x1 0x37 char[] reserved;
60.00 0x38 0x1 char data2;
};
diff --git a/tools/perf/util/annotate-data.c b/tools/perf/util/annotate-data.c
index 7161417d1c76..0e5825121227 100644
--- a/tools/perf/util/annotate-data.c
+++ b/tools/perf/util/annotate-data.c
@@ -1287,7 +1287,9 @@ static enum type_match_result check_matching_type(struct type_state *state,
* The register holds the address of a global variable. Try to
* find the variable by the address and get its type.
*/
- if (get_global_var_type(cu_die, dloc, dloc->ip, state->regs[reg].addr,
+ var_addr = state->regs[reg].addr + dloc->op->offset;
+
+ if (get_global_var_type(cu_die, dloc, dloc->ip, var_addr,
&var_offset, type_die)) {
dloc->type_offset = var_offset;
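To illustrate why the offset matters, here is a minimal standalone sketch (not perf code). It assumes a hypothetical member table for the datasym 'struct buf' and a hypothetical helper member_at(); real perf resolves members from DWARF instead. If only the base address is looked up, every access maps to data1; adding the operand offset lets an access at 0x38 resolve to data2.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Layout matching the 'struct buf' shown in the annotate output. */
struct buf {
	char data1;
	char reserved[55];
	char data2;
};

/* Hypothetical member table; perf derives this from DWARF. */
struct member {
	const char *name;
	size_t offset;
	size_t size;
};

static const struct member buf_members[] = {
	{ "data1",    offsetof(struct buf, data1),    sizeof(char) },
	{ "reserved", offsetof(struct buf, reserved), 55 },
	{ "data2",    offsetof(struct buf, data2),    sizeof(char) },
};

/*
 * Return the name of the member covering 'addr', where 'base' is the
 * start address of the variable, or NULL if 'addr' is past the struct.
 */
static const char *member_at(unsigned long base, unsigned long addr)
{
	size_t off, i;

	assert(addr >= base);
	off = addr - base;

	for (i = 0; i < sizeof(buf_members) / sizeof(buf_members[0]); i++) {
		const struct member *m = &buf_members[i];

		if (off >= m->offset && off < m->offset + m->size)
			return m->name;
	}
	return NULL;
}
```

Calling member_at(base, base) corresponds to the lookup without the operand offset and always lands on data1, while member_at(base, base + 0x38) lands on data2.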
2. Infrastructure for arm64 operand parsing (Patches 03-07)
These patches establish the necessary infrastructure for arm64-specific
operand handling. This includes implementing new callbacks and data
structures to manage arm64's unique addressing modes and register sets.
This foundation is essential for the subsequent type-tracking logic.
3. Core instruction tracking (Patches 08-16)
These patches implement the core logic for type tracking on arm64,
covering a wide range of instructions including:
* Memory Access: ldr/str variants (including stack-based access).
* Arithmetic & Data Processing: mov, add, and adrp.
* Special Access: System register access (mrs) and per-cpu variable
tracking.
The implementation draws inspiration from the existing x86 logic while
adapting it to the nuances of the AArch64 ISA [2][3]. With these changes,
perf annotate can successfully resolve memory locations and register
types, enabling comprehensive data type profiling on arm64 platforms.
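For readers less familiar with AArch64, the address arithmetic behind an adrp/add pair can be sketched as follows. This is only an illustration of the ISA semantics, not code from the series; the helper names adrp_target() and global_addr() are made up for this example.

```c
#include <assert.h>
#include <stdint.h>

/*
 * adrp: take the 4KB page containing PC and add a signed 21-bit
 * page count (i.e. the immediate shifted left by 12).
 */
static uint64_t adrp_target(uint64_t pc, int64_t imm_pages)
{
	/* The immediate is a signed 21-bit field. */
	assert(imm_pages >= -(1 << 20) && imm_pages < (1 << 20));

	return (pc & ~UINT64_C(0xfff)) + (uint64_t)imm_pages * 4096;
}

/*
 * adrp xN, sym ; add xN, xN, :lo12:sym
 * The add supplies the low 12 bits of the symbol address.
 */
static uint64_t global_addr(uint64_t pc, int64_t imm_pages, uint32_t lo12)
{
	return adrp_target(pc, imm_pages) + (lo12 & 0xfff);
}
```

For example, with pc = 0x400123 an adrp immediate of 2 yields the page 0x402000, and a following add with a low-12 value of 0x38 yields 0x402038. A tracker that records only the adrp result sees the page address and needs the later add (or load/store offset) to reach the actual variable.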
Example Result
==============
# perf mem record -a -K -- sleep 1
# perf annotate --data-type --type-stat --stdio
Annotate data type stats:
total 6204, ok 5091 (82.1%), bad 1113 (17.9%)
-----------------------------------------------------------
29 : no_sym
196 : no_var
806 : no_typeinfo
82 : bad_offset
1370 : insn_track
Annotate type: 'struct page' in [kernel.kallsyms] (59208 samples):
============================================================================
Percent offset size field
100.00 0 0x40 struct page {
9.95 0 0x8 long unsigned int flags;
52.83 0x8 0x28 union {
52.83 0x8 0x28 struct {
37.21 0x8 0x10 union {
37.21 0x8 0x10 struct list_head lru {
37.21 0x8 0x8 struct list_head* next;
0.00 0x10 0x8 struct list_head* prev;
};
37.21 0x8 0x10 struct {
37.21 0x8 0x8 void* __filler;
0.00 0x10 0x4 unsigned int mlock_count;
...
Changes since v1 (reworked from Huafei's series):
- Fix inconsistencies in arm64 instruction output across llvm, capstone,
and objdump disassembly backends.
- Support arm64-specific addressing modes and operand formats. (Leo Yan)
- Extend instruction tracking to support mov and add instructions,
along with per-cpu and stack variables.
- Include real-world examples in commit messages to demonstrate
practical effects. (Namhyung Kim)
- Improve type-tracking success rate (type stat) from 64.2% to 82.1%.
Please let me know if you have any feedback.
Thanks,
Tengda
[1] https://lore.kernel.org/all/20250314162137.528204-1-lihuafei1@xxxxxxxxxx/
[2] https://developer.arm.com/documentation/102374/0103
[3] https://github.com/flynd/asmsheets/releases/tag/v8
---
Tengda Wu (16):
perf llvm: Fix arm64 adrp instruction disassembly mismatch with
objdump
perf capstone: Fix arm64 jump/adrp disassembly mismatch with objdump
perf annotate-arm64: Generalize arm64_mov__parse to support standard
operands
perf annotate-arm64: Handle load and store instructions
perf annotate: Introduce extract_op_location callback for
arch-specific parsing
perf dwarf-regs: Adapt get_dwarf_regnum() for arm64
perf annotate-arm64: Implement extract_op_location() callback
perf annotate-arm64: Enable instruction tracking support
perf annotate-arm64: Support load instruction tracking
perf annotate-arm64: Support store instruction tracking
perf annotate-arm64: Support stack variable tracking
perf annotate-arm64: Support 'mov' instruction tracking
perf annotate-arm64: Support 'add' instruction tracking
perf annotate-arm64: Support 'adrp' instruction to track global
variables
perf annotate-arm64: Support per-cpu variable access tracking
perf annotate-arm64: Support 'mrs' instruction to track 'current'
pointer
.../perf/util/annotate-arch/annotate-arm64.c | 642 +++++++++++++++++-
.../util/annotate-arch/annotate-powerpc.c | 10 +
tools/perf/util/annotate-arch/annotate-x86.c | 88 ++-
tools/perf/util/annotate-data.c | 72 +-
tools/perf/util/annotate-data.h | 7 +-
tools/perf/util/annotate.c | 108 +--
tools/perf/util/annotate.h | 12 +
tools/perf/util/capstone.c | 107 ++-
tools/perf/util/disasm.c | 5 +
tools/perf/util/disasm.h | 5 +
.../util/dwarf-regs-arch/dwarf-regs-arm64.c | 20 +
tools/perf/util/dwarf-regs.c | 2 +-
tools/perf/util/include/dwarf-regs.h | 1 +
tools/perf/util/llvm.c | 50 ++
14 files changed, 984 insertions(+), 145 deletions(-)
base-commit: cf7c3c02fdd0dfccf4d6611714273dcb538af2cb