Re: [PATCH v2 00/16] perf arm64: Support data type profiling
From: Tengda Wu
Date: Thu Apr 16 2026 - 21:56:42 EST
On 2026/4/16 23:31, James Clark wrote:
>
>
> On 03/04/2026 10:47, Tengda Wu wrote:
>> This patch series implements data type profiling support for arm64,
>> building upon the foundational work previously contributed by Huafei [1].
>> While the initial version laid the groundwork for arm64 data type analysis,
>> this series iterates on that work by refining instruction parsing and
>> extending support for core architectural features.
>>
>> The series is organized as follows:
>>
>> 1. Fix disassembly mismatches (Patches 01-02)
>> Current perf annotate supports three disassembly backends: llvm,
>> capstone, and objdump. On arm64, inconsistencies between the output
>> of these backends (specifically llvm/capstone vs. objdump) often
>> prevent the tracker from correctly identifying registers and offsets.
>> These patches resolve these mismatches, ensuring consistent instruction
>> parsing across all supported backends.
>>
>> 2. Infrastructure for arm64 operand parsing (Patches 03-07)
>> These patches establish the necessary infrastructure for arm64-specific
>> operand handling. This includes implementing new callbacks and data
>> structures to manage arm64's unique addressing modes and register sets.
>> This foundation is essential for the subsequent type-tracking logic.
>>
>> 3. Core instruction tracking (Patches 08-16)
>> These patches implement the core logic for type tracking on arm64,
>> covering a wide range of instructions including:
>>
>> * Memory Access: ldr/str variants (including stack-based access).
>> * Arithmetic & Data Processing: mov, add, and adrp.
>> * Special Access: System register access (mrs) and per-cpu variable
>> tracking.
>>
>> The implementation draws inspiration from the existing x86 logic while
>> adapting it to the nuances of the AArch64 ISA [2][3]. With these changes,
>> perf annotate can successfully resolve memory locations and register
>> types, enabling comprehensive data type profiling on arm64 platforms.
>>
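To make the operand-parsing problem concrete, here is a minimal Python sketch of pulling the base register and immediate offset out of an arm64 memory operand as it appears in disassembly text. It is not the code from this series, which does the work in C. The helper name `parse_mem_operand` and the regular expression are hypothetical, and only the simple `[xN]`, `[xN, #imm]` and pre-index `[xN, #imm]!` forms are handled; post-index and register-offset forms are deliberately rejected.

```python
import re

# Hypothetical helper for illustration only; names and pattern are assumptions.
_MEM_OPERAND = re.compile(
    r"\[\s*(?P<base>x\d+|sp)\s*"
    r"(?:,\s*#(?P<imm>-?(?:0x[0-9a-fA-F]+|\d+)))?\s*\](?P<pre>!)?"
)


def parse_mem_operand(operand):
    """Return (base_register, offset, is_pre_index), or None if unsupported."""
    m = _MEM_OPERAND.fullmatch(operand.strip())
    if m is None:
        return None

    imm = m.group("imm")
    offset = 0
    if imm is not None:
        sign = -1 if imm.startswith("-") else 1
        digits = imm.lstrip("-")
        # Disassemblers may print the immediate in decimal or in hex.
        base = 16 if digits.lower().startswith("0x") else 10
        offset = sign * int(digits, base)

    return m.group("base"), offset, m.group("pre") is not None


print(parse_mem_operand("[x0, #8]"))
print(parse_mem_operand("[sp, #-16]!"))
```

Accepting both decimal and hexadecimal immediates in one place is the kind of normalization that lets a tracker treat differently formatted backend output the same way.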
>> Example Result
>> ==============
>>
>> # perf mem record -a -K -- sleep 1
>> # perf annotate --data-type --type-stat --stdio
>
> Hi Tengda,
>
> Did you run this with any itrace options? If I run your command I get repeated blocks of duplicate stats and types, one for each sample type that we generate when decoding SPE, which is very confusing.
>
> For example the default perf report output has all these groups:
>
> Available samples
> 0 arm_spe_0/ts_enable=1,pa_enable=1,load_filter=1,store_filter=1,min_latency=30/
> 0 dummy:u
> 3 l1d-miss
> 18 l1d-access
> 0 llc-miss
> 0 llc-access
> 0 tlb-miss
> 22 tlb-access
> 0 branch
> 0 remote-access
> 22 memory
> 22 instructions
>
> Obviously there are 22 samples total (instructions) and they get duplicated into whatever other categories they happen to have flags for.
>
Yes, I agree. The duplication makes the type stats misleading.
De-duplication is definitely necessary here.
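A toy model of the duplication, as a rough sketch rather than how perf actually decodes SPE: assume each decoded record carries a set of event flags and one sample is synthesized per flag. The assignment of flags to individual records below is invented so that the per-event totals match the "Available samples" list quoted above.

```python
from collections import Counter

# Assumption: 22 records, all flagged memory/tlb-access, 18 of them also
# l1d-access and 3 of those also l1d-miss. Only the totals come from the
# quoted output; which record has which flag is made up.
records = []
for i in range(22):
    flags = {"instructions", "memory", "tlb-access"}
    if i < 18:
        flags.add("l1d-access")
    if i < 3:
        flags.add("l1d-miss")
    records.append(flags)

# One synthesized sample per (record, flag) pair.
per_event = Counter(flag for flags in records for flag in flags)
synthesized_total = sum(per_event.values())

# Keeping only the "instructions" group counts every record exactly once.
deduplicated_total = per_event["instructions"]

print(synthesized_total, deduplicated_total)  # 87 22
```

In this model the type stats would be computed over 87 samples instead of 22 unless only one sample group is kept.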
> To remove the duplicates you have to pass --itrace=i1i. Should that be the default for perf annotate with SPE?
>
I'll look into making this the default behavior for SPE data-type
annotation.
>> Annotate data type stats:
>> total 6204, ok 5091 (82.1%), bad 1113 (17.9%)
>> -----------------------------------------------------------
>> 29 : no_sym
>> 196 : no_var
>> 806 : no_typeinfo
>> 82 : bad_offset
>> 1370 : insn_track
>>
Here are the results with --itrace=i1i (the success rate drops from 82.1% to 77.1%):
Annotate data type stats:
total 1138, ok 877 (77.1%), bad 261 (22.9%)
-----------------------------------------------------------
6 : no_sym
44 : no_var
197 : no_typeinfo
14 : bad_offset
238 : insn_track
This will be the new baseline, and I will work on further optimizations
from here.
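As a quick sanity check of both sets of numbers, the percentages and the failure buckets can be recomputed with a short Python sketch. The `summarize` helper is hypothetical. It relies on one observation from the figures themselves: the no_sym, no_var, no_typeinfo and bad_offset buckets add up to "bad", while insn_track is not part of that sum.

```python
def summarize(total, ok, buckets):
    """Recompute the derived figures of an 'Annotate data type stats' block."""
    bad = total - ok
    return {
        "bad": bad,
        "ok_pct": round(100.0 * ok / total, 1),
        "bad_pct": round(100.0 * bad / total, 1),
        # insn_track is excluded: in the figures above it is not part of "bad".
        "failure_sum": sum(v for k, v in buckets.items() if k != "insn_track"),
    }


default_run = summarize(6204, 5091, {
    "no_sym": 29, "no_var": 196, "no_typeinfo": 806,
    "bad_offset": 82, "insn_track": 1370,
})
i1i_run = summarize(1138, 877, {
    "no_sym": 6, "no_var": 44, "no_typeinfo": 197,
    "bad_offset": 14, "insn_track": 238,
})

print(default_run)
print(i1i_run)
```

Both runs are internally consistent under that reading, so the difference between them comes from which samples are counted, not from a reporting error.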
Best regards,
Tengda
>> Annotate type: 'struct page' in [kernel.kallsyms] (59208 samples):
>> ============================================================================
>> Percent offset size field
>> 100.00 0 0x40 struct page {
>> 9.95 0 0x8 long unsigned int flags;
>> 52.83 0x8 0x28 union {
>> 52.83 0x8 0x28 struct {
>> 37.21 0x8 0x10 union {
>> 37.21 0x8 0x10 struct list_head lru {
>> 37.21 0x8 0x8 struct list_head* next;
>> 0.00 0x10 0x8 struct list_head* prev;
>> };
>> 37.21 0x8 0x10 struct {
>> 37.21 0x8 0x8 void* __filler;
>> 0.00 0x10 0x4 unsigned int mlock_count;
>> ...
>>
>> Changes since v1 (reworked from Huafei's series):
>>
>> - Fix inconsistencies in arm64 instruction output across llvm, capstone,
>> and objdump disassembly backends.
>> - Support arm64-specific addressing modes and operand formats. (Leo Yan)
>> - Extend instruction tracking to support mov and add instructions,
>> along with per-cpu and stack variables.
>> - Include real-world examples in commit messages to demonstrate
>> practical effects. (Namhyung Kim)
>> - Improve type-tracking success rate (type stat) from 64.2% to 82.1%.
>> https://lore.kernel.org/all/20250314162137.528204-1-lihuafei1@xxxxxxxxxx/
>>
>> Please let me know if you have any feedback.
>>
>> Thanks,
>> Tengda
>>
>> [1] https://lore.kernel.org/all/20250314162137.528204-1-lihuafei1@xxxxxxxxxx/
>> [2] https://developer.arm.com/documentation/102374/0103
>> [3] https://github.com/flynd/asmsheets/releases/tag/v8
>>
>> ---
>>
>> Tengda Wu (16):
>> perf llvm: Fix arm64 adrp instruction disassembly mismatch with
>> objdump
>> perf capstone: Fix arm64 jump/adrp disassembly mismatch with objdump
>> perf annotate-arm64: Generalize arm64_mov__parse to support standard
>> operands
>> perf annotate-arm64: Handle load and store instructions
>> perf annotate: Introduce extract_op_location callback for
>> arch-specific parsing
>> perf dwarf-regs: Adapt get_dwarf_regnum() for arm64
>> perf annotate-arm64: Implement extract_op_location() callback
>> perf annotate-arm64: Enable instruction tracking support
>> perf annotate-arm64: Support load instruction tracking
>> perf annotate-arm64: Support store instruction tracking
>> perf annotate-arm64: Support stack variable tracking
>> perf annotate-arm64: Support 'mov' instruction tracking
>> perf annotate-arm64: Support 'add' instruction tracking
>> perf annotate-arm64: Support 'adrp' instruction to track global
>> variables
>> perf annotate-arm64: Support per-cpu variable access tracking
>> perf annotate-arm64: Support 'mrs' instruction to track 'current'
>> pointer
>>
>> .../perf/util/annotate-arch/annotate-arm64.c | 642 +++++++++++++++++-
>> .../util/annotate-arch/annotate-powerpc.c | 10 +
>> tools/perf/util/annotate-arch/annotate-x86.c | 88 ++-
>> tools/perf/util/annotate-data.c | 72 +-
>> tools/perf/util/annotate-data.h | 7 +-
>> tools/perf/util/annotate.c | 108 +--
>> tools/perf/util/annotate.h | 12 +
>> tools/perf/util/capstone.c | 107 ++-
>> tools/perf/util/disasm.c | 5 +
>> tools/perf/util/disasm.h | 5 +
>> .../util/dwarf-regs-arch/dwarf-regs-arm64.c | 20 +
>> tools/perf/util/dwarf-regs.c | 2 +-
>> tools/perf/util/include/dwarf-regs.h | 1 +
>> tools/perf/util/llvm.c | 50 ++
>> 14 files changed, 984 insertions(+), 145 deletions(-)
>>
>>
>> base-commit: cf7c3c02fdd0dfccf4d6611714273dcb538af2cb