Re: [RFC PATCH v1 0/5] perf annotate: Add ARM64 data type profiling support
From: Tengda Wu
Date: Wed Jun 24 2026 - 02:40:01 EST

Hi Shuai,

On 2026/6/24 9:51, Shuai Xue wrote:
> Hi, Namhyung
>
> On 6/24/26 12:56 AM, Namhyung Kim wrote:
>> Hello,
>>
>> On Tue, Jun 23, 2026 at 09:02:29PM +0800, Shuai Xue wrote:
>>> `perf test -v "perf data type profiling tests"` fails on ARM64:
>>>
>>> Basic Rust perf annotate test
>>> perf mem record -o /tmp/perf.data perf test -w code_with_type
>>> perf annotate --code-with-type -i /tmp/perf.data --stdio --percent-limit 1
>>> Basic annotate [Failed: missing target data type]
>>>
>>> The root cause is that ARM64 lacks the instruction parsing infrastructure
>>> required for data type profiling. Specifically:
>>>
>>> 1. annotate_get_insn_location() cannot extract register numbers and
>>> memory offsets from ARM64 load/store instructions, because ARM64
>>> does not set objdump.register_char or objdump.memory_ref_char
>>> (unlike x86 which uses '%' and '(').
>>>
>>> 2. arch_supports_insn_tracking() does not include ARM64, so
>>> find_data_type_block() cannot perform instruction-level type state
>>> tracking.
>>>
>>> 3. init_type_state() has no ARM64 branch, leaving stack_reg as 0 (x0)
>>> after memset, which causes x0-based memory accesses to be
>>> misidentified as stack accesses.
>>>
>>> As a result, perf annotate --code-with-type silently produces no type
>>> annotations on ARM64, and the test grep for "# data-type: struct Buf"
>>> fails.
>>>
>>> This series adds ARM64 data type profiling support following the PowerPC
>>> model: decode raw 32-bit instruction words rather than parsing objdump
>>> text. ARM64's fixed-width encoding and trivial DWARF register mapping
>>> (x0-x30 = DWARF 0-30) make this approach clean and robust.
>>>
>>> Three classes of instructions are tracked for register state propagation:
>>> - ADRP: compute PC-relative page address for global variable resolution
>>> - ADD (immediate): combine with ADRP result to form full variable address
>>> - MOV (register): propagate type state between registers
>>>
>>> This covers the common `adrp + add + ldr/str` pattern that ARM64
>>> compilers emit for global variable access.
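
For reference, the ADRP + ADD (immediate) decoding described above can be
sketched roughly as follows. This is only an illustration based on the A64
encodings, not code from either series; the helper names
arm64_decode_adrp() and arm64_decode_add_imm() are hypothetical, and only
the 64-bit ADD (immediate) form is assumed.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helpers for illustration; not taken from either patch series. */

/*
 * ADRP: bit 31 = 1, bits 28:24 = 0b10000.
 * immlo lives in bits 30:29, immhi in bits 23:5, Rd in bits 4:0.
 * The result is the 4KB page of pc plus the signed 21-bit page offset.
 */
static bool arm64_decode_adrp(uint32_t insn, uint64_t pc,
			      unsigned *rd, uint64_t *addr)
{
	int64_t imm;

	if ((insn & 0x9f000000u) != 0x90000000u)
		return false;

	imm = (int64_t)((((insn >> 5) & 0x7ffffu) << 2) | ((insn >> 29) & 0x3u));
	if (imm & (1 << 20))		/* sign-extend the 21-bit immediate */
		imm -= (1 << 21);

	*rd = insn & 0x1f;
	*addr = (pc & ~0xfffULL) + (uint64_t)(imm * 4096);
	return true;
}

/*
 * ADD (immediate), 64-bit form: bits 31:23 = 0b100100010.
 * Bit 22 (sh) selects an optional left shift of imm12 by 12.
 * Note that register number 31 means SP here, not XZR.
 */
static bool arm64_decode_add_imm(uint32_t insn, unsigned *rd,
				 unsigned *rn, uint64_t *imm)
{
	if ((insn & 0xff800000u) != 0x91000000u)
		return false;

	*imm = (insn >> 10) & 0xfff;
	if (insn & (1u << 22))
		*imm <<= 12;

	*rd = insn & 0x1f;
	*rn = (insn >> 5) & 0x1f;
	return true;
}
```

A tracker built on helpers like these would record the ADRP result as a
constant in Rd, then add the ADD immediate when Rn holds that constant, and
look up the resulting address as a global variable.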
>>>
>>> Known limitations:
>>> - The `adrp + ldr` pattern (with :lo12: folded into the load offset,
>>> without an intermediate ADD) is not yet handled. This requires
>>> extending check_matching_type() to resolve TSR_KIND_CONST with the
>>> load offset, which can be added incrementally.
>>> - Pointer chain tracking (load-from-memory propagating type to the
>>> destination register) is not implemented, matching PowerPC's current
>>> scope.
>>>
>>> Testing:
>>> All four sub-tests in `perf test "perf data type profiling tests"`
>>> pass reliably on ARM64 (AArch64, SPE-capable hardware):
>>> - Basic/Pipe Rust: struct Buf (code_with_type workload)
>>> - Basic/Pipe C: struct buf (datasym workload, global variable)
>>>
>>> Patch breakdown:
>>> 1/5 Widen type_state_reg::imm_value from u32 to u64 (prerequisite
>>> for storing 64-bit addresses from ADRP)
>>> 2/5 Add arch__is_arm64() detection, raw instruction parsing from
>>> objdump output, and enable show_asm_raw for ARM64
>>> 3/5 Add get_arm64_regs() to extract registers and memory offsets
>>> from load/store instruction encodings (4 addressing modes)
>>> 4/5 Wire up ARM64 in annotate_get_insn_location(),
>>> arch_supports_insn_tracking(), and init_type_state()
>>> 5/5 Main patch: instruction classification, ADRP/ADD/MOV register
>>> state tracking, and architecture initialization
>>>
>>> Shuai Xue (5):
>>> perf annotate-data: Widen type_state_reg::imm_value to u64
>>> perf disasm: Add ARM64 architecture detection and raw instruction
>>> parsing
>>> perf dwarf-regs: Add ARM64 register and offset extraction from raw
>>> instructions
>>> perf annotate: Wire up ARM64 data type profiling infrastructure
>>> perf annotate-arch: Add ARM64 data type profiling support
>>
>> Thanks for the contribution!
>>
>> There was another series on this, please take a look. I hope you guys
>> can collaborate.
>>
>> https://lore.kernel.org/r/20260403094800.1418825-1-wutengda@xxxxxxxxxxxxxxx
>>
>
> Thanks for the pointer! I wasn't aware of Wutengda's series.
>
> I'll take a close look at it and compare our approaches. Since both
> series target ARM64 data type profiling, I'll reach out to Wutengda
> directly so we can align our efforts and avoid duplicated work.
>
>
> Best regards,
> Shuai Xue
>
Thanks for reaching out!

I'm currently finalizing v3 of my patch series. Most of the work is
already done (19/21 patches verified), and I plan to send it out within
the next few days.

The overall approach remains unchanged and continues to follow the x86
implementation. However, compared to v2, it includes several bug fixes
and optimizations addressing known issues.

Once v3 is out, I'd appreciate it if you could review the series and see
whether your specific use cases or ideas can be integrated on top of it.

Best regards,
Tengda