[PATCH V3 00/14] Add data type profiling support for powerpc

From: Athira Rajeev
Date: Sat Jun 01 2024 - 02:10:37 EST


The patchset from Namhyung added support for data type profiling
in the perf tool. It associates PMU samples with the data types they
refer to, using DWARF debug information. With upstream perf, it is
currently possible to run perf report or perf annotate to view the
data type information on x86.

The initial patchset posted here had the changes needed to enable data
type profiling support for powerpc.

https://lore.kernel.org/all/6e09dc28-4a2e-49d8-a2b5-ffb3396a9952@xxxxxxxxxx/T/

The main changes were:
1. A powerpc instruction mnemonic table to associate load/store
instructions with move_ops, which is used to identify whether an
instruction is a memory access.
2. To get the register number and access offset from the given
instruction, the code uses fields from "struct arch" -> objdump.
Added an entry for powerpc here.
3. A get_arch_regnum to return the register number from the
register name string.
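
As a rough illustration of item 3, a register name such as "r10" can be
mapped to its number as below. This is only a sketch: the helper name
ppc_gpr_from_name is hypothetical and is not the get_arch_regnum
implementation from the patchset, and it only handles general purpose
registers r0..r31.

```c
#include <ctype.h>
#include <stdlib.h>

/*
 * Hypothetical helper (not the patchset's get_arch_regnum):
 * map a powerpc GPR name such as "r9" to its index 0..31.
 * Returns -1 if the string is not a GPR name.
 */
static int ppc_gpr_from_name(const char *name)
{
	char *end;
	long n;

	/* Expect 'r' followed by at least one decimal digit */
	if (name[0] != 'r' || !isdigit((unsigned char)name[1]))
		return -1;

	n = strtol(name + 1, &end, 10);

	/* Reject trailing characters and out-of-range numbers */
	if (*end != '\0' || n > 31)
		return -1;

	return (int)n;
}
```

With this sketch, "r10" maps to 10, while strings such as "r32" or "lwz"
are rejected with -1.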

But the approach in the initial patchset relied on parsing the
disassembled code, which is what the current perf tool implementation does.

Example: lwz r10,0(r9)

This line "lwz r10,0(r9)" is parsed to extract the instruction name,
register names and offset. To find whether there is a memory reference
in the operands, the "memory_ref_char" field of objdump is used.
For x86, "(" is used as memory_ref_char to tackle instructions of the
form "mov (%rax), %rcx".
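
A minimal sketch of this text-based parsing is given below, assuming the
operand layout "name rT,offset(rA)". The struct and function names are
hypothetical and only illustrate the idea; they are not the perf
implementation.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical container for fields parsed from one disassembled line */
struct ppc_text_insn {
	char name[16];	/* mnemonic, e.g. "lwz" */
	int target;	/* target register number */
	int base;	/* base register number, -1 if no memory operand */
	int offset;	/* displacement, 0 if no memory operand */
};

/*
 * Returns 1 if a memory operand was found and parsed, 0 if the line has
 * no memory_ref_char, and -1 if the line could not be parsed.
 */
static int parse_disasm_line(const char *line, char memory_ref_char,
			     struct ppc_text_insn *out)
{
	const char *comma;

	out->base = -1;
	out->offset = 0;

	/* Mnemonic followed by the first register operand */
	if (sscanf(line, "%15s r%d", out->name, &out->target) != 2)
		return -1;

	/* No memory_ref_char: treated as not being a memory access */
	if (!strchr(line, memory_ref_char))
		return 0;

	comma = strchr(line, ',');
	if (!comma)
		return -1;

	/* Second operand of the form offset(rA) */
	if (sscanf(comma + 1, "%d(r%d)", &out->offset, &out->base) != 2)
		return -1;

	return 1;
}
```

With "(" as memory_ref_char, "lwz r10,0(r9)" is recognized as a memory
access with base register 9 and offset 0. A line without "(" returns 0,
so it is not treated as a memory access.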

In the case of powerpc, instructions using "(" are not the only memory
instructions. For example, the above instruction can also be of extended
form (X form) "lwzx r10,0,r19". In order to easily identify the instruction
category and extract the source/target registers, the second patchset added
support to use the raw instruction. With the raw instruction, macros are
added to extract the opcode and register fields.
Link to second patchset:
https://lore.kernel.org/all/20240506121906.76639-1-atrajeev@xxxxxxxxxxxxxxxxxx/
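
To illustrate why the raw instruction helps, the sketch below extracts
the X form fields with shift/mask macros and checks the extended opcode
(bits 21-30) to decide whether an opcode 31 instruction is a load. The
macro and function names here are hypothetical and need not match the
ones in the patchset. The extended opcode values are the Power ISA
encodings of the instructions named in the comments, and the list is
illustrative rather than complete.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical field extractors; bit 0 is the most significant bit */
#define X_OPCODE(insn)	(((insn) >> 26) & 0x3f)	/* bits 0-5 */
#define X_RT(insn)	(((insn) >> 21) & 0x1f)	/* bits 6-10 */
#define X_RA(insn)	(((insn) >> 16) & 0x1f)	/* bits 11-15 */
#define X_RB(insn)	(((insn) >> 11) & 0x1f)	/* bits 16-20 */
#define X_XOP(insn)	(((insn) >> 1) & 0x3ff)	/* bits 21-30 */

/* True if insn is one of a few opcode 31 (X form) load instructions */
static bool ppc_op31_is_load(uint32_t insn)
{
	if (X_OPCODE(insn) != 31)
		return false;

	switch (X_XOP(insn)) {
	case 20:	/* lwarx */
	case 21:	/* ldx */
	case 23:	/* lwzx */
	case 84:	/* ldarx */
	case 87:	/* lbzx */
	case 279:	/* lhzx */
	case 341:	/* lwax */
		return true;
	default:
		return false;
	}
}
```

For instance, "lwzx r10,0,r19" encodes as 0x7d40982e and is reported as
a load, while "add r3,r4,r5" (0x7c642a14, extended opcode 266) is not,
although both share primary opcode 31.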

For example, objdump with --show-raw-insn gives:

38 01 81 e8 ld r4,312(r1)

Here "38 01 81 e8" is the raw instruction representation. In powerpc,
this translates to instruction form: "ld RT,DS(RA)" and binary code
as:
_____________________________________
| 58 |  RT  |  RA  |      DS      | |
-------------------------------------
0    6      11     16             30 31
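
The decoding of this example can be sketched as below. It assumes the
four bytes are shown in little-endian memory order, which is inferred
from the fact that the word 0xe8810138 decodes to opcode 58. The struct
and function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical container for the DS form fields */
struct ds_form {
	unsigned int opcode;	/* bits 0-5 */
	unsigned int rt;	/* bits 6-10 */
	unsigned int ra;	/* bits 11-15 */
	int offset;		/* DS (bits 16-29) with two zero bits appended */
};

/* Assemble a 32-bit instruction word from little-endian bytes */
static uint32_t insn_from_le_bytes(const unsigned char b[4])
{
	return (uint32_t)b[0] | ((uint32_t)b[1] << 8) |
	       ((uint32_t)b[2] << 16) | ((uint32_t)b[3] << 24);
}

static struct ds_form decode_ds_form(uint32_t insn)
{
	struct ds_form d;
	int off = (int)(insn & 0xfffc);	/* drop bits 30-31 */

	/* Sign-extend the 16-bit displacement */
	if (off & 0x8000)
		off -= 0x10000;

	d.opcode = (insn >> 26) & 0x3f;
	d.rt = (insn >> 21) & 0x1f;
	d.ra = (insn >> 16) & 0x1f;
	d.offset = off;
	return d;
}
```

For the bytes "38 01 81 e8" this yields opcode 58, RT = 4, RA = 1 and
offset 312, matching "ld r4,312(r1)".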

The second patchset used "objdump" again to read the raw instruction.
But since there is no need to disassemble and the binary code can be read
directly from the DSO, the third patchset (i.e. this patchset) uses the
approach below. The approach preferred in powerpc to parse a sample for
data type profiling in the V3 patchset is:
- Read directly from the DSO using dso__data_read_offset
- If that fails for any reason, fall back to using libcapstone
- If libcapstone is not supported, use objdump
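
The ordering above can be sketched as a simple fallback chain. Everything
in this sketch is hypothetical: the struct, the enum and the stub
backends only model whether each step succeeds, and none of them call
the real perf, libcapstone or objdump code.

```c
/* Hypothetical description of what is available for one sample */
struct sample_env {
	int raw_read_ok;	/* reading the bytes from the DSO works */
	int have_capstone;	/* built with libcapstone support */
};

enum insn_backend {
	BACKEND_RAW_DSO,
	BACKEND_CAPSTONE,
	BACKEND_OBJDUMP,
};

/* Stub backends: return 0 on success, -1 on failure */
static int try_raw_dso(const struct sample_env *env)
{
	return env->raw_read_ok ? 0 : -1;
}

static int try_capstone(const struct sample_env *env)
{
	return env->have_capstone ? 0 : -1;
}

/* Try each backend in order of preference; objdump is the last resort */
static enum insn_backend pick_backend(const struct sample_env *env)
{
	if (try_raw_dso(env) == 0)
		return BACKEND_RAW_DSO;
	if (try_capstone(env) == 0)
		return BACKEND_CAPSTONE;
	return BACKEND_OBJDUMP;
}
```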

The patchset adds support to pick the opcode and reg fields from this
raw/binary instruction code. This approach came from review comments
by Segher Boessenkool and Christophe on the initial patchset.

Apart from that, instruction tracking is enabled for powerpc and a
support function is added to find variables defined as registers.
For example, in powerpc, the two registers below are defined to
represent variables:
1. r13: represents local_paca
register struct paca_struct *local_paca asm("r13");

2. r1: represents stack_pointer
register void *__stack_pointer asm("r1");

These are handled in this patchset.

- Patch 1 moves the register state type structures to a header file
so that they can be referred to from other arch specific files
- Patch 2 makes instruction tracking a callback in "struct arch"
so that it can be implemented by other archs easily and defined in arch
specific files
- Patch 3 adds support to capture and parse the raw instruction in powerpc
using the dso__data_read_offset utility
- Patch 4 adds logic to use objdump when doing default "perf
report" or "perf annotate", since that needs the disassembled instruction.
- Patch 5 adds disasm_line__parse to parse the raw instruction for powerpc
- Patch 6 updates parameters for reg extract functions to use the raw
instruction on powerpc
- Patch 7 adds support to identify memory instructions of opcode 31 in
powerpc
- Patch 8 adds more instructions to support instruction tracking in powerpc
- Patches 9 and 10 handle instruction tracking for powerpc.
- Patch 11 adds support to use libcapstone in powerpc
- Patches 12 and 13 add support to find global register variables
- Patch 14 handles the insn-stat option for perf annotate

Note:
- There are remaining unknowns (26.8% bad), as seen in the annotate
Instruction stats below.
- This patchset is not tested on powerpc32. As the next step of
enhancements, along with handling the remaining unknowns, the plan is to
cover powerpc32 changes based on how testing goes.

With the current patchset:

./perf record -a -e mem-loads sleep 1
./perf report -s type,typeoff --hierarchy --group --stdio
./perf annotate --data-type --insn-stat

perf annotate logs:
==================

Annotate Instruction stats
total 609, ok 446 (73.2%), bad 163 (26.8%)

Name/opcode:          Good   Bad
-----------------------------------------------------------
58                :    323    80
32                :     49    43
34                :     33    11
OP_31_XOP_LDX     :      8    20
40                :     23     0
OP_31_XOP_LWARX   :      5     1
OP_31_XOP_LWZX    :      2     3
OP_31_XOP_LDARX   :      3     0
33                :      0     2
OP_31_XOP_LBZX    :      0     1
OP_31_XOP_LWAX    :      0     1
OP_31_XOP_LHZX    :      0     1

perf report logs:
=================

Total Lost Samples: 0

Samples: 1K of event 'mem-loads'
Event count (approx.): 937238

Overhead  Data Type                  Data Type Offset
........  .........                  ................

  48.60%  (unknown)                  (unknown) +0 (no field)
  12.85%  long unsigned int          long unsigned int +0 (current_stack_pointer)
   4.68%  struct paca_struct         struct paca_struct +2312 (__current)
   4.57%  struct paca_struct         struct paca_struct +2354 (irq_soft_mask)
   2.69%  struct paca_struct         struct paca_struct +2808 (canary)
   2.68%  struct paca_struct         struct paca_struct +8 (paca_index)
   2.24%  struct paca_struct         struct paca_struct +48 (data_offset)
   1.41%  struct vm_fault            struct vm_fault +0 (vma)
   1.29%  struct task_struct         struct task_struct +276 (flags)
   1.03%  struct pt_regs             struct pt_regs +264 (user_regs.msr)
   0.90%  struct security_hook_list  struct security_hook_list +0 (list.next)
   0.76%  struct irq_desc            struct irq_desc +304 (irq_data.chip)
   0.76%  struct rq                  struct rq +2856 (cpu)

Thanks
Athira Rajeev

Changelog:
From v2->v3:
- Addressed review comments from Christophe and Namhyung for V2
- Changed the approach in powerpc to parse a sample for data
type profiling as:
  - Read directly from the DSO using dso__data_read_offset
  - If that fails for any reason, fall back to using libcapstone
  - If libcapstone is not supported, use objdump
- Include instructions with opcode as 31 and correctly categorize
them as memory or arithmetic instructions.
- Include more instructions for instruction tracking in powerpc

From v1->v2:
- Addressed suggestion from Christophe Leroy and Segher Boessenkool
to use the binary code (raw insn) to fetch opcode, register and
offset fields.
- Added support for instruction tracking in powerpc
- Find the register defined variables (r13 and r1, which point to
local_paca and current_stack_pointer in powerpc)

Athira Rajeev (14):
tools/perf: Move the data structures related to register type to
header file
tools/perf: Add "update_insn_state" callback function to handle arch
specific instruction tracking
tools/perf: Add support to capture and parse raw instruction in
powerpc using dso__data_read_offset utility
tools/perf: Use sort keys to determine whether to pick objdump to
disassemble
tools/perf: Add disasm_line__parse to parse raw instruction for
powerpc
tools/perf: Update parameters for reg extract functions to use raw
instruction on powerpc
tools/perf: Add support to identify memory instructions of opcode 31
in powerpc
tools/perf: Add some of the arithmetic instructions to support
instruction tracking in powerpc
tools/perf: Add more instructions for instruction tracking
tools/perf: Update instruction tracking for powerpc
tools/perf: Add support to use libcapstone in powerpc
tools/perf: Add support to find global register variables using
find_data_type_global_reg
tools/perf: Add support for global_die to capture name of variable in
case of register defined variable
tools/perf: Set instruction name to be used with insn-stat when using
raw instruction

tools/include/linux/string.h | 2 +
tools/lib/string.c | 13 +
.../perf/arch/powerpc/annotate/instructions.c | 260 +++++++++
tools/perf/arch/powerpc/util/dwarf-regs.c | 53 ++
tools/perf/arch/x86/annotate/instructions.c | 383 +++++++++++++
tools/perf/builtin-annotate.c | 4 +-
tools/perf/util/annotate-data.c | 519 +++---------------
tools/perf/util/annotate-data.h | 78 +++
tools/perf/util/annotate.c | 35 +-
tools/perf/util/annotate.h | 1 +
tools/perf/util/disasm.c | 442 ++++++++++++++-
tools/perf/util/disasm.h | 18 +-
tools/perf/util/dwarf-aux.c | 1 +
tools/perf/util/dwarf-aux.h | 1 +
tools/perf/util/include/dwarf-regs.h | 4 +
tools/perf/util/sort.c | 7 +-
16 files changed, 1364 insertions(+), 457 deletions(-)

--
2.43.0