Re: [V4 00/16] Add data type profiling support for powerpc

From: Athira Rajeev
Date: Thu Jun 20 2024 - 11:31:55 EST




> On 14 Jun 2024, at 10:56 PM, Athira Rajeev <atrajeev@xxxxxxxxxxxxxxxxxx> wrote:
>
> The patchset from Namhyung added support for data type profiling
> in the perf tool. This enabled associating PMU samples with the data
> types they refer to, using DWARF debug information. With upstream
> perf, it is currently possible to run perf report or perf annotate to
> view the data type information on x86.
>
> The initial patchset posted here had the changes needed to enable data type
> profiling support for powerpc.
>
> https://lore.kernel.org/all/6e09dc28-4a2e-49d8-a2b5-ffb3396a9952@xxxxxxxxxx/T/
>
> Main changes were:
> 1. powerpc instruction mnemonic table to associate load/store
> instructions with move_ops, which is used to identify whether an
> instruction is a memory access.
> 2. To get the register number and access offset from the given
> instruction, the code uses fields from "struct arch" -> objdump.
> Added an entry for powerpc here.
> 3. A get_arch_regnum to return register number from the
> register name string.
>
> But the approach used in the initial patchset relied on parsing the
> disassembled code, which is what the current perf tool implementation does.
>
> Example: lwz r10,0(r9)
>
> This line "lwz r10,0(r9)" is parsed to extract the instruction name,
> register names and offset. Also to find whether there is a memory
> reference in the operands, "memory_ref_char" field of objdump is used.
> For x86, "(" is used as memory_ref_char to tackle instructions of the
> form "mov (%rax), %rcx".
>
> In case of powerpc, instructions using "(" are not the only memory
> instructions. For example, the above instruction can also be of extended
> form (X form) "lwzx r10,0,r19". In order to easily identify the instruction
> category and extract the source/target registers, the second patchset added
> support to use the raw instruction. With the raw instruction, macros are
> added to extract opcode and register fields.
> Link to second patchset:
> https://lore.kernel.org/all/20240506121906.76639-1-atrajeev@xxxxxxxxxxxxxxxxxx/
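To make the limitation concrete, here is a minimal sketch of a purely text-based check. The function name operands_have_mem_ref is hypothetical and only mimics the idea behind "memory_ref_char"; it is not the perf implementation.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical text-based test: an operand string is treated as a memory
 * reference only if it contains the given reference character.
 */
static bool operands_have_mem_ref(const char *operands, char ref_char)
{
	return strchr(operands, ref_char) != NULL;
}
```

For the operands of "lwz r10,0(r9)" the check with "(" succeeds, but for the operands of "lwzx r10,0,r19" it fails even though that is also a load, which is why the text alone is not enough on powerpc.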
>
> Example representation using --show-raw-insn in objdump gives result:
>
> 38 01 81 e8 ld r4,312(r1)
>
> Here "38 01 81 e8" is the raw instruction representation. In powerpc,
> this translates to instruction form: "ld RT,DS(RA)" and binary code
> as:
> ________________________________________
> |  58  |  RT  |  RA  |      DS      |  |
> ----------------------------------------
> 0      6      11     16             30 31
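The field layout above can be checked with a small decoder. This is only a sketch: the function names are hypothetical (the patchset uses its own macros), and it assumes the four bytes shown by objdump are in little-endian memory order.

```c
#include <stdint.h>

/* Assemble the 32-bit instruction word from little-endian bytes. */
static uint32_t le_bytes_to_insn(const uint8_t b[4])
{
	return (uint32_t)b[0] | ((uint32_t)b[1] << 8) |
	       ((uint32_t)b[2] << 16) | ((uint32_t)b[3] << 24);
}

/* IBM bit 0 is the most significant bit, so bits 0-5 are the top 6 bits. */
static unsigned int insn_opcode(uint32_t insn) { return insn >> 26; }
static unsigned int insn_rt(uint32_t insn) { return (insn >> 21) & 0x1f; }
static unsigned int insn_ra(uint32_t insn) { return (insn >> 16) & 0x1f; }

/*
 * DS occupies bits 16-29; the byte offset is DS << 2, i.e. the low 16 bits
 * with the two lowest bits cleared, sign-extended from 16 bits.
 */
static int insn_ds_offset(uint32_t insn)
{
	int off = (int)(insn & 0xfffc);

	return off >= 0x8000 ? off - 0x10000 : off;
}
```

Feeding the bytes "38 01 81 e8" through this sketch gives the word 0xe8810138, opcode 58, RT = 4, RA = 1 and offset 312, which matches "ld r4,312(r1)".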
>
> The second patchset used "objdump" again to read the raw instruction.
> But since there is no need to disassemble and the binary code can be read
> directly from the DSO, the third patchset (ie this patchset) uses the below
> approach. The approach preferred in powerpc to parse a sample for data
> type profiling in the V3 patchset is:
> - Read directly from DSO using dso__data_read_offset
> - If that fails for any case, fallback to using libcapstone
> - If libcapstone is not supported, approach will use objdump
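The fallback order above can be sketched as a simple chain of backends tried in turn. Everything here (struct disasm_backend, the stub functions, the return convention) is hypothetical and only models the control flow; it is not how disasm.c is structured.

```c
#include <stddef.h>

/* Hypothetical backend: returns 0 on success, non-zero on failure. */
typedef int (*disasm_fn)(void *ctx);

struct disasm_backend {
	const char *name;
	disasm_fn fn;	/* NULL models a backend that is not built in */
};

/* Stubs standing in for real backends in this sketch. */
static int stub_fail(void *ctx) { (void)ctx; return -1; }
static int stub_ok(void *ctx) { (void)ctx; return 0; }

/*
 * Try each backend in order and return the name of the first one that
 * succeeds, or NULL if none did.
 */
static const char *disasm_with_fallback(const struct disasm_backend *b,
					size_t n, void *ctx)
{
	for (size_t i = 0; i < n; i++) {
		if (b[i].fn && b[i].fn(ctx) == 0)
			return b[i].name;
	}
	return NULL;
}
```

With a chain of { raw read (failing), libcapstone (not built in), objdump (working) }, the sketch ends up selecting objdump, which is the last resort described above.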
>
> Patchset adds support to pick the opcode and reg fields from this
> raw/binary instruction code. This approach came in from review comment
> by Segher Boessenkool and Christophe for the initial patchset.
>
> Apart from that, instruction tracking is enabled for powerpc and
> a support function is added to find variables defined as registers.
> For example, in powerpc, the below two registers are
> defined to represent variables:
> 1. r13: represents local_paca
> register struct paca_struct *local_paca asm("r13");
>
> 2. r1: represents stack_pointer
> register void *__stack_pointer asm("r1");
>
> These are handled in this patchset.
>
> - Patch 1 is to move register state type structures to a header file
> so that they can be referred to from other arch specific files
> - Patch 2 is to make instruction tracking a callback in "struct arch"
> so that it can be implemented by other archs easily and defined in arch
> specific files
> - Patch 3 adds support to capture and parse raw instruction in powerpc
> using dso__data_read_offset utility
> - Patch 4 adds logic to support using objdump when doing default "perf
> report" or "perf annotate" since that needs the disassembled instruction.
> - Patch 5 adds disasm_line__parse to parse raw instruction for powerpc
> - Patch 6 updates parameters for reg extract functions to use raw
> instruction on powerpc
> - Patch 7 adds support to identify memory instructions of opcode 31 in
> powerpc
> - Patch 8 adds more instructions to support instruction tracking in powerpc
> - Patch 9 and 10 handle instruction tracking for powerpc.
> - Patch 11, 12 and 13 add support to use libcapstone in powerpc
> - Patch 14 and patch 15 handle support to find global register variables
> - Patch 16 handles insn-stat option for perf annotate
>
> Note:
> - There are remaining unknowns (25%) as seen in annotate Instruction stats
> below.
> - This patchset is not tested on powerpc32. As the next step of
> enhancements, along with handling the remaining unknowns, the plan is to
> cover powerpc32 changes based on how testing goes.
>
> With the current patchset:
>
> ./perf record -a -e mem-loads sleep 1
> ./perf report -s type,typeoff --hierarchy --group --stdio
> ./perf annotate --data-type --insn-stat
>
> perf annotate logs:
> ==================
>
> Annotate Instruction stats
> total 609, ok 446 (73.2%), bad 163 (26.8%)
>
> Name/opcode:                  Good     Bad
> -----------------------------------------------------------
> 58                      :      323      80
> 32                      :       49      43
> 34                      :       33      11
> OP_31_XOP_LDX           :        8      20
> 40                      :       23       0
> OP_31_XOP_LWARX         :        5       1
> OP_31_XOP_LWZX          :        2       3
> OP_31_XOP_LDARX         :        3       0
> 33                      :        0       2
> OP_31_XOP_LBZX          :        0       1
> OP_31_XOP_LWAX          :        0       1
> OP_31_XOP_LHZX          :        0       1
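The OP_31_XOP_* entries above are X-form loads under primary opcode 31, which are told apart by their extended opcode. A minimal sketch of that classification follows; the enum and function names are hypothetical, the extended-opcode values are the ones the Power ISA assigns to these mnemonics, and the list is limited to the loads that appear in the stats rather than being a complete table.

```c
#include <stdbool.h>
#include <stdint.h>

/* Extended opcodes (X-form, bits 21-30) for the loads seen in the stats. */
enum {
	XO_LWARX = 20,
	XO_LDX   = 21,
	XO_LWZX  = 23,
	XO_LDARX = 84,
	XO_LBZX  = 87,
	XO_LHZX  = 279,
	XO_LWAX  = 341,
};

/* Bits 21-30 in IBM numbering: drop bit 31, keep the next 10 bits. */
static unsigned int x_form_xo(uint32_t insn)
{
	return (insn >> 1) & 0x3ff;
}

/* True only for opcode 31 instructions whose XO is one of the loads above. */
static bool is_op31_load(uint32_t insn)
{
	if ((insn >> 26) != 31)
		return false;

	switch (x_form_xo(insn)) {
	case XO_LWARX: case XO_LDX: case XO_LWZX: case XO_LDARX:
	case XO_LBZX: case XO_LHZX: case XO_LWAX:
		return true;
	default:
		return false;
	}
}
```

For instance, "lwzx r10,0,r19" encodes as 0x7d40982e and is classified as a load, while "add r3,r4,r5" (0x7c642a14) shares primary opcode 31 but is not.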
>
> perf report logs:
> =================
>
> Total Lost Samples: 0
>
> Samples: 1K of event 'mem-loads'
> Event count (approx.): 937238
>
> Overhead  Data Type                  Data Type Offset
> ........  .........                  ................
>
>   48.60%  (unknown)                  (unknown) +0 (no field)
>   12.85%  long unsigned int          long unsigned int +0 (current_stack_pointer)
>    4.68%  struct paca_struct         struct paca_struct +2312 (__current)
>    4.57%  struct paca_struct         struct paca_struct +2354 (irq_soft_mask)
>    2.69%  struct paca_struct         struct paca_struct +2808 (canary)
>    2.68%  struct paca_struct         struct paca_struct +8 (paca_index)
>    2.24%  struct paca_struct         struct paca_struct +48 (data_offset)
>    1.41%  struct vm_fault            struct vm_fault +0 (vma)
>    1.29%  struct task_struct         struct task_struct +276 (flags)
>    1.03%  struct pt_regs             struct pt_regs +264 (user_regs.msr)
>    0.90%  struct security_hook_list  struct security_hook_list +0 (list.next)
>    0.76%  struct irq_desc            struct irq_desc +304 (irq_data.chip)
>    0.76%  struct rq                  struct rq +2856 (cpu)
>
> Thanks
> Athira Rajeev

Hi All,

Requesting review comments for this patchset.

Thanks
Athira
>
> Changelog:
> From v3->v4:
> - Addressed review comments from Ian by using capstone_init from
> "util/print_insn.c" instead of "open_capstone_handle".
> - Addressed review comment from Namhyung by moving "opcode"
> field from "struct ins" to "struct disasm_line"
>
> From v2->v3:
> - Addressed review comments from Christophe and Namhyung for V2
> - Changed the approach in powerpc to parse sample for data
> type profiling as:
> Read directly from DSO using dso__data_read_offset
> If that fails for any case, fallback to using libcapstone
> If libcapstone is not supported, approach will use objdump
> - Include instructions with opcode as 31 and correctly categorize
> them as memory or arithmetic instructions.
> - Include more instructions for instruction tracking in powerpc
>
> From v1->v2:
> - Addressed suggestion from Christophe Leroy and Segher Boessenkool
> to use the binary code (raw insn) to fetch opcode, register and
> offset fields.
> - Added support for instruction tracking in powerpc
> - Find the register defined variables (r13 and r1, which point to
> local_paca and current_stack_pointer in powerpc)
>
> Athira Rajeev (16):
> tools/perf: Move the data structures related to register type to
> header file
> tools/perf: Add "update_insn_state" callback function to handle arch
> specific instruction tracking
> tools/perf: Add support to capture and parse raw instruction in
> powerpc using dso__data_read_offset utility
> tools/perf: Use sort keys to determine whether to pick objdump to
> disassemble
> tools/perf: Add disasm_line__parse to parse raw instruction for
> powerpc
> tools/perf: Update parameters for reg extract functions to use raw
> instruction on powerpc
> tools/perf: Add support to identify memory instructions of opcode 31
> in powerpc
> tools/perf: Add some of the arithmetic instructions to support
> instruction tracking in powerpc
> tools/perf: Add more instructions for instruction tracking
> tools/perf: Update instruction tracking for powerpc
> tools/perf: Make capstone_init non-static so that it can be used
> during symbol disassemble
> tools/perf: Use capstone_init and remove open_capstone_handle from
> disasm.c
> tools/perf: Add support to use libcapstone in powerpc
> tools/perf: Add support to find global register variables using
> find_data_type_global_reg
> tools/perf: Add support for global_die to capture name of variable in
> case of register defined variable
> tools/perf: Set instruction name to be used with insn-stat when using
> raw instruction
>
> tools/include/linux/string.h | 2 +
> tools/lib/string.c | 13 +
> tools/perf/arch/arm64/annotate/instructions.c | 3 +-
> .../arch/loongarch/annotate/instructions.c | 6 +-
> .../perf/arch/powerpc/annotate/instructions.c | 260 +++++++++
> tools/perf/arch/powerpc/util/dwarf-regs.c | 53 ++
> tools/perf/arch/s390/annotate/instructions.c | 5 +-
> tools/perf/arch/x86/annotate/instructions.c | 383 +++++++++++++
> tools/perf/builtin-annotate.c | 4 +-
> tools/perf/util/annotate-data.c | 519 +++---------------
> tools/perf/util/annotate-data.h | 78 +++
> tools/perf/util/annotate.c | 35 +-
> tools/perf/util/annotate.h | 6 +-
> tools/perf/util/disasm.c | 475 ++++++++++++++--
> tools/perf/util/disasm.h | 13 +-
> tools/perf/util/dwarf-aux.c | 1 +
> tools/perf/util/dwarf-aux.h | 1 +
> tools/perf/util/include/dwarf-regs.h | 4 +
> tools/perf/util/print_insn.c | 15 +-
> tools/perf/util/print_insn.h | 5 +
> tools/perf/util/sort.c | 7 +-
> 21 files changed, 1386 insertions(+), 502 deletions(-)
>
> --
> 2.43.0
>