Re: [PATCH V6 0/3] riscv: Add perf callchain support

From: Greentime Hu
Date: Wed Sep 04 2019 - 03:26:04 EST


On Thu, Aug 29, 2019 at 2:57 PM, Mao Han <han_mao@xxxxxxxxx> wrote:
>
> This patch set adds perf callchain (FP/DWARF) support for RISC-V.
> It is derived from the csky callchain support, with some
> slight modifications. The patch set is based on Linux 5.3-rc6.
>
> Changes since v5:
> - use walk_stackframe from stacktrace.c to handle
>   kernel callchain unwinding (fix invalid mem access)
>
> Changes since v4:
> - Add missing PERF_HAVE_ARCH_REGS_QUERY_REGISTER_OFFSET
> verified with extra CFLAGS(-Wall -Werror)
>
> Changes since v3:
> - Add more strict check for unwind_frame_kernel
> - update for kernel 5.3
>
> Changes since v2:
> - fix inconsistent comment
> - force to build kernel with -fno-omit-frame-pointer if perf
>   event is enabled
>
> Changes since v1:
> - simplify implementation and code convention
>
> Cc: Paul Walmsley <paul.walmsley@xxxxxxxxxx>
> Cc: Greentime Hu <green.hu@xxxxxxxxx>
> Cc: Palmer Dabbelt <palmer@xxxxxxxxxx>
> Cc: linux-riscv <linux-riscv@xxxxxxxxxxxxxxxxxxx>
> Cc: Christoph Hellwig <hch@xxxxxx>
> Cc: Guo Ren <guoren@xxxxxxxxxx>
>
> Mao Han (3):
> riscv: Add perf callchain support
> riscv: Add support for perf registers sampling
> riscv: Add support for libdw
>
>  arch/riscv/Kconfig                            |  2 +
>  arch/riscv/Makefile                           |  3 +
>  arch/riscv/include/uapi/asm/perf_regs.h       | 42 ++++++++++++
>  arch/riscv/kernel/Makefile                    |  4 +-
>  arch/riscv/kernel/perf_callchain.c            | 95 ++++++++++++++++++++++++++
>  arch/riscv/kernel/perf_regs.c                 | 44 ++++++++++++
>  arch/riscv/kernel/stacktrace.c                |  2 +-
>  tools/arch/riscv/include/uapi/asm/perf_regs.h | 42 ++++++++++++
>  tools/perf/Makefile.config                    |  6 +-
>  tools/perf/arch/riscv/Build                   |  1 +
>  tools/perf/arch/riscv/Makefile                |  4 ++
>  tools/perf/arch/riscv/include/perf_regs.h     | 96 +++++++++++++++++++++++++++
>  tools/perf/arch/riscv/util/Build              |  2 +
>  tools/perf/arch/riscv/util/dwarf-regs.c       | 72 ++++++++++++++++++++
>  tools/perf/arch/riscv/util/unwind-libdw.c     | 57 ++++++++++++++++
> 15 files changed, 469 insertions(+), 3 deletions(-)
> create mode 100644 arch/riscv/include/uapi/asm/perf_regs.h
> create mode 100644 arch/riscv/kernel/perf_callchain.c
> create mode 100644 arch/riscv/kernel/perf_regs.c
> create mode 100644 tools/arch/riscv/include/uapi/asm/perf_regs.h
> create mode 100644 tools/perf/arch/riscv/Build
> create mode 100644 tools/perf/arch/riscv/Makefile
> create mode 100644 tools/perf/arch/riscv/include/perf_regs.h
> create mode 100644 tools/perf/arch/riscv/util/Build
> create mode 100644 tools/perf/arch/riscv/util/dwarf-regs.c
> create mode 100644 tools/perf/arch/riscv/util/unwind-libdw.c
>

Tested-by: Greentime Hu <greentime.hu@xxxxxxxxxx>

I tested this patch set on top of v5.3-rc6. Both DWARF and frame-pointer
(fp) backtraces work on the Unleashed board.

# perf record -e cpu-clock --call-graph dwarf ls -l /
total 4
drwxr-xr-x 2 root root 0 Aug 26 2019 bin
drwxr-xr-x 5 root root 12720 Jan 1 00:00 dev
drwxr-xr-x 5 root root 0 Jan 1 00:00 etc
-rwxr-xr-x 1 root root 178 Aug 26 2019 init
drwxr-xr-x 2 root root 0 Aug 26 2019 lib
lrwxrwxrwx 1 root root 3 Aug 19 2019 lib64 -> lib
lrwxrwxrwx 1 root root 11 Aug 19 2019 linuxrc -> bin/busybox
drwxr-xr-x 2 root root 0 Aug 19 2019 media
drwxr-xr-x 2 root root 0 Aug 19 2019 mnt
drwxr-xr-x 2 root root 0 Aug 19 2019 opt
dr-xr-xr-x 66 root root 0 Jan 1 00:00 proc
drwx------ 3 root root 0 Jan 1 00:01 root
drwxr-xr-x 3 root root 140 Jan 1 00:00 run
drwxr-xr-x 2 root root 0 Aug 19 2019 sbin
dr-xr-xr-x 11 root root 0 Jan 1 00:00 sys
drwxrwxrwt 2 root root 60 Jan 1 00:00 tmp
drwxr-xr-x 6 root root 0 Aug 26 2019 usr
drwxr-xr-x 4 root root 0 Aug 26 2019 var
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.175 MB perf.data (21 samples) ]

# perf record -e cpu-clock --call-graph fp ls -l /
total 4
drwxr-xr-x 2 root root 0 Aug 26 2019 bin
drwxr-xr-x 5 root root 12720 Jan 1 00:00 dev
drwxr-xr-x 5 root root 0 Jan 1 00:00 etc
-rwxr-xr-x 1 root root 178 Aug 26 2019 init
drwxr-xr-x 2 root root 0 Aug 26 2019 lib
lrwxrwxrwx 1 root root 3 Aug 19 2019 lib64 -> lib
lrwxrwxrwx 1 root root 11 Aug 19 2019 linuxrc -> bin/busybox
drwxr-xr-x 2 root root 0 Aug 19 2019 media
drwxr-xr-x 2 root root 0 Aug 19 2019 mnt
drwxr-xr-x 2 root root 0 Aug 19 2019 opt
dr-xr-xr-x 66 root root 0 Jan 1 00:00 proc
drwx------ 3 root root 0 Jan 1 00:00 root
drwxr-xr-x 3 root root 140 Jan 1 00:00 run
drwxr-xr-x 2 root root 0 Aug 19 2019 sbin
dr-xr-xr-x 11 root root 0 Jan 1 00:00 sys
drwxrwxrwt 2 root root 60 Jan 1 00:00 tmp
drwxr-xr-x 6 root root 0 Aug 26 2019 usr
drwxr-xr-x 4 root root 0 Aug 26 2019 var
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.004 MB perf.data (19 samples) ]
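
For readers new to the fp mode, here is a toy model of frame-pointer
unwinding. It is only a sketch and not the code from this patch set. It
assumes an RV64 target built with -fno-omit-frame-pointer, where each frame
record holds the caller's fp at fp-16 and the return address at fp-8. The
function name, the dict standing in for stack memory, and all addresses are
made up for illustration.

```python
WORD = 8  # bytes per register; assumes a 64-bit (RV64) target


def walk_frame_pointers(memory, fp, max_depth=32):
    """Collect return addresses by following the chain of frame records.

    memory is a dict mapping an address to the word stored there. It is a
    hypothetical stand-in for stack memory; a real unwinder would have to
    read user or kernel memory safely instead.
    """
    chain = []
    while fp and fp % WORD == 0 and len(chain) < max_depth:
        saved_fp = memory.get(fp - 2 * WORD)  # caller's frame pointer
        saved_ra = memory.get(fp - WORD)      # return address into the caller
        if saved_fp is None or saved_ra is None:
            break  # frame record is not readable, stop unwinding
        if saved_ra == 0:
            break  # no valid return address
        chain.append(saved_ra)
        # The stack grows down, so a caller's frame must sit at a higher
        # address. Anything else (including 0) ends the walk.
        if saved_fp <= fp:
            break
        fp = saved_fp
    return chain


# Synthetic stack for three nested calls: main -> foo -> bar.
STACK = {
    0x0FF0: 0x1020, 0x0FF8: 0x10200,  # bar's record: foo's fp, return into foo
    0x1010: 0x1040, 0x1018: 0x10100,  # foo's record: main's fp, return into main
    0x1030: 0x0,    0x1038: 0x10000,  # main's record: end of chain
}

if __name__ == "__main__":
    print([hex(ra) for ra in walk_frame_pointers(STACK, 0x1000)])
```

The alignment and "caller is higher" checks are there because a sampled fp
can be garbage when the code was built without frame pointers. DWARF mode
avoids that dependency by copying registers and a chunk of stack into each
sample, which is consistent with the larger perf.data in the dwarf run above.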

# perf test
1: vmlinux symtab matches kallsyms : Skip
2: Detect openat syscall event : FAILED!
3: Detect openat syscall event on all cpus : FAILED!
4: Read samples using the mmap interface : FAILED!
5: Test data source output : Ok
6: Parse event definition strings : FAILED!
7: Simple expression parser : Ok
8: PERF_RECORD_* events & perf_sample fields : FAILED!
9: Parse perf pmu format : Ok
10: DSO data read : Ok
11: DSO data cache : Ok
12: DSO data reopen : Ok
13: Roundtrip evsel->name : Ok
14: Parse sched tracepoints fields : Ok
15: syscalls:sys_enter_openat event fields : FAILED!
16: Setup struct perf_event_attr : Skip
17: Match and link multiple hists : Ok
18: 'import perf' in python : FAILED!
19: Breakpoint overflow signal handler : FAILED!
20: Breakpoint overflow sampling : FAILED!
21: Breakpoint accounting : Skip
22: Watchpoint :
22.1: Read Only Watchpoint : FAILED!
22.2: Write Only Watchpoint : FAILED!
22.3: Read / Write Watchpoint : FAILED!
22.4: Modify Watchpoint : FAILED!
23: Number of exit events of a simple workload : Ok
24: Software clock events period values : Ok
25: Object code reading : Ok
26: Sample parsing : Ok
27: Use a dummy software event to keep tracking: Ok
28: Parse with no sample_id_all bit set : Ok
29: Filter hist entries : Ok
30: Lookup mmap thread : Ok
31: Share thread mg : Ok
32: Sort output of hist entries : Ok
33: Cumulate child hist entries : Ok
34: Track with sched_switch : FAILED!
35: Filter fds with revents mask in a fdarray : Ok
36: Add fd to a fdarray, making it autogrow : Ok
37: kmod_path__parse : Ok
38: Thread map : Ok
39: LLVM search and compile :
39.1: Basic BPF llvm compile : Skip
39.2: kbuild searching : Skip
39.3: Compile source for BPF prologue generation: Skip
39.4: Compile source for BPF relocation : Skip
40: Session topology : FAILED!
41: BPF filter :
41.1: Basic BPF filtering : Skip
41.2: BPF pinning : Skip
41.3: BPF prologue generation : Skip
41.4: BPF relocation checker : Skip
42: Synthesize thread map : Ok
43: Remove thread map : Ok
44: Synthesize cpu map : Ok
45: Synthesize stat config : Ok
46: Synthesize stat : Ok
47: Synthesize stat round : Ok
48: Synthesize attr update : Ok
49: Event times : Ok
50: Read backward ring buffer : Skip
51: Print cpu map : Ok
52: Probe SDT events : Skip
53: is_printable_array : Ok
54: Print bitmap : Ok
55: perf hooks : Ok
56: builtin clang support : Skip (not compiled in)
57: unit_number__scnprintf : Ok
58: mem2node : Ok
59: time utils : Ok
60: map_groups__merge_in : Ok
#
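
To compare "perf test" runs between kernels or boards, the output can be
tallied with a small script. This is a minimal sketch that assumes the
"N: name : status" layout shown above. The function name and the regular
expression are my own and not part of perf.

```python
import re
from collections import Counter

# Matches result lines such as "5: Test data source output : Ok" or
# "22.1: Read Only Watchpoint : FAILED!". Group headers like
# "22: Watchpoint :" have no status and are skipped.
RESULT_RE = re.compile(r"^\s*(\d+(?:\.\d+)?):\s*(.*?)\s*:\s*(Ok|FAILED!|Skip)")


def tally(log):
    """Return (Counter of statuses, list of failed test names)."""
    counts = Counter()
    failed = []
    for line in log.splitlines():
        m = RESULT_RE.match(line)
        if not m:
            continue
        num, name, status = m.groups()
        counts[status] += 1
        if status == "FAILED!":
            failed.append(f"{num} {name}")
    return counts, failed


# A few lines copied from the log above.
SAMPLE = """\
5: Test data source output : Ok
6: Parse event definition strings : FAILED!
22: Watchpoint :
22.1: Read Only Watchpoint : FAILED!
56: builtin clang support : Skip (not compiled in)
"""

if __name__ == "__main__":
    counts, failed = tally(SAMPLE)
    print(dict(counts))
    for entry in failed:
        print("failed:", entry)
```

Feeding it the saved output of two runs and diffing the failed lists gives a
quick view of which tests changed state.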