Re: [RFC/PATCHSET 0/5] perf tools: Support deferred user callchains (v2)
From: Liang, Kan
Date: Wed Sep 18 2024 - 16:27:11 EST
On 2024-09-17 6:28 p.m., Namhyung Kim wrote:
> Hello,
>
> This is the perf tools counterpart to Josh's kernel change v2 [1], which
> adds deferred user callchains. The change is transparent, so users should
> not notice any difference with the deferred callchains.
>
> $ perf record -g sleep 1
>
> I added a --[no-]merge-callchains option to control the output of perf script.
> You can verify that it has the deferred callchains like this:
>
> $ perf script --no-merge-callchains
> perf 801 [000] 18.031793: 1 cycles:P:
> ffffffff91a14c36 __intel_pmu_enable_all.isra.0+0x56 ([kernel.kallsyms])
> ffffffff91d373e9 perf_ctx_enable+0x39 ([kernel.kallsyms])
> ffffffff91d36af7 event_function+0xd7 ([kernel.kallsyms])
> ffffffff91d34222 remote_function+0x42 ([kernel.kallsyms])
> ffffffff91c1ebe1 generic_exec_single+0x61 ([kernel.kallsyms])
> ffffffff91c1edac smp_call_function_single+0xec ([kernel.kallsyms])
> ffffffff91d37a9d event_function_call+0x10d ([kernel.kallsyms])
> ffffffff91d33557 perf_event_for_each_child+0x37 ([kernel.kallsyms])
> ffffffff91d47324 _perf_ioctl+0x204 ([kernel.kallsyms])
> ffffffff91d47c43 perf_ioctl+0x33 ([kernel.kallsyms])
> ffffffff91e2f216 __x64_sys_ioctl+0x96 ([kernel.kallsyms])
> ffffffff9265f1ae do_syscall_64+0x9e ([kernel.kallsyms])
> ffffffff92800130 entry_SYSCALL_64+0xb0 ([kernel.kallsyms])
>
> perf 801 [000] 18.031814: DEFERRED CALLCHAIN
> 7fb5fc22034b __GI___ioctl+0x3b (/usr/lib/x86_64-linux-gnu/libc.so.6)
>
> ...
>
> When the callchains are merged (the default), the output looks like this:
>
> $ perf script
> perf 801 [000] 18.031793: 1 cycles:P:
> ffffffff91a14c36 __intel_pmu_enable_all.isra.0+0x56 ([kernel.kallsyms])
> ffffffff91d373e9 perf_ctx_enable+0x39 ([kernel.kallsyms])
> ffffffff91d36af7 event_function+0xd7 ([kernel.kallsyms])
> ffffffff91d34222 remote_function+0x42 ([kernel.kallsyms])
> ffffffff91c1ebe1 generic_exec_single+0x61 ([kernel.kallsyms])
> ffffffff91c1edac smp_call_function_single+0xec ([kernel.kallsyms])
> ffffffff91d37a9d event_function_call+0x10d ([kernel.kallsyms])
> ffffffff91d33557 perf_event_for_each_child+0x37 ([kernel.kallsyms])
> ffffffff91d47324 _perf_ioctl+0x204 ([kernel.kallsyms])
> ffffffff91d47c43 perf_ioctl+0x33 ([kernel.kallsyms])
> ffffffff91e2f216 __x64_sys_ioctl+0x96 ([kernel.kallsyms])
> ffffffff9265f1ae do_syscall_64+0x9e ([kernel.kallsyms])
> ffffffff92800130 entry_SYSCALL_64+0xb0 ([kernel.kallsyms])
> 7fb5fc22034b __GI___ioctl+0x3b (/usr/lib/x86_64-linux-gnu/libc.so.6)
>
> ...
>
> Notice the last line: it has __GI___ioctl in the same callchain as the
> kernel frames. It should work with other tools like perf report.
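For readers following along, the merge described in the quoted text can be pictured as appending the deferred user frames to the kernel frames of the earlier sample from the same thread. The following is only a minimal sketch in Python, not the perf implementation: the `Record` type, the "sample"/"deferred" labels, and the rule of matching by thread id in time order are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Record:
    kind: str     # "sample" or "deferred" (hypothetical labels)
    tid: int      # thread id the record belongs to
    time: float   # timestamp in seconds
    frames: list  # symbol names, innermost frame first


def merge_callchains(records):
    """Append each deferred user callchain to the pending samples of its thread.

    Assumption: a deferred record completes every earlier, still-pending
    sample of the same thread. The real matching rule in perf may differ.
    """
    pending = {}  # tid -> samples still waiting for their user frames
    merged = []
    for rec in sorted(records, key=lambda r: r.time):
        if rec.kind == "sample":
            sample = Record("sample", rec.tid, rec.time, list(rec.frames))
            pending.setdefault(rec.tid, []).append(sample)
            merged.append(sample)
        elif rec.kind == "deferred":
            for sample in pending.pop(rec.tid, []):
                sample.frames.extend(rec.frames)
    return merged


# Abbreviated version of the example from the quoted output (tid 801).
records = [
    Record("sample", 801, 18.031793,
           ["__x64_sys_ioctl", "do_syscall_64", "entry_SYSCALL_64"]),
    Record("deferred", 801, 18.031814, ["__GI___ioctl"]),
]
out = merge_callchains(records)
```

With these inputs, `out` holds a single sample whose last frame is the user-space `__GI___ioctl`, mirroring the merged perf script output quoted above.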
The deferred callchains only seem to work with perf report -D when I test
on a non-hybrid machine.

$ perf record -e branches -g -c 3000000 ~/tchain_edit
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.397 MB perf.data ]
$ perf report -D | tail -n 17
Aggregated stats:
TOTAL events: 8235
MMAP events: 78 ( 0.9%)
COMM events: 2 ( 0.0%)
EXIT events: 1 ( 0.0%)
SAMPLE events: 4060 (49.3%)
MMAP2 events: 2 ( 0.0%)
KSYMBOL events: 12 ( 0.1%)
BPF_EVENT events: 12 ( 0.1%)
CALLCHAIN_DEFERRED events: 4060 (49.3%)
FINISHED_ROUND events: 3 ( 0.0%)
ID_INDEX events: 1 ( 0.0%)
THREAD_MAP events: 1 ( 0.0%)
CPU_MAP events: 1 ( 0.0%)
TIME_CONV events: 1 ( 0.0%)
FINISHED_INIT events: 1 ( 0.0%)
$ perf report
Error:
The perf.data data has no samples!
# To display the perf.data header info, please use
--header/--header-only options.
#
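One quick sanity check on the dump above is to compare the SAMPLE and CALLCHAIN_DEFERRED counts, which shows the records are in the file even though perf report finds no samples. A minimal sketch in Python, assuming the "NAME events: COUNT" layout shown in the aggregated stats; `parse_event_counts` is a hypothetical helper, not part of perf.

```python
import re


def parse_event_counts(text):
    """Return {event name: count} from 'perf report -D' aggregated stats."""
    counts = {}
    for line in text.splitlines():
        # Matches lines such as "SAMPLE events: 4060 (49.3%)".
        m = re.match(r"\s*([A-Z0-9_]+)\s+events:\s+(\d+)", line)
        if m:
            counts[m.group(1)] = int(m.group(2))
    return counts


# A few lines copied from the aggregated stats above.
stats = """\
TOTAL events: 8235
SAMPLE events: 4060 (49.3%)
MMAP2 events: 2 ( 0.0%)
CALLCHAIN_DEFERRED events: 4060 (49.3%)
"""
counts = parse_event_counts(stats)
same = counts["SAMPLE"] == counts["CALLCHAIN_DEFERRED"]
```

Here `same` is true: each of the 4060 samples has a deferred callchain record alongside it, so the data is recorded and the problem appears to be on the report side.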
On a hybrid machine, perf record errors out.
$ perf record -g true
[ perf record: Woken up 1 times to write data ]
0x58a8 [0x38]: failed to process type: 22 [Bad address]
[ perf record: Captured and wrote 0.022 MB perf.data ]
Thanks,
Kan
>
> The code is available at 'perf/defer-callchain-v2' branch in
> https://git.kernel.org/pub/scm/linux/kernel/git/namhyung/linux-perf.git
>
> Thanks,
> Namhyung
>
> [1] https://lore.kernel.org/lkml/cover.1726268190.git.jpoimboe@xxxxxxxxxx
>
>
> Namhyung Kim (5):
> perf tools: Sync UAPI perf_event.h header
> perf tools: Minimal DEFERRED_CALLCHAIN support
> perf record: Enable defer_callchain for user callchains
> perf script: Display PERF_RECORD_CALLCHAIN_DEFERRED
> perf tools: Merge deferred user callchains
>
> tools/include/uapi/linux/perf_event.h | 21 +++++-
> tools/lib/perf/include/perf/event.h | 7 ++
> tools/perf/Documentation/perf-script.txt | 5 ++
> tools/perf/builtin-script.c | 92 +++++++++++++++++++++++
> tools/perf/util/callchain.c | 24 ++++++
> tools/perf/util/callchain.h | 3 +
> tools/perf/util/event.c | 1 +
> tools/perf/util/evlist.c | 1 +
> tools/perf/util/evlist.h | 1 +
> tools/perf/util/evsel.c | 32 +++++++-
> tools/perf/util/evsel.h | 1 +
> tools/perf/util/machine.c | 1 +
> tools/perf/util/perf_event_attr_fprintf.c | 1 +
> tools/perf/util/sample.h | 3 +-
> tools/perf/util/session.c | 78 +++++++++++++++++++
> tools/perf/util/tool.c | 2 +
> tools/perf/util/tool.h | 4 +-
> 17 files changed, 273 insertions(+), 4 deletions(-)
>