Re: perf: Question about machine__create_extra_kernel_maps and trampoline symbols
From: Ian Rogers
Date: Thu Feb 13 2025 - 12:02:32 EST
On Thu, Feb 13, 2025 at 5:10 AM Krzysztof Łopatowski
<krzysztof.m.lopatowski@xxxxxxxxx> wrote:
>
> Hi,
>
> I'm investigating performance issues with perf's kallsyms parsing. Running
> `perf record -g perf trace -a --max-events 1` on x86_64 Ubuntu 24.10 in a VM
> (perf version 6.11) showed that about 61% of the time was spent in
> 'kallsyms__parse'. Total execution time was 370 ms. With the latest version
> from tmp.perf-tools-next it's 530 ms total and 38% in 'kallsyms__parse',
> because the old version doesn't have BPF skeletons enabled.
> During regular execution this function is called three times:
>
> 1. In machine__get_running_kernel_start - searching for _text
> 2. In machine__get_running_kernel_start - searching for _edata
> 3. In machine__create_extra_kernel_maps - which is the focus of my question
>
> Regarding the third call (implemented in tools/perf/arch/x86/util/machine.c),
> I notice it searches for:
> - _entry_trampoline
> - __entry_SYSCALL_64_trampoline
>
> I'm puzzled by the dynamic allocation in add_extra_kernel_map, which seems to
> expect multiple __entry_SYSCALL_64_trampoline symbols. This functionality was
> introduced in:
> https://lore.kernel.org/all/1526986485-6562-1-git-send-email-adrian.hunter@xxxxxxxxx/
>
> I've attempted to trigger the trampoline logic in two ways:
>
> 1. Using the example provided (uname_x_n.c), which only recorded these symbols:
> - entry_SYSCALL_64_after_hwframe
> - entry_SYSCALL_64
> - entry_SYSCALL_64_safe_stack
>
> 2. Setting kprobes and kretprobes to try to make the kernel create these special
> trampoline symbols, but this approach also didn't work.
>
> Questions for the perf developer community:
> 1. Is there a reliable way to trigger this trampoline logic in perf? I'd like to
> create a perf test for this functionality.
> 2. If machine__create_extra_kernel_maps is obsolete (since it's
> x86_64-specific), could we remove it to reduce /proc/kallsyms parsing time
> by at least 50%?
>
> I'm working on a patch to simplify machine__create_kernel_maps to call
> kallsyms__parse only once. However, I would appreciate guidance from those more
> familiar with perf.
>
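
To make the "parse only once" idea above concrete, here is a minimal,
standalone sketch of a single pass over kallsyms-formatted text that
resolves several symbols at the same time. It is not perf code: the
struct wanted_sym type and the scan_kallsyms_once() helper are
hypothetical names made up for illustration, and the sketch assumes the
usual "<hex address> <type> <name>" line layout. It counts how often each
name appears, since the quoted message notes that add_extra_kernel_map
seems to expect more than one trampoline symbol.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper type, not part of perf: one symbol to resolve. */
struct wanted_sym {
	const char *name;        /* symbol name to look for */
	unsigned long long addr; /* address of the first match */
	int found;               /* how many lines carried this name */
};

/*
 * Read kallsyms-formatted text once and fill in every entry of 'wanted'.
 * Returns the number of distinct wanted names seen at least once.
 * Assumption: each line looks like "<hex addr> <type> <name>[\t[module]]".
 */
static int scan_kallsyms_once(FILE *fp, struct wanted_sym *wanted, size_t n)
{
	char line[512];
	int resolved = 0;

	assert(fp != NULL && wanted != NULL);

	while (fgets(line, sizeof(line), fp)) {
		unsigned long long addr;
		char type;
		char name[256];

		/* Skip lines that do not match the expected layout. */
		if (sscanf(line, "%llx %c %255s", &addr, &type, name) != 3)
			continue;

		for (size_t i = 0; i < n; i++) {
			if (strcmp(name, wanted[i].name) != 0)
				continue;
			/* Keep the first address, but count every occurrence. */
			if (wanted[i].found++ == 0) {
				wanted[i].addr = addr;
				resolved++;
			}
		}
	}
	return resolved;
}
```

A caller would fill an array with _text, _edata and the trampoline names,
open /proc/kallsyms with fopen(), and call the helper once instead of
parsing the file three times. Note that unprivileged reads of
/proc/kallsyms can report all addresses as zero, depending on
kptr_restrict.
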
> Side note: Could exposing the kernel's lookup_symbol_name function
> (from kernel/kallsyms.c) to userspace eliminate the need for reading
> /proc/kallsyms?

Thanks for caring and for your analysis! I think Adrian can best speak to
the possibility of performance wins. We do have a kallsyms parsing
benchmark:
https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/bench/kallsyms-parse.c?h=perf-tools-next
I'm curious whether the regression is also visible there.

Thanks,
Ian