Re: [PATCH V3 00/17] perf tools and x86 PTI entry trampolines
From: Arnaldo Carvalho de Melo
Date: Wed May 23 2018 - 14:42:39 EST
On Tue, May 22, 2018 at 01:54:28PM +0300, Adrian Hunter wrote:
> Original Cover email:
>
> Perf tools do not know about x86 PTI entry trampolines - see example
> below. These patches add a workaround, namely "perf tools: Workaround
> missing maps for x86 PTI entry trampolines", which has the limitation
> that it hard codes the addresses. Note that the workaround will work for
> old kernels and old perf.data files, but not for future kernels if the
> trampoline addresses are ever changed.
>
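
To make the "hard codes the addresses" limitation concrete, here is a minimal sketch of how a sample address could be matched against a fixed per-CPU trampoline layout. The constants are assumptions: they were chosen to be consistent with the unknown addresses in the reports in this thread (for example 0xfffffe000005e01b and 0xfffffe00000e201b), not taken from the patch itself. The one-page map size, the CPU bound, and all names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout constants, chosen to match the addresses seen in this thread. */
#define CPU_ENTRY_AREA_BASE	0xfffffe0000000000ULL	/* assumed base of per-CPU areas */
#define CPU_ENTRY_AREA_SIZE	0x2c000ULL		/* assumed size of one per-CPU area */
#define ENTRY_TRAMPOLINE_OFF	0x6000ULL		/* assumed trampoline offset in an area */
#define TRAMPOLINE_MAP_SIZE	0x1000ULL		/* hypothetical: one page */
#define MAX_CPUS		8192ULL			/* hypothetical upper bound */

struct trampoline_hit {
	unsigned int cpu;	/* index of the per-CPU area the address falls in */
	uint64_t offset;	/* offset from the start of that CPU's trampoline */
};

/* Return true and fill *hit if addr lies inside some CPU's trampoline page. */
static bool decode_trampoline_addr(uint64_t addr, struct trampoline_hit *hit)
{
	uint64_t rel, cpu, in_area;

	if (addr < CPU_ENTRY_AREA_BASE)
		return false;

	rel = addr - CPU_ENTRY_AREA_BASE;
	cpu = rel / CPU_ENTRY_AREA_SIZE;
	in_area = rel % CPU_ENTRY_AREA_SIZE;

	/* Reject addresses beyond the per-CPU areas, e.g. kernel text. */
	if (cpu >= MAX_CPUS)
		return false;

	if (in_area < ENTRY_TRAMPOLINE_OFF ||
	    in_area >= ENTRY_TRAMPOLINE_OFF + TRAMPOLINE_MAP_SIZE)
		return false;

	hit->cpu = (unsigned int)cpu;
	hit->offset = in_area - ENTRY_TRAMPOLINE_OFF;
	return true;
}
```

Under these assumptions, decode_trampoline_addr(0xfffffe000005e01bULL, &hit) places the sample in the third per-CPU area (index 2) at offset 0x1b. If a future kernel moved the trampolines, a fixed table like this would silently stop matching, which is the limitation described above.
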
> At present, perf tools uses /proc/kallsyms to construct a memory map for
> the kernel. Recording such a map in the perf.data file is necessary to
> deal with kernel relocation and KASLR.
>
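
As a rough illustration of the kallsyms side, the sketch below parses a single /proc/kallsyms line of the usual "address type name" form. It is not how perf tools implement this; the struct and function names are hypothetical, and the optional trailing module field is ignored.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical container for one parsed kallsyms entry. */
struct ksym {
	uint64_t addr;		/* symbol address (hex in the file) */
	char type;		/* symbol type letter, e.g. 'T' or 't' */
	char name[128];		/* symbol name, truncated to fit */
};

/* Parse "address type name"; return 0 on success, -1 on a malformed line. */
static int parse_kallsyms_line(const char *line, struct ksym *sym)
{
	int n = sscanf(line, "%" SCNx64 " %c %127s",
		       &sym->addr, &sym->type, sym->name);

	return n == 3 ? 0 : -1;
}
```

A tool could collect such entries, sort them by address, and derive map boundaries from them. With the kallsyms patches applied, a symbol named entry_SYSCALL_64_trampoline would presumably appear in that list alongside the regular kernel text symbols.
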
> While it is reasonable on its own terms to add symbols for the trampolines
> to /proc/kallsyms, the motivation here is to have perf tools use them to
> create memory maps in the same fashion as is done for the kernel text.
>
> So the first 2 patches add symbols to /proc/kallsyms for the trampolines:
>
> kallsyms: Simplify update_iter_mod()
> kallsyms, x86: Export addresses of syscall trampolines
>
> perf tools have the ability to use /proc/kcore (in conjunction with
> /proc/kallsyms) as the kernel image. So the next 2 patches add program
> headers for the trampolines to the kcore ELF:
>
> x86: Add entry trampolines to kcore
> x86: kcore: Give entry trampolines all the same offset in kcore
>
> It is worth noting that, with the kcore changes alone, perf tools require
> no changes to recognise the trampolines when using /proc/kcore.
>
> Similarly, if perf tools are used with a matching kallsyms only (by denying
> access to /proc/kcore or a vmlinux image), then the kallsyms patches are
> sufficient to recognise the trampolines with no changes needed to the
> tools.
>
> However, in the general case, when using vmlinux or dealing with
> relocations, perf tools needs memory maps for the trampolines. Because the
> kernel text map is constructed as a special case, using the same approach
> for the trampolines means treating them as a special case also, which
> requires a number of changes to perf tools, and the remaining patches deal
> with that.
>
>
> Example: make a program that does lots of small syscalls e.g.
>
> $ cat uname_x_n.c
>
> #include <sys/utsname.h>
> #include <stdlib.h>
>
> int main(int argc, char *argv[])
> {
>     long n = argc > 1 ? strtol(argv[1], NULL, 0) : 0;
>     struct utsname u;
>
>     while (n--)
>         uname(&u);
>
>     return 0;
> }
>
> and then:
>
> sudo perf record uname_x_n 100000
> sudo perf report --stdio
>
> Before the changes, there are unknown symbols:
>
> # Overhead Command Shared Object Symbol
> # ........ ......... ................ ..................................
> #
> 41.91% uname_x_n [kernel.vmlinux] [k] syscall_return_via_sysret
> 19.22% uname_x_n [kernel.vmlinux] [k] copy_user_enhanced_fast_string
> 18.70% uname_x_n [unknown] [k] 0xfffffe00000e201b
> 4.09% uname_x_n libc-2.19.so [.] __GI___uname
> 3.08% uname_x_n [kernel.vmlinux] [k] do_syscall_64
> 3.02% uname_x_n [unknown] [k] 0xfffffe00000e2025
> 2.32% uname_x_n [kernel.vmlinux] [k] down_read
> 2.27% uname_x_n ld-2.19.so [.] _dl_start
> 1.97% uname_x_n [unknown] [k] 0xfffffe00000e201e
> 1.25% uname_x_n [kernel.vmlinux] [k] up_read
> 1.02% uname_x_n [unknown] [k] 0xfffffe00000e200c
> 0.99% uname_x_n [kernel.vmlinux] [k] entry_SYSCALL_64
> 0.16% uname_x_n [kernel.vmlinux] [k] flush_signal_handlers
> 0.01% perf [kernel.vmlinux] [k] native_sched_clock
> 0.00% perf [kernel.vmlinux] [k] native_write_msr
>
> After the changes there are not:
>
> # Overhead Command Shared Object Symbol
> # ........ ......... ................ ..................................
> #
> 41.91% uname_x_n [kernel.vmlinux] [k] syscall_return_via_sysret
> 24.70% uname_x_n [kernel.vmlinux] [k] entry_SYSCALL_64_trampoline
> 19.22% uname_x_n [kernel.vmlinux] [k] copy_user_enhanced_fast_string
> 4.09% uname_x_n libc-2.19.so [.] __GI___uname
> 3.08% uname_x_n [kernel.vmlinux] [k] do_syscall_64
> 2.32% uname_x_n [kernel.vmlinux] [k] down_read
> 2.27% uname_x_n ld-2.19.so [.] _dl_start
> 1.25% uname_x_n [kernel.vmlinux] [k] up_read
> 0.99% uname_x_n [kernel.vmlinux] [k] entry_SYSCALL_64
> 0.16% uname_x_n [kernel.vmlinux] [k] flush_signal_handlers
> 0.01% perf [kernel.vmlinux] [k] native_sched_clock
> 0.00% perf [kernel.vmlinux] [k] native_write_msr
So, with just the userspace patches applied, here is what I get when
recording with the new tool and then reporting with the old and new tools:
Before:
[root@seventh c]# perf-4.17.rc6.ga048a0-torvalds.master report --stdio
# To display the perf.data header info, please use --header/--header-only options.
#
#
# Total Lost Samples: 0
#
# Samples: 83 of event 'cycles:ppp'
# Event count (approx.): 86724689
#
# Overhead Command Shared Object Symbol
# ........ ......... ................ ..................................
#
35.12% uname_x_n [kernel.vmlinux] [k] syscall_return_via_sysret
20.86% uname_x_n [unknown] [k] 0xfffffe000005e01b
11.09% uname_x_n [kernel.vmlinux] [k] copy_user_enhanced_fast_string
8.58% uname_x_n [kernel.vmlinux] [k] __x64_sys_newuname
4.93% uname_x_n libc-2.26.so [.] __GI___uname
2.92% uname_x_n ld-2.26.so [.] dl_main
2.66% uname_x_n [kernel.vmlinux] [k] __x86_indirect_thunk_rax
2.46% uname_x_n [kernel.vmlinux] [k] do_syscall_64
2.18% uname_x_n [unknown] [k] 0xfffffe000005e01e
2.17% uname_x_n uname_x_n [.] main
2.14% uname_x_n [unknown] [k] 0xfffffe000005e00c
1.98% uname_x_n [unknown] [k] 0xfffffe000005e025
1.37% uname_x_n [kernel.vmlinux] [k] down_read
1.27% uname_x_n [kernel.vmlinux] [k] entry_SYSCALL_64
0.23% uname_x_n [kernel.vmlinux] [k] get_random_u64
0.01% perf [kernel.vmlinux] [k] end_repeat_nmi
0.00% perf [kernel.vmlinux] [k] native_write_msr
#
# (Tip: Use --symfs <dir> if your symbol files are in non-standard locations)
#
After:
[root@seventh c]# perf report --stdio
# To display the perf.data header info, please use --header/--header-only options.
#
#
# Total Lost Samples: 0
#
# Samples: 83 of event 'cycles:ppp'
# Event count (approx.): 86724689
#
# Overhead Command Shared Object Symbol
# ........ ......... ................ ..................................
#
35.12% uname_x_n [kernel.vmlinux] [k] syscall_return_via_sysret
27.18% uname_x_n [kernel.vmlinux] [k] entry_SYSCALL_64_trampoline
11.09% uname_x_n [kernel.vmlinux] [k] copy_user_enhanced_fast_string
8.58% uname_x_n [kernel.vmlinux] [k] __x64_sys_newuname
4.93% uname_x_n libc-2.26.so [.] __GI___uname
2.92% uname_x_n ld-2.26.so [.] dl_main
2.66% uname_x_n [kernel.vmlinux] [k] __x86_indirect_thunk_rax
2.46% uname_x_n [kernel.vmlinux] [k] do_syscall_64
2.17% uname_x_n uname_x_n [.] main
1.37% uname_x_n [kernel.vmlinux] [k] down_read
1.27% uname_x_n [kernel.vmlinux] [k] entry_SYSCALL_64
0.23% uname_x_n [kernel.vmlinux] [k] get_random_u64
0.01% perf [kernel.vmlinux] [k] end_repeat_nmi
0.00% perf [kernel.vmlinux] [k] native_write_msr
#
# (Tip: Generate a script for your data: perf script -g <lang>)
#
[root@seventh c]#
What am I missing while testing this?
- Arnaldo