[PATCH 02/43] perf thread-stack: Fix thread stack return from kernel for kernel-only case
From: Arnaldo Carvalho de Melo
Date: Mon Jul 01 2019 - 22:29:29 EST
From: Adrian Hunter <adrian.hunter@xxxxxxxxx>
Commit f08046cb3082 ("perf thread-stack: Represent jmps to the start of a
different symbol") had the side-effect of introducing more stack entries
before return from kernel space.
When user space is also traced, those entries are popped before entry to
user space, but when user space is not traced, they get stuck at the
bottom of the stack, making the stack grow progressively larger.
Fix by detecting a return-from-kernel branch type and, at that point,
popping kernel addresses from the stack.
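The idea behind the fix can be sketched outside of perf. The toy code below uses hypothetical names and flag values, not perf's own definitions, and the kernel-address test assumes the x86_64 canonical upper-half convention. It keeps a stack of return addresses and, on a return branch whose target is unknown but which is flagged as a syscall or interrupt return, pops every kernel-space entry so nothing is left stuck at the bottom:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the perf branch-flag bits used by the patch. */
#define IP_FLAG_RETURN      (1u << 0)
#define IP_FLAG_SYSCALLRET  (1u << 1)
#define IP_FLAG_INTERRUPT   (1u << 2)

/* A toy thread stack: just return addresses and a count. */
struct toy_stack {
	uint64_t addr[64];
	size_t cnt;
};

/* Assumption: x86_64 kernel addresses live in the upper canonical half. */
static bool in_kernel(uint64_t ip)
{
	return ip >= 0xffff800000000000ULL;
}

static bool stack_in_kernel(const struct toy_stack *ts)
{
	return ts->cnt && in_kernel(ts->addr[ts->cnt - 1]);
}

/* Mirrors thread_stack__pop_ks(): pop every kernel address off the top. */
static void pop_ks(struct toy_stack *ts)
{
	while (stack_in_kernel(ts))
		ts->cnt--;
}

/*
 * Mirrors the new branch in thread_stack__process(): a RETURN sample with
 * no target address only unwinds the stack when it is also flagged as a
 * return from kernel space (syscall return or interrupt return).
 */
static void process_return(struct toy_stack *ts, uint32_t flags, uint64_t to)
{
	uint32_t return_from_kernel = IP_FLAG_SYSCALLRET | IP_FLAG_INTERRUPT;

	if (!(flags & IP_FLAG_RETURN))
		return;
	if (!to && (flags & return_from_kernel))
		pop_ks(ts);
}
```

Without the pop, a kernel-only trace leaves the kernel frames on the stack after every return to user space, which is exactly the progressive growth shown in the "Before" listing below.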
Note, the problem and fix affect the exported Call Graph / Tree but not
the callindent option used by "perf script --call-trace".
Example:
perf-with-kcore record example -e intel_pt//k -- ls
perf-with-kcore script example --itrace=bep -s ~/libexec/perf-core/scripts/python/export-to-sqlite.py example.db branches calls
~/libexec/perf-core/scripts/python/exported-sql-viewer.py example.db
Menu option: Reports -> Context-Sensitive Call Graph
Before: (showing Call Path column only)
Call Path
▶ perf
▶ ls
▶ 12111:12111
▶ setup_new_exec
▶ __task_pid_nr_ns
▶ perf_event_pid_type
▶ perf_event_comm_output
▶ perf_iterate_ctx
▶ perf_iterate_sb
▶ perf_event_comm
▶ __set_task_comm
▶ load_elf_binary
▶ search_binary_handler
▶ __do_execve_file.isra.41
▶ __x64_sys_execve
▶ do_syscall_64
▶ entry_SYSCALL_64_after_hwframe
▶ swapgs_restore_regs_and_return_to_usermode
▶ native_iret
▶ error_entry
▶ do_page_fault
▶ error_exit
▶ retint_user
▶ prepare_exit_to_usermode
▶ native_iret
▶ error_entry
▶ do_page_fault
▶ error_exit
▶ retint_user
▶ prepare_exit_to_usermode
▶ native_iret
▶ error_entry
▶ do_page_fault
▶ error_exit
▶ retint_user
▶ prepare_exit_to_usermode
▶ native_iret
After: (showing Call Path column only)
Call Path
▶ perf
▶ ls
▶ 12111:12111
▶ setup_new_exec
▶ __task_pid_nr_ns
▶ perf_event_pid_type
▶ perf_event_comm_output
▶ perf_iterate_ctx
▶ perf_iterate_sb
▶ perf_event_comm
▶ __set_task_comm
▶ load_elf_binary
▶ search_binary_handler
▶ __do_execve_file.isra.41
▶ __x64_sys_execve
▶ do_syscall_64
▶ entry_SYSCALL_64_after_hwframe
▶ page_fault
▶ entry_SYSCALL_64
▶ do_syscall_64
▶ __x64_sys_brk
▶ __x64_sys_access
▶ __x64_sys_openat
▶ __x64_sys_newfstat
▶ __x64_sys_mmap
▶ __x64_sys_close
▶ __x64_sys_read
▶ __x64_sys_mprotect
▶ __x64_sys_arch_prctl
▶ __x64_sys_munmap
▶ exit_to_usermode_loop
▶ __x64_sys_set_tid_address
▶ __x64_sys_set_robust_list
▶ __x64_sys_rt_sigaction
▶ __x64_sys_rt_sigprocmask
▶ __x64_sys_prlimit64
▶ __x64_sys_statfs
▶ __x64_sys_ioctl
▶ __x64_sys_getdents64
▶ __x64_sys_write
▶ __x64_sys_exit_group
Committer notes:
The first argument to perf-with-kcore needs to be the same for the
'record' and 'script' lines, otherwise we would record the perf.data file
and kcore_dir/ files in one directory ('example') and then try to use
them from the 'bep' directory. The instructions above were fixed so that
both lines use 'example'.
Signed-off-by: Adrian Hunter <adrian.hunter@xxxxxxxxx>
Tested-by: Arnaldo Carvalho de Melo <acme@xxxxxxxxxx>
Cc: Jiri Olsa <jolsa@xxxxxxxxxx>
Cc: stable@xxxxxxxxxxxxxxx
Fixes: f08046cb3082 ("perf thread-stack: Represent jmps to the start of a different symbol")
Link: http://lkml.kernel.org/r/20190619064429.14940-2-adrian.hunter@xxxxxxxxx
Signed-off-by: Arnaldo Carvalho de Melo <acme@xxxxxxxxxx>
---
tools/perf/util/thread-stack.c | 30 +++++++++++++++++++++++++++++-
1 file changed, 29 insertions(+), 1 deletion(-)
diff --git a/tools/perf/util/thread-stack.c b/tools/perf/util/thread-stack.c
index c485186a8b6d..4c826a2e08d8 100644
--- a/tools/perf/util/thread-stack.c
+++ b/tools/perf/util/thread-stack.c
@@ -628,6 +628,23 @@ static int thread_stack__bottom(struct thread_stack *ts,
true, false);
}
+static int thread_stack__pop_ks(struct thread *thread, struct thread_stack *ts,
+ struct perf_sample *sample, u64 ref)
+{
+ u64 tm = sample->time;
+ int err;
+
+ /* Return to userspace, so pop all kernel addresses */
+ while (thread_stack__in_kernel(ts)) {
+ err = thread_stack__call_return(thread, ts, --ts->cnt,
+ tm, ref, true);
+ if (err)
+ return err;
+ }
+
+ return 0;
+}
+
static int thread_stack__no_call_return(struct thread *thread,
struct thread_stack *ts,
struct perf_sample *sample,
@@ -910,7 +927,18 @@ int thread_stack__process(struct thread *thread, struct comm *comm,
ts->rstate = X86_RETPOLINE_DETECTED;
} else if (sample->flags & PERF_IP_FLAG_RETURN) {
- if (!sample->ip || !sample->addr)
+ if (!sample->addr) {
+ u32 return_from_kernel = PERF_IP_FLAG_SYSCALLRET |
+ PERF_IP_FLAG_INTERRUPT;
+
+ if (!(sample->flags & return_from_kernel))
+ return 0;
+
+ /* Pop kernel stack */
+ return thread_stack__pop_ks(thread, ts, sample, ref);
+ }
+
+ if (!sample->ip)
return 0;
/* x86 retpoline 'return' doesn't match the stack */
--
2.20.1