[for-next][PATCH 11/12] tracing: Skip more functions when doing stack tracing of events

From: Steven Rostedt
Date: Sun Jul 03 2016 - 16:35:00 EST


From: "Steven Rostedt (Red Hat)" <rostedt@xxxxxxxxxxx>

# echo 1 > options/stacktrace
# echo 1 > events/sched/sched_switch/enable
# cat trace
<idle>-0 [002] d..2 1982.525169: <stack trace>
=> save_stack_trace
=> __ftrace_trace_stack
=> trace_buffer_unlock_commit_regs
=> event_trigger_unlock_commit
=> trace_event_buffer_commit
=> trace_event_raw_event_sched_switch
=> __schedule
=> schedule
=> schedule_preempt_disabled
=> cpu_startup_entry
=> start_secondary

The above shows that we are seeing 6 functions before ever making it to the
caller of the sched_switch event.

# echo stacktrace > events/sched/sched_switch/trigger
# cat trace
<idle>-0 [002] d..3 2146.335208: <stack trace>
=> trace_event_buffer_commit
=> trace_event_raw_event_sched_switch
=> __schedule
=> schedule
=> schedule_preempt_disabled
=> cpu_startup_entry
=> start_secondary

The stacktrace trigger isn't as bad, because it adds its own skip to the
stacktracing, but still has two events extra.

One issue is that if the stacktrace passes its own "regs" then there should
be no addition to the skip, as the regs will not include the functions being
called. This was an issue that was fixed by commit 7717c6be6999 ("tracing:
Fix stacktrace skip depth in trace_buffer_unlock_commit_regs()" as adding
the skip number for kprobes made the probes not have any stack at all.

But since this is only an issue when regs is being used, a skip should be
added if regs is NULL. Now we have:

# echo 1 > options/stacktrace
# echo 1 > events/sched/sched_switch/enable
# cat trace
<idle>-0 [000] d..2 1297.676333: <stack trace>
=> __schedule
=> schedule
=> schedule_preempt_disabled
=> cpu_startup_entry
=> rest_init
=> start_kernel
=> x86_64_start_reservations
=> x86_64_start_kernel

# echo stacktrace > events/sched/sched_switch/trigger
# cat trace
<idle>-0 [002] d..3 1370.759745: <stack trace>
=> __schedule
=> schedule
=> schedule_preempt_disabled
=> cpu_startup_entry
=> start_secondary

And kprobes are not touched.

Reported-by: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
Signed-off-by: Steven Rostedt <rostedt@xxxxxxxxxxx>
---
kernel/trace/trace.c | 19 ++++++++++++++++++-
1 file changed, 18 insertions(+), 1 deletion(-)

diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 45e6747589c6..3d9f31b576f3 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -2118,7 +2118,17 @@ void trace_buffer_unlock_commit_regs(struct trace_array *tr,
{
__buffer_unlock_commit(buffer, event);

- ftrace_trace_stack(tr, buffer, flags, 0, pc, regs);
+ /*
+ * If regs is not set, then skip the following callers:
+ * trace_buffer_unlock_commit_regs
+ * event_trigger_unlock_commit
+ * trace_event_buffer_commit
+ * trace_event_raw_event_sched_switch
+ * Note, we can still get here via blktrace, wakeup tracer
+ * and mmiotrace, but that's ok if they lose a function or
+ * two. They are that meaningful.
+ */
+ ftrace_trace_stack(tr, buffer, flags, regs ? 0 : 4, pc, regs);
ftrace_trace_userstack(buffer, flags, pc);
}

@@ -2169,6 +2179,13 @@ static void __ftrace_trace_stack(struct ring_buffer *buffer,
trace.skip = skip;

/*
+ * Add two, for this function and the call to save_stack_trace()
+ * If regs is set, then these functions will not be in the way.
+ */
+ if (!regs)
+ trace.skip += 2;
+
+ /*
* Since events can happen in NMIs there's no safe way to
* use the per cpu ftrace_stacks. We reserve it and if an interrupt
* or NMI comes in, it will just have to use the default
--
2.8.1