Re: [PATCH 05/19] x86/dumpstack: fix function graph tracing stack dump reliability issues

From: Josh Poimboeuf
Date: Fri Jul 29 2016 - 20:51:19 EST

On Fri, Jul 29, 2016 at 06:55:21PM -0400, Steven Rostedt wrote:
> On Thu, 21 Jul 2016 16:21:42 -0500
> Josh Poimboeuf <jpoimboe@xxxxxxxxxx> wrote:
> > When function graph tracing is enabled for a function, its return
> > address on the stack is replaced with the address of an ftrace handler
> > (return_to_handler). When dumping the stack of a task with graph
> > tracing enabled, there are some subtle bugs:
> >
> > - The fake return_to_handler() address can be reported as reliable.
> > Instead, because it's not the real caller, it should be considered
> > unreliable.
> I have some mixed emotions about this. First, it's not "fake", the
> function *is* going to return to it, but you are right, that's not the
> function that was called.
> I do like to see these in the trace, because sometimes these functions
> are an issue. But I guess I can live with them being marked as
> "unreliable".

Yeah, this is a little iffy. Calling return_to_handler() "fake" isn't
100% accurate. It wasn't involved in the *call* path, but it will be
involved in the *return* path.

My thinking was that when either saving or dumping the stack, you
normally only care about what led up to that point (the call path),
rather than what will happen in the future (the return path).

That's especially true in the non-oops stack trace case, which isn't
used for debugging. For example, reporting return_to_handler() in the
reliable trace of a perf profiling operation would just be confusing.

And in the oops case, where debugging is important, I think "unreliable"
is more appropriate because it serves as a hint that graph tracing was
involved, instead of trying to assert that it was the real caller, which
could create some confusion.

> > - In print_context_stack(), the real caller's return address is always
> > reported as reliable, even if the return_to_handler() address wasn't
> > referred to by a frame pointer.
> Hmm, if CONFIG_FRAME_POINTER is enabled, perhaps we should only call
> the look up of ftrace_graph_ret_addr(). Hmm, playing with this, yeah,
> we definitely should. It can report the wrong reliability.
> Without doing the reliability check we can get out of sync with the
> ret_stack. I have a patch to go on top of this patch below (hmm, it may
> not apply fully, because I was using a different base tree than you).

Yeah, your patch makes it better. Thanks!

BTW, it would be really nice if ftrace_graph_ret_addr() were idempotent
so we could get the "real" return address without having to pass in a
state variable.

For example we could add an "unsigned long *retp" pointer to
ftrace_ret_stack, which points to the return address on the stack. Then
we could get rid of the index state variable in ftrace_graph_ret_addr,
and also then there would never be a chance of the stack dump getting
out of sync with the ret_stack.

What do you think?