Re: [PATCH v3] tracing: Function stack size and its name mismatch in arm64

From: Joel Fernandes
Date: Tue Aug 06 2019 - 11:48:16 EST


On Fri, Aug 02, 2019 at 12:11:24PM -0400, Steven Rostedt wrote:
> On Fri, 2 Aug 2019 12:09:20 -0400
> Steven Rostedt <rostedt@xxxxxxxxxxx> wrote:
>
> > On Fri, 2 Aug 2019 11:22:59 -0400
> > Steven Rostedt <rostedt@xxxxxxxxxxx> wrote:
> >
> > > I think you are not explaining the issue correctly. From looking at the
> > > document, I think what you want to say is that the LR is saved *after*
> > > the data for the function. Is that correct? If so, then yes, it would
> > > cause the stack tracing algorithm to be incorrect.
> > >
> >
> > [..]
> >
> > > Can someone confirm that this is the real issue?
> >
> > Does this patch fix your issue?
> >
>
> Bah, I hit "attach" instead of "insert" (I wondered why it didn't
> insert). Here's the patch without the attachment.
>
> -- Steve
>
> diff --git a/arch/arm64/include/asm/ftrace.h b/arch/arm64/include/asm/ftrace.h
> index 5ab5200b2bdc..13a4832cfb00 100644
> --- a/arch/arm64/include/asm/ftrace.h
> +++ b/arch/arm64/include/asm/ftrace.h
> @@ -13,6 +13,7 @@
> #define HAVE_FUNCTION_GRAPH_FP_TEST
> #define MCOUNT_ADDR ((unsigned long)_mcount)
> #define MCOUNT_INSN_SIZE AARCH64_INSN_SIZE
> +#define ARCH_RET_ADDR_AFTER_LOCAL_VARS 1
>
> #ifndef __ASSEMBLY__
> #include <linux/compat.h>
> diff --git a/kernel/trace/trace_stack.c b/kernel/trace/trace_stack.c
> index 5d16f73898db..050c6bd9beac 100644
> --- a/kernel/trace/trace_stack.c
> +++ b/kernel/trace/trace_stack.c
> @@ -158,6 +158,18 @@ static void check_stack(unsigned long ip, unsigned long *stack)
> i++;
> }
>
> +#ifdef ARCH_RET_ADDR_AFTER_LOCAL_VARS
> + /*
> + * Most archs store the return address before storing the
> + * function's local variables. But some archs do this backwards.
> + */
> + if (x > 1) {
> + memmove(&stack_trace_index[0], &stack_trace_index[1],
> + sizeof(stack_trace_index[0]) * (x - 1));
> + x--;
> + }
> +#endif
> +
> stack_trace_nr_entries = x;
>
> if (task_stack_end_corrupted(current)) {


I don't fully understand the fix :(. If the positions of the data and
FP/LR are swapped, I would expect a loop of some sort that copies the FP/LR
repeatedly to undo the mess we are discussing. But in this patch I see only
one copy happening. Maybe I just don't understand this code well enough.
Are there any more clues that would help me understand the fix?

Also, this stack trace loop (the original code) is a bit hairy :) It appears
to call stack_trace_save() and then loop over the returned entries to
generate a set of indexes. Isn't the real issue that the entries returned by
stack_trace_save() are out of whack? I am also curious whether other users
of stack_trace_save() will suffer from the same issue.

thanks,

- Joel