Re: [RFC][PATCH 11/14] function_graph: Convert ret_stack to a series of longs

From: Steven Rostedt
Date: Mon Nov 26 2018 - 11:26:10 EST


On Tue, 27 Nov 2018 01:07:55 +0900
Masami Hiramatsu <mhiramat@xxxxxxxxxx> wrote:

> > > --- a/include/linux/sched.h
> > > +++ b/include/linux/sched.h
> > > @@ -1119,7 +1119,7 @@ struct task_struct {
> > > int curr_ret_depth;
> > >
> > > /* Stack of return addresses for return function tracing: */
> > > - struct ftrace_ret_stack *ret_stack;
> > > + unsigned long *ret_stack;
> > >
> > > /* Timestamp for last schedule: */
> > > unsigned long long ftrace_timestamp;
> > > diff --git a/kernel/trace/fgraph.c b/kernel/trace/fgraph.c
> > > index 9b85638ecded..1389fe39f64c 100644
> > > --- a/kernel/trace/fgraph.c
> > > +++ b/kernel/trace/fgraph.c
> > > @@ -23,6 +23,17 @@
> > > #define ASSIGN_OPS_HASH(opsname, val)
> > > #endif
> > >
> > > +#define FGRAPH_RET_SIZE (sizeof(struct ftrace_ret_stack))
> > > +#define FGRAPH_RET_INDEX (ALIGN(FGRAPH_RET_SIZE, sizeof(long)) / sizeof(long))
> > > +#define SHADOW_STACK_SIZE (FTRACE_RETFUNC_DEPTH * FGRAPH_RET_SIZE)
> > > +#define SHADOW_STACK_INDEX \
> > > + (ALIGN(SHADOW_STACK_SIZE, sizeof(long)) / sizeof(long))
> > > +#define SHADOW_STACK_MAX_INDEX (SHADOW_STACK_INDEX - FGRAPH_RET_INDEX)
> > > +
> > > +#define RET_STACK(t, index) ((struct ftrace_ret_stack *)(&(t)->ret_stack[index]))
> > > +#define RET_STACK_INC(c) ({ c += FGRAPH_RET_INDEX; })
> > > +#define RET_STACK_DEC(c) ({ c -= FGRAPH_RET_INDEX; })
> > > +
> > [...]
> > > @@ -514,7 +531,7 @@ void ftrace_graph_init_task(struct task_struct *t)
> > >
> > > void ftrace_graph_exit_task(struct task_struct *t)
> > > {
> > > - struct ftrace_ret_stack *ret_stack = t->ret_stack;
> > > + unsigned long *ret_stack = t->ret_stack;
> > >
> > > t->ret_stack = NULL;
> > > /* NULL must become visible to IRQs before we free it: */
> > > @@ -526,12 +543,10 @@ void ftrace_graph_exit_task(struct task_struct *t)
> > > /* Allocate a return stack for each task */
> > > static int start_graph_tracing(void)
> > > {
> > > - struct ftrace_ret_stack **ret_stack_list;
> > > + unsigned long **ret_stack_list;
> > > int ret, cpu;
> > >
> > > - ret_stack_list = kmalloc_array(FTRACE_RETSTACK_ALLOC_SIZE,
> > > - sizeof(struct ftrace_ret_stack *),
> > > - GFP_KERNEL);
> > > + ret_stack_list = kmalloc(SHADOW_STACK_SIZE, GFP_KERNEL);
> > >
> >
> > I had dumped the fgraph size related macros to understand the patch better, I
> > got:
> > [ 0.909528] val of FGRAPH_RET_SIZE is 40
> > [ 0.910250] val of FGRAPH_RET_INDEX is 5
> > [ 0.910866] val of FGRAPH_ARRAY_SIZE is 16
> > [ 0.911488] val of FGRAPH_ARRAY_MASK is 255
> > [ 0.912134] val of FGRAPH_MAX_INDEX is 16
> > [ 0.912751] val of FGRAPH_INDEX_SHIFT is 8
> > [ 0.913382] val of FGRAPH_FRAME_SIZE is 168
> > [ 0.914033] val of FGRAPH_FRAME_INDEX is 21
> > FTRACE_RETFUNC_DEPTH is 50
> > [ 0.914686] val of SHADOW_STACK_SIZE is 8400
> >
> > I had a concern about memory overhead per-task. It seems the total memory
> > needed per task for the stack is 8400 bytes (with my configuration with
> > FUNCTION_PROFILE
> > turned off).
> >
> > Where as before it would be 32 * 40 = 1280 bytes. That looks like ~7 times
> > more than before.
>
> Hmm, this seems too big... I thought the shadow-stack size should be
> smaller than 1 page (4kB). Steve, can we give a 4k page for shadow stack
> and define FTRACE_RETFUNC_DEPTH = 4096 / FGRAPH_RET_SIZE ?

For the first pass, I decided not to worry about the size. It made the
code less complex :-)

Yes, I plan on working on making the size of the stack smaller, but
that will probably be added on patches to do so.

>
> > On my system with ~4000 threads, that becomes ~32MB which seems a bit
> > wasteful especially if there was only one or 2 function graph callbacks
> > registered and most of the callback array in the stack isn't used.

Note, all 4000 threads could be doing those trace backs, and if you are
doing full function graph tracing, it will use a lot.

> >
> > Could we make the array size configurable at compile time and start it with a
> > small number like 4 or 6?
>
> Or, we can introduce online setting :)

Yes, that can easily be added. I didn't try to make this into the
perfect solution, I wanted a solid one first, and then massage it into
something that is more efficient, both with memory consumption and
performance.

Joel and Masami, thanks for the feedback.

-- Steve