Re: [PATCH net-next 1/8] perf: optimize perf_fetch_caller_regs

From: Steven Rostedt
Date: Fri Apr 08 2016 - 18:12:26 EST


On Tue, 5 Apr 2016 14:06:26 +0200
Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:

> On Mon, Apr 04, 2016 at 09:52:47PM -0700, Alexei Starovoitov wrote:
> > avoid memset in perf_fetch_caller_regs, since it's the critical path of all tracepoints.
> > It's called from perf_sw_event_sched, perf_event_task_sched_in and all of perf_trace_##call
> > with this_cpu_ptr(&__perf_regs[..]) which are zero initialized by percpu_alloc
>
> It's not actually allocated, but because it's a static uninitialized
> variable we get .bss-like behaviour, and the initial value is copied to
> all CPUs when the per-cpu allocator thingy bootstraps SMP, IIRC.
>
> > and
> > subsequent call to perf_arch_fetch_caller_regs initializes the same fields on all archs,
> > so we can safely drop memset from all of the above cases and
>
> Indeed.
>
> > move it into
> > perf_ftrace_function_call that calls it with stack allocated pt_regs.
>
> Hmm, is there a reason that's still on-stack instead of using the
> per-cpu thing, Steve?

Well, what do you do when you are tracing with regs in an interrupt
that arrived after the per-cpu regs field was already set? We could
create our own per-cpu buffer as well, but then that would require
checking which context level we are in, as we would need one for normal
context, one for softirq context, one for irq context and one for nmi
context.

-- Steve
>
> > Signed-off-by: Alexei Starovoitov <ast@xxxxxxxxxx>
>
> In any case,
>
> Acked-by: Peter Zijlstra (Intel) <peterz@xxxxxxxxxxxxx>