Re: Getting empty callchain from perf_callchain_kernel()

From: Peter Zijlstra
Date: Wed May 22 2019 - 10:05:27 EST


On Mon, May 20, 2019 at 02:06:54AM +0800, Kairui Song wrote:
> On Fri, May 17, 2019 at 5:10 PM Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
> >
> > On Fri, May 17, 2019 at 04:15:39PM +0800, Kairui Song wrote:
> > > Hi, I think the actual problem is that bpf_get_stackid_tp (and maybe
> > > some other bfp functions) is now broken, or, strating an unwind
> > > directly inside a bpf program will end up strangely. It have following
> > > kernel message:
> >
> > Urgh, what is that bpf_get_stackid_tp() doing to get the regs? I can't
> > follow.
>
> bpf_get_stackid_tp will just use the regs passed to it from the trace
> point. And then it will eventually call perf_get_callchain to get the
> call chain.
> With a tracepoint we have the fake regs, so unwinder will start from
> where it is called, and use the fake regs as the indicator of the
> target frame it want, and keep unwinding until reached the actually
> callsite.
>
> But if the stack trace is started withing a bpf func call then it's broken...

I'm confused.. how is this broken? Surely we should eventually find the
original stack frame and be good again, right?

> If the unwinder could trace back through the bpf func call then there
> will be no such problem.

Why couldn't it trace back through the bpf stuff? And how can we fix
that?