Re: [BUG REPORT] perf tools: x86_64: Broken callchain when sampling taken at 'callq' instruction

From: Ingo Molnar
Date: Tue Dec 01 2015 - 02:28:36 EST



* Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:

> On Fri, Nov 27, 2015 at 09:38:11AM +0100, Ingo Molnar wrote:
> >
> > * Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
> >
> > > On Thu, Nov 19, 2015 at 11:23:00AM +0100, Ingo Molnar wrote:
> > > > PEBS is an asynchronous hardware tracing mechanism, when batched PEBS is used it
> > > > might not even result in any interruption of execution. The 'pt_regs' does not
> > > > necessarily correspond to an interrupted, restartable context - we take the RIP
> > > > from the PEBS machinery and also use LBR and disassembly to determine the previous
> > > > instruction, before reporting it to user-space.
> > >
> > > Note that modern PEBS hardware (hsw+) does the rollback in hardware.
> > > Prior to that we indeed do it manually using the LBR.
> > >
> > > As to pt_regs, we construct a franken pt_regs based on the actual PEBS
> > > buffer overflow PMI and bits from the PEBS record (which also includes
> > > some register state). See
> > > arch/x86/kernel/cpu/perf_event_intel_ds.c:setup_pebs_sample_data().
> > >
> > > We always copy the flags, ip, bp and sp from the PEBS record into the
> > > interrupt pt_regs.
> > >
> > > And note that the PEBS record is constructed at instruction retirement,
> > > so it shows the state _after_ the instruction, with exception of the
> > > (hsw+) real_ip field.
> > >
> > > So the unwinder will have to be taught that if the IP points at a stack
> > > altering instruction (call, push, etc.) it will have to 'undo' the
> > > effects on the actual stack (I appreciate this might be 'interesting'
> > > for things like: pop, ret, etc.).
> >
> > So do we dump both the 'real' and the actual RIP, to not force tooling into having
> > to decode instructions and such?
>
> Nope, we only expose the corrected one.
>
> > (Which is pretty hard and fragile and not always
> > possible with instructions that destroy the original RIP, like JMP, etc.)
>
> Not sure what you're getting at here. We don't need the uncorrected
> instruction.

Well, we need it for stack unwinding, as you point out:

> But the problem here is that we rewind the instruction stream, but not
> the stack. And the stack unwinder is (obviously) interested in the stack
> state.

Unwinding the stack state would fix it as well - but an equivalent solution would
be to pass along the original RIP: we'd have a self-consistent pair of RIP/RSP.

Especially since unwinding the RSP is probably hard:

> I'm not sure we want (or need) to go undo the specific instruction's
> stack effect in-kernel. If the !DWARF unwinders are similarly confused
> we might need to put it in kernel (expensive *groan*). If it's only the
> DWARF muck then it's something that can be done in userspace just
> fine, although we might need to copy slightly more of the stack than SP
> is pointing at, such that we can undo RET/POP etc. which would have data
> beyond the head of stack.
>
> The easiest solution might be to figure out the biggest stack offset for
> any instruction and always capture that much over the head of stack.

So I think the problem here is that the RSP does not match up to the RIP. We can
either pass along the original RIP+RSP, or the fixed up one - but what we do
currently is that we pass along only half of it - which corrupts DWARF unwinding
state that doesn't tolerate such errors.
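
To make the mismatch concrete, here is a minimal userspace sketch in C. It is
not kernel code and not how perf represents samples: the struct layout, the
addresses and all helper names are made up for illustration. It assumes a
frameless function whose first instruction is the 'callq', so the CFI rule at
that RIP is simply "return address at [RSP]":

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define STACK_WORDS 4

/* Hypothetical sample layout (NOT the perf ABI): a RIP/RSP pair plus a
 * small copy of the stack; stack[0] lives at address 'stack_base'. */
struct sample {
	uint64_t rip;
	uint64_t rsp;
	uint64_t stack_base;
	uint64_t stack[STACK_WORDS];
};

/* Read the 8-byte word at 'addr' from the copied stack, 0 if not covered. */
uint64_t stack_read(const struct sample *s, uint64_t addr)
{
	if (addr < s->stack_base || (addr - s->stack_base) % 8 != 0)
		return 0;
	size_t idx = (size_t)((addr - s->stack_base) / 8);
	return idx < STACK_WORDS ? s->stack[idx] : 0;
}

/* Assumed CFI rule at the callq's RIP: nothing has been pushed since
 * function entry, so the caller's return address is the word at RSP. */
uint64_t caller_return_address(const struct sample *s)
{
	return stack_read(s, s->rsp);
}

/* Undo the stack effect of an already retired callq: drop the 8-byte
 * return address it pushed, leaving the rewound RIP as it is. */
struct sample undo_call_push(struct sample s)
{
	s.rsp += 8;
	return s;
}

/* Made-up state: RIP rewound to the callq at 0x400600, RSP still in its
 * post-call state. The callq pushed 0x400605 (the next instruction);
 * the real caller's return address 0x400500 sits one slot above it. */
struct sample make_example_sample(void)
{
	struct sample s = {
		.rip = 0x400600ULL,
		.rsp = 0x7ffc0008ULL,
		.stack_base = 0x7ffc0008ULL,
		.stack = { 0x400605ULL, 0x400500ULL, 0, 0 },
	};
	return s;
}

/* States what the sketch is meant to show. */
void self_check(void)
{
	struct sample s = make_example_sample();
	/* Mismatched pair: the "caller" is the sampled function itself. */
	assert(caller_return_address(&s) == 0x400605ULL);
	/* Consistent pair: the real caller is found. */
	struct sample fixed = undo_call_push(s);
	assert(caller_return_address(&fixed) == 0x400500ULL);
}
```

With the pair as currently reported, the rule picks up the return address
that the callq itself pushed, so the callchain loops back into the sampled
function. Adding 8 to RSP gives the unwinder a consistent view again;
reporting the post-call RIP together with the post-call RSP would be the other
way to get a matching pair.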

Thanks,

Ingo